Imagine I have a table with purchases, where I see id of customer, id of store where he bought something (I see other cols like prices, dates and others but not that important now). I have second table where I have every customers id and a sign if the customer is VIP. I want to see what is the % of VIP customers in stores. Now, I write a query like this:
select
p.store_id,
count(distinct p.customer_id) as count_customers,
avg(v.is_vip) as percentage_vip
from schema.purchases p
left join (
select
customer_id,
is_vip
from schema.vip_customers
where data_year = 2024
) v on v.customer_id = p.customer_id
where purchase_date between '2024-1-1' and '2024-12-31'
and p.purchase_ok = 1
group by 1
For each store, only 1 or 0 is returned as AVG, even though I can see there are stores with variablity when I do not group the table or when I group it on the is_vip column. I just cant see why AVG does not work as expected (my expectations, but i guess, when you put some amounts of 0s and different amount of 1s in this function, i expect it to return something else then 1 or 0).
EDIT: Tim added perfect answer and it was really just matter of casting - the column was casted as integer, so it would really return only 1 or 0. When casted to decimal, it works as expected. See answers.
Assuming the column is_vip
be integer, then the average calculation, if also operating in integer mode, would return only 0 or 1 after truncation of the decimal component. One way to get around this might be to cast to some numeric type with a decimal component, e.g.
AVG(CAST(v.is_vip AS DECIMAL(10,2)) AS percentage_vip