I have a sheet with two columns (MONTH and NAME) and I can easily make a table of names by month, setting Dimension = MONTH and Expression = COUNT(NAME).
Now I want to calculate mean and standard deviation of names for month (i.e. over the rows of the first paragraph's table). How can I do it?
I actually tried nothing, because I don't know how could I try. But the example below shows what I expect.
This is my sheet:
Month | Name |
---|---|
January | Albert |
January | Barbara |
January | Charlie |
January | Dana |
February | Fred |
March | Harry |
March | Albert |
April | Jane |
April | Barbara |
April | Kurt |
April | Fred |
April | Larry |
April | Monica |
May | Charlie |
May | Monica |
May | Paul |
This is the simple table of names (expression = COUNT(names)) by month (dimension):
Month | COUNT(Names) |
---|---|
January | 4 |
February | 1 |
March | 2 |
April | 6 |
May | 3 |
And I want to get the results mean = (4+1+2+6+3)/5 = 3.2 and st.deviation = 1.92.
I tried a table with the expression Avg(Aggr(COUNT(Names), Month)) and it worked. But when I add a line with Month = July and Name = Albert ir returned again (4+1+2+6+3+1)/6 = 2.83, while I expected (4+1+2+6+3+0+7)/7 = 2.43, since we have 0 names for June.
Same for st.deviation.
The main issue is that there is no June
value. So if you want the calculation to include it then you'll have to add it to the dataset.
In the script below Im adding an additional table that joins to the raw data. The new table contains all possible months. This way the table now operates on the full list of months (including June
) and the avg now is 2.43
RawData:
Load * Inline [
Month, Name
January, Albert
January, Barbara
January, Charlie
January, Dana
February, Fred
March, Harry
March, Albert
April, Jane
April, Barbara
April, Kurt
April, Fred
April, Larry
April, Monica
May, Charlie
May, Monica
May, Paul
July, Albert
];
// or join back to the table if we dont want an extra Months table
// join
Months:
Load * Inline [
Month
January,
February,
March,
April,
May,
June,
July
];