I am aiming of finding the most frequent value and group them by user id using BigQuery. It should be able to count the number of language used for each user id and result should return the language that is highest. However, I found error saying that
No matching signature for aggregate function AVG for argument types: STRING. Supported signatures: AVG(INT64); AVG(FLOAT64); AVG(NUMERIC) at [3:5]
This is my code:
SELECT * FROM(
SELECT COUNT(*) AS cnt,
AVG(Language) AS mean,
APPROX_TOP_COUNT(Language, 1)[OFFSET(0)].value AS most_frequent_value
FROM `language`
WHERE Language IS NOT NULL
GROUP BY User_ID);
What should I change so the result will return the value of language preferred by the each user ID?
Stored producer:
CASE
WHEN Preferred_Language in ('EN', 'English') THEN 'EN'
ELSE Preferred_Language
END AS Preferred_Language,
Below is for BigQuery Standard SQL
#standardSQL
SELECT
User_ID,
ARRAY_AGG(Language ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] most_frequent_language
FROM (
SELECT
User_ID,
Language,
COUNT(*) AS cnt
FROM `project.dataset.language`
WHERE Language IS NOT NULL
GROUP BY User_ID, Language
)
GROUP BY User_ID