google-bigquery

BigQuery giving error of expression not grouped by when order by is included in the sql statement


I am running a simple query in BigQuery.

SELECT
    p.title,
    p.id,
    SUM(view) AS Views,
    CAST(EXTRACT(DATE FROM pv.data_tstamp) AS STRING) AS dateInserted
FROM products AS p
    LEFT JOIN productViews AS pv
        ON p.sku = pv.sku
WHERE
    pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00')
AND pv.data_tstamp <= TIMESTAMP('2022-06-29 00:00:00') 
GROUP BY
    p.title,
    p.id,
    CAST(EXTRACT(DATE FROM pv.data_tstamp) AS STRING)
ORDER BY
    p.Id

I am getting an error:

SELECT list expression references pv.data_tstamp which is neither grouped nor aggregated at [5:28]

pv.data_stamp is grouped by as can be seen in the syntax.

If I remove order by p.id it works fine, but I need the query to sort the result by Id.


Solution

  • From the BQ ORDER BY doc:

    Column aliases from a FROM clause or SELECT list are allowed.
    If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.The data type of expression must be orderable.

    In your case you have set dateInserted as the alias name of the functional output but then again in the group by clause you have used the same functional expression instead of the alias name.
    This will work when you don't have any order by clause , but as soon as you use order by the query will only recognize the either selected column names or their alias names.

    Thus in the Select statement of your query the pv.data_tstamp column name is no longer available as it has been overridden. Only title, id, views, dateInserted (and also the other columns which are not selected)column names are visible.

    To remove the error, you need to either use ordinals or alias name in group by and order by clause

    Eg:

    SELECT
        p.title,
        p.id,
        SUM(view) Views,
        CAST(EXTRACT(DATE from pv.data_tstamp) as string) dateInserted
    FROM  products  p
    LEFT JOIN  productViews  pv
        ON p.sku=pv.sku WHERE
        pv.data_tstamp >= TIMESTAMP('2022-06-20 00:00:00') AND
        pv.data_tstamp  <= TIMESTAMP('2022-06-29 00:00:00') 
        GROUP BY p.title, p.id,dateInserted  -- replacing with ordinals will also work
        order by p.Id
    
    

    You can play with this query to check what are the ways you can use both clauses together,