sqlgroup-by

SQL basic problem when trying to SELECT 4 atributes and use GROUP BY with only 2


The issue arises from the use of an aggregate function (COUNT) on one column (ct.Language) and selecting another column (cb.Population) without applying an aggregate function to it or including it in the GROUP BY clause. This violates SQL syntax rules where all non-aggregated columns in the SELECT clause must be included in the GROUP BY clause unless they are being used in an aggregate function. The question is how to include cb.Population in the query's output without violating SQL syntax rules.

Here is the code:

DROP VIEW v1;
CREATE VIEW v1 AS 
    SELECT  
        cb.Name AS Nombre_Pais, 
        COUNT(ct.Language) AS Cantidad_Idiomas_Oficiales,
        cb.Population AS Poblacion_Total, 
        SUM(ct.Percentage) AS Porcentaje_Poblacion_Idioma_Oficial 
    FROM  
        country_backup cb  
    JOIN  
        countrylanguage_backup ct ON cb.Code = ct.CountryCode 
    WHERE  
        ct.IsOfficial = 'T' 
    GROUP BY 
        cb.Name, cb.Population
    HAVING  
        COUNT(ct.Language) > 3; 

I attempted to modify the view v1 by including the cb.Population column in the SELECT clause without aggregating it or including it in the GROUP BY clause. I expected to provide a solution to the problem by including cb.Population in the query's output while adhering to SQL syntax rules. However, this approach is invalid because it violates the rules of SQL syntax, which require all non-aggregated columns in the SELECT clause to be included in the GROUP BY clause. Therefore, I need to explore alternative approaches to achieve the desired result.


Solution

  • Oracle syntax used

    Since there are no sample data nor expected result it is hard to tell what is this all about.

    We can just speculate about it and repeat some basic things about Group By clause and aggregations in SQL.

    To do it lets suppose that your sample data could be like this...

    /*  country_backup
    CODE  NAME         POPULATION
    ----  -----------  ----------
     101  Name 101            312
     102  Name 102            274
     103  Name 103            198
     104  Name 104            290
     105  Name 105            312   */
    
     /*  country_language_backup
    COUNTRY_CODE  LANGUAGE  PERCENTAGE  IS_OFFICIAL
    ------------  --------  ----------  -----------
             101  English         32.5  T
             101  French          12.0  T
             102  English         37.3  T
             102  German          22.5  T
             102  Italian         12.2  T
             103  English         62.5  T
             104  English         21.4  T
             104  German          10.0  T
             105  English         72.5  T           */
    

    The result of Group By clause is 1 row per column(s) listed in the clause.

    Select   LANGUAGE 
    From     country_language_backup 
    Group By LANGUAGE
    
    /*  Result:
    LANGUAGE
    --------
    German
    English
    French
    Italian     */
    

    ... or - with the data from both tables ...

    Select   cb.NAME, ct.LANGUAGE 
    From     country_backup cb
    Inner Join country_language_backup ct ON( cb.CODE = ct.COUNTRY_CODE )
    Where    cb.CODE = 102
    Group By cb.NAME, ct.LANGUAGE
    Order By cb.NAME, ct.LANGUAGE
    
    /*  R e s u l t :
    NAME     LANGUAGE
    -------- --------
    Name 102 English
    Name 102 German 
    Name 102 Italian   */
    

    If there is no LANGUAGE in Group By clause - the query will try to fetch 1 row for 102, 'Name 102' as CODE and NAME.
    If you don't aggregate LANGUAGE column then the engine could not know which of three values to fetch with that one CODE, NAME row.
    You aggregated the LANGUAGE column with Count() and there are no problems.

    Select   cb.NAME, Count(ct.LANGUAGE) "CANTIDAD_IDIOMAS_OFICIALES" 
    From     country_backup cb
    Inner Join country_language_backup ct ON( cb.CODE = ct.COUNTRY_CODE )
    Where    cb.CODE = 102
    Group By cb.NAME
    Order By cb.NAME
    
    /*    R e s u l t :
    NAME     CANTIDAD_IDIOMAS_OFICIALES
    -------- --------------------------
    Name 102                          3 */
    

    Now about POPULATION column - if your data is like my sample data from above (table country_backup where 1 CODE has 1 POPULATION value) then there is no reason not to include POPULATION column in Group By list.
    If CODE column values in table country_backup are not unique you should aggregate POPULATION column.
    So, your query (with the above sample data) may stay just as it is...

    Select      cb.NAME "NOMBRE_PAIS", 
                Count(ct.LANGUAGE) "CANTIDAD_IDIOMAS_OFICIALES",
                cb.POPULATION "POBLACION_TOTAL", 
                Sum(ct.PERCENTAGE) "PORCENTAJE_POBLACION_IDIOMA_OFICIAL" 
    From        country_backup cb  
    Inner Join  country_language_backup ct ON( cb.CODE = ct.COUNTRY_CODE )
    Where       ct.IS_OFFICIAL = 'T' 
    Group BY    cb.NAME, cb.POPULATION
    Having      Count(ct.LANGUAGE) > 2;
    

    ...

    /*  R e s u l t :  
    NOMBRE_PAIS CANTIDAD_IDIOMAS_OFICIALES POBLACION_TOTAL PORCENTAJE_POBLACION_IDIOMA_OFICIAL
    ----------- -------------------------- --------------- -----------------------------------
    Name 102                             3             274                                  72  */