pythonsqlsqlite

Dynamic Python SQL Statement with column names from a value of a Database


I have a table with names, year and a score/count:

 ("Andi", 2020, 40),
 ("Andi", 2021, 60),
 ("Andi", 2022, 55),
 ("Andi", 2023, 55),
 ("Alex", 2020, 14),
 ("Alex", 2021, 1),
 ("Alex", 2022, 13),
 ("Alex", 2023, 13),
 ...

I try to zip that into an Excel file with writefile() etc. And it should look like this

NAME 2020 2021 2022 2023 TOTAL
Andi 40 60 55 55 210
Alex 14 1 13 13 41

And ofc next year they will be new records added: ("Andi",2024,49), ("Alex",2024,10),... So the sqlstament should be also add that when I recreate the Excel file again next year. I have found this: How to select Column value as Column name with conditions in SQL table which is hardcoded and works how I would want it but I dont want to change it every year. My Database also starts at 2016, so that would be a very long statment. I tried to make for loops for select, from, on,... but I think there could be an easier way?

Any ideas how I can achieve that? Thank you


Solution

  • You can use a SQL query like below to pivot the columns names as year.

    SELECT name, 
    MAX(CASE WHEN year = 2020 THEN score END) AS '2020', 
    MAX(CASE WHEN year = 2021 THEN score END) AS '2021',
     MAX(CASE WHEN year = 2022 THEN score END) AS '2022',
     MAX(CASE WHEN year = 2023 THEN score END) AS '2023',
     MAX(CASE WHEN year = 2024 THEN score END) AS '2024', 
     COALESCE(MAX(CASE WHEN year = 2020 THEN score END), 0)+COALESCE(MAX(CASE WHEN year = 2021 THEN score END), 0)+COALESCE(MAX(CASE WHEN year = 2022 THEN score END), 0)+COALESCE(MAX(CASE WHEN year = 2023 THEN score END), 0)+COALESCE(MAX(CASE WHEN year = 2024 THEN score END), 0) AS total
     FROM scores
     GROUP BY name
     ORDER BY name;
    

    This can be dynamically generated as when new years get added in.

    # Get all the years
    cursor.execute("SELECT DISTINCT year FROM scores ORDER BY year")
    years = cursor.fetchall()
    
    # Generate the pivot columns as years
    columns = []
    for year in years:
        columns.append(f"MAX(CASE WHEN year = {year[0]} THEN score END) AS '{year[0]}'")
    
    columns_str = ", ".join(columns)
    
    
    
    sql = f"""
    SELECT name, {columns_str}, 
           {"+".join([f"COALESCE(MAX(CASE WHEN year = {year[0]} THEN score END), 0)" for year in years])} AS total
    FROM scores
    GROUP BY name
    ORDER BY name;
    """
    
    
    cursor.execute(sql)
    result = cursor.fetchall()
    
    df = pd.DataFrame.from_records(result, columns=['name'] + [str(year[0]) for year in years] + ['total'])
    print(df)
    

    Output

       name  2020  2021  2022  2023  2024  total
    0  Alex    14     1    13    13    10     51
    1  Andi    40    60    55    55    49    259