databasesqlitegroup-byconditional-aggregation

Summarize a column with multiple boolean columns


I have a sqlite database containing a table who looks like this, to store many to many relation :

ID_1 ID_2 TOOL
1 2 x
1 2 y
1 3 x
1 3 z
2 3 x
2 3 y
2 3 z

The two first columns contains ID of pairs of features, and the last one contains the name a tool that is related to these feature. A same pair of features can be associated with multiple tools.

I'm looking for a set of query to summarize this table so it looks like this, without modifying the database structure.

ID_1 ID_2 x y z
1 2 1 1 0
1 3 1 0 0
2 3 1 1 1

Displaying a new column for each unique TOOL name, containing 0 if the tool is not associated with the pair of ID, and 1 if it is. And having only one row per pair of ID.

Is there a combination of command that allow to do that ?

I tried using multiple CASE WHERE statements, but I'm stuck with this kind of results (here with a simplified example) :

CREATE TABLE IF NOT EXISTS test_table (
     ID_1 integer NOT NULL,
     ID_2 integer NOT NULL,
     tool string NOT NULL
)

INSERT INTO test_table (ID_1, ID_2, tool)
VALUES
    (1,2,"x"), (1,2,"y"), (1,3,"x"), (1,3,"z"), (2,3,"x"), (2,3,"y"), (2,3,"z")
    
    
SELECT *,
    CASE WHEN tool == "x" THEN 1 ELSE 0 END AS x,
    CASE WHEN tool == "y" THEN 1 ELSE 0 END AS y,
    CASE WHEN tool == "z" THEN 1 ELSE 0 END AS z
FROM test_table
ID_1 ID_2 x y z
1 2 1 0 0
1 2 0 1 0
1 3 1 0 0
1 3 0 0 1
2 3 1 0 0
2 3 0 1 0
2 3 0 0 1

Solution

  • You can use conditional aggregation:

    SELECT ID_1, ID_2,
           MAX(CASE WHEN tool = 'x' THEN 1 ELSE 0 END) x,
           MAX(CASE WHEN tool = 'y' THEN 1 ELSE 0 END) y,
           MAX(CASE WHEN tool = 'z' THEN 1 ELSE 0 END) z
    FROM test_table
    GROUP BY ID_1, ID_2;
    

    which, for SQLite, can be simplified:

    SELECT ID_1, ID_2,
           MAX(tool = 'x') x,
           MAX(tool = 'y') y,
           MAX(tool = 'z') z
    FROM test_table
    GROUP BY ID_1, ID_2;
    

    See the demo.