sqlitesubquerylimit

SQLite: how to LIMIT a subquery for GROUP_CONCAT


I have a subquery that feeds a GROUP_CONCAT. All works well except the LIMIT, which does not have the effect I want.

Here is the code:

CREATE TABLE IF NOT EXISTS Animal (
    id  TEXT    PRIMARY KEY,
    diet    TEXT
    );
INSERT INTO Animal VALUES ('monkey',    'herbivore');
INSERT INTO Animal VALUES ('goat',  'herbivore');
INSERT INTO Animal VALUES ('cat',   'carnivore');
INSERT INTO Animal VALUES ('dog',   'omnivore');
INSERT INTO Animal VALUES ('human', 'omnivore');

CREATE TABLE IF NOT EXISTS Domicile (
    id  TEXT    PRIMARY KEY REFERENCES Animal,
    home    TEXT
    );

INSERT INTO Domicile VALUES ('monkey',  'jungle');
INSERT INTO Domicile VALUES ('goat',    'farmyard');
INSERT INTO Domicile VALUES ('cat', 'house');
INSERT INTO Domicile VALUES ('dog', 'garden');
INSERT INTO Domicile VALUES ('human',   'house');

CREATE TABLE IF NOT EXISTS Food (
    name    TEXT    PRIMARY KEY
    );

INSERT INTO Food VALUES ('fruit');
INSERT INTO Food VALUES ('meat');
INSERT INTO Food VALUES ('fish');
INSERT INTO Food VALUES ('bread');
INSERT INTO Food VALUES ('sausages');

CREATE TABLE IF NOT EXISTS Eats (
    id  TEXT    REFERENCES Animal,
    name    TEXT    REFERENCES Food
    );

INSERT INTO Eats VALUES ('monkey',  'fruit');
INSERT INTO Eats VALUES ('monkey',  'bread');
INSERT INTO Eats VALUES ('goat',    'fruit');
INSERT INTO Eats VALUES ('cat',     'meat');
INSERT INTO Eats VALUES ('cat',     'sausages');
INSERT INTO Eats VALUES ('dog',     'meat');
INSERT INTO Eats VALUES ('dog',     'bread');
INSERT INTO Eats VALUES ('dog',     'sausages');
INSERT INTO Eats VALUES ('human',   'fruit');
INSERT INTO Eats VALUES ('human',   'meat');
INSERT INTO Eats VALUES ('human',   'fish');
INSERT INTO Eats VALUES ('human',   'bread');
INSERT INTO Eats VALUES ('human',   'sausages');

.mode column

SELECT 'TEST 4: JOIN Eats using a subquery to order the names.';
SELECT      Animal.id,
        Animal.diet,
        GROUP_CONCAT (Eats.name) as names,
        Domicile.home
FROM        Animal
        INNER JOIN Domicile USING (id)
        INNER JOIN (
            SELECT * FROM Eats
            ORDER BY name
            ) as Eats USING (id)
GROUP BY    Animal.id
ORDER BY    Animal.id
;
SELECT 'TEST 4 result: names in order, but not limited to 3.';

SELECT 'TEST 5: LIMIT Eats subquery to 3.';
SELECT      Animal.id,
        Animal.diet,
        GROUP_CONCAT (Eats.name) as names,
        Domicile.home
FROM        Animal
        INNER JOIN Domicile USING (id)
        INNER JOIN (
            SELECT * FROM Eats
            ORDER BY name
            LIMIT 3
            ) as Eats USING (id)
GROUP BY    Animal.id
ORDER BY    Animal.id
;
SELECT 'TEST 5 result: LIMIT has applied to wrong SELECT?.';

What I want is output like this:

id      diet       names                home
------  ---------  -------------------  --------
cat     carnivore  meat,sausages        house
dog     omnivore   bread,meat,sausages  garden
goat    herbivore  fruit                farmyard
human   omnivore   bread,fish,fruit     house
monkey  herbivore  bread,fruit          jungle

Notes:

  1. Rows ordered by 'id'.
  2. Not every possible combination is present in Eats. (eg goat only eats fruit).
  3. 'names' column in alphabetical order and limited to 3 elements (else 'human' would be bread,fish,fruit,meat,sausages).

What I get from the two tests is this:

'TEST 4: JOIN Eats using a subquery to order the names.'
--------------------------------------------------------
TEST 4: JOIN Eats using a subquery to order the names.  
id      diet       names                           home    
------  ---------  ------------------------------  --------
cat     carnivore  meat,sausages                   house   
dog     omnivore   bread,meat,sausages             garden  
goat    herbivore  fruit                           farmyard
human   omnivore   bread,fish,fruit,meat,sausages  house   
monkey  herbivore  bread,fruit                     jungle  
'TEST 4 result: names in order, but not limited to 3.'
------------------------------------------------------
TEST 4 result: names in order, but not limited to 3.  
'TEST 5: LIMIT Eats subquery to 3.'
-----------------------------------
TEST 5: LIMIT Eats subquery to 3.  
id      diet       names  home  
------  ---------  -----  ------
dog     omnivore   bread  garden
human   omnivore   bread  house 
monkey  herbivore  bread  jungle
'TEST 5 result: LIMIT has applied to wrong SELECT?.'
----------------------------------------------------
TEST 5 result: LIMIT has applied to wrong SELECT?.  

The only difference between the source of tests 4 and 5 is the LIMIT in the test 5 subquery.

Is there a way to limit the names to, say, 3 elements? (I would want the limit to be easily changeable.)


Solution

  • I believe the following is a solution (that is hopefully relatively easy to understand):-

    WITH
        cte1 AS (
            SELECT 
                animal.*,
                domicile.home,
                eats.name 
            FROM animal 
            JOIN eats ON eats.id = animal.id 
            JOIN domicile ON domicile.id = animal.id
            ORDER BY animal.id,eats.id
            ),
        cte2 AS (
            SELECT *, (
                SELECT count(*) FROM cte1 AS cte1_c 
                WHERE cte1_c.name <= cte1.name AND cte1_c.id = cte1.id ORDER BY id,name
            ) AS c 
            FROM cte1
        )
    SELECT id,diet,group_concat(name) AS names,home FROM cte2 WHERE c <= 3 GROUP BY id;
    

    Which results in:-

    enter image description here

    How many names is easily changed by altering the number in the final WHERE clause WHERE c <= 3

    e.g. amending the data where your first select produces:-

    enter image description here

    Using the solution but with WHERE c <= 5 then:-

    enter image description here

    Using 1 results in:-

    enter image description here

    The flaw is if using less than 1 in which case no data will be output.

    Additional

    This is very much the same as the answer but amended to:-

    1. hopefully make it easier to understand (comments and name of the subquery for the count changed), and
    2. to correctly order the food names for the group_concat

    :-

    WITH
        cte1 AS (
            SELECT 
                animal.*,
                domicile.home,
                eats.name 
            FROM animal 
            JOIN eats ON eats.id = animal.id 
            JOIN domicile ON domicile.id = animal.id
            ORDER BY animal.id,eats.id
            ),
        cte2 AS (
            SELECT 
                *, 
                /* Use a subquery to ascertain the number (count) of names in cte1 that 
                    a) have the the same name as the name being processed, OR 
                    c) a name that is less than the current name being processed, AND
                    b) have the same id as the currrent id being processed 
                */
                (
                    SELECT 
                        count(*) 
                        FROM cte1 AS query_of_all_of_cte1
                    WHERE 
                        query_of_all_of_cte1.name <= cte1.name /* cte1.name is the name from the current row being processed */ 
                    AND 
                        query_of_all_of_cte1.id = cte1.id /* cte1.id is the id from the current row being processed */
                
                ) 
                /* the column name rather than defaulting to count(*) is aliaseed as c */
                    AS c 
            FROM cte1 ORDER BY id,name
        )
    SELECT id,diet,group_concat(name) AS names,home FROM cte2 WHERE c <= 3 GROUP BY id;