sqlpostgresqlpostgresql-9.6postgresql-jsonarray-agg

Adding LIMIT to ARRAY_TO_JSON or ARRAY_AGG


In a 2-player game using PostgreSQL 9.6.6 as backend I have defined the following custom stored function for retrieving chat messages for a user:

CREATE OR REPLACE FUNCTION words_get_user_chat(
                in_uid integer
        ) RETURNS jsonb AS
$func$
        SELECT COALESCE(
                JSONB_OBJECT_AGG(gid, ARRAY_TO_JSON(y)),
                '{}'::jsonb
        ) FROM (
                SELECT  c.gid,
                        ARRAY_AGG(
                                JSON_BUILD_OBJECT(
                                        'created', EXTRACT(EPOCH FROM c.created)::int,
                                        'uid',     c.uid,
                                        'msg',     c.msg
                                )
                                ORDER BY c.created ASC
                        ) AS y
                FROM      words_chat c
                LEFT JOIN words_games g
                USING     (gid)
                WHERE     in_uid in (g.player1, g.player2)
                AND       (g.finished IS NULL OR g.finished > CURRENT_TIMESTAMP - INTERVAL '1 day')
                GROUP BY  c.gid
                /* LIMIT 10 */
        ) AS x;

$func$ LANGUAGE sql STABLE;

It joins words_games and words_chat tables and produces the following JSON-object (with game id "9" as a string key) holding a JSON-array with messages:

 # select words_get_user_chat(6);



                                                                                                                   words_get_user_chat                        




--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------
 {"9": [{"msg": "test2", "uid": 6, "created": 1516445342}, {"msg": "test3", "uid": 6, "created": 1516445358}, {"msg": "test4", "uid": 6, "created": 1516445369
}, {"msg": "test5", "uid": 6, "created": 1516445381}, {"msg": "test6", "uid": 6, "created": 1516445405}, {"msg": "test7", "uid": 6, "created": 1516445415}, {"
msg": "test8", "uid": 6, "created": 1516445508}, {"msg": "test9", "uid": 6, "created": 1516445539}, {"msg": "test10", "uid": 6, "created": 1516445743}, {"msg"
: "test11", "uid": 6, "created": 1516445989}, {"msg": "test12", "uid": 6, "created": 1516446101}, {"msg": "test13", "uid": 6, "created": 1516446125}, {"msg": 
"test14", "uid": 6, "created": 1516446145}, {"msg": "test15", "uid": 6, "created": 1516446227}, {"msg": "test16", "uid": 6, "created": 1516446652}, {"msg": "t
est17", "uid": 6, "created": 1516446999}, {"msg": "test18", "uid": 6, "created": 1516447168}, {"msg": "test19", "uid": 6, "created": 1516447229}, {"msg": "tes
t20", "uid": 6, "created": 1516447493}, {"msg": "test21", "uid": 6, "created": 1516447532}, {"msg": "test22", "uid": 6, "created": 1516447555}, {"msg": "test2
3", "uid": 6, "created": 1516448017}, {"msg": "test24", "uid": 6, "created": 1516448062}]}
(1 row)

This works well, but I would like to add LIMIT 10 to the number of array elements - as a measure against chat flooding.

I have tried adding it to the function (please see the commented line above), but it had no effect.

Could you please suggest the correct spot for LIMIT 10?

I send the JSON-objects over WebSockets to Android app and would like to prevent malicious users from blowing up the size of such objects by flooding chats.

UPDATE:

I am trying Mike's suggestion:

CREATE OR REPLACE FUNCTION words_get_user_chat(
                in_uid integer
        ) RETURNS jsonb AS
$func$
        SELECT COALESCE(
                JSONB_OBJECT_AGG(gid, ARRAY_TO_JSON(y)),
                '{}'::jsonb
        ) FROM (
                SELECT  c.gid,
                        ROW_NUMBER() OVER (PARTITION BY c.gid) AS rn,
                        ARRAY_AGG(
                                JSON_BUILD_OBJECT(
                                        'created', EXTRACT(EPOCH FROM c.created)::int,
                                        'uid',     c.uid,
                                        'msg',     c.msg
                                )
                                ORDER BY c.created ASC
                        ) AS y
                FROM      words_chat c
                LEFT JOIN words_games g
                USING     (gid)
                WHERE     in_uid in (g.player1, g.player2)
                AND       (g.finished IS NULL OR g.finished > CURRENT_TIMESTAMP - INTERVAL '1 day')
                AND       rn < 10
                GROUP BY  c.gid
        ) AS x;

$func$ LANGUAGE sql STABLE;

but unfortunately get the syntax error:

ERROR:  42703: column "rn" does not exist
LINE 24:                 AND       rn < 10
                                   ^
LOCATION:  errorMissingColumn, parse_relation.c:3194

UPDATE 2:

Here are the both tables I use, sorry for not including that info before -

#\d words_chat
                                   Table "public.words_chat"
 Column  |           Type           |                        Modifiers                         
---------+--------------------------+----------------------------------------------------------
 cid     | bigint                   | not null default nextval('words_chat_cid_seq'::regclass)
 created | timestamp with time zone | not null
 gid     | integer                  | not null
 uid     | integer                  | not null
 msg     | text                     | not null
Indexes:
    "words_chat_pkey" PRIMARY KEY, btree (cid)
Foreign-key constraints:
    "words_chat_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    "words_chat_uid_fkey" FOREIGN KEY (uid) REFERENCES words_users(uid) ON DELETE CASCADE

# \d words_games
                                   Table "public.words_games"
  Column  |           Type           |                         Modifiers                         
----------+--------------------------+-----------------------------------------------------------
 gid      | integer                  | not null default nextval('words_games_gid_seq'::regclass)
 created  | timestamp with time zone | not null
 finished | timestamp with time zone | 
 player1  | integer                  | not null
 player2  | integer                  | 
 played1  | timestamp with time zone | 
 played2  | timestamp with time zone | 
 state1   | text                     | 
 state2   | text                     | 
 hint1    | text                     | 
 hint2    | text                     | 
 score1   | integer                  | not null
 score2   | integer                  | not null
 hand1    | character(1)[]           | not null
 hand2    | character(1)[]           | not null
 pile     | character(1)[]           | not null
 letters  | character(1)[]           | not null
 values   | integer[]                | not null
 bid      | integer                  | not null
Indexes:
    "words_games_pkey" PRIMARY KEY, btree (gid)
Check constraints:
    "words_games_check" CHECK (player1 <> player2)
    "words_games_score1_check" CHECK (score1 >= 0)
    "words_games_score2_check" CHECK (score2 >= 0)
Foreign-key constraints:
    "words_games_bid_fkey" FOREIGN KEY (bid) REFERENCES words_boards(bid) ON DELETE CASCADE
    "words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES words_users(uid) ON DELETE CASCADE
    "words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES words_users(uid) ON DELETE CASCADE
Referenced by:
    TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
    TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE

Solution

  • Your function should work like this:

    CREATE OR REPLACE FUNCTION words_get_user_chat(in_uid integer)
      RETURNS jsonb AS
      LANGUAGE sql STABLE
    $func$
    SELECT COALESCE(jsonb_object_agg(gid, y), '{}')
    FROM  (
       SELECT gid, jsonb_agg((SELECT j FROM (SELECT created, uid, msg) j)) AS y
       FROM  (
          SELECT DISTINCT gid  -- DISTINCT may be redundant
          FROM   words_games
          WHERE (finished IS NULL
              OR finished > (CURRENT_TIMESTAMP - INTERVAL '1 day'))
          AND    in_uid IN (player1, player2)
          ) g
       CROSS JOIN LATERAL (
          SELECT EXTRACT(EPOCH FROM created)::int AS created
               , uid
               , msg
          FROM   words_chat c
          WHERE  c.gid = g.gid
          ORDER  BY c.created DESC
          LIMIT  10                        --  HERE !!
          ) c 
       GROUP  BY 1
       ) x
    $func$;
    

    Do not aggregate all rows, just to discard the surplus later. Would be a waste. Place the LIMIT after ORDER BY in a subquery.

    You need to identify qualifying gid from words_games first for this and then use a LATERAL join to a subquery on words_chat. Should be correct and faster, too.

    Since c.created is defined NOT NULL, you don't need to add NULLS LAST in the ORDER BY clause. This matching multicolumn index should yield best read performance:

    CREATE INDEX ON words_chat(gid, created DESC);
    

    And maybe some index on words_games. Depends on cardinalities and value frequencies.

    While being at it, I also streamlined construction the jsonb result.

    Related: