sql postgresql plpgsql dynamic-sql

Macros / Meta-programming in Postgres queries


Suppose I have the same example data as in this question and additionally declare the following two functions:

CREATE OR REPLACE FUNCTION example.markout_666_example_666_price_table_666_price(_symbol text, _time_of timestamptz, _start interval, _duration interval)
  RETURNS float8
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
$func$
SELECT p.price
FROM   example.price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

CREATE OR REPLACE FUNCTION example.markout_666_example_666_price_table_666_volume(_symbol text, _time_of timestamptz, _start interval, _duration interval)
  RETURNS float8
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
$func$
SELECT p.volume
FROM   example.price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

These two functions are similar but reference different columns. In a more general case they might also reference different tables. I declare two separate functions, however, because passing a column name (or a table name) into a function seems to be regarded as an anti-pattern when writing Postgres functions.

I can use both of these functions in a query like:

SELECT symbol, time_of, example.markout_666_example_666_price_table_666_price(symbol, time_of, '3 hours', '24 hours') as markout_price,
                        example.markout_666_example_666_price_table_666_price(symbol, time_of, '25 hours', '24 hours') as markout_price_2,
                        example.markout_666_example_666_price_table_666_volume(symbol, time_of, '3 hours', '24 hours') as markout_volume
from example.interesting_times it; 

This is quite verbose, however, and we need to write symbol and time_of several times. If we have functions declared for more tables and more columns of these tables, the queries can get quite complex. Is it possible to instead write something like:

SELECT symbol, time_of, example.markout('example.price_table', 'price', '3 hours', '24 hours') as markout_price,
                        example.markout('example.price_table', 'price', '25 hours', '24 hours') as markout_price_2,
                        example.markout('example.price_table', 'volume', '3 hours', '24 hours') as markout_volume
from example.interesting_times it; 

where example.markout is a macro/metaprogramming type of construct, and have it evaluated the same way as if we had used the more verbose syntax? Is there any metaprogramming-like technique that can be used here?

All I can find by searching is SQL_MACRO in Oracle Database and this page on "macro commands" from an outdated version of Postgres, which is no longer in the Postgres manual.


Solution

  • To make the function work for different tables and different (sets of) columns, you need dynamic SQL. Makes the design more sophisticated. You need to know your PL/pgSQL and beware of SQL injection!

    If you are not so sure, and there are just a couple of lookup-tables, rather create one dedicated function per table, returning the super-set of possible columns. Even I would do that.
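    As a rough sketch of that simpler route, a dedicated function for example.price_table could return both columns at once through OUT parameters. The function name markout_price_table is hypothetical, and the float8 column types are assumed from the two functions in the question.

    ```sql
    -- Hypothetical name. Assumes example.price_table has the columns
    -- symbol, time_of, price and volume, as used in the question.
    CREATE OR REPLACE FUNCTION example.markout_price_table(_symbol text
                                                         , _time_of timestamptz
                                                         , _start interval
                                                         , _duration interval
                                                         , OUT price float8
                                                         , OUT volume float8)
      LANGUAGE sql STABLE STRICT PARALLEL SAFE AS
    $func$
    SELECT p.price, p.volume          -- super-set of columns for this table
    FROM   example.price_table p
    WHERE  p.symbol = _symbol
    AND    p.time_of >= _time_of + _start
    AND    p.time_of <= _time_of + _start + _duration
    ORDER  BY p.time_of
    LIMIT  1;
    $func$;

    -- One call per time frame; pick whichever columns are needed.
    SELECT i.symbol, i.time_of, m.price, m.volume
    FROM   example.interesting_times i
         , example.markout_price_table(i.symbol, i.time_of, '3 hours', '24 hours') m;
    ```

    No dynamic SQL is involved here, so there is nothing to quote or sanitize. The price is one such function per lookup table.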

    That said, here is a perfectly safe and optimized function.
    There are multiple advanced concepts at work.

    CREATE OR REPLACE FUNCTION f_markout(_tbl regclass
                                       , _symbol text
                                       , _time_of timestamptz
                                       , _start interval
                                       , _duration interval
                                       , VARIADIC _cols text[]  -- last IN param!
                                       , OUT _rec record        -- short syntax
                                        )
      LANGUAGE plpgsql STABLE STRICT PARALLEL SAFE AS
    $func$
    BEGIN
       EXECUTE format(
          $q$
          SELECT %1$s
          FROM   %2$s p
          WHERE  p.symbol = $1
          AND    p.time_of >= $2
          AND    p.time_of <= $3
          ORDER  BY p.time_of
          LIMIT  1;
          $q$
        , (SELECT string_agg(quote_ident(c), ', ') FROM unnest(_cols) c)  -- %1 (quoted as identifiers!)
        , _tbl                                                            -- %2 (auto-quoted!)                                                    
          )
       USING _symbol                        -- $1
           , _time_of + _start              -- $2
           , _time_of + _start + _duration  -- $3
       INTO _rec;
    END
    $func$;
    


    Call:

    SELECT *
    FROM f_markout('price_table', 'GME', '2016-01-02 00:30+0', '3h', '24h', 'price', 'volume') AS p(p1 float8, v1 float8);  -- !!!
    

    This is one of the rare cases where a function returning anonymous records actually makes sense.
    Note how it demands a column definition list in the call. Use any column names, but data types must match!

    Your query:

    SELECT i.symbol, i.time_of, m1.*, m2.*
    FROM   interesting_times i
         , f_markout('price_table' , i.symbol, i.time_of, '3 h', '24 h', 'price', 'volume')     AS m1(price1 float8, volume float8)
         , f_markout('price_table2', i.symbol, i.time_of, '3 h', '24 h', 'price', 'Clown Item') AS m2(price2 float8, "Clown Item" text);
    

    Note how I call the function in the FROM list. The comma is effectively short syntax for CROSS JOIN LATERAL, which is safe for my function because it always returns exactly one row. (It wouldn't be safe for a "table function", which can return 0 rows and would thereby eliminate the row from the result. There we'd use LEFT JOIN LATERAL ... ON true instead.)

    This way, each function is called only once. If you put the function in the SELECT list and decomposed the result directly, that would result in multiple function calls for multiple result columns.
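    The difference can be observed with a small, self-contained experiment. This is only a sketch: the sequence call_counter and the function two_cols are hypothetical helpers made up for the demonstration, not part of the solution.

    ```sql
    -- Hypothetical helpers: a counter plus a function that bumps it on every call.
    CREATE TEMP SEQUENCE call_counter;

    CREATE FUNCTION pg_temp.two_cols(OUT a int, OUT b int)
      LANGUAGE plpgsql VOLATILE AS
    $func$
    BEGIN
       PERFORM nextval('call_counter');  -- count every invocation
       a := 1;
       b := 2;
    END
    $func$;

    -- Decomposing in the SELECT list: (f()).* is expanded to (f()).a, (f()).b
    SELECT (pg_temp.two_cols()).*;
    SELECT currval('call_counter');      -- 2: one call per output column

    -- Calling in the FROM list: the function runs once
    SELECT * FROM pg_temp.two_cols();
    SELECT currval('call_counter');      -- 3: only one more call
    ```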

    This way we can access each table once per time frame. Doing it multiple times for multiple result columns would also multiply the cost.

    You want to be able to pass any number of column names. At the same time we do not want to pass that as concatenated string, which would be wide open to SQL injection. The clean and elegant solution is a VARIADIC parameter. Must be the last one in the list of IN parameters to be unambiguous.
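    A VARIADIC parameter can be fed either a plain list of arguments or, with the keyword VARIADIC in the call, a ready-made array. A minimal sketch with a hypothetical helper function show_cols (not part of the solution):

    ```sql
    -- Hypothetical helper: a regular parameter first, the VARIADIC one last.
    CREATE FUNCTION pg_temp.show_cols(_label text, VARIADIC _cols text[])
      RETURNS text
      LANGUAGE sql IMMUTABLE AS
    $func$
    SELECT _label || ': ' || array_to_string(_cols, ' | ');
    $func$;

    -- Trailing arguments are collected into the array _cols
    SELECT pg_temp.show_cols('list', 'price', 'volume');                   -- list: price | volume

    -- An existing array is passed through with the keyword VARIADIC
    SELECT pg_temp.show_cols('array', VARIADIC ARRAY['price', 'volume']);  -- array: price | volume
    ```

    The same two call styles should work for f_markout: either list the column names at the end, or pass VARIADIC '{price,volume}'::text[] as the last argument.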

    Before concatenating, I make sure each column name is double-quoted where needed with quote_ident(), thereby making SQL injection completely impossible. Column names must be passed case-sensitively!
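    To illustrate what quote_ident() does with different inputs, here is a small standalone query. The input strings are made up for the demonstration.

    ```sql
    SELECT quote_ident('price')       AS plain       -- price          (no quoting needed)
         , quote_ident('Clown Item')  AS with_space  -- "Clown Item"
         , quote_ident('Price')       AS mixed_case  -- "Price"        (case is preserved)
         , quote_ident('price; DROP TABLE price_table') AS hostile;
                                                     -- "price; DROP TABLE price_table"
    ```

    The hostile string ends up as one (non-existent) column name, so the dynamic query fails with an error instead of running injected code. The mixed-case example is why 'Price' and 'price' are not interchangeable in the call.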

    The table name is passed as type regclass. That takes care of proper quoting automatically and fails immediately for non-existent tables. It also lets the caller schema-qualify the name or not.
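    The behavior of regclass can be tried out in isolation. A short sketch, with a temporary table "Clown Table" that is made up for the demonstration:

    ```sql
    -- Hypothetical table with a name that needs quoting
    CREATE TEMP TABLE "Clown Table" (id int);

    SELECT 'pg_class'::regclass;         -- pg_class       (resolved via search_path)
    SELECT '"Clown Table"'::regclass;    -- "Clown Table"  (quoted on output where needed)

    -- Converted to text inside format(), the name arrives properly quoted
    SELECT format('SELECT * FROM %s', '"Clown Table"'::regclass);
                                         -- SELECT * FROM "Clown Table"

    -- A cast of a non-existent name raises an error right away:
    -- SELECT 'no_such_table'::regclass; -- ERROR: relation "no_such_table" does not exist
    SELECT to_regclass('no_such_table'); -- NULL instead of an error
    ```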

    I pass values as values to EXECUTE with the USING clause. Makes SQL-injection impossible, and also avoids cost and potential errors from casting input to text, concatenating and casting back in the query.
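    A minimal standalone sketch of that mechanism, using a DO block with made-up variable names and a deliberately hostile-looking input value:

    ```sql
    DO
    $do$
    DECLARE
       _input text := $v$x'; DROP TABLE price_table; --$v$;  -- hostile-looking value
       _len   int;
    BEGIN
       -- $1 is a placeholder for a value, not spliced into the query text,
       -- so the quote and the semicolon in _input are never parsed as SQL.
       EXECUTE 'SELECT length($1)'
       INTO  _len
       USING _input;

       ASSERT _len = length(_input);     -- the value arrived unchanged
       RAISE NOTICE 'length of input: %', _len;
    END
    $do$;
    ```

    Values passed with USING also keep their data type, which is why f_markout can hand over the computed timestamptz bounds without any casting to text and back.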

    Plain SQL

    As a reminder: plain SQL will still be slightly faster. It is more verbose, but less error-prone.
    The equivalent of the above query:

    SELECT i.symbol, i.time_of, p1.*, p2.*
    FROM   interesting_times i
    LEFT   JOIN LATERAL (
       SELECT p.price AS price1, p.volume AS volume1
       FROM   price_table p
       WHERE  p.symbol = i.symbol
       AND    p.time_of >= i.time_of + interval '3h'
       AND    p.time_of <= i.time_of + interval '27h'
       ORDER  BY p.time_of
       LIMIT  1
       ) p1 ON true
    LEFT   JOIN LATERAL (
       SELECT p.price AS price2, p."Clown Item"
       FROM   price_table2 p
       WHERE  p.symbol = i.symbol
       AND    p.time_of >= i.time_of + interval '3h'
       AND    p.time_of <= i.time_of + interval '27h'
       ORDER  BY p.time_of
       LIMIT  1
       ) p2 ON true
    ORDER  BY 1, 2;
    

    Note the LEFT JOIN in this case.
