Is it safe to select multiple columns with jsonb_array_elements in a single SELECT
statement? Is there a guarantee that the order of the expanded elements in one column is the same as the order in the second column?
Example: My table contains a column data
that contains a json array. Each element of the array is an object with two properties (id
and name
):
data
--------------------------------------------------------------------
[{"id": "11", "name": "entry11"}, {"id": "12", "name": "entry12"}]
[{"id": "21", "name": "entry21"}, {"id": "22", "name": "entry22"}]
If I run
with my_table(data) as (values
('[{"id": "11", "name":"entry11"},{"id": "12", "name": "entry12"}]'::jsonb),
('[{"id": "21", "name":"entry21"},{"id": "22", "name": "entry22"}]'::jsonb)
)
select
jsonb_array_elements(data)->'id' as id,
jsonb_array_elements(data)->'name' as name
from my_table
;
I get the expected result:
id | name
------+-----------
"11" | "entry11"
"12" | "entry12"
"21" | "entry21"
"22" | "entry22"
My Question: Is there a risk that the name
entry22 could end up in the row with id
21 as the two invocations of jsonb_array_elements
are handled independently by the database? My experiments (also with larger tables) suggest that it always works. But as relational databases usually don't have a stable ordering of rows I wonder if I can rely on that result.
Yes, there is a guarantee. But it's a weak one.
id
and name
will stay in sync (come from the same nested object) because the two set-returning functions are evaluated in "lockstep". The manual:
If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions into a single
LATERAL ROWS FROM( ... ) FROM
-clause item. For each row from the underlying query, there is an output row using the first result from each function, then an output row using the second result, and so on. If some of the set-returning functions produce fewer outputs than others, null values are substituted for the missing data, so that the total number of rows emitted for one underlying row is the same as for the set-returning function that produced the most outputs. Thus the set-returning functions run “in lockstep” until they are all exhausted, and then execution continues with the next underlying row.
Bold emphasis mine.
Since both your calls are guaranteed to return the same number of rows, this even works reliably before Postgres 10 (where the behavior of multiple set-returning functions in the SELECT
list was reformed (sanitized). See:
Unlike json
, jsonb
has a deterministic sort order of nested elements. But the only relevant aspect here is the order of array items, and that is always significant and preserved in json
and jsonb
alike.
Elaborating on what we just learned about the internal workings, your query:
SELECT jsonb_array_elements(data) -> 'id' AS id
, jsonb_array_elements(data) -> 'name' AS name
FROM tbl;
... is equivalent to:
SELECT t.id, x.i->'id' AS id, x.n->'name' AS name
FROM tbl t
CROSS JOIN LATERAL ROWS FROM (jsonb_array_elements(t.data), jsonb_array_elements(t.data)) x(i, n);
Makes it even more obvious, that it should be simplified to a single function call in a subquery. Then fields cannot get out of sync to begin with:
SELECT t.id, x->'id' AS id, x->'name' AS name
FROM tbl t
CROSS JOIN LATERAL jsonb_array_elements(t.data) x;
Or, with minimal syntax:
SELECT id, x->'id' AS id, x->'name' AS name
FROM tbl, jsonb_array_elements(data) x;
When returning multiple, separate fields from the same function call, move the function call to a subquery as demonstrated. Else you risk (costly) repeated evaluation. See:
SELECT
listAll of the above harbor a trap. If a SRF in the SELECT
list does not return a row (like when jsonb_array_elements()
finds an empty array), the row is removed from the result. When there are multiple SRF in the SELECT
list, the row is removed when all of them come up empty. This can lead to surprising results. From my experience, few are aware of the subtle implications.
If that side effect is intended, it's much clearer spelled out as CROSS JOIN
as demonstrated above.
If that side effect is not intended, and you'd rather preserve all rows, use LEFT JOIN LATERAL ... ON true
instead:
SELECT t.id, x->'id' AS id, x->'name' AS name
FROM tbl t
LEFT JOIN LATERAL jsonb_array_elements(t.data) x ON true;
Use this query, unless you know better. See:
jsonb_populate_recordset()
for this particular queryThere is a more efficient way for your particular query with jsonb_populate_recordset()
: faster, and with implicit type casts.
Create a fitting composite type once if you don't have one:
CREATE TYPE my_rowtype AS (id int, name text);
Then:
SELECT t.id, x.id, x.name
FROM tbl t
LEFT JOIN LATERAL jsonb_populate_recordset(null::my_rowtype, t.data) x ON true
Related: