In our Postgres 10 database we have a view, which has subselects like SELECT *, lower(mix_cased_field) AS mix_cased_field
, where *
already includes a column called mix_cased_field
.
My expectation would be that the conflicting columns, i.e. mix_cased_field
, would be merged and the latter would survive, i.e. I would only have the lowercase field in my final resultset.
The actual result, is that I have two fields in my resultset: mix_cased_field
and mix_cased_field_1
. When I look at the view's definition in the pg_views
table, I can see that the latter field has been renamed with the suffix by Postgres, despite my explicit naming of the column.
The query for the view looks roughly like the following:
-- table and column names have been renamed to protect the innocent
SELECT
*,
'TopmostThing' AS __type,
(
SELECT COALESCE(jsonb_agg(sub_things), '[]'::jsonb) AS sub_things
FROM (
SELECT
*,
'SubThing' AS __type,
json_date(created_at) AS created_at,
lower(mix_cased_field) AS mix_cased_field,
(
SELECT COALESCE(jsonb_agg(yet_more_things), '[]'::jsonb) AS yet_more_things
FROM (
SELECT
*,
'YetMoreThing' AS __type,
lower(mix_cased_field) AS mix_cased_field
FROM yet_more_things
WHERE yet_more_things.subthing_id = subthings.id
) AS yet_more_things
)
FROM sub_things
WHERE sub_things.topmost_thing_id = topmost_things.id
) AS sub_things
)
FROM topmost_things
When I run the query used to construct the view, the resultset matches my expectation. Is this expected behaviour and I'm clueless, or is this a bug?
A SELECT
is allowed to return multiple columns with identical names (even if some clients do not allow the same).
But CREATE VIEW
does not allow to persist this ambiguity, a table or view must return distinct column names.
Postgres does not automatically rename / merge / overwrite columns with duplicate names. The expected error message for the case would be:
ERROR: column "mix_case_column" specified more than once
However, your actual query does something else entirely: It uses jsonb_agg(sub_things) AS sub_things
to aggregate a jsonb
value from the subquery returning duplicate output column names. And in the construction of a jsonb
value, a later addition (reading the SELECT
list left-to-right) of the same key name overwrites an earlier one. Only the output column name "sub_things" remains at the SQL level.
Notably, json_agg(sub_things)
would construct a json
value retaining duplicate keys as given. The manual:
Because the
json
type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast,jsonb
does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.