postgresqljsonpathjsonb-array-elements

Convert an object to array of size 1 in PostgreSQL jsonb and transform the json with nested arrays to rows


I have a two part question

We have a PostgreSQL table with a jsonb column. The values in jsonb are valid jsons, but they are such that for some rows a node would come in as an array whereas for others it will come as an object.

for example, the json we receive could either be like this ( node4 I just an object )

"node1": {
    "node2": {
        "node3": {
            "node4": {
                "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                "attr2": "S1",
                "UserID": "WebServices",
                "attr3": "S&P 500*",
                "attr4": "EI",
                "attr5": "0"
            }
        }
    }
}

Or like this ( node4 is an array )

"node1": {
    "node2": {
        "node3": {
            "node4": [
                {
                    "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                    "attr2": "S1",
                    "UserID": "WebServices",
                    "attr3": "S&P 500*",
                    "attr4": "EI",
                    "attr5": "0"
                },
                {
                    "attr1": "7d181b05-9c9b-4759-9368-aa7a38b0dc69",
                    "attr2": "S1",
                    "UserID": "WebServices",
                    "attr3": "S&P 500*",
                    "attr4": "EI",
                    "attr5": "0"
                }
            ]
        }
    }
}

And I have to write a jsonpath query to extract, for example, attr1, for each PostgreSQL row containing this json. I would like to have just one jsonpath query that would always work irrespective of whether the node is object or array. So, I want to use a path like below, assuming, if it is an array, it will return the value for all indices in that array.

jsonb_path_query(payload, '$.node1.node2.node3.node4[*].attr1')#>> '{}' AS "ATTR1"

I would like to avoid checking whether the type in array or object and then run a separate query for each and do a union.

Is it possible?

A sub-question related to above - Since I needed the output as text without the quotes, and somewhere I saw to use #>> '{}' - so I tried that and it is working, but can someone explain, how that works?

The second part of the question is - the incoming json can have multiple sets of nested arrays and the json and the number of nodes is huge. So other part I would like to do is flatten the json into multiple rows. The examples I found were one has to identify each level and either use cross join or unnest. What I was hoping is there is a way to flatten a node that is an array, including all of the parent information, without knowing which, if any, if its parents are arrays or simple object. Is this possible as well?

Update

I tried to look at the documentation and tried to understand the #>> '{}' construct, and then I came to realise that '{}' is the right hand operand for the #>> operator which takes a path and in my case the path is the current attribute value hence {}. Looking at examples that had non-empty single attribute path helped me realise that.

Thank you


Solution

  • You can use a "recursive term" in the JSON path expression:

    select t.some_column,
           p.attr1
    from the_table t
      cross join jsonb_path_query(payload, 'strict $.**.attr1') as p(attr1)
    

    Note that the strict modifier is required, otherwise, each value will be returned multiple times.

    This will return one row for each key attr1 found in any level of the JSON structure.

    For the given sample data, this would return:

    attr1                                 
    --------------------------------------
    "7d181b05-9c9b-4759-9368-aa7a38b0dc69"
    "7d181b05-9c9b-4759-9368-aa7a38b0dc69"
    "7d181b05-9c9b-4759-9368-aa7a38b0dc69"