I want to extract only val2
from the below column1
of a table, I'm not sure how to do it with SQL though I've tried with regexp_subtr
/instr
, the length of column values are dynamic but the contents of val2
is the one that's to be extracted
Input:
column1 |
---|
[{'val1': '54', 'val2': 'luis long, F (Peter)', 'val3': 'xxx@email.com', 'val4': 'xxxxyyy://somevalue', 'val5': 'Category', 'val6': 'some other value'}] |
Output:
column 1 | column 2 |
---|---|
[{'val1': '54', 'val2': 'luis long, F (Peter)', 'val3': 'xxx@email.com', 'val4': 'xxxxyyy://somevalue', 'val5': 'Category', 'val6': 'some other value'}]luis long, F (Peter) | luis long, F (Peter) |
Rather than using regex to parse JSON, which is not fool-proof, you should rely on Redshift's JSON API here:
SELECT
col1,
JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(col1, 0), 'val2') AS col2
FROM yourTable;
Edit:
Based on your feedback in the comments section, it appears that Redshift's JSON parser strictly expects keys and values to be enclosed in double quotes, not single quotes. One possible workaround here (with a caveat given below) would be to simply do a blanket replacement of all single quotes to double quotes. Hence, the following might work:
SELECT
col1,
JSON_EXTRACT_PATH_TEXT(
JSON_EXTRACT_ARRAY_ELEMENT_TEXT(REPLACE(col1, '''', '"'), 0), 'val2') AS col2
FROM yourTable;
The caveat here is that replacing all single quotes might alter the JSON structure if it contains literal single quotes. The best fix here would he to go back to your JSON source and fix it there.