sqljsonstringamazon-redshiftextract

Extract specific conditional string SQL Redshift


I want to extract only val2 from the below column1 of a table, I'm not sure how to do it with SQL though I've tried with regexp_subtr/instr, the length of column values are dynamic but the contents of val2 is the one that's to be extracted

Input:

column1
[{'val1': '54', 'val2': 'luis long, F (Peter)', 'val3': 'xxx@email.com', 'val4': 'xxxxyyy://somevalue', 'val5': 'Category', 'val6': 'some other value'}]

Output:

column 1 column 2
[{'val1': '54', 'val2': 'luis long, F (Peter)', 'val3': 'xxx@email.com', 'val4': 'xxxxyyy://somevalue', 'val5': 'Category', 'val6': 'some other value'}]luis long, F (Peter) luis long, F (Peter)

Solution

  • Rather than using regex to parse JSON, which is not fool-proof, you should rely on Redshift's JSON API here:

    SELECT
        col1,
        JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(col1, 0), 'val2') AS col2
    FROM yourTable;
    

    Edit:

    Based on your feedback in the comments section, it appears that Redshift's JSON parser strictly expects keys and values to be enclosed in double quotes, not single quotes. One possible workaround here (with a caveat given below) would be to simply do a blanket replacement of all single quotes to double quotes. Hence, the following might work:

    SELECT
        col1,
        JSON_EXTRACT_PATH_TEXT(
            JSON_EXTRACT_ARRAY_ELEMENT_TEXT(REPLACE(col1, '''', '"'), 0), 'val2') AS col2
    FROM yourTable;
    

    The caveat here is that replacing all single quotes might alter the JSON structure if it contains literal single quotes. The best fix here would he to go back to your JSON source and fix it there.