amazon-redshiftjson-extract

Json Extract from column in amazon redshift


my query:

SELECT 
    CAST(employee_id AS INT) AS employee_id, 
    employee_email, 
    gender,
    custom_fields_json
FROM 
    xxxxxx_employee
WHERE 
    employee_id IS NOT NULL
ORDER BY 
    employee_id DESC

custom_fields_json has value in format : {"efcustom_text_userid":["XXXXXXX_00000000"]}

I want to extract XXXXXXX_00000000.

How can I do it?

tried reading json extract on aws site but of no use, chat gpt doesn't know tried searching forums but of no use.


Solution

  • An important distinction to keep in mind is whether the value of custom_fields_json is of some text type Or is being stored as a SUPER value. I'm guessing that this value is text based on the question but it isn't clear. You should really provide the DDL of the tables you are referencing along with the queries in these posts.

    So assuming this is text AND that this text is always in this format you can get this value by

    SELECT JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT(custom_fields_json,'efcustom_text_userid',),0)
    FROM 
        xxxxxx_employee
    WHERE 
        employee_id IS NOT NULL;
    

    This first extracts the value of the key efcustom_text_userid. Then extracts the 0th element of the array that the inner function returns.

    If you need something a bit smarter than a fixed parsing of text please update with the constraints of this higher requirement.

    ===============================================================

    Update: 'custom_fields_json' is stored as SUPER

    To parse a SUPER you just need to navigate into its values. See https://docs.aws.amazon.com/redshift/latest/dg/query-super.html

    Like this:

    SELECT custom_fields_json.efcustom_text_userid[0]
    FROM 
        xxxxxx_employee
    WHERE 
        employee_id IS NOT NULL;