jsonamazon-athenajson-extract

json_extract_scalar is not extracting value from json string with single quote


I have an athena record with a json string as below.

{'is_referred': False, 'landing_page': '/account/register'}

I tried to extract the landing_page value with query below but it returns null value.

select 
    json_string,
    json_extract_scalar(json_string, '$.landing_page') as landing_page
from
    my_table

I have also tried to replace the single quote to double quote but still can't retrieve the value.

select 
    json_string,
    replace(json_string, '''', '"') as double_quoted_json_string
    json_extract_scalar(double_quoted_json_string, '$.landing_page') as landing_page
from
    my_table

Any advice, please?


Solution

  • That is not valid json. Single quotes should be double quotes and boolean values should start with a lower case.

    To fix that one sample you provided you will need nested replace() inside of the json_extract_scalar() function. One replace for the single to double quote and another to fix the upper case False.

    Here is a working example:

    --CTE with test data
    with my_table as (
        select *
        From (
                values(
                        '{''is_referred'': False, ''landing_page'': ''/account/register''}'
                    )
            ) as my_table (json_string)
    )
    select *,
        json_extract_scalar(
            replace(replace(json_string, '''', '"'), ': False', ': false'),
            '$.landing_page'
        )
    from my_table
    

    Giving a result of: enter image description here

    That only works for this one sample. If you have data with boolean values of True you would need to do another replace.

    Without knowing your full data set there could be other anomalies that this would not work for.