I have a Redshift table that contains columns with json objects. I am running into json parser failures while trying to execute queries on this table that applies specific filters on the json object content. While I am able to use the json_extract_path_text() in the select statements, the same fails when used in where clause.
Following is the error I see: Amazon Invalid operation: JSON parsing error;
When I look at the STL_ERROR table for more details, this is what I see in the error details: errcode: 8001 context: JSON parsing error error: invalid json object null
Following is an example of the content in one such json column:
{"att1":"att1_val","att2":"att2_val","att3":[{"att3_sub1_1":"att3_sub1_1_val","att3_sub1_2":"att3_sub1_2_val"},{"att3_sub2_1":"att3_sub2_1_val","att3_sub2_2":"att3_sub2_2_val"}],"att4":"att4_val","att5":"att5_val"}
Now when I run the following query, it executes without any issues:
select
json_extract_path_text(col_with_json_obj,'att4') as required_val
from table_with_json_data;
Now when I use the json_extract_path_text() in the where clause it fails with the above error:
select
json_extract_path_text(col_with_json_obj,'att4') as required_val
from table_with_json_data
where json_extract_path_text(col_with_json_obj,'att4') = 'att4_val';
Is there anything that I am using incorrectly or missing here?
P.S: I have another table with similar schema and the same queries run just fine on that. The only difference between the two tables is the way the data is loaded - one uses jsonpaths file in the copy options and the other uses json 'auto'.
This is the error you would receive if table_with_json_data
contained even a single row in which the value of col_with_json_obj
was the four-character string "null".
To avoid errors like this in general I'd recommend creating a Redshift UDF for validating JSON. The is_json() method described at http://discourse.snowplowanalytics.com/t/more-robust-json-parsing-in-redshift-with-python-udfs/197 has worked well for me:
create or replace function is_json(j varchar(max))
returns boolean
stable as $$
import json
try:
json_object = json.loads(j)
except ValueError, e:
return False
return True
$$ language plpythonu;
Then you could add a and where is_json(col_with_json_obj)
clause to your query and this class of errors can be avoided entirely.