python json amazon-dynamodb

Transform empty strings when loading JSON


Is there a way to replace empty strings at any position in a JSON document received from an endpoint, without knowing the JSON's structure, before inserting it into DynamoDB? DynamoDB is known to have issues with floats, which must be converted to Decimals, but I can't figure out an easy way to replace an empty string such as "full_name": "" with a value like "N/A".

I'm looking for something like json.loads(json.dumps(data), parse_float=Decimal), which handles floats through parse_float, but for empty strings: something clean and easy to use. I've seen that you can pass a custom cls class for this, but I don't quite understand how to do it properly, especially without knowing the structure of the JSON, which might vary.

JSON example:

{
  "campaign_id": "9c1c6cd7-fd4d-480b-8c80-07091cdd4103",
  "creation_date": 1530804132,
  "objects": [
     {
        "id": 12345,
        "full_name": ""
     },
     ...
  ],
  ...
}

Solution

  • You can do this by defining an object_hook to pass to json.loads.

    From the docs:

    object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict.

    Given this dict:

    >>> pprint(d)
    {'campaign_id': '9c1c6cd7-fd4d-480b-8c80-07091cdd4103',
     'creation_date': 1530804132,
     'float': 1.2345,
     'objects': [{'full_name': '', 'id': 12345}],
     'strs': ['', 'abc', {'a': ''}],
     'top_str': ''}
    

    This pair of functions recurses over each dict decoded by json.loads and changes every instance of the empty string to 'N/A'. Lists are handled explicitly because object_hook is only called for dicts.

    def transform_dict(mapping=None):
        if mapping is None:
            mapping = {}
        for k, v in mapping.items():
            if v == '':
                mapping[k] = 'N/A'
            elif isinstance(v, dict):
                mapping[k] = transform_dict(v)
            elif isinstance(v, list):
                mapping[k] = transform_list(v)
            else:
                # Make it obvious that we aren't changing other values
                pass
        return mapping
    
    
    def transform_list(lst):
        for i, x in enumerate(lst):
            if x == '':
                lst[i] = 'N/A'
            elif isinstance(x, dict):
                lst[i] = transform_dict(x)
            elif isinstance(x, list):
                lst[i] = transform_list(x)
            else:
                # Make it obvious that we aren't changing other values
                pass
        return lst
    
    >>> import decimal
    >>> import json
    >>> res = json.loads(
    ...     json.dumps(d),
    ...     parse_float=decimal.Decimal,
    ...     object_hook=transform_dict,
    ... )
    >>> pprint(res)
    {'campaign_id': '9c1c6cd7-fd4d-480b-8c80-07091cdd4103',
     'creation_date': 1530804132,
     'float': Decimal('1.2345'),
     'objects': [{'full_name': 'N/A', 'id': 12345}],
     'strs': ['N/A', 'abc', {'a': 'N/A'}],
     'top_str': 'N/A'}
    

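    Note that this approach depends on the top-level JSON value being an object ({...}). object_hook is only called for decoded objects, so empty strings that sit directly in a top-level array, or a bare top-level string, would not be replaced. Objects nested inside a top-level array are still transformed.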
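
If the top-level value might not be an object, one alternative is to skip object_hook and walk the decoded result once after json.loads. The following is a minimal sketch of that idea, not part of the original answer: the names replace_empty_strings and load_for_dynamodb are hypothetical, and the sketch assumes only values (not dict keys) should be replaced. It builds new containers instead of mutating the input.

```python
import json
from decimal import Decimal


def replace_empty_strings(value, placeholder="N/A"):
    """Return a copy of a decoded JSON value with every '' value replaced."""
    if value == "":
        return placeholder
    if isinstance(value, dict):
        # Keys are left untouched; only values are rewritten.
        return {k: replace_empty_strings(v, placeholder) for k, v in value.items()}
    if isinstance(value, list):
        return [replace_empty_strings(v, placeholder) for v in value]
    # Numbers, booleans, None and non-empty strings pass through unchanged.
    return value


def load_for_dynamodb(text, placeholder="N/A"):
    # parse_float=Decimal handles the floats; the walk handles empty strings.
    decoded = json.loads(text, parse_float=Decimal)
    return replace_empty_strings(decoded, placeholder)


# A top-level array, which the object_hook approach would only partly cover.
top_level_array = '["", {"full_name": "", "score": 1.5}, ["", "abc"]]'
result = load_for_dynamodb(top_level_array)
```

Because the walk starts from whatever json.loads returns, it treats a top-level object, array, or string the same way, at the cost of one extra pass over the data.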