I have an NDJSON dataset like the one below:
{'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',},
{'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014'}
I need to pass the values above into an API request, specifically in the request body, in the format below.
data=[
{
"addressee":"Greenleaf Avenue",
"street":"905 Greenleaf Avenue",
"city":"Whittier",
"state":"CA",
"zipcode":"90402",
},
{
"addressee":"93323",
"street":"Maplewood Dr",
"city":"Cedar Falls",
"state":"CA",
"zipcode":"95014",
}
]
As you can see, the keys are different, so I need to rename them to match the API's field names before passing the data in (i.e. address_line_1 goes to addressee). There are going to be 10k addresses in this request.
I did not note it in my first example, but there is an ID associated with each address. I have to remove it to make the request and then add it back in. I ended up solving it with the code below. Is there anything more Pythonic? This does not feel very elegant to me.
addresses = ndjson.loads(addresses)
data = json.loads(json.dumps(addresses).replace('"ADDRESS_CITY"','"city"').replace('"ADDRESS_LINE_1"','"street"').replace('"ADDRESS_STATE"','"state"').replace('"ADDRESS_ZIP"','"zipcode"'))
ids = []
for i in data:
    i['candidates'] = 1
    ids.append(i["ID"])
    del i["ID"]
response = requests.request("POST", url, json=data)
resp_data = response.json()
a = 0
for i in resp_data:
    i['ID'] = ids[a]
    x = i['ID'] = ids[a]
    a = a + 1
If you want to make things a bit easier for yourself, I would suggest using data classes to model your input data. The main benefit is that you can use dot (.) access for attributes, so you don't need to work with dictionaries that have dynamic keys. You also benefit from type hinting, so your IDE should be able to assist you better as well.
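To illustrate the dot-access point, here is a minimal sketch using only the standard library. The `Address` class and its two fields are hypothetical and trimmed down purely for illustration; they are not the full model for this problem.

```python
from dataclasses import dataclass


@dataclass
class Address:
    # Hypothetical, trimmed-down model used only for this illustration.
    city: str
    state: str


as_dict = {"city": "Whittier", "state": "CA"}
as_obj = Address(city="Whittier", state="CA")

# Dictionary access: a mistyped key only fails at runtime with a KeyError.
print(as_dict["city"])

# Attribute access: editors and type checkers know `city` is a `str`.
print(as_obj.city)
```

Both lines print the same value; the difference is in how much help you get from your tooling while writing the code.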
In this case, I would suggest pairing them with a JSON serialization library such as dataclass-wizard, which supports this use case well. As of v0.15.0, it should also support excluding fields from the serialization (dump) process.
Here is a straightforward example that I put together, which uses the desired key mapping from above:
import json
from dataclasses import dataclass, field
# note: for python 3.9+, you can import this from `typing` instead
from typing_extensions import Annotated
from dataclass_wizard import JSONWizard, json_key
@dataclass
class AddressInfo(JSONWizard):
    """
    AddressInfo dataclass
    """
    city: Annotated[str, json_key('ADDRESS_CITY')]
    street: Annotated[str, json_key('ADDRESS_LINE_1')]
    state: Annotated[str, json_key('ADDRESS_STATE')]
    # pass `dump=False`, so we exclude the field in serialization.
    id: Annotated[int, json_key('ID', dump=False)]
    # you could also annotate the below like `Union[str, int]`
    # if you want to retain it as a string.
    zipcode: Annotated[int, json_key('ADDRESS_ZIP')]
    # exclude this field from the constructor (and from the
    # de-serialization process)
    candidates: int = field(default=1, init=False)
And sample usage of the above:
input_obj = [{'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',
              'ID': 111},
             {'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014',
              'ID': 222}]
addresses = AddressInfo.from_list(input_obj)
print('-- Addresses')
for a in addresses:
    print(repr(a))
out_list = [a.to_dict() for a in addresses]
print('-- To JSON')
print(json.dumps(out_list, indent=2))
# alternatively, with the latest version (0.15.1)
# print(AddressInfo.list_to_json(addresses, indent=2))
Note: you can still access the id for each address as normal, even though this field is omitted from the JSON result.
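For comparison, if adding a dependency is not an option, the rename, remove-ID, and re-attach-ID steps from the question can also be sketched with plain dictionaries. This is only a sketch under assumptions: `KEY_MAP`, `build_payload`, and `attach_ids` are hypothetical names, the response is faked rather than fetched, and the API is assumed to return exactly one result per address in the same order as the request (the same assumption the original loop makes).

```python
# Assumed mapping from the source keys to the API's field names.
KEY_MAP = {
    "ADDRESS_CITY": "city",
    "ADDRESS_LINE_1": "street",
    "ADDRESS_STATE": "state",
    "ADDRESS_ZIP": "zipcode",
}


def build_payload(records):
    """Return (ids, payload): keys renamed, IDs held back from the payload."""
    ids, payload = [], []
    for record in records:
        ids.append(record["ID"])
        item = {new: record[old] for old, new in KEY_MAP.items()}
        item["candidates"] = 1
        payload.append(item)
    return ids, payload


def attach_ids(ids, results):
    """Pair each result with its ID, assuming the order was preserved."""
    return [{**result, "ID": id_} for id_, result in zip(ids, results)]


records = [
    {"ADDRESS_CITY": "Whittier", "ADDRESS_LINE_1": "905 Greenleaf Avenue",
     "ADDRESS_STATE": "CA", "ADDRESS_ZIP": "90402", "ID": 111},
    {"ADDRESS_CITY": "Cedar Falls", "ADDRESS_LINE_1": "93323 Maplewood Dr",
     "ADDRESS_STATE": "CA", "ADDRESS_ZIP": "95014", "ID": 222},
]

ids, payload = build_payload(records)

# Stand-in for `requests.post(url, json=payload).json()`: this fake
# response simply echoes the payload so the sketch runs offline.
fake_response = [dict(item) for item in payload]

merged = attach_ids(ids, fake_response)
```

Using `zip` avoids the manual counter, and building new dictionaries avoids the string replacement on the serialized JSON, which could also rewrite matching text inside values.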