I've run into a strange issue with an API I am calling. The response is JSON, but one of its fields is a string that itself contains JSON, rather than a properly nested object. I've managed to load the original JSON into a pandas DataFrame (shown below), and I eventually need to write it to a table in a SQL database. The real JSON is significantly longer than the sample here, however, so it gets cut off at the maximum size of a varchar I can work with in SQL.
Basically, is there any way for me to work with this dataframe in Python to translate the JSON into additional columns and rows on the dataframe?
This is the Dataframe I am currently working with:
Department | Value | json |
---|---|---|
1 | A | [{"employeeID":123,"name":"Jenny"}, {"employeeID":456,"name":"Mike"}, {"employeeID":789,"name":"Ricky"}] |
2 | B | [{"employeeID":735,"name":"Todd", "badgeNo":84639}, {"employeeID":223,"name":"Greg", "badgeNo":93649}] |
3 | C | [] |
4 | D | [{"employeeID":947,"name":"Cardi"}, {"employeeID":284,"name":"Tom"}] |
I am trying to get my DataFrame to look like this:
Department | Value | employeeID | name | badgeNo |
---|---|---|---|---|
1 | A | 123 | Jenny | |
1 | A | 456 | Mike | |
1 | A | 789 | Ricky | |
2 | B | 735 | Todd | 84639 |
2 | B | 223 | Greg | 93649 |
3 | C | | | |
4 | D | 947 | Cardi | |
4 | D | 284 | Tom | |
Any and all help is appreciated.
If the values in column `'json'` are strings, first parse them into Python objects with `json.loads`. Then you can `explode` the lists so that each JSON object gets its own row, and use `json_normalize` to convert those objects into a dataframe with one column per key. Finally, `concat` (on `axis=1`) the original dataframe minus column `'json'` with the newly created dataframe.
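    import json
    import pandas as pd

    df["json"] = df["json"].apply(json.loads)  # parse each JSON string into a list of dicts
    df = df.explode("json").reset_index(drop=True)  # one row per JSON object
    out = pd.concat([df.drop(columns="json"), pd.json_normalize(df["json"])], axis=1)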
Output:

       Department Value  employeeID   name  badgeNo
    0           1     A       123.0  Jenny      NaN
    1           1     A       456.0   Mike      NaN
    2           1     A       789.0  Ricky      NaN
    3           2     B       735.0   Todd  84639.0
    4           2     B       223.0   Greg  93649.0
    5           3     C         NaN    NaN      NaN
    6           4     D       947.0  Cardi      NaN
    7           4     D       284.0    Tom      NaN
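Note that `employeeID` and `badgeNo` come out as floats (`123.0`), because the rows with no data hold `NaN`. If the SQL table expects integers, one option is to cast those columns to pandas' nullable `Int64` dtype. Below is a self-contained sketch of the whole flow on the sample data from the question. The helper name `flatten_json_column` is hypothetical, and both the `Int64` cast and the guard that replaces missing records with an empty dict are additions that are not part of the steps above.

```python
import json

import pandas as pd

# Sample data mirroring the question; the "json" column holds JSON text, not lists.
df = pd.DataFrame(
    {
        "Department": [1, 2, 3, 4],
        "Value": ["A", "B", "C", "D"],
        "json": [
            '[{"employeeID":123,"name":"Jenny"}, {"employeeID":456,"name":"Mike"}, {"employeeID":789,"name":"Ricky"}]',
            '[{"employeeID":735,"name":"Todd", "badgeNo":84639}, {"employeeID":223,"name":"Greg", "badgeNo":93649}]',
            "[]",
            '[{"employeeID":947,"name":"Cardi"}, {"employeeID":284,"name":"Tom"}]',
        ],
    }
)


def flatten_json_column(frame, column="json"):
    """Hypothetical helper: return one output row per record stored in `column`."""
    # Parse only values that are still strings; leave already-parsed lists alone.
    parsed = frame[column].apply(lambda v: json.loads(v) if isinstance(v, str) else v)
    # explode() turns each list element into its own row; an empty list becomes NaN.
    exploded = frame.assign(**{column: parsed}).explode(column).reset_index(drop=True)
    # Swap missing records for {} so every item passed to json_normalize is a dict.
    records = pd.json_normalize(
        [r if isinstance(r, dict) else {} for r in exploded[column]]
    )
    return pd.concat([exploded.drop(columns=column), records], axis=1)


out = flatten_json_column(df)
# Nullable integers keep the IDs as whole numbers and show missing values as <NA>.
out = out.astype({"employeeID": "Int64", "badgeNo": "Int64"})
print(out)
```

The empty list for department 3 still produces a single row with missing values in the employee columns, matching the layout requested in the question.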