I want to transform it into a pandas dataframe like this:
Basically i'm trying to normalize the dataset to populate a sql table.
I have used json_normalize to create a separate dataset from genres column but I'm at a loss over how to transform both the columns as shown in the above depiction.
Some suggestions would be highly appreciated.
If the genre_id
is the only numeric value (as shown in the picture), you can use the following:
#find all occurrences of digits in the column and convert the list items to comma separated string.
df['genre_id'] = df['genres'].str.findall(r'(\d+)').apply(', '.join)
#use pandas.DataFrame.explode to generate new genre_ids by comma separating them.
df = df.assign(genre_id = df.genre_id.str.split(',')).explode('genre_id')
#finally remove the extra space
df['genre_id'] = df['genre_id'].str.lstrip()
#if required create a new dataframe with these 2 columns only
df = df[['id','genre_id']]