I'm trying to combine all URL values returned for the ids in Option1_Ref, within the same DataFrame, into one comma-delimited field. What I can't figure out is why my assign is not working while my for loop is.
The table looks as follows:
Ref | Option1_Ref | URL |
---|---|---|
1 | 2,3,4 | /path1 |
2 | 1,4,5 | /path2 |
3 | 1,6 | /path3 |
4 | 1,5 | /path4 |
4 | 2,5 | /path5 |
5 | 3,1 | /path6 |
7 | 2,5 | /path7 |
Using assign:
df=df.assign(options_url1=df.loc[df['Ref'].isin(df["Option1_Ref"].str.split(',')),['URL','Ref']].drop_duplicates('Ref')['URL'].str.join(','))
This just returns an empty column.
Using a for loop:
for x in df.index: df.loc[x,'options_url1']=','.join(df.loc[df['Ref'].isin(df.loc[x,"Option1_Ref"].split(',')),['URL','Ref']].drop_duplicates('Ref')['URL'].array)
This works as expected.
I'm expecting:
Ref | Option1_Ref | URL | options_url1 |
---|---|---|---|
1 | 2,3,4 | /path1 | /path2,/path3,/path4 |
2 | 1,4,5 | /path2 | /path1,/path4,/path6 |
3 | 1,6 | /path3 | /path1 |
4 | 1,5 | /path4 | /path1,/path6 |
4 | 2,5 | /path5 | /path2,/path6 |
5 | 3,1 | /path6 | /path1,/path3 |
7 | 2,5 | /path7 | /path2,/path6 |
The assign version most likely fails because Series.isin is not applied row by row: it receives the whole column of split lists at once, and the Ref values never equal those list objects, so the mask selects no rows and assign fills the new column with NaN after index alignment. The trailing .str.join(',') would also join the characters of each URL string rather than the URLs themselves. The loop works because it builds a separate mask for each row.
Create a Series by removing duplicates with DataFrame.drop_duplicates, convert Ref to strings, split the Option1_Ref values and expand them with Series.explode, map them with Series.map, and finally aggregate with join:
s = df.drop_duplicates('Ref').astype({'Ref':str}).set_index('Ref')['URL']
df['options_url1'] = (df["Option1_Ref"].str.split(',').explode().map(s).dropna()
.groupby(level=0).agg(','.join))
print (df)
Ref Option1_Ref URL options_url1
0 1 2,3,4 /path1 /path2,/path3,/path4
1 2 1,4,5 /path2 /path1,/path4
2 3 1 /path3 /path1
3 4 1,5 /path4 /path1
4 6 2,5 /path5 /path2
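For reference, here is a minimal self-contained sketch of the same explode/map/groupby approach, run against the sample table from the question. The DataFrame construction and the intermediate name `exploded` are assumptions added for illustration; Ref is assumed to be an integer column, which is why it is cast to string before the lookup.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Ref": [1, 2, 3, 4, 4, 5, 7],
        "Option1_Ref": ["2,3,4", "1,4,5", "1,6", "1,5", "2,5", "3,1", "2,5"],
        "URL": ["/path1", "/path2", "/path3", "/path4", "/path5", "/path6", "/path7"],
    }
)

# Lookup Series: first URL per Ref, indexed by Ref as a string so that it
# matches the string pieces produced by str.split.
s = df.drop_duplicates("Ref").astype({"Ref": str}).set_index("Ref")["URL"]

# explode() yields one row per referenced id and repeats the original row
# label, so groupby(level=0) can reassemble the pieces per source row.
exploded = df["Option1_Ref"].str.split(",").explode()

# Ids without a matching Ref (for example "6") map to NaN and are dropped.
df["options_url1"] = exploded.map(s).dropna().groupby(level=0).agg(",".join)

print(df)
```

Note that the URLs come out in the order the ids are written in Option1_Ref, so the row with "3,1" gives /path3,/path1 rather than the /path1,/path3 shown in the question's expected table. A row whose ids all fail to match would end up as NaN, because it is missing from the grouped result.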
Or use a list comprehension for the mapping and join; if some values may not exist in the lookup, add an if condition:
s = df.drop_duplicates('Ref').astype({'Ref':str}).set_index('Ref')['URL']
df['options_url1'] = [','.join(s.get(y) for y in x.split(',') if y in s)
for x in df["Option1_Ref"]]
print (df)
Ref Option1_Ref URL options_url1
0 1 2,3,4 /path1 /path2,/path3,/path4
1 2 1,4,5 /path2 /path1,/path4
2 3 1 /path3 /path1
3 4 1,5 /path4 /path1
4 6 2,5 /path5 /path2
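The list-comprehension variant can be sketched the same way on the question's sample table. The helper names `urls_in_option_order` and `urls_in_frame_order`, the name `lookup`, and the second output column are hypothetical and only there for illustration. The second helper is an assumption about what the question wants: it keeps URLs in the order their Ref rows appear in the frame, which is what the original for loop with isin produces.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame(
    {
        "Ref": [1, 2, 3, 4, 4, 5, 7],
        "Option1_Ref": ["2,3,4", "1,4,5", "1,6", "1,5", "2,5", "3,1", "2,5"],
        "URL": ["/path1", "/path2", "/path3", "/path4", "/path5", "/path6", "/path7"],
    }
)

# Lookup Series: first URL per Ref, keyed by Ref as a string.
lookup = df.drop_duplicates("Ref").astype({"Ref": str}).set_index("Ref")["URL"]


def urls_in_option_order(refs: str) -> str:
    """Join URLs in the order the ids are written; unknown ids are skipped."""
    return ",".join(lookup[r] for r in refs.split(",") if r in lookup.index)


def urls_in_frame_order(refs: str) -> str:
    """Join URLs in the order their Ref rows appear in the frame."""
    return ",".join(lookup[lookup.index.isin(refs.split(","))])


df["options_url1"] = [urls_in_option_order(x) for x in df["Option1_Ref"]]
df["options_url1_frame_order"] = [urls_in_frame_order(x) for x in df["Option1_Ref"]]

print(df)
```

The two columns differ only for the row with "3,1": the first gives /path3,/path1, the second gives /path1,/path3 as in the question's expected table. Unlike the explode version, a row with no matching ids yields an empty string here instead of NaN.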