I have a dataframe that I'd like to extend with a new column. For each row, the new column should contain the list of ids of all other rows whose string_value fully contains that row's string_value.
id  string_value
1   The quick brown fox
2   The quick brown fox jumps
3   The quick brown fox jumps over
4   The quick brown fox jumps over the lazy dog
5   The slow
6   The slow brown fox
Desired output
id  string_value                                 new_columns
1   The quick brown fox                          [2, 3, 4]
2   The quick brown fox jumps                    [3, 4]
3   The quick brown fox jumps over               [4]
4   The quick brown fox jumps over the lazy dog  []
5   The slow                                     [6]
6   The slow brown fox                           []
Thanks
Here's another custom function you can consider. Assuming df is this:
   id  string_value
0   1  The quick brown fox
1   2  The quick brown fox jumps
2   3  The quick brown fox jumps over
3   4  The quick brown fox jumps over the lazy dog
4   5  The slow
5   6  The slow brown fox
The custom function is
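def match_string(string_value):
    # Collect the ids of all other rows whose text contains string_value.
    # Note: this reads the global DataFrame `df`.
    idx_list = []
    for idx, strg in zip(df['id'], df['string_value']):
        # Skip identical strings (the row itself).
        if strg == string_value:
            continue
        if string_value in strg:
            idx_list.append(idx)
    return idx_list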
Then apply it to the column with a lambda function:
df['new_columns'] = df['string_value'].apply(lambda x: match_string(x))
print(df)
   id  string_value                                 new_columns
0   1  The quick brown fox                          [2, 3, 4]
1   2  The quick brown fox jumps                    [3, 4]
2   3  The quick brown fox jumps over               [4]
3   4  The quick brown fox jumps over the lazy dog  []
4   5  The slow                                     [6]
5   6  The slow brown fox                           []
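For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is copied from the question; the helper name containing_ids and its frame parameter are my own illustration (not part of the answer above), chosen so the function does not depend on a global df.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "string_value": [
        "The quick brown fox",
        "The quick brown fox jumps",
        "The quick brown fox jumps over",
        "The quick brown fox jumps over the lazy dog",
        "The slow",
        "The slow brown fox",
    ],
})


def containing_ids(frame, value):
    """Hypothetical helper: ids of rows whose string_value contains
    `value` as a substring, excluding rows equal to `value`."""
    return [
        row_id
        for row_id, text in zip(frame["id"], frame["string_value"])
        if text != value and value in text
    ]


# Pass the frame explicitly instead of reading a global variable.
df["new_columns"] = df["string_value"].apply(lambda v: containing_ids(df, v))
print(df["new_columns"].tolist())
# [[2, 3, 4], [3, 4], [4], [], [6], []]
```

Like the function above, this compares every row against every other row, so the work grows quadratically with the number of rows. That is fine for small frames but may become slow on large ones. It also uses plain substring matching (`in`); if only prefix matches should count, `text.startswith(value)` could be swapped in.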