I have a list of lists and I need to split it into groups: each group should start with a pattern (the word "START" in this case) and end on the line before the next pattern. Here is an example:
lst = [
["abc"],
["START"],
["cdef"],
["START"],
["fhg"],
["cdef"],
]
group_a = [
["START"],
["cdef"],
]
group_b = [
["START"],
["fhg"],
["cdef"],
]
I tried with numpy and pandas too without any success. Many thanks in advance for your support. Regards Tommaso
If you want to use pandas, you could create a boolean mask to identify the occurrences of START. Then take the cumsum() of that mask to assign a unique group number to each occurrence. Finally, groupby the group number, excluding every row before the first occurrence of START (those rows end up in group 0):
import pandas as pd
lst = [
["abc"],
["START"],
["cdef"],
["START"],
["fhg"],
["cdef"],
]
df = pd.DataFrame(lst, columns=['Input'])
# Create a boolean mask that is True on every 'START' row
mask = df['Input'].eq('START')
# Intermediate result:
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5    False
# Name: Input, dtype: bool
# Assign a group number to each row: the cumulative sum increases at every 'START'
df['Group'] = mask.cumsum()
# Intermediate result:
#    Input  Group
# 0    abc      0
# 1  START      1
# 2   cdef      1
# 3  START      2
# 4    fhg      2
# 5   cdef      2
# Build one list per group, skipping group 0 (the rows before the first 'START')
grouped_lists = [group['Input'].tolist() for _, group in df[df['Group'] > 0].groupby('Group')]
print(grouped_lists)
# Output:
# [['START', 'cdef'], ['START', 'fhg', 'cdef']]
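Note that the pandas result holds plain strings, while the groups in the question keep each row as its own one-element list. If that nested shape matters, the same idea can be sketched in plain Python without pandas. This is only a minimal sketch, not the approach from the answer above: split_on_marker is a hypothetical helper name, and it assumes a marker row is exactly ["START"].

```python
def split_on_marker(rows, marker="START"):
    """Split rows into groups, each beginning at a row equal to [marker].

    Hypothetical helper: assumes a marker row is exactly [marker].
    """
    groups = []
    for row in rows:
        if row == [marker]:
            groups.append([row])    # a marker row opens a new group
        elif groups:
            groups[-1].append(row)  # extend the group that is currently open
        # rows seen before the first marker are skipped
    return groups


lst = [["abc"], ["START"], ["cdef"], ["START"], ["fhg"], ["cdef"]]
groups = split_on_marker(lst)
print(groups)
# [[['START'], ['cdef']], [['START'], ['fhg'], ['cdef']]]
```

Here groups[0] and groups[1] correspond to group_a and group_b from the question. Returning a list of groups rather than separately named variables keeps the sketch working for any number of START markers.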