[SOLVED] Group list of lists based on a condition

I have a list of lists and I need to split it into groups: each group should start with a pattern (the word "START" in this case) and end on the line before the next pattern. Here is an example of the input and the groups I expect:

lst = [
    ["abc"],
    ["START"],
    ["cdef"],
    ["START"],
    ["fhg"],
    ["cdef"],
]

group_a = [
    ["START"],
    ["cdef"],
]

group_b = [
    ["START"],
    ["fhg"],
    ["cdef"],
]

I tried with numpy and pandas too, without any success. Many thanks in advance for your support. Regards, Tommaso

Solution

If you want to use pandas, you could create a boolean mask to identify the occurrences of START. Then take the cumsum() of the mask to assign a group number to each row: the number increases by one at every START, so all rows up to the next START share the same number. Finally, groupby the group number, excluding group 0, which holds any rows before the first occurrence of START:

import pandas as pd

lst = [
    ["abc"],
    ["START"],
    ["cdef"],
    ["START"],
    ["fhg"],
    ["cdef"],
]

df = pd.DataFrame(lst, columns=['Input'])

# Create a boolean mask that is True on each 'START' row
mask = df['Input'].eq('START')

# Intermediate result (mask):
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# 5    False
# Name: Input, dtype: bool



# Assign a group number to each row; it increments at every 'START'
df['Group'] = mask.cumsum()

# Intermediate result (df):
#    Input  Group
# 0    abc      0
# 1  START      1
# 2   cdef      1
# 3  START      2
# 4    fhg      2
# 5   cdef      2




# Build one list per group, dropping group 0 (rows before the first 'START')
grouped_lists = [
    group['Input'].tolist()
    for _, group in df[df['Group'] > 0].groupby('Group')
]

print(grouped_lists)
# Output: [['START', 'cdef'], ['START', 'fhg', 'cdef']]
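
If pandas is not a requirement, the same grouping can probably be done with a plain loop. The following is only a minimal sketch, not part of the answer above: the helper name group_by_marker and its marker parameter are made up for illustration, and it assumes a marker row is exactly ["START"]. Unlike the pandas version, it keeps each row as its own inner list, which matches the group_a and group_b layout in the question.

```python
def group_by_marker(rows, marker="START"):
    """Split rows into groups, each starting at a row equal to [marker].

    Hypothetical helper for illustration; rows that appear before the
    first marker row are dropped.
    """
    groups = []
    for row in rows:
        if row == [marker]:
            # A marker row opens a new group
            groups.append([row])
        elif groups:
            # Any other row joins the most recently opened group
            groups[-1].append(row)
        # else: no group is open yet, so the row is skipped
    return groups


lst = [
    ["abc"],
    ["START"],
    ["cdef"],
    ["START"],
    ["fhg"],
    ["cdef"],
]

groups = group_by_marker(lst)
group_a, group_b = groups  # valid here because the sample has two groups
print(groups)
```

For the sample input, group_a and group_b should come out as the two groups listed in the question. For input with a different number of START rows, iterate over groups instead of unpacking into fixed names.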