I would like to select every region of values above 1 that is connected to an element with a value above 5. Two values are not connected if they are separated by a 0.
For the following data set,
pd.Series(data = [0,2,0,2,3,6,3,0])
the output should be
pd.Series(data = [False,False,False,True,True,True,True,False])
Well, it looks like I have found a one-liner using the pandas groupby function:
import pandas as pd
ts = pd.Series(data = [0,2,0,2,3,6,3,0])
# The flag column identifies the sequences: the cumulative sum increases by one
# at every 0. Each 0 is included in the "sequence" that follows it, but as the
# next step shows, that doesn't matter
df = pd.concat([ts, (ts==0).cumsum()], axis = 1, keys = ['val', 'flag'])
#   val  flag
#0    0     1
#1    2     1
#2    0     2
#3    2     2
#4    3     2
#5    6     2
#6    3     2
#7    0     3
# For each group (rows sharing the same flag), I take the boolean AND of two conditions:
# the group contains a value above 5 AND the element itself is above 1 (which excludes the zeros)
df.groupby('flag').transform(lambda x: (x>5).any() * x > 1)
#Out[32]:
#     val
#0  False
#1  False
#2  False
#3   True
#4   True
#5   True
#6   True
#7  False
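To see why the multiplication inside the lambda acts like a boolean AND, here is a minimal sketch applied to single groups. The helper names `compact` and `explicit` are mine, introduced only for illustration:

```python
import pandas as pd

def compact(x):
    # Same expression as in the lambda: `*` binds tighter than `>`,
    # so this reads as ((x > 5).any() * x) > 1
    return (x > 5).any() * x > 1

def explicit(x):
    # The same test spelled with an explicit boolean AND
    return (x > 1) & bool((x > 5).any())

with_peak = pd.Series([2, 3, 6, 3])   # group containing a value above 5
without_peak = pd.Series([2])         # group without any value above 5

print(compact(with_peak).tolist())     # [True, True, True, True]
print(compact(without_peak).tolist())  # [False]
```

When the group has no value above 5, the multiplication turns every element into 0, so the comparison with 1 fails for the whole group.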
If you are wondering, everything collapses into a single line:
ts.groupby((ts==0).cumsum()).transform(lambda x: (x>5).any() * x > 1).astype(bool)
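A possible variant, if you prefer to avoid the lambda: compute the maximum of each zero-delimited group with `transform("max")` and combine two plain boolean masks. This is only a sketch of the same idea; the names `group_id` and `group_max` are my own:

```python
import pandas as pd

ts = pd.Series(data=[0, 2, 0, 2, 3, 6, 3, 0])

# Label each zero-delimited group: the counter increases at every 0
group_id = (ts == 0).cumsum()

# Broadcast each group's maximum back onto its members
group_max = ts.groupby(group_id).transform("max")

# Keep elements above 1 whose group contains a value above 5
result = (ts > 1) & (group_max > 5)
print(result.tolist())
# [False, False, False, True, True, True, True, False]
```

The result is already of boolean dtype, so no `astype(bool)` is needed here.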
For reference, I am still leaving my first approach here:
import itertools
import pandas as pd
def flatten(l):
    # Util function to flatten a list of lists
    # e.g. [[1], [2,3]] -> [1,2,3]
    return list(itertools.chain(*l))
ts = pd.Series(data = [0,2,0,2,3,6,3,0])
# Get the data as a list
values = ts.values.tolist()
# From what I understand, the 0s delimit subsequences (so numbers are not
# connected if they are separated by a 0)
# Get the locations of the zeros
gap_loc = [idx for (idx, el) in enumerate(values) if el==0]
# Re-create pandas series
gap_series = pd.Series(False, index = gap_loc)
# Get values and locations of the subsequences (i.e. separated by zeros)
valid_loc = [range(prev_gap+1,gap) for prev_gap, gap in zip(gap_loc[:-1],gap_loc[1:])]
list_seq = [values[prev_gap+1:gap] for prev_gap, gap in zip(gap_loc[:-1],gap_loc[1:])]
# list_seq = [[2], [2, 3, 6, 3]]
# Verify your condition
check_condition = [[el>1 and any(map(lambda x: x>5, sublist)) for el in sublist]
                   for sublist in list_seq]
# Put results back into a pandas Series
valid_series = pd.Series(flatten(check_condition), index = flatten(valid_loc))
# Put everything together:
result = pd.concat([gap_series, valid_series], axis = 0).sort_index()
#result
#Out[101]:
#0    False
#1    False
#2    False
#3     True
#4     True
#5     True
#6     True
#7    False
#dtype: bool
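One caveat with this first approach: pairing consecutive zero positions only covers runs enclosed between two zeros, so elements before the first 0 or after the last 0 are left out of the result. A rough sketch that sidesteps this by splitting the values into runs with `itertools.groupby` could look as follows. The function `select_connected` and its `low`/`high` parameters are hypothetical names of mine:

```python
from itertools import groupby

import pandas as pd

def select_connected(ts, low=1, high=5):
    # Walk through consecutive runs of zero / non-zero values
    flags = []
    for is_zero, run in groupby(ts.tolist(), key=lambda v: v == 0):
        run = list(run)
        # A run qualifies if it is non-zero and contains a value above `high`
        keep = (not is_zero) and any(v > high for v in run)
        # Within a qualifying run, keep only the elements above `low`
        flags.extend(keep and v > low for v in run)
    return pd.Series(flags, index=ts.index, dtype=bool)

ts = pd.Series(data=[0, 2, 0, 2, 3, 6, 3, 0])
print(select_connected(ts).tolist())
# [False, False, False, True, True, True, True, False]

# Also works when the series is not bounded by zeros
print(select_connected(pd.Series([2, 6, 0, 3])).tolist())
# [True, True, False, False]
```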