I have pandas dataframe and I am trying to select multiple columns (column range starting from Test
to Bio Ref
). Selection has to start from column Test
to any column whose name starts with Bio
. Below is the sample dataframe.
In reality it can contain:
Test
column,Test
& Bio Ref
like 2,3,4,5 etc.Bio Ref
.Bio Ref
column can contain suffix in it but Bio Ref
will be there as start of column name always.df_chunk = pd.DataFrame({
'Waste':[None,None],
'Test':['something', 'something'],
'2':[None,None],
'3':[None,None],
'Bio Ref':['2-50','15-100'],
'None':[None,None]})
df_chunk
Waste Test 2 3 Bio Ref None
0 None something None None 2-50 None
1 None something None None 15-100 None
I have tried below codes that work:
df_chunk.columns.str.startswith('Bio')
df_chunk[df_chunk.columns[pd.Series(df_chunk.columns).str.startswith('Bio')==1]]
Issue: But when I try to use them for multiple column Selection then it doesn't work:
df_chunk.loc[:, 'Test':df_chunk.columns.str.startswith('Bio')]
You can creates masks for boolean indexing:
m1 = np.maximum.accumulate(df_chunk.columns=='Test')
# array([False, True, True, True, True, True])
m2 = np.maximum.accumulate(df_chunk.columns.str.startswith('Bio')[::-1])[::-1]
# array([ True, True, True, True, True, False])
# m1 & m2
# array([False, True, True, True, True, False])
out = df_chunk.loc[:, (m1&m2)]
Or identify the correct names to build a slice:
start = 'Test'
end = next(iter(df_chunk.columns[df_chunk.columns.str.startswith('Bio')]), None)
out = df_chunk.loc[:, slice(start, end)]
Output:
Test 2 3 Bio Ref
0 something None None 2-50
1 something None None 15-100