sample_data = {
'Class': ['TEST', 'NOT TEST', 'TEST 123', 'SOMETHING ELSE', 'TEST'],
'Values': ['1.3 / 4.5', '1.5', '0.7 / 5.2', '123', '1.5 / 4.0']
}
df = pd.DataFrame(sample_data)
mask = df['Class'].str.startswith('TEST')
# I WANT TO DO SOMETHING LIKE THIS -->
df.loc[mask, ['COL1', 'COL2']] = df.loc[mask, 'Values'].str.split(' / ', expand=True)
df.head()
I am getting Nan for COL1 and COL2 for all rows. My expectation would be that I would get the split values for only the rows matching the mask and Nan for others.
I have read the pandas documentation (not enough apparently) and have been working with AI, but it is sending me in circles. Clearly I am lost in the weeds and not understanding something that I think should be very simple. Maybe something with copies and such.
I would expect, and what I want, is that I would get values in COL1 and COL2 for just rows which start with 'TEST'. Other rows would be NaN for COL1 and COL2.
I understand why the below code doesn't work (am working on a copy), but I don't understand why the original snipet doesn't work:
df_subset = df[mask]
df_subset[['COL1', 'COL2']] = df_subset['Values'].str.split(' / ', expand=True)
df.head() #NOTHING CHANGED AS EXPECTED BUT df_subset is as expected
Since the column names are not aligned with the output of split
(columns are 0/1), you should use the underlying numpy array:
df.loc[mask, ['COL1', 'COL2']] = df.loc[mask, 'Values'].str.split(' / ', expand=True).values
Output:
Class Values COL1 COL2
0 TEST 1.3 / 4.5 1.3 4.5
1 NOT TEST 1.5 NaN NaN
2 TEST 123 0.7 / 5.2 0.7 5.2
3 SOMETHING ELSE 123 NaN NaN
4 TEST 1.5 / 4.0 1.5 4.0
Output of df.loc[mask, 'Values'].str.split(' / ', expand=True)
:
0 1 # not aligned with COL1/COL2
0 1.3 4.5
2 0.7 5.2
4 1.5 4.0