I am trying to extract a cell value from a dataframe, then why I always get a series instead of a value.
For example:
df_test=pd.DataFrame({'Well':['test1','test2','test3'],'Region':['east','west','east']})
df_test
Well Region
0 test1 east
1 test2 west
2 test3 eas
well='test2'
region_thiswell=df_test.loc[df_test['Well']==well,'Region']
region_thiswell
1 west
Name: Region, dtype: object
I am expecting variable of region_thiswell is equal to 'west' string only. Why I am getting a series?
Thanks
A potential issue with item
/values
/iloc
is that it will yield an exception of there is no match. squeeze
will return an empty Series:
df_test.loc[df_test["Well"] == 'test999', "Region"].item()
# ValueError: can only convert an array of size 1 to a Python scalar
df_test.loc[df_test["Well"] == 'test999', "Region"].values[0]
# IndexError: index 0 is out of bounds for axis 0 with size 0
df_test.loc[df_test["Well"] == 'test999', "Region"].iloc[0]
# IndexError: single positional indexer is out-of-bounds
df_test.loc[df_test["Well"] == 'test999', "Region"].squeeze()
# Series([], Name: Region, dtype: object)
One robust approach to get a scalar would be to use next
+iter
:
next(iter(df_test.loc[df_test["Well"] == 'test2', "Region"]), None)
# 'west'
next(iter(df_test.loc[df_test["Well"] == 'test999', "Region"]), None)
# None
In case of multiple matches you'll get the first one:
next(iter(df_test.loc[df_test["Well"].str.startswith('test'), "Region"]), None)
# 'east'
Alternatively, but more verbose and less efficient:
df_test.loc[df_test["Well"] == 'test999', "Region"].reset_index().get(0)