pythonpandasindexing

how to extract a cell value from a dataframe?


I am trying to extract a cell value from a dataframe, then why I always get a series instead of a value.

For example:

df_test=pd.DataFrame({'Well':['test1','test2','test3'],'Region':['east','west','east']})
df_test

Well    Region
0   test1   east
1   test2   west
2   test3   eas

well='test2'
region_thiswell=df_test.loc[df_test['Well']==well,'Region']
region_thiswell
1    west
Name: Region, dtype: object

I am expecting variable of region_thiswell is equal to 'west' string only. Why I am getting a series?

Thanks


Solution

  • A potential issue with item/values/iloc is that it will yield an exception of there is no match. squeeze will return an empty Series:

    df_test.loc[df_test["Well"] == 'test999', "Region"].item()
    # ValueError: can only convert an array of size 1 to a Python scalar
    
    df_test.loc[df_test["Well"] == 'test999', "Region"].values[0]
    # IndexError: index 0 is out of bounds for axis 0 with size 0
    
    df_test.loc[df_test["Well"] == 'test999', "Region"].iloc[0]
    # IndexError: single positional indexer is out-of-bounds
    
    df_test.loc[df_test["Well"] == 'test999', "Region"].squeeze()
    # Series([], Name: Region, dtype: object)
    

    One robust approach to get a scalar would be to use next+iter:

    next(iter(df_test.loc[df_test["Well"] == 'test2', "Region"]), None)
    # 'west'
    
    next(iter(df_test.loc[df_test["Well"] == 'test999', "Region"]), None)
    # None
    

    In case of multiple matches you'll get the first one:

    next(iter(df_test.loc[df_test["Well"].str.startswith('test'), "Region"]), None)
    # 'east'
    

    Alternatively, but more verbose and less efficient:

    df_test.loc[df_test["Well"] == 'test999', "Region"].reset_index().get(0)