python pandas string dataframe

How to find the integer index of a string in a column of a pandas DataFrame?


I am importing a CSV file into pandas that contains data like the sample below. In the code that follows, I want to get the integer index of the row whose name column contains name_to_search in df1.

name, ColB, ColC, ColD
P1, 1,1,1
P2, 0,1,0
P3, 1,1,0
...

df1 = pd.read_csv(filepath_or_buffer='file.csv', header=[0])
df1['name'].str.lower()
name_to_search = 'p1'

row_indx1 = df1.index.get_loc(df1[df1['name'] == name_to_search].index[0])  # error line

However, I am getting the error IndexError: index 0 is out of bounds for axis 0 with size 0 on the line marked as the error line. Any idea how to fix it?


Solution

  • If the DataFrame has the default index (a RangeIndex), use next with iter and pass a default value that is returned when there is no match, here -1:

    name_to_search = 'p1'
    row_indx1 = next(iter(df1[df1['name'].str.lower() == name_to_search].index), -1)
    print (row_indx1)
    0
    
    name_to_search = 'tmp'
    row_indx1 = next(iter(df1[df1['name'].str.lower() == name_to_search].index), -1)
    print (row_indx1)
    -1
    

    Or filter the index directly instead of the whole DataFrame, which is faster for large DataFrames:

    row_indx1 = next(iter(df1.index[df1['name'].str.lower() == name_to_search]), -1)
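As a self-contained illustration of the index-filtering approach above, here is a minimal sketch. The helper name find_first_label and the in-memory CSV text are assumptions made for this example, not part of the original code.

```python
import io
import pandas as pd

# Sample data mirroring the question's CSV (assumed, written without spaces after commas).
csv_text = "name,ColB,ColC,ColD\nP1,1,1,1\nP2,0,1,0\nP3,1,1,0\n"
df1 = pd.read_csv(io.StringIO(csv_text))

def find_first_label(df, column, value, default=-1):
    """Return the index label of the first case-insensitive match, or default."""
    # Lowercase both sides on the fly; the DataFrame itself is not modified.
    mask = df[column].str.lower() == value.lower()
    # Filter the index only, then take the first label or fall back to default.
    return next(iter(df.index[mask]), default)

print(find_first_label(df1, "name", "p1"))
print(find_first_label(df1, "name", "tmp"))
```

With the default RangeIndex the returned label is also the row position. For any other index the label and the position can differ.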
    

    Any idea what is going wrong in the original code?

    The first problem is that the lowercased values are not assigned back, so the comparison runs against the original uppercase values (P1, P2, ...):

    df1['name'].str.lower()
    print (df1['name'])
    0    P1
    1    P2
    2    P3
    Name: name, dtype: object
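To make the difference concrete, the following sketch contrasts discarding the result of str.lower() with assigning it back. The small DataFrame is built inline as an assumption so the example runs without file.csv.

```python
import pandas as pd

# Inline stand-in for the CSV data (assumed for this example).
df1 = pd.DataFrame({"name": ["P1", "P2", "P3"], "ColB": [1, 0, 1]})

# str.lower() returns a new Series; without assignment df1 is unchanged.
df1["name"].str.lower()
unchanged = df1["name"].tolist()

# Assigning the result back replaces the column with lowercase values.
df1["name"] = df1["name"].str.lower()
changed = df1["name"].tolist()

print(unchanged)
print(changed)
```

Assigning back overwrites the original casing. If the original values must be kept, lowercasing inside the comparison, as in the solution above, avoids changing the column.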
    

    The second problem is that when there is no match, the filtered DataFrame is empty, so selecting the first value of its index is not possible and an IndexError is raised:

    print (df1[df1['name'] == name_to_search])
    Empty DataFrame
    Columns: [name, ColB, ColC, ColD]
    Index: []
    
    print (df1[df1['name'] == name_to_search].index)
    Index([], dtype='int64')
    
    print (df1[df1['name'] == name_to_search].index[0])
    IndexError: index 0 is out of bounds for axis 0 with size 0
    

    With the default index, Index.get_loc returns the same value as the label, so calling it is redundant. It is useful for getting the position of a label when the index is not a RangeIndex:

    print (df1.index.get_loc(2))
    2
    
    df = pd.DataFrame({'a':range(3)}, index=['E','W','T'])
    print (df)
       a
    E  0
    W  1
    T  2
    
    print (df.index.get_loc('E'))
    0
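If the integer position is needed regardless of the index type, one possible approach is to work on the boolean mask itself rather than on labels. The sketch below assumes a hypothetical helper find_first_position and a small DataFrame with a string index. It uses numpy.flatnonzero, which is a swapped-in technique and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Example data with a non-default index (assumed for this example).
df = pd.DataFrame({"name": ["P1", "P2", "P3"]}, index=["E", "W", "T"])

def find_first_position(df, column, value, default=-1):
    """Return the 0-based row position of the first case-insensitive match."""
    mask = (df[column].str.lower() == value.lower()).to_numpy()
    # flatnonzero gives the positions where the mask is True.
    positions = np.flatnonzero(mask)
    return int(positions[0]) if positions.size else default

print(find_first_position(df, "name", "p2"))
print(find_first_position(df, "name", "tmp"))
```

Because the mask is converted to a NumPy array, the result is a position and not a label, so it does not depend on whether the index is a RangeIndex.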