pythonpandasdataframe

How do i find the iloc of a row in pandas dataframe?


I have an indexed pandas dataframe. By searching through its index, I find a row of interest. How do I find out the iloc of this row?

Example:

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
                   A         B         C         D
2000-01-01 -0.077564  0.310565  1.112333  1.023472
2000-01-02 -0.377221 -0.303613 -1.593735  1.354357
2000-01-03  1.023574 -0.139773  0.736999  1.417595
2000-01-04 -0.191934  0.319612  0.606402  0.392500
2000-01-05 -0.281087 -0.273864  0.154266  0.374022
2000-01-06 -1.953963  1.429507  1.730493  0.109981
2000-01-07  0.894756 -0.315175 -0.028260 -1.232693
2000-01-08 -0.032872 -0.237807  0.705088  0.978011

window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
window_stop_row
Timestamp('2000-01-08 00:00:00', offset='D')
#which is the iloc of window_stop_row?

Solution

  • Generally speaking, pass the named index value to index.get_loc:

    df.index.get_loc(row_of_interest_named_index)
    

    Since you’re dealing with dates it may be more convenient to retrieve the index value with .name:

    In [131]:
    dates = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    df
    
    Out[131]:
                       A         B         C         D
    2000-01-01  0.095234 -1.000863  0.899732 -1.742152
    2000-01-02 -0.517544 -1.274137  1.734024 -1.369487
    2000-01-03  0.134112  1.964386 -0.120282  0.573676
    2000-01-04 -0.737499 -0.581444  0.528500 -0.737697
    2000-01-05 -1.777800  0.795093  0.120681  0.524045
    2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
    2000-01-07 -0.570800  0.248608 -1.428998 -0.662014
    2000-01-08 -0.147326  0.717392  3.138620  1.208639
    
    In [133]:    
    window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
    window_stop_row.name
    
    Out[133]:
    Timestamp('2000-01-03 00:00:00', offset='D')
    
    In [134]:
    df.index.get_loc(window_stop_row.name)
    
    Out[134]:
    2
    

    get_loc returns the ordinal position of the label in your index which is what you want:

    In [135]:    
    df.iloc[df.index.get_loc(window_stop_row.name)]
    
    Out[135]:
    A    0.134112
    B    1.964386
    C   -0.120282
    D    0.573676
    Name: 2000-01-03 00:00:00, dtype: float64
    

    if you just want to search the index then so long as it is sorted then you can use searchsorted:

    In [142]:
    df.index.searchsorted('2000-01-04') - 1
    
    Out[142]:
    2