pythonpandas

How to access pandas DataFrame datetime index using strings


This is a very simple and practical question. I have the feeling that it must be a silly detail and that there should be similar questions, but I wasn't able to find them. If someone does, I'll happily delete this one.

The closest I found were these: pandas: iterating over DataFrame index with loc

How to select rows within a pandas dataframe based on time only when index is date and time

Anyway, the thing is, I have a datetime-indexed pandas DataFrame as follows:

In[81]: y
Out[81]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

In[82]: y.index
Out[82]: DatetimeIndex(['2008-01-01', '2008-01-02', '2008-01-03'], dtype='datetime64[ns]', freq=None)

Oddly enough, I can't access its values using any of the following methods:

In[83]: y[datetime.datetime(2008,1,1)]
In[84]: y['2008-1-1']
In[85]: y['1/1/2008']

Each of them raises a KeyError.

Even weirder is that the following methods DO work:

In[86]: y['2008']
Out[86]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0
In[87]: y['2008-1']
Out[87]: 
            PETR4  CSNA3  VALE5
2008-01-01    0.0    0.0    0.0
2008-01-02    1.0    1.0    1.0
2008-01-03    7.0    7.0    7.0

I'm fairly new to pandas so maybe I'm missing something here?


Solution

  • pandas takes what's inside the [] and decides what to do with it. If it's a subset of column names, it returns a DataFrame with those columns. If it's a range of index values, it returns the matching subset of rows. What it does not handle is a single index value: a full date such as '2008-01-01' is looked up as a column name, which is why you get a KeyError.

    Solution

    Two workarounds:

    1. Turn the argument into something pandas interprets as a range.

    df['2008-01-01':'2008-01-01']
    

    2. Use the method designed to give you this result, loc[].

    df.loc['2008-01-01']
    

    Link to the documentation
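
To make the two workarounds concrete, here is a minimal sketch that rebuilds a frame like the one in the question (named y there, df in the answer) and tries each approach. The construction of the frame and the variable names are my own assumptions for illustration; only the values come from the question. As far as I know, newer pandas releases no longer select rows from a partial date string inside plain [] (the y['2008'] form), so the sketch uses .loc for that case as well.

```python
import datetime

import pandas as pd

# Rebuild a frame shaped like the one in the question (values copied from it).
index = pd.to_datetime(["2008-01-01", "2008-01-02", "2008-01-03"])
y = pd.DataFrame(
    {
        "PETR4": [0.0, 1.0, 7.0],
        "CSNA3": [0.0, 1.0, 7.0],
        "VALE5": [0.0, 1.0, 7.0],
    },
    index=index,
)

# Plain [] with a single full date ends up as a column lookup, so it fails.
try:
    y["2008-01-01"]
    single_date_failed = False
except KeyError:
    single_date_failed = True

# Workaround 1: a slice is interpreted as a row range (one-row DataFrame here).
one_row_frame = y["2008-01-01":"2008-01-01"]

# Workaround 2: .loc selects by index label; a full date gives that row as a Series.
row = y.loc["2008-01-01"]

# .loc also accepts datetime objects and partial date strings.
same_row = y.loc[datetime.datetime(2008, 1, 1)]
january = y.loc["2008-01"]

# A single cell can be read by passing both the row label and the column name.
vale_on_jan_2 = y.loc["2008-01-02", "VALE5"]
```

Note the difference in return types: the slice keeps the result as a DataFrame, while .loc with a single full date returns the row as a Series indexed by column name.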