pythonpandasdataframe

Indexing Pandas data frames: integer rows, named columns


Say df is a pandas dataframe.

When referencing rows, df.ix[row_idx, ] only wants to be given names. e.g.

df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                   '1' : np.arange(6)})
df = df.ix[2:6]
print(df)

   1      a
2  2  three
3  3   four
4  4   five
5  5    six

df.ix[0, 'a']

throws an error, it doesn't give return 'two'.

When referencing columns, iloc is prefers integers, not names. e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.idx[2, '1'] does return 2).

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add on my own new method to pandas.core.frame.DataFrames, so e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?


Solution

  • A very late answer but it amazed me that pandas still doesn't have such a function after all these years. If it irks you a lot, you can monkey-patch a custom indexer into the DataFrame:

    class XLocIndexer:
        def __init__(self, frame):
            self.frame = frame
        
        def __getitem__(self, key):
            row, col = key
            return self.frame.iloc[row][col]
    
    pd.core.indexing.IndexingMixin.xloc = property(lambda frame: XLocIndexer(frame))
    
    # Usage
    df.xloc[0, 'a'] # one