python pandas dataframe indexing

How to predict the resulting type after indexing a Pandas DataFrame


I have a Pandas DataFrame, as defined here:

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])

However, when I index into this DataFrame, I can't accurately predict what type of object is going to result from the indexing:

# (1) str
df.iloc[0, df.columns.get_loc('Name')]
# (2) Series
df.iloc[0:1, df.columns.get_loc('Name')]

# (3) Series
df.iloc[0:2, df.columns.get_loc('Name')]
# (4) DataFrame
df.iloc[0:2, df.columns.get_loc('Name'):df.columns.get_loc('Age')]

# (5) Series
df.iloc[0, df.columns.get_loc('Name'):df.columns.get_loc('Location')]
# (6) DataFrame
df.iloc[0:1, df.columns.get_loc('Name'):df.columns.get_loc('Location')]

Note that each of the pairs above contains the same data. (E.g., (2) is a Series that contains a single string, (4) is a DataFrame that contains a single column, etc.)

Why do they output different types of objects? How can I predict what type of object will be output?

Given the data, it looks like the rule is based on how many slices (colons) you have in the index: no slices gives a scalar, one slice gives a Series, and two slices give a DataFrame.

However, I'm not sure if this is always true, and even if it is always true, I want to know the underlying mechanism as to why it is like that.

I've spent a while looking at the indexing documentation, but it doesn't seem to describe this behavior clearly. The documentation for the iloc function also doesn't describe the return types.

I'm also interested in the same question for loc instead of iloc, but, since loc is inclusive, the results aren't quite as bewildering. (That is, you can't get pairs of indexes with different types where the indexes should pull out the exact same data.)


Solution

  • You have the general idea. Put simply, what matters is not the number of items selected but the type of indexer used on each axis.

    You can index an axis as 0D (with a scalar). Let's just consider the row index for now:

    df.iloc[0]    # by position
    
    df.loc[10]    # by label (the index labels here are 10, 20, 30)
    

    or 1D (with a slice or iterable):

    df.loc[[10]]     # list of labels
    
    df.loc[10:20]    # label slice (inclusive on both ends)
    
    df.loc[:10]      # open-ended label slice
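
To see the 0D/1D distinction on the row axis alone, here is a minimal sketch that rebuilds the question's DataFrame and checks what each kind of row indexer returns. The variable names are only illustrative.

```python
import pandas as pd

# The DataFrame from the question (index labels are 10, 20, 30).
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])

# 0D (scalar) row indexer: the row axis is dropped, leaving a Series.
row_by_position = df.iloc[0]
row_by_label = df.loc[10]

# 1D (list or slice) row indexer: the row axis is kept, so the result
# is still a DataFrame, even when it holds a single row.
one_row_list = df.iloc[[0]]
one_row_slice = df.iloc[0:1]

print(type(row_by_position).__name__)  # Series
print(type(row_by_label).__name__)     # Series
print(type(one_row_list).__name__)     # DataFrame
print(type(one_row_slice).__name__)    # DataFrame
```

Note that `df.iloc[0]` and `df.iloc[0:1]` select the same row; only the indexer type differs.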
    

    Then the rule is simple. Consider both axes: if both indexers are 0D you get a scalar (here a string), if both are 1D you get a DataFrame, and otherwise you get a Series:

    columns      0D         1D
    index                     
    0D       scalar     Series
    1D       Series  DataFrame
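
One way to read this table is that each scalar indexer drops one axis, so the result has as many dimensions as there are 1D indexers. The sketch below encodes that reading and compares it with what `iloc` returns. `predicted_type` is a hypothetical helper written for this illustration, not a pandas function, and it only covers integer positions, slices, and lists with `iloc`; cases such as `loc` on an index with duplicate labels or a MultiIndex are out of scope here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])


def predicted_type(row_indexer, col_indexer):
    """Hypothetical helper: predict the result kind from the indexer types."""
    # Count the 1D (non-scalar) indexers; each one keeps its axis.
    kept_axes = sum(not pd.api.types.is_scalar(indexer)
                    for indexer in (row_indexer, col_indexer))
    return {0: 'scalar', 1: 'Series', 2: 'DataFrame'}[kept_axes]


def actual_type(result):
    """Name the kind of object that indexing returned."""
    if isinstance(result, (pd.Series, pd.DataFrame)):
        return type(result).__name__
    return 'scalar'


# (row indexer, column indexer) pairs, written with slice() so they
# can be stored in a list and passed to iloc later.
cases = [
    (0, 0),                      # 0D / 0D
    (slice(0, 1), 0),            # 1D / 0D
    (0, slice(0, 2)),            # 0D / 1D
    ([1, 2], 0),                 # 1D / 0D
    (0, [0]),                    # 0D / 1D
    (slice(0, 2), slice(0, 1)),  # 1D / 1D
]

mismatches = []
for rows, cols in cases:
    predicted = predicted_type(rows, cols)
    actual = actual_type(df.iloc[rows, cols])
    print(rows, cols, predicted, actual)
    if predicted != actual:
        mismatches.append((rows, cols))

print(mismatches)  # expected to be empty if the table holds for these cases
```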
    

    Some examples to illustrate this:

    type(df.iloc[1:2, 1:2])        # 1D / 1D
    # pandas.core.frame.DataFrame
    
    type(df.iloc[:0, :0])          # 1D / 1D
    # pandas.core.frame.DataFrame  (EMPTY DataFrame)
    
    type(df.iloc[[], []])          # 1D / 1D
    # pandas.core.frame.DataFrame  (EMPTY DataFrame)
    
    type(df.iloc[[1,2], 0])        # 1D / 0D
    # pandas.core.series.Series
    
    type(df.iloc[0, [0]])          # 0D / 1D
    # pandas.core.series.Series
    
    type(df.iloc[0, 0])            # 0D / 0D
    # str
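
The question also asks about loc. As far as I can tell the same table applies there; the sketch below checks a few label-based cases on the question's DataFrame. Because label slices include both endpoints, `10:10` selects exactly one row, so it pulls out the same data as the scalar `10` but keeps the row axis.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=[10, 20, 30])

scalar_result = df.loc[10, 'Name']            # 0D / 0D
row_slice_result = df.loc[10:10, 'Name']      # 1D / 0D
col_slice_result = df.loc[10, 'Name':'Name']  # 0D / 1D
list_result = df.loc[[10], ['Name']]          # 1D / 1D
slice_result = df.loc[10:20, 'Name':'Age']    # 1D / 1D

print(type(scalar_result).__name__)     # str
print(type(row_slice_result).__name__)  # Series
print(type(col_slice_result).__name__)  # Series
print(type(list_result).__name__)       # DataFrame
print(type(slice_result).__name__)      # DataFrame
```

The first four results all hold only the string 'Alice', yet their types differ according to the indexers used.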