
Pandas indexing


Can someone explain what is meant by the following statement?

Both loc and iloc [in Pandas] are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.

I ask because I thought that, when indexing arrays or lists of lists, the first index always selects the row:

matrix = [
    [1,2,3], # row 1, index 0
    [4,5,6], # row 2, index 1
    [7,8,9] # row 3, index 2
]
print(matrix[1][2]) # Output = 6

Solution

  • I would say that statement is incorrect or, at least, very misleading and likely to cause confusion.

    Both iloc and loc are row-first, column-second, but that is exactly how indexing works in native Python and in your example: the first index refers to the row, and the second index refers to the column.

    Your example in pandas using iloc/loc also outputs 6:

    import pandas as pd
    
    data = [
        [1, 2, 3], # row 0
        [4, 5, 6], # row 1
        [7, 8, 9]  # row 2
    ]
    
    df = pd.DataFrame(data)
    
    print(df.iloc[1, 2])
    
    # Output: 6
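
    As a quick cross-check, the sketch below indexes the same data as a nested list, as a NumPy array, and as a DataFrame. The NumPy comparison and the variable names are additions for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

data = [
    [1, 2, 3],  # row 0
    [4, 5, 6],  # row 1
    [7, 8, 9],  # row 2
]

arr = np.array(data)
df = pd.DataFrame(data)

# In every case the row index comes first and the column index second.
from_list = data[1][2]     # nested list: pick row 1, then element 2
from_numpy = arr[1, 2]     # NumPy: (row, column)
from_iloc = df.iloc[1, 2]  # pandas, by integer position
from_loc = df.loc[1, 2]    # pandas, by label (default labels are 0, 1, 2)

print(from_list, from_numpy, from_iloc, from_loc)
# Output: 6 6 6 6
```

    Here loc and iloc agree only because the DataFrame uses the default integer labels; with custom labels, loc needs the labels while iloc still takes positions.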
    

    This exact statement has already been discussed in a Kaggle discussion, but it is still not clear to me what the author was referring to.

    According to Siraz Naorem's interpretation, the statement might refer to creating DataFrames from column-oriented data such as dictionaries, where each list or array represents a column, not a row.

    Suppose we repeat your example, but this time create the DataFrame from a dictionary:

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
    
    print(df)
    # Output:    
    #    A  B  C
    # 0  1  4  7
    # 1  2  5  8
    # 2  3  6  9
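
    To confirm how the dictionary was laid out, this small sketch inspects the column labels, one column, and one row. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

columns = list(df.columns)   # dictionary keys become column labels
column_b = df['B'].tolist()  # the list stored under 'B' runs down a column
row_1 = df.iloc[1].tolist()  # row 1 takes one element from each list

print(columns)   # ['A', 'B', 'C']
print(column_b)  # [4, 5, 6]
print(row_1)     # [2, 5, 8]
```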
    

    Now, when we access position [1, 2], we no longer get 6:

    print(df.iloc[1, 2]) 
    # Output: 8
    
    print(df.iloc[2, 1]) 
    # Output: 6
    

    In this case, the row and column indices might seem reversed, which can lead to the mistaken idea that indexing works differently: iloc[1, 2] now gives us 8, and we have to use iloc[2, 1] to get the value 6.

    However, iloc/loc indexing itself has not changed; it is still row-first, column-second. What differs is the structure of the DataFrame, because pandas treats each list in the dictionary as a column.
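
    If the lists in the dictionary are meant to be rows, one option is to build the DataFrame with DataFrame.from_dict and orient='index', or to transpose the column-oriented frame. The sketch below is only an illustration of that idea, with made-up variable names; the indexing call is the same in every case and only the layout of the data changes.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

by_column = pd.DataFrame(data)                          # each list is a column
by_row = pd.DataFrame.from_dict(data, orient='index')   # each list is a row

col_value = by_column.iloc[1, 2]           # row 1, column 'C'
row_value = by_row.iloc[1, 2]              # row 'B', column 2
transposed_value = by_column.T.iloc[1, 2]  # the transpose has the same layout as by_row
label_value = by_column.loc[2, 'B']        # loc is also row label first, column label second

print(col_value, row_value, transposed_value, label_value)
# Output: 8 6 6 6
```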