Can someone explain what is meant by
Both loc and iloc [in Pandas] are row-first, column-second. This is the opposite of what we do in native Python, which is column-first, row-second.
I ask because I thought that, when accessing arrays or lists of lists, the first index always represents the row:
matrix = [
    [1,2,3], # row 1, index 0
    [4,5,6], # row 2, index 1
    [7,8,9]  # row 3, index 2
]
print(matrix[1][2]) # Output = 6
I would say that statement is incorrect or, at least, very misleading and likely to cause confusion.
Both iloc and loc are row-first & column-second, but this is exactly the same as how indexing works in native Python and your example. The first index refers to the row, and the second index refers to the column.
Your example in pandas using iloc/loc also outputs 6:
import pandas as pd
data = [
    [1, 2, 3], # row 0
    [4, 5, 6], # row 1
    [7, 8, 9]  # row 2
]
df = pd.DataFrame(data)
print(df.iloc[1, 2])
# Output: 6
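The same lookup can be sketched with loc. This assumes the default integer labels that pandas assigns when no index or column names are passed, so labels and positions happen to coincide; the names value_loc and value_iloc are only illustrative:

```python
import pandas as pd

data = [
    [1, 2, 3],  # row 0
    [4, 5, 6],  # row 1
    [7, 8, 9],  # row 2
]
df = pd.DataFrame(data)

# With the default labels (0, 1, 2 on both axes), row label 1 and
# column label 2 match positions 1 and 2, so loc and iloc agree.
value_loc = df.loc[1, 2]
value_iloc = df.iloc[1, 2]
print(value_loc, value_iloc)
# Output: 6 6
```

With custom row or column labels, loc would need those labels instead, but the order (row first, column second) stays the same.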
This exact statement has already been discussed in this Kaggle discussion, but to me it is still not clear what the author was referring to.
As per Siraz Naorem's understanding, the statement might be referring to the creation of DataFrames from column-oriented data, e.g. dictionaries, where each list or array represents a column, not a row.
If we replicate your example again, but create the DataFrame from a dictionary like this:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(df)
# Output:
#    A  B  C
# 0  1  4  7
# 1  2  5  8
# 2  3  6  9
Now, when we access index [1,2], we do not get 6:
print(df.iloc[1, 2])
# Output: 8
print(df.iloc[2, 1])
# Output: 6
In this case, the row and column indices might seem reversed, which may lead to the mistaken idea that indexing works differently: iloc[1,2] now gives us 8, and we have to use iloc[2,1] to get the value 6.
However, iloc/loc indexing itself has not changed; it is still row-first & column-second. What is different is the structure of the DataFrame, since pandas has treated each list in the dictionary as a column.
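If the goal is for each list in the dictionary to become a row instead, one option is DataFrame.from_dict with orient='index'; transposing with .T gives the same layout. A minimal sketch, reusing the dictionary from above (the names d, df_cols and df_rows are only illustrative):

```python
import pandas as pd

d = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}

# Default constructor: each list becomes a column.
df_cols = pd.DataFrame(d)

# orient='index': each list becomes a row, labelled by its key.
df_rows = pd.DataFrame.from_dict(d, orient='index')

print(df_cols.iloc[1, 2])    # Output: 8
print(df_rows.iloc[1, 2])    # Output: 6
print(df_cols.T.iloc[1, 2])  # Output: 6
```

In all three calls the first index is the row and the second is the column; only the layout of the data differs.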