I am just starting to learn Pandas, and in a piece of code there is a call to df.iloc[[1][0]] (where df is a pd.DataFrame with a shape of (60935, 54)). From the context of the code, df.iloc[[1][0]] seems to represent a row of df. However, how should one interpret [[1][0]]? Why does iloc[] allow two adjacent lists as parameters? How does iloc[] handle these parameters internally? This clearly is not indexing both rows and columns. Additionally, I noticed that when the second number is neither 0 nor -1, an "index out of range" error occurs. Why is this?
Here are some experiments I conducted:
import pandas as pd

mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)
print(df.iloc[[0][-1]].shape) # Outputs (4,)
print(df.iloc[[0][0]].shape) # Outputs (4,)
print(df.iloc[[0]].shape) # Outputs (1, 4)
print(df.iloc[[0][1]].shape) # Raises IndexError: list index out of range
print(type(df.iloc[[0]])) # Outputs <class 'pandas.core.frame.DataFrame'>
print(type(df.iloc[[0][0]])) # Outputs <class 'pandas.core.series.Series'>
I think this is a somewhat confusing programming style. Let me break it down.
[1] creates a list with one element (namely the number 1). [1][0] then accesses the first (0th) element of that list, which returns 1. Python evaluates this expression before iloc is called, so iloc never receives two lists; it only receives the integer 1. Thus, df.iloc[[1][0]] is equivalent to df.iloc[1].
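The same applies to the other indexes. The -1 returns the last item of the given list; since the list is just one element long, that is the first element again. Any other index, such as the 1 in [0][1], does not exist in a one-element list, so Python itself raises the IndexError before iloc is involved.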
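The list indexing described above can be checked without pandas at all. The following is a minimal sketch; the variable names (one_element, first, last, error_message) are made up for illustration.

```python
# Plain Python, no pandas involved: the subscript on the list is evaluated
# first, and only its result would ever be passed on to iloc.
one_element = [0]

first = one_element[0]    # index 0: the only element
last = one_element[-1]    # index -1: the last element, which is the same one

print(first, last)  # 0 0

try:
    one_element[1]        # a one-element list has no index 1
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)  # IndexError: list index out of range
```

The error message matches the one from the question, which suggests it comes from the list lookup rather than from iloc.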
df.iloc[[0]] requests a list of rows (just one row here, namely the 0th). This results in a DataFrame.
If you instead call df.iloc[0], you request exactly one row rather than a list, so a pd.Series is returned.
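As a small sketch of that difference, using the same sample data as in the question (the names row_as_series and row_as_frame are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 3, 'd': 4},
    {'a': 100, 'b': 200, 'c': 300, 'd': 400},
    {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000},
])

row_as_series = df.iloc[0]    # integer position: one row as a Series
row_as_frame = df.iloc[[0]]   # list of positions: DataFrame with one row

print(type(row_as_series).__name__, row_as_series.shape)  # Series (4,)
print(type(row_as_frame).__name__, row_as_frame.shape)    # DataFrame (1, 4)
```

If a one-row DataFrame is what the original code needed, df.iloc[[1]] would probably state that more clearly than df.iloc[[1][0]], which only yields the Series.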
Alternatively, you could also request something like df.iloc[0:2] (a slice, written without the extra brackets), which would return the first two rows (and thus a pd.DataFrame again).
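A short sketch of selecting several rows, again on the sample data from the question; first_two and same_rows are illustrative names. Note that a slice goes directly inside iloc's brackets, since a slice cannot be written inside a list literal.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 1, 'b': 2, 'c': 3, 'd': 4},
    {'a': 100, 'b': 200, 'c': 300, 'd': 400},
    {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000},
])

first_two = df.iloc[0:2]      # slice: rows at positions 0 and 1 (end is exclusive)
same_rows = df.iloc[[0, 1]]   # explicit list of positions selecting the same rows

print(type(first_two).__name__, first_two.shape)  # DataFrame (2, 4)
print(type(same_rows).__name__, same_rows.shape)  # DataFrame (2, 4)
```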