Tags: python, pandas, dataframe, loops

How can I iterate over rows in a Pandas DataFrame?


I have a pandas dataframe, df:

   c1   c2
0  10  100
1  11  110
2  12  120

How do I iterate over the rows of this dataframe? For every row, I want to access its elements (values in cells) by the name of the columns. For example:

for row in df.rows:
    print(row['c1'], row['c2'])

I found a similar question, which suggests using either of these:

But I do not understand what the row object is and how I can work with it.


Solution

  • DataFrame.iterrows is a generator which yields both the index and row (as a Series):

    import pandas as pd
    
    df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
    
    # iterrows() yields (index label, row) pairs; each row is a Series
    # indexed by the column names, so cells can be read with row['c1']
    for index, row in df.iterrows():
        print(row['c1'], row['c2'])
    
    # Output:
    # 10 100
    # 11 110
    # 12 120
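To answer the "what is the row object" part of the question directly, the sketch below inspects it. It relies only on documented behaviour of iterrows(): each row is a pandas Series whose index is the column labels and whose `.name` is that row's index label. Attribute access such as `row.c2` works only when the column label is a valid identifier that does not clash with a Series method. The second part shows a documented caveat: a Series holds a single dtype, so a row mixing int and float columns is upcast to float. The `mixed` frame is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

for index, row in df.iterrows():
    # Each row is a Series whose index is the DataFrame's column labels
    assert isinstance(row, pd.Series)
    assert list(row.index) == ['c1', 'c2']
    # The row Series is named after the row's index label
    assert row.name == index
    # Cells can be read by label, or by attribute for identifier-like labels
    print(index, row['c1'], row.c2)

# Hypothetical mixed-dtype frame: int column 'a' and float column 'b'
mixed = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})
_, first = next(mixed.iterrows())
# The int value comes back as a float because the row Series is float64
print(first['a'])
```

If you need the original per-column dtypes while looping, this upcasting is one reason to prefer a different approach.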
    

    Obligatory disclaimer from the documentation

    Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:

    • Look for a vectorized solution: many operations can be performed using built-in methods or NumPy functions, (boolean) indexing, …
    • When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.
    • If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with cython or numba. See the enhancing performance section for some examples of this approach.
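The first two suggestions can be illustrated on the question's frame. This sketch (the per-row sum and the `c2 > 100` filter are made-up tasks) computes `c1 + c2` three ways, then selects rows with boolean indexing instead of a loop. All three give the same numbers; only the vectorized version avoids calling Python code once per row, since `apply(axis=1)` still runs the lambda for each row.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# 1. Explicit loop with iterrows (slowest; shown only for comparison)
loop_total = [row['c1'] + row['c2'] for _, row in df.iterrows()]

# 2. apply() with axis=1 calls the function once per row (a Python-level loop)
apply_total = df.apply(lambda row: row['c1'] + row['c2'], axis=1)

# 3. Vectorized: whole-column arithmetic in a single expression
vector_total = df['c1'] + df['c2']

# Boolean indexing: keep rows where c2 exceeds 100, no loop needed
big = df[df['c2'] > 100]

print(vector_total.tolist())  # [110, 121, 132]
print(big)
```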

    Other answers in this thread go into greater depth on alternatives to the iter* functions, if you are interested in learning more.
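One alternative that is still a row loop is DataFrame.itertuples(), which yields each row as a namedtuple rather than a Series. The pandas documentation notes that it preserves dtypes and is generally faster than iterrows(). A minimal sketch on the question's frame follows; note that column names which are not valid Python identifiers, are repeated, or start with an underscore get renamed to positional names in the namedtuple.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# itertuples() yields namedtuples; the first field, Index, is the row label
for row in df.itertuples():
    print(row.Index, row.c1, row.c2)

# index=False drops the Index field; name=None yields plain tuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```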