
Pandas: What is wrong with this use of DataFrame.apply for finding maximum of other columns


I have a DataFrame with a handful of date columns, and I want to create a new column "MaxDate" that contains the maximum date in each row. I tried using apply, but every code pattern I tried for the lambda function raises an error.

import pandas as pd
import datetime as dt
df=pd.DataFrame(
   [ [
      dt.date(2025,6,5), dt.date(2025,6,6) ],[
      dt.date(2025,6,7), dt.date(2025,6,8) ]
   ],
   columns=['A','B'], index=['Row1','Row2']
)

# Explicitly find maximum of row 0 (WORKS)
max( df.loc[ df.index[0], ['A','B'] ] )

# None of the following 3 code patterns work for "apply"

if False:

   df['MaxDate'] = df.apply( lambda row: max(
      row.loc[ row.index[0], ['A','B'] ]
   )  )

   # IndexingError: "Too many indexers"

elif False:

   df['MaxDate'] = df.apply( lambda row: max(
      row['A','B']
   )  )

   # KeyError:
   # "key of type tuple not found and not a MultiIndex"

elif False:

   df['MaxDate'] = df.apply( lambda row: max(
      row['A'],row['B']
   )  )

   # KeyError: 'A'
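For comparison, here is a minimal sketch (rebuilding the same df so it runs on its own) of what happens when axis=1 is passed, which makes apply hand each row, rather than each column, to the lambda. It also reproduces the question's KeyError: 'A' under the default axis=0. The variable names max1, max2, max3 and axis0_error are illustrative only.

```python
import datetime as dt

import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    [[dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
     [dt.date(2025, 6, 7), dt.date(2025, 6, 8)]],
    columns=['A', 'B'], index=['Row1', 'Row2'],
)

# Default axis=0: the lambda first receives column 'A', whose index is
# ['Row1', 'Row2'], so looking up the label 'A' fails as in the question.
try:
    df.apply(lambda row: max(row['A'], row['B']))
except KeyError as exc:
    axis0_error = exc

# axis=1: each `row` is a Series indexed by the column labels 'A' and 'B'.
max3 = df.apply(lambda row: max(row['A'], row['B']), axis=1)

# Pattern 2 needs a list of labels (double brackets); row['A', 'B'] is
# looked up as a single tuple-valued key.
max2 = df.apply(lambda row: max(row[['A', 'B']]), axis=1)

# Pattern 1 cannot work with two indexers because `row` is one-dimensional;
# .loc with one list of labels is the row-wise equivalent.
max1 = df.apply(lambda row: max(row.loc[['A', 'B']]), axis=1)

print(max3)
```

With axis=1 the third pattern works unchanged, and the second works once the labels are wrapped in a list.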

I tried checking whether the row variable was a DataFrame or a Series, but the result was a column of NaN:

# Querying class of "row" yields a column of "nan"
df['MaxDate'] = df.apply( lambda row: type(row) )
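A sketch of where the NaN comes from, assuming the same df: printing the apply result before assigning it shows that it is indexed by the column labels 'A' and 'B', because the default axis=0 calls the function once per column. The sketch uses type(...).__name__ (not in the original) only so the printed values are plain strings.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [[dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
     [dt.date(2025, 6, 7), dt.date(2025, 6, 8)]],
    columns=['A', 'B'], index=['Row1', 'Row2'],
)

# Default axis=0: one call per column, so one result per column label.
types = df.apply(lambda row: type(row).__name__)
print(types)  # index is ['A', 'B'], each value is 'Series'

# Assigning a Series to a column aligns on the index. The labels 'A' and
# 'B' never match 'Row1' or 'Row2', so every cell of the new column is NaN.
df['MaxDate'] = types
print(df['MaxDate'])
```

So row was a Series all along; the type information was lost only at the assignment step.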

Of the three code patterns above, I would like to avoid the last one because it repeats the word row so often that my code becomes "noisy".
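One way to avoid repeating row, sketched under the assumption that A and B are the only date columns of interest: select those columns first and pass the built-in max straight to apply with axis=1. Each row then arrives as a Series, and max() iterates over its values. The name date_cols is just a local variable for illustration.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [[dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
     [dt.date(2025, 6, 7), dt.date(2025, 6, 8)]],
    columns=['A', 'B'], index=['Row1', 'Row2'],
)

# Pick the date columns once, then apply max row by row; no lambda needed.
date_cols = ['A', 'B']
df['MaxDate'] = df[date_cols].apply(max, axis=1)
print(df)
```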

What am I doing wrong?

Others have pointed to this duplicate question, which I appreciate. To me, however, this is about more than how to achieve the end result. It is also about sussing out my understanding of the apply method. What is wrong with my use of its mechanics? And why doesn't type(row) show the class of the row object? Without visibility into its type, it's hard to come up with code patterns that are likely to work. I've retitled the question to reflect this.
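To get visibility into what apply passes in, one option is to log it from inside the function instead of returning it. The sketch below uses a hypothetical helper, recorder, that records the type, the name, and the index labels of each object it receives under axis=0 and axis=1.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [[dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
     [dt.date(2025, 6, 7), dt.date(2025, 6, 8)]],
    columns=['A', 'B'], index=['Row1', 'Row2'],
)


def recorder(log):
    """Hypothetical helper: returns a function that logs what apply passes in."""
    def inspect(obj):
        # Record the class name, the Series name and its index labels.
        log.append((type(obj).__name__, obj.name, list(obj.index)))
        return 0  # the return value is irrelevant for this experiment
    return inspect


calls_axis0 = []
calls_axis1 = []
df.apply(recorder(calls_axis0))          # default axis=0: one call per column
df.apply(recorder(calls_axis1), axis=1)  # axis=1: one call per row

for entry in calls_axis0 + calls_axis1:
    print(entry)
```

Under axis=0 each call sees a Series named after a column and indexed by Row1 and Row2; under axis=1 it is named after a row and indexed by A and B.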


Solution

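  • The mechanics of apply():

    By default, DataFrame.apply uses axis=0, which calls the function once per column, not once per row. The object named row in the question is therefore a column: a Series whose index is the row labels 'Row1' and 'Row2'.

    That explains all three errors. row['A'] raises KeyError because 'A' is not a row label; row['A','B'] is looked up as a single tuple key; and row.loc[x, y] passes two indexers to a one-dimensional Series. It also explains the NaN: df.apply(lambda row: type(row)) returns a Series indexed by 'A' and 'B', and assigning it to df['MaxDate'] aligns on the index, where no label matches 'Row1' or 'Row2'.

    Passing axis=1 makes apply call the function once per row, with each row a Series indexed by the column labels.

    Here's the corrected code: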

    import pandas as pd
    import datetime as dt
    
    df = pd.DataFrame(
       [ [
          dt.date(2025,6,5), dt.date(2025,6,6) ],[
          dt.date(2025,6,7), dt.date(2025,6,8) ]
       ],
       columns=['A','B'], index=['Row1','Row2']
    )
    
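    # Correct way to use apply: axis=1 passes each row (a Series indexed
    # by the column labels) to the function
    df['MaxDate'] = df.apply(lambda row: max(row), axis=1)
    
    # Or, more simply, the vectorized row-wise maximum of the date columns
    df['MaxDate'] = df[['A', 'B']].max(axis=1)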
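An alternative not covered by the answer above, offered as a sketch: convert the columns from Python date objects to pandas datetime64 with pd.to_datetime, so the row-wise max runs on a native datetime dtype, then convert back with .dt.date if plain dates are wanted.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [[dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
     [dt.date(2025, 6, 7), dt.date(2025, 6, 8)]],
    columns=['A', 'B'], index=['Row1', 'Row2'],
)

# Convert each date column (object dtype holding datetime.date) to datetime64.
dates = df[['A', 'B']].apply(pd.to_datetime)

# Row-wise maximum on datetime64 columns gives Timestamps;
# .dt.date turns them back into datetime.date objects.
df['MaxDate'] = dates.max(axis=1).dt.date
print(df)
```

Keeping the columns as datetime64 from the start would make the .dt.date step unnecessary if Timestamps are acceptable downstream.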