dataframe · pyspark

Maximum of two columns in PySpark


This should be quite simple, but I haven't found a way yet. I need to compute a new column whose value is the maximum of columns col1 and col2. So if col1 is 2 and col2 is 4, new_col should be 4, and so on. It's a PySpark DataFrame. I tried df = df.withColumn("new_col", max("col1", "col2")), but got the error "_() takes 1 positional argument but 2 were given". What would be the correct way? Thanks in advance.


Solution

  • You can use greatest, which returns the row-wise maximum of the given columns. Your original attempt fails because pyspark.sql.functions.max is an aggregate function that takes a single column, so it cannot compare two columns per row:

    from pyspark.sql import functions as F
    output = df.withColumn("new_col", F.greatest("col1","col2"))
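    As a quick check, here is a minimal runnable sketch; the SparkSession and the sample data are assumptions for illustration, only the greatest call comes from the answer above:

    from pyspark.sql import SparkSession, functions as F

    # Assumed setup for illustration; not part of the original question
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(2, 4), (7, 3)], ["col1", "col2"])

    # greatest picks the larger value of col1 and col2 for each row
    output = df.withColumn("new_col", F.greatest("col1", "col2"))
    output.show()
    # +----+----+-------+
    # |col1|col2|new_col|
    # +----+----+-------+
    # |   2|   4|      4|
    # |   7|   3|      7|
    # +----+----+-------+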