This should be quite simple, but I still haven't found a way. I need to compute a new column whose value is the maximum of columns col1 and col2 for each row. So if col1 is 2 and col2 is 4, new_col should be 4, and so on. The data is in a PySpark DataFrame.
I tried df = df.withColumn("new_col", max("col1", "col2")), but got the error "_() takes 1 positional argument but 2 were given". So what would be the correct way?
Thanks in advance.
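You can try greatest, which compares its arguments row by row and returns the largest value. The error suggests that the max you called is pyspark.sql.functions.max, an aggregate function that accepts a single column, which is why passing two columns fails: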
from pyspark.sql import functions as F
output = df.withColumn("new_col", F.greatest("col1","col2"))
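
To make the row-wise behaviour concrete, here is a plain-Python sketch (no Spark needed) of what greatest computes for each row. It assumes the documented behaviour that nulls are skipped and the result is null only when every input is null. The helper name row_greatest and the sample rows are made up for illustration, and unlike F.greatest the sketch does not enforce the minimum of two arguments.

```python
def row_greatest(*values):
    """Hypothetical plain-Python model of F.greatest for one row.

    None stands in for a Spark null: nulls are skipped, and the
    result is None only when every value is None.
    """
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


# Sample (col1, col2) pairs, invented for illustration.
rows = [(2, 4), (7, 3), (None, 5), (None, None)]

# One result per row, mirroring what new_col would hold.
new_col = [row_greatest(col1, col2) for col1, col2 in rows]
print(new_col)  # [4, 7, 5, None]
```

If you need nulls to be treated differently (for example, as 0), fill them first, e.g. with df.fillna, before calling greatest.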