java, apache-spark, spark-java

How to create a struct column from a list of column names in Spark with Java?


I have a DataFrame with multiple columns, e.g.

root
 |-- playerName
 |-- country
 |-- bowlingAvg
 |-- bowlingSR
 |-- wickets
 |-- battingAvg
 |-- battingSR
 |-- runs

I also have a list of the column names which corresponds to bowling stats:

List<String> bowlingParams = new ArrayList<>(Arrays.asList("bowlingAvg", "bowlingSR", "wickets"));

Expected Schema:

root
 |-- playerName
 |-- country
 |-- bowlingAvg
 |-- bowlingSR
 |-- wickets
 |-- battingAvg
 |-- battingSR
 |-- runs
 |-- bowlingStats
 |    |-- bowlingAvg
 |    |-- bowlingSR
 |    |-- wickets

I can do it like this:

playerDF = playerDF.withColumn("bowlingStats", functions.struct("bowlingAvg", "bowlingSR", "wickets"));

However, I want to use the list to dynamically select the columns for the struct.

I know we can do it like this in Scala:

playerDF = playerDF.select(struct(bowlingParams.map(col): _*))

and I have also found a reference on how to do this in Python.

Is there a way we can do this in Java with Spark?


Solution

  • For Java, this solution worked for me:
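Since the exact snippet isn't shown above, here is a minimal sketch of that approach, assuming bowlingParams is the List<String> from the question and playerDF is the existing Dataset<Row>: map each column name to a Column, collect into a Column[], and pass the array to functions.struct, which exposes a Column... varargs overload callable from Java.

import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

// Column names to bundle into the struct (from the question)
List<String> bowlingParams = Arrays.asList("bowlingAvg", "bowlingSR", "wickets");

// Map each name to a Column and collect into an array
Column[] bowlingCols = bowlingParams.stream()
        .map(functions::col)
        .toArray(Column[]::new);

// functions.struct accepts Column varargs, so the array can be passed directly;
// playerDF is the Dataset<Row> from the question
playerDF = playerDF.withColumn("bowlingStats", functions.struct(bowlingCols));

This keeps the original columns and adds the nested bowlingStats struct, matching the expected schema; the same Column[] can also be passed to functions.struct inside a select if only the struct column is needed.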