scalaapache-sparkapache-spark-sqlsparkcore

spark-scala: Transform the dataframe to generate new column gender and vice versa


Table1:

class   male    female
1   2   1
2   0   2
3   2   0

table2:

class   gender
1   m
1   f
1   m
2   f
2   f
3   m
3   m

Using spark-scala take the data from table1 and dump into another table in the format of table2 as given.Also please do vice-versa

Please help me in this guys.

Thanks in Advance


Solution

  • You can use udf and explode function like below.

      import org.apache.spark.sql.functions._
      import spark.implicits._
    
      val df=Seq((1,2,1),(2,0,2),(3,2,0)).toDF("class","male","female")
    
    //Input Df
    
    +-----+----+------+
    |class|male|female|
    +-----+----+------+
    |    1|   2|     1|
    |    2|   0|     2|
    |    3|   2|     0|
    +-----+----+------+
    
      val getGenderUdf=udf((x:Int,y:Int)=>List.fill(x)("m")++List.fill(y)("f"))
      val df1=df.withColumn("gender",getGenderUdf(df.col("male"),df.col("female"))).drop("male","female").withColumn("gender",explode($"gender"))
      df1.show()
    
    +-----+------+
    |class|gender|
    +-----+------+
    |    1|     m|
    |    1|     m|
    |    1|     f|
    |    2|     f|
    |    2|     f|
    |    3|     m|
    |    3|     m|
    +-----+------+
    

    Reverse of df1

    val df2=df1.groupBy("class").pivot("gender").agg(count("gender")).na.fill(0).withColumnRenamed("m","male").withColumnRenamed("f","female")
    
      df2.show()
    
    //Sample Output: 
    
    +-----+------+----+
    |class|female|male|
    +-----+------+----+
    |    1|     1|   2|
    |    3|     0|   2|
    |    2|     2|   0|
    +-----+------+----+