scala · apache-spark · scala-ide

Rename column names of a dataframe with respect to another dataframe using scala


I am trying to rename the columns of a data frame based on another dataframe. How can I achieve this using Scala?

Essentially my data looks like

DataFrame1

A    B    C   D
1    2    3   4

I have another table that looks like this DataFrame2

Col1    Col2
A       E
B       Q
C       R
D       Z

I want to rename the columns of my first data frame with respect to the other dataframe, so that the expected output looks like this:

E    Q    R    Z
1    2    3    4

I have tried the code in PySpark (copied from this answer by user8371915) and it works fine:

name_dict = dataframe2.rdd.collectAsMap()

dataframe1.select([dataframe1[c].alias(name_dict.get(c, c)) for c in dataframe1.columns]).show()

Now, how can I achieve this using Scala?


Solution

  • Using the DataFrame API (note: `SparkSession` was introduced in Spark 2.0, so this requires Spark 2.x or later):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    
    object ColumnNameChange {
      def main(args: Array[String]): Unit = {
    
        val spark = SparkSession
          .builder()
          .appName("SparkSessionExample")
          .config("spark.master", "local")
          .getOrCreate()
    
        import spark.implicits._
    
        val df1 = Seq((1, 2, 3, 4)).toDF("A", "B", "C", "D")
        val df2 = Seq(("A", "E"), ("B", "Q"), ("C", "R"), ("D", "Z")).toDF("Col1", "Col2")
    
        // Collect the (old name -> new name) pairs from df2 into a local map,
        // just like collectAsMap() in the PySpark version.
        val name_dict: scala.collection.Map[String, String] =
          df2.map(row => row.getAs[String]("Col1") -> row.getAs[String]("Col2")).collectAsMap()
    
        // Alias every column of df1 with its mapped name, falling back to
        // the original name when the map has no entry for it.
        val df3 = df1.select(df1.columns.map(c => col(c).as(name_dict.getOrElse(c, c))): _*)
        df3.show()
      }
    }
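The renaming itself is just a lookup over the column list, independent of Spark. As a sketch, the same logic can be previewed in plain Scala to check which names the `select(... .as(...))` call will produce — `renameAll` here is a hypothetical helper for illustration, not part of the Spark API:

```scala
object RenamePreview {
  // Hypothetical helper: given the current column names and a rename map,
  // return the names that aliasing each column would produce. Columns with
  // no entry in the map keep their original name.
  def renameAll(columns: Seq[String], mapping: Map[String, String]): Seq[String] =
    columns.map(c => mapping.getOrElse(c, c))

  def main(args: Array[String]): Unit = {
    val mapping = Map("A" -> "E", "B" -> "Q", "C" -> "R", "D" -> "Z")
    val renamed = renameAll(Seq("A", "B", "C", "D"), mapping)
    println(renamed.mkString(" ")) // E Q R Z
  }
}
```

Because unmapped columns fall through `getOrElse` unchanged, a partial mapping (say, only `"A" -> "E"`) is also safe: the remaining columns keep their original names.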