I am trying to rename the columns of a DataFrame based on another DataFrame. How can I achieve this using Scala?
Essentially, my data looks like this:
DataFrame1
A B C D
1 2 3 4
I have another table, DataFrame2, that looks like this:
Col1 Col2
A E
B Q
C R
D Z
I want to rename the columns of the first DataFrame according to the second one, so that the expected output looks like this:
E Q R Z
1 2 3 4
I have tried this code in PySpark (copied from this answer by user8371915), and it works fine:
name_dict = dataframe2.rdd.collectAsMap()
dataframe1.select([dataframe1[c].alias(name_dict.get(c, c)) for c in dataframe1.columns]).show()
Now, how can I achieve this using Scala?
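For Spark 1.6, as required (the example below creates a SparkSession, which exists only from Spark 2.0 onward; on 1.6, create an SQLContext and import sqlContext.implicits._ instead):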
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ColumnNameChange {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("SparkSessionExample")
      .config("spark.master", "local")
      .getOrCreate()
    import spark.implicits._

    // Sample data: df1 holds the values, df2 maps old column names (Col1) to new ones (Col2)
    val df1 = Seq((1, 2, 3, 4)).toDF("A", "B", "C", "D")
    val df2 = Seq(("A", "E"), ("B", "Q"), ("C", "R"), ("D", "Z")).toDF("Col1", "Col2")

    // Collect the mapping to the driver; going through .rdd keeps collectAsMap available on Spark 2.x as well
    val nameDict: scala.collection.Map[String, String] =
      df2.rdd.map(row => row.getAs[String]("Col1") -> row.getAs[String]("Col2")).collectAsMap()

    // Alias every column to its mapped name, keeping the original name when no mapping exists
    val df3 = df1.select(df1.columns.map(c => col(c).as(nameDict.getOrElse(c, c))): _*)
    df3.show()
  }
}
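Since the question asks for Spark 1.6, where SparkSession is not available, the same approach could be sketched on the 1.6 entry points (SparkContext plus SQLContext) roughly as follows. This is a minimal sketch, not a definitive implementation: the object name ColumnNameChange16 and the helper renameColumns are hypothetical names introduced here for illustration, and the code assumes the mapping DataFrame has string columns named Col1 and Col2 and is small enough to collect to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

object ColumnNameChange16 {

  // Hypothetical helper: renames the columns of `df` using the
  // (Col1 -> Col2) pairs stored in `mapping`. Columns without a
  // mapping keep their original name.
  def renameColumns(df: DataFrame, mapping: DataFrame): DataFrame = {
    // Assumption: `mapping` is small, so collecting it to the driver is acceptable.
    val nameDict: scala.collection.Map[String, String] =
      mapping.rdd
        .map(row => row.getAs[String]("Col1") -> row.getAs[String]("Col2"))
        .collectAsMap()

    df.select(df.columns.map(c => col(c).as(nameDict.getOrElse(c, c))): _*)
  }

  def main(args: Array[String]): Unit = {
    // Spark 1.6 entry points instead of SparkSession
    val conf = new SparkConf().setAppName("ColumnNameChange16").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df1 = Seq((1, 2, 3, 4)).toDF("A", "B", "C", "D")
    val df2 = Seq(("A", "E"), ("B", "Q"), ("C", "R"), ("D", "Z")).toDF("Col1", "Col2")

    renameColumns(df1, df2).show()

    sc.stop()
  }
}
```

Going through mapping.rdd rather than calling map on the DataFrame directly keeps the helper usable on both 1.6 and 2.x, because on 2.x a DataFrame's map returns a Dataset, which has no collectAsMap.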