Tags: scala, azure-data-lake, azure-databricks

How do I rename a file that was saved to a Data Lake in Azure?


I tried to merge two files in a Data Lake using Scala in Databricks and saved the result back to the Data Lake using the following code:

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("adl://xxxxxxxx/Test/CSV")

df.coalesce(1).write
  .format("com.databricks.spark.csv")
  .mode("overwrite")
  .option("header", "true")
  .save("adl://xxxxxxxx/Test/CSV/final_data.csv")

However, final_data.csv is saved as a directory containing multiple files rather than as a single file, and the actual .csv data ends up in a part file named 'part-00000-tid-dddddddddd-xxxxxxxxxx.csv'.
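Spark always writes a DataFrame out as a directory of part files (plus marker files such as _SUCCESS), even when there is only a single partition, which is why final_data.csv comes out as a folder. Picking the data file out of such a listing is plain Scala; a minimal sketch, using hypothetical file names:

```scala
// Hypothetical contents of a Spark CSV output directory: marker files
// plus one data file named part-<partition>-<task id>.csv.
val entries = Seq("_SUCCESS", "_committed_1234", "part-00000-tid-5678-abcd.csv")

// The actual CSV data is the single entry whose name starts with "part-"
val dataFile = entries.find(_.startsWith("part-")).get
println(dataFile) // part-00000-tid-5678-abcd.csv
```

The same kind of filter, applied to a real `dbutils.fs.ls` listing, is how the part file can be located before moving it.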

How do I rename this file so that I can move it to another directory?


Solution

  • Got it. The file can be renamed and moved to another destination using the following code. Note that the source files that were merged are also deleted.

    val x = "Source"
    val y = "Destination"
    // Read all CSV files from the source directory
    val df = sqlContext.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(x + "/")
    // Write the merged data as a single partition; final_data.csv is a directory
    df.repartition(1).write
      .format("csv")
      .mode("overwrite")
      .option("header", "true")
      .save(y + "/" + "final_data.csv")
    // Delete the source CSV files that were merged
    dbutils.fs.ls(x)
      .filter(file => file.name.endsWith("csv"))
      .foreach(f => dbutils.fs.rm(f.path, true))
    // Move the part file out of the output directory, renaming it to data.csv
    dbutils.fs.mv(
      dbutils.fs.ls(y + "/" + "final_data.csv")
        .filter(file => file.name.startsWith("part-00000"))(0).path,
      y + "/" + "data.csv")
    // Remove the leftover final_data.csv directory and its marker files
    dbutils.fs.rm(y + "/" + "final_data.csv", true)