Spark DataFrame: How to add an index column (aka distributed data index)

I read data from a CSV file, but it has no index column.

I want to add a column numbered from 1 to the number of rows.

What should I do? Thanks. (Scala)


  • With Scala you can use:

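    import org.apache.spark.sql.functions._
    val dfIndex = df.withColumn("id", monotonically_increasing_id())  // df: assumed name for the DataFrame read from the CSV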

    You can refer to this example and the Scala docs.

    With PySpark you can use:

    from pyspark.sql.functions import monotonically_increasing_id
    # df is assumed to be the DataFrame read from the CSV file
    df_index = df.select("*").withColumn("id", monotonically_increasing_id())
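One caveat worth spelling out: monotonically_increasing_id guarantees unique, increasing IDs, but not consecutive ones, so it will not by itself give 1 to N. The Spark docs describe the current implementation as putting the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits. The sketch below is plain Python, not Spark code; it only simulates that layout for a made-up partitioning, next to an offset-based numbering that has no gaps. The function names and the partition sizes are hypothetical, chosen for illustration.

```python
# Plain-Python simulation (no Spark required). It mimics the documented ID
# layout: partition index in the upper bits, row position in the lower 33 bits.

def simulated_monotonic_ids(partition_sizes):
    """IDs each row would get under the documented layout, in row order."""
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        for row_in_partition in range(size):
            ids.append((partition_index << 33) + row_in_partition)
    return ids


def consecutive_ids(partition_sizes, start=1):
    """Gap-free IDs: each partition starts where the previous one ended."""
    ids = []
    offset = start
    for size in partition_sizes:
        ids.extend(offset + i for i in range(size))
        offset += size  # the next partition continues the numbering
    return ids


# Hypothetical layout: 3 rows in partition 0, 2 rows in partition 1.
print(simulated_monotonic_ids([3, 2]))  # large jump between the partitions
print(consecutive_ids([3, 2]))          # 1..5 with no gaps
```

If a strict 1..N index is required in Spark itself, the usual candidates are the RDD method zipWithIndex or the row_number window function; both cost more than monotonically_increasing_id, since they need per-partition counts or an ordering over all rows.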