I have a dataframe 'df', and I want to add a numeric 'Ident' column whose values are consecutive. I tried monotonically_increasing_id(), but the values it generates are not consecutive. As its description says: "The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive."
So, my question is, how could I do that?
You could try something like this:
df = df.rdd.zipWithIndex().map(lambda x: [x[1]] + list(x[0])).toDF(['Ident'] + df.columns)
This adds 'Ident' as the first column, with consecutive values from 0 to N-1, where N is the total number of records in df.
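If the one-liner is hard to read, here is the same idea split into small named steps. This is only a sketch: prepend_index, add_ident, demo and the sample data are names I made up for illustration, and demo() assumes pyspark is installed and can start a local session.

```python
def prepend_index(pair):
    """Turn a (row, index) pair from zipWithIndex into (index, *row_values)."""
    row, idx = pair
    return (idx, *row)


def add_ident(df, name="Ident"):
    """Return a new DataFrame with a consecutive 0..N-1 column placed first.

    `add_ident` and `name` are hypothetical names used for this sketch.
    """
    indexed = df.rdd.zipWithIndex()  # RDD of (Row, index) pairs
    return indexed.map(prepend_index).toDF([name] + df.columns)


def demo():
    # Assumes pyspark is installed; the sample rows are made up.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]").appName("ident-demo").getOrCreate()
    )
    df = spark.createDataFrame(
        [("a", 10), ("b", 20), ("c", 30)], ["letter", "value"]
    )
    add_ident(df).show()
    spark.stop()
```

Call demo() to try it locally. Keep in mind that zipWithIndex assigns indices by partition order and then by position within each partition, so the numbering follows the current row order of df, and it needs to trigger a Spark job when the RDD has more than one partition.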