Is there a way to convert a Spark DF (not RDD) to a Pandas DF?
I tried the following:
var some_df = Seq(
("A", "no"),
("B", "yes"),
("B", "yes"),
("B", "no")
).toDF("user_id", "phone_number")
Code:
%pyspark
pandas_df = some_df.toPandas()
Error:
NameError: name 'some_df' is not defined
Any suggestions?
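The NameError occurs because some_df was defined in the Scala interpreter, and Zeppelin's %pyspark interpreter cannot see Scala variables. The two interpreters share the same SparkContext, but not their variables. Create the DataFrame on the Python side instead. The following should work: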
# Sample DataFrame
some_df = sc.parallelize([
("A", "no"),
("B", "yes"),
("B", "yes"),
("B", "no")]
).toDF(["user_id", "phone_number"])
# Convert the Spark DataFrame to a pandas DataFrame
pandas_df = some_df.toPandas()