I found some tips about converting a PySpark DataFrame to R, but I need to perform the opposite task: convert an R dataframe to PySpark.
Does anyone know how to do it?
You can use the same approach as for other languages: register your dataframe as a temporary view with the createOrReplaceTempView function, and then use spark.sql from the other language to access its content.
For example, if the R side looks like the following:
%r
library(SparkR)

# Build a plain R data.frame
id <- c(rep(1, 3), rep(2, 3), 3)
desc <- c('New', 'New', 'Good', 'New', 'Good', 'Good', 'New')
df <- data.frame(id, desc)

# Convert it to a Spark DataFrame and register it as a temporary view
df <- createDataFrame(df)
createOrReplaceTempView(df, "test_df")
head(df)
id desc
1 1 New
2 1 New
3 1 Good
4 2 New
5 2 Good
6 2 Good
Then you can access the data from Python:
# Read the temporary view registered on the R side
df = spark.sql("select * from test_df")
df.show()
+---+----+
| id|desc|
+---+----+
|1.0| New|
|1.0| New|
|1.0|Good|
|2.0| New|
|2.0|Good|
|2.0|Good|
|3.0| New|
+---+----+
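Note that id comes back as a double (1.0, 2.0, ...) because R numeric vectors are doubles by default. If you need an integer column on the Python side, here's a minimal sketch of one way to cast it after reading the view (the view name test_df matches the example above):

from pyspark.sql.functions import col

# Cast the double-typed id column produced by R's numeric vector to an integer
df = spark.sql("select * from test_df").withColumn("id", col("id").cast("int"))
df.show()

Also keep in mind that temporary views are scoped to the SparkSession, so this works because the %r and Python cells of the same notebook share one session; a different cluster or session won't see test_df.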