Tags: r, apache-spark, sparklyr

How to connect RStudio Cloud to Spark?


I am using RStudio Cloud and want to connect to Spark with the sparklyr package. I tried both a local master and a YARN master, using the code below.

library(sparklyr)
library(dplyr)

# Attempt 1: local master
sc <- spark_connect(master = "local")
# Attempt 2: YARN master
sc <- spark_connect(master = "yarn")
# Error in system2(file.path(spark_home, "bin", "spark-submit"), "--version", : error in running command

Neither worked, and I don't know how to set up the Spark environment any further. Any help would be much appreciated.


Solution

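  • This is probably a problem with the Spark installation or version: the error shows that sparklyr could not run spark-submit from its Spark home directory.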

    The following works for me in a new project on RStudio Cloud:

    install.packages("sparklyr")
    library(sparklyr)
    spark_install(version = "3.0.0")
    sc <- spark_connect(master = "local")
    

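As a minimal sketch of the same fix, assuming a fresh RStudio Cloud project with Java available and network access for the download, the snippet below installs Spark only when that version is missing and then confirms that the local connection works. The variable name target_version is introduced here purely for illustration; 3.0.0 is simply the version used in the answer, not a requirement.

```r
library(sparklyr)

# Hypothetical variable for this sketch; "3.0.0" is the version from the answer.
target_version <- "3.0.0"

# List the Spark versions sparklyr has already installed for this user.
installed <- spark_installed_versions()

# Download and install Spark only if the target version is not present yet.
if (!(target_version %in% installed$spark)) {
  spark_install(version = target_version)
}

# Connect to a local Spark instance, pinning the version explicitly.
sc <- spark_connect(master = "local", version = target_version)

# Print the version of the running Spark instance as a quick sanity check.
print(spark_version(sc))

# Close the connection when finished.
spark_disconnect(sc)
```

If the version prints without the system2 error from the question, sparklyr has found a usable spark-submit and the local connection is working.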