Tags: r, apache-spark, sparklyr

Sparklyr support for Spark 2.3.1


I downloaded Spark version 2.3.1 and I get the following error:

Error in spark_version_from_home(spark_home, default = spark_version) : 
Failed to detect version from SPARK_HOME or SPARK_HOME_VERSION. Try passing the spark version explicitly.

When using spark_available_versions(), the last version listed is 2.3.0.

Is 2.3.1 not supported by sparklyr yet? Is there any way I can bypass this, or some explicit code I can use to get around it?


Solution

  • Well, I am working on Windows 7. First verify that the environment variables are defined: SPARK_HOME = C:\spark and Path = C:\spark\bin. Then check with the following commands.

    library(sparklyr)   # provides spark_version_from_home() and spark_connect()
    Sys.getenv('SPARK_HOME')
     "C:\\spark"
    spark_version_from_home(Sys.getenv('SPARK_HOME'))
     "2.3.1"
    system('spark-submit --version')
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
          /_/
    
    Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_151
    Branch 
    Compiled by user vanzin on 2018-06-01T20:37:04Z
    Revision 
    Url 
    Type --help for more information.
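
    If you cannot or prefer not to change the Windows environment variables system-wide, they can also be set just for the current R session before connecting. A minimal sketch, assuming the same C:\spark location used above:

    # assumption: Spark 2.3.1 was unpacked to C:\spark
    Sys.setenv(SPARK_HOME = "C:\\spark")
    Sys.setenv(PATH = paste("C:\\spark\\bin", Sys.getenv("PATH"), sep = ";"))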
    

    Finally, make the connection with Spark.

    sc <- spark_connect(master = "local") # works
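
    Once the connection is up, a quick sanity check is to copy a small R data frame into Spark and read it back; mtcars and the table name below are arbitrary examples, not anything from the question:

    library(dplyr)                                      # copy_to() is the dplyr generic
    mtcars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)
    head(mtcars_tbl)       # preview the first rows (runs a small Spark job)
    spark_version(sc)      # should report 2.3.1 for this connection
    spark_disconnect(sc)   # close the connection when finished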
    

    Now, as for your question of whether sparklyr 0.8.4 supports Spark 2.3.1: yes and no. In my case, the following throws an error:

    sc <- spark_connect(master = "local", version = "2.3.1") # it does not work
    Error in spark_install_find(version, hadoop_version, latest = FALSE, hint = TRUE) : 
      Spark version not installed. To install, use spark_install(version = "2.3.1")
    

    If we check the release dates, Apache Spark 2.3.1 was released on Jun 08 2018, while the latest sparklyr release, 0.8.4, dates from May 25 2018; that is, sparklyr came out about two weeks earlier, when Spark 2.3.1 did not yet exist. The same thing can be seen with the following commands:

    spark_install(version = "2.3.1")
    Error in spark_install_find(version, hadoop_version, installed_only = FALSE,:
    spark_available_versions()
       spark
    1  1.6.3
    2  1.6.2
    3  1.6.1
    4  1.6.0
    5  2.0.0
    6  2.0.1
    7  2.0.2
    8  2.1.0
    9  2.1.1
    10 2.2.0
    11 2.2.1
    12 2.3.0
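
    Note that spark_available_versions() lists the versions sparklyr can download for you, not what is on your disk. The locally cached installs, which as far as I can tell are what spark_install_find() searches, can be inspected with spark_installed_versions(); a manually unpacked C:\spark does not appear there:

    spark_installed_versions()   # only versions downloaded via spark_install()
                                 # a manually unpacked C:\spark is not listed here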
    

    I think that getting complete (not partial, as now) support for Spark 2.3.1 will mean waiting for the release of sparklyr 0.9.0, or contacting the package maintainer, Javier Luraschi.
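
    In the meantime, a possible workaround for the error in the question (untested beyond my setup, and simply following the hint in the error message, which mentions SPARK_HOME_VERSION): tell sparklyr which version the manually downloaded Spark is before connecting, and let SPARK_HOME supply the installation.

    # assumption: Spark 2.3.1 is already unpacked at C:\spark (the SPARK_HOME above)
    library(sparklyr)
    Sys.setenv(SPARK_HOME_VERSION = "2.3.1")   # version hint named in the error message
    sc <- spark_connect(master = "local")      # uses the Spark found at SPARK_HOME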