Tags: windows, git, scala, apache-spark, apache-spark-standalone

winutils spark windows installation env_variable


I am trying to install Spark 1.6.1 on Windows 10, and so far I have done the following:

  1. Downloaded Spark 1.6.1, unpacked it to a directory, and set SPARK_HOME.
  2. Downloaded Scala 2.11.8, unpacked it to a directory, and set SCALA_HOME.
  3. Set the _JAVA_OPTION env variable.
  4. Downloaded winutils from https://github.com/steveloughran/winutils.git by downloading the repository as a zip, then set the HADOOP_HOME env variable, as sketched below the list. (Not sure if this was correct; I could not clone the repository because of a permission-denied error.)
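
In case it helps, the variables were set along these lines; the paths and the memory value shown here are placeholders rather than my exact values:

    rem Sketch only -- substitute the directories you actually unpacked to.
    setx SPARK_HOME "C:\Spark"
    setx SCALA_HOME "C:\Scala"
    setx _JAVA_OPTION "-Xmx512m"
    setx HADOOP_HOME "C:\Hadoop"
    rem setx only affects consoles opened after it runs.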

When I go to the Spark home directory and run bin\spark-shell, I get

'C:\Program' is not recognized as an internal or external command, operable program or batch file.

I must be missing something. I don't see how I could run bash scripts from a Windows environment anyway, but hopefully I don't need to understand that just to get this working. I have been following this tutorial: https://hernandezpaul.wordpress.com/2016/01/24/apache-spark-installation-on-windows-10/ . Any help would be appreciated.


Solution

  • You need to download the winutils executable, not the source code.

    You can download it here, or if you really want the entire Hadoop distribution, you can find the 2.6.0 binaries here. Then set HADOOP_HOME so that the executable ends up at %HADOOP_HOME%\bin\winutils.exe, which is the path Hadoop's shell utilities actually look for.
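
    For example, assuming you place it under C:\Hadoop (the directory name is arbitrary; just keep the path free of spaces):

        rem Illustrative layout -- the .exe must sit in a bin\ folder under HADOOP_HOME:
        rem   C:\Hadoop\bin\winutils.exe
        setx HADOOP_HOME "C:\Hadoop"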

    Also, make sure the directory you place Spark in has no whitespace anywhere in its path; this is extremely important, otherwise it won't work. It is also exactly what produces the 'C:\Program' error above: an unquoted path containing a space, such as C:\Program Files, gets split at the space, and the shell then tries to run 'C:\Program'.
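
    For instance (illustrative paths only):

        rem Bad:  C:\Program Files\spark-1.6.1   (the space breaks the launch scripts)
        rem Good: C:\Spark                       (no spaces anywhere in the path)
        move "C:\Program Files\spark-1.6.1" C:\Spark
        setx SPARK_HOME "C:\Spark"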

    Once you've set it up, you don't start spark-shell.sh; you start spark-shell.cmd:

    C:\Spark\bin>spark-shell
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
    To adjust logging level use sc.setLogLevel("INFO")
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
          /_/
    
    Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
    Type in expressions to have them evaluated.
    Type :help for more information.
    Spark context available as sc.
    16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
    16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
    16/05/18 19:31:56 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
    16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/05/18 19:31:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/05/18 19:32:01 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/05/18 19:32:01 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-core-3.2.10.jar."
    16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-api-jdo-3.2.6.jar."
    16/05/18 19:32:07 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../lib/datanucleus-rdbms-3.2.9.jar."
    16/05/18 19:32:07 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/05/18 19:32:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/05/18 19:32:12 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/05/18 19:32:12 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    SQL context available as sqlContext.
    
    scala>
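
    From there, a quick sanity check that the shell really works; any small job will do, and this one just sums 1 to 100 using the SparkContext (sc) the shell created:

        scala> sc.parallelize(1 to 100).reduce(_ + _)
        res0: Int = 5050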