hadoop, amazon-s3, apache-spark, mapreduce, mesos

Can Apache Spark run without Hadoop?


Are there any dependencies between Spark and Hadoop?

If not, are there any features I'll miss when I run Spark without Hadoop?


Solution

  • Spark can run without Hadoop, but some of its functionality relies on Hadoop's code (e.g. handling of Parquet files). We're running Spark on Mesos with S3, which was a little tricky to set up but works really well once done (you can read a summary of what was needed to set it up properly here; a minimal sketch also follows after this note).

    (Edit) Note: as of version 2.3.0, Spark also has native support for Kubernetes.
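
As a rough illustration of the point above, here is a minimal sketch of Spark running without any Hadoop cluster (no YARN, no HDFS) while still reading Parquet from S3 through Hadoop's client libraries, which Spark ships with. The bucket name and path are placeholders, and it assumes the hadoop-aws / AWS SDK jars matching your Spark build are on the classpath and that AWS credentials come from the usual provider chain (environment variables, instance profile, etc.); this is not the exact setup described in the answer.

    import org.apache.spark.sql.SparkSession

    object S3WithoutHadoopCluster {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-without-hadoop-cluster")
          .master("local[*]") // no YARN/HDFS; a Mesos or k8s master URL would also work
          // The s3a filesystem is implemented in Hadoop's client libraries --
          // this is the kind of "Hadoop code" the answer refers to.
          .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
          .getOrCreate()

        // Read Parquet directly from S3; the path is a placeholder.
        val df = spark.read.parquet("s3a://my-bucket/some/path/")
        df.show(10)

        spark.stop()
      }
    }

The takeaway is that "without Hadoop" means without a Hadoop cluster; Spark still depends on Hadoop's libraries for things like the S3A filesystem and Parquet I/O.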