apache-spark, pyspark, apache-spark-sql, spark-avro

Installing Apache Spark Packages to Run Locally


I am looking for a clear guide or steps for installing Spark packages (specifically spark-avro) to run locally, and for using them correctly with the spark-submit command.

I've spent a lot of time reading many posts and guides, but I'm still not able to get spark-submit to use the locally deployed spark-avro package. So if someone has already accomplished this with spark-avro or another package, please share your wisdom :)

All the existing documentation I've found is a bit unclear.

Clear steps and examples would be much appreciated! P.S. I know Python/PySpark/SQL, but not much Java (yet) ...

Michael


Solution

  • You can pass the Avro package details in the spark-submit command itself (make sure the spark-avro and Spark versions are compatible):

    spark-submit --packages org.apache.spark:spark-avro_<scala_version>:<spark_version>
    

    For example:

    spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0
    

    You can pass the package the same way with the spark-shell or pyspark command as well to work with Avro files; see the sketches below.
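
For completeness, here is a sketch of the full invocations, assuming Spark 2.4.0 built against Scala 2.11; the script name is a hypothetical placeholder. On first use, spark-submit downloads the package and its dependencies from Maven Central into the local Ivy cache, so no manual installation is needed:

    # Launch an interactive PySpark shell with the package on the classpath
    pyspark --packages org.apache.spark:spark-avro_2.11:2.4.0

    # Or submit an application (my_avro_job.py is a placeholder)
    spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 my_avro_job.py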
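
Once the shell or application starts with the package loaded, Avro is readable and writable through the normal DataFrame API. A minimal PySpark sketch (the paths here are placeholders; with spark-avro 2.4+ the format name is simply "avro"):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("avro-demo").getOrCreate()

    # Write a small DataFrame out as Avro
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.write.format("avro").mode("overwrite").save("/tmp/demo_avro")

    # Read it back and verify
    spark.read.format("avro").load("/tmp/demo_avro").show()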