I'm trying to build the Apache Crunch source code on my CentOS 7 machine, but I get the following error in the crunch-spark
project when I run mvn package:
[ERROR] /home/bwatson/programming/git/crunch/crunch-spark/src/it/scala/org/apache/crunch/scrunch/spark/PageRankClassTest.scala:71: error: bad symbolic reference. A signature in PTypeH.class refers to term protobuf
[ERROR] in package com.google which is not available.
[ERROR] It may be completely missing from the current classpath, or the version on
[ERROR] the classpath might be incompatible with the version used when compiling PTypeH.class.
[ERROR] .map(line => { val urls = line.split("\\t"); (urls(0), urls(1)) })
Other SO questions about similar errors (here and here) seem to involve PATH
or version issues. I've been experimenting but haven't been able to resolve the error. For completeness, my versions are:
[bwatson@ben-pc crunch]$ scala -version
Scala code runner version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL
[bwatson@ben-pc crunch]$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
[bwatson@ben-pc crunch]$ mvn -version
Apache Maven 3.0.5 (Red Hat 3.0.5-16)
Maven home: /usr/share/maven
Java version: 1.8.0_31, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_31/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-123.20.1.el7.x86_64", arch: "amd64", family: "unix"
Any advice? I'm not really sure where Scala is looking for its dependencies, but I'd have thought that Maven would take care of it.
It turns out the official Crunch documentation was missing a Maven parameter. Building with the following command solved the issue:
mvn package -Dcrunch.platform=2
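If you want to see whether the com.google.protobuf classes from the error message are actually being resolved for a module, something like the following might help. This is only a sketch: check_protobuf is a helper name I made up (it is not part of Crunch or Maven), and it assumes you run it from the root of the Crunch checkout. The dependency:tree goal and its -Dincludes filter belong to the standard Maven dependency plugin. I haven't confirmed exactly what crunch.platform changes in the build, so treat the output as a diagnostic rather than an explanation.

```shell
# Hypothetical helper (not part of Crunch or Maven): list any protobuf
# artifacts that Maven resolves for one module of the build.
# Usage: check_protobuf <module> [extra mvn args...]
check_protobuf() {
  module="$1"
  shift
  # dependency:tree prints the resolved dependency graph; -Dincludes
  # restricts it to artifacts in the given groupId, and -pl limits the
  # run to a single module of the multi-module build.
  # grep keeps only the lines naming a protobuf artifact, so the
  # function's exit status is non-zero when none are listed.
  mvn dependency:tree -pl "$module" -Dincludes=com.google.protobuf "$@" \
    | grep 'com\.google\.protobuf:'
}
```

Running check_protobuf crunch-spark and then check_protobuf crunch-spark -Dcrunch.platform=2 should show whether the property makes a difference to what gets resolved. Note that an empty result can also mean Maven failed to resolve the module's sibling modules, in which case the full mvn output is worth reading.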