I'm having some problems with custom classes in Giraph. I wrote a custom vertex input format and output format, but I always get the following error:
java.io.IOException: ensureRemaining: Only * bytes remaining, trying to read *
where each "*" stands for a value that differs from run to run.
This was tested on a single-node cluster.
The problem happens when a VertexIterator calls next() and there are no vertices left. The iterator is invoked from a flush method, but I don't understand why next() is failing. Here are some logs and classes.
My log is the following:
15/09/08 00:52:21 INFO bsp.BspService: BspService: Connecting to ZooKeeper with job giraph_yarn_application_1441683854213_0001, 1 on localhost:22181
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:host.name=localhost
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_79
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.class.path=.:${CLASSPATH}:./**/
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/l$
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:os.version=3.13.0-62-generic
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Client environment:user.dir=/app/hadoop/tmp/nm-local-dir/usercache/hduser/appcache/application_1441683854213_0001/container_1441683854213_0001_01_000003
15/09/08 00:52:21 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@4256d3a0
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:22181. Will not attempt to authenticate using SASL (unknown error)
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:22181, initiating session
15/09/08 00:52:21 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:22181, sessionid = 0x14fab0de0bb0002, negotiated timeout = 40000
15/09/08 00:52:21 INFO bsp.BspService: process: Asynchronous connection complete.
15/09/08 00:52:21 INFO netty.NettyServer: NettyServer: Using execution group with 8 threads for requestFrameDecoder.
15/09/08 00:52:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/09/08 00:52:21 INFO netty.NettyServer: start: Started server communication server: localhost/127.0.0.1:30001 with up to 16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288
15/09/08 00:52:21 INFO netty.NettyClient: NettyClient: Using execution handler with 8 threads after request-encoder.
15/09/08 00:52:21 INFO graph.GraphTaskManager: setup: Registering health of this worker...
15/09/08 00:52:21 INFO yarn.GiraphYarnTask: [STATUS: task-1] WORKER_ONLY starting...
15/09/08 00:52:22 INFO bsp.BspService: getJobState: Job state already exists (/_hadoopBsp/giraph_yarn_application_1441683854213_0001/_masterJobState)
15/09/08 00:52:22 INFO bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir already exists!
15/09/08 00:52:22 INFO bsp.BspService: getApplicationAttempt: Node /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir already exists!
15/09/08 00:52:22 INFO worker.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir/0/_superstepD$
15/09/08 00:52:22 INFO netty.NettyServer: start: Using Netty without authentication.
15/09/08 00:52:22 INFO bsp.BspService: process: partitionAssignmentsReadyChanged (partitions are assigned)
15/09/08 00:52:22 INFO worker.BspServiceWorker: startSuperstep: Master(hostname=localhost, MRtaskID=0, port=30000)
15/09/08 00:52:22 INFO worker.BspServiceWorker: startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/giraph_yarn_application_1441683854$
15/09/08 00:52:22 INFO yarn.GiraphYarnTask: [STATUS: task-1] startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
15/09/08 00:52:22 INFO netty.NettyClient: Using Netty without authentication.
15/09/08 00:52:22 INFO netty.NettyClient: Using Netty without authentication.
15/09/08 00:52:22 INFO netty.NettyClient: connectAllAddresses: Successfully added 2 connections, (2 total connected) 0 failed, 0 failures total.
15/09/08 00:52:22 INFO netty.NettyServer: start: Using Netty without authentication.
15/09/08 00:52:22 INFO handler.RequestDecoder: decode: Server window metrics MBytes/sec received = 0, MBytesReceived = 0.0001, ave received req MBytes = 0.0001, secs waited = 1.44168435E9
15/09/08 00:52:22 INFO worker.BspServiceWorker: loadInputSplits: Using 1 thread(s), originally 1 threads(s) for 1 total splits.
15/09/08 00:52:22 INFO worker.InputSplitsHandler: reserveInputSplit: Reserved input split path /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0, overall roughly 0.0% input splits rese$
15/09/08 00:52:22 INFO worker.InputSplitsCallable: getInputSplit: Reserved /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0 from ZooKeeper and got input split 'hdfs://hdnode01:54310/u$
15/09/08 00:52:22 INFO worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_vertexInputSplitDir/0 (v=6, e=10)
15/09/08 00:52:22 INFO worker.InputSplitsCallable: call: Loaded 1 input splits in 0.16241108 secs, (v=6, e=10) 36.94329 vertices/sec, 61.572155 edges/sec
15/09/08 00:52:22 ERROR utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: next: IOException
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:101)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1
at org.apache.giraph.utils.UnsafeReads.ensureRemaining(UnsafeReads.java:77)
at org.apache.giraph.utils.UnsafeArrayReads.readByte(UnsafeArrayReads.java:123)
at org.apache.giraph.utils.UnsafeReads.readLine(UnsafeReads.java:100)
at pruebas.TextAndDoubleComplexWritable.readFields(TextAndDoubleComplexWritable.java:37)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:540)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
... 11 more
15/09/08 00:52:22 ERROR worker.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/giraph_yarn_application_1441683854213_0001/_applicationAttemptsDir/0/_superstepDir/-1/_workerHea$
15/09/08 00:52:22 ERROR yarn.GiraphYarnTask: GiraphYarnTask threw a top-level exception, failing task
java.lang.RuntimeException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@4bbf48f0
at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:104)
at org.apache.giraph.yarn.GiraphYarnTask.main(GiraphYarnTask.java:183)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@4bbf48f0
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:193)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:151)
at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:136)
at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:99)
at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:233)
at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:316)
at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:409)
at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:629)
at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:284)
at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:92)
... 1 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: next: IOException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:312)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:185)
... 10 more
Caused by: java.lang.IllegalStateException: next: IOException
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:101)
at org.apache.giraph.partition.BasicPartition.addPartitionVertices(BasicPartition.java:99)
at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:115)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:466)
at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:412)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:241)
at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: ensureRemaining: Only 0 bytes remaining, trying to read 1
at org.apache.giraph.utils.UnsafeReads.ensureRemaining(UnsafeReads.java:77)
at org.apache.giraph.utils.UnsafeArrayReads.readByte(UnsafeArrayReads.java:123)
at org.apache.giraph.utils.UnsafeReads.readLine(UnsafeReads.java:100)
at pruebas.TextAndDoubleComplexWritable.readFields(TextAndDoubleComplexWritable.java:37)
at org.apache.giraph.utils.WritableUtils.reinitializeVertexFromDataInput(WritableUtils.java:540)
at org.apache.giraph.utils.VertexIterator.next(VertexIterator.java:98)
... 11 more
My Input format:
package pruebas;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.edge.EdgeFactory;
import org.apache.giraph.io.formats.AdjacencyListTextVertexInputFormat;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
/**
 * @author hduser
 *
 */
public class IdTextWithComplexValueInputFormat
    extends
    AdjacencyListTextVertexInputFormat<Text, TextAndDoubleComplexWritable, DoubleWritable> {

  @Override
  public AdjacencyListTextVertexReader createVertexReader(InputSplit split,
      TaskAttemptContext context) {
    return new TextComplexValueDoubleAdjacencyListVertexReader();
  }

  protected class TextComplexValueDoubleAdjacencyListVertexReader extends
      AdjacencyListTextVertexReader {

    /**
     * No-argument constructor; no
     * {@link AdjacencyListTextVertexInputFormat.LineSanitizer}
     * is passed to the parent reader.
     */
    public TextComplexValueDoubleAdjacencyListVertexReader() {
      super();
    }

    @Override
    public Text decodeId(String s) {
      return new Text(s);
    }

    @Override
    public TextAndDoubleComplexWritable decodeValue(String s) {
      TextAndDoubleComplexWritable valorComplejo = new TextAndDoubleComplexWritable();
      valorComplejo.setVertexData(Double.valueOf(s));
      valorComplejo.setIds_vertices_anteriores("");
      return valorComplejo;
    }

    @Override
    public Edge<Text, DoubleWritable> decodeEdge(String s1, String s2) {
      return EdgeFactory.create(new Text(s1),
          new DoubleWritable(Double.valueOf(s2)));
    }
  }
}
TextAndDoubleComplexWritable:
package pruebas;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class TextAndDoubleComplexWritable implements Writable {

  private String idsVerticesAnteriores;
  private double vertexData;

  public TextAndDoubleComplexWritable() {
    super();
    this.idsVerticesAnteriores = "";
  }

  public TextAndDoubleComplexWritable(double vertexData) {
    super();
    this.vertexData = vertexData;
  }

  public TextAndDoubleComplexWritable(String ids_vertices_anteriores,
      double vertexData) {
    super();
    this.idsVerticesAnteriores = ids_vertices_anteriores;
    this.vertexData = vertexData;
  }

  public void write(DataOutput out) throws IOException {
    out.writeUTF(idsVerticesAnteriores);
  }

  public void readFields(DataInput in) throws IOException {
    idsVerticesAnteriores = in.readLine();
  }

  public String getIds_vertices_anteriores() {
    return idsVerticesAnteriores;
  }

  public void setIds_vertices_anteriores(String ids_vertices_anteriores) {
    this.idsVerticesAnteriores = ids_vertices_anteriores;
  }

  public double getVertexData() {
    return vertexData;
  }

  public void setVertexData(double vertexData) {
    this.vertexData = vertexData;
  }
}
My input file:
Portada 0.0 Sugerencias 1.0
Sugerencias 3.0 Portada 1.0
and I run it with this command:
$HADOOP_HOME/bin/yarn jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner lectura_de_grafo.BusquedaDeCaminosNavegacionalesWikiquote -vif pruebas.IdTextWithComplexValueInputFormat -vip /user/hduser/input/wiki-graph-chiquito.txt -op /user/hduser/output/caminosNavegacionales -w 2 -yh 250
Any help would be appreciated!
UPDATE: My input file was wrong. Giraph (or my example of it) doesn't handle outgoing edges to vertices that are not listed in the file very well.
But the problem still happens. I updated the file data in my original question.
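For anyone hitting the same input problem, a small pre-check can list edge targets that have no line of their own. This is only a sketch: it assumes whitespace-separated lines of the form "id value target weight ..." like the file above, and DanglingTargetCheck is a hypothetical helper, not part of Giraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DanglingTargetCheck {

    /** Returns edge targets that never appear as the first token of any line. */
    public static Set<String> danglingTargets(List<String> lines) {
        Set<String> declared = new LinkedHashSet<String>();
        List<String> targets = new ArrayList<String>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            declared.add(tokens[0]);
            // tokens[1] is the vertex value; (target, weight) pairs start at index 2.
            for (int i = 2; i < tokens.length; i += 2) {
                targets.add(tokens[i]);
            }
        }
        Set<String> dangling = new LinkedHashSet<String>(targets);
        dangling.removeAll(declared);
        return dangling;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Portada 0.0 Sugerencias 1.0",
            "Sugerencias 3.0 Portada 1.0");
        System.out.println("Targets without their own line: " + danglingTargets(lines));
    }
}
```

An empty result means every edge target is also declared as a vertex in the file.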
UPDATE 2: The OutputFormat is not used, and the computation algorithm is never executed either. I removed both to help clarify the question.
Update 3, 19/11/2015:
The problem wasn't in the input format; it worked well and read the data entirely.
The problem was in the class TextAndDoubleComplexWritable. I added it to my original question to better explain the final solution (I added an answer too).
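To illustrate why pairing writeUTF with readLine breaks, here is a minimal sketch using only plain java.io streams. MismatchSketch and its helper methods are hypothetical names made up for this illustration; the sketch does not use Giraph or Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class MismatchSketch {

    /** Writes each string with writeUTF, back to back, like records in one buffer. */
    public static byte[] records(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String value : values) {
                out.writeUTF(value); // 2-byte length prefix, then the encoded characters
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads ONE record with readLine and returns how many bytes are left. */
    @SuppressWarnings("deprecation")
    public static int remainingAfterReadLine(byte[] buffer) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
            in.readLine(); // stops only at a line terminator or the end of the input
            return in.available();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads ONE record with readUTF and returns how many bytes are left. */
    public static int remainingAfterReadUTF(byte[] buffer) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer));
            in.readUTF(); // consumes exactly the bytes that writeUTF produced
            return in.available();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] buffer = records("", "", ""); // three empty strings, as decodeValue sets
        System.out.println("buffer length:         " + buffer.length);
        System.out.println("left after readUTF():  " + remainingAfterReadUTF(buffer));
        System.out.println("left after readLine(): " + remainingAfterReadLine(buffer));
    }
}
```

Since the buffer contains no line terminator, a single readLine swallows every record, and the next record has nothing left to read. With plain java.io that over-read goes unnoticed here; in Giraph, as the stack trace shows, readLine goes through UnsafeReads.ensureRemaining, which throws once the buffer is exhausted.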
The problem was in the class TextAndDoubleComplexWritable. I wasn't aware of how important the readFields and write methods are when implementing the Writable interface. They are crucial because they are the methods Giraph uses to send and receive data. My original write only wrote the string (with writeUTF), readFields tried to read it back with readLine, which does not match what writeUTF produces, and vertexData was never serialized at all. Both methods have to handle both values of my vertex, in the same order. I updated them in the following way:
public void write(DataOutput out) throws IOException {
  // Write both fields; readFields must read them back in the same order.
  out.writeDouble(this.vertexData);
  // writeUTF throws on null, so fall back to an empty string.
  out.writeUTF(this.idsVerticesAnteriores == null ? ""
      : this.idsVerticesAnteriores);
}
public void readFields(DataInput in) throws IOException {
  this.vertexData = in.readDouble();
  // readUTF is the counterpart of writeUTF (readLine is not).
  this.idsVerticesAnteriores = in.readUTF();
}
And this is finally working!
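A quick way to check that write and readFields stay symmetric is a round trip through plain java.io streams. The sketch below is only an illustration: Value is a hypothetical stand-in that mirrors the two fields of TextAndDoubleComplexWritable without the Hadoop dependency, and RoundTripSketch is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripSketch {

    /** Hypothetical stand-in for the vertex value (same fields, same call order). */
    public static class Value {
        public double vertexData;
        public String ids = "";

        public void write(DataOutput out) throws IOException {
            out.writeDouble(vertexData);
            out.writeUTF(ids);
        }

        public void readFields(DataInput in) throws IOException {
            vertexData = in.readDouble();
            ids = in.readUTF();
        }
    }

    /** Serializes the values back to back into one buffer. */
    public static byte[] serialize(Value... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (Value value : values) {
                value.write(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads count values and fails if any bytes are left unread. */
    public static Value[] deserialize(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Value[] result = new Value[count];
            for (int i = 0; i < count; i++) {
                result[i] = new Value();
                result[i].readFields(in);
            }
            if (in.available() != 0) {
                throw new IllegalStateException("trailing bytes: " + in.available());
            }
            return result;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Value a = new Value();
        a.vertexData = 0.0;
        Value b = new Value();
        b.vertexData = 3.0;
        b.ids = "Portada";
        Value[] back = deserialize(serialize(a, b), 2);
        System.out.println(back[1].vertexData + " / " + back[1].ids);
    }
}
```

If readFields reads fewer or more bytes than write produced, the second value comes back wrong or the trailing-bytes check fails, which catches this kind of bug before the job is submitted to the cluster.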