hdfs, spark-streaming, spark-checkpoint

Checkpoint stream data to an HDFS cluster


I have an HDFS cluster with two NameNodes. Normally, if I use an HDFS client to save data, the client takes care of which NameNode to use if one of them is down.

But in Spark, checkpointing is configured through the API StreamingContext.checkpoint("hdfs://100.90.100.11:9000/sparkData").

Here I can only specify one of the NameNodes, and if that one goes down, Spark has no intelligence to switch over to the second one.

Can anyone help me here?

Is there a way to make Spark pick up "hdfs-site.xml" (which has the information about both NameNodes) if I place this XML on the classpath?
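For context, an HA-enabled hdfs-site.xml defines a logical nameservice that maps to both NameNodes, roughly like the sketch below. The nameservice name "mycluster", the node labels nn1/nn2, and the second IP address are placeholders, not values from the question:

    <configuration>
      <!-- Logical name for the HA cluster (placeholder) -->
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <!-- The two NameNodes behind the nameservice -->
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>100.90.100.11:9000</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>100.90.100.12:9000</value>
      </property>
      <!-- Lets the HDFS client fail over between nn1 and nn2 -->
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
    </configuration>

With this file on the classpath, any Hadoop client that loads it can address the cluster as hdfs://mycluster/... and failover is handled for you.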


Solution

  • OK, I found the answer. You can use the syntax below to add resources such as core-site.xml and hdfs-site.xml:

    SparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("core-site.xml"));
    SparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("hdfs-site.xml"));
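Once those resources are loaded, standard HDFS HA behavior lets the checkpoint directory be addressed through the logical nameservice instead of a single NameNode host:port. A sketch, assuming the nameservice in hdfs-site.xml is named "mycluster" and that "streamingContext" is the application's StreamingContext instance (both names are placeholders):

    // Load the HA-aware HDFS configuration from the classpath
    // (ABC stands in for the application's own class)
    sparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("hdfs-site.xml"));

    // Refer to the logical nameservice, not a single NameNode;
    // the HDFS client then fails over between the NameNodes automatically
    streamingContext.checkpoint("hdfs://mycluster/sparkData");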