hdfs, spark-streaming, spark-checkpoint

Checkpoint stream data to an HDFS cluster


I have an HDFS cluster with two NameNodes. Normally, if I use an HDFS client to save data, the client takes care of which NameNode to use if one of them is down.

But in Spark, checkpointing is configured through the API StreamingContext.checkpoint("hdfs://100.90.100.11:9000/sparkData").

Here I can only specify one of the NameNodes, and if that one goes down, Spark has no intelligence to switch over to the second one.

Can anyone help me here?

Is there a way to make Spark pick up "hdfs-site.xml" (which has the information about both NameNodes) if I place this XML on the classpath?
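For context, an HA-enabled hdfs-site.xml defines a logical nameservice that maps to both NameNodes, roughly like the sketch below. The nameservice name "mycluster", the node labels nn1/nn2, and the second IP address are placeholders, not values from the question:

    <configuration>
      <!-- Logical name for the HA cluster (placeholder) -->
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <!-- The two NameNodes behind the nameservice -->
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>100.90.100.11:9000</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>100.90.100.12:9000</value>
      </property>
      <!-- Lets the HDFS client fail over between nn1 and nn2 -->
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
    </configuration>

With this file on the classpath, any Hadoop client that loads it can address the cluster as hdfs://mycluster/... and failover is handled for you.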


Solution

  • OK, I found the answer. You can use the syntax below to add resources such as core-site.xml and hdfs-site.xml:

    SparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("core-site.xml"));
    SparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("hdfs-site.xml"));
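Once those resources are loaded, standard HDFS HA behavior lets the checkpoint directory be addressed through the logical nameservice instead of a single NameNode host:port. A sketch, assuming the nameservice in hdfs-site.xml is named "mycluster" and that "streamingContext" is the application's StreamingContext instance (both names are placeholders):

    // Load the HA-aware HDFS configuration from the classpath
    // (ABC stands in for the application's own class)
    sparkContext.hadoopConfiguration().addResource(ABC.class.getClassLoader().getResource("hdfs-site.xml"));

    // Refer to the logical nameservice, not a single NameNode;
    // the HDFS client then fails over between the NameNodes automatically
    streamingContext.checkpoint("hdfs://mycluster/sparkData");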