This is a question about spring-xd release 1.0.1 working together with maprfs, which is officially not yet supported. Still I would like to get it to work.
So this is what we did:
1) adjusted the xd-shell and xd-worker and xd-singlenode shell scripts to accept the parameter --hadoopDistro mapr
2) added libraries to the new directory $XD_HOME/lib/mapr
avro-1.7.4.jar jersey-core-1.9.jar
hadoop-annotations-2.2.0.jar jersey-server-1.9.jar
hadoop-core-1.0.3-mapr-3.0.2.jar jetty-util-6.1.26.jar
hadoop-distcp-2.2.0.jar maprfs-1.0.3-mapr-3.0.2.jar
hadoop-hdfs-2.2.0.jar protobuf-java-2.5.0.jar
hadoop-mapreduce-client-core-2.2.0.jar spring-data-hadoop-2.0.2.RELEASE-hadoop24.jar
hadoop-streaming-2.2.0.jar spring-data-hadoop-batch-2.0.2.RELEASE-hadoop24.jar
hadoop-yarn-api-2.2.0.jar spring-data-hadoop-core-2.0.2.RELEASE-hadoop24.jar
hadoop-yarn-common-2.2.0.jar spring-data-hadoop-store-2.0.2.RELEASE-hadoop24.jar
3) run bin/xd-singlenode --hadoopDistro mapr
and shell/bin/xd-shell --hadoopDistro mapr
.
When creating and deploying a stream via stream create foo --definition "time | hdfs" --deploy
, data is written to a file tmp/xd/foo/foo-1.txt.tmp on maprfs. Yet when undeploying the stream, the following exceptions appears:
org.springframework.data.hadoop.store.StoreException: Failed renaming from /xd/foo/foo-1.txt.tmp to /xd/foo/foo-1.txt; nested exception is java.io.FileNotFoundException: Requested file /xd/foo/foo-1.txt does not exist.
at org.springframework.data.hadoop.store.support.OutputStoreObjectSupport.renameFile(OutputStoreObjectSupport.java:261)
at org.springframework.data.hadoop.store.output.TextFileWriter.close(TextFileWriter.java:92)
at org.springframework.xd.integration.hadoop.outbound.HdfsDataStoreMessageHandler.doStop(HdfsDataStoreMessageHandler.java:58)
at org.springframework.xd.integration.hadoop.outbound.HdfsStoreMessageHandler.stop(HdfsStoreMessageHandler.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:201)
at com.sun.proxy.$Proxy120.stop(Unknown Source)
at org.springframework.integration.endpoint.EventDrivenConsumer.doStop(EventDrivenConsumer.java:64)
at org.springframework.integration.endpoint.AbstractEndpoint.stop(AbstractEndpoint.java:100)
at org.springframework.integration.endpoint.AbstractEndpoint.stop(AbstractEndpoint.java:115)
at org.springframework.integration.config.ConsumerEndpointFactoryBean.stop(ConsumerEndpointFactoryBean.java:303)
at org.springframework.context.support.DefaultLifecycleProcessor.doStop(DefaultLifecycleProcessor.java:229)
at org.springframework.context.support.DefaultLifecycleProcessor.access$300(DefaultLifecycleProcessor.java:51)
at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.stop(DefaultLifecycleProcessor.java:363)
at org.springframework.context.support.DefaultLifecycleProcessor.stopBeans(DefaultLifecycleProcessor.java:202)
at org.springframework.context.support.DefaultLifecycleProcessor.stop(DefaultLifecycleProcessor.java:106)
at org.springframework.context.support.AbstractApplicationContext.stop(AbstractApplicationContext.java:1186)
at org.springframework.xd.module.core.SimpleModule.stop(SimpleModule.java:234)
at org.springframework.xd.dirt.module.ModuleDeployer.destroyModule(ModuleDeployer.java:132)
at org.springframework.xd.dirt.module.ModuleDeployer.handleUndeploy(ModuleDeployer.java:111)
at org.springframework.xd.dirt.module.ModuleDeployer.undeploy(ModuleDeployer.java:83)
at org.springframework.xd.dirt.server.ContainerRegistrar.undeployModule(ContainerRegistrar.java:261)
at org.springframework.xd.dirt.server.ContainerRegistrar$StreamModuleWatcher.process(ContainerRegistrar.java:884)
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:67)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.io.FileNotFoundException: Requested file /xd/foo/foo-1.txt does not exist.
at com.mapr.fs.MapRFileSystem.getMapRFileStatus(MapRFileSystem.java:805)
at com.mapr.fs.MapRFileSystem.delete(MapRFileSystem.java:629)
at org.springframework.data.hadoop.store.support.OutputStoreObjectSupport.renameFile(OutputStoreObjectSupport.java:258)
... 29 more
I had a look at the OutputStoreObjectSupport.renameFile()
function. When a file on hdfs is finished, this method tries to rename the file /xd/foo/foo-1.txt.tmp to xd/foo/foo1.txt. This is the relevant code:
try {
FileSystem fs = path.getFileSystem(getConfiguration());
boolean succeed;
try {
fs.delete(toPath, false);
log.info("Renaming path=[" + path + "] toPath=[" + toPath + "]");
succeed = fs.rename(path, toPath);
} catch (Exception e) {
throw new StoreException("Failed renaming from " + path + " to " + toPath, e);
}
if (!succeed) {
throw new StoreException("Failed renaming from " + path + " to " + toPath + " because hdfs returned false");
}
}
When the target file does not exist on hdfs, maprfs seems to throw an exception when fs.delete(toPath, false)
is called. Yet throwing an exception in this case does not make sense. I assume that other Filesystem implementations behave differently, but this is a point I still need to verify. Unfortuntately I cannot find the sources for MapRFileSystem.java. Is this closed source? This would help me to better understand the issue. Has anybody experience with writing from spring-xd to maprfs? Or renaming files on maprfs with spring-data-hadoop?
I managed to reproduce the issue outside of spring XD with a simple test case (see below). Note that this exception is only thrown if the inWritingSuffix or the inWritingPrefix is set. Otherwise spring-hadoop will not attempt to rename the file. So this is the still somehow unsatisfactory workaround for me: refrain from using inWritingPrefixes and inWritingSuffixes.
@ContextConfiguration("context.xml")
@RunWith(SpringJUnit4ClassRunner.class)
public class MaprfsSinkTest {
@Autowired
Configuration configuration;
@Autowired
FileSystem filesystem;
@Autowired
DataStoreWriter<String >storeWriter;
@Test
public void testRenameOnMaprfs() throws IOException, InterruptedException {
Path testPath = new Path("/tmp/foo.txt");
filesystem.delete(testPath, true);
TextFileWriter writer = new TextFileWriter(configuration, testPath, null);
writer.setInWritingSuffix("tmp");
writer.write("some entity");
writer.close();
}
@Test
public void testStoreWriter() throws IOException {
this.storeWriter.write("something");
}
}
I created a new branch for spring-hadoop which supports maprfs:
https://github.com/blinse/spring-hadoop/tree/origin/2.0.2.RELEASE-mapr
Building this release and using the resulting jar works fine with the hdfs sink.