hadoopflumeflume-ng

HDFS IO error (hadoop 2.8) with Flume


I am getting the below error when I try to get a streaming data into hadoop through Flume.

I have created link in flume/lib that point to the .jar files in hadoop/share/hadoop/

I double checked the URL and I think they are all correct. Thought of posting to get some more eyes and some feedback.

      2017-07-20 10:53:18,959 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN -org.apache.flume.sink.hdfs.HDFSEventSink.process HDFSEventSink.java:455)] HDFS IO error
      java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2798)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2809)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2848)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2830)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

Here is the Flume Sink Config

agent1.sinks.PurePathSink.type = hdfs
agent1.sinks.PurePathSink.hdfs.path = hdfs://127.0.0.1:9000/User/bts/pp 
agent1.sinks.PurePathSink.hdfs.fileType = DataStream
agent1.sinks.PurePathSink.hdfs.filePrefix = export
agent1.sinks.PurePathSink.hdfs.fileSuffix = .txt
agent1.sinks.PurePathSink.hdfs.rollInterval = 120
agent1.sinks.PurePathSink.hdfs.rollSize = 131072

core-site.xml - Hadoop 2.8

<configuration>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home1/tmp</value>
        <description>A base for other temporary directories</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://127.0.0.1:9000</value>
    </property>

    <property>
        <name>fs.file.impl</name>
        <value>org.apache.hadoop.fs.LocalFileSystem</value>
        <description>The FileSystem for file: uris.</description>
    </property>

    <property>
        <name>fs.hdfs.impl</name>
        <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
        <description>The FileSystem for hdfs: uris.</description>
    </property>

Solution

  • In my case I found that explicitly declaring the paths solved the issue. it had to do with which Jar it was picking up.

    Thanks @V.Bravo for your reply. I am not using a distribution but standing up a cluster of my own