ibm-cloud-object-storage, analytics-engine

hadoop fs -copyFromLocal localfile.txt cos://remotefile.txt => Failed to create /disk2/s3a


I'm trying to upload a file to cloud object storage from IBM Analytics Engine:

$ hadoop fs -copyFromLocal LICENSE-2.0.txt \
   cos://xxxxx/LICENSE-2.0.txt

However, I'm receiving warnings about failure to create disks:

18/01/26 17:47:47 WARN fs.LocalDirAllocator$AllocatorPerContext: Failed to create /disk1/s3a
18/01/26 17:47:47 WARN fs.LocalDirAllocator$AllocatorPerContext: Failed to create /disk2/s3a

Note that even though I receive these warnings, the file is still uploaded:

$ hadoop fs -ls cos://xxxxx/LICENSE-2.0.txt

-rw-rw-rw- 1 clsadmin clsadmin 11358 2018-01-26 17:49 cos://xxxxx/LICENSE-2.0.txt

The problem seems to be:

$ grep -B2 -C1 'disk' /etc/hadoop/conf/core-site.xml
    <property>
      <name>fs.s3a.buffer.dir</name>
      <value>/disk1/s3a,/disk2/s3a,/tmp/s3a</value>
    </property>

$ ls -lh /disk1 /disk2
ls: cannot access /disk1: No such file or directory
ls: cannot access /disk2: No such file or directory

What are the implications of these warnings? The /tmp/s3a folder does exist, so can we ignore the warnings about these other folders?


Solution

  • The Hadoop property fs.s3a.buffer.dir accepts a comma-separated list of local paths. When one of the configured paths is missing, the warnings appear, but they are harmless and can be safely ignored. If the same command were run from a data node, the warnings would not appear at all. Regardless of the warnings, the file is still copied to Cloud Object Storage, so there is no other impact.

    The reason fs.s3a.buffer.dir is set to '/disk1/s3a,/disk2/s3a,/tmp/s3a' is that when Hadoop jobs run on a cluster backed by Cloud Object Storage, the map-reduce tasks are scheduled on the data nodes, which have the additional disks /disk1 and /disk2 with more capacity than the management nodes. If you only want to suppress the warnings when running from a management node, one option is to point the property at an existing path for that command, as sketched below.
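
    As a minimal sketch (assuming the FsShell on your cluster accepts the generic -D option, and reusing the placeholder bucket name from the question), the property can be overridden for a single command so that only the existing /tmp/s3a path is used:

    $ hadoop fs -Dfs.s3a.buffer.dir=/tmp/s3a \
       -copyFromLocal LICENSE-2.0.txt cos://xxxxx/LICENSE-2.0.txt

    This only changes where the connector buffers data on the local filesystem before uploading; it does not affect where the object ends up in Cloud Object Storage.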