
Hadoop 3: how to configure/enable erasure coding?


I'm trying to set up a Hadoop 3 cluster.

I have two questions about the erasure coding feature:

  1. How can I ensure that erasure coding is enabled?
  2. Do I still need to set the replication factor to 3?

Please indicate the relevant configuration properties related to erasure coding/replication, in order to get the same data security as Hadoop 2 (replication factor 3) but with the disk space benefits of Hadoop 3 erasure coding (only 50% overhead instead of 200%).


Solution

  • In Hadoop 3, an erasure coding policy can be applied to any directory in HDFS. No directory is erasure-coded by default; to turn it on, run the setPolicy command with the path of the directory you want to cover.

    1: To check whether erasure coding is enabled on a directory, run the getPolicy command on that path.

    2: In Hadoop 3, the replication factor applies only to directories that have no erasure coding policy set with setPolicy. Erasure-coded and replicated directories can coexist in a single cluster.

    Command to list the supported erasure coding policies:

    ./bin/hdfs ec -listPolicies

    Command to enable the XOR-2-1-1024k erasure coding policy:

    ./bin/hdfs ec -enablePolicy -policy XOR-2-1-1024k

    Command to set an erasure coding policy on an HDFS directory:

    ./bin/hdfs ec -setPolicy -path /tmp -policy XOR-2-1-1024k

    Command to get the policy set on a given directory:

    ./bin/hdfs ec -getPolicy -path /tmp

    Command to remove (unset) the policy from the directory:

    ./bin/hdfs ec -unsetPolicy -path /tmp

    Command to disable the policy:

    ./bin/hdfs ec -disablePolicy -policy XOR-2-1-1024k
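
To compare policies against the "50% overhead instead of 200%" goal in the question, the small helper below may be useful. It is only a sketch: the function name ec_policy_stats is hypothetical, and it assumes the policy name follows the codec-data-parity-cellsize pattern seen in names such as XOR-2-1-1024k. It prints the storage overhead in percent (rounded down) and the number of lost blocks per stripe the policy can tolerate, which equals the parity count.

```shell
# Hypothetical helper, not part of Hadoop: derive storage overhead and
# fault tolerance from an erasure coding policy name.
# Assumes the name ends in -<data units>-<parity units>-<cell size>,
# e.g. XOR-2-1-1024k or RS-6-3-1024k.
ec_policy_stats() {
    policy=$1
    rest=${policy%-*}       # drop the cell size:     RS-6-3-1024k -> RS-6-3
    parity=${rest##*-}      # last remaining field:   3
    rest=${rest%-*}         # drop the parity field:  RS-6
    data=${rest##*-}        # last remaining field:   6

    # Both fields must be plain positive integers.
    case "$data" in
        ''|*[!0-9]*) echo "unrecognized policy name: $policy" >&2; return 1 ;;
    esac
    case "$parity" in
        ''|*[!0-9]*) echo "unrecognized policy name: $policy" >&2; return 1 ;;
    esac
    if [ "$data" -eq 0 ]; then
        echo "unrecognized policy name: $policy" >&2
        return 1
    fi

    # Overhead = parity / data, as an integer percentage (rounded down).
    overhead=$(( parity * 100 / data ))

    # Output: "<overhead percent> <tolerated block losses per stripe>"
    echo "$overhead $parity"
}
```

For example, ec_policy_stats XOR-2-1-1024k prints "50 1" and ec_policy_stats RS-6-3-1024k prints "50 3". For comparison, replication factor 3 has 200% overhead and survives the loss of 2 replicas, so XOR-2-1-1024k saves space but tolerates fewer failures, while a 6+3 Reed-Solomon policy gives 50% overhead without reducing fault tolerance.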