hadoop hdfs cloudera namenode

Why is double the amount of memory used for NameNode files?


In the Cloudera blog and on the Hortonworks forum I read:

"Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory"

BUT:

10,000,000 * 150 bytes = 1,500,000,000 bytes ≈ 1.5 GB

It looks like reaching 3 GB would require about 300 bytes per file. I don't understand why 300 bytes are used for each file instead of 150. This is just the NameNode, so there should not be any replication factor involved.

Thanks


Solution

  • For every small file, the NameNode needs to store two objects in memory: a per-file object (the inode) and a per-block object. At roughly 150 bytes each, this results in approximately 300 bytes per single-block file.
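
A minimal sketch of this estimate, assuming the 150-bytes-per-object rule of thumb from the quote above (the class and method names here are illustrative, not part of any Hadoop API):

```java
public class NameNodeHeapEstimate {
    // Rule-of-thumb heap cost per namespace object (file inode or block),
    // per the guidance quoted in the question. Illustrative constant only.
    private static final long BYTES_PER_OBJECT = 150L;

    /** Estimated NameNode heap for the given file and block counts. */
    static long estimateBytes(long files, long blocks) {
        // One object per file plus one object per block.
        return (files + blocks) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long files  = 10_000_000L;  // 10 million files...
        long blocks = 10_000_000L;  // ...each small enough to fit in one block

        long bytes = estimateBytes(files, blocks);
        System.out.printf("%.1f GB (%.0f bytes per file)%n",
                bytes / 1e9, (double) bytes / files);
        // Prints roughly: 3.0 GB (300 bytes per file)
    }
}
```

This is why the blog's figure is 3 GB, not 1.5 GB: the 150-byte rule applies per object, and a one-block file accounts for two objects (its inode plus its block). Replication does not enter into it, since the NameNode stores block metadata once regardless of the replication factor.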