Using WebHDFS, we can get the content summary of a directory or file.
However, the following properties are unclear to me:
"length":
{
    "description": "The number of bytes used by the content.",
    "type": "integer",
    "required": true
},
"spaceConsumed":
{
    "description": "The disk space consumed by the content.",
    "type": "integer",
    "required": true
}
What exactly is the difference between the two? Is spaceConsumed
the size taken on disk, replication included? The internal method documentation does not provide any additional detail.
According to a colleague, the answer is:
spaceConsumed = length * replicationFactor
However, I have found no source that confirms it.
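
One way to test the formula empirically, rather than prove it, would be to fetch the content summary of a single file with GETCONTENTSUMMARY, fetch its replication factor with GETFILESTATUS, and compare the numbers. The following is only a minimal sketch: the helper names (`webhdfs_get`, `implied_replication`, `matches_formula`) and the NameNode address in the usage comment are hypothetical, and it assumes an unsecured cluster reachable over plain HTTP.

```python
import json
from urllib.request import urlopen


def webhdfs_get(base_url, path, op):
    """Call a WebHDFS GET operation and return the parsed JSON body.

    base_url is something like "http://<namenode-host>:<port>" and
    path is an absolute HDFS path starting with "/".
    """
    url = f"{base_url}/webhdfs/v1{path}?op={op}"
    with urlopen(url) as resp:
        return json.load(resp)


def implied_replication(content_summary):
    """Return spaceConsumed / length, or None when length is 0."""
    length = content_summary["length"]
    if length == 0:
        return None
    return content_summary["spaceConsumed"] / length


def matches_formula(content_summary, replication):
    """Check whether spaceConsumed == length * replication holds."""
    expected = content_summary["length"] * replication
    return content_summary["spaceConsumed"] == expected


# Hypothetical usage (host, port and path are placeholders):
# base = "http://namenode.example.com:9870"
# summary = webhdfs_get(base, "/tmp/file.txt", "GETCONTENTSUMMARY")["ContentSummary"]
# status = webhdfs_get(base, "/tmp/file.txt", "GETFILESTATUS")["FileStatus"]
# print(implied_replication(summary), matches_formula(summary, status["replication"]))
```

Running this against a single file keeps the comparison simple. For a directory, the files underneath may have different replication factors, so a single factor would not be expected to match even if the formula holds per file.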