[SOLVED] Truncating a table in HBase did not free up space for HDFS

I was having problems with HBase and HDFS. As I understand it, HBase runs on top of HDFS, so I assumed it uses HDFS to store its data. The problem was that HDFS was close to its maximum space utilization, and both HDFS and HBase were constantly crashing.

Here is a picture of the monitoring tool Ambari in the HDFS tab:

For quite a while I wasn't even able to run normal HBase shell commands such as list and scan, so I couldn't try to delete any data from the tables. However, once I managed to restart HBase, which is running on a single node, I truncated the table that was using the largest amount of space. When I then checked with the du (disk usage) command, the HDFS directory was even bigger, and HDFS space utilization was still very high.

How can I free up space in the Hadoop Distributed File System if not by removing entries from HBase? Manually removing random directories in HDFS feels very sloppy, and I would prefer not to do that. Should I increase the HDFS reserved space? Should I decrease replication? Should I force a compaction?
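For anyone repeating the du check described above, here is a minimal Python sketch that ranks directories by size from the output of `hdfs dfs -du`. It is not what I ran at the time. The helper names (parse_du_output, hdfs_du) are hypothetical, and the sample text is made up purely to show the format. I am assuming the first column is the size in bytes and the last column is the path, which may not hold for every Hadoop version.

```python
import subprocess


def parse_du_output(text):
    """Parse `hdfs dfs -du` output into (path, size_in_bytes) pairs, largest first.

    Assumes the first column is the size in bytes and the last column is the
    path; any extra column in between is ignored. Paths containing spaces are
    not handled.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank lines, warnings, and anything unexpected
        entries.append((parts[-1], int(parts[0])))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def hdfs_du(path):
    """Run `hdfs dfs -du <path>` and return the parsed result (needs the hdfs CLI)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", path],
        capture_output=True, text=True, check=True,
    )
    return parse_du_output(result.stdout)


# Made-up sample output, only to illustrate the format (sizes are invented).
sample = """\
1024  /hbase/data
52428800  /hbase/archive
2048  /hbase/WALs
"""
largest = parse_du_output(sample)
print(largest[0])
```

On a real cluster you would call hdfs_du with your own HBase root directory instead of feeding it the sample text. The path depends on how the cluster is configured, so check it before relying on the result.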

Solution

Increasing the disk space of the VM that was running HBase and HDFS substantially reduced HDFS space utilization, and that was enough for what I needed.

Just make sure you expand the partition where HDFS actually resides.
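To make the effect of the solution concrete: utilization is used space divided by capacity, so a bigger disk lowers the percentage even when the amount of stored data stays the same. The sketch below uses hypothetical figures (45 GB of data, a disk grown from 50 GB to 100 GB), not the numbers from my cluster, and the function name is my own.

```python
def utilization_percent(used_gb, capacity_gb):
    """Return used space as a percentage of total capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_gb / capacity_gb


# Hypothetical figures: the same 45 GB of data before and after growing the disk.
before = utilization_percent(45, 50)
after = utilization_percent(45, 100)
print(before, after)
```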