hadoop, hdfs, datanode

Deleting HDFS Block Pool


I am running Spark on a Hadoop cluster. I tried running a Spark job and noticed I was getting some issues; by looking at the logs of the data node, I eventually realised that the file system of one of the datanodes is full.

I ran hdfs dfsadmin -report to identify this. The DFS Remaining figure is 0 B because the Non DFS Used is massive (155 GB of the 193 GB configured capacity).
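
For context, the relevant per-datanode section of the report looks something like the sketch below (the field names are the real ones printed by the report; the numbers are just the rounded figures mentioned above, not a verbatim copy of my output):

    Name: <datanode address>
    Configured Capacity: 193 GB
    DFS Used: ...
    Non DFS Used: 155 GB
    DFS Remaining: 0 B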

When I looked at the file system on this data node, I could see that most of this usage comes from the /usr/local/hadoop_work/ directory. There are three block pools there, and one of them is very large (98 GB). When I look at the other data node in the cluster, it has only one block pool.
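
In case it is useful to someone else: block pools show up as BP-<id>-<namenode-ip>-<timestamp> directories under the datanode's data directory, so their sizes can be compared like this (the hdfs/datanode sub-path is an assumption on my part; use whatever your dfs.datanode.data.dir is set to):

    # list the block pools this datanode holds and size each one;
    # the sub-path under /usr/local/hadoop_work/ is assumed,
    # check dfs.datanode.data.dir in hdfs-site.xml
    $ ls /usr/local/hadoop_work/hdfs/datanode/current/
    $ du -sh /usr/local/hadoop_work/hdfs/datanode/current/BP-*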

What I am wondering is: can I simply delete two of these block pools? I'm assuming (but don't know enough about this) that the namenode (I have only one) will be looking at the most recent block pool, which is smaller in size and corresponds to the one on the other data node.
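
If it helps to verify rather than assume: as far as I understand, the block pool the namenode is actually serving is recorded as blockpoolID in its VERSION file, so something like this should show it (the path is a placeholder; it is whatever dfs.namenode.name.dir points to on the namenode host):

    # on the namenode host; substitute your dfs.namenode.name.dir
    $ grep blockpoolID <dfs.namenode.name.dir>/current/VERSION
    blockpoolID=BP-<live-id>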


Solution

  • As outlined in the comment above, eventually I did just delete the two block pools. I did this based on the fact that these block pool IDs didn't exist on the other data node, and by looking through the local filesystem I could see the files under these IDs hadn't been updated for a while. A sketch of the steps is below.
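
    Roughly, the cleanup looked like the following (the datanode sub-path is the assumed one from above; the stale block pool IDs are elided; on older Hadoop 2.x releases, hadoop-daemon.sh stop/start datanode is the equivalent of the hdfs --daemon form):

        # stop the datanode before touching its storage
        $ hdfs --daemon stop datanode

        # confirm the live block pool ID on the namenode so you never
        # delete the pool that is actually in use
        $ grep blockpoolID <dfs.namenode.name.dir>/current/VERSION

        # remove only the stale block pools (IDs that don't match the
        # live one and aren't present on the other datanode)
        $ rm -rf /usr/local/hadoop_work/hdfs/datanode/current/BP-<stale-id-1>
        $ rm -rf /usr/local/hadoop_work/hdfs/datanode/current/BP-<stale-id-2>

        # bring the datanode back; it should re-register with only the live pool
        $ hdfs --daemon start datanode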