r heatmap hclust

Clustering a dissimilarity matrix with NA values for a heatmap without imputation


I am trying to make a heatmap of a dissimilarity matrix that has a lot of NAs. Without clustering the heatmap works fine, but I run into problems as soon as I try to cluster. I do not want to impute or remove the NAs. Is there any way to perform the clustering anyway? I understand that NAs make calculating distances a problem, but there should be a way around it, right?

I get the following error message:

" Error in hclust(get_dist(submat, distance), method = method) : NA/NaN/Inf in foreign function call (arg 10)

In addition: Warning message: NA exists in the matrix, calculating distance by removing NA values."
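A minimal base R sketch that seems to reproduce this kind of failure. The toy matrix `m` is hypothetical, not my real data, and it assumes the distances are computed the way `dist()` does it, i.e. by dropping NAs pair by pair. When two rows have no observed column in common, their distance is itself NA, and `hclust()` then refuses the input:

```r
# Hypothetical toy matrix: rows "a" and "b" share no observed column
m <- rbind(a = c(1, 2, NA, NA),
           b = c(NA, NA, 3, 4),
           c = c(2, 3, 5, 6))

# dist() drops NAs per pair of rows; a pair with no shared column gives NA
d <- dist(m)
print(d)

# hclust() cannot handle NA distances, so this call errors out
res <- try(hclust(d), silent = TRUE)
print(inherits(res, "try-error"))
```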

Edit:

The data I am using is an unusual matrix with a lot of NAs, which may be the problem. However, I would like to visualize these NAs in the heatmap as well, so I only want to cluster the rows, not the columns.

[Image: dissimilarity matrix example]


Solution

  • Okay, I managed to solve this problem with a simple imputation: I replaced all NAs with a single "constant".

    I can then visualize the entire dataset without removing any samples or rows, and cluster both rows and columns. To show where the NAs are in the dataset, I just give the "constant" its own colour in the plot.

    This way all NAs are treated the same, instead of each NA getting a value derived from other samples in its row or column (as with mean, median, or regression methods). This worked best for my dataset, since it does not skew the data in any direction.
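The approach above could be sketched in base R roughly as follows. This is an illustration under assumptions, not the exact code used: the toy matrix `mat`, the sentinel value `na_const = -1` (chosen to lie outside the data range of 0 to 1), and the grey/white/red colours are all hypothetical, and base `heatmap()` stands in for whatever heatmap package is actually in use.

```r
set.seed(1)

# Hypothetical toy data: values in [0, 1], with 12 of 40 cells set to NA
mat <- matrix(runif(40), nrow = 8,
              dimnames = list(paste0("row", 1:8), paste0("col", 1:5)))
mat[sample(length(mat), 12)] <- NA

# Replace every NA with one constant that lies outside the data range
na_const <- -1
filled <- mat
filled[is.na(filled)] <- na_const

# With no NAs left, distances and clustering work for rows and columns
row_hc <- hclust(dist(filled))
col_hc <- hclust(dist(t(filled)))

# Colour scale: the first bin (grey) catches only the sentinel,
# the remaining bins form a gradient over the real values
n_col  <- 20
cols   <- c("grey70", colorRampPalette(c("white", "red"))(n_col))
breaks <- c(na_const - 0.5, na_const + 0.5,
            seq(1 / n_col, 1, length.out = n_col))

# scale = "none" keeps the sentinel from being rescaled per row;
# col and breaks are passed through to image()
heatmap(filled,
        Rowv = as.dendrogram(row_hc),
        Colv = as.dendrogram(col_hc),
        scale = "none", col = cols, breaks = breaks)
```

Note that the constant does enter the distance calculation, so rows or columns with similar NA patterns will tend to end up close together in the dendrogram. To cluster only the rows, `Colv = NA` can be passed instead of the column dendrogram.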