r cluster-analysis dimension statistics-bootstrap pvclust

cluster one-dimensional data using pvclust

Thanks for taking the time to read this question. I have some one-dimensional data to cluster in R. The basic hclust command works fine. The pvclust command, however, does not accept one-dimensional data and keeps failing with:

Error in hclust(distance, method = method.hclust) : 
  must have n >= 2 objects to cluster

I found a work-around: I added some all-zero rows to the data, so the data becomes:

       [,1]   [,2]   [,3]  [,4]  [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
[1,]  7.424 14.251 15.957 1.542 2.451 20.836 13.534 20.003 12.555 10.817
[2,]      0      0      0     0     0      0      0      0      0      0
[3,]      0      0      0     0     0      0      0      0      0      0
[4,]      0      0      0     0     0      0      0      0      0      0

Then I ran pvclust, and it worked!

But I am concerned that this work-around screws up the mathematics lying behind pvclust. Can anyone tell me whether I am right or wrong, and whether there is a better solution to my problem?

Thank you!

Solution

First of all, let me state that none of these methods is meant for one-dimensional data.

For one-dimensional data, please use a method that exploits that the data can be sorted. For example, use a method based on kernel density estimation.

The term "cluster analysis" is usually reserved for multidimensional data. In one dimension, there are much better methods. See also "natural breaks optimization", but IMHO you should be using kernel density estimation: split the data at the local minima of the KDE.
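To make the KDE suggestion concrete, here is a minimal sketch in base R. The function name `kde_split` and its arguments are my own invention for illustration, not part of any package, and the default bandwidth is simply whatever `density()` picks. The result depends strongly on the bandwidth, so treat this as a starting point rather than a finished method.

```r
# Split 1-D data at the local minima of a kernel density estimate.
# kde_split is a hypothetical helper, using only base R (stats::density).
kde_split <- function(x, bw = "nrd0", n = 512) {
  dens <- density(x, bw = bw, n = n)
  y <- dens$y
  # A grid point is a local minimum when the slope changes from negative
  # to positive, i.e. sign(diff(y)) jumps from -1 to +1 (a difference of 2).
  idx <- which(diff(sign(diff(y))) == 2) + 1
  cuts <- dens$x[idx]
  # Assign each value to the interval between consecutive cut points.
  cluster <- if (length(cuts) == 0) {
    rep(1L, length(x))            # unimodal estimate: a single group
  } else {
    findInterval(x, cuts) + 1L
  }
  list(cuts = cuts, cluster = cluster)
}

# The values from the question
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)
res <- kde_split(x)
res$cuts      # split points, possibly none with the default bandwidth
res$cluster   # group label per value
```

If the default bandwidth gives a single group, passing a smaller numeric `bw` produces a finer split. Choosing that value is a modelling decision the sketch does not make for you.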

Now to your actual question. Most likely the problem is that you are passing one-dimensional data, which is interpreted as a single record with d dimensions, so the method complains about having only one sample. You may have success by first transposing your data.
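The orientation issue can be reproduced with base R alone. The sketch below uses `dist()` and `hclust()`, which treat rows as the objects to cluster. pvclust has its own convention for which axis is clustered, so check its documentation before deciding which orientation to pass. The variable names are only illustrative.

```r
# The values from the question
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

# One record with 10 dimensions: dist() sees a single object
one_record <- matrix(x, nrow = 1)
dim(one_record)

# hclust refuses to cluster a single object; capture the message
err <- tryCatch(hclust(dist(one_record)),
                error = function(e) conditionMessage(e))
err

# Transposed: 10 records with 1 dimension each, which hclust accepts
records <- t(one_record)
dim(records)
hc <- hclust(dist(records))
length(hc$order)   # number of clustered objects
```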

With your hack of adding zero records, the result most likely becomes bogus. You are probably clustering a data set that has one vector containing your data and three vectors that are all zero.
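Here is a small base-R check of what the padded matrix looks like from either direction. It does not call pvclust. From memory, pvclust's default distance is correlation-based and computed between columns; that is an assumption you should verify against the package documentation. Either way, the zero rows add no information.

```r
# The values from the question, padded with three all-zero rows
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)
padded <- rbind(matrix(x, nrow = 1),
                matrix(0, nrow = 3, ncol = length(x)))

# Row view: one real record plus three identical zero records
row_d <- as.matrix(dist(padded))
row_d[2:4, 2:4]   # distances among the zero rows

# Column view: every column is a positive multiple of (1, 0, 0, 0),
# so all pairwise correlations are 1 up to rounding and a
# correlation-based distance cannot tell the columns apart
col_cor <- cor(padded)
range(col_cor)
```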

But in the end, you should not be using these methods here anyway! Use a method that exploits that your data can be sorted.