Cluster analysis in R: How can I get deterministic results from pvclust?


pvclust is great for cluster analysis in R. However, when running it as part of a batch operation, it is annoying to get different results for the same data. Obviously, there are many "correct" clusterings of the same data, and it seems that pvclust uses some randomness to determine the clusters of a specific run. But is there any way to get deterministic results?

I want to be able to present a minimal, repeatable analysis package: the data plus an R script, and a separate written document that contains my interpretations of the clustering. Others can then add to the analysis, e.g. by changing the aesthetic appearance of plots. As things stand, though, my interpretations will always be out of sync with what someone else gets when they run the script containing pvclust.


Solution

  • This applies not only to cluster analysis: whenever randomness is involved, you can seed the random number generator so that you always get the same results.

    Try:

    set.seed(seed=123)
    # your code here
    

    The seed can be any integer, or something that can be converted to integer. And that's all.
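
To see the effect without installing anything, here is a minimal sketch in base R. It does not call pvclust; instead a hypothetical helper, `boot_support`, imitates the general idea of bootstrap resampling around `hclust`, and the toy matrix `x` is made up for illustration. Seeding before each run should make the two runs agree exactly.

```r
# Hypothetical helper (not part of pvclust): resample the rows of x with
# replacement, cluster the columns, and report how often columns 1 and 2
# fall into the same group when the tree is cut into two clusters.
boot_support <- function(x, nboot = 100) {
  hits <- 0
  for (b in seq_len(nboot)) {
    rows <- sample(nrow(x), replace = TRUE)  # the random step
    tree <- hclust(dist(t(x[rows, , drop = FALSE])), method = "average")
    groups <- cutree(tree, k = 2)
    hits <- hits + (groups[1] == groups[2])
  }
  hits / nboot
}

# Toy data, invented for this example: 40 observations of 5 variables.
set.seed(123)
x <- matrix(rnorm(200), nrow = 40, ncol = 5)

# Same seed before each run, so both runs draw the same bootstrap samples.
set.seed(42)
first <- boot_support(x)
set.seed(42)
second <- boot_support(x)

print(identical(first, second))
```

In a real script the same pattern would be a `set.seed()` call directly before the `pvclust()` call. Two caveats, both worth checking against `?pvclust` for your installed version: a seed set this way may not carry over to worker processes if you run pvclust in parallel, and newer releases appear to offer an `iseed` argument for that purpose.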