I want to plot a dendrogram for a clustering result. Right now I am using ELKIBuilder from ELKI 0.7.5 for clustering.
Ideally, I'd like to plot the dendrogram directly.
If that's not possible, I'd like to extract information (distances) from the clustering to create a dendrogram with another library (e.g. using the Newick format).
Therefore my questions:
Is it possible to create dendrograms with ELKI?
Is it possible to access the distances which have been calculated during the clustering? (the distances used when two clusters were merged)
Right now I am using the following code for clustering:
public Clustering<?> createClustering() {
    double[][] distanceMatrix = new double[][]{
        {0.0, 1.0, 3.0},
        {1.0, 0.0, 4.0},
        {3.0, 4.0, 0.0}
    };
    int noOfClusters = 2;
    // Adapter to load data from an existing array.
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(distanceMatrix);
    // Create a database (which may contain multiple relations!)
    Database db = new StaticArrayDatabase(dbc, null);
    // Load the data into the database (do NOT forget to initialize...)
    db.initialize();
    Clustering<?> clustering = new ELKIBuilder<>(CutDendrogramByNumberOfClusters.class) //
        .with(CutDendrogramByNumberOfClusters.Parameterizer.MINCLUSTERS_ID, noOfClusters) //
        .with(AbstractAlgorithm.ALGORITHM_ID, AnderbergHierarchicalClustering.class) //
        .with(AGNES.Parameterizer.LINKAGE_ID, WardLinkage.class) //
        .build().run(db);
    return clustering;
}
The AGNES class (I recommend using AnderbergHierarchicalClustering instead: it is much faster but gives exactly the same result) returns the clustering in a standard form called a "pointer hierarchy" (PointerHierarchyRepresentationResult). The merge of i and j at height h is represented as a pointer from i to j, with height h. Afterwards, j represents the merged cluster. This basic form was introduced by Sibson with the SLINK algorithm in 1973.
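To make the pointer representation concrete, here is a minimal sketch in plain Java. It does not use the ELKI API: the arrays parent and height are hypothetical stand-ins for the pointers and heights described above, the class and method names are made up for illustration, and I am assuming that the final root is marked by pointing to itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain arrays instead of ELKI's data stores.
public class PointerHierarchySketch {
  /**
   * Lists the merges encoded by a pointer hierarchy, lowest height first.
   * parent[i] = j and height[i] = h mean "i was merged into j at height h".
   * Assumption: the root object points to itself and encodes no merge.
   */
  public static List<String> describeMerges(int[] parent, double[] height) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < parent.length; i++) {
      if (parent[i] != i) {
        order.add(i);
      }
    }
    // Sort the merging objects by the height at which they were merged.
    order.sort((a, b) -> Double.compare(height[a], height[b]));
    List<String> out = new ArrayList<>();
    for (int i : order) {
      out.add(String.format(Locale.ROOT, "%d -> %d at height %.1f", i, parent[i], height[i]));
    }
    return out;
  }
}
```

For three objects with made-up heights, parent = {1, 2, 2} and height = {1.0, 3.0, infinity} describe two merges: object 0 joins 1 at height 1.0, and then 1 (which now stands for the cluster {0, 1}) joins 2 at height 3.0.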
In particular, this contains the y information, i.e. the merge heights (getParentDistanceStore), the merges (given by getParentStore), and it can compute an order in which to arrange the points for visualization (getPositions).
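If you want to hand the result to another library, the parents and heights are enough to write a Newick string. The following is only a sketch under stated assumptions, not ELKI code: it expects the values already copied into plain arrays (parent[i] playing the role of the parent store entry for object i, height[i] the parent distance), a single root that points to itself, and heights that never decrease from a child merge to its parent merge. The class name, method names and the three-decimal formatting are my own choices.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustration only: converts a pointer hierarchy held in plain arrays to Newick.
public class PointerHierarchyToNewick {
  /** Newick text of a subtree and the height of its topmost merge. */
  private static final class Subtree {
    final String newick;
    final double height;

    Subtree(String newick, double height) {
      this.newick = newick;
      this.height = height;
    }
  }

  public static String toNewick(int[] parent, double[] height, String[] labels) {
    int n = parent.length;
    List<List<Integer>> children = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      children.add(new ArrayList<>());
    }
    int root = -1;
    for (int i = 0; i < n; i++) {
      if (parent[i] == i) {
        root = i; // assumption: the root points to itself
      } else {
        children.get(parent[i]).add(i);
      }
    }
    if (root < 0) {
      throw new IllegalArgumentException("no root found (no object points to itself)");
    }
    // Objects merged into the same target are applied from lowest to highest merge.
    for (List<Integer> c : children) {
      c.sort(Comparator.comparingDouble(i -> height[i]));
    }
    return build(root, children, height, labels).newick + ";";
  }

  /** Builds the subtree of everything that was merged into object j. */
  private static Subtree build(int j, List<List<Integer>> children, double[] height, String[] labels) {
    Subtree acc = new Subtree(labels[j], 0.0); // a single object is a leaf at height 0
    for (int i : children.get(j)) {
      Subtree sub = build(i, children, height, labels);
      double h = height[i];
      // Branch length = merge height minus the height the subtree already reached.
      String text = "(" + sub.newick + ":" + fmt(h - sub.height) + ","
          + acc.newick + ":" + fmt(h - acc.height) + ")";
      acc = new Subtree(text, h);
    }
    return acc;
  }

  private static String fmt(double d) {
    return String.format(Locale.ROOT, "%.3f", d);
  }
}
```

With the made-up values parent = {1, 2, 2}, height = {1.0, 3.0, infinity} and labels = {"a", "b", "c"}, toNewick returns ((a:1.000,b:1.000):2.000,c:3.000); which most tree viewers can read directly.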
You may want to have a look at the code of DendrogramVisualization, which is responsible for creating the SVG dendrogram in the GUI.