
Create Dendrogram with ELKI


I want to plot a dendrogram for a clustering result. Right now I am using ELKIBuilder from ELKI 0.7.5 for clustering.

In the best case I'd like to directly plot a dendrogram.

If that's not possible, I'd like to extract the information (distances) from the clustering so that I can create a dendrogram with another library (e.g. using the Newick format).

Therefore my questions:

Right now I am using the following code for clustering:

public Clustering<?> createClustering() {
    double[][] distanceMatrix = new double[][]{
            {0.0, 1.0, 3.0},
            {1.0, 0.0, 4.0},
            {3.0, 4.0, 0.0}
    };
    int noOfClusters = 2;
    // Adapter to load data from an existing array.
    DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(distanceMatrix);
    // Create a database (which may contain multiple relations!)
    Database db = new StaticArrayDatabase(dbc, null);
    // Load the data into the database (do NOT forget to initialize...)
    db.initialize();

    Clustering<?> clustering = new ELKIBuilder<>(CutDendrogramByNumberOfClusters.class) //
            .with(CutDendrogramByNumberOfClusters.Parameterizer.MINCLUSTERS_ID, noOfClusters) //
            .with(AbstractAlgorithm.ALGORITHM_ID, AnderbergHierarchicalClustering.class) //
            .with(AGNES.Parameterizer.LINKAGE_ID, WardLinkage.class)
            .build().run(db);
    return clustering;
}

Solution

  • The AGNES class (I recommend using AnderbergHierarchicalClustering instead; it is much faster but gives exactly the same result) returns the clustering in a standard form called a "pointer hierarchy" (PointerHierarchyRepresentationResult). The merge of i and j at height h is represented as a pointer from i to j, with height h. Afterwards, j represents the merged cluster. This basic form was introduced by Sibson with the SLINK algorithm in 1973.

    In particular, this result contains the merge heights, i.e. the y coordinates of the dendrogram (getParentDistanceStore), and the merges themselves (getParentStore), and it can compute an order in which to arrange the points for visualization (getPositions).

    You may want to have a look at the code of DendrogramVisualization, which is responsible for creating the SVG dendrogram in the GUI.
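If you go the Newick route, the pointer hierarchy described above can be replayed into a tree fairly directly. The following is only a minimal sketch under some assumptions: it does not call ELKI at all, and the plain arrays parent and height stand in for values you would read from getParentStore and getParentDistanceStore (mapping DBIDs to array indices is left out). The class PointerHierarchyToNewick, its toNewick method, and the example heights are hypothetical and chosen for illustration; they are not output of the Ward clustering in the question. The root is assumed to be the single object that points to itself.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: turn a pointer hierarchy (given as plain arrays) into a Newick string. */
final class PointerHierarchyToNewick {
  /** Binary dendrogram node; leaves carry a label and sit at height 0. */
  private static final class Node {
    final String label; // non-null for leaves only
    final Node left, right;
    final double height;

    Node(String label) {
      this.label = label;
      this.left = null;
      this.right = null;
      this.height = 0.0;
    }

    Node(Node left, Node right, double height) {
      this.label = null;
      this.left = left;
      this.right = right;
      this.height = height;
    }

    /** Subtree without a branch length. */
    String body() {
      return label != null ? label
          : "(" + left.newick(height) + "," + right.newick(height) + ")";
    }

    /** Subtree with branch length = parent height - own height. */
    String newick(double parentHeight) {
      return body() + ":" + (parentHeight - height);
    }
  }

  /**
   * parent[i] = j and height[i] = h mean: at height h, the cluster represented
   * by i is merged into the cluster represented by j. Assumption: exactly one
   * root, marked by parent[i] == i.
   */
  static String toNewick(String[] labels, int[] parent, double[] height) {
    int n = labels.length;
    Node[] current = new Node[n]; // subtree currently represented by each object
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      current[i] = new Node(labels[i]);
      order[i] = i;
    }
    // Replay the merges from the lowest to the highest merge height.
    Arrays.sort(order, Comparator.comparingDouble(i -> height[i]));
    int root = -1;
    for (int i : order) {
      if (parent[i] == i) { // the root is never merged into anything
        root = i;
        continue;
      }
      // With tied heights the target may already have been merged away;
      // follow the pointers to whichever object represents it now.
      int target = parent[i];
      while (current[target] == null) {
        target = parent[target];
      }
      current[target] = new Node(current[i], current[target], height[i]);
      current[i] = null;
    }
    return current[root].body() + ";";
  }

  public static void main(String[] args) {
    String[] labels = { "a", "b", "c" };
    int[] parent = { 1, 2, 2 }; // a -> b, b -> c, c is the root
    double[] height = { 1.0, 3.0, Double.POSITIVE_INFINITY }; // illustrative heights
    // prints ((a:1.0,b:1.0):2.0,c:3.0);
    System.out.println(toNewick(labels, parent, height));
  }
}
```

Branch lengths are computed as the difference between a node's merge height and its parent's, so that leaf-to-root path lengths match the merge heights when another library draws the tree. Sorting by height assumes a monotone linkage (heights never decrease along a pointer chain); for anything else, the layout logic in DendrogramVisualization is the better reference.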