python scipy hierarchical-clustering

scipy: How to plot the hierarchical clustering tree


I am interested in plotting the tree represented by the output of hierarchy.to_tree().

To clarify my question, I give the following MWE:

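import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance_matrix
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

arr = np.array([[141, 0, 0, 0, 0],
                [0, 144, 0, 0, 0],
                [0, 0, 138, 0, 0],
                [0, 0, 0, 143, 0],
                [0, 0, 0, 0, 134]])

d = distance_matrix(arr, arr)
# linkage() expects a condensed distance matrix; a square matrix would be
# treated as raw observation vectors, so convert it first.
hc = hierarchy.linkage(squareform(d), method="complete")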

I can plot the dendrogram using:

hierarchy.dendrogram(hc,  labels=['A','B','C', 'D', 'F'])
plt.show()

Output: [image: dendrogram]

To obtain the tree representation, I call to_tree():

hierarchy_classes = hierarchy.to_tree(hc)

But then I'm not sure how to plot the hierarchical clustering tree itself.

EDIT:

To make the question clearer, I have added the expected output.

Expected output:

Something like this:

[image: expected output]


Solution

  • I would suggest walking the tree recursively and using graphviz to visualize it.

    Example:

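    import graphviz
    
    
    def render_tree(tree, labels):
        """Build a Graphviz digraph from the root ClusterNode returned by to_tree()."""
        dot = graphviz.Digraph('cluster_hierarchy')
        render_tree_recursive(tree, dot, labels)
        return dot
    
    
    def render_tree_recursive(node, dot, labels, parent=None):
        # Leaves (count == 1) get their user-supplied label; internal nodes
        # keep label=None, so Graphviz falls back to the node name (e.g. "c7").
        label = None
        if node.count == 1 and node.id < len(labels):
            label = labels[node.id]
        dot.node(f"c{node.id}", label)
        # Connect this node to its parent cluster, if it has one.
        if parent is not None:
            dot.edge(f"c{parent.id}", f"c{node.id}")
        # Recurse into both children.
        if node.left is not None:
            render_tree_recursive(node.left, dot, labels, node)
        if node.right is not None:
            render_tree_recursive(node.right, dot, labels, node)
    
    
    graph = render_tree(hierarchy_classes, labels=['A','B','C', 'D', 'F'])
    # In a notebook, evaluating `graph` displays the diagram; elsewhere,
    # graph.render(format='png') writes it to a file (needs the Graphviz binaries).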
    

    Output:

    [image: linkage diagram]
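
If Graphviz is not available, the same recursive walk can produce a plain-text outline instead. The following is only a minimal sketch: `tree_to_lines` is a hypothetical helper name, not part of scipy, and it assumes the linkage is built from a condensed distance matrix via `pdist` rather than from the question's `distance_matrix` call.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist


def tree_to_lines(node, labels, depth=0):
    """Return an indented text outline of a ClusterNode tree (hypothetical helper)."""
    indent = "  " * depth
    if node.is_leaf():
        # Leaf ids index the original observations, so they map onto labels.
        return [f"{indent}{labels[node.id]}"]
    # Internal nodes carry the merge distance in node.dist.
    lines = [f"{indent}c{node.id} (dist={node.dist:.1f})"]
    lines += tree_to_lines(node.left, labels, depth + 1)
    lines += tree_to_lines(node.right, labels, depth + 1)
    return lines


# Same data as in the question: a 5x5 matrix with the values on the diagonal.
arr = np.diag([141, 144, 138, 143, 134])
hc = hierarchy.linkage(pdist(arr), method="complete")
root = hierarchy.to_tree(hc)

lines = tree_to_lines(root, labels=["A", "B", "C", "D", "F"])
print("\n".join(lines))
```

Each internal node is printed as `c<id>` with its merge distance, and its two children are indented one level below it, so the nesting mirrors the edges that the Graphviz version draws.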