python scikit-learn visualization decision-tree pydot

sklearn.tree.export_graphviz alternatives


It is possible to visualize decision trees using pydotplus from PyPI, but it has issues on my machine: it says it was not built with libexpat, so each node shows only a number instead of a table with some information. I'd like to use an alternative. I already tried networkx, but it requires pygraphviz to read .dot files and build a networkx graph from them, and installing pygraphviz with pip also failed.

So now I am looking for an alternative way of visualizing decision trees, which can be installed using pip or anaconda.

Which alternatives exist?

EDIT#1

Output of conda list:

# packages in environment at /home/xiaolong/development/anaconda3/envs/coursera_ml_classification:
#
alabaster                 0.7.7                    py34_0    defaults
awscli                    1.6.2                     <pip>
babel                     2.3.3                    py34_0    defaults
backports                 1.0                      py34_0    defaults
backports-abc             0.4                       <pip>
backports.shutil-get-terminal-size 1.0.0                     <pip>
backports_abc             0.4                      py34_0    defaults
bcdoc                     0.12.2                    <pip>
boto                      2.33.0                    <pip>
botocore                  0.73.0                    <pip>
cairo                     1.12.18                       6    defaults
certifi                   2015.4.28                 <pip>
colorama                  0.2.5                     <pip>
cycler                    0.10.0                   py34_0    defaults
decorator                 4.0.9                    py34_0    defaults
docutils                  0.12                     py34_0    defaults
entrypoints               0.2                      py34_1    defaults
expat                     2.1.0                         0    defaults
fontconfig                2.11.1                        5    defaults
freetype                  2.5.5                         0    defaults
get_terminal_size         1.0.0                    py34_0    defaults
glib                      2.43.0                        2    asmeurer
graphviz                  2.38.0                        1    defaults
harfbuzz                  0.9.39                        0    defaults
imagesize                 0.7.0                    py34_0    defaults
ipykernel                 4.3.1                    py34_0    defaults
ipython                   4.2.0                    py34_0    defaults
ipython-genutils          0.1.0                     <pip>
ipython_genutils          0.1.0                    py34_0    defaults
ipywidgets                4.1.1                    py34_0    defaults
jedi                      0.9.0                    py34_0    defaults
jinja2                    2.8                      py34_0    defaults
jmespath                  0.5.0                     <pip>
jsonschema                2.5.1                    py34_0    defaults
jupyter                   1.0.0                    py34_2    defaults
jupyter-client            4.2.2                     <pip>
jupyter-console           4.1.1                     <pip>
jupyter-core              4.1.0                     <pip>
jupyter_client            4.2.2                    py34_0    defaults
jupyter_console           4.1.1                    py34_0    defaults
jupyter_core              4.1.0                    py34_0    defaults
libffi                    3.2.1                         0    defaults
libgcc                    5.2.0                         0    defaults
libgfortran               3.0.0                         1    defaults
libpng                    1.6.17                        0    defaults
libsodium                 1.0.3                         0    defaults
libxml2                   2.9.2                         0    defaults
llvmlite                  0.10.0                   py34_0    defaults
markupsafe                0.23                     py34_0    defaults
matplotlib                1.5.1               np111py34_0    defaults
mistune                   0.7.2                    py34_0    defaults
mkl                       11.3.1                        0    defaults
multipledispatch          0.4.8                     <pip>
nbconvert                 4.2.0                    py34_0    defaults
nbformat                  4.0.1                    py34_0    defaults
notebook                  4.2.0                    py34_0    defaults
numpy                     1.11.0                   py34_0    defaults
openssl                   1.0.2h                        0    defaults
pandas                    0.18.1              np111py34_0    defaults
pango                     1.39.0                        0    defaults
path.py                   8.2.1                    py34_0    defaults
pep8                      1.7.0                    py34_0    defaults
pexpect                   4.0.1                    py34_0    defaults
pickleshare               0.5                      py34_0    defaults
pip                       8.1.1                    py34_1    defaults
pixman                    0.32.6                        0    defaults
prettytable               0.7.2                     <pip>
psutil                    4.1.0                    py34_0    defaults
ptyprocess                0.5                      py34_0    defaults
pyasn1                    0.1.9                     <pip>
pydotplus                 2.0.2                    py34_0    file:///home/xiaolong/development/anaconda3/conda-bld/linux-64/pydotplus-2.0.2-py34_0.tar.bz2
pyflakes                  1.1.0                    py34_0    defaults
pygments                  2.1.3                    py34_0    defaults
pyparsing                 2.1.1                    py34_0    defaults
pyqt                      4.11.4                   py34_1    defaults
python                    3.4.4                         0    defaults
python-contrib-nbextensions alpha                     <pip>
python-dateutil           2.5.2                    py34_0    defaults
pytz                      2016.3                   py34_0    defaults
pyyaml                    3.11                      <pip>
pyzmq                     15.2.0                   py34_0    defaults
qt                        4.8.7                         1    defaults
qtconsole                 4.2.1                    py34_0    defaults
readline                  6.2                           2    defaults
requests                  2.9.1                     <pip>
rope                      0.9.4                    py34_1    defaults
rope-py3k                 0.9.4.post1               <pip>
rsa                       3.1.2                     <pip>
scikit-learn              0.17.1              np111py34_0    defaults
scipy                     0.17.0              np111py34_3    defaults
setuptools                20.7.0                   py34_0    defaults
sframe                    1.8.5                     <pip>
simplegeneric             0.8.1                    py34_0    defaults
sip                       4.16.9                   py34_0    defaults
six                       1.10.0                   py34_0    defaults
snowballstemmer           1.2.1                    py34_0    defaults
sphinx                    1.4.1                    py34_0    defaults
sphinx-rtd-theme          0.1.9                     <pip>
sphinx_rtd_theme          0.1.9                    py34_0    defaults
spyder                    2.3.8                    py34_1    defaults
sqlite                    3.9.2                         0    defaults
terminado                 0.5                      py34_1    defaults
tk                        8.5.18                        0    defaults
tornado                   4.3                      py34_0    defaults
traitlets                 4.2.1                    py34_0    defaults
wheel                     0.29.0                   py34_0    defaults
xz                        5.0.5                         1    defaults
zeromq                    4.1.3                         0    defaults
zlib                      1.2.8                         0    defaults

SciPy version: 0.17.0

digraph Tree {
node [shape=box, style="filled", color="black"] ;
0 [label="grade.B <= 0.5\ngini = 0.5\nsamples = 37224\nvalue = [18476, 18748]", fillcolor="#399de504"] ;
1 [label="grade.C <= 0.5\ngini = 0.4973\nsamples = 32094\nvalue = [17218, 14876]", fillcolor="#e5813923"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="gini = 0.4829\nsamples = 21728\nvalue = [12875, 8853]", fillcolor="#e5813950"] ;
1 -> 2 ;
3 [label="gini = 0.4869\nsamples = 10366\nvalue = [4343, 6023]", fillcolor="#399de547"] ;
1 -> 3 ;
4 [label="grade.A <= 14.8301\ngini = 0.3702\nsamples = 5130\nvalue = [1258, 3872]", fillcolor="#399de5ac"] ;
0 -> 4 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
5 [label="gini = 0.3555\nsamples = 4987\nvalue = [1153, 3834]", fillcolor="#399de5b2"] ;
4 -> 5 ;
6 [label="gini = 0.3902\nsamples = 143\nvalue = [105, 38]", fillcolor="#e58139a3"] ;
4 -> 6 ;
}

EDIT#2

I wrote this in a Jupyter notebook, but the notebook has a bug: the SVG is rendered without colors if you try to display it using:

![Decision Tree]('dtree.svg')

I found a work-around here:

from IPython.display import HTML

svg = None
with open('dtree.svg') as svg_file:
    svg = svg_file.read()

HTML(svg)

Solution

  • It's not the sexiest solution, but I use the Graphviz CLI (the executable is called dot), invoked via subprocess. I'm on a Mac, so I installed it with Homebrew, but you can download binaries for other platforms from their downloads page. Here's an example using the Titanic dataset:

    import pandas as pd
    import subprocess
    import seaborn.apionly as sns  # newer seaborn releases no longer ship apionly; use `import seaborn as sns` there
    from sklearn.preprocessing import Imputer  # on scikit-learn >= 0.20, use sklearn.impute.SimpleImputer instead
    from sklearn.tree import DecisionTreeClassifier, export_graphviz
    
    raw_data = sns.load_dataset('titanic')
    predictors = ['pclass','sex','age','sibsp','parch','fare','embarked','alone','adult_male']
    categorical = ['sex','embarked']
    numeric = [c for c in predictors if c not in categorical]
    target='survived'
    
    encoded_data = pd.get_dummies(raw_data[predictors], columns=categorical)
    
    imputer = Imputer()
    X = imputer.fit_transform(encoded_data).astype('float32')
    Y = raw_data[target].astype('float32')
    
    model = DecisionTreeClassifier(min_samples_leaf=10, max_depth=3)
    model.fit(X, Y)
    
    export_graphviz(model,
                    out_file='tree.dot',
                    feature_names=encoded_data.columns,
                    proportion=True,
                    filled=True,
                    impurity=False)
    
    # Render the .dot file to PDF with the Graphviz CLI.
    subprocess.call(['dot', '-Tpdf', 'tree.dot', '-o', 'tree.pdf'])
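
If you want the call to fail loudly when dot is missing or exits with an error, something along these lines might work. It is only a sketch: build_dot_command and render_dot are names I made up for illustration, not part of Graphviz or scikit-learn, and it assumes the dot executable is on your PATH.

```python
import shutil
import subprocess


def build_dot_command(dot_file, out_file, fmt="pdf"):
    """Return the argument list for the Graphviz `dot` CLI (hypothetical helper)."""
    # -T selects the output format (e.g. pdf, svg, png); -o names the output file.
    return ["dot", "-T" + fmt, dot_file, "-o", out_file]


def render_dot(dot_file, out_file, fmt="pdf"):
    """Render dot_file to out_file, raising if dot is missing or fails (hypothetical helper)."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    # check_call raises subprocess.CalledProcessError on a non-zero exit status.
    subprocess.check_call(build_dot_command(dot_file, out_file, fmt))
    return out_file
```

With the tree.dot file written by export_graphviz above, `render_dot('tree.dot', 'tree.svg', fmt='svg')` should produce an SVG instead of a PDF, which can then be shown in a notebook with the HTML work-around from the question.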