python · matplotlib · scikit-learn · decision-tree

DecisionTreeClassifier: why is the sum of values wrong?


I visualized my DecisionTreeClassifier and noticed that the sums of the samples are wrong, or put differently, the numbers in 'value' don't add up to the samples of the child nodes (see screenshot). Am I misinterpreting my decision tree? I thought that if I have 100 samples in a node and 40 are True and 60 are False, the next node gets 40 (or 60) samples, which are then split again...

import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

tree1 = DecisionTreeClassifier(criterion="entropy", max_features=13,
                               max_leaf_nodes=75, min_impurity_decrease=0.001,
                               min_samples_leaf=12, min_samples_split=20,
                               splitter="best", max_depth=9)

tree1.fit(X_train, y_train)
feature_names = Daten.drop("Abwanderung_LabelEncode", axis=1).columns
class_names = ["Keine Abwanderung", "Abwanderung"]
fig = plt.figure(figsize=(25, 20))
_ = tree.plot_tree(tree1,
                   feature_names=feature_names,
                   class_names=class_names,
                   rounded=True,
                   filled=True)

Screenshot of a part of my decision tree


Solution

  • The plot is correct.

    The two numbers in value are not the numbers of samples that go to the child nodes; they are the counts of the negative and positive classes within the node itself. For example, in the node with value = [101, 647] there are 748 samples in total, since 101 + 647 = 748: 101 of the negative class and 647 of the positive class. The child nodes have 685 and 63 samples, and 685 + 63 = 748. Of the negative samples, the left child holds 47 and the right child 54, and 47 + 54 = 101, the total number of negative samples; likewise the positive samples split as 638 and 9, and 638 + 9 = 647.
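    You can verify this relationship directly by reading the counts out of the fitted tree instead of the plot. Below is a minimal, self-contained sketch on a synthetic dataset (the dataset and all names here are illustrative, not taken from the question). Note that newer scikit-learn releases store per-node class fractions rather than raw counts in tree_.value, which the small helper accounts for:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic, slightly imbalanced two-class problem (illustrative only)
    X, y = make_classification(n_samples=500, weights=[0.7], random_state=0)
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    t = clf.tree_

    def class_counts(node):
        # Per-class sample counts in a node. Newer scikit-learn versions
        # store fractions in tree_.value, older ones store raw counts;
        # rescale to counts if the row sums to 1.
        v = t.value[node].ravel()
        if np.isclose(v.sum(), 1.0):
            v = v * t.n_node_samples[node]
        return v

    root = 0
    left, right = t.children_left[root], t.children_right[root]

    print("root :", t.n_node_samples[root], class_counts(root))
    print("left :", t.n_node_samples[left], class_counts(left))
    print("right:", t.n_node_samples[right], class_counts(right))

    # The children's sample counts sum to the parent's sample count ...
    assert t.n_node_samples[left] + t.n_node_samples[right] == t.n_node_samples[root]
    # ... and the same holds class by class for the counts shown as `value`.
    assert np.allclose(class_counts(left) + class_counts(right), class_counts(root))

    The per-class equality in the last assertion is exactly the relationship visible in the screenshot: 47 + 54 = 101 negatives and 638 + 9 = 647 positives.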