This is super simple, but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful, and I was following everything about entropy and information gain until I got to the part where the entropy is calculated for each individual attribute.
I don't understand how the entropy for each individual attribute (sunny, windy, rainy) is calculated -- specifically, how p_i is calculated. It seems different from the way it is calculated for Entropy(S). Can anyone explain the process behind this calculation?
To split a node into two child nodes, one common method is to split on the variable that maximises your information gain.
When you reach a pure leaf node, the information gain equals 0 (you can't gain any information by splitting a node that contains rows of only one class).
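As a quick sanity check (not from the original example): a pure node has a single class with p = 1, so its entropy is -1 * log2(1) = 0, which is why splitting it further can never add information.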
In your example, Entropy(S) = 1.571 is your current entropy, the one you have before splitting. Let's call it HBase.
Then you compute the entropy of each candidate split, i.e. of the child nodes you would obtain by splitting on each attribute.
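This is also where p_i changes meaning compared to Entropy(S): for a child node, p_i is the fraction of rows inside that child (e.g. only the rows where the attribute is sunny) that belong to class i, not the fraction over the whole data set. As a made-up illustration: if the sunny child holds 3 "yes" and 2 "no" rows, then p_yes = 3/5, p_no = 2/5, and Entropy(sunny) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) ≈ 0.971.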
To get your Information Gain, you subtract the weighted entropy of your child nodes from HBase:
gain = HBase - child1NumRows/numOfRows * entropyChild1 - child2NumRows/numOfRows * entropyChild2
import math

def GetEntropy(dataSet):
    # entropy of a node: -sum(p_i * log2(p_i)) over the class labels it contains
    results = ResultsCounts(dataSet)    # count of rows per class label
    h = 0.0                             # h => entropy
    for i in results.keys():
        p = float(results[i]) / NbRows(dataSet)   # p_i = fraction of rows with label i
        h = h - p * math.log2(p)
    return h
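The code above relies on two helpers, ResultsCounts and NbRows, that aren't shown here. A minimal sketch of what they could look like, assuming the class label is the last element of each row:

def NbRows(dataSet):
    # number of rows in this node
    return len(dataSet)

def ResultsCounts(dataSet):
    # count how many rows fall into each class label
    # (assumes the label is the last element of each row)
    counts = {}
    for row in dataSet:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    return counts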
def GetInformationGain(dataSet, currentH, child1, child2):
    # weight each child's entropy by the fraction of rows it receives
    p = float(NbRows(child1)) / NbRows(dataSet)
    gain = currentH - p * GetEntropy(child1) - (1 - p) * GetEntropy(child2)
    return gain
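To see the two functions working together, here is a small usage sketch with made-up rows (first column = attribute value, last column = class label):

data = [
    ["sunny", "no"], ["sunny", "no"], ["sunny", "yes"],
    ["rainy", "yes"], ["rainy", "yes"], ["rainy", "no"],
]
child1 = [row for row in data if row[0] == "sunny"]
child2 = [row for row in data if row[0] == "rainy"]
currentH = GetEntropy(data)                               # 1.0 here (3 "yes" vs 3 "no")
print(GetInformationGain(data, currentH, child1, child2)) # ≈ 0.08 for these rows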
The objective is to pick the split that gives the best of all the Information Gains!
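In practice, "best" means trying every candidate split, computing its gain, and keeping the winner. A rough sketch of that search, using a hypothetical DivideSet helper (not part of the original answer) that splits rows on a column value:

def DivideSet(dataSet, column, value):
    # hypothetical helper: binary split on whether row[column] equals value
    match = [row for row in dataSet if row[column] == value]
    rest = [row for row in dataSet if row[column] != value]
    return match, rest

def FindBestSplit(dataSet):
    currentH = GetEntropy(dataSet)
    bestGain, bestSplit = 0.0, None
    for column in range(len(dataSet[0]) - 1):        # last column is the label
        for value in set(row[column] for row in dataSet):
            child1, child2 = DivideSet(dataSet, column, value)
            if not child1 or not child2:
                continue                             # skip splits that don't separate anything
            gain = GetInformationGain(dataSet, currentH, child1, child2)
            if gain > bestGain:
                bestGain, bestSplit = gain, (column, value)
    return bestGain, bestSplit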
Hope this helps! :)