Decision Tree Induction Based on Entropy and Information Gain
Context: Download the public online slides for Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition). Unzip the file to find the PDF slides for Chapter 4 of the book. Read pages 28 to 37 of the Chapter 4 slides and do the following homework.
Table 1: The original weather data set in the book
Outlook  | Temp | Humidity | Windy | Play (Class)
sunny    | hot  | high     | FALSE | no
sunny    | hot  | high     | TRUE  | no
overcast | hot  | high     | FALSE | yes
rainy    | mild | high     | FALSE | yes
rainy    | cool | normal   | FALSE | yes
rainy    | cool | normal   | TRUE  | no
overcast | cool | normal   | TRUE  | yes
sunny    | mild | high     | FALSE | no
sunny    | cool | normal   | FALSE | yes
rainy    | mild | normal   | FALSE | yes
sunny    | mild | normal   | TRUE  | yes
overcast | mild | high     | TRUE  | yes
overcast | hot  | normal   | FALSE | yes
rainy    | mild | high     | TRUE  | no
Example and Explanations in the book:
Table 1 above is a weather data set with 5 attributes (Outlook, Temp, Humidity, Windy, and Play) and 14 records. The last attribute, Play, is the class attribute. Our goal is to find a way to classify any new record, for which the values of the first 4 attributes are known, into one of the two classes Play=Yes or Play=No. In other words, we want to learn to predict whether Play=Yes or Play=No from the values of the first 4 attributes of the record.
On pages 28 to 37 of the online slides for Chapter 4 of Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition), you can find the step-by-step development of a decision tree for the weather data set in Table 1.
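As a reminder of the quantities involved, the definitions can be written out as below. The notation is an assumption chosen for this note (class counts c_i, data set D, subset D_v of records where attribute A takes value v), and 0 log 0 is taken to be 0. The worked numbers are for the root of the tree on Table 1, where 9 records have Play=yes and 5 have Play=no.

```latex
\begin{align*}
\mathrm{info}([c_1,\dots,c_k]) &= -\sum_{i=1}^{k} \frac{c_i}{n}\,\log_2\frac{c_i}{n},
  \qquad n = \sum_{i=1}^{k} c_i \\
\mathrm{gain}(A) &= \mathrm{info}(D) - \sum_{v} \frac{|D_v|}{|D|}\,\mathrm{info}(D_v) \\[6pt]
\mathrm{info}([9,5]) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14}
  \approx 0.940 \text{ bits} \\
\mathrm{info}([2,3],[4,0],[3,2]) &= \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971)
  \approx 0.693 \text{ bits} \\
\mathrm{gain}(\text{Outlook}) &\approx 0.940 - 0.693 = 0.247 \text{ bits}
\end{align*}
```

The same calculation is repeated for Temp, Humidity, and Windy; the attribute with the largest gain becomes the split at that node, and the procedure recurses on each branch.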
Exercise 1. Show how to apply the decision tree developed there to classify a new case such as the following one.
Outlook | Temp | Humidity | Windy | Play (Class)
Sunny   | Cool | High     | True  | ???
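Applying a decision tree means starting at the root, reading the attribute tested there, following the branch that matches the record's value, and repeating until a leaf is reached. The Python sketch below illustrates that walk. It assumes the tree for Table 1 is the usual one for this data set (Outlook at the root, Humidity under sunny, Windy under rainy, overcast as a pure leaf); check it against the slides before relying on it. The names `TREE` and `classify` and the nested-dictionary encoding are hypothetical, chosen only for this illustration.

```python
# Assumed encoding of the tree for Table 1 (verify against the slides).
# Inner nodes are dicts; leaves are the predicted class as a string.
TREE = {
    "attribute": "Outlook",
    "branches": {
        "sunny": {
            "attribute": "Humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "attribute": "Windy",
            "branches": {"false": "yes", "true": "no"},
        },
    },
}


def classify(tree, record):
    """Walk from the root to a leaf, following the record's attribute values."""
    node = tree
    while isinstance(node, dict):
        # Normalize case so "Sunny" and "sunny" follow the same branch.
        value = record[node["attribute"]].strip().lower()
        node = node["branches"][value]
    return node


new_case = {"Outlook": "Sunny", "Temp": "Cool", "Humidity": "High", "Windy": "True"}
print(classify(TREE, new_case))
```

Note that Temp is never consulted: an attribute that does not appear on the path from the root to the leaf has no effect on the prediction.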
Table 2 is another weather data set for this homework.
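Outlook  | Temp | Humidity | Windy | Play (Class)
Sunny    | hot  | high     | FALSE | no
Sunny    | hot  | high     | TRUE  | yes
Overcast | hot  | high     | FALSE | no
Rainy    | mild | high     | FALSE | no
Rainy    | cool | normal   | FALSE | yes
Rainy    | cool | normal   | TRUE  | no
Overcast | cool | normal   | TRUE  | no
Sunny    | mild | high     | FALSE | no
Sunny    | cool | normal   | FALSE | yes
Rainy    | mild | normal   | FALSE | yes
Sunny    | mild | normal   | TRUE  | no
Overcast | mild | high     | TRUE  | yes
Overcast | hot  | normal   | FALSE | yes
Rainy    | mild | high     | TRUE  | yes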
Exercise 2. Show step by step how to compute the information gain needed to develop a decision tree for the revised weather data set in Table 2, using information gain as the measure for picking attributes. Then show how the resulting tree classifies the new case below.
Outlook | Temp | Humidity | Windy | Play (Class)
Sunny   | Cool | High     | True  | ???
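Hand calculations of information gain are easy to get wrong, so a small script can be useful for checking each step. The Python sketch below is one possible way to do this, not the book's code. It is run on the original weather data of Table 1, where the root-level gains are well known (about 0.247 for Outlook, 0.029 for Temp, 0.152 for Humidity, and 0.048 for Windy), so the output can be compared with the slides. The names `TABLE_1`, `ATTRIBUTES`, `entropy`, and `information_gain` are hypothetical and chosen for this illustration.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]

# Table 1 rows as (Outlook, Temp, Humidity, Windy, Play); the class is last.
TABLE_1 = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Entropy in bits of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class labels minus the weighted entropy after splitting on `column`."""
    labels = [row[-1] for row in rows]
    subsets = {}
    for row in rows:
        # Group the class labels by the value of the chosen attribute.
        subsets.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


gains = {name: information_gain(TABLE_1, i) for i, name in enumerate(ATTRIBUTES)}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
```

To check work on Exercise 2, replace the rows with those of Table 2, pick the attribute with the largest gain as the root, and then call `information_gain` again on the subset of rows under each branch until every subset is pure or no attributes remain.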