Decision Tree Induction Based on Entropy and Information Gain
Context: Download the public online slides for Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition) and unzip the file to find the PDF slides for Chapter 4 of the book. Read pages 28-37 of the Chapter 4 slides, then do the following homework.
Table 1: The original weather data set in the book
| Outlook | Temp | Humidity | Windy | Play (Class) |
|---|---|---|---|---|
| sunny | hot | high | FALSE | no |
| sunny | hot | high | TRUE | no |
| overcast | hot | high | FALSE | yes |
| rainy | mild | high | FALSE | yes |
| rainy | cool | normal | FALSE | yes |
| rainy | cool | normal | TRUE | no |
| overcast | cool | normal | TRUE | yes |
| sunny | mild | high | FALSE | no |
| sunny | cool | normal | FALSE | yes |
| rainy | mild | normal | FALSE | yes |
| sunny | mild | normal | TRUE | yes |
| overcast | mild | high | TRUE | yes |
| overcast | hot | normal | FALSE | yes |
| rainy | mild | high | TRUE | no |
Example and Explanations in the book:
Table 1 above is a weather data set with 5 attributes (Outlook, Temp, Humidity, Windy, and Play) and 14 records. The last attribute, Play, is the class attribute. Our goal is to find a way to classify any new record, given the values of its first 4 attributes, into one of the two classes Play=Yes or Play=No. In other words, we want to learn how to predict whether Play=Yes or Play=No from the values of the first 4 attributes of the record.
Pages 28-37 of the online slides for Chapter 4 of Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition) develop a decision tree for the weather data set in Table 1 step by step, using information gain to choose the attribute to split on at each node.
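As a sketch of the first step on the slides, the gain of Outlook at the root can be computed from the class counts in Table 1: 9 yes and 5 no overall, and [2,3] for sunny, [4,0] for overcast, and [3,2] for rainy, written as [yes, no]. The general formulas come first; all values are in bits and rounded to three decimals.

```latex
\begin{aligned}
\mathrm{info}(D) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{gain}(A) &= \mathrm{info}(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\,\mathrm{info}(D_v) \\[6pt]
\mathrm{info}([9,5]) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940 \\
\mathrm{info}([2,3]) &= \mathrm{info}([3,2]) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971, \qquad \mathrm{info}([4,0]) = 0 \\
\mathrm{info}([2,3],[4,0],[3,2]) &= \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) = 0.693 \\
\mathrm{gain}(\text{Outlook}) &= 0.940 - 0.693 = 0.247
\end{aligned}
```

The same procedure applied to Temp, Humidity, and Windy gives smaller gains, which is why Outlook is chosen for the root.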
Exercise 1. Show how to apply the decision tree from the slides to classify the following new case.
| Outlook | Temp | Humidity | Windy | Play (Class) |
|---|---|---|---|---|
| Sunny | Cool | High | True | ??? |
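Classifying a case means walking the tree from the root, following the branch that matches the case's value for each tested attribute until a leaf is reached. The minimal Python sketch below assumes the final tree on the slides is the one usually shown for this data set (Outlook at the root, Humidity under sunny, Windy under rainy, overcast a "yes" leaf); `BOOK_TREE` and `classify` are illustrative names, not from the book, and attribute values are lowercased to match Table 1.

```python
# Assumed to match the final tree on the slides, as nested dicts:
# {attribute: {value: subtree_or_class_label}}.
BOOK_TREE = {
    "Outlook": {
        "sunny": {"Humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"Windy": {"FALSE": "yes", "TRUE": "no"}},
    }
}


def classify(tree, case):
    """Descend from the root until a leaf (a class label string) is reached.

    Returns the label and the list of tests taken along the way.
    """
    path = []
    while isinstance(tree, dict):
        attribute = next(iter(tree))       # the attribute tested at this node
        value = case[attribute]
        path.append(f"{attribute} = {value}")
        tree = tree[attribute][value]      # follow the matching branch
    return tree, path


# The new case from Exercise 1, with values written as in Table 1.
new_case = {"Outlook": "sunny", "Temp": "cool", "Humidity": "high", "Windy": "TRUE"}
label, path = classify(BOOK_TREE, new_case)
print(" -> ".join(path), "->", "Play =", label)
```

Note that Temp is never consulted: once Outlook = sunny, only Humidity matters on this tree.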
Table 2 is another weather data set for this homework.
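| Outlook | Temp | Humidity | Windy | Play (Class) |
|---|---|---|---|---|
| Sunny | hot | high | FALSE | no |
| Sunny | hot | high | TRUE | yes |
| Overcast | hot | high | FALSE | no |
| Rainy | mild | high | FALSE | no |
| Rainy | cool | normal | FALSE | yes |
| Rainy | cool | normal | TRUE | no |
| Overcast | cool | normal | TRUE | no |
| Sunny | mild | high | FALSE | no |
| Sunny | cool | normal | FALSE | yes |
| Rainy | mild | normal | FALSE | yes |
| Sunny | mild | normal | TRUE | no |
| Overcast | mild | high | TRUE | yes |
| Overcast | hot | normal | FALSE | yes |
| Rainy | mild | high | TRUE | yes |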
Exercise 2. Using information gain as the measure for choosing attributes, show step by step how to compute the information gains needed to build a decision tree from the revised weather data set in Table 2. Then show how the resulting tree classifies the new case below.
| Outlook | Temp | Humidity | Windy | Play (Class) |
|---|---|---|---|---|
| Sunny | Cool | High | True | ??? |
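The step-by-step procedure is greedy, ID3-style induction: at each node, compute the gain of every remaining attribute, split on the largest, and recurse until a node is pure. The minimal Python sketch below prints the gains at every node so each step can be checked by hand. It is run on Table 1 so the output can be compared with the slides; `entropy`, `info_gain`, and `build_tree` are hypothetical helper names, and the sketch assumes no pruning, breaks ties in favor of the attribute listed first, and does not handle attribute values unseen during training.

```python
import math
from collections import Counter

# Column order of every record; the last field of each tuple is the class (Play).
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]

# Table 1 (the original weather data). For Exercise 2, replace these rows with Table 2.
TABLE_1 = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr):
    """Class entropy minus the weighted entropy of the subsets after splitting on attr."""
    i = ATTRIBUTES.index(attr)
    expected = 0.0
    for value in {r[i] for r in rows}:
        subset = [r[-1] for r in rows if r[i] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - expected


def build_tree(rows, attrs, node="root", depth=0):
    """Split on the highest-gain attribute and recurse; print the gains at each node."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                # pure node -> leaf
        return labels[0]
    if not attrs:                            # nothing left to split on -> majority class
        return Counter(labels).most_common(1)[0][0]
    gains = {a: info_gain(rows, a) for a in attrs}
    print("  " * depth + node + ": "
          + ", ".join(f"gain({a}) = {g:.3f}" for a, g in gains.items()))
    best = max(attrs, key=gains.get)         # max() keeps the first of any tied attributes
    i = ATTRIBUTES.index(best)
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[i] == v], rest,
                                 f"{best} = {v}", depth + 1)
                   for v in sorted({r[i] for r in rows})}}


tree = build_tree(TABLE_1, ATTRIBUTES)
print(tree)
```

On Table 1 the root line reproduces the slides' gains (0.247 for Outlook, 0.029 for Temp, 0.152 for Humidity, 0.048 for Windy), and the resulting tree is the one used in Exercise 1. Swapping in the rows of Table 2 prints the gains needed at each step of Exercise 2.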