Decision Tree Induction Based on Entropy and Information Gain

 

Context: Download the public online slides for Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition) and unzip the file to find the PDF slides for Chapter 4. Read pages 28~37 of the Chapter 4 slides, then do the following homework.

 

Table 1: The original weather data set in the book

 

Outlook     Temp    Humidity    Windy    Play (Class)
sunny       hot     high        FALSE    no
sunny       hot     high        TRUE     no
overcast    hot     high        FALSE    yes
rainy       mild    high        FALSE    yes
rainy       cool    normal      FALSE    yes
rainy       cool    normal      TRUE     no
overcast    cool    normal      TRUE     yes
sunny       mild    high        FALSE    no
sunny       cool    normal      FALSE    yes
rainy       mild    normal      FALSE    yes
sunny       mild    normal      TRUE     yes
overcast    mild    high        TRUE     yes
overcast    hot     normal      FALSE    yes
rainy       mild    high        TRUE     no

 

 

Example and Explanations in the book: 

 

Table 1 above is a weather data set with 5 attributes (Outlook, Temp, Humidity, Windy, and Play) and 14 records. The last attribute, Play, is the class attribute. Our goal is to find a way to classify any new record for which the values of the first 4 attributes are known into one of the two classes, Play=Yes or Play=No. In other words, we want to learn to predict whether Play=Yes or Play=No from the values of the first 4 attributes of the record.

 

On pages 28~37 of the online slides for Chapter 4 for Data Mining: Practical Machine Learning Tools and Techniques (3rd. Edition), you can find the development of a decision tree step by step for the weather data set in Table 1.
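As a reminder of what the step-by-step development relies on, the two quantities can be written out as below. This is a sketch in notation chosen here, not necessarily the notation of the slides: S is the set of records at a node, p_i is the fraction of S in class i, A is a candidate attribute, and S_v is the subset of S where A takes the value v. These symbols are assumptions introduced for illustration.

```latex
% Entropy (information) of a set S with class proportions p_1, ..., p_k, in bits
\mathrm{info}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i

% Information gain of splitting S on attribute A
\mathrm{gain}(A) = \mathrm{info}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{info}(S_v)

% Worked example: Table 1 has 9 "yes" and 5 "no" records at the root
\mathrm{info}([9,5]) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \text{ bits}
```

At each node, the attribute with the largest gain is chosen for the split, and the same computation is repeated on each resulting subset until the subsets are pure or no attributes remain.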

 

Exercise 1. Show how you can apply the decision tree there to classify a new case like the following one.

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       Cool    High        True     ???
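Applying a decision tree means starting at the root, following the branch that matches the record's value for the tested attribute, and repeating until a leaf is reached. The sketch below illustrates this. It assumes the final tree in the slides is the familiar one for this data set (Outlook at the root, Humidity under sunny, Windy under rainy), so check it against the slides before relying on it. The names TREE and classify are hypothetical, and the sample record is a row from Table 1 rather than the new case.

```python
# Assumed tree for Table 1 (verify against the slides):
# an inner node is (attribute, {value: subtree}); a leaf is a class label.
TREE = ("Outlook", {
    "sunny": ("Humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("Windy", {"false": "yes", "true": "no"}),
})


def classify(tree, record):
    """Walk from the root to a leaf, following the record's attribute values."""
    while not isinstance(tree, str):
        attribute, branches = tree
        # Lowercase the value so "Sunny"/"sunny" and "TRUE"/"True" both match.
        tree = branches[str(record[attribute]).lower()]
    return tree


# A row taken from Table 1 (rainy, mild, high, FALSE), whose class is "yes".
sample = {"Outlook": "rainy", "Temp": "mild", "Humidity": "high", "Windy": "FALSE"}
print(classify(TREE, sample))
```

Passing the new case as a dictionary in the same form returns its predicted class. Note that attributes the tree never tests on the chosen path are simply ignored.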


Table 2 is another weather data set for this homework.

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       hot     high        FALSE    no
Sunny       hot     high        TRUE     yes
Overcast    hot     high        FALSE    no
Rainy       mild    high        FALSE    no
Rainy       cool    normal      FALSE    yes
Rainy       cool    normal      TRUE     no
Overcast    cool    normal      TRUE     no
Sunny       mild    high        FALSE    no
Sunny       cool    normal      FALSE    yes
Rainy       mild    normal      FALSE    yes
Sunny       mild    normal      TRUE     no
Overcast    mild    high        TRUE     yes
Overcast    hot     normal      FALSE    yes
Rainy       mild    high        TRUE     yes

 

 

 

Exercise 2. Show step by step how to compute the information gain needed to develop a decision tree for the revised weather data set in Table 2, using information gain as the measure for picking attributes. Then show how the resulting tree classifies the new case below.

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       Cool    High        True     ???
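Hand calculations of information gain are easy to get wrong, so a small script can serve as a check. The sketch below is one possible way to do it, not the method prescribed by the book. It is run here on the Table 1 rows, whose root-level gains are the ones worked out in the slides; replacing ROWS with the Table 2 rows gives a check for this exercise. The names ROWS, ATTRS, entropy, and information_gain are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temp", "Humidity", "Windy"]

# Table 1 rows as (Outlook, Temp, Humidity, Windy, Play).
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, col):
    """Entropy of the class labels minus the weighted entropy after splitting on column col."""
    labels = [row[-1] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRS)}
for name, gain in gains.items():
    print(f"{name:9s} gain = {gain:.3f} bits")
```

On the Table 1 rows this prints gains of 0.247 for Outlook, 0.029 for Temp, 0.152 for Humidity, and 0.048 for Windy, so Outlook is picked at the root. To continue building a tree, filter the rows down to one branch (for example, only the rows with a given Outlook value) and call information_gain again on the remaining attributes.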