On the Naïve Bayes classifier

 

 

Table 1 below is a weather data set with 5 attributes (Outlook, Temp, Humidity, Windy, and Play) and 14 records. The last attribute, Play, is the class attribute. Our goal is to find a way to classify any new record for which the values of the first 4 attributes are known into one of the two classes, Play=Yes or Play=No. In other words, we want to learn how to predict whether Play=Yes or Play=No from the values of the first 4 attributes of the record.

 

Outlook     Temp    Humidity    Windy    Play (Class)
sunny       hot     high        FALSE    no
sunny       hot     high        TRUE     no
overcast    hot     high        FALSE    yes
rainy       mild    high        FALSE    yes
rainy       cool    normal      FALSE    yes
rainy       cool    normal      TRUE     no
overcast    cool    normal      TRUE     yes
sunny       mild    high        FALSE    no
sunny       cool    normal      FALSE    yes
rainy       mild    normal      FALSE    yes
sunny       mild    normal      TRUE     yes
overcast    mild    high        TRUE     yes
overcast    hot     normal      FALSE    yes
rainy       mild    high        TRUE     no

 

 

Introduction to the Naïve Bayes classifier:

 

Carefully read the overview of the Naïve Bayes classifier here, which uses the data set in Table 1 as an example. It is based on Section 4.2 of Data Mining: Practical Machine Learning Tools and Techniques and describes how the naïve Bayes approach collects statistics from existing data in order to classify new instances. It shows, step by step, how to collect the statistics needed by the naïve Bayes method from the weather data set in Table 1 above, and how to apply them to classify a new case like the following one.

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       Cool    High        True     ???
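
The counting and scoring steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure as written in the overview: the names TABLE_1, naive_bayes_scores, and normalize are hypothetical, values are compared case-insensitively so that "Sunny" matches "sunny", and no correction is applied for attribute values that never occur with a class. For each class, the score is the fraction of records in that class multiplied by, for every attribute, the fraction of that class's records that share the new case's value.

```python
from collections import Counter

# Table 1 rows as (outlook, temp, humidity, windy, play).
TABLE_1 = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def naive_bayes_scores(rows, new_case):
    """Return an unnormalized score for each class label.

    Score = (class frequency) * product over attributes of
    (frequency of the new case's value within that class).
    Assumption: plain relative frequencies, no smoothing.
    """
    class_counts = Counter(row[-1] for row in rows)
    total = len(rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior, e.g. 9/14 for "yes"
        for i, value in enumerate(new_case):
            # Count records of this class whose i-th attribute matches.
            matches = sum(
                1
                for row in rows
                if row[-1] == label and row[i].lower() == value.lower()
            )
            score *= matches / n_label
        scores[label] = score
    return scores


def normalize(scores):
    """Rescale the scores so that they sum to 1."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


new_case = ("Sunny", "Cool", "High", "True")
scores = naive_bayes_scores(TABLE_1, new_case)
probs = normalize(scores)
prediction = max(probs, key=probs.get)
print(scores)
print(probs)
print("Predicted Play =", prediction)
```

For Table 1 the scores come out at roughly 0.0053 for yes and 0.0206 for no, or about 20.5% versus 79.5% after normalization, so the new case is classified as Play=no. The same function should accept any list of rows laid out in the same column order.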

 

 

 

Things to do for this homework:

 

Table 2 is another weather data set for this homework.

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       hot     high        FALSE    no
Sunny       hot     high        TRUE     yes
Overcast    hot     high        FALSE    no
Rainy       mild    high        FALSE    no
Rainy       cool    normal      FALSE    yes
Rainy       cool    normal      TRUE     no
Overcast    cool    normal      TRUE     no
Sunny       mild    high        FALSE    no
Sunny       cool    normal      FALSE    yes
Rainy       mild    normal      FALSE    yes
Sunny       mild    normal      TRUE     no
Overcast    mild    high        TRUE     yes
Overcast    hot     normal      FALSE    yes
Rainy       mild    high        TRUE     yes

 

 

 

Show, step by step, how you determine the statistics needed by the naïve Bayes method from the revised weather data set in Table 2, and show how you would classify the new case below.

 

 

Outlook     Temp    Humidity    Windy    Play (Class)
Sunny       Cool    High        True     ???