
Instructor: Dr. Shieu-Hong Lin Email:

Class: MW 1:30-2:45 pm at LIB 141

TAs: Alvin Suh, William Tan

**Submission of all your work**: go to Biola Canvas

**Your grades**: see them under Biola Canvas

*************************************************************************************************

**Week 1**. An Introduction to Hidden Markov Models
(HMMs) and Bayesian Networks as Probabilistic Models for Reasoning

**Reading 1**: Report due Wednesday, August 31

- Read the handout and a PowerPoint slide set on HMMs for spelling recognition.
- Play with the demo executable here to interact with the models.
- Read this general survey article (Communications of the ACM, Vol. 53 No. 12) on Bayesian Networks
- Send in your reading report through Biola Canvas.

**Lab #1**: Report due Wednesday, August 31

**How intelligently can the computer interact with people given a collection of historical data from past interactions?** **(i)** Play rock-paper-scissors here with the computer in the novice mode online for at least 50 matches. **Record the actions of both players** in each match. What percentage of the time are you able to win? **(ii)** Play again with the computer set to the veteran mode for at least another 50 matches. **Record the actions of both players** in each match. What percentage of the time are you able to win? **(iii)** **Submission**: Record your findings from the two steps above in a Word document, together with an approach describing how you might develop a program that plays the game as intelligently as the online program. Upload this file under Canvas.

*************************************************************************************************

**Week 2**. Applications of HMMs: From Spelling Recognition
to Speech Recognition | Overview of Machine Learning

**Reading 2**: Report due Wednesday, Sept. 7.

- Review the handout and a PowerPoint slide set on HMMs for spelling recognition one more time. Make sure you have a clear enough understanding to figure out (i) all the transition probabilities shown in Figure 1 in Section 3.1 given the three parameter values of the spelling model and (ii) all the probabilities of typographic mistakes shown in the simplified example at the end of Section 3.2 given the three parameter values of the keyboard model.
- Read the prologue.
- Send in your reading report through Biola Canvas.

**Homework #1** (first attempt on spelling recognition): due Wednesday, Sept. 7.

- Based on Reading #1, do the things described in the homework regarding Hidden Markov Models for spelling recognition.
- Send in your work through Biola Canvas.

**Programming #1A**: due Wednesday, Sept. 7.

- Download this C++ project framework and finish the implementation of the function *prCharGivenCharOfState* in model.cpp. This will make the keyboard model work to provide the service in option *k*. (If you have time, you can go on to implement *getPrTableForPossibleInitialStates* and *getPrTableForPossibleNextStates* in model.cpp to finish Programming #1B too.) See these comments about the functions in model.cpp. The function *prCharGivenCharOfState* implements the keyboard model. Check to make sure your program works to provide option *K* and produces the same probabilities as the given demo program in Reading #1 does. Note that we assume the keyboard only generates lower-case letters, so implement your functions to handle lower-case letters only.
- **Submission**: **(i)** Compress your entire Program 1A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
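To give a feel for what such a function computes, here is a rough sketch of a keyboard model's conditional probability. This is only an illustrative guess at the model's shape (a single "hit" probability with the remaining mass spread uniformly over the other 25 letters); the actual model in the handout has three parameters, so treat the body below as a placeholder, not the assignment's answer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical keyboard model sketch: probability of observing character
// `observed` when the hidden state intends character `intended`.
// Lower-case letters only, per the assignment. With probability pHit the
// intended key is struck; otherwise the mass is spread uniformly over the
// remaining 25 letters. (Illustrative only; not the handout's exact model.)
double prCharGivenCharOfState(char observed, char intended, double pHit) {
    if (observed == intended) return pHit;
    return (1.0 - pHit) / 25.0;  // uniform over the 25 wrong letters
}
```

Whatever the real parameterization is, the probabilities over all 26 possible observed characters for a fixed intended character must sum to 1, which is a useful sanity check against the demo program.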

*************************************************************************************************

**Week 3**. Formulation of Hidden Markov Models
(HMMs) | Ingredients of Machine Learning I

**Reading 3**: Wednesday, Sept. 14.

- Read pages 257~260 of *A tutorial on hidden Markov models and selected applications in speech recognition*.
- Review the prologue and Chapter 1 of Machine Learning.
- Send in your reading report through Biola Canvas.

**Programming #1B**: due Wednesday, Sept. 14.

- Continue your project from Programming #1A and finish the implementation of all three functions *prCharGivenCharOfState*, *getPrTableForPossibleInitialStates*, and *getPrTableForPossibleNextStates* in model.cpp. See these revised comments (**especially those in red**) about these functions. These functions implement the keyboard model and the spelling model. Check to make sure your program works to provide both options *K* and *S* and produces the same probabilities as the given demo program in Reading #1 does. Note that we assume the keyboard only generates lower-case letters, so implement your functions to handle lower-case letters only.
- **Submission**: **(i)** Compress your entire Program 1B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

*************************************************************************************************

**Week 4**. Formulation of Hidden Markov Models
(HMMs) | Ingredients of Machine Learning II

**Reading 4**: Wednesday, Sept. 21.

- Read pages 257~261 of *A tutorial on hidden Markov models and selected applications in speech recognition*.
- Review the prologue and Chapter 1 of Machine Learning.
- Review this general survey article (Communications of the ACM, Vol. 53 No. 12) on Bayesian Networks.
- Send in your reading report through Biola Canvas.

**Programming #2A**: due Wednesday, Sept. 21.

- Download this C++ project framework and finish the implementation of two additional functions, *typeOneChar* and *getKeyboardProbabilityTable*, in model.cpp as described here. They are needed to simulate the behavior of typing a given character. Check to make sure your program works to produce behavior of typing individual characters (under option A) consistent with that produced by the demo executable.
- **Submission**: **(i)** Compress your entire Program 2A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
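Simulating the typing of one character boils down to sampling from a probability table. Here is a minimal sketch, assuming a 26-entry table over 'a'..'z'; the name *sampleChar* and the table layout are illustrative assumptions, not the framework's actual interface.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch: draw one character from a probability table over
// 'a'..'z' by walking the cumulative distribution until it passes u.
// u should be a uniform random number in [0, 1), e.g. rand() / (RAND_MAX + 1.0).
char sampleChar(const std::vector<double>& prTable, double u) {
    double cumulative = 0.0;
    for (int i = 0; i < 26; ++i) {
        cumulative += prTable[i];
        if (u < cumulative) return static_cast<char>('a' + i);
    }
    return 'z';  // guard against floating-point rounding when u is near 1
}
```

In the assignment, *getKeyboardProbabilityTable* would presumably supply such a table for the intended character, and *typeOneChar* would pair it with a random draw.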

*************************************************************************************************

**Week 5**. Simulation Using Hidden Markov Models
(HMMs) | Binary Classification in Machine
Learning

**Reading 5**: Wednesday, Sept. 28.

- Read Chapter 2 of Machine Learning on Binary Classification.
- Read pages 260~263 of *A tutorial on hidden Markov models and selected applications in speech recognition*. In particular, carefully read P. 261 to make sure you understand (i) how you can use a hidden Markov model to simulate the generation of a sequence of observations and (ii) the three basic problems for hidden Markov models. Also carefully read pages 262~263 to make sure you understand the solution to the first problem for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Programming #2B**: due Wednesday, Sept. 28.

- Download this C++ project framework and finish the implementation of one additional function, *typeOneWord*, in model.cpp as described here. It is needed to simulate the behavior of typing a given word based on the keyboard model and the spelling model. Check to make sure your program works to produce typing behavior (under option A) consistent with that produced by the demo executable. In other words, it should produce traces explaining **(i)** **the state transitions** according to the simulation of the spelling model and **(ii)** **the hidden states leading to the characters observed** according to the simulation of the keyboard model.
- **Notes on simulation**: Carefully read the descriptions on P. 261 of *A tutorial on hidden Markov models and selected applications in speech recognition* to make sure you understand how you can use a hidden Markov model (**i.e.** the spelling model + the keyboard model in our spelling recognition domain) to simulate the generation of a sequence of observations (**i.e.** a sequence of characters). The simulation stops either (i) when the spelling model reaches the final state *F* or (ii) as soon as the simulation process has generated the maximal number of characters allowed by the call to *typeOneWord*.
- **Notes on using C-style character arrays for storing strings**: (i) The null character '\0' is used to signal the end of the string stored in the character array, and (ii) you can use the *strlen* function to determine the length of the string currently stored in a character array. See more about C-style character arrays and related string functions in Section 3.5.4 of C++ Primer and the C library reference guide.
- **Submission**: **(i)** Compress your entire Program 2B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
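The note on C-style strings can be made concrete with a small helper. This is a generic illustration (*copyWord* is not part of the project framework): the buffer must always end with '\0', and *strlen* then reports how many characters precede it.

```cpp
#include <cassert>
#include <cstring>

// Copy at most maxLen characters of src into dest and null-terminate,
// so that strlen(dest) never exceeds maxLen. dest must have room for
// maxLen + 1 characters. (Illustrative helper, not framework code.)
void copyWord(char* dest, const char* src, int maxLen) {
    int i = 0;
    while (i < maxLen && src[i] != '\0') {
        dest[i] = src[i];
        ++i;
    }
    dest[i] = '\0';  // the null character signals the end of the string
}
```

Forgetting the final '\0' is the classic bug here: *strlen* would then walk past the end of the array.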

**Lab #2** (WEKA for machine learning / data mining): Wednesday, Sept. 28.

- Install WEKA and unzip this zip file to get the zoo data set and the Iris data set.
- **(i)** Run WEKA and use the **Explorer** in WEKA. Open the zoo data set, select the classifier J48 (under the tree section of the classifier menu), apply it to learn a decision tree from the dataset, and report the decision tree you found in your homework. **(ii)** Repeat the steps above to open the Iris data set, learn a decision tree from that data set, and report the decision tree you found in your homework.
- Submit your findings through Biola Canvas.

*************************************************************************************************

**Week 6**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) I | More on
Classification in Machine Learning

**Reading 6**: Wednesday, Oct. 5.

- Read Chapter 3 of Machine Learning on Multi-Class Classification.
- Read pages 261~266 of *A tutorial on hidden Markov models and selected applications in speech recognition* one more time. In particular, carefully read PP. 262~263 to make sure you understand the solution to the first problem for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Programming #2C**: due Wednesday, Oct. 5.

- Implement one additional function, *typeOneArticle*, in model.cpp as described here. Add an option B into the menu provided by the main function such that it can simulate typing the entire Biola vision statement twice and store the results (of typing the words in the vision statement) into a file by calling *typeOneArticle*. See this demo executable including this feature.
- Note that this is an easy assignment, since you can easily implement *typeOneArticle* with the help of *typeOneWord* in **Programming #2B**.
- **Submission**: **(i)** Compress your entire Program 2C folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

**Homework #2A**: due Wednesday, Oct. 5.

On the brute-force
enumeration algorithm for probabilistic reasoning

- Read and answer the questions described in the homework regarding probabilistic reasoning using Hidden Markov Models for spelling recognition.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 7**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) II | Tree Models

**Reading 7**: Wednesday, Oct. 12.

- Read Chapter 5 of Machine Learning on Tree Models.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition*. In particular, carefully read **PP. 262~263** to make sure you understand the solution to **the first problem** for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Homework #2B (updated 3:30pm, Monday Oct. 10)**: Wednesday, Oct. 12.

On the forward algorithm for probabilistic reasoning

- Read and answer the questions described in the homework regarding probabilistic reasoning using the forward algorithm for Hidden Markov Models.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 8**. **Test #1 | Torrey
Conference**

**Test #1 (open-book test on HMMs)**: Monday, Oct. 17.

- **At 1:29pm on Monday, Oct. 17, log into Canvas => Assignment => Test 1 to see and work on the problem set.**
- Understand the handout and a PowerPoint slide set on HMMs for spelling recognition.
- Understand the notation and the concepts on pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition*.
- Review the solutions to Homework #2A.
- Make sure you know the forward algorithm and how to do Homework #2B correctly.
- Grading: The weight of each question in Part A is doubled, so Part A has a weight of 20 points. Part B counts for 10 points. Part C and Part D are not counted, so the maximum you can get is 30 points.

*************************************************************************************************

**Week 9**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) III | Rule Models

**Reading 9**: Wednesday, Oct. 26.

- Read Chapter 6 of Machine Learning on Rule Models.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition*. In particular, carefully read PP. 262~263 to make sure you understand the solution to the first problem for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Programming #3A**: due Wednesday, Oct. 26.

- Implement the function *prOf1CharSeriesWhenTyping1Word* so that we can determine **Pr(d | w, p)**, the probability that a person *p* will end in a series of characters *d* while trying to type the word *w*. See this *demo executable*, which supports this feature.
- **You should implement the forward algorithm you studied in Homework #2B** in the function *prOf1CharSeriesWhenTyping1Word* **to calculate the probability of observing** a series of characters *d* while trying to type the word *w*.
- **Submission**: **(i)** Compress your entire Program 3A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
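Since the assignment asks for the forward algorithm, here is a generic sketch of it for an HMM given by an initial distribution, a transition matrix, and an emission matrix. The names and the matrix layout are illustrative assumptions; the framework's *prOf1CharSeriesWhenTyping1Word* would instead pull these probabilities from the spelling and keyboard models.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Forward algorithm sketch: probability of an observation sequence given
// an HMM. alpha[i] = Pr(observations so far AND current state = i).
// (Illustrative interface, not the course framework's.)
double forwardProbability(
    const std::vector<double>& initial,                  // Pr(state i at t=0)
    const std::vector<std::vector<double>>& transition,  // a[i][j]
    const std::vector<std::vector<double>>& emission,    // b[i][symbol]
    const std::vector<int>& observations)                // observed symbol ids
{
    const int numStates = static_cast<int>(initial.size());
    std::vector<double> alpha(numStates);
    for (int i = 0; i < numStates; ++i)
        alpha[i] = initial[i] * emission[i][observations[0]];
    for (size_t t = 1; t < observations.size(); ++t) {
        std::vector<double> next(numStates, 0.0);
        for (int j = 0; j < numStates; ++j) {
            for (int i = 0; i < numStates; ++i)
                next[j] += alpha[i] * transition[i][j];  // sum over predecessors
            next[j] *= emission[j][observations[t]];
        }
        alpha = next;
    }
    double total = 0.0;
    for (double a : alpha) total += a;  // sum over all final states
    return total;
}
```

The key point from Homework #2B carries over directly: the cost is linear in the sequence length, rather than exponential as in the brute-force enumeration of Homework #2A.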

*************************************************************************************************

**Week 10**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) IV | Distance-Based Models

**Reading 10**: Wednesday, Nov. 2.

- Read Chapter 8 of Machine Learning on Distance-Based Models.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition*. In particular, carefully read **PP. 263~264** to make sure you understand the solution to **the second problem** for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Homework #2C**: Wednesday, Nov. 2.

More practice on the forward algorithm for probabilistic reasoning:

- Do the whole set of Test 1 again as Homework 2C. Read and answer the questions described in the homework regarding probabilistic reasoning using Hidden Markov Models for spelling recognition.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 11**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) V | Probabilistic Models

**Reading 11**: Wednesday, Nov. 9.

- Read Chapter 9 of Machine Learning on Probabilistic Models.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition* one more time and carefully read **PP. 263~264** to make sure you understand the solution to **the second problem** for hidden Markov models.
- Send in your reading report through Biola Canvas.

**Programming #3B**: due Wednesday, Nov. 9.

- For Programming #3B, we want to implement this function *logPrOfGettingDocument1WhenTypingDocument2*, which should determine and return the logarithm of the probability of getting *document1* as the result of a person *p* trying to type *document2*. Note that as shown in the conceptual approach to Homework #1, this can be computed as **∑ log Pr(d_i | w_i, p)**, where **Pr(d_i | w_i, p)** is the probability that person *p* will end in the *i*th character string *d_i* in the file of *document1* while trying to type the *i*th word *w_i* in the file of *document2*. See this *demo executable*, which supports this feature.
- Note: See here about logarithm functions available in C.
- **Experiment**: Do Experiment 3B and (as described in the experiment) record the required findings into the self-evaluation report for Programming 3B.
- **Submission**: **(i)** Compress your entire Program 3B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report, including the findings from Experiment 3B. Upload the report through Biola Canvas.
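The reason for working in logarithms: multiplying hundreds of small word probabilities underflows to zero in floating point, while adding their logarithms stays representable. A minimal sketch (the function name is illustrative, not the assignment's interface):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum the logarithms of per-word probabilities instead of multiplying the
// raw probabilities: log(a * b) = log(a) + log(b), and the sum does not
// underflow for long documents. (Illustrative sketch.)
double logPrOfSequence(const std::vector<double>& wordProbabilities) {
    double logSum = 0.0;
    for (double p : wordProbabilities)
        logSum += std::log(p);  // natural log from <cmath>
    return logSum;
}
```

Comparing two candidate authors then amounts to comparing their log-probability sums; the larger sum corresponds to the larger probability.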

About **the real authors** of
documents A to H in **Homework#1**
and Experiment
3B for **Programming #3B**.

**Lab #3 (download the new dataset on Nov. 7)**: due Wednesday, Nov. 9.

1) **Knowing more about the WEKA Explorer**: Read the manual of WEKA 3.7.10 to see (i) how you can open up datasets in **CSV** (comma-separated values) files and save them in arff format for classification tasks and (ii) how you can apply classifiers to datasets to learn predictive models and run cross validation experiments. Read the description of cross validation here.

2) **Getting the retention datasets and signing the agreement**: Log into Canvas to download 2016_Project.zip under File (a new version was uploaded on Nov. 7). Unzip the zip file and explore the contents inside. First of all, carefully read the enclosed confidentiality agreement and sign it before you use the retention datasets for this lab assignment. **Copy and paste the agreement and the signature into your report**.

3) **Learning tree models using J48**: Use WEKA and apply the J48 method under **decision tree** to the training datasets in the

4) **Cross validation experiments using J48**: Do (3) above again and conduct 10-fold cross validation experiments accordingly. Based on the results of the cross validation experiments, **put down the expected precision and recall of the prediction model in (3) in your report.**

5) **Cross validation experiments using IBk**: Instead of the J48 classifier under **decision tree**, use the IBk classifier under

6) **Submission**: Upload your report for Lab #3 with the results from Steps 2 to 5 under Canvas.

**Homework #3A (updated Nov. 2)**: Wednesday, Nov. 9.

On the backward algorithm for probabilistic reasoning

- Read and answer the questions described in the homework regarding probabilistic reasoning using the backward algorithm for Hidden Markov Models.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 12**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) VI | Features and Transformations of Features

**Reading 12**: Wednesday, Nov. 16.

- Read Chapter 10 of Machine Learning on features.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition* one more time and carefully read **PP. 263~264** to make sure you understand the solution to **the second problem** for hidden Markov models.

**Homework #3B (updated Nov. 14)**: Wednesday, Nov. 16.

On the forward-backward algorithm for probabilistic reasoning using α values, β values, and γ values

- Read and answer the questions described in the homework regarding probabilistic reasoning using the forward-backward algorithm for Hidden Markov Models.
- Send in your work through Biola Canvas.
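For intuition on how the α and β values combine: the γ value for a state at time t is the product of its α and β values, normalized over all states, giving the posterior probability of being in that state given the whole observation sequence. A sketch for a single fixed t (illustrative names, not the homework's code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Forward-backward sketch: gamma_t(i) = alpha_t(i) * beta_t(i) / sum_j alpha_t(j) * beta_t(j).
// alpha and beta are the forward and backward values for one fixed time t.
std::vector<double> gammaAtTime(const std::vector<double>& alpha,
                                const std::vector<double>& beta) {
    double total = 0.0;
    for (size_t i = 0; i < alpha.size(); ++i) total += alpha[i] * beta[i];
    std::vector<double> gamma(alpha.size());
    for (size_t i = 0; i < alpha.size(); ++i)
        gamma[i] = alpha[i] * beta[i] / total;  // posterior Pr(state i at t | obs)
    return gamma;
}
```

The normalizing denominator is the same for every state at time t (it equals the total probability of the observations), which is why the γ values at each time step sum to 1.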

*************************************************************************************************

**Week 13**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) VII | Linear Models

**Reading 13**: Wednesday, Nov. 23.

- Read Chapter 7 (focusing on Sections 7.1~7.2) of Machine Learning on linear models.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition* one more time and carefully read **PP. 264~265** to make sure you understand the solution to **the third problem** for hidden Markov models.

**Programming #4A**: Wednesday, Nov. 23.

- **Training and automatic learning of parameter values**: See here.
- **Submission**: **(i)** Compress your entire Program 4A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report, including the findings from Experiment 4A. Upload the report through Biola Canvas.

**Note**: About **the real authors** of
documents A to H in **Homework#1**
and **Experiment 3B** for **Programming
#3B**.

**Homework #3C**: Wednesday, Nov. 23.

On the Viterbi algorithm for probabilistic reasoning using δ values

- Read and answer the questions described in the homework regarding probabilistic reasoning using the Viterbi algorithm for Hidden Markov Models.
- Send in your work through Biola Canvas.
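The Viterbi algorithm mirrors the forward algorithm but replaces the sum over predecessor states with a max; the δ values track the probability of the single most likely state path. A generic sketch under the same illustrative HMM layout as before (not the homework's exact notation):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Viterbi sketch: delta[i] = probability of the most likely state path
// that ends in state i after the observations seen so far. Returns the
// probability of the best complete path. (Illustrative interface.)
double viterbiBestPathProbability(
    const std::vector<double>& initial,
    const std::vector<std::vector<double>>& transition,  // a[i][j]
    const std::vector<std::vector<double>>& emission,    // b[i][symbol]
    const std::vector<int>& observations)
{
    const int n = static_cast<int>(initial.size());
    std::vector<double> delta(n);
    for (int i = 0; i < n; ++i)
        delta[i] = initial[i] * emission[i][observations[0]];
    for (size_t t = 1; t < observations.size(); ++t) {
        std::vector<double> next(n, 0.0);
        for (int j = 0; j < n; ++j) {
            double best = 0.0;
            for (int i = 0; i < n; ++i) {
                double candidate = delta[i] * transition[i][j];
                if (candidate > best) best = candidate;  // max, not sum
            }
            next[j] = best * emission[j][observations[t]];
        }
        delta = next;
    }
    double best = 0.0;
    for (double d : delta) if (d > best) best = d;
    return best;
}
```

A full implementation would also record the argmax at each step so the best path itself can be read back by backtracking; only the path probability is sketched here.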

**Homework #4** (Naïve Bayes classification): Wednesday, Nov. 23.

- Purpose: Naïve Bayes classification.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 14**. Probabilistic Reasoning Using Hidden Markov
Models (HMMs) VIII | Machine Learning Experiments

**No class on Monday Dec. 5**: Dr. Lin is out of town for a conference.

**Reading 14**: Wednesday, Dec. 7.

- Play with
these AI experiments
by Google and report your thoughts on at least one of the experiments.
- Read Chapter 12 of Machine Learning on machine learning experiments.
- Review pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition* one more time and carefully read **PP. 264~265** to make sure you understand the solution to **the third problem** for hidden Markov models.

**Homework #5** (Linear Regression and Linear Models): Wednesday, Dec. 7.

- Purpose: Linear regression.
- Send in your work through Biola Canvas.
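For one feature, the least-squares line y = intercept + slope·x has a closed-form solution via the normal equations. A self-contained sketch (*fitLine* is an illustrative name, not part of the homework):

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Ordinary least squares for a single feature: fit y = intercept + slope * x.
// Returns {slope, intercept}. Closed-form normal equations; illustrative.
std::pair<double, double> fitLine(const std::vector<double>& x,
                                  const std::vector<double>& y) {
    const double n = static_cast<double>(x.size());
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
        sumXY += x[i] * y[i];
        sumXX += x[i] * x[i];
    }
    double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    double intercept = (sumY - slope * sumX) / n;
    return {slope, intercept};
}
```

With more than one feature the same idea generalizes to solving a small linear system, which is where the linear models of Chapter 7 pick up.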

**Homework #6** (Decision tree induction based on entropy and information gain): Wednesday, Dec. 14.

- Note: If there is a
tie of entropy reduction, break the tie arbitrarily.
- Purpose: Entropy and decision tree induction.
- Send in your work through Biola Canvas.
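Entropy is the quantity behind the information-gain criterion in this homework: each candidate split is scored by how much it reduces the entropy of the class distribution. A sketch of the entropy computation over class counts (illustrative, in bits):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Shannon entropy (in bits) of a class distribution given as counts,
// as used when scoring candidate splits in decision tree induction.
double entropy(const std::vector<int>& counts) {
    double total = 0.0;
    for (int c : counts) total += c;
    double h = 0.0;
    for (int c : counts) {
        if (c == 0) continue;  // the term 0 * log2(0) is taken as 0
        double p = c / total;
        h -= p * std::log2(p);
    }
    return h;
}
```

Information gain for a split is the parent's entropy minus the size-weighted average entropy of the children; the homework's tie-breaking note applies when two splits produce the same reduction.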

**Programming #4B (Spelling Recognition with Training Data)**: due Wednesday, Dec. 14 (submission open till Dec. 19).

- **Updated demo executable**: Please download and carefully play with the new demo executable (updated Dec. 14, 2016) for automatic recovery of a message X described below. The previous version did not correctly rank the 4 most likely candidate words (for each corrupted word) in the descending order of their probabilities. This new version fixes the bug. It also correctly calculates the rates of accuracy of the recovered message according to the top 1 list, the top 2 list, the top 3 list, and the top 4 list respectively.

- **Demo executable and the programming task**: Please download and carefully play with options L, R, T, and U provided in the new demo executable (updated Dec. 14, 2016) for automatic recovery of a message X described below. In your Programming #4A, you have already implemented option L, and now you need to add the new options R, T, and U such that you can go through the steps to recover the message from corruptedMessage1.txt and corruptedMessage2.txt, which are the results of Mr. X trying to type an unknown document (recorded in messageX.txt) twice, given that all the original words are in vocabulary.txt.

**Final Test (Test #2): take-home open-book test**, due Monday, Dec. 19.

- Log into Canvas ==> Assignment ==> Test #2.

- It is available online. No collaboration is allowed.
- **Preparation**: Fully understand the handout and a PowerPoint slide set on HMMs for spelling recognition.
- **Preparation**: Fully understand pages 257~266 of *A tutorial on hidden Markov models and selected applications in speech recognition*.
- **Preparation**: Fully understand all the subjects covered by the homework assignments: Homework #1 ~ Homework #6.

*************************************************************************************************

**TA hours: T Th 1:00~4:00pm (Alvin Suh, William Tan), MATH/CS Alcove lab**


******************************************************************************************************************

**Links to online resources**

- Machine Learning: online support
materials by the author
- Artificial Intelligence: A Modern Approach: online slides
- Weka 3: Data Mining
Software in Java
