
Course Syllabus

Instructor: Dr. Shieu-Hong Lin

Class: MW 12:00-1:15 pm in Busn 210

Office Hours: Dr. Lin (Lim 137): MW, TR 3:00-5:00 pm. **Email** to confirm an appointment in advance.

**Submission of all your work**: go to Biola Canvas. **Your grades**: see them under Biola Canvas.

*************************************************************************************************

**Week 1**. Overview of the Landscape of Machine Learning

**Reading 1**: Report due Wednesday, Sept. 12

- Read Chapter 1 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press).
- Browse this survey article (Communications of the ACM, Vol. 53 No. 12) on Bayesian networks.
- Browse this general introduction to the key tools in the SciPy ecosystem.
- Explore the Pyplot tutorial on Matplotlib.org and this Jupyter notebook.
- Send in your reading report through Biola Canvas.

**Showcase**: Application of Hidden Markov Models (HMMs) as Bayesian
Networks

- Browse this spelling recognition showcase
- Browse the handout and a PowerPoint slide set on HMMs for spelling recognition.

**Lab #1 Rock-Paper-Scissors**: Report due Wednesday, Sept. 12

1. **Collecting data**: Download, unzip, and **run** the rock-paper-scissors Agent #1 (or this alternative x64 executable) a couple of times. Each time, the program requires you to play 100 matches against the agent and yields a transcript file *RPS_transcript.txt* recording the outcomes of these 100 matches in the same folder. You can rename these text transcripts and then **put them together** into a single combined transcript file of all matches. What percentage of the matches did you win? What percentage did you lose?

2. **Learning from data**: Try to learn from the results in the transcript of matches to improve your chance of winning the game. Then play against Agent #1 again based on what you have learned from the data in Step 1. **Put down** in a Word or text document (i) what you learned from the data and (ii) whether it helped you improve your chance of winning the game.

3. **Submission of your work**: Upload **the combined transcript** from Step 1 and the file with your thoughts and exploration from Step 2 through Canvas.
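The tallying in Step 1 can be scripted. Below is a minimal Python sketch that assumes, hypothetically, each transcript line ends with the outcome word (`win`, `lose`, or `tie`); the actual layout of *RPS_transcript.txt* may differ, so adapt the parsing accordingly.

```python
from collections import Counter

def summarize_transcript(lines):
    """Tally match outcomes, assuming each line ends with 'win', 'lose', or 'tie'."""
    outcomes = Counter()
    for line in lines:
        words = line.strip().lower().split()
        if words and words[-1] in ("win", "lose", "tie"):
            outcomes[words[-1]] += 1
    total = sum(outcomes.values())
    # Report each outcome as a fraction of all matches counted.
    return {k: outcomes[k] / total for k in ("win", "lose", "tie")}

# Four made-up lines standing in for a real combined transcript:
sample = ["rock paper lose", "scissors paper win",
          "rock rock tie", "paper rock win"]
stats = summarize_transcript(sample)  # {'win': 0.5, 'lose': 0.25, 'tie': 0.25}
```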

*************************************************************************************************

**Week 2**. Probabilistic Models for Reasoning: An
Introduction to Hidden Markov Models (HMMs) and Bayesian Networks

**Reading 2**: Report due Wednesday, Sept. 19

- **Carefully read** the handout and a PowerPoint slide set on HMMs for spelling recognition.
- **Play with** the demo executable here to interact with the models.
- Browse an application of machine learning in computer graphics and think about the analogy between this application and our spelling recognition project.
- Browse Chapter 3 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press).
- Send in your reading report through Biola Canvas.

**Programming #1A**: due Wednesday, Sept. 19.

- Download and carefully examine this C++ project framework and finish the implementation of the function *prCharGivenCharOfState*. (If you have time, you can go on to implement *getPrTableForPossibleInitialStates* and *getPrTableForPossibleNextStates* in model.cpp to finish Programming #1B too.)
- See the global variables for storing the parameter settings of the keyboard model and the spelling model in model.cpp, and read these comments about the functions in model.cpp. The function *prCharGivenCharOfState* should support option *K* and produce the same probabilities as the given **demo** executable here does.
- Note: let's assume the keyboard only generates lower-case letters, and implement your functions to handle lower-case letters only.
- **Submission**: **(i)** Compress your entire Program 1A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
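For intuition about what *prCharGivenCharOfState* returns, here is a sketch of one common way to parameterize such a keyboard model in Python. This is not the course's model.cpp code: the 0.9 hit probability and the uniform spread over the other 25 letters are illustrative assumptions standing in for the parameter settings held in model.cpp's global variables.

```python
import string

LETTERS = string.ascii_lowercase  # the assignment assumes lower-case letters only

def pr_char_given_char_of_state(observed, intended, hit_prob=0.9):
    """Pr(observed char | intended char): hit the intended key with
    probability hit_prob, otherwise spread the remaining mass uniformly
    over the other 25 letters (an illustrative choice, not the course's)."""
    if observed == intended:
        return hit_prob
    return (1.0 - hit_prob) / (len(LETTERS) - 1)

# For any fixed intended key, the 26 probabilities must form a distribution:
row = [pr_char_given_char_of_state(c, "a") for c in LETTERS]
```

Whatever parameterization model.cpp actually uses, the same sanity check applies: for each intended character, the probabilities over all observable characters should sum to 1.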

*************************************************************************************************

**Week 3**. Hidden Markov Models (HMMs) for Spelling
Recognition: Implementation of the keyboard model and the spelling model

**Reading 3**: Report due Wednesday, Sept. 26

- **One more time: carefully read** the handout and PowerPoint slide set on HMMs for spelling recognition.
- **Play with** the demo executable here to interact with the models.
- Browse Chapter 3 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press).
- Send in your reading report through Biola Canvas.

**Programming #1B**: due Wednesday, Sept. 26.

- Continue your project from Programming #1A and finish the implementation of all three functions *prCharGivenCharOfState*, *getPrTableForPossibleInitialStates*, and *getPrTableForPossibleNextStates* in model.cpp.
- See the global variables for storing the parameter settings of the keyboard model and the spelling model in model.cpp, and read these comments about these functions in model.cpp. These functions implement the keyboard model and the spelling model. Check to make sure your program provides both options *K* and *S* and produces the same probabilities as the given demo program in Reading #1 does.
- Note: let's assume the keyboard only generates lower-case letters, and implement your functions to handle lower-case letters only.
- **Submission**: **(i)** Compress your entire Program 1B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

*************************************************************************************************

**Week 4**. Simulation of Typing using Hidden Markov
Models (HMMs) + Basics of Probabilistic Reasoning Using HMMs

**Reading 4**: Report due Wednesday, Oct. 3.

- **Carefully read** pages 257-261 of *A tutorial on hidden Markov models and selected applications in speech recognition* by Rabiner. In particular, carefully read p. 261 to make sure you understand (i) how you can use a hidden Markov model to simulate the generation of a sequence of observations and (ii) the three basic problems for hidden Markov models.
- **Carefully read** up to Section 2.4 in Chapter 3 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press) on the three fundamental problems of HMMs. **Note**: the descriptions in Chapter 3 are mostly a rewriting parallel to the contents of Rabiner's tutorial.
- **Optional**: Read Chapter 1 and browse Chapter 2 of *Python Data Science Handbook* by Jake VanderPlas.
- Send in your reading report through Biola Canvas.
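The generation recipe on p. 261 of Rabiner's tutorial (pick an initial state, emit an observation, transition, repeat) can be sketched in Python as follows. The two-state model below is a hypothetical toy, not the course's spelling/keyboard models.

```python
import random

def simulate_hmm(init, trans, emit, steps, seed=0):
    """Generate (states, observations): sample an initial state from init,
    then repeatedly emit an observation and sample the next state."""
    rng = random.Random(seed)
    state = rng.choices(list(init), weights=list(init.values()))[0]
    states, obs = [], []
    for _ in range(steps):
        states.append(state)
        symbols = emit[state]
        obs.append(rng.choices(list(symbols), weights=list(symbols.values()))[0])
        nxt = trans[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return states, obs

# Hypothetical two-state model (all numbers made up for illustration):
init = {"H": 0.6, "C": 0.4}
trans = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit = {"H": {"1": 0.1, "2": 0.4, "3": 0.5}, "C": {"1": 0.7, "2": 0.2, "3": 0.1}}
states, obs = simulate_hmm(init, trans, emit, steps=5)
```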

**Programming #2A**: due Wednesday, Oct. 3.

- Download this expanded C++ project framework and finish the implementation of two additional functions, *typeOneChar* and *getKeyboardProbabilityTable*, in model.cpp according to the descriptions here about these functions. They are needed to simulate the behavior of typing a given character. Check to make sure your program produces the behavior of typing individual characters (under option A) consistent with that produced by the demo executable.
- **Submission**: **(i)** Compress your entire Program 2A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.
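Simulating the typing of a single character reduces to drawing one sample from a row of the keyboard probability table. A minimal Python sketch of that sampling step follows; the table values are hypothetical, whereas in the project the row would come from *getKeyboardProbabilityTable*.

```python
import random

def sample_from_table(prob_table, rng=random):
    """Draw one outcome from a {value: probability} table by walking
    the cumulative distribution until it passes a uniform random draw."""
    r = rng.random()
    cumulative = 0.0
    for value, p in prob_table.items():
        cumulative += p
        if r < cumulative:
            return value
    return value  # guard against floating-point round-off at the tail

# Hypothetical probabilities for what gets typed when 'a' is intended:
table = {"a": 0.9, "s": 0.05, "q": 0.03, "z": 0.02}
typed = sample_from_table(table)
```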

**Homework #1**: due Wednesday, Oct. 3.

- Read and answer the questions described in the
homework regarding probabilistic reasoning using Hidden Markov Models for
spelling recognition.
- Send in your work through Biola
Canvas.

*************************************************************************************************

**Weeks 5-6**. Hidden Markov Models (HMMs): Simulation of
Typing + More on Probabilistic Reasoning

**Reading 5-6**: Report due Wednesday, Oct. 17.

- **Carefully read** pages 260-262 of *A tutorial on hidden Markov models and selected applications in speech recognition* by Rabiner. In particular, carefully read the first column on p. 262 to make sure you understand the mathematical framework we went through for doing Questions 3-5 in Homework #1.
- **Carefully read** up to Section 2.5.1 in Chapter 3 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press) on the three fundamental problems of HMMs, which contains descriptions parallel to the contents on p. 262 of Rabiner's tutorial.

**Programming #2B**: due Wednesday, Oct. 17.

- Download this expanded C++ project framework and finish the implementation of one additional function, *typeOneWord*, in model.cpp as described here. It is needed to simulate the behavior of typing a given word based on the keyboard model and the spelling model. Check to make sure your program produces typing behavior (under option A) consistent with that produced by the demo executable. In other words, it should produce traces explaining **(i) the state transitions** according to the simulation of the spelling model and **(ii) the hidden states leading to the characters observed** according to the simulation of the keyboard model.
- **Notes on simulation**: Carefully read the descriptions on p. 261 of *A tutorial on hidden Markov models and selected applications in speech recognition* to make sure you understand how you can use a hidden Markov model (**i.e.** the spelling model + the keyboard model in our spelling recognition domain) to simulate the generation of a sequence of observations (**i.e.** a sequence of characters). The simulation stops either (i) when the spelling model reaches the final state *F* or (ii) as soon as the simulation process has generated the maximal number of characters allowed by the call to *typeOneWord*.
- **Notes on using C-style character arrays** for storing strings: (i) the null character '\0' is used to signal the end of the string stored in the character array, and (ii) you can use the *strlen* function to determine the length of the string currently stored in a character array. See more about C-style character arrays for storing strings and the related string functions in Section 3.5.4 of *C++ Primer* and the C library reference guide.
- **Submission**: **(i)** Compress your entire Program 2B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

**Lab #2** (Supervised learning for classification using WEKA): due Wednesday, Oct. 17.

- Browse Chapter 2 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account) to understand the ARFF data file format used by WEKA.
- Install WEKA and unzip this zip file to get the zoo data set and the Iris data set.
- (i) Run WEKA and use the **Explorer** in WEKA. Open the zoo data set, select the classifier J48 (under the tree section of the classifier menu), apply it to learn a decision tree from the data set, and report the decision tree you found in your homework. (ii) Repeat the steps above to open the Iris data set, learn a decision tree from it, and report the decision tree you found in your homework.
- Submit your findings through Biola Canvas.
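For orientation before opening the data sets: an ARFF file has three parts, a `@relation` header, `@attribute` declarations, and `@data` rows. A minimal hypothetical example (not the zoo or Iris file itself):

```
@relation toy-weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny, 85, no
overcast, 64, yes
rainy, 71, yes
```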

*************************************************************************************************

**Week 7**. Supervised Learning: Naïve Bayes Classification | Forward Algorithm for Probabilistic Reasoning on HMMs

**Reading 7**: Report due Wednesday, Oct. 24.

- Carefully read Sections 4.1-4.2 (especially 4.2 on Naïve Bayes classification) of Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account; also see the PowerPoint slides of Chapter 4 here).
- Carefully re-examine pp. 262-263 of *A tutorial on hidden Markov models and selected applications in speech recognition* to make sure you understand how the Forward Algorithm can solve the first problem for hidden Markov models. When doing Homework #2 below, you are essentially calculating the *α*_{t}(*i*)'s in a systematic way, column by column, with each column corresponding to a particular point of time *t*.
- Carefully read Chapter 3 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press) on the three fundamental problems of HMMs, which contains descriptions parallel to the contents in pp. 262-263 of *A tutorial on hidden Markov models and selected applications in speech recognition* above.

**Homework #2 (Forward algorithm for solving the first HMM problem)**: due Wednesday, Oct. 24.

On the forward algorithm for probabilistic reasoning:

- Read and answer the questions described in the homework regarding probabilistic reasoning using the forward algorithm for Hidden Markov Models.
- Send in your work through Biola Canvas.
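The column-by-column α computation you carry out by hand in this homework can be sketched in Python as follows. The two-state model uses hypothetical numbers, not the spelling-recognition model.

```python
def forward(init, trans, emit, observations):
    """Pr(observations | model) via the forward algorithm (Rabiner pp. 262-263)."""
    states = list(init)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = {s: init[s] * emit[s][observations[0]] for s in states}
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in observations[1:]:
        alpha = {j: sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states}
    # Termination: Pr(O | model) = sum_i alpha_T(i)
    return sum(alpha.values())

# Hypothetical two-state model:
init = {"H": 0.5, "C": 0.5}
trans = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit = {"H": {"a": 0.8, "b": 0.2}, "C": {"a": 0.3, "b": 0.7}}
p = forward(init, trans, emit, "ab")  # 0.215 for this toy model
```

For a two-symbol observation the result can be checked by brute-force enumeration of all state paths, which is exactly what the forward algorithm avoids for longer sequences.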

**Programming #2C**: due Wednesday, **Oct. 24**.

- Implement one additional function, *typeOneArticle*, in model.cpp as described here. Add an option B to the menu provided by the main function so that it can simulate typing the entire Biola vision statement twice and store the results (of typing the words in the vision statement) into a file by calling *typeOneArticle*. See this demo executable, which includes this feature.
- Note that this is an easy assignment, since you can easily implement *typeOneArticle* with the help of *typeOneWord* from **Programming #2B**.
- **Submission**: **(i)** Compress your entire Program 2C folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

*************************************************************************************************

**Week 8**. Supervised Learning: Decision Trees | Implementation of the Forward Algorithm

**Reading 8**: Report due Wednesday, Oct. 31.

- Carefully review Section 4.2 on Naïve Bayes classification in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account; also see the PowerPoint slides of Chapter 4 here).
- Carefully read Section 4.3 on decision-tree classification in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account; also see the PowerPoint slides of Chapter 4 here).
- Send in your reading report through Biola Canvas.

**Homework #3** (Naïve Bayes classification): due Wednesday, Oct. 31.

- Purpose: Naïve Bayes classification.
- Send in your work through Biola Canvas.
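The computation behind Naïve Bayes classification (Section 4.2 of the Witten et al. text) can be sketched in Python. The four-example training set and the function name below are hypothetical illustrations, not library code.

```python
from collections import Counter

def naive_bayes_scores(examples, features, smoothing=1.0):
    """Score each class for a new instance under the Naive Bayes assumption:
    Pr(class | features) is proportional to Pr(class) * prod_i Pr(feature_i | class).
    Uses Laplace smoothing. `examples` is a list of (feature_dict, label) pairs."""
    classes = Counter(label for _, label in examples)
    scores = {}
    for c, n_c in classes.items():
        score = n_c / len(examples)  # prior Pr(class)
        for attr, value in features.items():
            count = sum(1 for f, label in examples if label == c and f[attr] == value)
            values = {f[attr] for f, _ in examples}  # distinct values of this attribute
            score *= (count + smoothing) / (n_c + smoothing * len(values))
        scores[c] = score
    return scores

# Tiny hypothetical training set:
examples = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
            ({"outlook": "overcast"}, "yes"), ({"outlook": "rainy"}, "yes")]
scores = naive_bayes_scores(examples, {"outlook": "sunny"})  # 'no' wins here
```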

**Programming #3A**: due Wednesday, Nov. 7.

- Implement the function *prOf1CharSeriesWhenTyping1Word* so that we can determine *Pr(d | w, p)*, the probability that a person *p* will end up with a series of characters *d* while trying to type the word *w*. See this *demo executable*, which supports this feature.
- **You should implement the forward algorithm you studied in Homework #2** in the function *prOf1CharSeriesWhenTyping1Word* **to automatically calculate the probability of observing** a series of characters *d* while trying to type the word *w*.
- **Submission**: **(i)** Compress your entire Program 3A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report and upload it through Biola Canvas.

*************************************************************************************************

**Week 9**. Supervised Learning: Basics of Linear Models | Identity Recognition Based on Typing/Spelling Behaviors

**Reading 9**: Report due Wednesday, Nov. 7.

- Carefully read Section 4.6 on linear models in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account; also see the PowerPoint slides of Chapter 4 here).

**Homework #4** (Decision tree induction based on entropy and information gain): due Wednesday, Nov. 7.

- Note: if there is a tie in entropy reduction (i.e., information gain), break the tie arbitrarily.
- Purpose: concepts about entropy and decision tree induction.
- Send in your work through Biola Canvas.
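The two quantities this homework exercises, entropy and information gain, can be sketched in Python; the four training rows below are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(rows, attr, target):
    """Entropy reduction from splitting `rows` (a list of dicts) on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Four hypothetical training rows where 'outlook' splits the classes perfectly:
rows = [{"outlook": "sunny", "play": "no"}, {"outlook": "sunny", "play": "no"},
        {"outlook": "rainy", "play": "yes"}, {"outlook": "rainy", "play": "yes"}]
gain = information_gain(rows, "outlook", "play")  # a perfect split: 1 bit of gain
```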

**Programming #3B**: due Wednesday, Nov. 14.

- Carefully read the conceptual approach to identity recognition: identifying the most likely authors of documents based on their typing/spelling behaviors. For Programming #3B, we want to implement the function *logPrOfGettingDocument1WhenTypingDocument2*, which should determine and return the logarithm of the probability of a person *p* getting *document1* as the result of typing *document2*. Note that, as shown in the conceptual approach to identity recognition, the task can be accomplished by calculating *∑ log Pr(d_{i} | w_{i}, p)*, where *Pr(d_{i} | w_{i}, p)* is the probability that person *p* will end up with the *i*th character string *d_{i}* in the file of *document1* while trying to type the *i*th word *w_{i}* in the file of *document2*. See this *demo executable*, which supports this feature.
- Note: see here about the logarithm functions available in C.
- **Experiment**: Do the Experiment for Programming 3B and (as described in the experiment) record the required findings in the self-evaluation report for Programming 3B.
- **Submission**: **(i)** Compress your entire Program 3B folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report, **including the findings from the Experiment for Programming 3B**, and upload the report through Biola Canvas.
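Why return a *logarithm* rather than the probability itself? A product of many small *Pr(d_{i} | w_{i}, p)* factors underflows double-precision floating point to 0.0, while the sum of their logarithms stays representable. A small Python illustration with made-up probabilities:

```python
import math

probs = [1e-5] * 100                       # 100 hypothetical word probabilities
product = math.prod(probs)                 # 1e-500 underflows to exactly 0.0
log_sum = sum(math.log(p) for p in probs)  # about -1151.3, still comparable
```

Log scores of different candidate authors can therefore be compared directly, even when the underlying probabilities are far below the smallest representable double.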

*************************************************************************************************

**Week 10**. Supervised Learning: Support Vector Machines
and More on Linear Models | Identity Recognition Based on Typing/Spelling
Behaviors

**Reading 10**: Report due Wednesday, Nov. 14.

- Carefully read Section 6.4 on extending linear models in Chapter 6 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account).
- Handwriting simulation using deep neural networks: note that what is accomplished here for handwriting simulation is essentially analogous to what you implemented in Programming #2B for the simulation of typing.

*************************************************************************************************

**Weeks 11-12**.
Neural Networks and Deep Learning | Learning HMM Models I

**Faith and Learning Integration Assignment on Creation and Computer Science**: due Monday, Nov. 19

- **Dr. Lin will be out of town for a conference on Nov. 19. Please use the class time for the reflection needed to do this assignment.**
- Put down what you have from the reflection process according to the requirements in the assignment.
- **Submit your reflection report through Canvas.**

**Reading 11**: Report due Wednesday, Nov. 21 (submission open till Nov. 26 without penalty).

- **Carefully read Section 6.4 again** on extending linear models in Chapter 6 of *Data Mining: Practical Machine Learning Tools and Techniques* (full-text contents of the 3rd ed. available online through the Biola Library account).

**Homework #5** (Linear Regression and Linear Models): due Wednesday, Nov. 21 (submission open till Nov. 26 without penalty).

- Purpose: Linear regression.
- Send in your work through Biola Canvas.
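The simplest linear model, a least-squares line fit, can be sketched in Python via the closed-form formulas for slope and intercept. The data points below are hypothetical, chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable: the slope and
    intercept minimizing the summed squared errors, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # recovers 2 and 1
```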

**Reading 12**: Report due Wednesday, Nov. 28.

- Browse an overview tutorial here
from a graduate course on deep learning and explore demos of deep-learning
applications here.

**Programming #4A**: due Wednesday, Nov. 28.

- **Training and automatic learning of parameter values**: see here.
- **Submission**: **(i)** Compress your entire Program 4A folder into a zip file and upload it through Biola Canvas. **(ii)** Carefully fill out this self-evaluation report, including the findings from Experiment 4A. Upload the report through Biola Canvas.

*************************************************************************************************

**Links to online resources**

- About Jupyter Notebook
- Python tutorials: *Python in a Notebook* by Valerio Maggio (GitHub repository)
- Python tutorials: *A Whirlwind Tour of Python* by Jake VanderPlas (GitHub repository)
- *Python Data Science Handbook* by Jake VanderPlas (GitHub repository)
- NumPy Quickstart on scipy.org
- Matplotlib beginner's guide on matplotlib.org
- Pandas tutorials by Tom Augspurger (GitHub repository): Modern Pandas, Method Chaining, Indexes, Performance, Tidy Data, Visualization, Timeseries
- Python Scientific Lecture Notes on scipy-lectures.org
- Weka 3: Data Mining Software in Java

*************************************************************************************************
