Course Syllabus
Instructor: Dr. Shieu-Hong Lin
Class: MW 12:00-1:15 pm in Busn 210
Submission of all your work: under Biola Canvas
Your grades: see them under Biola Canvas
*************************************************************************************************
Week 1. Overview of the Landscape of Machine Learning
Reading 1: Report due Wednesday, Sept. 12
Showcase: Application of Hidden Markov Models (HMMs) as Bayesian
Networks
Lab #1 (Rock-Paper-Scissors): Report due Wednesday, Sept. 12
1. Collecting data: Download, unzip, and run rock-paper-scissor Agent #1 (or this alternative x64 executable) a few times. Each time, the program requires you to play 100 matches against the agent and writes a transcript file RPS_transcript.txt recording the outcomes of those 100 matches in the same folder. Rename these text transcripts and then combine them into a single transcript file of all matches. What percentage of the matches did you win? What percentage did you lose?
2. Learning from data: Try to learn from the results in the transcript of matches to improve your chance of winning the game. Then play against Agent #1 again based on what you learned from the data in Step #1. Record (i) what you learned from the data and (ii) whether it helped you improve your chance of winning in a Word or text document.
3. Submission of your work: Upload the combined transcript from Step #1 and the document of your thoughts and exploration from Step #2 under Canvas.
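One way to approach Step #2 is to tally the agent's moves and move-to-move transitions from the combined transcript. The sketch below is a minimal illustration, assuming you have already extracted the agent's moves from the transcript into a Python list (the transcript's exact file format is not specified here, so the parsing step is left out).

```python
from collections import Counter, defaultdict

# Which move beats which (standard rock-paper-scissors rules).
BEATS = {"Rock": "Paper", "Paper": "Scissors", "Scissors": "Rock"}

def analyze(agent_moves):
    """Count the agent's overall move frequencies and first-order
    transitions (which move tends to follow which)."""
    freq = Counter(agent_moves)
    trans = defaultdict(Counter)
    for prev, nxt in zip(agent_moves, agent_moves[1:]):
        trans[prev][nxt] += 1
    return freq, trans

def best_reply(freq):
    """Play the move that beats the agent's most frequent move."""
    most_common = freq.most_common(1)[0][0]
    return BEATS[most_common]

# Toy data standing in for moves parsed from RPS_transcript.txt.
moves = ["Rock", "Rock", "Paper", "Rock", "Scissors", "Rock"]
freq, trans = analyze(moves)
print(best_reply(freq))  # the agent favors Rock here, so play Paper
```

If the frequency table is nearly uniform, the transition counts may still reveal a pattern worth exploiting (e.g. the agent repeating its previous move).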
*************************************************************************************************
Week 2. Probabilistic Models for Reasoning: An
Introduction to Hidden Markov Models (HMMs) and Bayesian Networks
Reading 2: Report due Wednesday, Sept. 19
Programming #1A: due Wednesday, Sept. 19.
*************************************************************************************************
Week 3. Hidden Markov Models (HMMs) for Spelling
Recognition: Implementation of the keyboard model and the spelling model
Reading 3: Report due Wednesday, Sept. 26
Programming #1B: due Wednesday, Sept. 26.
*************************************************************************************************
Week 4. Simulation of Typing using Hidden Markov
Models (HMMs) + Basics of Probabilistic Reasoning Using HMMs
Reading 4: Wednesday, Oct. 3.
Programming #2A: due Wednesday, Oct 3.
Homework #1: due Wednesday, Oct. 3.
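Simulating typing with an HMM amounts to sampling observations (typed keys) from hidden states (intended keys). The sketch below is a toy illustration of that idea; the keyboard layout, the correctness probability, and all names here are illustrative assumptions, not the course's actual keyboard model.

```python
import random

# Toy keyboard model: the hidden state is the intended key; the
# observation is the key actually typed. Each intended key is typed
# correctly with probability CORRECT_P; otherwise one of its
# (hypothetical) physical neighbors is hit instead.
CORRECT_P = 0.9
NEIGHBORS = {"c": "xv", "a": "sq", "t": "ry", "s": "ad"}

def simulate_typing(word, seed=0):
    rng = random.Random(seed)
    typed = []
    for ch in word:
        if ch not in NEIGHBORS or rng.random() < CORRECT_P:
            typed.append(ch)                         # correct keystroke
        else:
            typed.append(rng.choice(NEIGHBORS[ch]))  # neighbor mis-hit
    return "".join(typed)

print(simulate_typing("cats"))  # a length-4 string, usually "cats"
```

Running the simulator over a long document produces exactly the kind of corrupted text that the later spelling-recognition assignments try to reverse.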
*************************************************************************************************
Weeks 5-6. Hidden Markov Models (HMMs): Simulation of
Typing + More on Probabilistic Reasoning
Reading 5-6: Wednesday, Oct. 17.
Programming #2B: due Wednesday, Oct. 17.
Lab #2 (Supervised learning for classification using WEKA): Wednesday, Oct. 17.
*************************************************************************************************
Week 7. Supervised Learning: Naïve Bayes Classification | Forward Algorithm for Probabilistic Reasoning on HMMs
Reading 7: Report due Wednesday, Oct. 24.
Homework #2: Forward algorithm for solving the first HMM problem: Wednesday, Oct. 24.
On the forward algorithm for probabilistic reasoning
Programming #2C: due Wednesday, Oct. 24.
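The forward algorithm in Homework #2 computes the likelihood of an observation sequence by summing over all hidden-state paths. A minimal sketch, using a made-up two-state HMM (the numbers below are illustrative assumptions, not the course's model):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Likelihood P(obs) of an observation sequence under an HMM,
    computed by dynamic programming in O(T * N^2) time."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Toy two-state weather HMM with integer observations.
states = ("Hot", "Cold")
start_p = {"Hot": 0.8, "Cold": 0.2}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3},
           "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4},
          "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}
print(forward((3, 1), states, start_p, trans_p, emit_p))  # ≈ 0.1004
```

For long sequences the probabilities underflow, so a real implementation should work in log space or rescale alpha at each step.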
*************************************************************************************************
Week 8. Supervised Learning: Decision Trees | Implementation of the Forward Algorithm
Reading 8: Wednesday, Oct. 31.
Homework #3 (Naïve Bayes classification): Wednesday, Oct. 31.
Programming #3A: due Wednesday, Nov. 7.
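For Homework #3, the core of Naïve Bayes classification is combining a class prior with per-feature likelihoods under a conditional-independence assumption. A minimal multinomial sketch with add-one (Laplace) smoothing, on made-up toy data (the documents and labels below are illustrative assumptions):

```python
from collections import Counter
import math

def train_nb(docs):
    """docs: list of (word_list, label) pairs. Returns class counts,
    per-class word counts, and the shared vocabulary."""
    labels = Counter(lab for _, lab in docs)
    word_counts = {lab: Counter() for lab in labels}
    vocab = set()
    for words, lab in docs:
        word_counts[lab].update(words)
        vocab.update(words)
    return labels, word_counts, vocab

def classify(words, labels, word_counts, vocab):
    """Pick the label maximizing log P(label) + sum log P(word | label),
    with add-one smoothing so unseen words never zero out a class."""
    total_docs = sum(labels.values())
    best, best_lp = None, float("-inf")
    for lab in labels:
        lp = math.log(labels[lab] / total_docs)
        denom = sum(word_counts[lab].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[lab][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

docs = [(["win", "money"], "spam"), (["meeting", "notes"], "ham"),
        (["win", "prize"], "spam"), (["lunch", "notes"], "ham")]
model = train_nb(docs)
print(classify(["win"], *model))  # "spam"
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause.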
*************************************************************************************************
Week 9. Supervised Learning: Basics of Linear Models | Identity Recognition Based on Typing/Spelling Behaviors
Reading 9: Wednesday, Nov. 7.
Homework #4 (Decision tree induction based on entropy and information gain): Wednesday, Nov. 7
Programming #3B: due Wednesday, Nov. 14.
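The entropy and information-gain calculations in Homework #4 can be sketched in a few lines. The toy rows below are illustrative assumptions; the functions implement the standard definitions (gain = entropy of the labels minus the size-weighted entropy of each split group).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Information gain from splitting `rows` on attribute column `attr`."""
    n = len(rows)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [("sunny", "high"), ("sunny", "low"), ("rain", "high"), ("rain", "low")]
labels = ["yes", "yes", "no", "no"]
print(entropy(labels))             # 1.0 bit of label uncertainty
print(info_gain(rows, 0, labels))  # column 0 predicts perfectly: gain 1.0
print(info_gain(rows, 1, labels))  # column 1 is uninformative: gain 0.0
```

A decision-tree inducer like J48/C4.5 repeatedly picks the attribute with the highest gain (or gain ratio) and recurses on each branch.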
*************************************************************************************************
Week 10. Supervised Learning: Support Vector Machines and More on Linear Models | Identity Recognition Based on Typing/Spelling Behaviors
Reading 10: Wednesday, Nov. 14.
*************************************************************************************************
Weeks 11-12. Neural Networks and Deep Learning | Learning HMM Models I
Faith and Learning Integration Assignment on Creation and Computer Science due: Monday, Nov. 19
- Dr. Lin will be out of town for a conference on Nov. 19. Please use the class time for the reflection needed to do this assignment.
- Write down what you come up with in the reflection process according to the requirements in the assignment.
- Submit your reflection report accordingly through Canvas.
Reading 11: Wednesday, Nov. 21. (submission open till Nov. 26 without penalty)
Homework #5 (Linear Regression and Linear Models): Wednesday, Nov. 21. (submission open till Nov. 26 without penalty)
Reading 12: Wednesday, Nov. 28.
Programming #4A: Wednesday, Nov. 28.
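For the linear-models portion of Homework #5, ordinary least squares can be sketched as solving for weights in a design matrix whose first column of ones supplies the intercept. The data below are made up so the fit is exact.

```python
import numpy as np

# Fit y = w0 + w1 * x by ordinary least squares on toy data.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])           # exactly y = 1 + 2x
X = np.column_stack([np.ones_like(x), x])    # design matrix with bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)    # minimizes ||Xw - y||^2
print(w)  # w ≈ [1.0, 2.0]
```

The same `lstsq` call handles any number of input features; you just add more columns to the design matrix.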
*************************************************************************************************
Week 13. Machine Learning Using Scikit-learn vs. WEKA | Learning HMM Models II
Reading 13: Wednesday, Dec. 5
*************************************************************************************************
Week 14. Machine Learning Using Scikit-learn vs. WEKA
Reading 14: Wednesday, Dec. 12
Programming #4B: Spelling Recognition with Training Data. Due Wednesday, Dec. 12
- Demo executable: Please download and carefully play with the demo executable for automatic recovery of a message X described below.
- Demo executable and the programming task: Please download and carefully play with options L, R, T, and U provided in the new demo executable for automatic recovery of a message X described below. It correctly ranks the 4 most likely candidate words (for each corrupted word) in descending order of their probabilities. It also correctly calculates the accuracy rates of the recovered message according to the top-1 list, the top-2 list, the top-3 list, and the top-4 list, respectively.
- Programming task: In Programming #4A you already implemented Option L; now enhance the new options R, T, and U so that you can go through the steps to recover, from corruptedMessage1.txt and corruptedMessage2.txt, the unknown document recorded in messageX.txt that Mr. X tried to type twice, given that all the original words are in vocabulary.txt.
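The top-1 through top-4 accuracy rates reported by the demo can be computed with a simple helper once each corrupted word has a ranked candidate list. A minimal sketch on made-up words (the candidate rankings below are illustrative, not the demo's actual output):

```python
def top_k_accuracy(true_words, candidate_lists, k):
    """Fraction of word positions where the true word appears among
    the k highest-probability candidates for that position."""
    hits = sum(1 for w, cands in zip(true_words, candidate_lists)
               if w in cands[:k])
    return hits / len(true_words)

# Toy example: true message and a ranked 4-candidate list per word.
truth = ["the", "cat", "sat"]
cands = [["the", "then", "they", "them"],   # correct at rank 1
         ["car", "cat", "can", "cap"],      # correct at rank 2
         ["sit", "set", "sat", "say"]]      # correct at rank 3
for k in (1, 2, 3, 4):
    print(k, top_k_accuracy(truth, cands, k))
```

By construction, accuracy is non-decreasing in k, which is a handy sanity check on your own options R, T, and U.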
Lab #3: due Wednesday, Dec. 12.
1) Knowing more about the WEKA Explorer: Read the manual of WEKA 3.7.10 to see (i) how you can open datasets in CSV (comma-separated values) files and save them in ARFF format for classification tasks and (ii) how you can apply classifiers to datasets to learn predictive models and run cross-validation experiments. Read the description of cross validation here.
2) Getting the retention datasets and signing the agreement: Log into Canvas to download 2016_Project.zip under Files. Unzip the file and explore its contents. Use the cleaned datasets in the folder with files in the ARFF format. Please carefully read the enclosed confidentiality agreement and sign it before you use the retention datasets for this lab assignment. Copy and paste the agreement and your signature into your report.
3) Learning tree models using J48: Use WEKA and apply the J48 method under decision tree to each of the training datasets in the Data folder separately to learn decision trees as predictive models. Put the resulting decision trees in your report.
4) Cross-validation experiments using J48: Do (3) above again and conduct 10-fold cross-validation experiments accordingly. Based on the results of the cross-validation experiments, report the expected precision and recall of the prediction model in (3) in your report.
5) Cross-validation experiments using Naïve Bayes: Instead of the J48 classifier under decision tree, use a Naïve Bayes classifier under bayes, do (4) above again, and conduct 10-fold cross-validation experiments accordingly. Based on the results of the cross-validation experiments, report the expected precision and recall of the prediction model in (5) in your report.
6) Cross-validation experiments using IBk: Instead of the J48 classifier under decision tree, use the IBk classifier under lazy, do (4) above again, and conduct 10-fold cross-validation experiments accordingly. Based on the results of the cross-validation experiments, report the expected precision and recall of the prediction model in (6) in your report.
7) Submission: Upload your report for Lab #3 with the results from Step 2 to Step 6 under Canvas.
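Since Weeks 13-14 compare scikit-learn with WEKA, the Lab #3 workflow has a rough scikit-learn analog: J48 corresponds roughly to `DecisionTreeClassifier`, Naïve Bayes to `GaussianNB`, and IBk to `KNeighborsClassifier`, each evaluated with 10-fold cross validation. The sketch below uses the bundled iris dataset as a stand-in, since the retention datasets are confidential.

```python
# Rough scikit-learn analog of the WEKA cross-validation workflow above.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the retention data
for name, clf in [("tree (J48-like)", DecisionTreeClassifier(random_state=0)),
                  ("naive bayes", GaussianNB()),
                  ("kNN (IBk-like)", KNeighborsClassifier(n_neighbors=3))]:
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold CV accuracies
    print(name, round(scores.mean(), 3))
```

Per-class precision and recall, like WEKA's detailed accuracy table, are available via `sklearn.metrics.classification_report` on held-out predictions.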
*************************************************************************************************
Final take-home exam: Submission due: Dec. 19
- Open-book test. To be announced and discussed in class on Dec. 12. Find the problem set under Canvas | Files | FinalTest.zip
*************************************************************************************************
Links to online resources
*************************************************************************************************