
Instructor: Dr. Shieu-Hong Lin (see the Course Syllabus)

Class: TR 12:00-1:15 pm at Busn 210

**Submission of all your work**: go to Biola Canvas. **Your grades**: see them under Biola Canvas.

*************************************************************************************************

**Week 1**. Overview of the Landscape of Machine Learning

**Reading 1**: Report due: Thursday, Sept. 13

- Read Chapter 1 of *Explorations in Artificial Intelligence and Machine Learning* (CRC Press).
- Browse this Jupyter Notebook tutorial.
- Browse this general introduction to the key tools in the SciPy ecosystem.

**Explorations**:

- Download and install Anaconda (Python 3.6 version) on your own computer.
- Explore the Pyplot tutorial on Matplotlib.org and this Jupyter notebook.
- Explore *Python Data Science Handbook* by Jake VanderPlas (and his notebooks on GitHub).

**Lab #1 Rock-Paper-Scissors:** Report due: Thursday, Sept. 13

1. **Collecting data**: Download, unzip, and **run** Rock-Paper-Scissors Agent #1
(or this alternative x64 executable) a couple of times.
**Each time, the program requires you to play 100 matches against the agent
and writes a transcript file *RPS_transcript.txt* with the outcomes of these 100 matches in
the same folder.** You can rename these text transcripts and then **put them together**
into a single combined transcript file of all matches. What is the percentage
of matches that you won? What is the percentage of matches that you lost?

2. **Learning from data**: Try to learn from the results in the transcript of
matches to improve your chance of winning the game. Then play against Agent #1
again based on what you have learned from the data in Step #1. **Write down**, in a Word or
text document, (i) what you have learned from the data and (ii) whether it helped you
improve your chance of winning the game.

3. **Submission of your work**: Upload **the combined transcript** from Step #1 and
the file of your thoughts and explorations from Step #2 under Canvas.

*************************************************************************************************

**Week 2**. Basics of NumPy I + Concepts of Inputs for
Data Mining

**Reading 2**: Report due: Thursday, Sept. 20

- Read Chapter 1 and Chapter 2 of *Python Data Science Handbook* by Jake VanderPlas.
- Browse Chapter 2 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).

**Lab #2 Rock-Paper-Scissors analysis using Python:** Report due: Thursday, Sept. 20

1. **Loading data collected in Lab #1 into a NumPy array**: Consider the combined transcript file of
all matches from Lab #1 again. Use numpy.loadtxt (example, documentation)
to load the data into a NumPy array.

2. **Exploring the data using NumPy**: Apply some of the basic NumPy
facilities you have learned from the reading to analyze the data. **Write down**
what you did in your explorations and your findings in a file.

3. **Submission of your work**: Upload **the combined transcript** from Step #1 and
the file of your thoughts and explorations from Step #2 under Canvas.
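
As a rough illustration of the loading and exploring steps, the sketch below reads a small made-up transcript with numpy.loadtxt and computes win and loss percentages. The column layout and the numeric codes are assumptions for illustration only. The actual layout of *RPS_transcript.txt* may differ, so adjust the column indices (and options such as `delimiter` or `dtype`) to match your own file.

```python
import io
import numpy as np

# Hypothetical transcript layout (an assumption, NOT the real file format):
# one match per line, columns = agent move, player move, outcome,
# with moves coded 0=rock, 1=paper, 2=scissors
# and outcome coded 1=player wins, 0=tie, -1=player loses.
sample = io.StringIO(
    "0 1 1\n"
    "2 2 0\n"
    "1 0 -1\n"
    "0 1 1\n"
)

# numpy.loadtxt accepts a file name or any file-like object.
matches = np.loadtxt(sample, dtype=int)

outcome = matches[:, 2]
win_pct = 100.0 * np.mean(outcome == 1)    # share of rows marked as wins
loss_pct = 100.0 * np.mean(outcome == -1)  # share of rows marked as losses

# How often the agent played each move (index = move code).
agent_counts = np.bincount(matches[:, 0], minlength=3)

print(matches.shape, win_pct, loss_pct, agent_counts)
```

With a real transcript, replace `sample` with the path to your combined file. Boolean masks such as `outcome == 1` are often a convenient starting point for further exploration.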

*************************************************************************************************

**Week 3**. Basics of NumPy II + Data Mining and
Knowledge Representation

**Reading 3**: Report due: Thursday, Sept. 27

- Browse Chapter 3 of *Python Data Science Handbook* by Jake VanderPlas.
- Browse Chapter 3 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).

**Quiz #1 on the basics of NumPy**: Thursday, Sept. 27

**Open-book test** in class, **but no Python interpreter in any form is allowed during the test**.

*************************************************************************************************

**Week 4**. Basics of Pandas I + Data Mining and
Knowledge Representation

**Reading 4**: Report due: Thursday, Oct. 4

- **Carefully read** the first 4 sections (up to handling missing data) in Chapter 3 of *Python Data Science Handbook* by Jake VanderPlas.
- Browse Chapter 3 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).

*************************************************************************************************

**Weeks 5-6**. Basics of Pandas II + Supervised Learning
+ Torrey Conference

**Reading 5-6**: Report due: Thursday, Oct. 18

- **Carefully read** the next 4 sections (up to the section on *aggregation and grouping*) in Chapter 3 of *Python Data Science Handbook* by Jake VanderPlas.
- Browse Chapter 3 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).

**Lab #3** (Supervised learning for classification using WEKA): Thursday, Oct. 18

- If needed, browse Chapter 2 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account) to understand the ARFF data file format used by WEKA.
- Install WEKA and unzip this zip file to get the zoo data set and the Iris data set.
- (i) Run WEKA and use the **Explorer** in WEKA. Open the zoo data set. Select the classifier J48 (under the trees section of the classifier menu), apply it to learn a decision tree from the data set, and report the decision tree you found in your homework. (ii) Repeat the steps above to open the Iris data set, learn a decision tree from it, and report the decision tree you found in your homework.
- Submit your findings through Biola Canvas.

**Quiz #2 on the basics of Pandas I & II**: Thursday, Oct. 18

- **Open-book test**. No collaboration with others allowed.
- **Submission link open till** 11:59am Tuesday, Oct. 23
- Emphasis on 'Data Indexing and Selection', 'Operating on Data in Pandas', and 'Aggregation and Grouping' in Chapter 3 of *Python Data Science Handbook*.

*************************************************************************************************

**Week 7**. More on Pandas + Naïve Bayes for Supervised
Learning

**Reading 7**: Report due: Thursday, Oct. 25

- Carefully read Sections 4.1-4.2 (especially 4.2 on Naïve Bayes classification) of Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).
- Carefully read the next 3 sections (from the section on *pivot tables* to the section on *working with time series*) in Chapter 3 of *Python Data Science Handbook* by Jake VanderPlas.

*************************************************************************************************

**Week 8**. Matplotlib + More on Naïve Bayes for
Supervised Learning

**Reading 8**: Report due: Thursday, Nov. 1

- Carefully read the first 5 sections (up to the section on *Visualizing Errors*) in Chapter 4 of *Python Data Science Handbook* by Jake VanderPlas.
- Carefully review Section 4.2 on Naïve Bayes classification in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).

**Lab 4** (Naïve Bayes classification): Thursday, Nov. 1

- Purpose: Naïve Bayes classification.
- Note: We'll have a follow-up lab later that requires you to create a Jupyter notebook to automate the steps using Pandas and NumPy.
- Send in your work through Biola Canvas.

*************************************************************************************************

**Week 9**. More on Matplotlib + Supervised Learning:
Decision Trees

**Reading 9**: Thursday, Nov. 8

- Read the next 5 sections (up to the section on *Multiple Subplots*) in Chapter 4 of *Python Data Science Handbook* by Jake VanderPlas.
- Carefully read Section 4.3 on decision-tree classification in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account; also see the ppts of Chapter 4 here).

**Lab 5** (Naïve Bayes classification): Thursday, Nov. 8

- Task: Redo **Lab 4** by using Pandas and the data set in this csv file. Create a Jupyter notebook using Pandas to read the csv data file and automate the steps you went through in **Lab 4** for collecting statistics and doing Naïve Bayes classification using Pandas and NumPy. You need to show at the end of the notebook how you use the statistics collected to classify the test case in Lab 4.
- Hint: You may want to examine the class note "*Class1030RPS_Pandas_Groupby - y.ipynb*" under *Canvas | File* to see how *unstack*, *pivot_table*, or *crosstab* may play a convenient role in this lab assignment.
- Send in your Jupyter notebook through Biola Canvas.
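
One possible shape for this kind of automation, sketched with *crosstab*: the small weather-style table below is only a stand-in for the lab's csv file, and the column names, the test case, and the helper `naive_bayes_scores` are all hypothetical. The sketch uses plain frequency counts with no smoothing, so an attribute value that never occurs with a class gives that class a probability of zero.

```python
import pandas as pd

# Stand-in data (NOT the lab's csv file): nominal attributes plus a class column.
rows = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
columns = ["outlook", "temperature", "humidity", "windy", "play"]
df = pd.DataFrame(rows, columns=columns)


def naive_bayes_scores(data, target, case):
    """Return normalized class probabilities for one test case (no smoothing)."""
    # Prior P(class), estimated from class frequencies.
    scores = data[target].value_counts(normalize=True)
    for attr, value in case.items():
        # Rows = attribute values, columns = classes; normalize="columns"
        # turns the counts into P(attribute value | class).
        cond = pd.crosstab(data[attr], data[target], normalize="columns")
        scores = scores * cond.loc[value]
    return scores / scores.sum()


test_case = {"outlook": "sunny", "temperature": "cool",
             "humidity": "high", "windy": "TRUE"}
probs = naive_bayes_scores(df, "play", test_case)
print(probs)
```

With a real data file, `pd.read_csv` would replace the hand-built DataFrame. The same conditional tables could also be built with `groupby(...).size().unstack()` or `pivot_table`.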

**Homework #1** (Decision tree induction based on entropy and information gain): Thursday, Nov. 8

- Note: If there is a tie in entropy reduction (i.e., information gain), break the tie arbitrarily.
- Purpose: Concepts about entropy and decision tree induction.
- Send in your work through Biola Canvas.
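
For reference, entropy and information gain can be computed along the following lines. This is a minimal sketch on made-up rows, not the homework data, and the function names are hypothetical. The entropy of a set of labels is the sum of -p log2 p over the class proportions p, and the information gain of an attribute is the entropy before the split minus the size-weighted average entropy of the subsets after the split.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Entropy reduction obtained by splitting rows on one attribute."""
    labels = [row[label_index] for row in rows]
    # Group the class labels by the value of the chosen attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    # Weighted average entropy of the subsets after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: (attribute A, attribute B, class label).
rows = [
    ("x", "p", "yes"),
    ("x", "q", "yes"),
    ("y", "p", "no"),
    ("y", "q", "no"),
]
gain_a = information_gain(rows, 0)  # A separates the classes perfectly
gain_b = information_gain(rows, 1)  # B carries no information about the class
print(gain_a, gain_b)
```

A decision tree learner would split on the attribute with the largest gain (A here) and then repeat the computation within each subset.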

*************************************************************************************************

**Week 10**. Review of Matplotlib + Supervised Learning:
Linear Model

**Reading 10**: Thursday, Nov. 15

- Finish all sections on Matplotlib described in Chapter 4 of *Python Data Science Handbook* by Jake VanderPlas.
- Carefully read Section 4.6 on linear models in Chapter 4 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account; also see the ppts of Chapter 4 here).

**Quiz #3**: Due Thursday, Nov. 15

- Create a Jupyter notebook with Python code completed as described in the 5 problems in Quiz #3, using Pandas to work on the data set in this csv file from Lab 4 and Lab 5.
- Open-book quiz, but no collaboration with others allowed.

*************************************************************************************************

**Weeks 11-12**. Supervised Learning: Linear Model + Neural Networks and Deep Learning

**Faith and Learning Integration Assignment on Creation and Computer Science** due: Tuesday, Nov. 20

- **Dr. Lin will be out of town for a conference on Nov. 20. Please use the class time for the reflection needed to do this assignment.**
- You should write down what you came up with in the reflection process according to the requirements in the assignment.
- **Submit your reflection report accordingly through Canvas.**

**Reading 11**: Thursday, Nov. 22 (submission open till Nov. 27 without penalty)

- Browse Section 6.4 on extending linear models in Chapter 6 of *Data Mining: Practical Machine Learning Tools and Techniques* (Full-text contents of the 3rd ed. available online through the Biola Library account).
- Browse an overview tutorial here from a graduate course on deep learning and explore demos of deep-learning applications here.

**Homework #2** (Linear Regression and Linear Models): Thursday, Nov. 22 (submission open till Nov. 27 without penalty)

- Purpose: Linear regression.
- Send in your work through Biola Canvas.
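
As a small illustration of the linear-regression idea, the sketch below fits a linear model by least squares with NumPy. The data are made up (generated from known weights so the result can be checked) and are not taken from the homework.

```python
import numpy as np

# Made-up training data: two numeric attributes per instance.
X = np.array([
    [1.0, 1.0],
    [2.0, 0.0],
    [3.0, 2.0],
    [4.0, 1.0],
])
# Targets generated from y = 3 + 2*x1 - 1*x2 (no noise).
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the first weight acts as the intercept.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ weights ~ y.
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the target for a new instance (x1=5, x2=3).
prediction = np.array([1.0, 5.0, 3.0]) @ weights
print(weights, prediction)
```

Because the targets here are noise-free, the fitted weights recover the generating coefficients. With real data the fit minimizes the sum of squared errors instead.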

**Reading 12**: Thursday, Nov. 29

- Read the first 4 sections in Chapter 5 on machine learning using sklearn in *Python Data Science Handbook* by Jake VanderPlas.

**Lab 6** (TBA):

*************************************************************************************************

**Links to online resources**

- About Jupyter Notebook
- Python Tutorials: *Python in a Notebook* by Valerio Maggio (GitHub repository)
- Python Tutorials: *A Whirlwind Tour of Python* by Jake VanderPlas (GitHub repository)
- *Python Data Science Handbook* by Jake VanderPlas (GitHub repository)
- NumPy QuickStart on scipy.org
- Matplotlib beginner's guide on matplotlib.org
- Pandas tutorials by Tom Augspurger (GitHub repository): Modern Pandas, Method Chaining, Indexes, Performance, Tidy Data, Visualization, Timeseries
- Python Scientific Lecture Notes on scipy-lectures.org

*************************************************************************************************
