Least-Square Linear Regression Models
In data mining, we often need to learn a model from a training data set so that we can use the model to predict the unknown value of the class attribute of a new instance, based on the known values of its other attributes. When the class attribute is nominal (i.e., there are only a finite number c of possible values for the class attribute), this is a discrete classification problem: our job is to classify the new instance into one of the c categories corresponding to the c known possible values of the class attribute.
However, when the class attribute is numeric (for example, when its value can be any real number in an interval), there are infinitely many possible values for the class attribute, and only a finite number of them can show up in the training data set. Clearly, we need to learn a model that can map new instances to any of the possible values, not only the values observed in the training data set. The task of predicting the value of such a numeric class attribute is often called a regression problem.
There are many regression algorithms for this job. This exercise is about the very simple linear regression algorithm, to give you an idea of the role regression algorithms play.
Linear Regression Model and Least-Square Solution:
· Given m training instances with n+1 numeric attributes each, let A be the m by n matrix whose rows contain the values of the first n attributes of each instance, and let b be the m by 1 vector of values of the (n+1)th attribute. Ideally, a solution x to the linear system Ax = b would provide n perfect coefficients for combining the values of the first n attributes to generate the exact value of the (n+1)th attribute. Unfortunately, Ax = b in general has no solution when m is greater than n. In other words, it is impossible to use a linear equation to exactly predict the value of the (n+1)th attribute from the values of the first n attributes. Instead, we can look for the least-square solution x, which minimizes the sum of the squares of the prediction errors over the m instances. See the next section for how to find the least-square-error solution of over-determined linear equations.
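As a quick sanity check, here is a pure-Python sketch with a tiny made-up system (m = 3 instances, n = 1 attribute; the numbers in A and b are hypothetical, not from the exercise data):

```python
# Toy over-determined system: three equations, one unknown x.
# A is a single-column matrix stored as a plain list; b is the target vector.
A = [1.0, 2.0, 3.0]
b = [1.0, 3.0, 2.0]

# No single x satisfies 1*x = 1, 2*x = 3, and 3*x = 2 at the same time:
# x would have to be 1, 1.5, and 2/3 simultaneously.
# The least-square x minimizes sum((A[i]*x - b[i])^2); for one variable
# this reduces to x = (A . b) / (A . A):
x = sum(a * y for a, y in zip(A, b)) / sum(a * a for a in A)
print(x)  # ≈ 0.9286 (= 13/14)
```

No exact solution exists, but the least-square x is the unique best compromise over all three equations.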
Finding the least-square-error solution for over-determined linear equations:
· Typically, Ax = b has no solution when m is greater than n. In other words, Ax − b is not the zero vector for any vector x. Instead of an exact solution, we can look for a least-square solution x such that the length of the error vector Ax − b is minimized; that is, we minimize the sum of the squares of the individual components of Ax − b.
· You can find the least-square solution x by solving the linear system (AᵀA)x = Aᵀb, where Aᵀ is the transpose of A.
· Note that AᵀA is an n by n matrix and Aᵀb is an n by 1 vector, so you are now solving a new linear system (AᵀA)x = Aᵀb of n equations in n variables to find the least-square solution of the original system Ax = b.
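The recipe above can be sketched in pure Python. The data in A and b below is hypothetical (four instances, two attributes); since n = 2, the 2-by-2 system (AᵀA)x = Aᵀb is solved by Cramer's rule for brevity:

```python
# Least-square solution of an over-determined system Ax = b
# via the normal equations (A^T A) x = A^T b.  Here m = 4, n = 2.
A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0],
     [1.0, 4.0]]
b = [6.0, 5.0, 7.0, 10.0]

m, n = len(A), len(A[0])
# Form the n x n matrix A^T A and the n-vector A^T b.
AtA = [[sum(A[i][r] * A[i][c] for i in range(m)) for c in range(n)]
       for r in range(n)]
Atb = [sum(A[i][r] * b[i] for i in range(m)) for r in range(n)]

# Solve the 2 x 2 system (A^T A) x = A^T b by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x0 = (Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det
x1 = (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / det
print(x0, x1)  # → 3.5 1.4
```

For larger n, Cramer's rule would be replaced by Gaussian elimination (or a library solver), but the structure is the same: build AᵀA and Aᵀb, then solve an n-by-n system.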
Questions:
        1     2     3     4     5
  H   147   155   163   173   180
  W    51    57    58    68    72
Part 1:
Part 2:
Note that a dummy attribute with constant value 1 is often artificially included in all instances of the data set, to allow extra flexibility in finding a good linear regression learner. Let's apply it to our case here to see whether it provides a better linear regression model.
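Assuming the table above gives five training instances with attributes H and W, and that the job is to predict W from H, a pure-Python sketch comparing the model without and with the dummy constant-1 attribute might look like this (the choice of predicting W from H is an assumption, not stated in the exercise):

```python
# Compare W = x1*H (no dummy attribute) against W = x0*1 + x1*H
# (dummy constant-1 attribute added), using the normal equations.
H = [147.0, 155.0, 163.0, 173.0, 180.0]
W = [51.0, 57.0, 58.0, 68.0, 72.0]
m = len(H)

# Model 1: no dummy attribute.  (A^T A) x = A^T b is a single equation.
x1_only = sum(h * w for h, w in zip(H, W)) / sum(h * h for h in H)
sse_no_dummy = sum((x1_only * h - w) ** 2 for h, w in zip(H, W))

# Model 2: dummy attribute added, so A is m x 2 and A^T A is 2 x 2.
s_h, s_w = sum(H), sum(W)
s_hh = sum(h * h for h in H)
s_hw = sum(h * w for h, w in zip(H, W))
det = m * s_hh - s_h * s_h
x0 = (s_w * s_hh - s_h * s_hw) / det
x1 = (m * s_hw - s_w * s_h) / det
sse_dummy = sum((x0 + x1 * h - w) ** 2 for h, w in zip(H, W))

print(sse_no_dummy, sse_dummy)  # the dummy attribute gives a much smaller SSE
```

On this data the sum of squared errors drops from roughly 58 to roughly 10 when the dummy attribute is included, illustrating why the constant-1 column is routinely added.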