Least-Square Linear Regression
In data mining, we often need to learn a model from a training data set so that we can use the model to predict the unknown value of the class attribute of any given new instance, based on the known values of the other attributes. When the class attribute is a nominal attribute (i.e., there are only a finite number c of possible values for the class attribute), this is a discrete classification problem: our job is to classify the new instance into one of the c categories corresponding to the c known possible values of the class attribute.
However, when the class attribute is a numeric attribute (for example, when the value can be any real number in an interval), there are infinitely many possible values for the class attribute, and the training data set can show only a finite subset of them. Obviously we need to learn a model that can classify new instances into any of the possible values, not just the values observed in the training data set. The task of predicting the value of such a numeric class attribute is often called a regression problem. There are many regression algorithms for the job. This exercise is about a very simple linear regression algorithm for prediction, to give you an idea of the role of regression algorithms.
Linear Regression Model and Least-Square Solution:
· Ideally, a solution x to the linear system Ax = b (where each of the m rows of A holds the values of the first n attributes for one training instance, and the corresponding entry of b holds that instance's value for the (n+1)-th attribute) would provide us with n perfect coefficients for combining the values of the first n attributes to generate the exact value of the (n+1)-th attribute. Unfortunately, Ax = b in general has no solution when m is greater than n. In other words, it is impossible to use a linear equation to exactly predict the value of the (n+1)-th attribute from the values of the first n attributes. Instead, we can look for the least-square solution x, which minimizes the sum of the squares of the prediction errors over the m instances. See the next section for how to find the least-square-error solution of an over-determined linear system.
Finding the Least-Square-Error Solution for Over-Determined Linear Equations:
· Typically, Ax = b has no solution when m is greater than n. In other words, Ax - b is not the zero vector for any vector x. Instead of an exact solution, we can look for a least-square solution x such that the length of the error vector Ax - b is minimized. In other words, we would like to minimize the sum of the squares of the individual components of the error vector Ax - b.
· You can find the least-square solution x by solving the linear system (A^T A) x = (A^T b), where A^T is the transpose of A.
· Note that (A^T A) is an n-by-n matrix and (A^T b) is an n-by-1 vector, so you are now solving a new linear system (A^T A) x = (A^T b) of n equations in n variables to find the least-square solution of the original system Ax = b.
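The procedure above can be sketched in a few lines of NumPy. The small 4-by-2 system below is a made-up example (not from the exercise data); it just shows that solving the normal equations (A^T A) x = (A^T b) gives the same answer as NumPy's built-in least-squares routine.

```python
import numpy as np

# Hypothetical over-determined system: m = 4 equations, n = 2 unknowns.
A = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
b = np.array([3.0, 4.0, 6.0, 8.0])

# Form and solve the normal equations (A^T A) x = (A^T b).
x = np.linalg.solve(A.T @ A, A.T @ b)

# The same least-square solution via NumPy's built-in routine, for comparison.
x_check, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x)                        # least-square coefficients
print(np.allclose(x, x_check))  # True
```

Note that in practice `np.linalg.lstsq` (which uses an SVD) is numerically safer than forming A^T A explicitly, but the normal-equations route matches the derivation in this handout.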
Questions:
Instance | H | W
1 | 147 | 51
2 | 155 | 57
3 | 163 | 58
4 | 173 | 68
5 | 180 | 72
Part 1:
Part 2:
Note that a dummy attribute with a constant value of 1 is often artificially included in every instance of the data set, to allow extra flexibility in finding a good linear regression learner. Let's apply it to our case here to see whether it yields a better linear regression model.
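The dummy-attribute idea can be sketched as follows: prepending a constant-1 column to A turns the model into W ≈ x0·1 + x1·H, i.e., a line with an intercept. The least-square solution is found from the same normal equations as before.

```python
import numpy as np

# The five (H, W) instances from the table, with a constant dummy
# attribute of 1 prepended to each instance.
H = np.array([147.0, 155.0, 163.0, 173.0, 180.0])
W = np.array([51.0, 57.0, 58.0, 68.0, 72.0])

A = np.column_stack([np.ones_like(H), H])  # 5-by-2: [1, H]

# Least-square solution via the normal equations (A^T A) x = (A^T b).
x = np.linalg.solve(A.T @ A, A.T @ W)
print(x)  # [intercept, slope]

# Sum of squared prediction errors; compare it with the error of the
# no-dummy model to see whether the extra flexibility helped.
sse = np.sum((A @ x - W) ** 2)
print(sse)
```

Since the column space of the augmented A contains that of the original one-column A, the sum of squared errors with the dummy attribute can never be worse than without it.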