When you browse standard web sources
like Wikipedia to learn about Singular
Value Decomposition or SVD you find many equations, but
not an intuitive explanation of what it is or how it works.
SVD is a way of factoring matrices into a series of linear
approximations that expose the underlying structure of the
matrix. Two important properties are that the linear
factoring is exact and optimal. Exact means that the series
of linear factors, added together, exactly equal the original
matrix. Optimal means that, for the standard means of
measuring matrix similarity (the Frobenius norm), these
factors give the best possible linear approximation at each
step in the series.
SVD is extraordinarily useful and has many applications
such as data analysis, signal processing, pattern
recognition, image compression, weather prediction, and Latent
Sematic Analysis or LSA (also referred to as Latent
Semantic Indexing). Why is SVD so useful and how does it
work?
As a simple example, let's look at golf scores. Suppose
Phil, Tiger, and Vijay play together for 9 holes and they
each make par on every hole. Their scorecard, which can also
be viewed as a (hole x player) matrix might look like
this.
| Hole |
Par |
Phil |
Tiger |
Vijay |
| 1 |
4 |
4 |
4 |
4 |
| 2 |
5 |
5 |
5 |
5 |
| 3 |
3 |
3 |
3 |
3 |
| 4 |
4 |
4 |
4 |
4 |
| 5 |
4 |
4 |
4 |
4 |
| 6 |
4 |
4 |
4 |
4 |
| 7 |
4 |
4 |
4 |
4 |
| 8 |
3 |
3 |
3 |
3 |
| 9 |
5 |
5 |
5 |
5 |
Let's look at the problem of trying to predict what score
each player will make on a given hole. One idea is give each
hole a HoleDifficulty factor, and each player a PlayerAbility
factor. The actual score is predicted by multiplying these
two factors together.
PredictedScore = HoleDifficulty * PlayerAbility
For the first attempt, let's make the HoleDifficulty be
the par score for the hole, and let's make the player ability
equal to 1. So on the first hole, which is par 4, we would
expect a player of ability 1 to get a score of 4.
PredictedScore = HoleDifficulty * PlayerAbility = 4 * 1 =
4
For our entire scorecard or matrix, all we have to do is
multiply the PlayerAbility (assumed to be 1 for all players)
by the HoleDifficulty (ranges from par 3 to par 5) and we can
exactly predict all the scores in our example.
In fact, this is the one dimensional (1-D) SVD
factorization of the scorecard. We can represent our
scorecard or matrix as the product of two vectors, the
HoleDifficulty vector and the PlayerAbility vector. To
predict any score, simply multiply the appropriate
HoleDifficulty factor by the appropriate PlayerAbility
factor. Following normal vector multiplication rules, we can
generate the matrix of scores by multiplying the
HoleDifficulty vector by the PlayerAbility vector, according
to the following equation.
|
| Phil |
Tiger |
Vijay |
| 4 |
4 |
4 |
| 5 |
5 |
5 |
| 3 |
3 |
3 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 3 |
3 |
3 |
| 5 |
5 |
5 |
|
= |
| HoleDifficulty |
| 4 |
| 5 |
| 3 |
| 4 |
| 4 |
| 4 |
| 4 |
| 3 |
| 5 |
|
* |
| PlayerAbility |
| Phil |
Tiger |
Vijay |
| 1 |
1 |
1 |
|
Mathematicians like to keep everything orderly, so the
convention is that all vectors should be scaled so they have
length 1. For example, the PlayerAbility vector is modified
so that the sum of the squares of its elements add to 1,
instead of the current 12 + 12 +
12 = 3. To do this, we have to divide each element
by the square root of 3, so that when we square it, it
becomes 1/3 and the three elements add to 1. Similarly, we
have to divide each HoleDifficulty element by the square root
of 148. The square root of 3 times the square root of 148 is
our scaling factor 21.07. The complete 1-D SVD factorization
(to 2 decimal places) is:
|
| Phil |
Tiger |
Vijay |
| 4 |
4 |
4 |
| 5 |
5 |
5 |
| 3 |
3 |
3 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 4 |
4 |
4 |
| 3 |
3 |
3 |
| 5 |
5 |
5 |
|
= |
| HoleDifficulty |
| 0.33 |
| 0.41 |
| 0.25 |
| 0.33 |
| 0.33 |
| 0.33 |
| 0.33 |
| 0.25 |
| 0.41 |
|
* |
|
* |
| PlayerAbility |
| Phil |
Tiger |
Vijay |
| 0.58 |
0.58 |
0.58 |
|
Our HoleDifficulty vector, that starts with 0.33, is
called the Left Singular Vector. The ScaleFactor is the
Singular Value, and our PlayerAbility vector, that starts
with 0.58 is the Right Singular Vector. If we represent these
3 parts exactly, and multiply them together, we get the exact
original scores. This means our matrix is a rank 1 matrix,
another way of saying it has a simple and predictable
pattern.
More complicated matrices cannot be completely predicted
just by using one set of factors as we have done. In that
case, we have to introduce a second set of factors to refine
our predictions. To do that, we subtract our predicted scores
from the actual scores, getting the residual scores. Then we
find a second set of HoleDifficulty2 and PlayerAbility2
numbers that best predict the residual scores.
Rather than guessing HoleDifficulty and PlayerAbility
factors and subtracting predicted scores, there exist
powerful algorithms than can calculate SVD factorizations for
you. Let's look at the actual scores from the first 9 holes
of the 2007 Players Championship as played by Phil, Tiger,
and Vijay.
| Hole |
Par |
Phil |
Tiger |
Vijay |
| 1 |
4 |
4 |
4 |
5 |
| 2 |
5 |
4 |
5 |
5 |
| 3 |
3 |
3 |
3 |
2 |
| 4 |
4 |
4 |
5 |
4 |
| 5 |
4 |
4 |
4 |
4 |
| 6 |
4 |
3 |
5 |
4 |
| 7 |
4 |
4 |
4 |
3 |
| 8 |
3 |
2 |
4 |
4 |
| 9 |
5 |
5 |
5 |
5 |
The 1-D SVD factorization of the scores is shown below. To
make this example easier to understand, I have incorporated
the ScaleFactor into the PlayerAbility and HoleDifficulty
vectors so we can ignore the ScaleFactor for this example.
|
| Phil |
Tiger |
Vijay |
| 3.95 |
4.64 |
4.34 |
| 4.27 |
5.02 |
4.69 |
| 2.42 |
2.85 |
2.66 |
| 3.97 |
4.67 |
4.36 |
| 3.64 |
4.28 |
4.00 |
| 3.69 |
4.33 |
4.05 |
| 3.33 |
3.92 |
3.66 |
| 3.08 |
3.63 |
3.39 |
| 4.55 |
5.35 |
5.00 |
|
= |
| HoleDifficulty |
| 4.34 |
| 4.69 |
| 2.66 |
| 4.36 |
| 4.00 |
| 4.05 |
| 3.66 |
| 3.39 |
| 5.00 |
|
* |
| PlayerAbility |
| Phil |
Tiger |
Vijay |
| 0.91 |
1.07 |
1.00 |
|
Notice that the HoleDifficulty factor is almost the
average of that hole for the 3 players. For example hole 5,
where everyone scored 4, does have a factor of 4.00. However
hole 6, where the average score is also 4, has a factor of
4.05 instead of 4.00. Similarly, the PlayerAbility is almost
the percentage of par that the player achieved, For example
Tiger shot 39 with par being 36, and 39/36 = 1.08 which is
almost his PlayerAbility factor (for these 9 holes) of
1.07.
Why don't the hole averages and par percentages exactly
match the 1-D SVD factors? The answer is that SVD further
refines those numbers in a cycle. For example, we can start
by assuming HoleDifficulty is the hole average and then ask
what PlayerAbility best matches the scores, given those
HoleDifficulty numbers? Once we have that answer we can go
back and ask what HoleDifficulty best matches the scores
given those PlayerAbility numbers? We keep iterating this way
until we converge to a set of factors that best predict the
score. SVD shortcuts this process and immediately give us the
factors that we would have converged to if we carried out the
process.
One very useful property of SVD is that it always finds
the optimal set of factors that best predict the scores,
according to the standard matrix similarity measure (the
Frobenius norm). That is, if we use SVD to find the factors
of a matrix, those are the best factors that can be found.
This optimality property means that we don't have to wonder
if a different set of numbers might predict scores better.
Now let's look at the difference between the actual scores
and our 1-D approximation. A plus difference means that the
actual score is higher than the predicted score, a minus
difference means the actual score is lower than the
prediction. For example, on the first hole Tiger got a 4 and
the predicted score was 4.64 so we get 4 - 4.64 = -0.64. In
other words, we must add -0.64 to our prediction to get the
actual score.
Once these differences have been found, we can do the same
thing again and predict these differences using the formula
HoleDifficulty2 * PlayerAbility2. Since these factors are
trying to predict the differences, they are the 2-D factors
and we have put a 2 after their names (ex. HoleDifficulty2)
to show they are the second set of factors.
|
| Phil |
Tiger |
Vijay |
| 0.05 |
-0.64 |
0.66 |
| -0.28 |
-0.02 |
0.31 |
| 0.58 |
0.15 |
-0.66 |
| 0.03 |
0.33 |
-0.36 |
| 0.36 |
-0.28 |
0.00 |
| -0.69 |
0.67 |
-0.05 |
| 0.67 |
0.08 |
-0.66 |
| -1.08 |
0.37 |
0.61 |
| 0.45 |
-0.35 |
0.00 |
|
= |
| HoleDifficulty2 |
| -0.18 |
| -0.38 |
| 0.80 |
| 0.15 |
| 0.35 |
| -0.67 |
| 0.89 |
| -1.29 |
| 0.44 |
|
* |
| PlayerAbility2 |
| Phil |
Tiger |
Vijay |
| 0.82 |
-0.20 |
-0.53 |
|
There are some interesting observations we can make about
these factors. Notice that hole 8 has the most significant
HoleDifficulty2 factor (1.29). That means that it is the
hardest hole to predict. Indeed, it was the only hole on
which none of the 3 players made par. It was especially hard
to predict because it was the most difficult hole relative to
par (HoleDifficulty - par) = (3.39 - 3) = 0.39, and yet Phil
birdied it making his score more than a stroke below his
predicted score (he scored 2 versus his predicted score of
3.08). Other holes that were hard to predict were holes 3
(0.80) and 7 (0.89) because Vijay beat Phil on those holes
even though, in general, Phil was playing better.
The full SVD for this example matrix (9 holes by 3
players) has 3 sets of factors. In general, a m x n matrix
where m >= n can have at most n factors, so our 9 x 3
matrix cannot have more than 3 sets of factors. Here is the
full SVD factorization (to two decimal places).
|
| Phil |
Tiger |
Vijay |
| 4 |
4 |
5 |
| 4 |
5 |
5 |
| 3 |
3 |
2 |
| 4 |
5 |
4 |
| 4 |
4 |
4 |
| 3 |
5 |
4 |
| 4 |
4 |
3 |
| 2 |
4 |
4 |
| 5 |
5 |
5 |
|
= |
| HoleDifficulty 1-3 |
| 4.34 |
-0.18 |
-0.90 |
| 4.69 |
-0.38 |
-0.15 |
| 2.66 |
0.80 |
0.40 |
| 4.36 |
0.15 |
0.47 |
| 4.00 |
0.35 |
-0.29 |
| 4.05 |
-0.67 |
0.68 |
| 3.66 |
0.89 |
0.33 |
| 3.39 |
-1.29 |
0.14 |
| 5.00 |
0.44 |
-0.36 |
|
* |
| PlayerAbility 1-3 |
| Phil |
Tiger |
Vijay |
| 0.91 |
1.07 |
1.00 |
| 0.82 |
-0.20 |
-0.53 |
| -0.21 |
0.76 |
-0.62 |
|
By SVD convention, the HoleDifficulty and PlayerAbility
vectors should all have length 1, so the conventional SVD
factorization is:
|
| Phil |
Tiger |
Vijay |
| 4 |
4 |
5 |
| 4 |
5 |
5 |
| 3 |
3 |
2 |
| 4 |
5 |
4 |
| 4 |
4 |
4 |
| 3 |
5 |
4 |
| 4 |
4 |
3 |
| 2 |
4 |
4 |
| 5 |
5 |
5 |
|
= |
| HoleDifficulty 1-3 |
| 0.35 |
0.09 |
-0.64 |
| 0.38 |
0.19 |
-0.10 |
| 0.22 |
-0.40 |
0.28 |
| 0.36 |
-0.08 |
0.33 |
| 0.33 |
-0.18 |
-0.20 |
| 0.33 |
0.33 |
0.48 |
| 0.30 |
-0.44 |
0.23 |
| 0.28 |
0.64 |
0.10 |
| 0.41 |
-0.22 |
-0.25 |
|
* |
| ScaleFactor 1-3 |
| 21.07 |
0 |
0 |
| 0 |
2.01 |
0 |
| 0 |
0 |
1.42 |
|
* |
| PlayerAbility 1-3 |
| Phil |
Tiger |
Vijay |
| 0.53 |
0.62 |
0.58 |
| -0.82 |
0.20 |
0.53 |
| -0.21 |
0.76 |
-0.62 |
|
We hope that you have some idea of what SVD is and how it
can be used. The next section covers applying SVD to Latent
Sematic Analysis or LSA. Although the domain is different,
the concepts are the same. We are trying to predict patterns
of how words occur in documents instead of trying to predict
patterns of how players score on holes.
|