|
Singular Value Decomposition
|
| 1. | All words that are not in the StopWords file are extracted from all results. All phrases shorter than a configurable number of words that consist entirely of non-StopWords are also extracted.
|
| 2. | The words/phrases and results become the rows and columns of a large matrix, the (word x result) matrix. This matrix has large dimensions but is very sparse. Each cell in the matrix is the number of times that word/phrase occurs in that result.
|
| 3. | The cells of the matrix are processed in three ways. Each count is adjusted by the entropy value of its word, which measures how significant the word is: a word that occurs in every result is less significant than a word that occurs in only 5 results. The log of the count is used instead of the actual count. Finally, each column (result) is normalized so that longer results do not gain an advantage from higher counts simply because they contain more words.
|
| 4. | The Singular Value Decomposition (SVD) algorithm is run on the matrix, resulting in 3 other matrices. The purpose of SVD is to discover the degree to which patterns are present in the matrix and to discover how best to represent those patterns. SVD is sometimes used in compressing data, clustering data, or in removing noise from a system by concentrating on just the most significant patterns in the data.
|
| 5. | The results and words are each mapped to points in an n-dimensional space based on the values in the 3 matrices calculated by the SVD algorithm.
|
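The five steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the product's actual implementation: the stop-word set, the function names, the `phrase_limit` parameter, the exact log-entropy formula, and the choice to scale coordinates by the singular values are all hypothetical choices made for the example.

```python
import math
import re

import numpy as np

# Hypothetical stand-in for the StopWords file.
STOP_WORDS = {"the", "a", "of", "is", "and", "in", "to", "by"}


def extract_terms(text, phrase_limit=3):
    """Step 1: words and phrases (fewer than phrase_limit words) with no stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for n in range(1, phrase_limit):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if all(t not in STOP_WORDS for t in gram):
                terms.append(" ".join(gram))
    return terms


def build_matrix(results, phrase_limit=3):
    """Step 2: (word x result) matrix of raw occurrence counts."""
    per_result = [extract_terms(r, phrase_limit) for r in results]
    vocab = sorted({t for terms in per_result for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(results)))
    for j, terms in enumerate(per_result):
        for t in terms:
            counts[index[t], j] += 1
    return vocab, counts


def log_entropy_weight(counts):
    """Step 3: log of the counts, entropy weight per word, unit-length columns.

    Assumes the common log-entropy scheme; the source does not give a formula.
    """
    n_results = counts.shape[1]
    if n_results < 2:
        raise ValueError("entropy weighting needs at least two results")
    row_totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    p = counts / row_totals
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # zero wherever p is zero
    # 1.0 for a word found in one result, 0.0 for a word spread evenly over all.
    entropy_weight = 1.0 + plogp.sum(axis=1) / math.log(n_results)
    weighted = np.log1p(counts) * entropy_weight[:, None]
    norms = np.linalg.norm(weighted, axis=0, keepdims=True)
    return weighted / np.where(norms > 0, norms, 1.0)


def project(weighted, n_dims=2):
    """Steps 4-5: run SVD, then map words and results to n_dims-dimensional points."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)  # the 3 matrices
    k = min(n_dims, len(s))
    term_points = u[:, :k] * s[:k]        # one row per word/phrase
    result_points = vt[:k, :].T * s[:k]   # one row per result
    return term_points, result_points, s


results = [
    "singular value decomposition of a sparse matrix",
    "sparse matrix storage and compression",
    "clustering search results by topic",
]
vocab, counts = build_matrix(results)
weighted = log_entropy_weight(counts)
term_points, result_points, singular_values = project(weighted, n_dims=2)
```

Keeping only the first few dimensions is what concentrates on the most significant patterns: the singular values come back in descending order, so the leading columns carry the strongest structure in the matrix. In this toy input the first two results share the terms "sparse", "matrix", and "sparse matrix", while the third shares no terms with either.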