Singular Value Decomposition

iMetaSearch maps all the search results and the extracted words/phrases to points in an n-dimensional space. The more closely associated two results or words are, the closer together they lie in that space; items that are not similar lie farther apart.

This is done by the following steps:

1. All words that are not in the StopWords file are extracted from all results. Phrases made up entirely of non-StopWords and containing fewer than a parameter-specified number of words are also extracted.
2. The words/phrases and the results become, respectively, the rows and columns of a large (word x result) matrix. This matrix has large dimensions but is very sparse. Each cell holds the number of times that word/phrase occurs in that result.
3. The cell values are then weighted. Each value is adjusted by the entropy value of its word, which measures how significant the word is (a word that occurs in every result is less significant than a word that occurs in only 5 results). The log of each count is used instead of the raw count. Finally, each column (result) is normalized so that longer results do not gain an advantage simply by containing more words.
4. The Singular Value Decomposition (SVD) algorithm is run on the matrix, factoring it into 3 matrices. The purpose of SVD is to discover the degree to which patterns are present in the matrix and how best to represent those patterns. SVD is also used for compressing data, clustering data, and removing noise from a system by concentrating on just the most significant patterns in the data.
5. The results and words are each mapped to points in an n-dimensional space based on the values in the 3 matrices computed by the SVD algorithm.
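Steps 1 to 3 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not iMetaSearch's own code: the regex tokenizer, the hypothetical `phrase_limit` parameter, the standard log-entropy weighting used in latent semantic analysis, and unit-length (L2) column normalization are all assumptions.

```python
import math
import re
from collections import Counter

import numpy as np


def extract_terms(text, stopwords, phrase_limit=3):
    """Step 1 for one result: count its non-stopwords and short phrases.

    Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    A phrase is a run of 2 or more consecutive non-stopwords with fewer
    than the hypothetical `phrase_limit` words.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    for n in range(2, phrase_limit):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if not any(t in stopwords for t in window):
                counts[" ".join(window)] += 1
    return counts


def weighted_term_result_matrix(results, stopwords, phrase_limit=3):
    """Steps 2-3: build and weight the (word x result) matrix."""
    n_results = len(results)
    if n_results < 2:
        raise ValueError("entropy weighting needs at least 2 results")
    per_result = [extract_terms(r, stopwords, phrase_limit) for r in results]
    terms = sorted(set().union(*per_result))
    index = {t: i for i, t in enumerate(terms)}

    # Step 2: raw counts; rows are words/phrases, columns are results.
    tf = np.zeros((len(terms), n_results))
    for j, counts in enumerate(per_result):
        for term, c in counts.items():
            tf[index[term], j] = c

    # Step 3a: entropy weight per word, in [0, 1]. A word concentrated in
    # one result gets 1; a word spread evenly over all results gets 0.
    p = tf / tf.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy_weight = 1.0 + plogp.sum(axis=1) / math.log(n_results)

    # Step 3b: log of the count (log(1 + count)) instead of the raw count.
    weighted = np.log1p(tf) * entropy_weight[:, None]

    # Step 3c: scale each column (result) to unit length.
    norms = np.linalg.norm(weighted, axis=0)
    norms[norms == 0] = 1.0
    return terms, weighted / norms
```

With the three results "the cat sat", "the cat ran" and "a dog barked", the word "cat" appears in two results and so receives a smaller entropy weight than "sat", which appears in only one.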

These 3 matrices from SVD provide the coordinates of every word and result in an n-dimensional space. The number of dimensions n is a parameter, chosen high enough to include the significant patterns but low enough to exclude noise.
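A minimal sketch of steps 4 and 5, assuming NumPy's `numpy.linalg.svd` and the common latent-semantic-analysis convention of scaling both word and result coordinates by the singular values. The function name `lsa_coordinates` and the scaling choice are assumptions; other tools use the square root of the singular values, or no scaling.

```python
import numpy as np


def lsa_coordinates(A, n_dims):
    """Map words (rows of A) and results (columns of A) to n-dim points.

    The thin SVD gives A = U @ diag(s) @ Vt with s sorted in descending
    order. Keeping only the first n_dims singular values keeps the
    strongest patterns and drops the weaker ones, treated here as noise.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    n = min(n_dims, len(s))
    word_coords = U[:, :n] * s[:n]         # one row per word/phrase
    result_coords = Vt[:n, :].T * s[:n]    # one row per result
    return word_coords, result_coords, s[:n]
```

Choosing `n_dims` smaller than the number of singular values is what trades detail for noise removal: each dropped dimension corresponds to a weaker pattern in the weighted matrix.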

These points can be thought of as a mapping of the results and words into a similarity space, in which closely associated words and results end up close together. When results or words are marked in iMetaSearch, their average position in the space is computed, and all documents and keywords are sorted by their distance from that average position.

Closer documents and keywords are sorted to the top, and more distant ones are sorted to the bottom. The relevance bars measure the distance from each hit or word to the average position of the marked results or words.
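The ranking step can be sketched as below. This is an illustration only: the function `rank_by_marked`, the use of Euclidean distance, and the linear mapping from distance to a 0-to-1 relevance value are assumptions, since the distance measure and the scaling of the relevance bars are not specified here.

```python
import numpy as np


def rank_by_marked(coords, labels, marked):
    """Sort items by distance from the average position of marked items.

    coords: (k, n) array with one row per item (result or word).
    labels: the k item names; marked: the names the user marked.
    Returns (label, distance, relevance) tuples, closest first.
    """
    idx = [labels.index(m) for m in marked]
    centroid = coords[idx].mean(axis=0)            # average marked position
    dist = np.linalg.norm(coords - centroid, axis=1)
    far = dist.max() or 1.0                         # avoid dividing by zero
    relevance = 1.0 - dist / far                    # hypothetical bar length
    order = np.argsort(dist, kind="stable")
    return [(labels[i], float(dist[i]), float(relevance[i])) for i in order]
```

For example, marking two items at (0, 0) and (1, 0) puts the average position at (0.5, 0), so an item at (0, 1) ranks above one at (5, 5), and the farthest item gets a relevance of 0.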