Advantages, Disadvantages, and Applications of LSA
Latent Semantic Analysis has several properties that make it applicable to a wide range of problems.
- First, the documents and words end up being mapped to the same concept space. In this space we can cluster documents, cluster words, and most importantly, see how these clusters coincide so we can retrieve documents based on words and vice versa.
- Second, the concept space has vastly fewer dimensions than the original matrix. Moreover, these dimensions are chosen specifically because they contain the most information and the least noise. This makes the concept space well suited to running further algorithms, for example comparing different clustering algorithms.
- Last, LSA is an inherently global algorithm that looks at trends and patterns across all documents and all words, so it can find things that may not be apparent to a more locally based algorithm. It can also be combined with a more local algorithm such as nearest neighbors, and the combination can be more useful than either algorithm by itself.
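The first two points can be illustrated with a small sketch. The word list, the count matrix, and the choice of k = 2 below are made up for illustration, and scaling both sets of coordinates by the singular values is only one common convention, so treat this as a rough demonstration rather than a reference implementation.

```python
import numpy as np

# Hypothetical toy corpus: rows are words, columns are documents (raw counts).
words = ["stock", "market", "invest", "river", "water"]
A = np.array([
    [2, 1, 0, 0],   # stock
    [1, 2, 0, 0],   # market
    [1, 1, 0, 0],   # invest
    [0, 0, 2, 1],   # river
    [0, 0, 1, 2],   # water
], dtype=float)

# Full SVD, then keep only the k largest singular values (the "concepts").
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_coords = U[:, :k] * s[:k]     # one k-dimensional row per word
doc_coords = Vt[:k, :].T * s[:k]   # one k-dimensional row per document

def cosine(a, b):
    """Cosine similarity between two vectors in the concept space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words and documents now live in the same k-dimensional space,
# so a word can be compared directly with a document.
stock = word_coords[words.index("stock")]
sim_finance_doc = cosine(stock, doc_coords[0])  # document 0 is about finance
sim_river_doc = cosine(stock, doc_coords[2])    # document 2 is about rivers

print(word_coords.shape, doc_coords.shape)
print(sim_finance_doc, sim_river_doc)
```

In this toy matrix the five-dimensional word space is reduced to two dimensions, and "stock" ends up close to the finance document and essentially orthogonal to the river document.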
There are a few limitations that must be considered when deciding whether to use LSA. Some of these are:
- LSA assumes a Gaussian distribution and Frobenius norm, which may not fit all problems. The truncated SVD is the best low-rank approximation in the Frobenius-norm (least-squares) sense, which corresponds to assuming Gaussian noise in the matrix entries. Word counts in documents, however, seem to follow a Poisson distribution rather than a Gaussian one.
- LSA cannot handle polysemy (words with multiple meanings) effectively. It assumes that a given word always denotes the same concept, which causes problems for words like “bank” whose meaning depends on the context in which they appear.
- LSA depends heavily on SVD, which is computationally intensive and hard to update as new documents appear. However, recent work has led to a new efficient algorithm which can update SVD based on new documents in a theoretically exact sense.
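To make the update problem in the last point concrete, here is a minimal sketch of "folding in", a widely used approximation that projects a new document into an existing concept space without recomputing the SVD. It is not the exact update algorithm mentioned above. The matrix, the new document, and the helper name `fold_in` are all hypothetical.

```python
import numpy as np

# Hypothetical word-by-document count matrix (5 words, 4 documents).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]

def fold_in(doc_counts):
    """Project a word-count vector into the existing k-dimensional space.

    Computes d^T U_k S_k^{-1}. The concept axes themselves are not
    updated, so this is an approximation, not an exact SVD update.
    """
    return doc_counts @ U_k / s_k

# Sanity check: folding in an existing column reproduces its row of V_k.
refolded = fold_in(A[:, 0])

# A new document that only uses the first three words.
new_doc = np.array([1, 1, 1, 0, 0], dtype=float)
new_doc_coords = fold_in(new_doc)
print(refolded, Vt[:k, 0])
print(new_doc_coords)
```

Folding in is cheap, but the concept axes stay fixed, so the approximation can drift as many new documents accumulate. That drift is the reason an exact update or a periodic recomputation is attractive.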
In spite of these limitations, LSA is widely used for finding and organizing search results, grouping documents into clusters, spam filtering, speech recognition, patent searches, and automated essay evaluation, among other applications.
As an example, iMetaSearch uses LSA to map search results and words to a “concept” space. Users can then find which results are closest to which words and vice versa. The LSA results are also used to cluster search results together, so that users save time when looking for related results.
revised on January 30, 2010