Latent Semantic Analysis (LSA) Tutorial

Part 5 - Clustering by Value

As discussed, we leave out the first dimension and plot the second and third dimensions on an XY graph, with the second dimension on the X axis and the third dimension on the Y axis, placing each word and each title on the graph. It's interesting to compare this XY graph with the table we just created that clusters the documents.
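
Numerically, the points come from the SVD computed in Part 3. The sketch below assumes NumPy and a term-by-title matrix with one row per word and one column per title (for example, the TFIDF-weighted matrix from Part 2). It uses the raw singular-vector entries (rows of U for words, columns of Vt for titles) without scaling them by the singular values; that choice, the function name `lsa_xy`, and the small example matrix are assumptions for illustration, not taken from the tutorial.

```python
import numpy as np

def lsa_xy(matrix):
    """Return plot coordinates (dimension 2, dimension 3) for words and titles.

    matrix: term-by-title array, rows = words, columns = titles.
    Assumption: coordinates are unscaled singular-vector entries.
    """
    # Thin SVD: matrix = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
    word_xy = U[:, 1:3]       # skip column 0, the first dimension
    title_xy = Vt[1:3, :].T   # skip row 0; one (x, y) row per title
    return word_xy, title_xy

# Hypothetical 4-word x 3-title matrix, for illustration only.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
words, titles = lsa_xy(A)
print(words.shape, titles.shape)  # (4, 2) (3, 2)
```

Singular vectors are only determined up to sign, so a different SVD routine may produce a graph mirrored along either axis; the clusters themselves are unaffected.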

In the graph below, words are represented by red squares and titles by blue circles. For example, the word "book" has dimension values (0.15, -0.27, 0.04). We ignore the first value, 0.15, and plot "book" at position (x = -0.27, y = 0.04), as can be seen in the graph. Titles are plotted the same way.
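
The mapping from three dimension values to a point on the graph can be sketched as follows. Only the "book" values are quoted from the text; the helper names `to_xy` and `plot_points` are hypothetical, and the "s"/"o" marker codes follow matplotlib's conventions on the assumption that you would pass the points to a plotting library such as matplotlib.

```python
# Words are drawn as red squares, titles as blue circles.
# Marker codes "s" (square) and "o" (circle) follow matplotlib conventions.
STYLE = {"word": ("red", "s"), "title": ("blue", "o")}

def to_xy(coords):
    """Drop the first LSA dimension; return (x, y) = (dimension 2, dimension 3)."""
    _, x, y = coords
    return x, y

def plot_points(items):
    """items: iterable of (label, kind, (d1, d2, d3)), kind is "word" or "title".

    Returns (label, x, y, color, marker) tuples ready for plotting.
    """
    points = []
    for label, kind, coords in items:
        x, y = to_xy(coords)
        color, marker = STYLE[kind]
        points.append((label, x, y, color, marker))
    return points

# "book" has dimension values (0.15, -0.27, 0.04) according to the text.
print(plot_points([("book", "word", (0.15, -0.27, 0.04))]))
# [('book', -0.27, 0.04, 'red', 's')]
```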

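[Figure: words (red squares) and titles (blue circles) plotted by the second dimension (X axis) and the third dimension (Y axis)]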

One advantage of this technique is that words and titles are placed on the same graph. Not only can we identify clusters of titles, but we can also label each cluster by looking at which words fall in it. For example, the lower left cluster contains titles 1 and 3, which are both about stock market investing. The words "stock" and "market" are conveniently located in the same cluster, making it easy to see what the cluster is about. Another example is the middle cluster, which contains titles 2, 4, and 5 and, to a somewhat lesser extent, title 8. Titles 2, 4, and 5 are close to the words "value" and "investing", which summarize those titles quite well.
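
Labeling a cluster by its nearby words, as done visually above, can be approximated by ranking words by their distance to the centroid of the cluster's titles. This is a minimal sketch: the function `label_cluster` is hypothetical, and the coordinates in the example are made up to mimic the lower left cluster described above; they are not measured from the graph.

```python
import math

def label_cluster(title_ids, title_xy, word_xy, k=2):
    """Return the k words closest to the centroid of a cluster of titles.

    title_xy maps a title number to its (x, y) point; word_xy maps a word
    to its (x, y) point. Distance is Euclidean, in the 2-D graph space.
    """
    cx = sum(title_xy[t][0] for t in title_ids) / len(title_ids)
    cy = sum(title_xy[t][1] for t in title_ids) / len(title_ids)
    by_distance = sorted(word_xy, key=lambda w: math.dist(word_xy[w], (cx, cy)))
    return by_distance[:k]

# Hypothetical coordinates, for illustration only (not read from the graph).
title_xy = {1: (-0.4, -0.3), 3: (-0.5, -0.2)}
word_xy = {"stock": (-0.45, -0.25), "market": (-0.4, -0.2), "value": (0.1, 0.1)}
print(label_cluster([1, 3], title_xy, word_xy))  # ['stock', 'market']
```

Euclidean distance matches how the eye reads the graph; LSA applications often compare vectors with cosine similarity instead, which only requires changing the sort key.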
