| Latent Semantic Analysis (LSA) Tutorial - Part 5 - Clustering by Value |
|
|
|
|
Page 7 of 8
Part 5 - Clustering by ValueLeaving out the first dimension, as we discussed, let's graph the second and third dimensions using a XY graph. We'll put the second dimension on the X axis and the third dimension on the Y axis and graph each word and title. It's interesting to compare the XY graph with the table we just created that clusters the documents. In the graph below, words are represented by red squares and titles are represented by blue circles. For example the word "book" has dimension values (0.15, -0.27, 0.04). We ignore the first dimension value 0.15 and graph "book" to position (x = -0.27, y = 0.04) as can be seen in the graph. Titles are similarly graphed.
One advantage of this technique is that both words and titles are placed on the same graph. Not only can we identify clusters of titles, but we can also label the clusters by looking at what words are also in the cluster. For example, the lower left cluster has titles 1 and 3 which are both about stock market investing. The words "stock" and "market" are conveniently located in the cluster, making it easy to see what the cluster is about. Another example is the middle cluster which has titles 2, 4, 5, and, to a somewhat lesser extent, title 8. Titles 2, 4, and 5 are close to the words "value" and "investing" which summarizes those titles quite well.
|



