iMetaSearch Minimum Spanning Tree
Previous  Top 

Clusters in iMetaSearch are based on finding groups of similar search results. In order to find clusters, the SVD algorithm in the previous topic is run on the search. This maps results into high dimensional space with the property that similar results tend to end up close to each other. In order to turn this information into clusters the following steps are performed:

1) A minimum spanning tree is calculated for all the results in the space. By definition, this means all the results are linked together into a tree with the shortest links possible.

2) The links of the minimum spanning tree are broken one at a time, starting with the longest link and progressing down to the shortest link. Each broken link splits the group of results into two smaller groups, and the larger of the two smaller groups is always sorted to the left. Then the largest link in each of those two groups is broken in the same way and so on, until only groups of size 2 or less remain.

3) The number of clusters set by the cluster slider control determines how many links are broken, and thus how many clusters are shown. Fewer broken links mean a smaller number of larger clusters. More broken links mean a larger number of smaller clusters.

The result of this series of steps is that the results are divided into clusters. Results are initially sorted by cluster when a search is first displayed and can also be sorted by clicking the Cluster column header.

The Cluster section has a list of buttons, one for each cluster. The caption on each button has 3 index words/phrases that describe the cluster. Clicking the cluster button marks the first result of that cluster, and updates the relevance scores of all results as usual.