AAAI Publications, The Twenty-Sixth International FLAIRS Conference

Font Size: 
Cluster Analysis for Optimal Indexing
Tim Wylie, Michael A. Schuh, John W. Sheppard, Rafal A. Angryk

Last modified: 2013-05-19


High-dimensional indexing is an important area of current research, especially for range and kNN queries. This work introduces clustering for the sake of indexing. The goal is to develop new clustering methods designed to optimize the data partitioning for an indexing-specific tree structure instead of finding data distribution-based clusters. We focus on iDistance, a state-of-the-art high-dimensional indexing method, and take a basic approach to solving this new problem. By utilizing spherical clusters in an unsupervised Expectation Maximization algorithm dependent upon local density and cluster overlap, we create a partitioning of the space providing balanced segmentation for a B+-tree. We also look at the novel idea of reclustering for a specific indexing method by taking the output of one clustering method and reclustering it for use in an index. The algorithms are then tested and evaluated based on our error metric and iDistance query performance.


iDistance; indexing; expectation maximization; query optimization

Full Text: PDF