Hui Wang and Ivo Duntsch and David Bell
Data reduction makes datasets smaller while preserving the classification structures of interest. In this paper we present a novel approach to data reduction based on lattices and hyper relations. Hyper relations generalize conventional database relations by allowing sets of values as tuple entries. The advantage is that raw data and reduced data can both be represented by hyper relations. The collection of hyper relations can be naturally made into a complete Boolean algebra, so any collection of hyper tuples has a unique least upper bound (lub), which is a candidate reduction of that collection. We show that the lub may not qualify as a reduced version of the given set of tuples, but that the interior cover, the subset of internal elements covered by the lub, does qualify. We establish the theoretical result that such an interior cover exists and give a method for finding it. The proposed method was evaluated on seven real-world datasets. The results compared well with those obtained by C4.5, and the datasets were reduced with reduction ratios of up to 99 percent.
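As an informal illustration of hyper tuples and their lub, the following minimal Python sketch represents a hyper tuple as a tuple of sets, one per attribute. It assumes the ordering is component-wise set inclusion, so the lub is the component-wise union. That ordering, the helper names (`lift`, `join`, `leq`, `lub`), and the toy data are assumptions made for this sketch, not taken from the paper, and the interior cover construction is not reproduced.

```python
from functools import reduce

def lift(row):
    """Turn a conventional tuple into a hyper tuple of singleton sets."""
    return tuple(frozenset([value]) for value in row)

def join(s, t):
    """Least upper bound of two hyper tuples: component-wise union (assumed ordering)."""
    return tuple(a | b for a, b in zip(s, t))

def leq(s, t):
    """s is below t when every entry of s is a subset of the matching entry of t."""
    return all(a <= b for a, b in zip(s, t))

def lub(hyper_tuples):
    """Least upper bound of a non-empty collection of hyper tuples."""
    return reduce(join, hyper_tuples)

# Hypothetical toy data: two attributes (colour, size).
raw = [("red", "small"), ("red", "large"), ("blue", "small")]
hyper = [lift(row) for row in raw]
top = lub(hyper)

# top lies above every raw tuple, but it also lies above the
# combination ("blue", "large"), which never occurs in the raw data.
print(top)
print(leq(lift(("blue", "large")), top))
```

In this toy case a single hyper tuple replaces three raw tuples, yet it also covers a combination that never appeared in the data. This gives one informal intuition for why the lub alone can be too coarse to serve as a reduction.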