Lili He and Ian R. Greenshields
Data analysis is critical to many (if not all) Homeland Security missions. Data to be fielded in this domain is typically immense in cardinality (number of records) and immense in dimensionality (features per record). Random Projections (RP) have been proposed as an effective technique for embedding a metric space of dimension d into one of dimension k while retaining bounds on the distortion of the metric under embedding. More recently, Achlioptas has shown from work founded on the Johnson-Lindenstrauss lemma on embeddings that a sparse random matrix could give additional computational savings in random projection, and he also gives a theorical embedding bound k0. However, empirical evidence shows that one can choose k << k0 and yet retain low distortion in the metric. In the paper we propose an experimental approach to find the optimum lower bound of k0 for any distribution models in terms of classification.