Rick L. Riolo and Mark P. Line
Under NASA’s new Earth Observing System (EOS), satellite imagery is expected to arrive back on Earth at rates of gigabytes per day. Techniques for extracting useful information from such massive data streams must be efficient and scalable if they are to remain usable with petabyte-scale archives, and they must overcome the opacity inherent in the data by classifying or estimating pixels according to user-specified categories such as crop type or forest health. We are applying genetic programming (GP) to several related satellite remote sensing (RS) classification and estimation problems in a way intended to surmount the usual obstacles to large-scale exploitation of imagery. The fitness functions used for training are based on how well the discovered programs perform on a set of cases from Landsat Thematic Mapper (TM) imagery. Programs are then rated on how well they perform on out-of-training-set samples of cases from the same imagery. We have carried out a number of preliminary experiments on a relatively simple binary classification task. Each case is a set of 7 spectral intensity readings for a pixel and an associated ground truth class: 1 for surface water, 0 for none. The GP system very rapidly discovers simple relations that correctly classify 98% or more of the cases in both the training and testing data sets. The key problem with the results observed so far is that these simple solutions rapidly drive out diversity in the population. In further study we will try several approaches to maintaining diversity in the population.
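The case format and the training/testing split described above can be illustrated with a minimal sketch. The abstract does not give the exact fitness function, so plain classification accuracy is assumed here. The names `classify`, `accuracy`, and `candidate`, the band ordering, the thresholding rule, and all pixel values are hypothetical and are not taken from the experiments.

```python
from typing import Callable, Sequence, Tuple

# A case pairs 7 spectral intensity readings with a ground truth class:
# 1 for surface water, 0 for none.
Case = Tuple[Sequence[float], int]
# A candidate program maps the 7 readings to a real-valued score.
Program = Callable[[Sequence[float]], float]


def classify(program: Program, bands: Sequence[float]) -> int:
    """Assumed decision rule: a positive score means class 1 (water)."""
    return 1 if program(bands) > 0 else 0


def accuracy(program: Program, cases: Sequence[Case]) -> float:
    """Fraction of cases the program classifies correctly (assumed fitness)."""
    if not cases:
        raise ValueError("at least one case is required")
    correct = sum(1 for bands, label in cases if classify(program, bands) == label)
    return correct / len(cases)


def candidate(bands: Sequence[float]) -> float:
    """Hypothetical 'simple relation' of the kind GP might discover.

    Assumes readings are ordered band 1..7, so index 2 is band 3 and
    index 3 is band 4. This is an illustration, not a reported result.
    """
    return bands[2] - bands[3]


# Made-up pixel values, for illustration only.
training_cases = [
    ([60, 25, 20, 10, 5, 120, 3], 1),
    ([70, 30, 28, 90, 80, 130, 40], 0),
    ([65, 27, 22, 12, 6, 118, 4], 1),
    ([72, 33, 35, 30, 60, 135, 35], 0),  # misclassified by `candidate`
]
held_out_cases = [
    ([62, 26, 21, 11, 5, 119, 3], 1),
    ([75, 35, 30, 95, 85, 128, 42], 0),
]

train_fitness = accuracy(candidate, training_cases)  # used during training
test_score = accuracy(candidate, held_out_cases)     # out-of-training-set rating
print(f"training accuracy: {train_fitness:.2f}, held-out accuracy: {test_score:.2f}")
```

Keeping the held-out cases separate from the training cases mirrors the distinction drawn above between the fitness used during training and the rating of programs on out-of-training-set samples.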