From Digitized Images to Online Catalogs: Data Mining a Sky Survey

  • Usama M. Fayyad
  • S. G. Djorgovski
  • Nicholas Weir


The value of scientific digital-image libraries seldom lies in the pixels of the images themselves. For large collections of images, such as those resulting from astronomy sky surveys, the typical useful product is an online database cataloging entries of interest. We focus on the automation of the cataloging effort of a major sky survey and the availability of digital libraries in general. The SKICAT system automates the reduction and analysis of three terabytes of images, expected to contain on the order of 2 billion sky objects. The primary scientific analysis of these data requires detecting, measuring, and classifying every sky object. SKICAT integrates techniques for image processing, classification learning, database management, and visualization. The learning algorithms are trained to classify the detected objects and can classify objects too faint for visual classification with an accuracy level exceeding 90 percent. This accuracy level increases the number of classified objects in the final catalog threefold relative to the best results from digitized photographic sky surveys to date. Hence, learning algorithms played a powerful, enabling role and solved a difficult, scientifically significant problem, making possible the consistent, accurate classification of an otherwise unfathomable data set and easing access to and analysis of it.