Padhraic Smyth, Michael Burl, Usama Fayyad, and Pietro Perona
This paper discusses the problem of knowledge discovery in image databases, with particular focus on the issues that arise when absolute ground truth is not available. The problem of searching the Magellan image data set to automatically locate and catalog small volcanoes on the planet Venus is used as a case study. In the absence of calibrated ground truth, planetary scientists provide subjective estimates of ground truth based on visual inspection of Magellan images. The paper discusses issues that arise in the elicitation of subjective probabilistic opinion, learning from probabilistic labels, and effective evaluation of both scientist and algorithm performance in the absence of ground truth. Data from the Magellan volcano detection project is used to illustrate the various techniques that we have developed to handle these issues. The primary conclusion of the paper is that knowledge discovery methodologies can be modified to handle the lack of absolute ground truth, provided that the sources of uncertainty in the data are carefully handled.