Chuck P. Lam and David G. Stork
One of the most resource-intensive tasks in building a pattern recognition system is data collection, specifically the acquisition of sample labels from subject experts. The first part of this paper explores an EM algorithm for training classifiers using labelers of varying reliability. Exploiting unreliable labelers opens up the possibility of assigning multiple labelers to judge the same sample. The second part of this paper examines an optimal strategy for assigning labelers to samples so as to maximize the information given to the learning system. The optimal labeling strategy for the idealized case of two labelers and two samples is examined and illustrated.
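To make the idea of learning from labelers of differing reliability concrete, the following is a minimal sketch of one common EM formulation, not necessarily the algorithm developed in this paper. It assumes binary labels, a single unknown accuracy per labeler (symmetric noise), and no feature-based classifier; the function name `em_labeler_reliability` and its parameters are hypothetical and chosen for illustration only.

```python
import math


def em_labeler_reliability(votes, n_iter=50):
    """Illustrative EM for binary labels from noisy labelers (assumed model).

    votes[i][j] is labeler j's label (0 or 1) for sample i, or None if
    labeler j did not judge sample i.
    Returns (q, acc) where q[i] estimates P(true label of sample i is 1)
    and acc[j] estimates the probability that labeler j is correct.
    """
    n, m = len(votes), len(votes[0])
    eps = 1e-6

    # Initialise the posteriors with a smoothed fraction of 1-votes.
    q = []
    for row in votes:
        seen = [v for v in row if v is not None]
        q.append((sum(seen) + 0.5) / (len(seen) + 1.0))
    acc = [0.5] * m

    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each labeler's accuracy
        # from the current soft labels (add-one smoothing is an assumption).
        prior = min(max(sum(q) / n, eps), 1.0 - eps)
        for j in range(m):
            agree, count = 1.0, 2.0
            for i in range(n):
                v = votes[i][j]
                if v is None:
                    continue
                agree += q[i] if v == 1 else 1.0 - q[i]
                count += 1.0
            acc[j] = min(max(agree / count, eps), 1.0 - eps)

        # E-step: posterior of each true label given votes and accuracies.
        for i in range(n):
            log_odds = math.log(prior) - math.log(1.0 - prior)
            for j in range(m):
                v = votes[i][j]
                if v is None:
                    continue
                w = math.log(acc[j]) - math.log(1.0 - acc[j])
                log_odds += w if v == 1 else -w
            log_odds = min(max(log_odds, -30.0), 30.0)  # avoid overflow
            q[i] = 1.0 / (1.0 + math.exp(-log_odds))

    return q, acc
```

Calling `em_labeler_reliability(votes)` on a list of per-sample vote lists returns soft labels that could then be used as training targets for a classifier, together with reliability estimates that weight each labeler's vote by its log-odds of being correct. Under this assumed model, a labeler with estimated accuracy near 0.5 contributes almost nothing to the posterior.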