Font Size:
Improving Consensus Accuracy via Z-Score and Weighted Voting
Last modified: 2011-08-24
Abstract
Using supervised and unsupervised features individually or together, we (a) detect and filter out noisy workers via Z-score, and (b) weight worker votes for consensus labeling. We evaluate on noisy labels from Amazon Mechanical Turk in which workers judge Web search relevance of query/document pairs. In comparison to a majority vote baseline, results show a 6% error reduction (48.83% to 51.91%) for graded accuracy and 5% error reduction (64.88% to 68.33%) for binary accuracy.
Full Text:
PDF