AAAI Publications, Twenty-Fifth AAAI Conference on Artificial Intelligence

Font Size: 
How to Calibrate the Scores of Biased Reviewers by Quadratic Programming
Magnus Roos, Jörg Rothe, Björn Scheuermann

Last modified: 2011-08-04


Peer reviewing is the key ingredient of evaluating the quality of scientific work. Based on the review scores assigned by the individual reviewers to the submissions, program committees of conferences and journal editors decide which papers to accept for publication and which to reject. However, some reviewers may be more rigorous than others, they may be biased one way or the other, and they often have highly subjective preferences over the papers they review. Moreover, each reviewer usually has only a very local view, as he or she evaluates only a small fraction of the submissions. Despite all these shortcomings, the review scores obtained need to be aggregrated in order to globally rank all submissions and to make the acceptance/rejection decision. A common method is to simply take the average of each submission's review scores, possibly weighted by the reviewers' confidence levels. Unfortunately, the global ranking thus produced often suffers a certain unfairness, as the reviewers' biases and limitations are not taken into account. We propose a method for calibrating the scores of reviewers that are potentially biased and blindfolded by having only partial information. Our method uses a maximum likelihood estimator, which estimates both the bias of each individual reviewer and the unknown "ideal" score of each submission. This yields a quadratic program whose solution transforms the individual review scores into calibrated, globally comparable scores. We argue why our method results in a fairer and more reasonable global ranking than simply taking the average of scores. To show its usefulness, we test our method empirically using real-world data.

Full Text: PDF