The Crowd-Median Algorithm

Last modified: 2013-11-03

#### Abstract

The power of human computation is founded on the capability of humans to process qualitative information in a manner that is hard to reproduce with a computer. However, all machine learning algorithms rely on mathematical operations, such as sums, averages, and least squares, that are less suitable for human computation. This paper is an effort to combine these two aspects of data processing. We consider the problem of computing a centroid of a data set, a key component in many data-analysis applications such as clustering, using a very simple human intelligence task (HIT). In this task, the workers must choose the outlier from a set of three items. After a number of such triplets have been presented to the workers, the item chosen the fewest times as the outlier is selected as the centroid. We provide a proof that the centroid determined by this procedure is equal to the mean of a univariate normal distribution. Furthermore, as a demonstration of the viability of our method, we implement a human-computation-based variant of the k-means clustering algorithm. We present experiments where the proposed method is used to find an "average" image in a collection and to cluster images into semantic categories.
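The triplet procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the human worker is replaced by a hypothetical `simulated_outlier` function that works on one-dimensional numbers, triplets are assumed to be drawn uniformly at random, and the names `crowd_median`, `n_triplets`, and `pick_outlier` are chosen here for illustration only.

```python
import random


def simulated_outlier(triplet):
    """Hypothetical stand-in for a human worker (assumption, not from the paper).

    Given three numbers, return the position (0, 1, or 2) of the item that is
    left over once the two closest items are paired up.
    """
    a, b, c = triplet
    d_ab, d_ac, d_bc = abs(a - b), abs(a - c), abs(b - c)
    closest = min(d_ab, d_ac, d_bc)
    if closest == d_ab:
        return 2  # a and b are closest, so c is the outlier
    if closest == d_ac:
        return 1  # a and c are closest, so b is the outlier
    return 0      # b and c are closest, so a is the outlier


def crowd_median(items, n_triplets, pick_outlier, rng):
    """Return the item chosen least often as the outlier.

    Triplets are sampled uniformly at random (an assumption of this sketch);
    n_triplets should be large enough that every item appears many times.
    """
    outlier_counts = [0] * len(items)
    for _ in range(n_triplets):
        idx = rng.sample(range(len(items)), 3)
        pos = pick_outlier([items[i] for i in idx])
        outlier_counts[idx[pos]] += 1
    best = min(range(len(items)), key=outlier_counts.__getitem__)
    return items[best]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(51)]
    center = crowd_median(data, 20000, simulated_outlier, rng)
    print("crowd-median estimate:", center)
    print("sample mean:", sum(data) / len(data))
```

In a real deployment, `pick_outlier` would be replaced by a call that posts the triplet as a HIT and returns the worker's answer; the counting logic would stay the same. For data drawn from a univariate normal distribution, the selected item should lie close to the sample mean, in line with the result stated above.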

#### Keywords

human computation; crowdsourcing; algorithms; median; clustering; kmeans
