Machine Learning with Crowdsourcing: A Brief Summary of the Past Research and Future Directions

  • Victor S. Sheng University of Central Arkansas
  • Jing Zhang Nanjing University of Science and Technology


With crowdsourcing systems, labels can be obtained with low cost, which facilitates the creation of training sets for prediction model learning. However, the labels obtained from crowdsourcing are often imperfect, which brings great challenges in model learning. Since 2008, the machine learning community has noticed the great opportunities brought by crowdsourcing and has developed a large number of techniques to deal with inaccuracy, randomness, and uncertainty issues when learning with crowdsourcing. This paper summarizes the technical progress in this field during past eleven years. We focus on two fundamental issues: the data (label) quality and the prediction model quality. For data quality, we summarize ground truth inference methods and some machine learning based methods to further improve data quality. For the prediction model quality, we summarize several learning paradigms developed under the crowdsourcing scenario. Finally, we further discuss several promising future research directions to attract researchers to make contributions in crowdsourcing.

Senior Member Presentation Track: Summary Talks