AAAI Publications, Second AAAI Conference on Human Computation and Crowdsourcing

Predicting Next Label Quality: A Time-Series Model of Crowdwork
Hyun Joon Jung, Yubin Park, Matthew Lease

Last modified: 2014-09-05


Although temporal behavioral patterns can be observed in real crowd work, prior studies have typically modeled worker performance under a simplified i.i.d. assumption. To better model such temporal worker behavior, we propose a time-series label prediction model for crowd work. This latent variable model captures and summarizes past worker behavior, enabling us to better predict the quality of each worker's next label. Given the inherent uncertainty in prediction, we also investigate a decision reject option to balance the tradeoff between prediction accuracy and coverage. Results on real crowd worker data show that our model improves both the accuracy of label prediction and overall data quality.
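As a rough illustration of the accuracy versus coverage tradeoff behind a reject option, the sketch below abstains whenever the predicted probability that a worker's next label is correct falls too close to 0.5. It is not the paper's model: the function names, the `threshold` parameter, and the toy probabilities are all assumptions made for illustration, and the probabilities are simply given as inputs rather than produced by a time-series model.

```python
from typing import List, Optional, Tuple


def predict_with_reject(prob_correct: float, threshold: float) -> Optional[int]:
    """Predict 1 (label correct) or 0 (label incorrect), or None to abstain.

    Hypothetical helper: `prob_correct` is assumed to come from some upstream
    predictor of next-label quality; `threshold` is an illustrative knob.
    """
    confidence = max(prob_correct, 1.0 - prob_correct)
    if confidence < threshold:
        return None  # reject: too uncertain to commit to a prediction
    return 1 if prob_correct >= 0.5 else 0


def accuracy_and_coverage(
    probs: List[float], truths: List[int], threshold: float
) -> Tuple[float, float]:
    """Accuracy over non-rejected predictions, and the fraction not rejected."""
    decisions = [predict_with_reject(p, threshold) for p in probs]
    kept = [(d, t) for d, t in zip(decisions, truths) if d is not None]
    coverage = len(kept) / len(probs) if probs else 0.0
    accuracy = sum(d == t for d, t in kept) / len(kept) if kept else 0.0
    return accuracy, coverage


# Toy data (made up): predicted probabilities and whether each label was correct.
toy_probs = [0.9, 0.6, 0.2, 0.55]
toy_truths = [1, 0, 0, 1]
for thr in (0.5, 0.7):
    acc, cov = accuracy_and_coverage(toy_probs, toy_truths, thr)
    print(f"threshold={thr}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

On this toy input, raising the threshold discards the two least confident predictions, so coverage falls while accuracy on the remaining predictions rises.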


task routing; recommendation; time series


