Viral, Quality, and Junk Videos on YouTube: Separating Content From Noise in an Information-Rich Environment

Riley Crane, Didier Sornette

The emergence of the internet as a vehicle for news, commerce, and social activity has created a wealth of information and content. While Google and others have successfully exploited the web's static structure to identify relevance, the proliferation of user-generated content on sites like YouTube and Flickr has created a landscape in which quality is not easily identifiable. Here we show how to identify relevant content using information revealed by collective human behavior. We study the dynamics of the daily viewing activity for nearly 5 million videos on YouTube and find a ubiquitous power-law relaxation governing the timing of views. After simple filters are applied, the relaxation exponents cluster into three distinct classes, which correspond naturally to the labels of viral, quality, and junk. These results are consistent with an epidemic model on a social network containing two ingredients: a power-law distribution of waiting times between cause and action, and an epidemic cascade in which actions become the causes of future actions.
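As a rough illustration of what measuring a relaxation exponent could look like, the following Python sketch fits a straight line in log-log space to the daily views that follow a video's peak day. It assumes the post-peak activity decays as (t - t_peak)^(-p), which is one reading of "power-law relaxation"; the function name `relaxation_exponent`, the synthetic series, and the exponent 0.6 are all hypothetical, and ordinary least squares is a simple stand-in rather than the fitting procedure used by the authors.

```python
import math


def relaxation_exponent(daily_views):
    """Estimate p, assuming views(t) ~ (t - t_peak)**(-p) after the peak day.

    Illustrative only: ordinary least squares on log-log data, not the
    authors' method.
    """
    # Index of the first day on which the maximum view count occurs.
    peak = max(range(len(daily_views)), key=daily_views.__getitem__)

    # Pair the log of each post-peak lag (in days) with the log of that
    # day's views, skipping days with zero views (log undefined).
    points = [
        (math.log(t - peak), math.log(v))
        for t, v in enumerate(daily_views)
        if t > peak and v > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two post-peak days with positive views")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    # The fitted slope is -p, so negate it to report a positive exponent.
    return -sxy / sxx


# Synthetic example (made-up data): a peak followed by an exact power-law decay.
true_p = 0.6
series = [5, 8, 2000] + [1000 * lag ** -true_p for lag in range(1, 60)]
estimated_p = relaxation_exponent(series)
print(round(estimated_p, 3))
```

On real view counts the data are noisy, so the choice of fitting window and estimator matters considerably more than it does in this noise-free example.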

Subjects: 15.7 Search; 12.2 Scientific Discovery

Submitted: Jan 24, 2008

This page is copyrighted by AAAI. All rights reserved.