AAAI Publications, Sixth International AAAI Conference on Weblogs and Social Media

Hybrid Browser / Server Collection of Streaming Social Media Data for Scalable Real-Time Analysis
Lance Reagan Vick, Titus Soporan, Daniel Robert Lewis, Jane Brooks Zurn

Last modified: 2012-05-20


We present a novel approach to collecting and distributing social media data in web service projects that uses both clients and servers for real-time analysis, providing an inexpensive and scalable method of a quality not previously available. Current challenges in social data mining include vendor-enforced API limits and infrastructure costs. Our hybrid client/server approach allows data to be collected via JavaScript in browsers as well as by servers, which lets applications compute a wide range of data analytics. We describe pure client-based and pure server-based collection strategies, then demonstrate that our method has substantial advantages over both, specifically lower infrastructure requirements and more efficient API utilization. Our approach distributes the majority of data collection tasks to client web browsers while using servers to supply more complex analysis techniques. In addition, we provide details on two open-source tools that we have released to help researchers implement the approach in their own projects. We close with a use case describing a large-scale public web service project, followed by a solution built with our approach and open-source tools.
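As a rough illustration of the hybrid idea described in the abstract, the sketch below shows one way a server could merge batches gathered by browser clients with batches from its own collectors, dropping duplicates so the combined stream can feed further analysis. It is not the authors' implementation: the class name `HybridCollector`, the `submit` method, and the item fields `service` and `id` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class HybridCollector:
    """Hypothetical server-side store that merges items from any collector."""

    seen: set = field(default_factory=set)         # (service, id) keys already stored
    items: list = field(default_factory=list)      # deduplicated items, in arrival order
    by_source: dict = field(default_factory=dict)  # count of new items per collector type

    def submit(self, source: str, batch: list) -> int:
        """Store the unseen items in `batch` and return how many were new."""
        added = 0
        for item in batch:
            # Assumed item shape: {"service": ..., "id": ..., ...}
            key = (item["service"], item["id"])
            if key in self.seen:
                continue  # another client or the server already reported it
            self.seen.add(key)
            self.items.append(item)
            added += 1
        self.by_source[source] = self.by_source.get(source, 0) + added
        return added


if __name__ == "__main__":
    collector = HybridCollector()
    # Batch relayed from a browser client (assumed to arrive as parsed JSON).
    collector.submit("browser", [
        {"service": "twitter", "id": "1", "text": "first"},
        {"service": "reddit", "id": "1", "text": "second"},
    ])
    # Batch from a server-side collector that overlaps with the browser batch.
    collector.submit("server", [
        {"service": "twitter", "id": "1", "text": "first"},
        {"service": "twitter", "id": "2", "text": "third"},
    ])
    print(collector.by_source, len(collector.items))
```

Keying on the pair of service name and item identifier is a design choice for this sketch: it keeps identifiers from different services from colliding while letting many clients report the same item without inflating the stored set.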


hyve, synt, kral, sockjs, websocket, ajax, JSON, JSON-P, twitter, digg, facebook, reddit, github, google, google+, jquery, jquery-livestream, tawlk, real-time, social media, haproxy, redis, javascript, python, open source, data mining
