Mike Perkowitz, Oren Etzioni
The creation of a complex Web site is a thorny problem in user interface design. In IJCAI '97, we challenged the AI community to address this problem by creating adaptive Web sites: sites that automatically improve their organization and presentation by mining visitor access data collected in Web server logs. In this paper we introduce our own approach to this broad challenge. Specifically, we investigate the problem of index page synthesis: the automatic creation of pages that facilitate a visitor’s navigation of a Web site. First, we formalize this problem as a clustering problem and introduce a novel approach to clustering, which we call cluster mining: instead of attempting to partition the entire data space into disjoint clusters, we search for a small number of cohesive (and possibly overlapping) clusters. Next, we present PageGather, a cluster mining algorithm that takes Web server logs as input and outputs the contents of candidate index pages. Finally, we show experimentally that PageGather is both faster (by a factor of three) and more effective than traditional clustering algorithms on this task. Our experiment relies on access logs collected over a month from an actual Web site.
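To make the idea of cluster mining concrete, the following is a minimal sketch and not the PageGather algorithm itself, whose steps are not spelled out in this abstract. It assumes that the server log has already been split into per-visitor sessions (lists of page URLs), that two pages are related when they co-occur in enough sessions, and that a cluster is a connected group of related pages. The function name `mine_clusters` and the parameters `min_cooccur`, `min_size`, and `max_clusters` are hypothetical, chosen only for illustration. For simplicity the sketch returns disjoint groups, whereas cluster mining as described above also allows overlapping clusters.

```python
from collections import defaultdict
from itertools import combinations


def mine_clusters(sessions, min_cooccur=2, min_size=3, max_clusters=5):
    """Return up to max_clusters groups of pages that are often visited together.

    Illustrative assumption: relatedness is measured by session co-occurrence.
    """
    # Count how many sessions contain each unordered pair of pages.
    pair_counts = defaultdict(int)
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            pair_counts[(a, b)] += 1

    # Keep only strongly related pairs as edges of an undirected graph.
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)

    # Collect connected components with an iterative depth-first search.
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(graph[page] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))

    # Report only the largest few clusters; pages outside them stay unclustered.
    clusters.sort(key=lambda c: (-len(c), c))
    return clusters[:max_clusters]


# Hypothetical sessions, invented for this example.
sessions = [
    ["/a", "/b", "/c"],
    ["/a", "/b", "/c", "/x"],
    ["/b", "/c", "/a"],
    ["/x", "/y"],
    ["/z"],
]
print(mine_clusters(sessions))
```

In this toy input, `/x`, `/y`, and `/z` end up in no cluster at all. That is the contrast with traditional clustering drawn above: the goal is a few cohesive groups that could each seed an index page, not a partition that assigns every page somewhere.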