In the context of web personalization and dynamic content recommendation, it is crucial to learn typical user profiles. Although there exist several approaches to mining user profiles (such as association rule or sequential pattern extraction), this paper focuses on the application of relational clustering algorithms to web usage data in order to characterize user access profiles. These methods rely on the definition of a distance (or dissimilarity) measure between user sessions and can thus carry more information (content, sequence of page views, context of navigation) than simple transactions. Moreover, as web user sessions are often noisy, uncertain, or inaccurate (because of proxy web servers, local browser caches, and session-building heuristics), we propose to use two clustering algorithms: the leader Ant clustering algorithm, which is inspired by the chemical recognition system of ants, and a new variant of the fuzzy C Medoids. The paper also describes the similarity measures used to compare these algorithms with the traditional fuzzy C Medoids on real web usage data sets from French museums. The evaluation is conducted according to the quality of the output partitions and the interpretability of each cluster based on its content.
Subjects: 12. Machine Learning and Discovery; 6. Computer-Human Interaction
Submitted: May 10, 2007