Yi Wu, Belle L. Tseng
In this paper, we propose an architecture for a weblog data mining system. Our objective is to let users interactively understand the blogspace through a system framework for retrieving relevant weblogs and obtaining highlighted information from them. We focus on two important technical components of the system. The first is weblog ranking. We introduce weighted link-based weblog ranking, which ranks the popularity of weblogs by weighting citation links according to the semantic content of the entries and the time delay of each citation. Furthermore, our weblog ranking algorithms can rank weblogs both by their different roles in the society and by end users' different ranking interests. The second component is hot story summarization. A hot story is a discussion that attracts the attention of many weblogs. Influential bloggers help identify hot conversations because they are likely to lead such conversations. We therefore propose a method that first discovers the weblogs that play important roles in the society and then extracts hot stories from these important weblogs.
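The abstract does not give the ranking formula, so the following is only a minimal sketch of what a weighted link-based ranking could look like. It assumes a PageRank-style power iteration in which each citation edge is weighted by a hypothetical semantic-similarity score (in [0, 1]) multiplied by an exponential decay in the citation delay. The names `edge_weight`, `weighted_rank`, `decay`, and `damping`, and the choice of exponential decay, are illustrative assumptions rather than the authors' method.

```python
import math


def edge_weight(similarity, delay_days, decay=0.1):
    # Hypothetical weighting: semantic similarity between the citing and cited
    # entries, discounted exponentially by how long the citation took to appear.
    return similarity * math.exp(-decay * delay_days)


def weighted_rank(citations, damping=0.85, iterations=100, tol=1e-10):
    """PageRank-style ranking over a weighted citation graph (a sketch).

    citations: list of (citing_blog, cited_blog, similarity, delay_days).
    Returns a dict mapping each blog to a score; scores sum to 1.
    """
    nodes = sorted({blog for c in citations for blog in c[:2]})
    n = len(nodes)
    out_weight = {b: 0.0 for b in nodes}
    incoming = {b: [] for b in nodes}
    for src, dst, sim, delay in citations:
        w = edge_weight(sim, delay)
        if w > 0:  # skip zero-weight links so out_weight is never divided by 0
            out_weight[src] += w
            incoming[dst].append((src, w))

    rank = {b: 1.0 / n for b in nodes}
    for _ in range(iterations):
        # Blogs with no outgoing weight spread their score uniformly.
        dangling = sum(rank[b] for b in nodes if out_weight[b] == 0)
        new = {}
        for b in nodes:
            # Each citing blog passes on score in proportion to the link's weight.
            received = sum(rank[src] * w / out_weight[src] for src, w in incoming[b])
            new[b] = (1 - damping) / n + damping * (received + dangling / n)
        diff = sum(abs(new[b] - rank[b]) for b in nodes)
        rank = new
        if diff < tol:
            break
    return rank


if __name__ == "__main__":
    example = [
        ("A", "C", 0.9, 1),   # A cites C quickly, with highly similar content
        ("B", "C", 0.8, 2),
        ("C", "A", 0.5, 10),
        ("B", "A", 0.2, 30),  # slow, weakly related citation: small weight
    ]
    # C ranks highest, then A, then B (B is never cited).
    print(weighted_rank(example))
```

Under these assumptions, a prompt citation from a blog with closely related content passes on more score than a late or off-topic one, which is one way to reflect both entry semantics and citation delay in the ranking. Adjusting the hypothetical `decay` parameter or the similarity measure is one place where user-specific ranking interests could be plugged in.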