Discovering Trends in Text Databases

Brian Lent, Rakesh Agrawal, Ramakrishnan Srikant

We address the problem of discovering trends in text databases. Trends can be used, for example, to discover that a company is shifting interests from one domain to another. We are given a database D of documents. Each document consists of one or more text fields and a timestamp. The unit of text is a word and a phrase is a list of words. (We defer the discussion of more complex structures till the "Methodology" section.) Associates with each phrase is a history of the frequency of occurrence of the phrase, obtained by partitioning the documents based upon their timestamps. The frequency of occurrence in a particular time period is the number of documents that contain the phrase. (Other measures of frequency are possible, e.g. counting each occurrence of the phrase in a document.) A trend is a specific subsequence of the history of a phrase that satisfies the users’ query over the histories. For example, the user may specify a "spike" query to finds those phrases whose frequency of occurrence increased and then decreased.


This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.