Sandip Debnath, Prasenjit Mitra, and C. Lee Giles, The Pennsylvania State University
An event without a timeline carries little information. A description of an event is useful only when it can be augmented with the time of its occurrence, and this matters all the more with the online publishing of news articles. A news article is essentially a set of text-based descriptions of events, so the timeline of the article as a whole, as well as that of each individual event, is a key ingredient of its informativeness. We introduce a novel approach that finds the actual timelines of news articles, whenever available, and tags the articles with this temporal information. The approach requires a temporal baseline to be established for the entire article. We define the temporal baseline as the date (and possibly time) at which the article was first published, as stated in the article itself. Without a precise and correct temporal baseline, no further processing of individual events is possible. We address the problem of accurately finding the temporal baseline with a support-vector-based classification method. We find that a proper choice of training parameters for the support vector classifier can result in high accuracy. We describe the data collection, training, and testing phases and report the accuracy of our method on news articles from 26 different Web sites. These results indicate that our approach can find the temporal baseline of a news article with high accuracy.