Estimating the rate at which Web pages are updated helps improve a Web crawler's scheduling policy. However, most Web sources are autonomous and updated independently, so clients such as Web crawlers do not know when or how often the sources change. Unlike other studies, we model the process of Web page updates as a non-homogeneous Poisson process and focus on determining the localized rate of updates. We then discuss various rate estimators and show experimentally how precise they are. This paper explores two classes of problems. First, the localized rate of updates is estimated by dividing the given sequence of independent and inconsistent update points into consistent windows. In experimental comparisons, the proposed Weibull estimator outperforms the Duane plot (another proposed estimator) and the estimators proposed by Cho et al. and Norman Matloff in 91.5% (90.6%) of all windows for the synthetic (real Web) datasets. Second, future update points are predicted from the most recent window, and the Weibull estimator again achieves higher precision than the other estimators.
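The abstract does not give the Weibull estimator's formulas. Assuming it refers to the standard Weibull (power-law) intensity for a non-homogeneous Poisson process, with expected cumulative updates Λ(t) = λ·t^β, a minimal sketch of fitting one window by maximum likelihood and predicting the next update point could look like the following. The function names, the time-truncated likelihood, and the rule "the next update is the time at which one more update is expected" are illustrative assumptions, not the paper's confirmed method; how the consistent windows are chosen is not shown.

```python
import math
from typing import Sequence, Tuple


def fit_power_law_nhpp(times: Sequence[float], T: float) -> Tuple[float, float]:
    """Hypothetical helper: MLE for Lambda(t) = lam * t**beta.

    Assumes update times are measured from the window start and the
    window is observed over (0, T] (time-truncated observation).
    """
    n = len(times)
    if n == 0:
        raise ValueError("need at least one update in the window")
    if any(t <= 0 or t > T for t in times):
        raise ValueError("update times must lie in (0, T]")
    s = sum(math.log(T / t) for t in times)
    if s == 0:
        raise ValueError("degenerate window: every update occurs at T")
    beta = n / s          # shape: beta > 1 means the update rate is increasing
    lam = n / T ** beta   # scale chosen so that Lambda(T) equals n
    return lam, beta


def intensity(lam: float, beta: float, t: float) -> float:
    """Localized update rate u(t) = dLambda/dt = lam * beta * t**(beta - 1)."""
    return lam * beta * t ** (beta - 1)


def predict_next_update(lam: float, beta: float, T: float) -> float:
    """Assumed rule: the time t > T where Lambda(t) - Lambda(T) = 1."""
    return ((lam * T ** beta + 1) / lam) ** (1 / beta)


if __name__ == "__main__":
    # Hypothetical window: update times in hours since the window start.
    window = [0.5, 1.4, 2.1, 2.6, 3.0, 3.3]
    lam, beta = fit_power_law_nhpp(window, T=3.5)
    print(f"beta={beta:.3f}, rate at T={intensity(lam, beta, 3.5):.3f}/h")
    print(f"predicted next update at t={predict_next_update(lam, beta, 3.5):.3f} h")
```

Because the fitted λ makes Λ(T) equal to the observed count n, the predicted point simplifies to T·((n+1)/n)^(1/β). A window whose updates cluster near its end yields β > 1 (an accelerating rate), while early clustering yields β < 1.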
Subjects: 15.8 Simulation; 12.2 Scientific Discovery
Submitted: Oct 13, 2006