Jude Shavlik and Jeremy Goecks
For intelligent Web browsers that attempt to learn a user’s interests, the cost of obtaining labeled training instances can be prohibitive: the user must label each instance directly, and few users are willing to do so. We have been developing an approach that circumvents the need for human-labeled pages. Instead, we learn "surrogate" tasks whose desired outputs are already measured by modern operating systems and Web browsers, such as the number of hyperlinks clicked on a page, the amount of scrolling performed, and the number of CPU cycles used. Our assumption is that some weighted combination of these easily obtained measurements will correlate highly with the user’s interests. In other words, by unobtrusively "observing" the user’s behavior, we can automatically construct labeled training examples and use them to learn functions that estimate the user’s interest in a Web page. We report the results of a pilot study.
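
As a rough illustration of the idea, the sketch below turns one observed page visit into a training example without asking the user for a label. It is a minimal sketch under stated assumptions, not the implementation used in the pilot study: the `PageVisit` record, the bag-of-words features, the assumption that each measurement has been pre-scaled to [0, 1], and the weights in `HYPOTHETICAL_WEIGHTS` are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PageVisit:
    """One observed visit (hypothetical record; fields assumed pre-scaled to [0, 1])."""
    text: str             # words on the page
    links_clicked: float  # hyperlinks clicked on the page
    scrolling: float      # amount of scrolling performed
    cpu_cycles: float     # CPU cycles used while the page was open


def make_training_example(visit):
    """Build (features, targets) from observed behavior; no human label is needed."""
    # Input features: a simple bag of words (an assumed representation).
    features = Counter(visit.text.lower().split())
    # Surrogate outputs: measurements the browser and OS already record.
    targets = {
        "links_clicked": visit.links_clicked,
        "scrolling": visit.scrolling,
        "cpu_cycles": visit.cpu_cycles,
    }
    return features, targets


# Illustrative weights only; the source does not give values for the combination.
HYPOTHETICAL_WEIGHTS = {"links_clicked": 0.5, "scrolling": 0.3, "cpu_cycles": 0.2}


def interest_score(targets, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted combination of the surrogate measurements."""
    return sum(weights[name] * targets[name] for name in weights)
```

A learner would be trained to predict `targets` from `features`. For a new page, the predicted measurements could then be passed to `interest_score` to estimate the user's interest before the page is visited.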