Ray Liere and Prasad Tadepalli
With the advent of large distributed and dynamic document collections (such as are on the World Wide Web), it is becoming increasingly important to automate the task of text categorization. The use of machine learning in text categorization is difficult due to characteristics of the domain, including a very large number of input features, noise, and the problems associated with semantic analysis of text. As a result, the use of supervised learning requires a relatively large number of labeled examples. We explore the possibility of using (almost) unsupervised learning and propose some novel approaches to using machine learning in this domain.