Wee Sun Lee and Bing Liu
The problem of learning with positive and unlabeled examples arises frequently in retrieval applications. We transform the problem into a problem of learning with noise by labeling all unlabeled examples as negative and use a linear function to learn from the noisy examples. To learn a linear function with noise, we perform logistic regression after weighting the examples to handle noise rates of greater than a half. With appropriate regularization, the cost function of the logistic regression problem is convex, allowing the problem to be solved efficiently. We also propose a performance measure that can be estimated from positive and unlabeled examples for evaluating retrieval performance. The measure, which is proportional to the product of precision and recall, can be used with a validation set to select regularization parameters for logistic regression. Experiments on a text classification corpus show that the methods proposed are effective.