Kazuo J. Ezawa, Steve W. Norton
Fraud and uncollectible debt are multibillion-dollar problems in the telecommunications industry. Because it is hard to know which accounts will go bad, we face the difficult knowledge-discovery task of characterizing a rare binary outcome using large amounts of noisy, high-dimensional data. Binary characterizations may be of interest but are not especially useful in this domain. Instead, proposing an action requires an estimate of the probability that a customer or a call is uncollectible. This paper addresses the discovery of predictive knowledge bearing on fraud and uncollectible debt using a supervised machine learning method that constructs Bayesian network models. The new method is able to predict rare-event outcomes and to cope with the quirks and sheer volume of the input data. The Bayesian network models it produces serve as an input module to a normative decision-support system and suggest ways to reinforce or redirect existing efforts in the problem area. We compare the performance of several conditionally independent models with the conditionally dependent models discovered by the new learning system using real-world datasets of 4-6 million records and 600-800 million bytes.
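
To make the distinction between a binary label and a probability estimate concrete, the following is a minimal sketch of a conditionally independent (naive Bayes) model of the kind used here as a baseline, followed by a simple expected-cost rule. It is not the learning system described in this paper. The feature names, the toy counts, the Laplace smoothing constant `alpha`, and the functions `fit_naive_bayes`, `predict_proba`, and `should_intervene` are all assumptions introduced only for illustration.

```python
from collections import defaultdict

def fit_naive_bayes(records, labels, alpha=1.0):
    """Count classes and (feature, value, class) triples.

    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)    # (feature index, value, class) -> count
    feature_values = defaultdict(set)  # feature index -> observed values
    for x, y in zip(records, labels):
        class_counts[y] += 1
        for i, v in enumerate(x):
            value_counts[(i, v, y)] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values, alpha

def predict_proba(model, x, positive=1):
    """Return P(class == positive | x), treating features as
    conditionally independent given the class."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # class prior
        for i, v in enumerate(x):
            k = len(feature_values[i])  # number of distinct values seen
            p *= (value_counts.get((i, v, c), 0) + alpha) / (n_c + alpha * k)
        scores[c] = p
    return scores[positive] / sum(scores.values())

def should_intervene(p_bad, loss_if_bad, cost_of_action):
    """Hypothetical decision rule: act when the expected loss avoided
    exceeds the cost of acting."""
    return p_bad * loss_if_bad > cost_of_action

# Toy data (hypothetical): features are (international_call, new_account);
# label 1 = uncollectible, which is rare (5 of 100 records).
records = [(0, 0)] * 80 + [(1, 0)] * 10 + [(0, 1)] * 5 + [(1, 1)] * 4 + [(0, 0)]
labels = [0] * 95 + [1] * 5

model = fit_naive_bayes(records, labels)
p_high = predict_proba(model, (1, 1))
p_low = predict_proba(model, (0, 0))
print(f"P(uncollectible | international, new account) = {p_high:.3f}")
print(f"P(uncollectible | domestic, established)      = {p_low:.3f}")
print(should_intervene(p_high, loss_if_bad=100.0, cost_of_action=10.0))
```

A hard 0/1 label would hide how far apart these two cases are, whereas the probability can be weighed directly against the cost of acting. A conditionally dependent model would replace the per-feature product with factors that condition on other features as well as on the class.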