Bing Liu, Minqing Hu, and Wynne Hsu, National University of Singapore
Producing too many rules is a major problem with many data mining techniques. This paper argues that a key reason for the large number of rules is the use of an inefficient knowledge representation scheme. The predominant representation of discovered knowledge today is if-then rules. This representation often severely fragments the knowledge that exists in the data, which results in a large number of rules. The fragmentation also makes the discovered rules hard to understand and to use. In this paper, we propose a more efficient representation scheme, called general rules and exceptions. In this representation, a unit of knowledge consists of a single general rule and a set of exceptions. This scheme substantially reduces the complexity of the discovered knowledge. It is also intuitive and easy to understand. This paper focuses on using the representation to express the knowledge embedded in a decision tree. We present an algorithm that converts a decision tree to the new representation. Experimental results show that the new representation dramatically simplifies the decision tree. Real-life applications also confirm that this representation is more intuitive to human users.
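To make the idea of "a single general rule and a set of exceptions" concrete, the following is a minimal Python sketch of how one such unit of knowledge might be stored and applied. It is an illustration only, not the paper's conversion algorithm: the class name `Rule`, its fields, the choice to let exceptions carry their own nested exceptions, and the toy weather attributes are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical unit of knowledge: a general rule plus its exceptions."""
    conditions: Dict[str, str]   # attribute -> required value
    label: str                   # class predicted when the rule applies
    exceptions: List["Rule"] = field(default_factory=list)

    def matches(self, record: Dict[str, str]) -> bool:
        # A rule covers a record when every condition is satisfied.
        return all(record.get(a) == v for a, v in self.conditions.items())

    def classify(self, record: Dict[str, str]) -> Optional[str]:
        # Returns None when the general rule does not cover the record.
        if not self.matches(record):
            return None
        # An exception that covers the record overrides the general rule.
        for exc in self.exceptions:
            result = exc.classify(record)
            if result is not None:
                return result
        return self.label


# Toy data invented for illustration (not taken from the paper).
unit = Rule(
    conditions={"outlook": "sunny"},
    label="play",
    exceptions=[Rule(conditions={"humidity": "high"}, label="no_play")],
)

print(unit.classify({"outlook": "sunny", "humidity": "normal"}))
print(unit.classify({"outlook": "sunny", "humidity": "high"}))
print(unit.classify({"outlook": "rain", "humidity": "high"}))
```

In this sketch, one general rule with one exception stands in for what a flat if-then representation would express as separate rules, one per combination of attribute values. An exception inherits the conditions of its general rule, so it only needs to state what differs.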