Ping Chen, Rakesh Verma, Janet C. Meininger, Wenyaw Chan
When association mining is applied to real datasets, a major obstacle is that it often generates a huge number of rules, even with reasonable support and confidence thresholds. Many of these rules are trivial, redundant, semantically wrong, or already known to end users. Association rule post-processing aims to remove such undesired rules. Existing work focuses mainly on reducing redundant rules or finding unexpected ones. In this paper, we propose a new method based on a semantic network. We semantically divide association rules into five categories: trivial, known and correct, unknown and correct, known and incorrect, and unknown and incorrect. Our method can be efficiently integrated with existing rule-reduction techniques to construct a concise, high-quality, user-specific association rule set. We evaluate our approach on a real public-health dataset, the Heartfelt study, and prune 97.81% of the association rules as trivial or incorrect. The remaining rules are confirmed by either the health science literature or a high-quality biomedical knowledge base.
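To make the five-way categorization and the pruning step concrete, here is a minimal Python sketch built on stated assumptions. It is not the authors' algorithm, because the abstract does not give the actual criteria. The sketch assumes, hypothetically, that the semantic network supplies a semantic type for each item together with a set of type pairs it treats as plausibly related, and that the user supplies a set of trivial rule pairs and a set of already-known rule pairs. The names `categorize`, `prune`, `semantic_type`, `plausible_links`, `trivial_pairs`, `known_pairs`, and every item and type in the example are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    TRIVIAL = "trivial"
    KNOWN_CORRECT = "known and correct"
    UNKNOWN_CORRECT = "unknown and correct"
    KNOWN_INCORRECT = "known and incorrect"
    UNKNOWN_INCORRECT = "unknown and incorrect"


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str


def categorize(rule, semantic_type, plausible_links, trivial_pairs, known_pairs):
    """Assign one of the five categories using hypothetical criteria."""
    pair = (rule.antecedent, rule.consequent)
    # Assumption: triviality is a user-supplied list of rule pairs.
    if pair in trivial_pairs:
        return Category.TRIVIAL
    # Assumption: a rule is "correct" when the semantic network links
    # the semantic types of its antecedent and consequent.
    type_pair = (semantic_type[rule.antecedent], semantic_type[rule.consequent])
    correct = type_pair in plausible_links
    # Assumption: "known" means the user has already listed this rule pair.
    known = pair in known_pairs
    if correct:
        return Category.KNOWN_CORRECT if known else Category.UNKNOWN_CORRECT
    return Category.KNOWN_INCORRECT if known else Category.UNKNOWN_INCORRECT


def prune(rules, **kb):
    """Drop trivial and incorrect rules, keeping both correct categories."""
    keep = {Category.KNOWN_CORRECT, Category.UNKNOWN_CORRECT}
    return [r for r in rules if categorize(r, **kb) in keep]


# Hypothetical toy knowledge base (invented items and semantic types).
kb = dict(
    semantic_type={
        "sex_female": "Demographic",
        "pregnant": "Finding",
        "exercise_low": "Behavior",
        "weight_high": "Finding",
        "bp_high": "Finding",
    },
    plausible_links={("Behavior", "Finding"), ("Finding", "Finding")},
    trivial_pairs={("pregnant", "sex_female")},
    known_pairs={("weight_high", "bp_high")},
)

rules = [
    Rule("pregnant", "sex_female"),       # trivial
    Rule("weight_high", "bp_high"),       # known and correct
    Rule("exercise_low", "weight_high"),  # unknown and correct
    Rule("bp_high", "sex_female"),        # unknown and incorrect
]

for r in rules:
    print(r, categorize(r, **kb).value)
print(prune(rules, **kb))
```

For a user-specific rule set that shows only new findings, the `keep` set could be narrowed to `Category.UNKNOWN_CORRECT`. This variation is also an assumption and is not described in the abstract.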
Subjects: 12. Machine Learning and Discovery; 12.2 Scientific Discovery
Submitted: Feb 21, 2008