Linear-Time Rule Induction

Authors

Pedro Domingos

Abstract:

The recent emergence of data mining as a major application of machine learning has led to increased interest in fast rule induction algorithms. Such algorithms must process large numbers of examples efficiently while still achieving good accuracy. If e is the number of examples, many rule learners have O(e^4) asymptotic time complexity in noisy domains, and C4.5RULES has been empirically observed to sometimes require O(e^3). Recent advances have brought this bound down to O(e log^2 e), while maintaining accuracy at the level of C4.5RULES. In this paper we present CWS, a new algorithm with guaranteed O(e) complexity, and verify that it outperforms C4.5RULES and CN2 in time, accuracy and output size on two large datasets. For example, on NASA's space shuttle database, running time is reduced from over a month (for C4.5RULES) to a few hours, with a slight gain in accuracy. CWS is based on interleaving the induction of all the rules and evaluating performance globally instead of locally (i.e., it uses a "conquering without separating" strategy as opposed to a "separate and conquer" one). Its bias is appropriate to domains where the underlying concept is simple and the data are plentiful but noisy.
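
To give a feel for what the complexity bounds quoted above imply, the following is a minimal sketch that computes how much the work grows when the number of examples e grows tenfold under each bound. It ignores constant factors and lower-order terms, so the figures are illustrative only and are not measurements from the paper; the names `growth_factor` and `COSTS`, the base-2 logarithm, and the choice of e = 1,000 are assumptions made for this example.

```python
import math

def growth_factor(cost, e, scale=10):
    """Return cost(scale * e) / cost(e): how much work grows when data grows by `scale`."""
    return cost(scale * e) / cost(e)

# Idealized cost functions for the bounds named in the abstract (constants ignored).
COSTS = {
    "O(e^4)": lambda e: e ** 4,
    "O(e^3)": lambda e: e ** 3,
    "O(e log^2 e)": lambda e: e * math.log2(e) ** 2,
    "O(e)": lambda e: e,
}

# Growth in work when going from 1,000 to 10,000 examples.
factors = {name: growth_factor(cost, 1_000) for name, cost in COSTS.items()}

for name, factor in factors.items():
    print(f"{name:>14}: x{factor:,.1f}")
```

Under these idealized assumptions, a tenfold increase in data multiplies the work by 10,000 for the quartic bound and by 10 for the linear one, which is why the asymptotic bound matters so much on large datasets.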
