Handling Imbalanced Data Sets in Insurance Risk Modeling

Authors

Edwin P. D. Pednault, Barry K. Rosen, and Chidanand Apte

Track:

Contents


Abstract:

As owners of cars, homes, and other property, consumers buy property and casualty insurance to protect themselves against unexpected events such as accidents, fire, and theft. Such events occur very rarely at the level of individual policyholders, so data sets constructed for insurance risk modeling are highly imbalanced: in any given time period, most policyholders file no claims, a small percentage file one claim, and an even smaller percentage file two or more. This paper presents some of the tree-based learning techniques we have developed to model insurance risks. Two aspects of our approach distinguish it from other tree-based methods. First, it incorporates a split-selection criterion tailored to the specific statistical characteristics of insurance data. Second, it uses constraints on the statistical accuracy of model parameter estimates to guide the construction of splits, in order to overcome selection biases that arise from the imbalance in the data.
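The abstract does not spell out the accuracy constraints, so the following is only an illustrative sketch, not the authors' method. It assumes claim counts in a segment follow a Poisson distribution, in which case the relative standard error of the estimated claim frequency is roughly 1/sqrt(n) for n observed claims. The function names and the 10% threshold are hypothetical choices made for this example.

```python
import math


def claim_frequency(num_claims, exposure_years):
    """Poisson maximum-likelihood estimate of claims per policy-year."""
    return num_claims / exposure_years


def relative_std_error(num_claims):
    """Approximate relative standard error of a Poisson rate estimate.

    Assumption: claims are Poisson, so the error depends on the number
    of claims observed, not on the number of policy records.
    """
    if num_claims <= 0:
        return math.inf
    return 1.0 / math.sqrt(num_claims)


def split_is_admissible(left_claims, right_claims, max_rel_error=0.1):
    """Accept a candidate split only if both child segments meet the
    (hypothetical) accuracy threshold on their frequency estimates."""
    return (relative_std_error(left_claims) <= max_rel_error
            and relative_std_error(right_claims) <= max_rel_error)


# A segment can hold many records yet very few claims.
freq = claim_frequency(num_claims=25, exposure_years=10_000)
error = relative_std_error(25)
```

Under these assumptions, a segment with 10,000 policy-years of exposure but only 25 claims has a frequency estimate whose relative standard error is about 20%. A split that isolates such a segment would be rejected at a 10% threshold despite the large record count, which is one way imbalance can mislead split selection when only record counts are checked.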
