## Probabilistic estimation based data mining for discovering insurance risks (1999)

Venue: IEEE Intelligent Systems

Citations: 12 (7 self)

### BibTeX

```
@ARTICLE{Apte99probabilisticestimation,
  author  = {C. Apte and E. Grossman and E. Pednault and B. Rosen and F. Tipu and B. White},
  title   = {Probabilistic estimation based data mining for discovering insurance risks},
  journal = {IEEE Intelligent Systems},
  year    = {1999},
  volume  = {14},
  pages   = {49--58}
}
```

### Abstract

The UPA (Underwriting Profitability Analysis) application embodies a new approach to mining Property & Casualty (P&C) insurance policy and claims data for the purpose of constructing predictive models for insurance risks. UPA utilizes the ProbE (Probabilistic Estimation) predictive modeling data mining kernel to discover risk characterization rules by analyzing large and noisy insurance data sets. Each rule defines a distinct risk group and its level of risk. To satisfy regulatory constraints, the risk groups are mutually exclusive and exhaustive. The rules generated by ProbE are statistically rigorous, interpretable, and credible from an actuarial standpoint. Our approach to modeling insurance risks and the implementation of that approach have been validated in an actual engagement with a P&C firm. The benefit assessment of the results suggests that this methodology provides significant value to the P&C insurance risk management process.
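The abstract's framing of each rule as a mutually exclusive risk group with its own level of risk can be made concrete with a small sketch. The snippet below is illustrative only (it is not the ProbE implementation): it aggregates policy records into hypothetical risk groups and computes the two quantities the paper's cited contexts discuss, claim frequency (claims per policy-year, a rate rather than a probability) and claim severity (average dollar amount per claim), along with their product, the expected loss per policy-year. The `PolicyRecord` type and field names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class PolicyRecord:
    risk_group: str      # label assigned by a (mutually exclusive) risk rule
    exposure: float      # earned policy-years
    claim_count: int     # number of claims filed
    claim_amount: float  # total dollar amount of those claims

def risk_summary(records):
    """Per risk group:
    frequency    = claims per policy-year (a rate, not a probability)
    severity     = average dollar amount per claim
    pure_premium = frequency * severity = expected loss per policy-year
    """
    groups = {}
    for r in records:
        g = groups.setdefault(r.risk_group,
                              {"exposure": 0.0, "claims": 0, "amount": 0.0})
        g["exposure"] += r.exposure
        g["claims"] += r.claim_count
        g["amount"] += r.claim_amount

    summary = {}
    for name, g in groups.items():
        frequency = g["claims"] / g["exposure"] if g["exposure"] else 0.0
        severity = g["amount"] / g["claims"] if g["claims"] else 0.0
        summary[name] = {
            "frequency": frequency,
            "severity": severity,
            "pure_premium": frequency * severity,
        }
    return summary
```

Because frequency is a rate over exposure and severity is conditional on a claim occurring, neither maps cleanly onto standard classification or regression targets, which is the mismatch the paper raises against off-the-shelf tree algorithms.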

### Citations

8958 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...ly underestimates the expected value of the score that would be obtained if the true statistical properties of the data were already known. Results from statistical learning theory (see, for example, [10]) demonstrate that, although there is always some probability that underestimation will occur for a given model, both the probability and the degree of underestimation are increased by the fact that w...

4921 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context: ...t a probability. Claim severity is more straightforward. It is simply the average dollar amount per claim. If one were forced to use standard data mining algorithms, such as CHAID [1], CART [2], C4.5 [3], or SPRINT [4], one might try to view frequency modeling as a classification problem and severity modeling as a regression problem. However, further examination suggests that these modeling tasks are...

3894 | Classification and Regression Trees
- Breiman
- 1984
Citation Context: ...a rate, not a probability. Claim severity is more straightforward. It is simply the average dollar amount per claim. If one were forced to use standard data mining algorithms, such as CHAID [1], CART [2], C4.5 [3], or SPRINT [4], one might try to view frequency modeling as a classification problem and severity modeling as a regression problem. However, further examination suggests that these modeling...

250 | SPRINT: A scalable parallel classifier for data mining
- Shafer, Agrawal, et al.
- 1996
Citation Context: .... Claim severity is more straightforward. It is simply the average dollar amount per claim. If one were forced to use standard data mining algorithms, such as CHAID [1], CART [2], C4.5 [3], or SPRINT [4], one might try to view frequency modeling as a classification problem and severity modeling as a regression problem. However, further examination suggests that these modeling tasks are unlike standar...

157 | An Exploratory Technique for Investigating Large Quantities of Categorical Data
- Kass
- 1980
Citation Context: ...refers to a rate, not a probability. Claim severity is more straightforward. It is simply the average dollar amount per claim. If one were forced to use standard data mining algorithms, such as CHAID [1], CART [2], C4.5 [3], or SPRINT [4], one might try to view frequency modeling as a classification problem and severity modeling as a regression problem. However, further examination suggests that thes...

94 | Loss Models: From Data to Decisions
- Klugman, Panjer, et al.
- 1998
Citation Context: ... of modeling insurance risks. Actuarial science is based on the construction and analysis of statistical models that describe the process by which claims are filed by policyholders (see, for example, [9]). Different types of insurance often require the use of different statistical models. The statistical models that are incorporated into the current version of ProbE are geared toward property and cas...

91 | Introduction to Robust Estimation and Hypothesis Testing
- Wilcox
- 1997
Citation Context: ...wed with long thick tails. Reliance on the Gaussian assumption for modeling individual claims can lead to suboptimal results, which is a well-known problem from the point of view of robust estimation [5]. A more fundamental obstacle for standard data mining algorithms is that specialized, domain-specific equations must be used for estimating frequency and severity. Equations for estimating frequency ...

44 | Large data sets lead to overly complex models: an explanation and a solution
- Oates, Jensen
- 1998
Citation Context: ..., as indicated by the increase in lift. Accurate risk models are thus obtained only from large training sets. On the surface, these results seem to contradict the results obtained by Oates and Jensen [12] for classification tree algorithms. Their experiments demonstrate that the error rates of decision tree classifiers tend to rapidly reach a plateau as the number of training records increases. In fac...

11 | A Statistical Perspective on Data Mining (Future Generation Computer Systems)
- Hosking, Pednault, et al.
- 1997
Citation Context: ...to handle because they are not designed to perform the necessary calculations to ensure that only actuarially credible risk groups are identified. The above challenges have motivated our own research [6, 7] and have led to the development of the IBM ProbE TM (Probabilistic Estimation) predictive modeling kernel. This C++ kernel embodies several innovations that address the challenges posed by insurance...

4 | Statistical Learning Theory
- Pednault
- 1998
Citation Context: ...to handle because they are not designed to perform the necessary calculations to ensure that only actuarially credible risk groups are identified. The above challenges have motivated our own research [6, 7] and have led to the development of the IBM ProbE TM (Probabilistic Estimation) predictive modeling kernel. This C++ kernel embodies several innovations that address the challenges posed by insurance...

1 | Insurance Risk Modeling Using Data Mining Technology
- Apte, Grossman, et al.
- 1998
Citation Context: ... modeling insurance risks are not only integrated into the algorithms, they are in fact used to help guide the search for risk groups. The IBM UPA TM (Underwriting Profitability Analysis) application [8] is built around ProbE and provides the infrastructure for using ProbE to construct rule-based risk models. UPA was designed with input from marketing, underwriting, and actuarial end-users. The graph...

1 | Looking for Patterns (The Wall Street Journal)
- Bransten
- 1999
Citation Context: ...s than are other motorists, the UPA discovered that if the sports car was not the only vehicle in the household, then the accident rate is not much greater than that of a regular car. In one estimate [11], “just letting Corvettes and Porsches into [the insurer’s] ‘preferred premium’ plan could bring in an additional $4.5 million in premium revenue over the next two years without a significant rise in ...