Werner10.Multivariate

Reading: BASIC RATEMAKING, Fifth Edition, May 2016, Geoff Werner, FCAS, MAAA & Claudine Modlin, FCAS, MAAA Willis Towers Watson

Chapter 10: Multivariate Classification

Pop Quiz

Identify criteria for evaluating rating variables [Hint: LOSS]

Study Tips

Almost every exam question from this chapter asks you to analyze a graph of GLM (Generalized Linear Model) output. The fastest way to get up to speed on this chapter is to scan the wiki article, then do the old exam problems, starting with the most recent and working backwards. It's an easy chapter because it's mostly common sense. All you need is some basic background information and then practice in analyzing graphical results.

There is further information in Appendix F of Werner which you should cover at the same time you study this chapter. I'll remind you about it further down in this wiki article.

Estimated study time: 1 day (not including subsequent review time)

BattleTable

Based on past exams, the main things you need to know (in rough order of importance) are:

  • analyze graph - assess model results and/or determine whether a specific rating variable should be used in the classification plan
  • advantages and disadvantages of multivariate methods
| reference | part (a) | part (b) | part (c) | part (d) |
|---|---|---|---|---|
| E (2019.Fall #10) | analyze graph - identify test | analyze graph - state conclusion | alternative tests - identify options | |
| E (2019.Fall #11) | analyze graph - relativities | analyze graph - assess model results | one-way analysis - impact on profitability [1] | |
| E (2019.Spring #9) | analyze graph - assess model results | | | |
| E (2018.Fall #9) | analyze graph - rating variables | GLMs - challenges | GLMs - versus univariate | |
| (2018.Spring #13) | BattleActs PowerPack | | | |
| E (2017.Fall #11) | analyze graph - rating variables | multivariate methods - versus univariate | spatial smoothing - territory relativities | |
| E (2017.Spring #8) | analyze graph - rating variables | multivariate methods - when they don't work | | |
| E (2016.Fall #12) | analyze graph - rating variables | GLMs - loss cost / loss ratio | | |
| E (2015.Fall #10) | analyze graph - rating variables | evaluate rating variable - 2 criteria | evaluate rating variable - mileage | |
| E (2014.Fall #10) | analyze graph - rating variables | data mining - enhancing GLMs | | |
| E (2014.Spring #9) | analyze graph - rating variables | | | |
| E (2013.Spring #12) | analyze graph - propose factors | | | |

[1] The first sample answer for part (c) of this question has an error. It states that for industries 1-4, the one-way analysis produces lower factors than the GLM analysis. This is reversed: for industries 1-4, the one-way analysis produces higher factors and would overcharge customers.

In Plain English!

Univariate Methods - Shortcomings

Let's step back for a moment and think about what we did in earlier chapters. When doing a rate analysis, we first determined either an overall average rate using the pure premium method or an overall average rate change using the loss ratio method. But that isn't enough. We can't charge everyone the same rate. Doing so would mean some risks are overcharged while others are undercharged and this leads to adverse selection. An insurer's book of business generally has a wide variety of risks and this variation has to be reflected in the rates.

To avoid adverse selection, we create a risk classification system, which is a system for assigning risks to groups based on expected cost. Each risk group then receives a modifier or relativity that adjusts the average rate (or rate change) either up or down depending on the group's risk profile. In chapter 9, we calculated these relativities using univariate methods. The problem with univariate methods, however, is that they consider each variable in isolation from all others. In other words, the relativities for a particular rating variable do not take into account how exposures are distributed across the other rating variables. For example, older cars may exhibit higher losses, but this may be because older cars tend to have younger drivers. A univariate analysis would not recognize this correlation, so it could assign the driver-age effect to vehicle age (double-counting it if driver age is also rated), and adverse selection could still be present.
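To make the older-cars example concrete, here is a minimal sketch using made-up numbers (the exposures, the base loss cost of 100, and the 2x young-driver effect are all assumptions for illustration, not figures from Werner). The book is built so that vehicle age has no true effect at all, yet the one-way analysis still shows old cars as 50% worse because most old cars are driven by young drivers.

```python
# Hypothetical book of business: (vehicle_age, driver_age) -> (exposure, loss).
# Assumed true loss costs: 100 for adult drivers, 200 for young drivers,
# with NO vehicle-age effect. Old cars are mostly driven by young drivers.
cells = {
    ("new", "adult"): (800, 800 * 100.0),
    ("new", "young"): (200, 200 * 200.0),
    ("old", "adult"): (200, 200 * 100.0),
    ("old", "young"): (800, 800 * 200.0),
}

def one_way_pure_premium(cells, position, level):
    """Pure premium for one level of one variable, ignoring the other variable."""
    exposure = sum(e for key, (e, _) in cells.items() if key[position] == level)
    loss = sum(l for key, (_, l) in cells.items() if key[position] == level)
    return loss / exposure

new_pp = one_way_pure_premium(cells, 0, "new")
old_pp = one_way_pure_premium(cells, 0, "old")
one_way_relativity = old_pp / new_pp

# Holding driver age fixed (adults only), old and new cars cost the same.
adult_old_pp = cells[("old", "adult")][1] / cells[("old", "adult")][0]
adult_new_pp = cells[("new", "adult")][1] / cells[("new", "adult")][0]
within_adult_relativity = adult_old_pp / adult_new_pp

print("one-way old/new relativity:", one_way_relativity)
print("old/new relativity among adult drivers:", within_adult_relativity)
```

The one-way relativity picks up the driver-age effect through the exposure mix, which is exactly the distortion multivariate methods are meant to remove.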

This chapter covers methods to address these issues.

Minimum Bias Procedures

Minimum bias is described in the text as an iteratively standardized approach. The text also provides a detailed numerical example. Note however that the minimum bias procedure is not listed under Learning Objective 8 in the Ratemaking section of the CAS Exam 5 syllabus.
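If you want a feel for what "iteratively standardized" means, here is a rough sketch of one common variant, the multiplicative balance-principle iteration. The data, variable names, and the choice of 100 iterations are assumptions for illustration and are not taken from the numerical example in the text. Each pass re-solves the relativities for one variable while holding the other variable's relativities fixed.

```python
# Hypothetical data: (vehicle_age, driver_age) -> (exposure, loss).
# Assumed to be exactly multiplicative: no vehicle-age effect, young drivers 2x.
cells = {
    ("new", "adult"): (800, 80000.0),
    ("new", "young"): (200, 40000.0),
    ("old", "adult"): (200, 20000.0),
    ("old", "young"): (800, 160000.0),
}
vehicle_levels = ["new", "old"]
driver_levels = ["adult", "young"]

# Starting values: vehicle factors carry the dollar scale, driver factors start at 1.
x = {v: 1.0 for v in vehicle_levels}
y = {d: 1.0 for d in driver_levels}

for _ in range(100):
    # Balance for each vehicle level: total loss = sum of exposure * x * y.
    for v in vehicle_levels:
        loss = sum(cells[(v, d)][1] for d in driver_levels)
        weighted = sum(cells[(v, d)][0] * y[d] for d in driver_levels)
        x[v] = loss / weighted
    # Balance for each driver level, using the updated vehicle factors.
    for d in driver_levels:
        loss = sum(cells[(v, d)][1] for v in vehicle_levels)
        weighted = sum(cells[(v, d)][0] * x[v] for v in vehicle_levels)
        y[d] = loss / weighted

vehicle_relativity = x["old"] / x["new"]
driver_relativity = y["young"] / y["adult"]
print("old/new:", round(vehicle_relativity, 4))
print("young/adult:", round(driver_relativity, 4))
```

On this made-up data the iteration settles on no vehicle-age effect and a 2x young-driver effect, whereas a one-way analysis of the same data would show old cars at 1.5 times new cars.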

Multivariate Methods

The main topic in this chapter is GLMs or Generalized Linear Models. Before diving in however, let's put multivariate methods in context. Their theoretical basis existed well before they were put into common practice.

Question: why were multivariate methods adopted near the turn of the 21st century?
  • increase in computing power - multivariate methods require significant computing power
  • data warehouse initiatives - stored data is more granular (unaggregated) and multivariate methods can leverage this granularity
  • competitive pressure - if your competitors are getting better at differentiating risks using multivariate methods then you must do the same or be subject to adverse selection
Question: identify benefits of multivariate methods [Hint: RIDS - they rid univariate methods of their shortcomings]
  • they Remove noise and capture signal
  • they allow consideration of Interactions between rating variables
  • they produce Diagnostics
  • they consider all rating variables Simultaneously (and automatically adjust for exposure correlations between rating variables)

mini BattleQuiz 1

GLMs

Mathematical Foundation

Exam 5 Syllabus Note 1: The CAS Exam 5 syllabus omits the section from Werner on the mathematical foundation of GLMs. Taking the syllabus at face value, you should be able to skip this section.

GLMs have become the standard method for classification ratemaking and generally outperform other multivariate methods such as neural networks. Also, the output is a series of multipliers which fits perfectly into existing rating algorithms.

The text provides only a very brief description of the math behind GLMs so you should not have to perform any heavy calculations. The emphasis is very clearly on interpreting graphs with GLM outputs. You probably shouldn't skip this entirely, but the good news is that you can cover it in a few minutes. (Ian-the-Intern might need a little longer.)

The key to ratemaking is being able to predict loss costs. If there is a linear relationship between rating variables and loss costs you can use linear regression and you don't need GLMs. But what if the relationship isn't linear?

Question: How do you find an equation that predicts loss costs if the relationship to the rating variables isn't linear?

If you're given a hard problem like finding a nonlinear relationship, a standard mathematical trick is to transform the hard problem into an easy problem and solve the easy problem instead. Once you have the solution to the easy problem, you apply the inverse transformation to get back to the solution to the hard problem. This is the idea behind GLMs. You transform a nonlinear (exponential) problem into a linear problem, solve the linear problem, then apply the inverse transformation to the solution of the linear problem to get the solution to the original problem. The key to making this work is something called the link function.

  • Suppose you're given a data set where loss cost is the response variable and the rating variables are the predictor variables.
→ Assume there is a nonlinear (exponential) relationship between the response variable and predictor variables.
  • Transform the data set using an appropriately chosen link function:
→ The link function is chosen so that the transformed data exhibits a linear relationship
  • Perform linear regression on the transformed data set to find the appropriate parameters
→ Apply the inverse of the link function to the results of the linear regression.

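The link function g(x) is very often log(x), which is why the fitted parameters convert into multipliers. Strictly speaking, the link is applied to the expected value of the response rather than to each observed data point, and the parameters are estimated by maximum likelihood rather than by ordinary linear regression, so the transform-and-regress description above is a simplification. GLMs also relax the assumptions on the error term ε made by linear models Y = μ + ε. (Linear models assume ε has mean 0 and constant variance σ².) There is a lot more to it, but the full methodology is beyond the scope of the syllabus.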
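Here is a minimal sketch of the simplified transform-and-regress picture described above, using a log link and a single rating variable. The data are made up (a base loss cost of 50 growing by a factor of exp(0.2) per unit of x), and ordinary least squares on the logged data stands in for what real GLM software does; actual GLM routines fit by maximum likelihood and apply the link to the mean.

```python
import math

# Hypothetical data: loss cost grows exponentially with rating variable x.
xs = [0, 1, 2, 3, 4]
ys = [50.0 * math.exp(0.2 * x) for x in xs]

# Step 1: apply the link function g(y) = log(y) to linearize the relationship.
zs = [math.log(y) for y in ys]

# Step 2: ordinary least squares for the straight line z = a + b * x.
n = len(xs)
x_bar = sum(xs) / n
z_bar = sum(zs) / n
b = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = z_bar - b * x_bar

# Step 3: apply the inverse link (exp) to get back to loss costs.
def predict(x):
    return math.exp(a + b * x)

# With a log link, the additive parameters become multipliers.
base_rate = math.exp(a)
factor_per_unit = math.exp(b)
print("base rate:", round(base_rate, 2))
print("multiplier per unit of x:", round(factor_per_unit, 4))
print("predicted loss cost at x = 2:", round(predict(2), 2))
```

Notice that predict(x) equals base_rate times factor_per_unit raised to the power x. That is the sense in which log-link output is a series of multipliers that drops straight into a multiplicative rating algorithm.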

GLM Output

Exam 5 Syllabus Note 2: The rest of the GLM material below is listed on the syllabus. Don't skip it!

The text has a really great graphical example of a GLM and I've linked to it below, along with a few summary comments from Alice. First however, you need to know that a GLM analysis should be performed on loss cost data not loss ratio data. Actually, it's even better to split loss costs into frequency and severity.

Question: identify statistical and practical reasons for using loss cost data rather than loss ratio data in a GLM analysis
  • loss ratios require EP @ CRL (Earned Premium at Current Rate Level) and this may not be available on a granular level
  • actuaries have intuition for frequency and severity thus can distinguish signal versus noise (not true for loss ratios because they depend on EP @ CRL)
  • loss ratio models become obsolete when rating structures change
  • loss ratios have no commonly accepted distribution

Memorize that! That exact question was part (b) of:

E (2016.Fall #12)

Ok, here's the example: (see below for Alice's comments)

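[Figure: GLM example for vehicle symbol. Bottom axis: vehicle symbol; left axis: relativity; right axis: frequency. The GLM and one-way results are plotted as separate lines.]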

Here's what you're supposed to notice about this graph:

  • a vehicle symbol represents a group of policies with similar risk characteristics
  • axis labels:
    • bottom axis: vehicle symbol
    • left axis: relativity
    • right axis: frequency
  • the base level of the rating variable is vehicle symbol 4:
→ you know this because the relativity for symbol 4 on the left axis is 1.00
→ the base level is normally the level with the highest exposure volume (or among the highest as the distribution can change over time)
  • model output:
→ circles on the green line show the multivariate or GLM result (considers all rating variables simultaneously)
→ squares on the pink line show the univariate or one-way result (considers vehicle symbol in isolation from other rating variables)
→ both results show that vehicle symbol has predictive power because there is a clear upward trend in both lines (GLM results are stronger)
  • difference between GLM and one-way result:
→ vehicle symbol is likely correlated with another rating variable
→ the GLM result identified this whereas the one-way result did not (that's the reason for the difference)
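The base-level idea in the list above can be sketched in a few lines. The fitted values and exposures below are invented for illustration (they are not read off the graph): pick the level with the most exposure as the base, then divide every level's fitted value by the base level's value so the base relativity is exactly 1.00.

```python
# Hypothetical fitted frequencies and exposures by vehicle symbol.
fitted = {1: 0.040, 2: 0.045, 3: 0.048, 4: 0.050, 5: 0.060}
exposure = {1: 500, 2: 900, 3: 1500, 4: 4000, 5: 1200}

# Base level: the symbol with the highest exposure volume.
base = max(exposure, key=exposure.get)

# Relativities are expressed relative to the base level.
relativities = {symbol: value / fitted[base] for symbol, value in fitted.items()}

print("base symbol:", base)
for symbol in sorted(relativities):
    print(symbol, round(relativities[symbol], 3))
```

On an exam graph you do this in reverse: find the level whose relativity is 1.00 and you have found the base level.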

The final selected relativities are subject to judgment but with GLMs the results of one variable are only valid if the results for all other key variables are also being used. In other words, you can't mess around with the symbol relativities without messing up the relativities for all the other rating variables. With GLMs, everything is connected (unlike with the one-way analysis.)

You can give part (a) of this exam problem a try but you need to know the chi-square test from Appendix F - Intro.

E (2017.Fall #11)

GLM Diagnostics

Earlier in this wiki article, we identified benefits of multivariate methods. One of those benefits is that multivariate methods produce diagnostics. Statistical diagnostics help the modeler determine whether a variable has a systematic effect on losses and should therefore be included.

You should spend about half a day studying Appendix F for the following 2 topics:

Deciding whether a variable has a systematic effect on losses:

  • parameters and standard errors test
  • consistency test (over time or random grouping)
  • statistical test (deviance test)
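As a rough illustration of the standard errors idea, here is a sketch that assumes a log-link model, so a parameter of 0 corresponds to a relativity of 1.00 (no effect). The parameter values, the function name, and the use of a plus-or-minus 2 standard error band as an approximate 95% interval are assumptions for illustration; Appendix F is the authority on how the test is actually presented.

```python
import math

def two_standard_error_check(beta, standard_error):
    """Return the relativity, its +/- 2 SE band, and whether the band excludes 1.00."""
    relativity = math.exp(beta)
    low = math.exp(beta - 2 * standard_error)
    high = math.exp(beta + 2 * standard_error)
    excludes_no_effect = low > 1.0 or high < 1.0
    return relativity, (low, high), excludes_no_effect

# Hypothetical log-scale parameter estimates for two rating variable levels.
strong = two_standard_error_check(0.30, 0.05)  # tight band, clearly above 1.00
weak = two_standard_error_check(0.05, 0.10)    # wide band that straddles 1.00

print("strong:", strong)
print("weak:", weak)
```

A narrow band that stays away from 1.00 suggests a systematic effect. A wide band that straddles 1.00 suggests the apparent effect could just be noise.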

Model validation:

  • fitted vs actual results for the entire data set
  • fitted vs actual results for a holdout sample
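A holdout comparison can be sketched as follows. The records, the band count, and the function name are hypothetical: sort the holdout records by fitted value, group them into bands, and compare the average fitted value to the average actual value in each band.

```python
def actual_vs_fitted_by_band(records, n_bands):
    """Group (fitted, actual) pairs into bands by fitted value and average each band."""
    ordered = sorted(records, key=lambda r: r[0])
    band_size = len(ordered) // n_bands
    bands = []
    for b in range(n_bands):
        start = b * band_size
        # The last band absorbs any leftover records.
        end = len(ordered) if b == n_bands - 1 else start + band_size
        chunk = ordered[start:end]
        avg_fitted = sum(f for f, _ in chunk) / len(chunk)
        avg_actual = sum(a for _, a in chunk) / len(chunk)
        bands.append((avg_fitted, avg_actual))
    return bands

# Hypothetical holdout sample: (fitted loss cost, actual loss cost) per record.
holdout = [(100, 90), (120, 130), (200, 210), (220, 190), (400, 420), (380, 410)]
bands = actual_vs_fitted_by_band(holdout, 3)

for avg_fitted, avg_actual in bands:
    print("fitted:", avg_fitted, "actual:", avg_actual)
```

If the model generalizes well, the actual averages should track the fitted averages across the bands. Large gaps, especially at the extremes, can be a sign of overfitting.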

Practical Considerations

This is a short section. It mentions that GLM routines are generally included in commercial software packages so actuaries don't need to program the routines themselves. There are still areas where attention is required however such as:

  • ensuring data is appropriate
  • reviewing model results - with regard to statistical considerations (further analysis may be necessary)
  • reviewing model results - with regard to business considerations
  • communicating results - policyholder dislocation, competitive position

Data Mining

This is another fairly short section and you can cover it pretty quickly. The main concept is that data mining methods can be used to enhance a classification analysis.

Question: identify and briefly describe 5 data mining methods   [Hint: FC-CART-MARS-NN  ← worst hint ever - it's almost as long as the actual answer]
Factor analysis
  • reduces the number of rating variables by recognizing correlations between variables
Cluster analysis
  • combines small groups of similar risks into larger homogeneous clusters (often used to combine geographical areas)
CART or Classification and Regression Trees
  • classification of risks is done using a decision tree (Ex: if gender = "male" then take left branch, otherwise take right branch...)
MARS or Multivariate Adaptive Regression Spline
  • selects breakpoints and ranges for a continuous variable such as Amount of Insurance (uses piecewise linear regression)
NN or Neural Networks
  • uses training algorithms so the computer can "learn" the structure of the data
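To give a flavor of how a tree-based method like CART finds a split, here is a toy sketch that picks the single breakpoint on a continuous variable that minimizes the squared error within the two resulting branches. The data and function names are made up, and a real CART implementation does much more (multiple variables, recursive splitting, pruning).

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the best single split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        # Candidate threshold: midpoint between consecutive x values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: amount of insurance (in thousands) and observed loss cost.
amount_of_insurance = [100, 150, 200, 250, 300, 350]
loss_cost = [10, 11, 10, 20, 21, 19]

threshold, total_sse = best_split(amount_of_insurance, loss_cost)
print("best breakpoint:", threshold)
```

A breakpoint found this way is a candidate for how to group a continuous variable into rating levels before it goes into a GLM.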
Question: how can data mining methods enhance a classification analysis?
  • reduce the number of explanatory variables that need to be considered
  • reduce the number of levels of multi-level discrete variables (by grouping similar levels together)
  • suggest how to categorize discrete variables
  • identify candidates for interaction variables within GLMs (by detecting patterns of interdependency between variables)
→ My memory trick is to remember the first word of each bullet: reduce – reduce – suggest – identify.

External Data

Sorry to give you yet another bullet list to memorize. It isn't likely that this will be asked on the exam so it's probably not a big deal if you skip it. I'd suggest at least reading it over once however because it isn't hard to remember.

Question: what types of external data might an insurer use to augment their GLM analysis
  • geo-demographics (Ex: population density)
  • weather (Ex: annual rainfall)
  • property characteristics (Ex: square footage of home)
  • personal information about insured (Ex: credit rating)

Here's a quiz covering some miscellaneous facts from this chapter.

mini BattleQuiz 2

Exam Problems

If you've read over the examples from this chapter (chapter 10) and the examples from Appendix F then you're ready for the old exam problems. I don't think they are very hard. They mainly rely on common sense and just a few basic facts about GLMs.

More recent problems:

mini BattleQuiz 3

Older problems:

mini BattleQuiz 4

Full BattleQuiz
