Werner10.Multivariate

Reading: BASIC RATEMAKING, Fifth Edition, May 2016, Geoff Werner, FCAS, MAAA & Claudine Modlin, FCAS, MAAA Willis Towers Watson

Chapter 10: Multivariate Classification

Pop Quiz

Identify criteria for evaluating rating variables [Hint: LOSS]

Study Tips

VIDEO: W-10 (001) Multivariate Methods → 3:30 Forum

Almost every exam question from this chapter asks you to analyze a graph of GLM output (Generalized Linear Model). The fastest way to get up to speed on this chapter is to scan the wiki article then do the old exam problems starting from the most recent and working your way backwards. It's an easy chapter because it's mostly common sense. All you need is some basic background information and then practice in analyzing graphical results.

There is further information in Appendix F of Werner, which you should cover at the same time you study this chapter. I'll remind you about it further down in this wiki article.

Estimated study time: 1 day (not including subsequent review time)

BattleTable

Based on past exams, the main things you need to know (in rough order of importance) are:

analyze graph - assess model results and/or determine whether a specific rating variable should be used in the classification plan
advantages and disadvantages of multivariate methods

reference	part (a)	part (b)	part (c)
E (2019.Fall #10)	analyze graph - identify test	analyze graph - state conclusion	alternative tests - identify options
E (2019.Fall #11)	analyze graph - relativities	analyze graph - assess model results	one-way analysis - impact on profitability ¹
E (2019.Spring #9)	analyze graph - assess model results
E (2018.Fall #9)	analyze graph - rating variables	GLMs - challenges	GLMs - versus univariate
(2018.Spring #13)	BattleActs PowerPack
E (2017.Fall #11)	analyze graph - rating variables	multivariate methods - versus univariate	spatial smoothing - territory relativities
E (2017.Spring #8)	analyze graph - rating variables	multivariate methods - when they don't work
E (2016.Fall #12)	analyze graph - rating variables	GLMs - loss cost / loss ratio
E (2015.Fall #10)	analyze graph - rating variables	evaluate rating variable - 2 criteria	evaluate rating variable - mileage
E (2014.Fall #10)	analyze graph - rating variables	data mining - enhancing GLMs
E (2014.Spring #9)	analyze graph - rating variables
E (2013.Spring #12)	analyze graph - propose factors

¹ The first sample answer for part (c) of this question has an error. It states that for industries 1-4, the one-way analysis produces lower factors than the GLM analysis. This is reversed: For industries 1-4, the one-way analysis produces higher factors and would overcharge customers.


In Plain English!

Univariate Methods - Shortcomings

Let's step back for a moment and think about what we did in earlier chapters. When doing a rate analysis, we first determined either an overall average rate using the pure premium method or an overall average rate change using the loss ratio method. But that isn't enough. We can't charge everyone the same rate. Doing so would mean some risks are overcharged while others are undercharged and this leads to adverse selection. An insurer's book of business generally has a wide variety of risks and this variation has to be reflected in the rates.

To avoid adverse selection, we create a risk classification system, which is a system for assigning risks to groups based on expected cost. Each risk group then receives a modifier or relativity that adjusts the average rate (or rate change) either up or down depending on the group's risk profile. In chapter 9, we calculated these relativities using univariate methods. The problem with univariate methods, however, is that they consider each variable in isolation from all others. In other words, the relativities for a particular rating variable do not take into account the effects of other rating variables. For example, older cars may exhibit higher losses, but this may be because older cars have younger drivers. A univariate analysis does not adjust for this correlation, so adverse selection could still be present.
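The older-cars example can be made concrete with a small sketch. All numbers below are hypothetical and chosen only for illustration: driver age is assumed to be the only true cost driver, and older cars are assumed to be driven mostly by young drivers.

```python
# Hypothetical book of business (all figures are made up for illustration).
# True pure premium depends only on driver age: young = 200, adult = 100.
# Car age has NO true effect, but older cars are mostly driven by young drivers.
exposures = {
    ("young", "old car"): 800,
    ("young", "new car"): 200,
    ("adult", "old car"): 200,
    ("adult", "new car"): 800,
}
true_pure_premium = {"young": 200.0, "adult": 100.0}

def one_way_pure_premium(car_age):
    """Pure premium for one car-age level, ignoring driver age."""
    losses = sum(expo * true_pure_premium[driver]
                 for (driver, car), expo in exposures.items() if car == car_age)
    total_exposure = sum(expo for (_, car), expo in exposures.items() if car == car_age)
    return losses / total_exposure

one_way_relativity = one_way_pure_premium("old car") / one_way_pure_premium("new car")
print(f"one-way relativity, old car vs new car: {one_way_relativity:.2f}")
```

With these made-up exposures, the one-way relativity for old cars comes out at 1.50 even though car age has no true effect. The whole difference comes from the mix of drivers in each car-age group.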

This chapter covers methods to address these issues.

Minimum Bias Procedures

Minimum bias is described in the text as an iteratively standardized approach. The text also provides a detailed numerical example. Note however that the minimum bias procedure is not listed under Learning Objective 8 in the Ratemaking section of the CAS Exam 5 syllabus.
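If you do decide to look at it, here is a minimal sketch of an iteratively standardized calculation. It assumes the multiplicative balance-principle form of minimum bias, which is one common formulation; the example in the text may be set up differently. The rating variables, exposures, and pure premiums are hypothetical.

```python
# Hypothetical data: exposures w[i][j] and observed pure premiums r[i][j]
# for driver class i (0 = young, 1 = adult) and car age j (0 = old, 1 = new).
# The pure premiums were built as 100 * driver factor * car factor,
# with driver factors (2.0, 1.0) and car factors (1.25, 1.0).
w = [[800.0, 200.0], [200.0, 800.0]]
r = [[250.0, 200.0], [125.0, 100.0]]

x = [1.0, 1.0]  # driver-class factors (these absorb the base rate)
y = [1.0, 1.0]  # car-age factors

for _ in range(200):
    # Balance each driver class: weighted observed = weighted fitted.
    x = [sum(w[i][j] * r[i][j] for j in range(2)) /
         sum(w[i][j] * y[j] for j in range(2)) for i in range(2)]
    # Balance each car-age level using the updated driver factors.
    y = [sum(w[i][j] * r[i][j] for i in range(2)) /
         sum(w[i][j] * x[i] for i in range(2)) for j in range(2)]

driver_relativity = x[0] / x[1]  # young relative to adult
car_relativity = y[0] / y[1]     # old car relative to new car
print(round(driver_relativity, 4), round(car_relativity, 4))
```

Because the hypothetical data is exactly multiplicative, the iteration settles on relativities of about 2.00 and 1.25 even though the exposures are correlated across the two variables.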

Multivariate Methods

The main topic in this chapter is GLMs or Generalized Linear Models. Before diving in however, let's put multivariate methods in context. Their theoretical basis existed well before they were put into common practice.

Question: why were multivariate methods adopted near the turn of the 21st century?

increase in computing power - multivariate methods require significant computing power
data warehouse initiatives - stored data is more granular (unaggregated) and multivariate methods can leverage this granularity
competitive pressure - if your competitors are getting better at differentiating risks using multivariate methods then you must do the same or be subject to adverse selection

Question: identify benefits of multivariate methods [Hint: RIDS - they rid univariate methods of their shortcomings]

they Remove noise and capture signal
they allow consideration of Interactions between rating variables
they produce Diagnostics
they consider all rating variables Simultaneously (and automatically adjust for exposure correlations between rating variables)

mini BattleQuiz 1

GLMs

Mathematical Foundation

Exam 5 Syllabus Note 1: The CAS Exam 5 syllabus omits the section from Werner on the mathematical foundation of GLMs. Taking the syllabus at face value, you should be able to skip this section.

GLMs have become the standard method for classification ratemaking and generally outperform other multivariate methods such as neural networks. Also, the output is a series of multipliers which fits perfectly into existing rating algorithms.
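To illustrate why multiplicative output is convenient, here is a minimal sketch assuming a log link (discussed further down in this section) and a made-up set of fitted coefficients. The variable names and values are hypothetical, not taken from the text.

```python
import math

# Hypothetical fitted coefficients on the log scale (assumed, not from the text).
intercept = math.log(100.0)  # corresponds to a base loss cost of 100
coefficients = {
    "driver_age": {"adult": 0.0, "young": math.log(2.0)},
    "car_age": {"new": 0.0, "old": math.log(1.25)},
}

# Exponentiating each coefficient gives a multiplier (relativity).
relativities = {var: {lvl: math.exp(b) for lvl, b in levels.items()}
                for var, levels in coefficients.items()}

def loss_cost(risk):
    """Base loss cost times one relativity per rating variable."""
    result = math.exp(intercept)
    for var, lvl in risk.items():
        result *= relativities[var][lvl]
    return result

young_old = loss_cost({"driver_age": "young", "car_age": "old"})
print(round(young_old, 2))
```

The base levels have a coefficient of 0 and therefore a relativity of 1.00, so the result is just the base loss cost times a string of multipliers, which is the shape most rating algorithms already have.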

The text provides only a very brief description of the math behind GLMs so you should not have to perform any heavy calculations. The emphasis is very clearly on interpreting graphs with GLM outputs. You probably shouldn't skip this entirely, but the good news is that you can cover it in a few minutes. (Ian-the-Intern might need a little longer.)

The key to ratemaking is being able to predict loss costs. If there is a linear relationship between rating variables and loss costs you can use linear regression and you don't need GLMs. But what if the relationship isn't linear?

Question: How do you find an equation that predicts loss costs if the relationship to the rating variables isn't linear?

If you're given a hard problem like finding a nonlinear relationship, a standard mathematical trick is to transform the hard problem into an easy problem and solve the easy problem instead. Once you have the solution to the easy problem, you apply the inverse transformation to get back to the solution to the hard problem. This is the idea behind GLMs. You transform a nonlinear (exponential) problem into a linear problem, solve the linear problem, then apply the inverse transformation to the solution of the linear problem to get the solution to the original problem. The key to making this work is something called the link function.

Suppose you're given a data set where loss cost is the response variable and the rating variables are the predictor variables.

→ Assume there is a nonlinear (exponential) relationship between the response variable and predictor variables.

Transform the data set using an appropriately chosen link function:

→ The link function is chosen so that the transformed data exhibits a linear relationship

Perform linear regression on the transformed data set to find the appropriate parameters

→ Apply the inverse of the link function to the results of the linear regression.

The link function g(x) is very often log(x). GLMs also relax the assumptions on the error term ε made by linear models Y = μ + ε. (Linear models assume ε is normally distributed with mean 0 and constant variance σ².) There is a lot more to it, but the full methodology is beyond the scope of the syllabus.
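The transform, regress, invert steps can be sketched on a toy data set with a single predictor. The data is hypothetical and noise-free. This mirrors the simplified description above; in practice, a GLM applies the link function to the expected value rather than to each observation and is fitted by maximum likelihood, so on noisy data its estimates generally differ from a regression on logged values.

```python
import math

# Hypothetical, noise-free data following loss_cost = 100 * exp(0.1 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [100.0 * math.exp(0.1 * x) for x in xs]

# Step 1: apply the log link so the relationship becomes linear in x.
log_ys = [math.log(y) for y in ys]

# Step 2: ordinary least squares for log(y) = a + b * x (closed form).
n = len(xs)
mean_x = sum(xs) / n
mean_log_y = sum(log_ys) / n
b = (sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_log_y - b * mean_x

# Step 3: apply the inverse link (exp) to get back to the loss cost scale.
def predict(x):
    return math.exp(a + b * x)

print(round(math.exp(a), 4), round(b, 4), round(predict(6.0), 4))
```

On the original scale the fitted model is exp(a) × exp(b·x), which is why a log link produces multiplicative factors.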

GLM Output

Exam 5 Syllabus Note 2: The rest of the GLM material below is listed on the syllabus. Don't skip it!

The text has a really great graphical example of a GLM and I've linked to it below, along with a few summary comments from Alice. First, however, you need to know that a GLM analysis should be performed on loss cost data, not loss ratio data. Actually, it's even better to split loss costs into frequency and severity.
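As a quick reminder of how the frequency and severity split relates to loss cost, here is a minimal sketch with made-up numbers for a single risk group. In a frequency/severity approach each piece would be modeled separately.

```python
# Hypothetical experience for one risk group (made-up numbers).
exposures = 1000.0          # earned exposures, e.g. car-years
claim_count = 50
incurred_losses = 100000.0

frequency = claim_count / exposures        # claims per exposure
severity = incurred_losses / claim_count   # average loss per claim
loss_cost = frequency * severity           # equals losses per exposure

print(frequency, severity, round(loss_cost, 2))
```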

Question: identify statistical and practical reasons for using loss cost data rather than loss ratio data in a GLM analysis

loss ratios require EP @ CRL (Earned Premium at Current Rate Level) and this may not be available on a granular level
actuaries have intuition for frequency and severity thus can distinguish signal versus noise (not true for loss ratios because they depend on EP @ CRL)
loss ratio models become obsolete when rating structures change
loss ratios have no commonly accepted distribution

Memorize that! That exact question was part (b) of:

E (2016.Fall #12)

Ok, here's the example: (see below for Alice's comments)

Here's what you're supposed to notice about this graph:

a vehicle symbol represents a group of policies with similar risk characteristics

axis labels:
- bottom axis: vehicle symbol
- left axis: relativity
- right axis: frequency

the base level of the rating variable is vehicle symbol 4:

→ you know this because the relativity for symbol 4 on the left axis is 1.00

→ the base level is normally the level with the highest exposure volume (or among the highest as the distribution can change over time)

model output:

→ circles on the green line show the multivariate or GLM result (considers all rating variables simultaneously)

→ squares on the pink line show the univariate or one-way result (considers vehicle symbol in isolation from other rating variables)

→ both results show that vehicle symbol has predictive power because there is a clear upward trend in both lines (GLM results are stronger)

difference between GLM and one-way result:

→ vehicle symbol is likely correlated with another rating variable

→ the GLM result identified this whereas the one-way result did not (that's the reason for the difference)
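The base-level convention from the list above (highest exposure volume, relativity of 1.00) can be sketched as follows. The exposures and raw factors are hypothetical and are not read from the graph in the text.

```python
# Hypothetical exposures and unnormalized model factors by vehicle symbol.
exposure_by_symbol = {1: 500, 2: 1200, 3: 2500, 4: 4000, 5: 1800}
raw_factor_by_symbol = {1: 0.60, 2: 0.75, 3: 0.90, 4: 1.20, 5: 1.50}

# Choose the level with the highest exposure volume as the base level.
base_symbol = max(exposure_by_symbol, key=exposure_by_symbol.get)

# Divide every factor by the base factor so the base relativity is 1.00.
relativities = {s: f / raw_factor_by_symbol[base_symbol]
                for s, f in raw_factor_by_symbol.items()}

for symbol in sorted(relativities):
    print(symbol, round(relativities[symbol], 3))
```

Rebasing only rescales the factors. The ratios between any two symbols are unchanged.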

The final selected relativities are subject to judgment, but with GLMs the results for one variable are only valid if the results for all other key variables are also being used. In other words, you can't mess around with the symbol relativities without messing up the relativities for all the other rating variables. With GLMs, everything is connected (unlike with the one-way analysis).

You can give part (a) of this exam problem a try but you need to know the chi-square test from Appendix F - Intro.

E (2017.Fall #11)

GLM Diagnostics

Earlier in this wiki article, we identified benefits of multivariate methods. One of those benefits is that multivariate methods produce diagnostics. Statistical diagnostics help the modeler determine whether a variable has a systematic effect on losses and should therefore be included.

You should spend about half a day studying Appendix F (click link) for the following 2 topics:

Deciding whether a variable has a systematic effect on losses:

parameters and standard errors test
consistency test (over time or random grouping)
statistical test (deviance test)
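As a rough sketch of the parameters and standard errors test, the snippet below assumes a log link and applies the common rule of thumb that an estimate plus or minus two standard errors approximates a 95% confidence interval. The parameter estimates and standard errors are hypothetical.

```python
import math

# Hypothetical GLM output on the log scale: (estimate, standard error) by level.
# The base level has an estimate of 0 by construction and is omitted.
parameters = {
    "symbol 1": (-0.40, 0.08),
    "symbol 2": (-0.05, 0.10),
    "symbol 5": (0.30, 0.06),
}

def approx_interval(estimate, std_error):
    """Roughly 95% interval: estimate plus or minus two standard errors."""
    return estimate - 2 * std_error, estimate + 2 * std_error

results = {}
for level, (est, se) in parameters.items():
    low, high = approx_interval(est, se)
    # If the interval contains 0, the level is not clearly different from base.
    differs_from_base = not (low <= 0.0 <= high)
    results[level] = differs_from_base
    # Exponentiate the bounds to show the interval on the relativity scale.
    print(level, round(math.exp(low), 3), round(math.exp(high), 3), differs_from_base)
```

In this made-up output, symbol 2 has an interval that straddles a relativity of 1.00, so on its own it gives little evidence of a systematic effect relative to the base level.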

Model validation:

fitted vs actual results for the entire data set
fitted vs actual results for a holdout sample
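A fitted versus actual comparison on a holdout sample might look like the sketch below. The holdout records, base loss cost, and relativities are all hypothetical; the point is only that the model is fitted on one data set and compared against data it has not seen.

```python
# Hypothetical holdout sample: (vehicle symbol, exposure, actual losses).
holdout = [
    (1, 100.0, 6500.0),
    (4, 300.0, 29000.0),
    (5, 150.0, 19500.0),
]
# Hypothetical model fitted on the training data: base loss cost and relativities.
base_loss_cost = 100.0
relativity = {1: 0.60, 4: 1.00, 5: 1.25}

comparison = {}
for symbol, exposure, actual_losses in holdout:
    actual = actual_losses / exposure              # observed loss cost
    fitted = base_loss_cost * relativity[symbol]   # model prediction
    comparison[symbol] = (actual, fitted, actual / fitted)
    print(symbol, round(actual, 2), round(fitted, 2), round(actual / fitted, 3))
```

Ratios of actual to fitted that stay close to 1.00 across levels suggest the model generalizes. A systematic drift away from 1.00 is a warning sign.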

Practical Considerations

This is a short section. It mentions that GLM routines are generally included in commercial software packages, so actuaries don't need to program the routines themselves. There are still areas that require attention, however, such as:

ensuring data is appropriate
reviewing model results - with regard to statistical considerations (further analysis may be necessary)
reviewing model results - with regard to business considerations
communicating results - policyholder dislocation, competitive position

Data Mining

This is another fairly short section and you can cover it pretty quickly. The main concept is that data mining methods can be used to enhance a classification analysis.

Question: identify and briefly describe 5 data mining methods [Hint: FC-CART-MARS-NN ← worst hint ever - it's almost as long as the actual answer]

Factor analysis

reduces the number of rating variables by recognizing correlations between variables

Cluster analysis

combines small groups of similar risks into larger homogeneous clusters (often used to combine geographical areas)

CART or Classification and Regression Trees

classification of risks is done using a decision tree (Ex: if gender = "male" then take left branch, otherwise take right branch...)

MARS or Multivariate Adaptive Regression Spline

selects breakpoints and ranges for a continuous variable such as Amount of Insurance (uses piecewise linear regression)

NN or Neural Networks

uses training algorithms so the computer can "learn" the structure of the data
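To make the piecewise linear idea behind MARS concrete, the sketch below evaluates a curve built from a hinge term max(0, x - breakpoint). The breakpoint and coefficients are hypothetical and fixed by hand. Choosing them from the data is the part MARS automates, and that step is not shown.

```python
def hinge(x, breakpoint):
    """Zero below the breakpoint, then rises linearly above it."""
    return max(0.0, x - breakpoint)

# Hypothetical piecewise linear relativity curve for Amount of Insurance
# (in thousands): slope 0.002 up to 300, then slope 0.002 + 0.003 above 300.
BREAKPOINT = 300.0

def relativity(amount_of_insurance):
    return (0.50
            + 0.002 * amount_of_insurance
            + 0.003 * hinge(amount_of_insurance, BREAKPOINT))

for aoi in (100.0, 300.0, 500.0):
    print(aoi, round(relativity(aoi), 3))
```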

Question: how can data mining methods enhance a classification analysis

reduce the number of explanatory variables that need to be considered
reduce the number of levels of multi-level discrete variables (by grouping similar levels together)
suggest how to categorize discrete variables
identify candidates for interaction variables within GLMs (by detecting patterns of interdependency between variables)

→ My memory trick is to remember the first word of each item: reduce, reduce, suggest, identify.

External Data

Sorry to give you yet another bullet list to memorize. It isn't likely that this will be asked on the exam so it's probably not a big deal if you skip it. I'd suggest at least reading it over once however because it isn't hard to remember.

Question: what types of external data might an insurer use to augment their GLM analysis

geo-demographics (Ex: population density)
weather (Ex: annual rainfall)
property characteristics (Ex: square footage of home)
personal information about insured (Ex: credit rating)

Here's a quiz covering some miscellaneous facts from this chapter.

mini BattleQuiz 2

Exam Problems

If you've read over the examples from this chapter (chapter 10) and the examples from Appendix F then you're ready for the old exam problems. I don't think they are very hard. They mainly rely on common sense and just a few basic facts about GLMs.

POP QUIZ ANSWERS

Criteria for Evaluating Rating Variables

