Difference between revisions of "WernerF.Multivariate"
Revision as of 16:56, 25 December 2020
Reading: BASIC RATEMAKING, Fifth Edition, May 2016, Geoff Werner, FCAS, MAAA & Claudine Modlin, FCAS, MAAA Willis Towers Watson
Appendix F: Multivariate Classification Example
Pop Quiz
No pop quiz today. Alice has a hangover. See POP QUIZ ANSWERS at the end for further instructions.
Study Tips
This appendix contains further examples of interpreting graphs of GLM results. The 4 examples presented in the text are well explained, and I used these same examples for the wiki article, but slightly reorganized and with a few additional comments.
These examples are easy to understand and all you really have to do is read them over to get a feel for how these kinds of graphs are analyzed. There are just a few little things you have to memorize regarding the method of analysis (see Intro) but that should come naturally once you've covered the examples.
Estimated study time: ½ day for this appendix and 1 day total on all multivariate material if you include Pricing - Chapter 10 - Multivariate Classification
BattleTable
Based on past exams, the main things you need to know (in rough order of importance) are:
- the 5 steps in analyzing a graph of GLM results - determining whether a rating variable is a significant predictor
- the 2 ways of validating model results - fitted vs actual and holdout samples
This exam problem is a review of univariate methods from Pricing - Chapter 9 - Risk Classification.
reference | part (a) | part (b) | part (c) | part (d)
E (2017.Fall #10) | fitted vs actual: explain difference | PP method (adjusted): calculate relativities | |
In Plain English!
Intro
Quickly scan the bullet points below before looking at the examples. These bullet points are a handy summary.
Examples A and B analyze graphs of GLM results to determine whether the given rating variable is predictive. Both follow the same procedure:
- Parameters and Standard Errors Test
- - check that the line on the graph has a non-zero slope (indicates predictive value)
- - check that the standard error band (usually ± 2 standard errors) forms a relatively narrow range around the fitted line
- Consistency Test
- - check consistency of GLM results using individual years or random groupings instead of all data combined
- (consistency means the graphs for individual years or random groupings are similar)
- - consistency provides additional evidence of a variable's predictive value
- Statistical Test (or Deviance Test)
- - the chi-square test is commonly used
- - this tests whether the null hypothesis (no predictive value) should be rejected
- - if the chi-square p-value is < 5%, reject the null hypothesis (the variable has predictive value)
- Judgment
- - check for reasonableness
- Decision
- - state conclusion based on the 4 steps performed above
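If it helps to see the Parameters and Standard Errors Test and the Statistical Test as arithmetic, here is a minimal Python sketch. It is not from the text. It assumes a log-link (multiplicative) GLM, so a relativity is exp(beta), and it treats "the ± 2 standard error band excludes a relativity of 1.00" as a rough stand-in for reading the graph. It also assumes the difference in deviance between the model without and with the variable follows a chi-square distribution. The function names and all inputs are hypothetical.

```python
import math


def two_se_band(beta, se):
    """Return (low, relativity, high) for a log-link GLM parameter.

    Assumption: relativity = exp(beta), band = exp(beta +/- 2 * se).
    """
    return math.exp(beta - 2 * se), math.exp(beta), math.exp(beta + 2 * se)


def band_excludes_one(beta, se):
    """Rough proxy for step 1: does the +/- 2 SE band exclude relativity 1.00?"""
    low, _, high = two_se_band(beta, se)
    return low > 1.0 or high < 1.0


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square distribution, integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1),
    starting from Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z),
    where z = statistic / 2.
    """
    z = statistic / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def reject_null(deviance_without, deviance_with, df, threshold=0.05):
    """Step 3: reject 'no predictive value' when the p-value is below threshold."""
    p_value = chi_square_p_value(deviance_without - deviance_with, df)
    return p_value < threshold
```

For example, `reject_null(1010.0, 1000.0, df=2)` uses made-up deviances: the variable lowers the deviance by 10 at a cost of 2 parameters. Steps 2, 4 and 5 (consistency, judgment, decision) still need a human looking at the graphs.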
Example C is for model validation. (The procedure differs from the one used for Examples A and B.)
- Fitted vs Actual: using entire data set
- - the fitted data should be close to the actual data
- - any large or systematic differences should be investigated
- - if the fit is "too" good, you may be modeling noise, and the results may not be valid for other data sets
- - if the fit is poor, you may be omitting significant predictor variables
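The fitted vs actual check above boils down to comparing two numbers per level and flagging the big gaps. Here is a small Python sketch of that idea. The frequencies, the level names and the 10% tolerance are all made up for illustration and are not from the text.

```python
def flag_large_differences(actual, fitted, tolerance=0.10):
    """Return {level: relative difference} where |fitted / actual - 1| > tolerance.

    The tolerance is an arbitrary illustrative cut-off, not a rule from the text.
    """
    flagged = {}
    for level, actual_value in actual.items():
        relative_diff = (fitted[level] - actual_value) / actual_value
        if abs(relative_diff) > tolerance:
            flagged[level] = relative_diff
    return flagged


# Hypothetical frequencies by amount-of-insurance band
actual = {"low": 0.050, "medium": 0.060, "high": 0.080}
fitted = {"low": 0.052, "medium": 0.058, "high": 0.100}

flagged = flag_large_differences(actual, fitted)
```

With these numbers only the "high" band is flagged. A flag means "go investigate", not "the model is wrong", since a level with few policies will be volatile anyway.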
Example D is also for model validation.
- Fitted vs Actual: using a holdout sample
- - before doing the GLM analysis, set aside a subset of your data (holdout sample) to be used for testing
- - apply GLM results to holdout sample (fitted results should be close to actual results from holdout sample)
- - if you don't set aside a holdout sample, you are in danger of overfitting your model
- → you may be modeling noise (versus signal) and you won't know whether your model works for data sets different from your modeling data set
- - if you set aside a holdout sample that is too large, you are in danger of underfitting your model: the remaining modeling data set is too small, so the model may omit significant variables
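Setting aside the holdout sample is just a random split done before any modeling. A minimal Python sketch is below. The 25% holdout fraction, the seed and the policy IDs are hypothetical choices for illustration, and the text does not prescribe a particular split.

```python
import random


def split_holdout(records, holdout_fraction=0.25, seed=0):
    """Randomly split records into (modeling, holdout) lists.

    The fraction and seed are illustrative assumptions, not recommendations.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical policy identifiers
policy_ids = list(range(1000))
modeling, holdout = split_holdout(policy_ids)
```

You would fit the GLM on `modeling` only, then compare fitted vs actual on `holdout`. A bigger `holdout_fraction` gives a more stable test but leaves less data for fitting, which is the trade-off described in the bullets above.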
Example A: Predictive Variable
This is the first example from Appendix F. You're told:
- The first graph is sample output from a multiplicative GLM fit to homeowners water damage frequency data. It shows the indicated relativities for the whole dataset. The graphical output isolates the effect of the prior claim history variable as a significant predictor of water damage frequency, though the model contains other explanatory variables that must be considered in conjunction with the prior claim history effect.
- The second graph shows the pattern of relativities for each of the individual years included in the analysis. This graph is for the consistency test.
The goal is to verify that prior claim history is indeed a significant predictor. You have to go through the 5 steps outlined in the Intro in the previous section. I've provided a link to the text solution below but you should give it a try before looking at the solution.
Example B: Unpredictive Variable
This is the second example from Appendix F. You're told:
- The first graph is sample output from a multiplicative GLM fit to homeowners wind damage frequency data. It shows the indicated relativities for the whole dataset. The output isolates the effect of fire safety devices as an insignificant predictor of wind damage frequency, though the model contains other explanatory variables that must be considered in conjunction with this variable.
- The second graph shows the pattern of relativities for each of the individual years included in the analysis. This graph is for the consistency test.
The goal is to verify that fire safety devices is not a significant predictor. You have to go through the 5 steps outlined in the Intro in the previous section. I've provided a link to the text solution below but you should give it a try before looking at the solution.
Example C: Overall Model Validation - Fitted vs Actual
This example is very simple. All you have to notice is that for LOW values of Amount of Insurance, the fitted/modeled values are slightly high. It's still pretty close though. For MEDIUM values, the model is slightly low but again pretty close. But for HIGH values of AOI, the fitted/modeled values vary significantly from the actual values. This would require further investigation but it could be due simply to fewer policies (hence lower credibility) at very high levels of AOI. Lower credibility data often contains greater volatility.
Example D: Overall Model Validation - Holdout Sample
The actual results are very close to the modeled results for the first 7 levels of the rating variable shown on the bottom axis. The fit isn't as good for the last 3 levels, but that could simply be due to low volume in those groups (more noise, higher volatility).
POP QUIZ ANSWERS
Further Instructions: Please bring Alice some aspirin. ASAP!!!