Friedland03.Data
Reading: Friedland, J.F., Estimating Unpaid Claims Using Basic Techniques, Casualty Actuarial Society, Third Version, July 2010. The Appendices are excluded.
Chapter 3: Understanding the Types of Data Used in the Estimation of Unpaid Claims
Pop Quiz
Alice gave you 2 mega-useful formulas in Chapter 2: The Claims Process, for total unpaid and ultimate. What are they?
Study Tips
There is a lot of basic information in this chapter but aside from a few key facts to memorize, it isn't something you can fully absorb on your first pass. You are probably familiar with some of it anyway from your work duties. So don't spend too long here – you can always refer back when necessary. The most important set of facts you have to memorize is:
- advantages/disadvantages of various data aggregation methods (CY, AY, RY, PY)
There are a lot of BattleCards in the quizzes but much of it is pretty easy and won't take a tremendous amount of time to memorize. Remember: you have to get through these first 6 chapters quickly because the main reserving material is in chapters 7-17.
Estimated study time: ½ day (not including subsequent review time)
BattleTable
Based on past exams, the main things you need to know (in rough order of importance) are:
- data aggregation - advantages/disadvantages of CY/AY/PY/RY aggregation
- homogeneity & credibility - considerations for combining data
- large claims threshold - selection considerations
reference | part (a) | part (b) | part (c) | part (d)
E (2016.Fall #17) | data for analysis: compare strategies
E (2016.Spring #15) | Friedland05.Triangles | data aggregation: is CY appropriate? | data aggregation: is AY appropriate? | Friedland07.Development
E (2015.Fall #15) | large claim threshold: selection considerations | Friedland09.BornFerg
E (2015.Fall #16) | Friedland05.Triangles | data aggregation: RY versus AY
In Plain English!
Source of Data
This short section is just memorization of the facts below. You sometimes need to draw on this background information to answer certain essay-style questions.
Question: do large insurers normally use their own internal data or rely on external/industry data
- large insurers prefer to use their own data as it's more relevant to their own experience
- large insurers might still use external data as benchmarks in relation to their own data
Question: for what purposes might external benchmarks be particularly useful to an insurer [Hint: TET]
- Tail factors
- Expected claims (loss) ratios
- (used in some reserving methods:
- → Chapter 8: Expected Claims Method
- → Chapter 9: Bornhuetter-Ferguson Method
- → Chapter 10: Cape Cod Method)
- Trends (severity & frequency of claims)
Question: in what situations might an insurer want to use external data
- small insurers without credible internal data
- a company with limitations on what data their systems can provide (often the case with a small company)
- entering a new LOB (Line of Business) or geographical region where the company wouldn't have any prior data
Question: identify 1 U.S. source and 1 Canadian source of external data
- U.S.: ISO (Insurance Services Office)
- Canada: IBC (Insurance Bureau of Canada)
Question: identify potential differences in external versus internal data that may reduce the relevance of external data
- insurance product
- insurer operations
- case reserving and case settlement practices
- mix of business
Homogeneity and Credibility
Suppose you have 5 different LOBs (Lines of Business), labeled A, B, C, D and E, as part of your reserve analysis:
- LOB A: 20,000 claims
- LOB B: 100 claims
- LOB C: 20,000 claims
- LOB D: 100 claims
- LOB E: 100 claims
Let's also suppose that LOBs A & B are similar to each other, LOBs C & D are similar to each other, but LOB E is not similar to any of the others.
Question: what is a reasonable way to group these LOBs for a reserve analysis
The key concepts in making that decision are homogeneity and credibility. We would like to maximize both.
- → homogeneity refers to how similar claims are within a grouping (a grouping of similar claims is called a homogeneous grouping)
- → credibility refers to the statistical significance of a grouping (the more claims in a grouping, the more statistically significant it is)
So a reasonable answer to the above question is:
- group LOBs A & B:
- - LOB A is homogeneous and has enough claims to be credible (and could be analyzed on its own)
- - LOB B is homogeneous but is not credible (cannot be analyzed on its own)
- - since LOBs A & B are similar, they can be combined and the resulting grouping is credible and still homogeneous
- group LOBs C & D:
- - the reasoning is the same as for A & B
- LOB E should be supplemented with external data
- - LOB E is homogeneous but not credible (cannot be analyzed on its own)
- - LOB E is not similar to any other LOBs (so cannot be grouped with them)
- - a valid alternative is to look for credible external data that is similar to LOB E
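The grouping logic above can be sketched in a few lines of Python. This is only an illustration: the 1,000-claim credibility threshold, the function name `assess_groupings`, and the idea of reducing "similar" to a fixed list of groups are all assumptions made for the example, not something the source text prescribes. In practice, credibility and homogeneity are judgment calls.

```python
# Hypothetical illustration: the threshold and the similarity groups below
# are assumptions for this example, not values from the source text.
CREDIBILITY_THRESHOLD = 1_000

claim_counts = {"A": 20_000, "B": 100, "C": 20_000, "D": 100, "E": 100}

# LOBs judged similar enough to combine: A with B, C with D, E on its own.
similar_groups = [("A", "B"), ("C", "D"), ("E",)]


def assess_groupings(counts, groups, threshold):
    """Return (group, total claim count, recommendation) for each group."""
    results = []
    for group in groups:
        total = sum(counts[lob] for lob in group)
        if total >= threshold:
            action = "analyze as a group"
        else:
            action = "supplement with external data"
        results.append((group, total, action))
    return results


groupings = assess_groupings(claim_counts, similar_groups, CREDIBILITY_THRESHOLD)
for group, total, action in groupings:
    print("+".join(group), total, action)
```

Under these assumptions, A+B and C+D each have 20,100 claims and can be analyzed as groups, while E has only 100 claims and needs external support.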
I said above that we'd like to maximize homogeneity and credibility simultaneously but that isn't always possible. Often, increasing homogeneity of a group of claims means getting rid of some individual claims that aren't like the others. That means the group gets smaller. But a smaller group has less credibility.
Inconvenient Fact: homogeneity and credibility tend to pull in opposite directions (more of one usually means less of the other)
Homogeneity and credibility have to be balanced. The overall goal is to produce an accurate reserve analysis and you have to strike a balance in terms of how to group your data. The source text provides a list of considerations in making this determination. If you want to glance ahead, there's an example of combining data for analysis in Chapter 7: Influence of a Changing Environment
Question: identify considerations in determining data groupings for analysis (many answers are possible)
- number of claims in grouping (more claims means higher credibility, but potentially less homogeneity)
- claim development patterns (better if each group member has a similar pattern – this is discussed much more fully in Chapter 7: Development)
- severity distribution of claims
- case reserving strength
- likelihood of a claim reopening
Types of Data Used by Actuaries
This whole section in the source text seems pretty pointless – bedtime reading at best. There was 1 exam question where you had to memorize a list of considerations regarding thresholds for large claims but other than that, the material is either obvious or covered in later chapters. Read the summary below and memorize the answers to the 3 questions.
- Claim and Claim Count Data
- - a list of types of data that actuaries use in their analysis: claims/losses, counts, claims closed with payment, claims closed without payment,...
- Claim-Related Expenses
- - you need to know how the insurer handles expenses before using the data
- - see Chapter 16: ALAE and Chapter 17: ULAE
- Multiple Currencies
- - make sure all your data is in the same currency
- Large Claims
- - large claims can distort your data and subsequent analysis
- - sometimes it's preferable to remove large claims from your basic analysis and deal with them separately
Question: identify considerations in establishing a large claims threshold
- number of claims over threshold
- size of claim relative to policy limits
- size of claim relative to reinsurance limits
- credibility of internal data regarding large claims
- availability of relevant external data
- Here's the exam problem that asked you for this list (part a only): E (2015.Fall #15)
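To make the first consideration (number of claims over the threshold) concrete, here is a small Python sketch that tests candidate thresholds against a list of claims. The claim amounts, the candidate thresholds, and the function name `summarize_threshold` are all hypothetical. The split into a capped layer and an excess layer is one common way to "deal with large claims separately", offered here as an assumption rather than as the source text's method.

```python
# Hypothetical claim amounts, for illustration only.
claims = [12_000, 45_000, 250_000, 8_000, 1_200_000, 90_000, 510_000]


def summarize_threshold(claim_amounts, threshold):
    """Return (count over threshold, capped total, excess total)."""
    over = [c for c in claim_amounts if c > threshold]
    # Each claim contributes at most the threshold to the capped layer.
    capped_total = sum(min(c, threshold) for c in claim_amounts)
    # Only the portion above the threshold goes to the excess layer.
    excess_total = sum(c - threshold for c in over)
    return len(over), capped_total, excess_total


for candidate in (100_000, 500_000):
    count, capped, excess = summarize_threshold(claims, candidate)
    print(candidate, count, capped, excess)
```

A very low threshold pushes many claims into the separate large-claim analysis, while a very high one leaves the basic data exposed to distortion. That is why the claim count over the threshold appears on the list.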
- Recoveries
- - includes deductibles, salvage & subrogation, reinsurance
- - see Chapter 14: Recoveries
- Reinsurance
- - see Chapter 14: Recoveries
Question: identify 3 possible treatments of ALAE in excess-of-loss reinsurance
- included with the claim amount in determining excess of loss coverage (most common treatment)
- not included in the coverage
- included on a pro rata basis; the ratio of the excess portion of the claim to the total claim amount determines coverage for ALAE
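The three ALAE treatments are easier to remember with numbers. The sketch below is a simplified illustration: it assumes a single claim, a treaty with a retention but no upper limit, and a hypothetical function name `reinsurer_share`. None of these details come from the source text.

```python
def reinsurer_share(loss, alae, retention, alae_treatment):
    """Reinsurer's payment on one claim under an excess-of-loss treaty.

    Simplifying assumption: the treaty has a retention but no upper limit.
    alae_treatment is "included", "excluded", or "pro_rata".
    """
    if alae_treatment == "included":
        # ALAE is added to the claim before applying the retention.
        return max(loss + alae - retention, 0)
    excess_loss = max(loss - retention, 0)
    if alae_treatment == "excluded":
        # The reinsurer covers excess loss only, and no ALAE.
        return excess_loss
    if alae_treatment == "pro_rata":
        if loss == 0:
            return 0
        # ALAE is shared in the ratio of excess loss to total loss.
        return excess_loss + alae * excess_loss / loss
    raise ValueError(f"unknown ALAE treatment: {alae_treatment}")


# Hypothetical claim: 1.5M loss, 300K ALAE, 1M retention.
for treatment in ("included", "excluded", "pro_rata"):
    print(treatment, reinsurer_share(1_500_000, 300_000, 1_000_000, treatment))
```

With these numbers the reinsurer pays 800,000 when ALAE is included with the claim, 500,000 when it is excluded, and 600,000 on a pro rata basis (one third of the loss is excess, so one third of the ALAE is covered).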
- Exposure Data
- - EP (Earned Premium) is a very common exposure base
- - other exposure bases are: ECYs (Earned Car-Years), payroll (common in Workers' Comp), miles driven (auto), square footage (General Liability for corporations),...
- Insurer Reporting & Understanding the Data
- - the idea here is that you must understand your data before you begin your analysis
- Verification of the Data
Question: identify some components of a data review process
- reconcile data with financial statements
- check consistency of current data against prior data
- check that the data look reasonable
- check data definitions (Ex: are counts tallied by claim or by claimant – because 1 claim could have multiple claimants)
Organizing the Data
We'll start with some very basic information. This would likely never be asked on the exam but it's something you should know.
Question: what are the 5 key dates for the organization of claim data [Hint: PARAV]
- Policy effective dates: beginning and ending dates of the policy term
- Accident date: when the accident or event occurred that triggered coverage
- Report date: when the claim was reported/recorded in the claim system
- Accounting date: defines a group of claims for which liability exists, often Dec 31 of the given year
- Valuation date: the date through which transactions are included in the data used by the actuary for the analysis, often Dec 31 of the given year
- The first 3 terms above are completely obvious. The accounting date is often referred to as when the books close, either for the month, quarter, or year. The accounting date and valuation date are usually the same. If the books close on Dec 31, the actuary would normally do the reserve analysis using data through Dec 31 (valuation date of Dec 31).
The rest of this section is stuff you definitely have to know as it does come up on the exam.
Question: identify 4 common ways of aggregating data
- CY, AY, PY, RY
- → The terms are used very often and I don't want to write them out in full every time. They are, respectively, Calendar Year, Accident Year, Policy Year, Report Year. Note that Policy Year is also sometimes called Underwriting Year.
define & describe: CY aggregation of data
- definition: CY data includes all data with a transaction date within the given year.
- use:
- - premium and exposure data is usually aggregated by CY
- - loss data is not usually aggregated by CY, except possibly for diagnostics
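- - here's a very useful formula for CYEP, Calendar Year Earned Premium, where WP is the premium written during the year and UEP is the unearned premium at the beginning (beg) and end of the year:
CYEP = WP + (UEP_beg – UEP_end)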
- advantages:
- - readily available
- - no future development so CY data is recent (data doesn't change after accounting date so it's available immediately)
- disadvantages:
- - no future development (generally cannot use CY loss data for a reserve analysis)
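As a quick check on the CYEP formula given under "use" above, here is a minimal Python example. The premium figures and the function name `calendar_year_earned_premium` are made up for illustration.

```python
def calendar_year_earned_premium(written_premium, uep_begin, uep_end):
    """CYEP = WP + (UEP at start of year - UEP at end of year)."""
    return written_premium + (uep_begin - uep_end)


# Hypothetical figures: 1,000,000 written during the year, unearned premium
# of 400,000 at the start of the year and 450,000 at the end.
cyep = calendar_year_earned_premium(1_000_000, 400_000, 450_000)
print(cyep)  # 950000
```

The unearned premium reserve grew by 50,000 during the year, so 50,000 of the written premium has not yet been earned and CYEP comes out below WP.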
define & describe: AY aggregation of data
- definition: AY data includes all data with an occurrence date within the given year.
- use:
- - loss data is very commonly aggregated by AY
- advantages:
- - AY loss aggregation is the accepted norm in the U.S. and Canada (uniformity is good)
- - easy to obtain & understand
- - industry benchmarks are available
- disadvantages:
- - mismatch between AY losses and CY premiums or exposures (premiums & exposures are usually aggregated by CY)
- - AY losses may contain policies written at different price levels (because an AY may contain policies from different PYs)
- - AY losses may contain policies written at different retention levels (because an AY may contain policies from different PYs)
define & describe: PY aggregation of data
- definition: PY data includes all data with a policy effective date within the given year.
- use:
- - good for self-insurers because they have only 1 policy (their own)
- advantages:
- - perfect match between losses and premiums or exposures
- - easier to isolate effects of policy changes from year to year (because all policies in a PY will have the same policy characteristics)
- disadvantages:
- - extended time frame of 24 months (an accident may occur on day 1 for a policy written on Jan 1, all the way through to day 365 for a policy written on Dec 31)
- - harder to isolate effects of catastrophes or court rulings that happen on a specific calendar date (because policies from the previous and current PYs would be affected)
define & describe: RY aggregation of data
- definition: RY data includes all data with a report date within the given year.
- use:
- - good for claims-made policies like medical malpractice or product liability (policies where coverage is triggered by the reporting of a claim versus the date of the event)
- advantages:
- - number of claims is fixed at the close of a RY (unlike an AY where the number of claims can still increase due to late-reported claims)
- - development patterns are more stable (because number of claims is fixed, and there is only IBNER, no pure IBNR)
- disadvantages:
- - only measures development on known claims (unreported claims get dumped into the subsequent RY so there's a potential lag in understanding the true extent of an insurer's liabilities)
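The four definitions differ only in which date decides the year a transaction falls into. The sketch below shows one hypothetical claim payment landing in four different years depending on the aggregation basis. The class name `ClaimTransaction`, the function `aggregation_years`, and all dates and amounts are assumptions for the example. It also assumes that each "year" is simply the calendar year of the relevant date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClaimTransaction:
    policy_effective: date  # start of the policy term
    accident: date          # when the covered event occurred
    report: date            # when the claim was reported
    transaction: date       # when this payment was recorded
    amount: float


def aggregation_years(txn):
    """Return the year this transaction is assigned to under each basis."""
    return {
        "PY": txn.policy_effective.year,
        "AY": txn.accident.year,
        "RY": txn.report.year,
        "CY": txn.transaction.year,
    }


# Hypothetical claim: policy written mid-2020, accident in 2021,
# reported in 2022, payment made in 2023.
payment = ClaimTransaction(
    policy_effective=date(2020, 7, 1),
    accident=date(2021, 3, 15),
    report=date(2022, 1, 10),
    transaction=date(2023, 5, 20),
    amount=25_000.0,
)
years = aggregation_years(payment)
print(years)  # {'PY': 2020, 'AY': 2021, 'RY': 2022, 'CY': 2023}
```

The example also illustrates the PY time frame: a policy effective Jul 1, 2020 can still produce an accident in 2021, so PY 2020 covers accidents in both calendar years.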
POP QUIZ ANSWERS
Alice's mega-useful formula #1: total unpaid = Case O/S + IBNR
Alice's mega-useful formula #2: ultimate = paid + total unpaid
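Here are the two formulas in a short Python example. The function names and dollar amounts are hypothetical.

```python
def total_unpaid(case_outstanding, ibnr):
    """Formula #1: total unpaid = Case O/S + IBNR."""
    return case_outstanding + ibnr


def ultimate(paid, case_outstanding, ibnr):
    """Formula #2: ultimate = paid + total unpaid."""
    return paid + total_unpaid(case_outstanding, ibnr)


# Hypothetical amounts: 600 paid, 250 case outstanding, 150 IBNR.
unpaid = total_unpaid(250, 150)
ult = ultimate(600, 250, 150)
print(unpaid, ult)  # 400 1000
```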