March 26 - 29, 2017 - Hyatt Regency - Jacksonville, Florida
17th Annual Intercompany
Long Term Care Insurance Conference
Navigating the Future
Pre-Workshop Homework & Resources
Start by Downloading the Software & Data Files!
1. Download the core R software here. Choose your option based on your operating system
(Mac, Windows, Linux).
2. Go to the RStudio download page (link here) and select the installer under ‘Installers for Supported
Platforms’ that matches your computer’s operating system.
3. Open up the ‘RStudio’ application using the new icon on your desktop.
4. Download the Workshop Data Zip File and this R code data file zip.
#1. Materials prepared by Eileen Burns and Matthias Kullowatz of Milliman for the Practical Predictive Analytics May 2016 Seminar that was produced by the SOA Predictive Analytics and Futurism Section
https://www.rstudio.com/products/rstudio/download/
PPAS_IntrotoR_20160415.pdf
iris2.csv: This file is used to demonstrate table-joining functionality in R. Download it to the working directory you intend to use for the introductory exercises.
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
https://www.rstudio.com/resources/cheatsheets
(Data Wrangling, in particular)
• Objective: Install RStudio and be introduced to R coding
• Time estimate: 2-8 hours, depending on programming background
#2. An Introduction to Statistical Learning, Chapter 2. Statistical Learning, key sections: 2.1 What is statistical learning? and 2.2 Assessing model accuracy (through 2.2.2)
http://www-bcf.usc.edu/~gareth/ISL/
https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
(video lecture version of the textbook)
• Objective: High-level lay of the land and an opportunity to spark interest in digging deeper into the text
• Time estimate: 60-75 minutes (21 pages) or 30-35 minutes of video
#3. SOA Long Term Care Experience Basic Table Development, Appendix B. Generalized Linear Modeling Technical Background
https://www.soa.org/Files/Research/Exp-Study/2015-ltc-exp-basic-table-report.pdf
• Objective: Short summary description of GLM
• Time estimate: <5 minutes (1 page)
#4. Survival Models by Rodríguez, G. (2007), key sections: 7.1 The hazard and survival functions; 7.1.1 The survival function; 7.1.2 The hazard function; 7.3.7 Model fitting; 7.4.3 The equivalent Poisson model; 7.4.4 Time-varying covariates; and 7.4.5 Time-dependent effects
http://data.princeton.edu/wws509/notes/c7.pdf
These are lecture notes from Chapter 7 of Princeton University’s Generalized Linear Models course (http://data.princeton.edu/wws509/notes/)
• Objective: Shows that the GLM Poisson survival model is equivalent to Cox but has the benefit of using aggregated data, and introduces partial exposures. Provides the math and proofs.
• Time estimate: 30-45 minutes (11 pages)
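As a small illustration of the aggregated-data idea in the objective above, the sketch below fits an intercept-only Poisson model with a log-exposure offset and confirms that the fitted hazard equals total events divided by total exposure. The cell values and the function name `fit_log_hazard` are hypothetical and chosen only for illustration; this is not code from the lecture notes or the workshop.

```python
import math

# Hypothetical aggregated cells: (event count, exposure in person-years).
# Partial exposures are simply fractional person-years.
cells = [(3, 120.5), (5, 98.25), (2, 75.0)]


def fit_log_hazard(cells, iterations=25):
    """Fit log(mu_i) = beta + log(exposure_i) by Newton's method."""
    total_events = sum(d for d, _ in cells)
    total_exposure = sum(e for _, e in cells)
    beta = 0.0  # starting value: a hazard of 1 event per person-year
    for _ in range(iterations):
        expected = math.exp(beta) * total_exposure
        # Score is (total_events - expected); information is expected.
        beta += (total_events - expected) / expected
    return beta


fitted_hazard = math.exp(fit_log_hazard(cells))
crude_rate = sum(d for d, _ in cells) / sum(e for _, e in cells)
print(fitted_hazard, crude_rate)
```

With covariates the closed form no longer applies, but the same offset carries each cell's exposure into the model.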
#5. Non-Parametric Estimation in Survival Models by Rodríguez, G. (2005), key sections: 1 One sample: Kaplan-Meier; 1.1 Estimation with censored data; 1.2 Non-parametric maximum likelihood; and 1.4 The Nelson-Aalen estimator
http://data.princeton.edu/pop509/NonParametricSurvival.pdf
These are lecture notes from Section 2 of Princeton University’s Survival Analysis course (http://data.princeton.edu/pop509)
• Objective: Refresher on survival modeling to lay the foundation for bridging traditional methods and more robust statistical learning methods
• Time estimate: 10-15 minutes (4 pages)
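For readers who want to see the two estimators named in the key sections above in action, here is a minimal sketch of the Kaplan-Meier survival estimate and the Nelson-Aalen cumulative hazard. The sample durations and the function name `kaplan_meier` are hypothetical; subjects censored at an event time are treated as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t), H(t))] at each distinct event time.

    times:  observed durations
    events: 1 if the event was observed, 0 if the duration was censored
    S(t) is the Kaplan-Meier estimate; H(t) is the Nelson-Aalen estimate.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival, cum_hazard = 1.0, 0.0
    results = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            survival *= 1.0 - d / at_risk
            cum_hazard += d / at_risk
            results.append((t, survival, cum_hazard))
        at_risk -= exits[t]
    return results


# Hypothetical sample: six durations, two of them censored.
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
estimates = kaplan_meier(times, events)
for t, s, h in estimates:
    print(t, round(s, 4), round(h, 4))
```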
#6. Bias-variance Tradeoff podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
• Objective: Introduce this key concept of predictive analytics
• Time estimate: 12 minutes (podcast)
#7. The Elements of Statistical Learning, Chapter 7. Model Assessment and Selection, key sections: 7.1 Introduction; 7.2 Bias, variance, and model complexity; 7.5 Estimates of in-sample prediction error; and 7.10 Cross-validation
http://statweb.stanford.edu/~tibs/ElemStatLearn/
• Objective: Reiterates bias-variance and model complexity and introduces splitting data into calibration, validation, and testing data sets. Discusses using in-sample measurements of fit (AIC, BIC) and then moves to cross-validation techniques. The latter is becoming common practice in statistical learning due to the use of large datasets and/or advances in computational power.
• Time estimate: 45 minutes (14 pages)
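The cross-validation idea in the objective above can be sketched in a few lines. The example below is a simplified illustration, not the textbook's code: it uses hypothetical data, two toy models (a constant mean and a straight line), and deterministic strided folds, whereas in practice the rows would normally be shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Split range(n) into k folds by striding (no shuffling, for simplicity)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_mean(xs, ys):
    """Model 1: predict the training mean regardless of x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Model 2: ordinary least squares with one predictor."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def cv_mse(fit, xs, ys, k=5):
    """Average squared error on held-out folds."""
    n = len(xs)
    squared_error = 0.0
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        squared_error += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
    return squared_error / n


# Hypothetical data: a linear trend plus a small alternating disturbance.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

print("mean model CV MSE:", cv_mse(fit_mean, xs, ys))
print("line model CV MSE:", cv_mse(fit_line, xs, ys))
```

The model with the lower held-out error is the one expected to generalize better, which is the comparison that in-sample measures such as AIC and BIC approximate.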
#8. Cross validation and bootstrapping podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
• Objective: Audio version to supplement a portion of the reading above, along with discussion of a second resampling technique that aids in training predictive models. Focuses on how to use these techniques to train models that generalize well to new data.
• Time estimate: 24 minutes (podcast)
#9. Penalized Regression podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
• Objective: Introduce penalized regression
• Time estimate: 16 minutes (podcast)
#10. A discussion on credibility and penalized regression, with implications for actuarial work by Hugh Miller presented to the Actuaries Institute, key sections: 1. Background and 2. Credibility and penalized regression
http://actuaries.asn.au/Library/Events/ASTINAFIRERMColloquium/2015/MillerCredibiliyPaper.pdf
• Objective: Connects penalized regression with “traditional” credibility methods
• Time estimate: 15-30 minutes (8 pages)
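One common way to see the connection named in the objective above is the single-group ridge case, sketched below. It is an illustration under stated assumptions and not necessarily the derivation used in the paper. Minimizing sum((y_i - prior_mean - b)^2) + penalty * b^2 over b gives b = n * (ybar - prior_mean) / (n + penalty), which matches a credibility-weighted estimate with Z = n / (n + k) when k equals the penalty. All names and values are hypothetical.

```python
def ridge_group_effect(observations, prior_mean, penalty):
    """Closed-form ridge estimate of a group's deviation from the prior mean."""
    n = len(observations)
    y_bar = sum(observations) / n
    return n * (y_bar - prior_mean) / (n + penalty)


def credibility_estimate(observations, prior_mean, k):
    """Credibility-weighted estimate with Z = n / (n + k)."""
    n = len(observations)
    y_bar = sum(observations) / n
    z = n / (n + k)
    return z * y_bar + (1 - z) * prior_mean


# Hypothetical actual-to-expected ratios for one small group.
observations = [1.2, 0.9, 1.5, 1.4]
prior_mean = 1.0
penalty = 6.0

ridge_result = prior_mean + ridge_group_effect(observations, prior_mean, penalty)
credibility_result = credibility_estimate(observations, prior_mean, penalty)
print(ridge_result, credibility_result)
```

A larger penalty plays the same role as a larger credibility constant: the group estimate is pulled more strongly toward the prior mean.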
#11. Calibrating Risk Score: Model with Partial Credibility by Shea Parkes and Brad Armstrong
https://www.soa.org/Library/Newsletters/Forecasting-Futurism/2015/July/ffn-2015-iss11-parkes-armstrong.aspx
• Objective: Application of using penalized regression with offset to update an existing assumption
• Time estimate: 10-15 minutes (3 pages)
#12 (Optional). Applications of the offset in property-casualty predictive modeling from Casualty Actuarial Society E-Forum, Winter 2009, key sections starting on: page 370 Exposure adjustments and the offset and page 376 Sequential modeling
https://www.casact.org/pubs/forum/09wforum/yan_et_al.pdf
• Objective: Provides deeper theory of using an offset in a multiplicative model to update an existing assumption
• Time estimate: 10-15 minutes (4 pages)
Extra credit
Advanced or post-seminar resources
Materials prepared by Eileen Burns and Matthias Kullowatz of Milliman for the Practical Predictive Analytics May 2016 Seminar that was produced by the SOA Predictive Analytics and Futurism Section. These documents walk through R code for a mini predictive modeling example. This example is unrelated to what will be performed at the LTC workshop, but gives a framework to explore.
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_Practical_20160516_MAMK.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_DataPrep_20160513.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_Modeling_Validation_20160513.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/index.html
(three datasets for the examples)
Generalized additive models (GAM)
http://multithreaded.stitchfix.com/blog/2015/07/30/gam/
SOA Predictive Analytics and Futurism Section podcasts:
Decision trees
Random forests and gradient boosting machines
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
Deep Learning
https://en.wikipedia.org/wiki/Deep_learning
Applied machine learning guide
http://machinelearningmastery.com/start-here/
Info on scaling R
http://blog.revolutionanalytics.com/2016/10/tutorial-scalable-r-on-spark.html
SOA Predictive Analytics and Futurism Section Newsletter, December 2015
Includes articles (1) Getting Started in Predictive Analytics: Books and Courses by Mary Pat Campbell and (2) Johns Hopkins Data Science Specialization courses: A review by Shea Parkes
https://www.soa.org/Library/Newsletters/Predictive-Analytics-and-Futurism/2015/december/paf-iss12.pdf
Johns Hopkins Data Science Specialization courses reviewed in the above-referenced article
https://www.coursera.org/specializations/jhu-data-science
$29-49 per course; 10 courses available
Blogs and newsletters
http://dataelixir.com/
https://www.r-bloggers.com
http://www.statsblogs.com/
http://www.win-vector.com/blog/
Practice
https://www.kaggle.com/
Additional languages
http://www.sas.com/en_us/home.html
https://www.python.org/
http://julialang.org/downloads/
https://www.mathworks.com/products/matlab/?requestedDomain=www.mathworks.com
http://mc-stan.org/
https://www.ruby-lang.org/en/