March 26 - 29, 2017 - Hyatt Regency - Jacksonville, Florida
17th Annual Intercompany Long Term Care Insurance Conference: Navigating the Future

Pre-Workshop Homework & Resources

Start by Downloading the Software & Data Files!

1. Download the core R software here. Choose your option based on your operating system (Mac, Windows, Linux).
2. Go to the RStudio download page (link here) and select the option under ‘Installers for Supported Platforms’ that matches your computer’s operating system.
3. Open the ‘RStudio’ application using the new icon on your desktop.
4. Download the Workshop Data Zip File and this R code data file zip.

#1. Materials prepared by Eileen Burns and Matthias Kullowatz of Milliman for the Practical Predictive Analytics May 2016 Seminar, which was produced by the SOA Predictive Analytics and Futurism Section
https://www.rstudio.com/products/rstudio/download/
PPAS_IntrotoR_20160415.pdf
iris2.csv – This file is used to demonstrate table joining functionality in R; download it to the working directory you intend to use for the introductory exercises.
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
https://www.rstudio.com/resources/cheatsheets (Data Wrangling, in particular)
Objective: Install RStudio and be introduced to R coding
Time estimate: 2-8 hours, depending on programming background

#2. An Introduction to Statistical Learning, Chapter 2. Statistical Learning, key sections: 2.1 What is statistical learning? and 2.2 Assessing model accuracy (through 2.2.2)
http://www-bcf.usc.edu/~gareth/ISL/
https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/ (video lecture version of the textbook)
Objective: High-level lay of the land and an opportunity to spark interest in digging deeper into the text
Time estimate: 60-75 minutes (21 pages) or 30-35 minutes of video

#3. SOA Long Term Care Experience Basic Table Development, Appendix B.
Generalized Linear Modeling Technical Background
https://www.soa.org/Files/Research/Exp-Study/2015-ltc-exp-basic-table-report.pdf
Objective: Short summary description of GLM
Time estimate: <5 minutes (1 page)

#4. Survival Models by Rodríguez, G. (2007), key sections: 7.1 The hazard and survival functions; 7.1.1 The survival function; 7.1.2 The hazard function; 7.3.7 Model fitting; 7.4.3 The equivalent Poisson model; 7.4.4 Time-varying covariates; and 7.4.5 Time-dependent effects
http://data.princeton.edu/wws509/notes/c7.pdf
These are lecture notes from Chapter 7 of Princeton University’s Generalized Linear Models course (http://data.princeton.edu/wws509/notes/)
Objective: The GLM Poisson survival model is equivalent to Cox, but has the benefit of working with aggregated data and of introducing partial exposures. Provides the math and proofs.
Time estimate: 30-45 minutes (11 pages)

#5. Non-Parametric Estimation in Survival Models by Rodríguez, G. (2005), key sections: 1 One sample: Kaplan-Meier; 1.1 Estimation with censored data; 1.2 Non-parametric maximum likelihood; and 1.4 The Nelson-Aalen estimator
http://data.princeton.edu/pop509/NonParametricSurvival.pdf
These are lecture notes from Section 2 of Princeton University’s Survival Analysis course (http://data.princeton.edu/pop509)
Objective: Refresher on survival modeling to lay the foundation for bridging traditional methods and more robust statistical learning methods
Time estimate: 10-15 minutes (4 pages)

#6. Bias-variance Tradeoff podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
Objective: Introduce this key concept of predictive analytics
Time estimate: 12 minutes (podcast)

#7.
The Elements of Statistical Learning, Chapter 7. Model Assessment and Selection, key sections: 7.1 Introduction; 7.2 Bias, variance, and model complexity; 7.5 Estimates of in-sample prediction error; and 7.10 Cross-validation
http://statweb.stanford.edu/~tibs/ElemStatLearn/
Objective: Reiterates bias-variance and model complexity and introduces splitting data into calibration, validation, and testing data sets. Discusses using in-sample measurements of fit (AIC, BIC) and then moves to cross-validation techniques, which are becoming common practice in statistical learning due to the use of large datasets and/or advances in computational power.
Time estimate: 45 minutes (14 pages)

#8. Cross validation and bootstrapping podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
Objective: Audio version to supplement a portion of the reading above, along with discussion of a second resampling technique that aids in training predictive models. Focuses on how to use these techniques to train models that generalize well to new data.
Time estimate: 24 minutes (podcast)

#9. Penalized Regression podcast by SOA Predictive Analytics and Futurism Section
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/
Objective: Introduce penalized regression
Time estimate: 16 minutes (podcast)

#10. A discussion on credibility and penalized regression, with implications for actuarial work by Hugh Miller, presented to the Actuaries Institute, key sections: 1. Background and 2.
Credibility and penalized regression
http://actuaries.asn.au/Library/Events/ASTINAFIRERMColloquium/2015/MillerCredibiliyPaper.pdf
Objective: Connects penalized regression with “traditional” credibility methods
Time estimate: 15-30 minutes (8 pages)

#11. Calibrating Risk Score: Model with Partial Credibility by Shea Parkes and Brad Armstrong
https://www.soa.org/Library/Newsletters/Forecasting-Futurism/2015/July/ffn-2015-iss11-parkes-armstrong.aspx
Objective: Application of penalized regression with an offset to update an existing assumption
Time estimate: 10-15 minutes (3 pages)

#12 (Optional). Applications of the offset in property-casualty predictive modeling from Casualty Actuarial Society E-Forum Winter 2009, key sections starting on: page 370 Exposure adjustments and the offset and page 376 Sequential modeling
https://www.casact.org/pubs/forum/09wforum/yan_et_al.pdf
Objective: Provides deeper theory on using an offset in a multiplicative model to update an existing assumption
Time estimate: 10-15 minutes (4 pages)

Extra credit
Advanced or post-seminar resources

Materials prepared by Eileen Burns and Matthias Kullowatz of Milliman for the Practical Predictive Analytics May 2016 Seminar, which was produced by the SOA Predictive Analytics and Futurism Section. These documents walk through R code for a mini predictive modeling example. This example is unrelated to what will be performed at the LTC workshop, but gives a framework to explore.
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_Practical_20160516_MAMK.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_DataPrep_20160513.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/20160516/PPAS_Modeling_Validation_20160513.pdf
http://ppas-2016.s3-website-us-east-1.amazonaws.com/index.html (three datasets for the examples)

Generalized additive models (GAM)
http://multithreaded.stitchfix.com/blog/2015/07/30/gam/

SOA Predictive Analytics and Futurism Section podcasts:
Decision trees
Random forests and gradient boosting machines
https://www.soa.org/prof-dev/podcasts/predictive-analytics-podcasts/

Deep Learning
https://en.wikipedia.org/wiki/Deep_learning

Applied machine learning guide
http://machinelearningmastery.com/start-here/

Info on scaling R
http://blog.revolutionanalytics.com/2016/10/tutorial-scalable-r-on-spark.html

SOA Predictive Analytics and Futurism Section Newsletter, December 2015
Includes articles (1) Getting Started in Predictive Analytics: Books and Courses by Mary Pat Campbell and (2) Johns Hopkins Data Science Specialization courses: A review by Shea Parkes
https://www.soa.org/Library/Newsletters/Predictive-Analytics-and-Futurism/2015/december/paf-iss12.pdf

Johns Hopkins Data Science Specialization courses reviewed in the above-referenced article
https://www.coursera.org/specializations/jhu-data-science
$29-49 per course; 10 courses available

Blogs and newsletters
http://dataelixir.com/
https://www.r-bloggers.com
http://www.statsblogs.com/
http://www.win-vector.com/blog/

Practice
https://www.kaggle.com/

Additional languages
http://www.sas.com/en_us/home.html
https://www.python.org/
http://julialang.org/downloads/
https://www.mathworks.com/products/matlab/?requestedDomain=www.mathworks.com
http://mc-stan.org/
https://www.ruby-lang.org/en/