In Applied Predictive Modeling, we've chosen the R programming language to illustrate techniques. Obviously, there are other programs and languages that are well-suited for predictive modeling and machine learning. We chose R for a few reasons:
- It's free and available to everyone.
- The goal of R is "to turn ideas into software, quickly and faithfully" (as stated by John Chambers).
- R already has extensive capabilities for modeling.
- We use R in our day-to-day work for good reason (i.e., we eat our own dog food).
We also wanted the primary computations to be reproducible. From Buckheit and Donoho (1995):
An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.
To this end, most chapters have a specific computing section that outlines how to use R to perform the analyses. The AppliedPredictiveModeling package also contains more extensive (and up-to-date) scripts to create the models for each chapter.
The book contains an appendix entitled "An Introduction to R", which is included in the sample pages on Springer's website.
To install the packages used in the book, follow the links on the Table of Contents page, which point to an R script for each chapter.
Version 2.15.2 (2012-10-26) of R was used in conjunction with the following package versions: AppliedPredictiveModeling (1.01), arules (1.0-12), C50 (0.1.0-013), caret (5.15-045), coin (1.0-21), CORElearn (0.9.40), corrplot (0.70), ctv (0.7-4), Cubist (0.0.12), desirability (1.05), DMwR (0.2.3), doBy (4.5-5), doMC (1.2.5), DWD (0.10), e1071 (1.6-1), earth (3.2-3), elasticnet (1.1), ellipse (0.3-7), gbm (1.6-3.2), glmnet (1.8-2), Hmisc (3.10-1), ipred (0.9-1), kernlab (0.9-15), klaR (0.6-7), lars (1.1), latticeExtra (0.6-24), lattice (0.20-10), MASS (7.3-22), mda (0.4-2), minerva (1.2), mlbench (2.1-1), nnet (7.3-5), pamr (1.54), partykit (0.1-4), party (1.0-3), pls (2.3-0), plyr (1.7.1), pROC (1.5.4), proxy (0.4-9), QSARdata (1.02), randomForest (4.6-7), RColorBrewer (1.0-5), reshape2 (1.2.1), reshape (0.8.4), rms (3.6-0), rpart (4.0-3), RWeka (0.4-12), sparseLDA (0.1-6), subselect (0.12-2), svmpath (0.952), tabplot (0.12). Some of these packages are not directly related to predictive modeling but were used to compile or format the content, or for visualization.
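Because results can shift between package releases, it may help to compare a current installation against the versions listed above. The following is a minimal sketch and not part of the book's scripts: the `book_versions` subset and the `installed_version` helper are illustrative names introduced here, and only the base R functions `utils::packageDescription()` and `utils::compareVersion()` are assumed.

```r
# A small subset of the versions listed above, for illustration only
book_versions <- c(
  caret        = "5.15-045",
  randomForest = "4.6-7",
  MASS         = "7.3-22"
)

# Hypothetical helper: installed version string, or NA if not installed
installed_version <- function(pkg) {
  as.character(suppressWarnings(
    utils::packageDescription(pkg, fields = "Version")
  ))
}

installed <- vapply(names(book_versions), installed_version, character(1))

# compareVersion(a, b) gives 1 if a is later, 0 if equal, -1 if earlier;
# keep NA for packages that are not installed
status <- mapply(
  function(have, want) {
    if (is.na(have)) NA_integer_ else as.integer(utils::compareVersion(have, want))
  },
  installed, book_versions
)

report <- data.frame(
  package   = names(book_versions),
  book      = unname(book_versions),
  installed = unname(installed),
  status    = unname(status),
  stringsAsFactors = FALSE
)
print(report)
```

In the `status` column, 0 means the installed version matches the one used for the book, 1 means a later version is installed, -1 means an earlier one, and NA means the package is not installed. Extending `book_versions` to the full list above is a matter of adding more name and version pairs.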