caret has been able to utilize parallel processing for some time (before it was on CRAN in October 2007) using slightly different versions of the package. Around September of 2011,
caret started using the
foreach package was used to "harmonize" the parallel processing technologies thanks to a super smart guy named Steve Weston.
I've done a few benchmarks to quantify the benefits to parallel processing over the years. I just did another set since I just got a new iMacPro. This time, I put the code and results in a GitHub repo. The main questions that I wanted to answer were:
- Is the benefit the same across computers and operating systems?
- Is there any performance difference between parallel processing technologies (or packages)?
- For my test cases, is it better to parallelize resampling loops or the individual model fit?
The task I used was to tune a boosted classification tree model (via the
xgboost) package over 25 random values of the seven tuning parameters. The data were simulated and contained 2,000 training set points and 115 numeric predictors. Five repeats of 10-fold cross-validation was used so that 1,250 separate models could be run in parallel. An example of the code is here.
I tested this using three machines:
- A 2015 MacBookPro (version 11,4) with a 2.5 GHz Intel Core i7 processor. It has 4 physical cores.
- A DIY machine built in January of 2016 using a 3.4 GHz Intel Core i7 processor (6 cores).
- A 2017 iMacPro (1,1) with a 3 GHz Intel Xeon W processor. There are 10 cores.
All the computers have >=8GB of memory. The DIY machine was running Ubuntu 16.04 as well as Windows 10. All of the builds used R 3.4.2 or higher.
The methods for parallelization were:
- forking uses the
fork system call to create new instances of the session (i.e. workers) that run parts of the computations. This is available on MacOS, unix, and linux. It is available to
foreach via the
doParallel packages. Historically, I have used
doMC for my work but both were tested here.
sockets via the
doParallel package. I don't know a whole lot about socket connections but they have been around a long time via the
snow package and can run on all major operating systems.
xgboost has the option to use
openMP to parallelize the individual model fits.
This JSS paper from 2009 might be helpful.
Once nice thing about parallelism in
caret is that you do not have to modify the package code. With
foreach, you simply "register" a parallel backend and the
caret functions automatically utilize the workers.
To be clear, the forking/socket approaches split the 1,250 model fits across the workers while the threading affects each individual model fit (one-at-a-time).
In each case, different number of workers were applied to the problems and the execution time was measured. Since these are all recent Intel processors, hyper-threading can be used. This means that, even though you have X physical cores, there are 2X "logical cores" that can perform independent operations. For that reason, we can add more workers than physical cores. The order of all runs were randomized and run in clear R sessions.
First, let's look at the non-threading results. The raw execution times were: