Cross-validation pitfalls when selecting and assessing regression and classification models

Damjan Krstajic and friends have a great paper on pitfalls of cross-validation. Although the paper uses chemistry data, the meat of the article is broadly applicable. It does a great job of illustrating different resampling approaches and I learned more about double and nested cross-validation.

Figure 10 surprised me; I had assumed that the precision of resampled estimates is driven mostly by the number of resamples. For example, a resampled estimate of the RMSE using 64 resamples has a standard error of sd/8, which is twice as good as one using 16 resamples (i.e., sd/4). Yet in their work, the variation across 50 repeats of 10-fold CV is much lower than the variation across 50 repeats of 10-fold nested CV.
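To make the sd/8 versus sd/4 arithmetic concrete, here is a minimal sketch of the usual standard-error formula, sd / sqrt(n). The function name `standard_error` and the value `sd = 1.0` are made up for illustration and are not from the paper. The formula also treats the resampled values as independent, which is only an approximation for cross-validation because the resamples share training data.

```python
import math


def standard_error(sd: float, n_resamples: int) -> float:
    """Standard error of a mean over n_resamples values.

    Assumes the resampled values are independent, which is only
    approximately true for cross-validation.
    """
    return sd / math.sqrt(n_resamples)


# Hypothetical standard deviation of the per-resample RMSE values.
sd = 1.0

se_16 = standard_error(sd, 16)  # sd / 4
se_64 = standard_error(sd, 64)  # sd / 8

# Quadrupling the number of resamples halves the standard error.
print(se_16, se_64, se_16 / se_64)
```

Under this formula, only the number of resamples matters, which is why a difference between ordinary and nested CV at the same number of repeats and folds is unexpected.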

Finally, the article has an excellent historical summary of the pivotal papers on this subject and does a great job of labeling and articulating the different goals that one might have when resampling predictive models.