The GitHub page for the APM exercises has been updated with three new files for Chapters 6-8 (the section on regression).
The classification section is in progress.
Here's one of our fancy-pants graphs:
Some recent features/changes:

- GA and SA feature selection functions (gafs and safs) were added, along with helper functions and objects. The package HTML documentation was expanded to say more about feature selection. I'll talk more about these functions in an upcoming blog post.
- A new version of nearZeroVar, based on code from Michael Benesty, was added that uses less memory and can be run in parallel. The old version is now called nzv.
- sbfControl now has a multivariate option where all the predictors are exposed to the scoring function at once.
- New simulation functions (SLC14_1, SLC14_2, LPH07_1, and LPH07_2) were added.
- When passing x to train, we now respect the class of the input value to accommodate other data types (such as sparse matrices).
- update.rfe was added.

Recently added models:

- From the adabag package, two new models were added: AdaBag and AdaBoost.M1.
- The model in the wsrf package (weighted subspace random forest) was added.
- Two bagged MARS/FDA models (bagFDAGCV and bagEarthGCV) were added that use the GCV statistic to prune the model. This leads to memory reductions during training.
- Proportional odds logistic regression models can be fit with train using method = "polr" from MASS.
- Discriminant analysis models from the adaptDA and robustDA packages were added.
- A model from the binda package was added.
- Ensemble partial least squares (from the enpls package) was added.
- plsRglm was added.
- From the kernlab package, SVM models using string kernels were added: svmBoundrangeString, svmExpoString, and svmSpectrumString.
- ada had a bug fix applied, and the code was adapted to use the "sub-model trick," so it should train faster.

Pfizer has an excellent group of librarians, and they recently contacted people, including a few statisticians, about how we find and organize articles. I've spent considerable time thinking about this over the years and have wanted to start a discussion on the topic for a while, since I can't believe that someone isn't doing this better. Comments here or via email are enthusiastically welcome.
For finding journal articles, I do a few different things.
RSS feeds are pretty straightforward to use. Most journals have RSS feeds of various types (e.g., current issue, just accepted, articles ASAP, etc.). In some cases, such as PLOS ONE, you can create RSS feeds for specific search terms within that journal (see the examples at the bottom of this post). I haven't figured out how to filter RSS feeds based on whether the manuscript has supplemental materials (e.g., data).
RSS isn't perfect. For example, some of the ASA journals have mucked up their XML and I see a lot of repeats of articles on the same day. An edited list of what I keep tabs on is at the end of this post.
(As an aside, RSS feeds are also great for monitoring specific topics on Stack Overflow and Crossvalidated)
I have tried myriad RSS readers to aggregate and monitor my feeds. I'm currently using Feedly.
Also, this is only for content that you have identified as interesting. There could be something else out there that you have missed completely. That leads me to...
I have about 30 different Google Scholar alerts. Some are related to general topics (e.g. ["training set" "test set" -microarray -SNP -QSAR -proteom -RNA -biomarker -biomarkers]
) and others look for anything citing specific manuscripts (e.g. [Documents citing "The design and analysis of benchmark experiments"]
). See this page for examples of how to create effective alerts. There are other uses for alerts too.
Alerts are very effective. I usually get emails with the alerts in batches of 20 or so at a time. I haven't quite figured out what the trigger is; in some cases I get two batches in a single day.
One thing I would put on the wish list is some sort of smart aggregation. If I have alerts for [ "simulated annealing" "feature selection" ]; Articles excluding patents
and [ "genetic algorithm" "feature selection" ]; Articles excluding patents
, the results are abundantly redundant, since many feature selection articles mention both search algorithms.
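The kind of aggregation I have in mind is easy to sketch; assuming each alert arrives as a character vector of article titles (hypothetical data invented here for illustration), collapsing the overlap is a one-liner in R:

```r
# Hypothetical titles returned by two overlapping alerts
sa_alert <- c("Simulated annealing for descriptor selection",
              "A comparison of search methods for feature selection")
ga_alert <- c("A comparison of search methods for feature selection",
              "Genetic algorithms for wrapper-based feature selection")

# A smarter digest would report each article once,
# no matter how many alerts matched it
unique(c(sa_alert, ga_alert))
```

Of course, the hard part in practice is matching slightly different renderings of the same citation, not the deduplication itself.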
Keep in mind that the alerts may not be new articles but items that are new to Google. This isn't really an issue for me but it is worth mentioning.
I love Google Scholar. Search on a title and you will almost always be able to find the manuscript, links to different sources for obtaining it, plus a list of articles that reference it. Subject-based searches are just as effective.
(Our librarians were surprised to find that we could get access to articles that our institution did not have licenses for via Google. For example, the Scholar page for an article will list multiple versions of the reference. Some of these may correspond to the home page of one of the authors, where he/she has posted a local copy of the PDF.)
Google has good tips on searching. This presentation is excellent with some tricks that I didn't know.
So once I've found articles, how do I manage them?
Papers... I have equal parts love and hate for this program. I'll list the pros and cons below. I should say that I have been using it since the original version and have become increasingly frustrated. I'm not using the most recent version, and I have tried a lot of different alternatives (e.g. Mendeley, BibDesk, Bookends, Endnote, Sente, Zotero). Unfortunately, for someone with thousands of PDFs, Papers (version 2) has some features that the others haven't mastered yet. I would love to move away from Papers.
What is good:
There are Open In Papers bookmarks for most browsers. Once you find a journal article, use this link to start Papers and open it. Often, the application automatically reads the citation information from the webpage and imports it. Clicking on the PDF link within the article's web page imports that file.

The bad news:
The last two issues have driven me crazy. I don't see myself upgrading any time soon.
I use LaTeX for almost all articles that I write. It is a pain when working with others who have never used it (or heard of it), but it is worth it. Also, the power you get when using LaTeX with Sweave or knitr cannot be overstated. Apart from exporting BibTeX from Papers, the other tools I use are:
I gave a talk at ENAR last year related to this. We've since moved the book version control to GitHub and have translated all of our Sweave code to knitr.
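For anyone who hasn't seen the LaTeX-plus-knitr combination, here is a minimal .Rnw document as an illustration; the chunk label and the mtcars summary are just placeholder content, not anything from the book:

```latex
\documentclass{article}
\begin{document}

% knitr runs this R chunk when the document is compiled
<<summary-stats, echo=FALSE>>=
avg_mpg <- round(mean(mtcars$mpg), 1)
@

% \Sexpr{} weaves the computed value directly into the prose,
% so the text can never drift out of sync with the analysis
The average fuel economy in the \texttt{mtcars} data was
\Sexpr{avg_mpg} mpg.

\end{document}
```

Running knitr on the .Rnw file regenerates both the numbers and the narrative in one step, which is exactly the power referred to above.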
In no particular order: