4 R and Reproducible Analysis
Can everything be re-done easily if I change one data point in the inputs?
At EHA, R is our primary, though not exclusive, tool for analysis and modeling work. R is not just a piece of software for statistics and data manipulation but a programming language, so our analyses are scripted. Scripted analyses can be automated, re-run, built upon, and extended.
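As a minimal sketch of what "scripted" buys you, the example below wraps a toy analysis in a function and re-runs it after one input value changes. The data, the column names (`site`, `count`), and the function name `run_analysis` are hypothetical and chosen only for illustration; they do not come from any EHA project.

```r
# Hypothetical input data; values and column names are illustrative only.
inputs <- data.frame(
  site  = c("A", "A", "B", "B"),
  count = c(10, 14, 7, 9)
)

# Keep the whole analysis in one function so it can be re-run on any input.
run_analysis <- function(dat) {
  # Mean count per site, using base R only.
  aggregate(count ~ site, data = dat, FUN = mean)
}

original <- run_analysis(inputs)

# Change a single data point, then redo the entire analysis with one call.
inputs$count[2] <- 16
updated <- run_analysis(inputs)

print(original)
print(updated)
```

Because every step from raw input to result is code, answering the question at the top of this chapter is a matter of editing the input and running the script again, not repeating manual steps.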
4.2 Learn
Learning R is beyond the scope of this document, and you likely already have some experience in it, but some good starting points are:
- The Software Carpentry Lessons
- Swirl, a set of interactive lessons run right in R.
- The JHU Coursera Series
- R for Data Science by Hadley Wickham is a beginner/intermediate text that we highly recommend for getting up to speed with the workflows we favor and the most recent packages that support them.
- Advanced R (Wickham) is very good for understanding how the language works.
- Efficient R Programming by Colin Gillespie and Robin Lovelace is helpful for improving workflows and speeding up code.
- R Packages (Wickham) is good for package development.
- Cheatsheets from RStudio are useful references for a number of things.
Dataquest courses are also potentially useful. If you feel they would match your learning style and needs, discuss with your supervisor whether EHA can purchase a subscription for you.
These resources are largely about the mechanics of programming in R, rather than using it for statistical analyses. Statistical analysis is a far larger subject, but see the Statistical Methods section for a jumping-off point.
4.3 Additional Resources
Troubleshooting your code: Getting Help.
User groups/communities of practice: R Meetups
Specific domains: Training Plans