class: right, bottom, title-slide

# Making your R project future-proof with `renv`

### Noam Ross

### EcoHealth Methods and Models Meeting, 2021-09-08

---

# Goals of today

* A brief (re)(re)introduction to reproducibility
* Adding `renv` to your project in a few easy steps
* Parts and maintenance of an `renv`-enabled project
* A few other approaches to capturing computing environments

---

# Parts of (Computational) Reproducibility (and related tools)

* _Workflow_: Re-creating the same set of steps done originally (`targets`) <-- SEE [LAST WEEK'S TALK](https://airtable.com/appwlxIzmQx5njRtQ/tbledVCO9MRKkK9MW/viwd5Kt2QVw7lyAKb/recNumz1XdgbOtuaQ?blocks=hide)
* _Environment_: Computing with the same software (and hardware) (`renv` and Docker) <-- THIS WEEK
* _Versioning_: Reproducing the above for different versions of the project (git, GitHub, database snapshots)
* _Testing_: Automatically checking that your code is working as expected (`testthat`, `assertr`, `validate`, continuous integration)
* _Access_: Having access to the source data and resources (repositories, databases)
* _Documentation_: Instructions for all of the above (markdown, roxygen)

---

# `renv` captures your analysis _environment_ by freezing your R packages

* R packages update frequently
* The package versions installed on your computer may differ from those on your collaborators' computers
* You have likely used different versions of packages at different times on different projects
* `renv` fixes package versions at the _project_ level rather than the _computer_ level

---

# `renv` in a few easy steps

1. `install.packages('renv')`
2. Run `renv::init()` from R in your project directory
3. Restart R
4. Install your packages into the project library with `renv::hydrate()`
5. `git commit` your new `.Rprofile` and `renv.lock` files and the `renv/` directory

---

# What did we add?

1. `.Rprofile` runs every time you start R in the project directory; it sets the package library for the project.
2. It does this by running the `renv/activate.R` file.
3. `renv.lock` records all the packages used in the project, their sources, and your R version.
4. The `renv/` directory contains your local project library (ignored by git), the script run by `.Rprofile`, and project settings.

_`renv` doesn't actually keep copies of packages in every project; it shares a computer-wide cache of packages that reduces disk usage and speeds up installations._

---

# What to do as you work?

- Run `renv::snapshot()` regularly to update your `renv.lock` file.
- Keeping your `packages.R` file current helps `snapshot()` detect everything.
- Commit your `renv.lock` file whenever it changes.

---

# What else to do?

- You can set some `renv` options in your `.Rprofile`.
  - `options(renv.config.auto.snapshot = TRUE)` tries to update the lockfile when it detects new packages.
  - More options in `?renv::config`
- If you have your own `.Rprofile` that you use for interactive tools, you can add some code to load it conditionally:

```r
# In the project's .Rprofile, alongside the source("renv/activate.R") line renv added.
# Use R_PROFILE_USER if it is set, otherwise fall back to ~/.Rprofile
user_rprof <- Sys.getenv("R_PROFILE_USER", normalizePath("~/.Rprofile", mustWork = FALSE))
# Only source it in interactive sessions, so non-interactive runs skip personal settings
if (interactive() && file.exists(user_rprof)) {
  source(user_rprof)
}
```

- If you really dislike re-installing packages for new projects, consider the [`capsule`](https://github.com/milesmcbain/capsule) package, which lets you switch between using system-wide packages and your project `renv` library.

---

# Run, don't walk

- Unlike `targets`, switching your project to use `renv` requires no other code refactoring and only takes a few minutes
- Do this _right now_ for any of your active projects (unless they are R packages)
- This should work fine with RStudio Connect, too.
- Let us know in `#data-sci-discuss` if you run into any snafus!

---

# Sharing individual scripts with `use()` and `embed()`

- If you need to share an individual script for reuse, whether from an `renv` project or not, run `renv::embed('my_script.R')`
- This adds a call to `renv::use()` to the top of the script. That call automatically installs the recorded package versions when the script is run.
- You can comment out the `renv::use()` call if it gets in the way, but keep it as a record of the package versions.
---

# Sharing individual scripts with `use()` and `embed()`

```r
renv::embed("gam.R")
```

```r
# gam.R after embedding: the renv::use() call now sits at the top
renv::use(
  "Matrix@1.3-4",
  "lattice@0.20-44",
  "mgcv@1.8-36",
  "nlme@3.1-153",
  "renv@0.14.0"
)

library(mgcv)
set.seed(2) ## simulate some data...
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
```

---

# Stuff besides R packages for capturing the environment

* _Python_: If your project has Python scripts, too, `renv` [has tools to handle that](https://rstudio.github.io/renv/articles/python.html).
* Other computer programs require a more elaborate approach. [Docker](https://www.rocker-project.org/) lets you define all the software in an operating system, and we can support this as needed.
* Proprietary or GUI tools are very difficult to capture and should be documented in detail.
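
---

# Documenting what `renv` can't capture

Part of that documentation can be generated rather than written by hand. Below is one base-R sketch, not an `renv` feature: `record_environment()` is a hypothetical helper, and the default `programs` list is an assumption to replace with whatever your project actually calls.

```r
# Hypothetical helper (not part of renv): write a plain-text record of the
# R session, the operating system, and where external programs live on PATH
record_environment <- function(file = "environment.txt",
                               programs = c("python3", "docker")) {
  # Only OS-level fields; skips user/login names so the file is safe to commit
  sys <- Sys.info()[c("sysname", "release", "machine")]
  lines <- c(
    "## R session",
    capture.output(sessionInfo()),
    "",
    "## System",
    paste(names(sys), sys, sep = ": "),
    "",
    "## External programs (empty path = not found on PATH)",
    paste(programs, unname(Sys.which(programs)), sep = ": ")
  )
  writeLines(lines, file)
  invisible(lines)
}
```

This records _where_ programs are, not their versions, so versions of proprietary or GUI tools still need to be written down by hand.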
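
---

# Checking a project for the `renv` files

The "What did we add?" slide lists the files `renv::init()` creates. A minimal base-R sketch for checking a project directory for them; `renv_files_present()` is a hypothetical helper, not part of `renv`, and it only looks for the paths named on that slide.

```r
# Hypothetical helper (not part of renv): report which renv files exist in a
# project directory and whether .Rprofile sources renv/activate.R
renv_files_present <- function(project = ".") {
  files <- c(".Rprofile", "renv.lock", "renv/activate.R")
  present <- file.exists(file.path(project, files))
  names(present) <- files

  # renv::init() writes the line source("renv/activate.R") into .Rprofile
  rprof <- file.path(project, ".Rprofile")
  sources_activate <- file.exists(rprof) &&
    any(grepl("renv/activate.R", readLines(rprof, warn = FALSE), fixed = TRUE))

  c(present, sources_activate = sources_activate)
}
```

Run `renv_files_present()` from the project directory; every element should be `TRUE` once the setup steps are done.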