class: right, bottom, title-slide

# Using `targets` for reproducible workflows
### Noam Ross
### EcoHealth Methods and Models Meeting, 2021-09-01

---

# Goals of today

* A brief (re)introduction to reproducibility
* The `targets` package as part of the toolbox for analysis projects at EHA
* Walkthrough of a basic project using `targets`
* Resources for in-depth learning

---

# Why Reproducibility?

* Not so much so that others can re-create, but so others can _build on top of_ your work:

> "We thank Olival and colleagues for releasing comprehensive code for their reproducible analyses, allowing us to adapt it for our own purposes and easily compare results." [Shaw et al. 2020](https://doi.org/10.1111/mec.15463)

* To reduce the effort needed for revisions and for reactivating the project after a period of dormancy.
* To speed up the process of adapting the project for new applications.

---

# Parts of (Computational) Reproducibility (and related tools)

* _Workflow_: Re-creating the same set of steps done originally (`targets`)
* _Environment_: Computing with the same software (and hardware) (`renv` and Docker)
* _Versioning_: Keeping track of the above across different versions of the project (git, GitHub, database snapshots)
* _Testing_: Automatically checking that your code is working as expected (`testthat`, `assertr`, `validate`, continuous integration)
* _Access_: Having access to the source data and resources (repositories, databases)
* _Documentation_: Instructions for all of the above (markdown, roxygen)

---

# `targets` mostly supports _workflow_, which helps with other things

* `targets` repositories generally have a common structure, making them easier to document and share
* With good coding practices, _some_ "self-documentation" occurs in the steps of your analysis.
* Some of your environment (packages used) is captured in a `targets` setup.
* Testing can be applied to objects produced in a `targets` workflow

---

# Leaving slides for a walkthrough of a targets repository

---

# Model output from a target

```r
tar_load(my_model)
summary(my_model)
```

```
## 
## Call:
## lm(formula = x ~ 0 + y, data = random_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6354 -0.2412  0.3342  0.9880  2.7726 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)    
## y 5.011e-01  2.478e-05   20222   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9035 on 999 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.089e+08 on 1 and 999 DF,  p-value: < 2.2e-16
```

---

# Displaying a plot from a target

```r
tar_load(my_plot)
my_plot
```

<img src="data:image/png;base64,#/Users/noamross/projects-eha/targets-renv-example/outputs/targets-presentation_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

# Displaying the target network

```r
tar_visnetwork(targets_only = TRUE)
```
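
---

# A minimal `_targets.R` sketch

The walkthrough is not reproduced in these slides. As a rough stand-in, a pipeline that yields targets named `random_data`, `my_model`, and `my_plot` might look like the sketch below. The target names and the model formula come from the slides; the function names (`make_random_data`, `fit_model`, `plot_model`), their bodies, and the simulated data are assumptions for illustration and are not taken from the example repository.

```r
# _targets.R: a hypothetical sketch, not the file from the example repository
library(targets)

# In a real project these functions would live in files under R/.
# Their names and bodies are illustrative assumptions.

# Simulate n rows with columns x and y.
# The actual data-generating step is not shown in the slides.
make_random_data <- function(n = 1000, slope = 0.5) {
  y <- rnorm(n)
  data.frame(x = slope * y + rnorm(n), y = y)
}

# Fit the no-intercept model whose formula appears in the summary output.
fit_model <- function(random_data) {
  lm(x ~ 0 + y, data = random_data)
}

# Scatter plot of the data with the fitted line through the origin.
# ggplot2 is only needed when this function is actually called.
plot_model <- function(random_data, my_model) {
  ggplot2::ggplot(random_data, ggplot2::aes(x = y, y = x)) +
    ggplot2::geom_point(alpha = 0.3) +
    ggplot2::geom_abline(intercept = 0, slope = coef(my_model)[["y"]])
}

# The pipeline: target names are nouns, and each upstream target is
# passed by name as an argument to the function that needs it.
list(
  tar_target(random_data, make_random_data()),
  tar_target(my_model, fit_model(random_data)),
  tar_target(my_plot, plot_model(random_data, my_model))
)
```

With a file like this at the project root, `tar_make()` builds the three targets and skips any that are already up to date, and `tar_load(my_model)` retrieves a stored result as on the earlier slides.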
---

# Good organization practices

* Targets are nouns, functions are verbs (long names are fine!)
* Document functions - this also helps if they are to be re-used elsewhere
* Use target names as function arguments
* All your packages in `packages.R`, your functions in an `R/` directory. (Also, use `renv` - more on this next week.)
* Add-ins and keyboard shortcuts help with workflow (`targets`, `fnmate`, `tflow`)

---

# The "functions as steps" mindset

* I was more used to scripts: `0-fetch_data_0.R`, `1-clean_data_1.R`, `2-fit_model.R`, `3-make_plots.R`
* Functions help define what goes _in_ and _out_ of each step
* Big, fat specialized functions are fine; don't try to make them general
* See [Miles McBain's talk](https://www.youtube.com/watch?v=jU1Zv21GvT4) about flow

---

# What should `targets` be used for at EHA?

* On the computational team, `targets` plays a similar role to ODK/AirTable on empirical projects: a newer framework that we are proving out and anticipate using broadly as a foundational tool
* `targets` is a good tool for any analysis or pipeline that is larger than a single script
* But it's a pain to convert a project once it gets big, so plan accordingly
* `targets` workflows can run on RStudio Connect. There's a [targets flavor of R Markdown](https://books.ropensci.org/targets/markdown.html) that is useful for this. We're still learning the best way to do this.

---

# Challenges: Sharing projects with large and long-running targets

* Generally, one does not share the built objects in the `_targets` directory, so everyone rebuilds them on their own. For small objects, though, sharing them is fine.
* Make data downloads a step in the pipeline rather than saving files locally (but beware stuff that changes online!)
* `targets` lets you [store targets in online storage](https://books.ropensci.org/targets/cloud.html#storage)
* We are testing other options, such as [Git Large File Storage](https://git-lfs.github.com/)

---

# Resources

* The targets manual: <https://books.ropensci.org/targets/>, including a [walkthrough](https://books.ropensci.org/targets/walkthrough.html)
* A good intro talk (~90 mins with questions) by Miles McBain on using targets and getting a good workflow: <https://www.youtube.com/watch?v=jU1Zv21GvT4>
* A great introductory video series of five 30-minute lessons: <https://www.youtube.com/watch?v=pbc6NX1n01Q&list=PLvgdJdJDL-APJqHy5CXs6m4N7hUVp5rb4>
* This repository: <https://github.com/ecohealthalliance/targets-renv-example>
* Noam's maximalist reproducible repository: <https://github.com/noamross/reprotemplate>