Using targets for reproducible workflows

class: right, bottom, title-slide

# Using <code>targets</code> for reproducible workflows
### Noam Ross
### EcoHealth Methods and Models Meeting, 2021-09-01

---

# Goals of today

* A brief (re)introduction to reproducibility

* The `targets` package as part of the toolbox for analysis projects at EHA

* Walkthrough of a basic project using `targets`

* Resources for in-depth learning
---

# Why Reproducibility?

* Not so much so that others can re-create your work, but so that they can _build on top of_ it:

>  "We thank Olival and colleagues for releasing comprehensive code for their reproducible analyses, allowing us to adapt it for our own purposes and easily compare results." [Shaw et al. 2020](https://doi.org/10.1111/mec.15463)

* To reduce the effort needed for revisions and for re-activating the project after a period of dormancy.

* To speed up the process of adapting the project for new applications.

---

# Parts of (Computational) Reproducibility (and related tools)

* _Workflow_: Re-creating the same set of steps done originally (`targets`)

* _Environment_: Computing with the same software (and hardware) (`renv` and Docker)

* _Versioning_: Tracking all of the above as the project changes over time (git, GitHub, database snapshots)

* _Testing_: Automatically checking that your code is working as expected (`testthat`, `assertr`, `validate`, continuous integration)

* _Access_: Having access to the source data and resources (repositories, databases)

* _Documentation_: Instructions for all of the above (markdown, roxygen)

---

# `targets` mostly supports _workflow_, which helps with other things

* `targets` repositories generally have a common structure, making them easier to
  document and share

* With good coding practices, _some_ "self-documentation" occurs in the steps
  of your analysis.
  
* Some of your environment (packages used) is captured in a `targets` setup.

* Testing can be applied to objects produced in a `targets` workflow
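
As a rough illustration of the last point, a check can be written as an ordinary function that either stops with an error or returns its input. This is only a sketch: `validate_random_data()` is a hypothetical name, the `x` and `y` columns are assumed from the model formula shown later in these slides, and base R's `stopifnot()` stands in for `assertr` or `validate`.

```r
# Hypothetical validation step: check a data frame, then return it unchanged
# so the function can be used as the command of a target.
validate_random_data <- function(random_data) {
  stopifnot(
    is.data.frame(random_data),
    all(c("x", "y") %in% names(random_data)),  # assumed column names
    nrow(random_data) > 0,
    !anyNA(random_data$x),
    !anyNA(random_data$y)
  )
  random_data
}

# Stand-alone check with made-up data
example_data <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
validated_data <- validate_random_data(example_data)

# In a pipeline this might be declared as:
# tar_target(validated_data, validate_random_data(random_data))
```

If the check fails, the target errors and anything downstream of it is not built.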

---

# Leaving slides for a walkthrough of a targets repository

---

# Model output from a target

```r
library(targets)
tar_load(my_model)  # load the built target from the _targets data store
summary(my_model)
```

```
## 
## Call:
## lm(formula = x ~ 0 + y, data = random_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6354 -0.2412  0.3342  0.9880  2.7726 
## 
## Coefficients:
##    Estimate Std. Error t value Pr(>|t|)    
## y 5.011e-01  2.478e-05   20222   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9035 on 999 degrees of freedom
## Multiple R-squared:      1,	Adjusted R-squared:      1 
## F-statistic: 4.089e+08 on 1 and 999 DF,  p-value: < 2.2e-16
```
---

# Displaying a plot from a target

```r
tar_load(my_plot)
my_plot
```

[Figure: the plot stored in the `my_plot` target]

---

# Displaying the target network

```r
tar_visnetwork(targets_only = TRUE)
```

Target dependency network, read left to right (all targets up to date except `targets_presentation`, which is running):

* `downloaded_data_file` → `downloaded_data`
* `downloaded_data` → `my_model`, `my_plot`, `my_other_plot`
* `random_data` → `my_model`
* `my_plot` → `plot_png`
* `my_model`, `my_plot` → `targets_presentation`
* `readme` and `renv_presentation` have no links to other targets

---

# Good organization practices

* Targets are nouns, functions are verbs (long names are fine!)

* Document functions - this also helps if they are to be re-used elsewhere

* Use target names as function arguments

* Put all your packages in `packages.R` and your functions in an `R/` directory. (Also,
  use `renv` - more on this next week.)

* Add-ins and keyboard shortcuts help with workflow (`targets`, `fnmate`, `tflow`)
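
The practices above might come together in a `_targets.R` file along these lines. This is a sketch, not the contents of the example repository: the target names are taken from the network slide, but every function name is hypothetical and assumed to be defined in a file under `R/`.

```r
# _targets.R (sketch; the function names are illustrative assumptions)
library(targets)

source("packages.R")  # library() calls for the packages the project uses
for (f in list.files("R", pattern = "\\.R$", full.names = TRUE)) source(f)

list(
  # target (noun)  <-  function (verb) with upstream targets as arguments
  tar_target(downloaded_data_file, download_data_file(), format = "file"),
  tar_target(downloaded_data, read_downloaded_data(downloaded_data_file)),
  tar_target(random_data, simulate_random_data()),
  tar_target(my_model, fit_model(downloaded_data, random_data)),
  tar_target(my_plot, plot_downloaded_data(downloaded_data))
)
```

Because each command names its upstream targets as arguments, `targets` can work out the dependency graph from the code itself.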

---

# The "functions as steps" mindset

* I was more used to scripts: `0-fetch_data_0.R`, `1-clean_data_1.R`, `2-fit_model.R`, `3-make_plots.R`

* Functions help define what goes _in_ and _out_ of each step

* Big, fat specialized functions are fine; don't try to make them general

* See [Miles McBain's talk](https://www.youtube.com/watch?v=jU1Zv21GvT4) about flow
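
To make the contrast concrete, here is a sketch of how a script such as `2-fit_model.R` might turn into a function. The script body in the comments and the name `fit_model()` are invented for illustration; the formula is the one from the model summary shown earlier.

```r
# Script style (e.g. 2-fit_model.R): inputs and outputs are implicit.
#   load("random_data.RData")            # where did this file come from?
#   my_model <- lm(x ~ 0 + y, data = random_data)
#   save(my_model, file = "my_model.RData")

# Function style: the argument is what goes in,
# the return value is what comes out.
fit_model <- function(random_data) {
  # Specialized on purpose: the formula is hard-coded for this project.
  lm(x ~ 0 + y, data = random_data)
}

# Try it on made-up data
set.seed(1)
toy_data <- data.frame(y = 1:20)
toy_data$x <- 0.5 * toy_data$y + rnorm(20, sd = 0.1)
toy_model <- fit_model(toy_data)
coef(toy_model)
```

Nothing here depends on `targets`, so the function can be developed and debugged interactively before it is wired into a pipeline.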

---

# What should `targets` be used for at EHA?

* On the computational team, `targets` plays a role similar to ODK/Airtable on empirical
  projects: a newer framework that we are proving out and anticipate using broadly
  as a foundational tool

* `targets` is a good tool for any analysis or pipeline that is larger than a single script

* But it's a pain to convert a project once it gets big, so plan accordingly
    
* `targets` workflows can run on RStudio Connect. There's a [targets flavor of
  R Markdown](https://books.ropensci.org/targets/markdown.html) that is useful
  for this. We're still learning the best way to do this.
  
---

# Challenges: Sharing projects with large and long-running targets

* Generally, one does not share the built objects in the `_targets` directory;
  everyone rebuilds them on their own. Sharing small objects is fine, though.

* Make downloading the data a step in the pipeline rather than saving copies locally (but beware of data that changes online!)

* `targets` can be set up to [store targets in online storage](https://books.ropensci.org/targets/cloud.html#storage)

* We are testing other options, such as [Git Large File Storage](https://git-lfs.github.com/)
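
A download step with a guard against changing online data could look roughly like this. Both function names, the URL, and the checksum are placeholders rather than anything from the example repository; only base R's `download.file()` and `tools::md5sum()` are used.

```r
# Hypothetical download step: fetch a file and return its path, so that the
# path can be tracked as a file target.
download_data_file <- function(url, destfile) {
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  status <- utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("Download failed: ", url)
  destfile
}

# Guard against the online copy changing: compare the downloaded file
# with a checksum recorded when the analysis was first run.
check_md5 <- function(path, expected_md5) {
  actual_md5 <- unname(tools::md5sum(path))
  if (!identical(actual_md5, expected_md5)) {
    stop("Checksum mismatch for ", path, ": the source may have changed")
  }
  path
}

# In a pipeline this might be declared as (URL and checksum are placeholders):
# tar_target(
#   downloaded_data_file,
#   check_md5(
#     download_data_file("https://example.org/data.csv", "data/data.csv"),
#     "<recorded md5>"
#   ),
#   format = "file"
# )
```

With this pattern a collaborator only needs the code: the data is fetched on their first build, and a changed source fails loudly instead of silently altering results.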

---

# Resources

* The targets manual: <https://books.ropensci.org/targets/>, including a [walkthrough](https://books.ropensci.org/targets/walkthrough.html)

* A good intro talk (~90 mins with questions) by Miles McBain on using targets
  and getting a good workflow: <https://www.youtube.com/watch?v=jU1Zv21GvT4>
  
* A great introductory video series of five 30-minute lessons: <https://www.youtube.com/watch?v=pbc6NX1n01Q&list=PLvgdJdJDL-APJqHy5CXs6m4N7hUVp5rb4>

* This repository: <https://github.com/ecohealthalliance/targets-renv-example>

* Noam's maximalist reproducible repository: <https://github.com/noamross/reprotemplate>