24 Training Resources and Plans
This section is under revision as we consider alternate resources.
If you are planning on spending significant time improving your data science and modeling skills, you will want to create a training plan.
24.1 Training plan components
- A description of your goals for the training plan and the EHA projects and activities to which you will apply your skills
- A list of courses/tutorials you plan to complete
- The total time the courses will take to complete.
- The time frame over which you expect to complete them
- The name of a peer learner. You should have a peer learner at EcoHealth who will be a partner over the course of your training. This may be someone working on a similar training plan or someone with knowledge of the material already. They should play some or all of these roles:
  - Accountability: Your peer learner should know about your training plan and its time frame, and check in on how you are doing.
  - Co-learning: You and your peer learner may want to schedule times to watch course videos and complete exercises together.
  - Review: Especially for materials without automated feedback, your peer learner should be able to look at your work and provide feedback.
  - Motivation: Your peer learner should make your training fun and tell you that you rock.
When your supervisor signs off on your training plan, contact Megan Walsh, who will provide any subscriptions for the period you need them.
24.2 Training Plans
These are some suggestions for assembling resources and courses into training plans. They are, of course, a small fraction of the many learning and teaching resources available. Consult your peers, supervisor, and the #data-sci-discuss Slack channel to find courses or resources on the topics you require. If you use a new resource or course, please add it to this page so others can learn from your impressions of it!
24.2.1 Introductory Programming materials
Hands-On Programming with R: An introduction to R for non-programmers with a focus on project-based learning.
Introduction to R: This introduction is designed to get you familiar with R quickly. It covers the basics and explains how to work with common data types in R.
Eloquent JavaScript: A project-based book that will take you from the basics to creating websites. For R users, the chapter on data structures is especially helpful for understanding JSON and `jsonlite`.
24.2.2 Better Managing Data
Johns Hopkins Introduction to the Tidyverse: This chapter focuses on the foundational concept of tidy data, that is, rectangular data where one row = one observation, each variable is stored in its own column, and every cell has a value, even if that value is NA.
If you are mostly working in spreadsheets but collaborating with R or Python users, or just trying to organize a lot of spreadsheet data for your projects, work through the Data Carpentry lesson on spreadsheet organization (~2 hours) and read Hadley Wickham’s paper on tidy data (~1 hour).
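The tidy data definition above can be made concrete with a small reshaping example. This is a minimal sketch in plain Python, not taken from any of the linked resources: the table, its column names, and the `to_tidy` helper are all hypothetical. In R, the same kind of reshaping is typically done with `tidyr::pivot_longer()`.

```python
# Hypothetical "wide" spreadsheet-style table: one row per site,
# one column per sampling year (values are made up for illustration).
wide = [
    {"site": "A", "2021": 12, "2022": 15},
    {"site": "B", "2021": 7, "2022": None},  # None plays the role of NA
]


def to_tidy(rows, id_col, names_to, values_to):
    """Reshape wide rows so that each output row is one observation."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            # Keep missing values as explicit cells instead of dropping them.
            tidy_rows.append(
                {id_col: row[id_col], names_to: key, values_to: value}
            )
    return tidy_rows


tidy = to_tidy(wide, id_col="site", names_to="year", values_to="count")
for record in tidy:
    print(record)
```

After reshaping, every row is a single site-year observation, `site`, `year`, and `count` each have their own column, and the missing count for site B in 2022 is kept as an explicit empty value rather than a missing row.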
Importing Data provides a good overview of basic data imports while R4DS: Import demonstrates how to properly import more complex data types.
AIRTABLE DATA - see chapter on Airtable
ODK DATA - see chapter on ODK
Reading and writing spatial data with `terra`: This chapter goes over ingesting spatial data with the `terra` package in R.
Geographic Data in R: This chapter discusses geographic data classes in R and the `sf`, `sp`, and `terra` packages.
24.2.3 Version Control and Git
Read Happy Git with R and work through its examples.
Atlassian has a nice collection of Git tutorials for beginners and advanced users.
24.2.4 Reproducible reporting
Reports with R Markdown covers making reproducible reports, from the basics to parameterization.
This chapter provides an overview of using `targets` in R Markdown.
24.2.5 Improving Your Statistical Fundamentals
Probability, Statistics, and Data: A fresh approach using R. A breadth-first approach to probability and statistics that assumes minimal familiarity with calculus. The book includes data sets that require cleaning and wrangling before they can be used for analysis and relies on simulations to demonstrate important concepts.
An Introduction to Probability and Simulation. This book covers concepts in probability and simulation. It encourages users to work through problems in code notebooks (Google Colab, Jupyter, etc.).
Foundations of Probability in R. A short self-guided course on probability and distributions.
Statistical Rethinking: A Bayesian Course. This is an excellent book and video lecture series that builds great foundations for doing many types of modeling. Prerequisites are intermediate R and some experience with linear regression and probability. This course has about 19 hours of video. We recommend 2-3 months for going through the book, video, and exercises.
24.2.6 Improving your data visualization
- Fundamentals of Data Visualization by Claus Wilke is an excellent guide to making high-quality figures, focusing more on design than mechanics of programming. R code is available for all of its examples. If you feel you have a solid grasp of ggplot2 but want to improve the quality of your figures, we recommend reading this e-book, and using the accompanying code in its GitHub repository to reproduce figures.
### R Programming
Advanced R: Functions “In this chapter, you’ll learn how to turn informal, working knowledge [of functions] into more rigorous, theoretical understanding.”
24.2.7 Map-making and geospatial analysis in R
- Geocomputation with R is a comprehensive guide for understanding geographic data, mapping, and conducting spatial analysis in R. The most relevant chapters for your purposes are likely 1-8 and 10-11. A chapter might take you 1-3 hours to work through, depending on how in depth you want to get and the number of exercises that you complete.
Data Carpentry has a course on using R for spatial data. Like other *Carpentry lessons, it's designed as a workshop lesson plan but can be self-taught. It presumes very little R knowledge and covers basics such as setting up a project in RStudio. This is a good starting point for people or students with little R experience, as it gets them making maps right away. If you just want to get a quick feel for R spatial data types, jump into Chapter 3.
Making Maps with R is a quick-start guide to mapping with ggplot2. It also introduces the ggmap, maps, and mapdata packages for providing basemaps on which to overlay your spatial data. It is good for getting a map together quickly, but if you will be making maps on a regular basis we suggest the resources above, which give you a better foundation in geographic data.
Leaflet for R is a manual on the use of the R leaflet package to harness Leaflet, an open-source JS library for creating interactive maps. Leaflet maps are particularly useful for exploring and visualizing spatial data, and are easily embedded into R Markdown documents. You should take a course on, or already have knowledge of, R Markdown before working through this manual.
24.3 Metagenomics
- Quality control
- Assembly
- Files and formats in high-throughput sequencing (In development)
- Pairwise sequence alignment
- Read alignment (In development)
- Multiple alignment (In development)
- Variant identification (In development)