5 Airtable
As of May 2024, the Africa CDC Database backend has shifted from Airtable to DoltHub.
This chapter has been kept here to document how the database started and to show the history of its development.
To read about DoltHub, see Chapter 2. To learn more about the use of dolt and DoltHub as the current backend for the database, read Chapter 6.
Airtable is a cloud-based software platform that allows users to create and manage databases, spreadsheets, and other types of organizational tools. It can be used for a variety of purposes, including project management, customer relationship management, inventory tracking, event planning, and much more.
One of the key features of Airtable is its flexible and customizable nature. Users can create and customize their own database structures, and can choose from a wide range of data types, including text, attachments, checkboxes, and more. This allows for a high degree of customization and adaptability to different use cases and workflows.
Airtable also offers a variety of collaboration features, including real-time syncing and commenting, as well as integrations with other popular tools such as Slack, Google Drive, and Trello. Additionally, Airtable has a robust API that allows developers to build custom integrations and applications on top of the platform.
Airtable’s flexibility, customizability, and features that support collaboration along with its spreadsheet-style interface that is familiar to most that have used other spreadsheet software such as Microsoft Excel or Google Sheets are the key reasons why it was chosen to be the database tool for RIG.
Chapter 5 provides an overview of Airtable and good practices for designing data models in relational databases.
5.1 Key Terms
- Workspace - A collection of bases
- Base - A database. Each is identified by the Airtable API via a
base id
- Table - A tabular data set within a base. Each table is identified by the Airtable API via the name of the table e.g. “Demo Table”
- Record - An individual cell within a table. Each record is identified by the Airtable API via a
record id
- Field - A property of data in a table
- Views - A specific way of displaying a table. Default is grid.
- Entity - Something that either physically or logically exists whose properties are typically stored in a table and composed of data elements.
- Element - an attribute of a Entity (a field)
5.2 Security and Access Control
Airtable maintains physical and technological security as part of its ISO IEC 27001:2013 and SOC 2 compliance measures. Data are 256-bit encrypted when storing on the server and also when transferring data over the internet. To find vulnerabilities in their software, they run daily, weekly, and monthly scans on different components of their system and regularly commission external penetration tests. They also run a bug bounty program to help identify issues. Their data centers have fire detection and suppression systems, redundant power systems, and strict control for physical access. Because Airtable relies on Amazon Web Services (AWS) for its cloud infrastructure (the same providers used by previous EHA projects), data are geo-redundantly replicated in backups across multiple zones to increase data durability. They also have a team monitoring services at all times. Airtable employees are thoroughly vetted before hiring and continually trained on data protection best practices. Their workstations are secured by using full-disk encryption, automatic locking, and strong password requirements.
5.2.1 User and Administrator Security Features
5.2.1.1 Access Controls
Airtable provides database (referred to as “base”) and workspace administrators with granular controls over who can view, edit, comment, or otherwise modify data at the base, table, and field levels. There are four levels of Airtable user permissions:
- Owner/Creator: Full administrative control of base
- Editor: Sees full base, create and modify records and views, create and modify view share links
- Commenter: Sees full base, comment on records
- Reader: Sees full base
Direct access to a base or workspace is granted or removed by base owners and creators to Airtable users. Base owners and creators can control who has access to a base and can control any “share” links created for that base. They may also restrict editing of tables or fields within a base. Any collaborator given direct access to a base at any permission level will be able to duplicate that base and share that data further. It is important that direct access to the base is limited to individuals with a need to curate or analyze the data.
5.2.1.2 Share Links and Interfaces
To further restrict access to a base, users can be given indirect access via revocable share links or interfaces.
Share links can be customized to prevent users from seeing the full base, prevent duplicating the base, and prevent copying data from the base. The ability to use the link can be password protected, restricted to people with certain email domains, and may be revoked at any time. If there are concerns about data leaks via base or table duplication, inviting people with a need to view the data via share links constrains their ability to extract data from the base.
Interfaces are dynamic dashboards built on a limited set of data in an airtable base. Users can explore or even edit data based on the permissions provided by the interface creator. Access can be further tuned by setting up a “current-user filter”. See this guide for more information.
5.3 Data in Airtable
See Airtable plan comparison for more information on the size of bases and features available. Information in this section pertains to all plans unless specified.
5.3.1 Workspaces, Bases, Tables, Fields, Records
Airtable uses workspaces, bases, tables, and fields to manage data. A workspace generally pertains to a particular project and contains all bases relevant to that project. Sharing a workspace with someone allows them to see all bases within that workspace.
Bases are equivalent to databases. They consist of a set of tables that can be linked and allow you to perform some task (e.g. IRB tracking, capturing research data, etc.). Bases can be duplicated and shared across your workspaces and you can share bases with other users.
Tables are where most of the action happens. Data is entered in tables, tables can be transformed via views into calendars, dashboards, or galleries, and tables can be manipulated via the API. They describe a data entity and are composed of fields. In the bat sampling example, each circle would be a table in the bat sampling base (see Figure 5.1).
Fields represent properties of a data entity. In a spreadsheet view, fields are columns. Airtable allows you to control field types (date, number, text, file attachments, logical, etc.) and paid plans allows you to control who can edit fields. Fields can be linked between tables creating links.
Records are the individual data points in a table. In a spreadsheet they would be the rows. Records are shaped by the structure you have created in tables and fields. Each record has a URL that can be used to access it programmatically or share it.
5.4 Views
In Airtable, tables can be displayed in different views to emphasis different components of the data. Views are great for creating concise presentations of data, especially in sprawling tables. The default view in an table is the grid (spreadsheet) view. All other views will derive from the data entered in this view.
For more on views, see the guide to views.
5.5 Internal Backups: Record History, Base Snapshots
Airtable has system for tracking changes to a base. They provide revision history for individual records (how long those histories are stored varies by plan). Any comments made in the revision history will be stored for the life of the record (on any plan). The current state of an entire base may be captured through snapshots. Should a systemic issue arise, the base can be restored to a snapshot at a later date. Restoring from a snapshot will remove the revision history for a record but comments on that record will be maintained.
As revision histories are maintained, “deleted” records may be retrieved. In the event of the need to permanently remove data, the revision history of the base may be removed.
Airtable allows data to be exported as CSVs from individual tables. At this time there is not an Airtable supported base export function.
5.6 Importing data
Data can be imported to Airtable from a number of sources including CSV files, excel, Google sheets, XML, and via copy paste. Airtable will guess what the most appropriate field type is, so make sure the field type is appropriate for the data (e.g. convert from text to date type fields). Certain sources, like Google sheets, can be imported as bases.
For more on importing data see: Importing and Adding data
5.7 Base Design
Having a good base design will make using your data easier. Generally, the process looks like this:
- Describe what the database will do and collect use cases
- Determine the roles of various stakeholders
- List out the entities in the database and define their properties
- Map out how the entities fit together (which properties link them)
- Check that the mapping meets the use cases
- Build base in Airtable
- Check that the base meets the use cases
If you are migrating from spreadsheets, you likely already have an idea of what you need the base to do and a collection of data properties. It is still a good idea to follow the steps outlined above for mapping out entities. You may find that entire sheets can be replaced by views or that the data in one sheet should actually be stored in two different tables.
5.8 Automating Airtable
Airtable has five main routes for automating processes.
- Automations - a drag and drop visual programming tool
- Extensions - pre-built applications that perform some task
- Scripting - use JavaScript to automate tasks within Airtable
- Blocks - use JavaScript to create custom applications
- REST API - use whatever programming language you like to automate processes
5.8.1 Automations with Drag and Drop Programming (pro and above)
The automations feature within Airtable allows you to visually program routines. Each automation has three basic components (see Figure 5.2):
- Status - controls whether or not the automation will run when the trigger condition is met
- Triggers - condition for automation to run: the creation or change of a record, a scheduled time, or some other external action
- Actions - what the automation will do
Automations are commonly used to augment the synced tables feature, send notifications, check data quality, and manipulate or create records.
This automation (see Figure 5.3) creates a weekly summary of applicants who applied for a position. Its trigger is time based, then it finds records that match a condition, and finally it generates an email from those records and sends to the appropriate recipients.
5.8.2 Scripting and Blocks (pro plan and above)
Scripting uses JavaScript to manipulate a base from within an application in Airtable. Scripting is flexible but has a steep learning curve. The scripting environment Airtable provides can be helpful as it provides code linting, direct access to documentation, and example scripts to build from. The major drawback to scripting is that scripts live in the base and files are not version controlled.
- There are a LOT of pre-written scripts, in a shared “marketplace” to perform automatic actions. Search the marketplace before you start writing bespoke code.
Blocks are custom applications built in JavaScript and node.js that add to base functionality. They are created in a development environment outside of Airtable then brought back into platform. There are number of tutorials for getting started with blocks.
5.9 Using the REST API
All Airtable bases are automatically accessible to authorized users via a REST API. The list of API accessible bases you have access to can be found here: https://airtable.com/api. By clicking on a base you will be able to see the full API documentation for that base.
5.9.1 Scoped Tokens
Airtable is moving to a scoped tokens based approach to api access. Scoped personal access tokens allow you to create a token for a specific base with specific permissions - e.g. token has read-only access to a bat sampling base. Using scoped tokens in this way means that if the token is compromised (leaked, stolen, accidentally committed unencrypted to a github repo, etc), you can delete that token to remove any access it might have had and the limited scope means that you know exactly what a person would have been able to access.
To create a personal access token go here: https://airtable.com/create/tokens
Remember to save the token in a secure place. Do not store unencrypted tokens on the web (e.g. pushing them to GitHub).
Airtable has also deployed OAuth tokens for all users.
5.9.2 Airtable and R
The Airtable REST API can be used via R with the airtabler package. EHA has started a fork of the package that has additional functionality so it is recommended to use that version. The original package design works well for exploring the data. Our extension adds additional functionality to help use airtabler
in automation via continuous integration such as GitHub Actions.
::install_github("ecohealthalliance/airtabler") devtools
The Airtable API serves up data as JSON, which has a hierarchical structure similar to a list in R. To handle JSON, airtabler
uses the jsonlite
package. Its helpful to understand how jsonlite
handles different JSON structures when working with more complicated Airtable data. See the jsonlite
quick-start guide for a basic overview. The purrr
package is extremely helpful when dealing with data objects derived from JSON because it facilitates navigating nested data structures.
The airtabler
package provides instructions for setting up access to the Airtable API. You will need to follow those instructions for Example 5.1 and Example 5.2 to work.
Example 5.1
library(airtabler)
<- fetch_all(base = "app49bbyLczZxX9PM",table_name = "One To One")
table1
<- fetch_all(base = "app49bbyLczZxX9PM",table_name = "Table 2")
table2
## Linked records are stored as JSON arrays so they become lists
## when part of a data frame. Because each array has a length of 1
## we can safely unlist the arrays and add them back as to the
## data frame.
$LinkedRec <- unlist(table1$LinkedRec)
table1
<- dplyr::left_join(table1,table2, by = c("LinkedRec"="id"))
joinedTables
<- joinedTables[c("Name","LinkedRec","Number")]
recordKey
recordKey
One to One join
Example 5.2
library(airtabler)
library(dplyr)
library(purrr)
<- fetch_all(base = "app49bbyLczZxX9PM", table_name = "One To Many")
oneToMany
<- fetch_all(base = "app49bbyLczZxX9PM",table_name = "Table 2")
table2
# Depending on how you want to work with the data joins are a
# little trickier here.
## to replace the values but keep the structure
$LinkedRecReplace <- purrr::map(oneToMany$LinkedRec, function(x){
oneToMany%>%
table2 filter(id %in% x) %>%
select(c("id","Number")) %>%
pull("Number")
})
data.frame(
id = unlist(oneToMany$LinkedRec),
label = unlist(oneToMany$LinkedRecReplace)
)
One to Many Join
5.10 Data Management
Because of its flexibility and ease of use, it is extremely important that data management for airtable be taken seriously. Unlike most other relational databases, the fundamental properties of your base can be changed easily by multiple users without warning! Documenting the structure and purpose of your base, as well as creating regular external backups could save you from catastrophe.
5.10.1 Metadata
- Structural metadata - Information about a resource that tells you how its put together - e.g. the relationship between a table, field, and automation. With the exception of users on the enterprise plan, you must be deliberate about creating and maintaining structural metadata for your base.
Example of Airtable Structural Metadata:
- Descriptive metadata - Information about a resource that makes it easier to find and attribute data. This includes things attributes like author, title, description, keywords, etc. This table becomes especially important when data are transitioned out of airtable, at the end of a project, or if the base will be shared broadly with collaborators.
Example of Airtable Descriptive Metadata:
To see the template base, follow this link
5.10.1.1 Metadata for automations and extensions
Automations and extensions live entirely inside airtable. As of August 2022, Airtable introduced a “Manage Fields” tab that provides metadata for a specific table, including any automation dependencies. It is not possible to export those dependency tables or the code used in automations automatically. This makes it extremely important to document any automations and extensions outside your base.
5.10.2 External Backups
It is a good idea to create regular external backups of airtable data in the event that something catastrophic happens to your base or you simply decide you no longer wish to use airtable as your data store.Unfortunately, airtable does not provide an off the shelf solution for this. Below are three options for extracting all data from your base.
5.10.2.1 Using airtabler
Using airtabler::air_dump()
and airtabler::air_dump_to_csv()
functions, you can export all tables to R then create a versioned folder of CSVs. See the air_dump_to_csv
help page for more information.
5.10.2.2 Other Open Source Projects
UnlyEd - use this template to push airtable backups to AWS S3.
5.10.2.3 Paid services
Sequin - replicate airtable to postGres database (https://www.sequin.io/sources/airtable)