Using Yenpathy

Overview

yenpathy is an R package to quickly find k shortest paths through a weighted graph using Yen’s Algorithm (Yen 1971). This algorithm has numerous applications in fields from transportation planning to social network analysis.

There are already comprehensive network analysis packages in R, notably igraph and its tidyverse-compatible interface tidygraph.yenpathy complements these by doing one thing well.

yenpathy provides the function k_shortest_paths(), which returns k shortest paths between two nodes in a graph, ordered from shortest to longest, with length determined by the sum of weights of consecutive edges.

Installation

Install yenpathy from GitHub with remotes:

library(remotes)
install_github("ecohealthalliance/yenpathy")

Usage

To find shortest paths through a network, pass k_shortest_paths() a data frame with rows representing the edges of a graph (graph_df), as well as supplying from and to nodes in graph_df. The function will return a list containing up to k (default 1) vectors representing paths through the graph:

library(yenpathy)

small_graph <- data.frame(
start = c(1, 4, 5, 1, 1, 8, 1, 2, 7, 3),
end = c(4, 5, 6, 6, 8, 6, 2, 7, 3, 6),
weight = c(1, 1, 1.5, 5, 1.5, 2.5, 1.5, 0.5, 0.5, 0.5)
)

k_shortest_paths(small_graph, from = 1, to = 6, k = 4)
#> [[1]]
#> [1] 1 2 7 3 6
#> 
#> [[2]]
#> [1] 1 4 5 6
#> 
#> [[3]]
#> [1] 1 8 6
#> 
#> [[4]]
#> [1] 1 6

You can also pass an edge_penalty, a number which is added to all of the edge weights, effectively penalizing paths consisting of more edges.

k_shortest_paths(small_graph, from = 1, to = 6, k = 4, edge_penalty = 1)
#> [[1]]
#> [1] 1 6
#> 
#> [[2]]
#> [1] 1 8 6
#> 
#> [[3]]
#> [1] 1 4 5 6
#> 
#> [[4]]
#> [1] 1 2 7 3 6

API

k_shortest_paths() takes a data frame as its first argument (graph_df). In place of a data frame, you can provide an igraph or tidygraph graph object, which will be converted to a data frame using igraph::as_data_frame().

By default, the first and second columns are assumed to contain the names of the the start and end nodes, respectively. These may be either integers or character vectors. If there is a column named “weights”, it is used for edge weights. You can change these defaults by passing column numbers or names to the col_from, col_to, and col_weights arguments.

An example using flight data

A common application for Yen’s algorithm is in transportation planning, when one wants to find the shortest routes between stops such as airports. Here we use the USairports dataset from the igraphdata package to demonstrate how to find the k shortest routes between airports in Bangor, ME (BGR) and Omaha, NE (OMA).

library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(igraphdata)
data("USairports")
paths <- k_shortest_paths(USairports, "BGR", "OMA", k = 20, col_weights = "Distance")
paths[1:5]
#> [[1]]
#> [1] "BGR" "DTW" "OMA"
#> 
#> [[2]]
#> [1] "BGR" "DTW" "ORD" "OMA"
#> 
#> [[3]]
#> [1] "BGR" "DTW" "ORD" "DSM" "OMA"
#> 
#> [[4]]
#> [1] "BGR" "DTW" "DSM" "OMA"
#> 
#> [[5]]
#> [1] "BGR" "DTW" "AZO" "ORD" "OMA"

The result shows the airport nodes, but we can use the paths object to look up the edges, or flights, that make up these itineraries. In this case, we will select just the carriers with the most seats for each leg, as USAairports includes multiple edges for each carrier operating flights between an airport pair:

usa <- as_data_frame(USairports)
itineraries <-
  lapply(paths, function(.x) {
    n_stops <- length(.x)
    legs <- cbind(.x[1:(n_stops - 1)], .x[2:n_stops])
    flights <- apply(legs, 1, function(.z) {
      fl <- subset(usa, from == .z[1] & to == .z[2])
      fl <- subset(fl, Seats == max(Seats))
    })
    do.call(rbind, flights)
  })
itineraries[1:5]
#> [[1]]
#>       from  to                     Carrier Departures Seats Passengers Aircraft
#> 8404   BGR DTW      Pinnacle Airlines Inc.         29  1450       1287      629
#> 18340  DTW OMA Atlantic Southeast Airlines         71  4615       3350      631
#>       Distance
#> 8404       750
#> 18340      651
#> 
#> [[2]]
#>       from  to                      Carrier Departures Seats Passengers
#> 8404   BGR DTW       Pinnacle Airlines Inc.         29  1450       1287
#> 20147  DTW ORD American Eagle Airlines Inc.        144  7200       6344
#> 21492  ORD OMA        United Air Lines Inc.         48  6798       5077
#>       Aircraft Distance
#> 8404       629      750
#> 20147      675      235
#> 21492      694      416
#> 
#> [[3]]
#>       from  to                      Carrier Departures Seats Passengers
#> 8404   BGR DTW       Pinnacle Airlines Inc.         29  1450       1287
#> 20147  DTW ORD American Eagle Airlines Inc.        144  7200       6344
#> 22514  ORD DSM        Shuttle America Corp.         73  5110       2759
#> 17379  DSM OMA                Allegiant Air          1   130         31
#>       Aircraft Distance
#> 8404       629      750
#> 20147      675      235
#> 22514      677      299
#> 17379      654      117
#> 
#> [[4]]
#>       from  to                Carrier Departures Seats Passengers Aircraft
#> 8404   BGR DTW Pinnacle Airlines Inc.         29  1450       1287      629
#> 8534   DTW DSM Pinnacle Airlines Inc.         42  2100       1518      629
#> 17379  DSM OMA          Allegiant Air          1   130         31      654
#>       Distance
#> 8404       750
#> 8534       534
#> 17379      117
#> 
#> [[5]]
#>       from  to                      Carrier Departures Seats Passengers
#> 8404   BGR DTW       Pinnacle Airlines Inc.         29  1450       1287
#> 8514   DTW AZO       Pinnacle Airlines Inc.         66  3299       2093
#> 19876  AZO ORD American Eagle Airlines Inc.         54  2700       1451
#> 21492  ORD OMA        United Air Lines Inc.         48  6798       5077
#>       Aircraft Distance
#> 8404       629      750
#> 8514       629      113
#> 19876      675      122
#> 21492      694      416

We can calculate the total distance of each itinerary:

sapply(itineraries, function(.x) sum(.x[["Distance"]]))
#>  [1] 1401 1401 1401 1401 1401 1401 1402 1402 1402 1402 1404 1405 1405 1407 1407
#> [16] 1408 1411 1414 1416 1416

Performance

A number of packages, such as igraph and dodgr, provide functions to find the shortest path between two nodes on a network, or all “simple” (non-looping) paths between nodes. For the single shortest path, these packages can be faster than yenpathy:

library(bench)
set.seed(42)
network <- sample_smallworld(1, 20, 4, .05)
bench::mark(
  shortest_paths(network, from = 1, to = 10),
  k_shortest_paths(network, from = 1, to = 10, k = 1),
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression                                            min median `itr/sec`
#>   <bch:expr>                                          <bch> <bch:>     <dbl>
#> 1 shortest_paths(network, from = 1, to = 10)          125µs  141µs     6795.
#> 2 k_shortest_paths(network, from = 1, to = 10, k = 1) 484µs  514µs     1910.
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>

However, to calculate k shortest paths, one generally has to calculate all such paths, and later subset. Doing this with igraph::all_simple_paths() can be considerably slower, even on a very simple network, and impossible due to memory or time constraints for large, densely-connected graphs.

network <- sample_smallworld(1, 8, 3, .05)
bench::mark(
  all_simple_paths(network, from = 1, to = 5, mode = "all"),
  k_shortest_paths(network, from = 1, to = 5, k = 1),
  check = FALSE
)
#> # A tibble: 2 x 6
#>   expression                                                    min  median
#>   <bch:expr>                                                <bch:t> <bch:t>
#> 1 all_simple_paths(network, from = 1, to = 5, mode = "all")  42.9ms  43.1ms
#> 2 k_shortest_paths(network, from = 1, to = 5, k = 1)          391µs 421.8µs
#> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>

Implementation

The package wraps a C++ implementation of Yen’s algorithm created by Yan Qi. The original C++ code is available on GitHub at yan-qi/k-shortest-paths-cpp-version.

Contributing

This project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.

Feel free to submit feedback, bug reports, or feature suggestions here, or to submit fixes as pull requests.

References

Yen, Jin Y. 1971. “Finding theKShortest Loopless Paths in a Network.” Management Science 17 (11). Institute for Operations Research; the Management Sciences (INFORMS): 712–16. https://doi.org/10.1287/mnsc.17.11.712.

Toph Allen and Noam Ross, EcoHealth Alliance