vignettes/using-yenpathy.Rmd
using-yenpathy.Rmd
yenpathy is an R package to quickly find k shortest paths through a weighted graph using Yen’s Algorithm (Yen 1971). This algorithm has numerous applications in fields from transportation planning to social network analysis.
There are already comprehensive network analysis packages in R, notably igraph and its tidyverse-compatible interface tidygraph.yenpathy complements these by doing one thing well.
yenpathy provides the function k_shortest_paths()
, which returns k shortest paths between two nodes in a graph, ordered from shortest to longest, with length determined by the sum of weights of consecutive edges.
Install yenpathy from GitHub with remotes:
library(remotes) install_github("ecohealthalliance/yenpathy")
To find shortest paths through a network, pass k_shortest_paths()
a data frame with rows representing the edges of a graph (graph_df
), as well as supplying from
and to
nodes in graph_df
. The function will return a list containing up to k
(default 1) vectors representing paths through the graph:
library(yenpathy) small_graph <- data.frame( start = c(1, 4, 5, 1, 1, 8, 1, 2, 7, 3), end = c(4, 5, 6, 6, 8, 6, 2, 7, 3, 6), weight = c(1, 1, 1.5, 5, 1.5, 2.5, 1.5, 0.5, 0.5, 0.5) ) k_shortest_paths(small_graph, from = 1, to = 6, k = 4) #> [[1]] #> [1] 1 2 7 3 6 #> #> [[2]] #> [1] 1 4 5 6 #> #> [[3]] #> [1] 1 8 6 #> #> [[4]] #> [1] 1 6
You can also pass an edge_penalty
, a number which is added to all of the edge weights, effectively penalizing paths consisting of more edges.
k_shortest_paths(small_graph, from = 1, to = 6, k = 4, edge_penalty = 1) #> [[1]] #> [1] 1 6 #> #> [[2]] #> [1] 1 8 6 #> #> [[3]] #> [1] 1 4 5 6 #> #> [[4]] #> [1] 1 2 7 3 6
k_shortest_paths()
takes a data frame as its first argument (graph_df
). In place of a data frame, you can provide an igraph or tidygraph graph object, which will be converted to a data frame using igraph::as_data_frame()
.
By default, the first and second columns are assumed to contain the names of the the start and end nodes, respectively. These may be either integers or character vectors. If there is a column named “weights”, it is used for edge weights. You can change these defaults by passing column numbers or names to the col_from
, col_to
, and col_weights
arguments.
A common application for Yen’s algorithm is in transportation planning, when one wants to find the shortest routes between stops such as airports. Here we use the USairports
dataset from the igraphdata package to demonstrate how to find the k shortest routes between airports in Bangor, ME (BGR) and Omaha, NE (OMA).
library(igraph) #> #> Attaching package: 'igraph' #> The following objects are masked from 'package:stats': #> #> decompose, spectrum #> The following object is masked from 'package:base': #> #> union library(igraphdata) data("USairports") paths <- k_shortest_paths(USairports, "BGR", "OMA", k = 20, col_weights = "Distance") paths[1:5] #> [[1]] #> [1] "BGR" "DTW" "OMA" #> #> [[2]] #> [1] "BGR" "DTW" "ORD" "OMA" #> #> [[3]] #> [1] "BGR" "DTW" "ORD" "DSM" "OMA" #> #> [[4]] #> [1] "BGR" "DTW" "DSM" "OMA" #> #> [[5]] #> [1] "BGR" "DTW" "AZO" "ORD" "OMA"
The result shows the airport nodes, but we can use the paths
object to look up the edges, or flights, that make up these itineraries. In this case, we will select just the carriers with the most seats for each leg, as USAairports
includes multiple edges for each carrier operating flights between an airport pair:
usa <- as_data_frame(USairports) itineraries <- lapply(paths, function(.x) { n_stops <- length(.x) legs <- cbind(.x[1:(n_stops - 1)], .x[2:n_stops]) flights <- apply(legs, 1, function(.z) { fl <- subset(usa, from == .z[1] & to == .z[2]) fl <- subset(fl, Seats == max(Seats)) }) do.call(rbind, flights) }) itineraries[1:5] #> [[1]] #> from to Carrier Departures Seats Passengers Aircraft #> 8404 BGR DTW Pinnacle Airlines Inc. 29 1450 1287 629 #> 18340 DTW OMA Atlantic Southeast Airlines 71 4615 3350 631 #> Distance #> 8404 750 #> 18340 651 #> #> [[2]] #> from to Carrier Departures Seats Passengers #> 8404 BGR DTW Pinnacle Airlines Inc. 29 1450 1287 #> 20147 DTW ORD American Eagle Airlines Inc. 144 7200 6344 #> 21492 ORD OMA United Air Lines Inc. 48 6798 5077 #> Aircraft Distance #> 8404 629 750 #> 20147 675 235 #> 21492 694 416 #> #> [[3]] #> from to Carrier Departures Seats Passengers #> 8404 BGR DTW Pinnacle Airlines Inc. 29 1450 1287 #> 20147 DTW ORD American Eagle Airlines Inc. 144 7200 6344 #> 22514 ORD DSM Shuttle America Corp. 73 5110 2759 #> 17379 DSM OMA Allegiant Air 1 130 31 #> Aircraft Distance #> 8404 629 750 #> 20147 675 235 #> 22514 677 299 #> 17379 654 117 #> #> [[4]] #> from to Carrier Departures Seats Passengers Aircraft #> 8404 BGR DTW Pinnacle Airlines Inc. 29 1450 1287 629 #> 8534 DTW DSM Pinnacle Airlines Inc. 42 2100 1518 629 #> 17379 DSM OMA Allegiant Air 1 130 31 654 #> Distance #> 8404 750 #> 8534 534 #> 17379 117 #> #> [[5]] #> from to Carrier Departures Seats Passengers #> 8404 BGR DTW Pinnacle Airlines Inc. 29 1450 1287 #> 8514 DTW AZO Pinnacle Airlines Inc. 66 3299 2093 #> 19876 AZO ORD American Eagle Airlines Inc. 54 2700 1451 #> 21492 ORD OMA United Air Lines Inc. 48 6798 5077 #> Aircraft Distance #> 8404 629 750 #> 8514 629 113 #> 19876 675 122 #> 21492 694 416
We can calculate the total distance of each itinerary:
sapply(itineraries, function(.x) sum(.x[["Distance"]])) #> [1] 1401 1401 1401 1401 1401 1401 1402 1402 1402 1402 1404 1405 1405 1407 1407 #> [16] 1408 1411 1414 1416 1416
A number of packages, such as igraph and dodgr, provide functions to find the shortest path between two nodes on a network, or all “simple” (non-looping) paths between nodes. For the single shortest path, these packages can be faster than yenpathy:
library(bench) set.seed(42) network <- sample_smallworld(1, 20, 4, .05) bench::mark( shortest_paths(network, from = 1, to = 10), k_shortest_paths(network, from = 1, to = 10, k = 1), check = FALSE ) #> # A tibble: 2 x 6 #> expression min median `itr/sec` #> <bch:expr> <bch> <bch:> <dbl> #> 1 shortest_paths(network, from = 1, to = 10) 125µs 141µs 6795. #> 2 k_shortest_paths(network, from = 1, to = 10, k = 1) 484µs 514µs 1910. #> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>
However, to calculate k shortest paths, one generally has to calculate all such paths, and later subset. Doing this with igraph::all_simple_paths()
can be considerably slower, even on a very simple network, and impossible due to memory or time constraints for large, densely-connected graphs.
network <- sample_smallworld(1, 8, 3, .05) bench::mark( all_simple_paths(network, from = 1, to = 5, mode = "all"), k_shortest_paths(network, from = 1, to = 5, k = 1), check = FALSE ) #> # A tibble: 2 x 6 #> expression min median #> <bch:expr> <bch:t> <bch:t> #> 1 all_simple_paths(network, from = 1, to = 5, mode = "all") 42.9ms 43.1ms #> 2 k_shortest_paths(network, from = 1, to = 5, k = 1) 391µs 421.8µs #> # … with 3 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, `gc/sec` <dbl>
The package wraps a C++ implementation of Yen’s algorithm created by Yan Qi. The original C++ code is available on GitHub at yan-qi/k-shortest-paths-cpp-version.
This project is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.
Feel free to submit feedback, bug reports, or feature suggestions here, or to submit fixes as pull requests.
Yen, Jin Y. 1971. “Finding theKShortest Loopless Paths in a Network.” Management Science 17 (11). Institute for Operations Research; the Management Sciences (INFORMS): 712–16. https://doi.org/10.1287/mnsc.17.11.712.