Seaway distance, or the shortest route between two points via water, is an important measure of risk for disease spread between aquaculture establishments. Salmon aquaculture, for example, is usually located in complex fjords, which means that there are numerous obstacles and the calculation of the seaway distance is not trivial. Commercial tools exist for the calculation of these distances, for example using ESRI’s ArcGIS Spatial Analyst extension, but there appear to be no open source tools available.

This post describes the algorithm for calculating seaway distances, and presents a number of functions that implement the calculations in the open source database PostgreSQL, using the PostGIS spatial extension and PGRouting.

# Theory

Shortest distance calculations are commonplace these days – everybody’s in-car GPS or phone can do them. They are based on having a network (usually of roads) with each segment (‘edge’) having a cost associated with it (either simply distance, or more commonly the time required to travel the segment, taking distance, speed limits, surface and bends into account). Once you have a network of routes you can take, there are various algorithms to calculate the shortest.

In PostgreSQL, with the PostGIS spatial extension, you can also install PGRouting which gives you access to a range of shortest-distance algorithms. The problem is that, at sea, there are no roads…

## Visibility graphs

In open water, over short distances, the problem is simple: the shortest distance between two aquaculture sites is a straight line between them (as determined on a projected map). Over longer distances, things get a little more complicated, because we need to calculate the *great circle* distance, but that is a discussion for another blog. For the moment, we will focus on short distances, such as may be important for the water-borne transmission of pathogens between aquaculture sites.

In coastal aquaculture, there are often many obstacles, as in the example of the image above, which shows a section of Chile’s coast line with aquaculture sites. Islands and fjords present numerous obstacles, so the shortest distance must take these into account.

The solution to this problem is the use of a *visibility graph*. In essence, we create a ‘road’ network (or really a seaway network) on the surface of the sea, which allows us to travel from any point to any other point in the area of interest, avoiding obstacles (islands, headlands etc).

The seaway network starts with our points of interest – the aquaculture sites. In the image above, they are marked as blue points. Each site can be linked to every other site with a straight line. If that line doesn’t cross land (or in other words, if the destination point is ‘visible’ from the point of origin) then the straight line is the shortest distance. However if it does intersect with land, we need to go around the obstacle. The straight line is dropped and an alternative path is needed.

The second step is to determine the possible routes around the obstacles. In a GIS map, the coastline is represented as a series of points. A visibility map is created by connecting every point to every other visible point. The algorithm to achieve this involves joining each point on the coastline to every other point on all coastlines in the area of interest with a straight line. If it crosses land, it is dropped, but otherwise it is retained as part of the network. For complex coastlines, with many points, this can generate a very large number of connecting lines (edges). In the image below, the grey lines

## Route optimisation

Once the visibility graph or sea ‘road network’ has been created, we can use PGRouting to calculate the shortest option between any two points. First, the network needs to be processed to connect up each line to a unique node so that PGRouting knows that we can travel from one line to the next.

Once this is done, we can use on of the in-built algorithms to calculate the shortest distance. This involves specifying the point of origin (one of our aquaculture sites) and the destination (another site), and using one of the route optimisation functions.

In this example, we use the Dijkstra algorithm, which is one of the earliest and still most popular approaches to finding the shortest distance.

# Implementation

This blog presents two functions implementing seaway distances using PostgreSQL. The implementation involves two functions written in PL/PGSQL: one to generate the topology of the network (the visibility graph) and the other to calculate the path and distance for the shortest route between two specified points, using the saved network.

## Visibility graphs

The sites or points of interest, and the coast line points can be considered as equivalent, as far as the generation of the visibility graph goes. The job is simply to draw a strait line between every pair of points with ‘visibility’ (that are not separated by an obstacle or in this case a land mass). In the function, a temporary table is created, and first populated with coastline vertices, and then with aquaculture site centroids. A cross join is then use to connect every point in the table to every other point, dropping those that intersect land.

Depending on the resolution of the coastal polygon map being used and the number of sites involved, the number of points (n) to link can be very very large. As each point is linked to every other point, the possible number of edges is n². This can pose significant processing challenges, so there is a need to optimise the process by minimsing the number of points to be joined.

### Search distance

In this example, the purpose of the seaway distance calculation was to assess the risk of transfer of pathogens from one farm to another. As free-floating pathogens (bacteria, viruses and parasites) survive for only a limited time, the distance they can cover is limited. It is therefore not interesting to identify the shortest distance between sites that are hundreds of kilometers apart – there is little risk of disease spread at this scale. Instead, the focus is on shorter distances, in the order of tens of kilometers. One way of optimising the algorithm is to impose a maximum search radius for connections between points in the visibility graph. In this example, we use a default maximum search radius of 40 km, but this can be changed as required.

This means that if the there are sites or coastal vertices more than 40 km from a particular site, the risk of disease spread over this distance is negligible, so it does not need to be included in the shortest distance calculation. This lead to the first major optimisation: it is not necessary to consider all the points on the land polygon, but only those that are within 40 km of an aquaculture site.

This was implemement by buffering all sites by 40 km (st_buffer) then merging the buffers to create a polygon (st_union), and intersecting the buffer polygon with the land mass poloygon (st_intersection) to create a new limited polygon with the relevant vertices.

### Buffering the coast line

The shortest route doesn’t actually pass through coastal vertices. If a ship were used, it would run aground at each headland. If waterborne transfer of pathogens is of interest, the currents in the shallow water close to land move much more slowly than those in water further from coast.

To capture this effect, the land polygon is buffered (st_buffer) using a default distance of 100 metres offshore. Unfortunately, buffering around a vertex (one point) produces a rounded buffer (many points). The buffer is simplified (st_simplifypreservetopology) to drop back to a single point.

## Concave points

When navigating around an island or other part of the coast line, we can be sure that the shortest route will never involve enter into a bay. Instead, we should go from headland to headland.

This means that coastal vertices that are concave never contribute to the shortest route. To optimising the algorithm, it is therefore possible to remove all the points that are on concave portions of the coast line.

Assessing concavity requires knowing the coordinates of the point, as well as the preceding and subsequent points. This is achieved using a PostgreSQL windows function lag(). Concavity is assessed using simple trigonometry.

# Function 1: Build the visibility graph and the topology

The first function creates and saves the topology ready for use to find the shortest distance. For a given geography and set of sites, it only needs to be run once. However, if new sites are added, it should be run again to update the visibility graph.

The function definition is:

create_visibility_graph( barriers text, sites text, max_distance double precision DEFAULT 40000, buffer_distance double precision DEFAULT 100 )

- barriers: the name of the polygon table for land masses
- sites: the name of the points table for sites
- max_distance: the search distance representing the longest edge
- buffer_distance: the land mass buffer distances

The **barriers** table contains one or more polygon features (or multipolygons) representing the coastline and islands. The **sites** table contains the site locations. Both tables must have at least two fields:

**gid**: a unique geographic id field**geom**: a geometry field in a defined SRID (which will be converted to projected SRID 3857)- For the barriers table, type polygon or multipolygon
- For the sites table, type point

An example of usage, using the default buffer and search distances might be:

select create_visibility_graph('adm0', 'farm_locations');

**NOTE**: With complex geography or large numbers of sites, this may take a long time to run . For example, the dataset used for this example has about 3000 coastline vertices and 4000 aquaculture sites. Generating the visibility graph using this function takes about 5 minutes on a reasonably powerful computer.

## Function 2: Shortest route

The second function is used once the visibility table and topography have been created. The function takes two parameters:

sea_way( site1 bigint, site2 bigint )

These are the ID of the two sites to calculate the seaway distance between. The function returns a record with two values:

- geom: a geometry line value, which is the shortest route between the two sites, and
- distance: a float value which is the distance between the two sites

This function is much faster to run, as the edge table and topology has already been created. The time required is proportional to the complexity of the path (which usually is related to the distance between points), but normally runs within a second or two for a single distance calculation.

This can be used in various ways. For example, to simply find the distance between two sites, with ID values 392 and 492 in the vertices table:

select distance from sea_way(392,492);

If we want to display the route as well, using a mapping interface (for example, using QGIS’s Database Manager’s query interface) you would need to return a unique ID as well as the geometry value:

select 1 as gid, -- create a unique ID column * -- return both elements of the record (geom and distance) from sea_way (392, 492);

In order to find sites that are less then 20 km away by sea, we need to calculate the distances from the source site (392, for example a site suffering a disease outbreak) to all other possible sites. To make the calculation as fast as possible, it is much more efficient to test only those sites within a straight-line radius of 20 km from the affected site. This is a much faster spatial calculation and saves testing the distance to sites that cannot be within 20 km by sea.

select nearby.gid, -- return the vertex ID for matching sites (sea_way(1065, nearby.gid)).* -- return the route and distance from ( select distinct v.id as gid from visibility_graph_vertices_pgr v join visibility_graph g on g.source = v.id and g.sid is not null join visibility_graph_vertices_pgr s on s.id = 1065 and st_distance(v.the_geom, s.the_geom) < 20000 and v.id <> s.id ) as nearby;

The result is shown below, with the 20 km buffer around the affected site. The route to sites with a seaway distance of less than 20 km are shown in red, while those that are greater are shown in blue.