Topographic streamline analysis

The Streamlines project provides a suite of Python tools to study the structure of surface-water flow on large digital terrain models (DTMs). It takes the form of a Python 3 package, called slm, which uses Numba and PyOpenCL to speed up processing, along with demo Jupyter/IPython notebooks and a custom set of open-source lidar DTM data files for the notebooks to work on.

Warning

This documentation is a work in progress. Recent changes, notably the deployment of OpenCL acceleration and the splitting of the git repo, are still being incorporated.

The novelty of the Streamlines method lies in how it computes the convergence and divergence of terrain-driven flow. Instead of taking the usual flow-routing approach at the pixel scale, it traces topographic streamlines at sub-pixel resolution and then processes them to compute useful properties such as drainage density and hillslope length distributions.

The streamlines are derived in the fluid-dynamics sense, by treating the gradient vector field of the terrain as a 2D pattern of conservative (zero-divergence) steady flow. Flow trajectories are integrated across this vector field from (generally) every DTM grid cell, not just in the downstream direction but also upstream. The pixel densities of up-streamlines and down-streamlines, together with their length distributions, are then computed at the DTM grid resolution. Analyzing these densities and length distributions reveals important geometric and topological properties of the landscape; in particular, they help reveal: (1) channel heads and terminations (e.g., on alluvial fans), (2) channel density, and (3) hillslope length.
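
To make the idea concrete, here is a minimal sketch (not the slm implementation itself) of tracing a single trajectory through a gradient field with 2nd-order Runge-Kutta steps; the function name, nearest-cell sampling and step size are illustrative assumptions:

    import numpy as np

    def trace_streamline(gx, gy, x0, y0, direction=-1.0, step=0.1, max_steps=10000):
        """Trace one trajectory through the gradient field (gx, gy).

        direction=-1.0 follows the flow downslope (against the gradient);
        direction=+1.0 follows it upslope.  Positions are in (sub-)pixel
        units; sampling is nearest-cell for brevity.
        """
        ny, nx = gx.shape

        def unit_flow(px, py):
            i, j = int(round(py)), int(round(px))
            if not (0 <= i < ny and 0 <= j < nx):
                return None
            vx, vy = gx[i, j], gy[i, j]
            norm = np.hypot(vx, vy)
            if norm == 0.0:
                return None
            return direction * vx / norm, direction * vy / norm

        x, y = float(x0), float(y0)
        trajectory = [(x, y)]
        for _ in range(max_steps):
            v1 = unit_flow(x, y)
            if v1 is None:
                break
            # Midpoint (2nd-order Runge-Kutta) estimate
            v2 = unit_flow(x + 0.5 * step * v1[0], y + 0.5 * step * v1[1])
            if v2 is None:
                break
            x += step * v2[0]
            y += step * v2[1]
            trajectory.append((x, y))
        return np.array(trajectory)

Seeding traces like this from (generally) every grid cell, once downstream and once upstream, and counting the pixels they visit is what yields the streamline densities and length distributions described above.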

[Figure: Guadalupe example (_images/Guadalupe_example1.png)]

A key innovation in Streamlines is a tool to overcome the non-zero-divergence artefacts that arise when a DTM is preconditioned by having its sinks and pits filled. Pit filling is required of any DTM that is to be loaded into and processed by Streamlines, much as it is for any flow-routing software. The problem is that if the gradient vector field of a ‘corrected’ DTM is computed at the grid resolution, there will inevitably be places (particularly at single-diagonal-outflow pixels adjacent to filled pits) where the gradient vectors appear to converge. If the DTM could be corrected at sub-pixel resolution (such as by resolving an outflow channel), this convergence probably would not occur. Rather than attempting such sub-pixel trickery prior to calculating the gradient vector field at an oversampled resolution, algorithms (see the preprocess module) that overcome the artefacts are applied directly to the gradient field at the DTM grid resolution. These algorithms require some tuning to overcome all such errors.
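
Purely as an illustration of the kind of diagnostic involved (the actual algorithms live in the preprocess module and are not reproduced here), the following sketch flags pixels where the normalized downslope flow field is strongly convergent; the function name and threshold are hypothetical:

    import numpy as np

    def flag_convergence_artefacts(dtm, threshold=-0.5):
        """Illustrative check only: mask pixels where the unit downslope
        flow field has strongly negative divergence, as can happen around
        filled pits in a 'corrected' DTM."""
        gy, gx = np.gradient(dtm)              # np.gradient returns d/drow, d/dcol
        norm = np.hypot(gx, gy)
        norm[norm == 0.0] = 1.0                # avoid division by zero on flats
        ux, uy = -gx / norm, -gy / norm        # unit downslope flow vectors
        div = np.gradient(ux, axis=1) + np.gradient(uy, axis=0)
        return div < threshold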

Technical stuff

Note

Mapping streamlines from every pixel of a large DTM is a demanding task, both in processing time and in memory use. Two big steps have been taken to speed up the code so that, RAM permitting, very large DTMs can be analyzed.

The first step was to deploy Numba to accelerate the rate-limiting (2nd-order Runge-Kutta) integration code. A speed-up of around 135x was achieved. Given what is known of the efficiency of Numba, this boost brought the Python code within a factor of 1–5 of the speed we would expect from a version hard-coded in C/C++ [1], [2], Fortran [3], etc.

The challenge with Numba is that it’s deceptively easy to apply: just slap an @jit decorator on your slow function or method, and bingo! it goes faster. However, for the JIT compilation to really work its magic, the function in question needs to be transformed into a rather non-Pythonic form: divested of objects, as strongly typed as possible, and stateless. The upside to neutering the code like this is that it is excellent preparation for the next step…
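
By way of illustration (a sketch written under those constraints, not the actual slm integration code), a stateless, object-free Runge-Kutta step might be jitted like this; the function name and nearest-cell sampling are assumptions:

    import numpy as np
    from numba import njit

    @njit(cache=True)
    def rk2_step(gx, gy, x, y, step, direction):
        """One 2nd-order Runge-Kutta step through the gradient field.

        Stateless and free of Python objects, so Numba can compile it to
        fast machine code.  Returns the new (x, y), or (-1.0, -1.0) if the
        trajectory leaves the grid or reaches a flat cell.
        """
        ny, nx = gx.shape
        i, j = int(y + 0.5), int(x + 0.5)        # nearest cell (positions non-negative)
        if i < 0 or i >= ny or j < 0 or j >= nx:
            return -1.0, -1.0
        vx, vy = direction * gx[i, j], direction * gy[i, j]
        vnorm = (vx * vx + vy * vy) ** 0.5
        if vnorm == 0.0:
            return -1.0, -1.0
        # Midpoint estimate
        xm, ym = x + 0.5 * step * vx / vnorm, y + 0.5 * step * vy / vnorm
        im, jm = int(ym + 0.5), int(xm + 0.5)
        if im < 0 or im >= ny or jm < 0 or jm >= nx:
            return -1.0, -1.0
        wx, wy = direction * gx[im, jm], direction * gy[im, jm]
        wnorm = (wx * wx + wy * wy) ** 0.5
        if wnorm == 0.0:
            return -1.0, -1.0
        return x + step * wx / wnorm, y + step * wy / wnorm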

The second step, which is ongoing, was to deploy PyOpenCL and adapt the Runge-Kutta integration code into a GPU-compute form. This effort was not as tough as expected, since migration to Numba had already entailed conversion to a form of Python that is easy to translate into OpenCL. Currently, the GPU-accelerated version of Streamlines is about 1500x faster than the original, pure-Python code, and further optimizations are expected to increase this speed-up.
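
To show the general shape of this approach (a hedged sketch, not the slm OpenCL kernels; the kernel and function names are invented, and for brevity the kernel takes simple Euler rather than Runge-Kutta steps), a per-seed integration might be dispatched with PyOpenCL like this:

    import numpy as np
    import pyopencl as cl

    KERNEL_SRC = """
    __kernel void integrate(__global const float *gx,
                            __global const float *gy,
                            __global float *x,
                            __global float *y,
                            const int nx, const int ny,
                            const float step, const float direction,
                            const int n_steps)
    {
        int gid = get_global_id(0);          // one work-item per seed point
        float px = x[gid], py = y[gid];
        for (int s = 0; s < n_steps; s++) {
            int i = (int)(py + 0.5f), j = (int)(px + 0.5f);
            if (i < 0 || i >= ny || j < 0 || j >= nx) break;
            float vx = direction * gx[i * nx + j];
            float vy = direction * gy[i * nx + j];
            float norm = sqrt(vx * vx + vy * vy);
            if (norm == 0.0f) break;
            px += step * vx / norm;          // simple Euler step for brevity
            py += step * vy / norm;
        }
        x[gid] = px;
        y[gid] = py;
    }
    """

    def integrate_on_gpu(gx, gy, seeds_x, seeds_y, step=0.1, direction=-1.0, n_steps=1000):
        """Advance every seed point through the gradient field on the GPU."""
        ctx = cl.create_some_context()
        queue = cl.CommandQueue(ctx)
        prg = cl.Program(ctx, KERNEL_SRC).build()
        mf = cl.mem_flags
        ny, nx = gx.shape
        gx_d = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=gx.astype(np.float32))
        gy_d = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=gy.astype(np.float32))
        x = seeds_x.astype(np.float32)
        y = seeds_y.astype(np.float32)
        x_d = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=x)
        y_d = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=y)
        prg.integrate(queue, (x.size,), None, gx_d, gy_d, x_d, y_d,
                      np.int32(nx), np.int32(ny),
                      np.float32(step), np.float32(direction), np.int32(n_steps))
        cl.enqueue_copy(queue, x, x_d)
        cl.enqueue_copy(queue, y, y_d)
        return x, y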

Getting started

Running the code

The Streamlines workflow can be carried out in several ways:

Controlling the workflow

Streamlines computation and analysis are controlled by supplying workflow parameters and (optionally) command-line arguments. The parameters are contained in JSON files that are loaded at the beginning of operations.

First, default parameter values are assigned by parsing the defaults.json file located in the streamlines package directory. Second, a ‘job’ parameters file is parsed that overrides the default values where needed. Third, key workflow parameters may be assigned when invoking Streamlines computation from the command line or with an explicit Python call from a script or notebook. Such options override the JSON-parsed values.
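
Conceptually, this precedence amounts to a layered merge of parameter dictionaries. The sketch below illustrates the idea with a flat merge; the function and parameter names are invented, the real defaults.json lives in the streamlines package directory rather than the current directory, and the actual merge in slm may differ:

    import json

    def resolve_parameters(job_json_path, **overrides):
        """Merge parameters in increasing order of precedence:
        package defaults < job file < explicit command-line/notebook overrides."""
        with open('defaults.json') as f:          # assumption: defaults.json in cwd
            params = json.load(f)
        with open(job_json_path) as f:
            params.update(json.load(f))           # job file overrides defaults
        params.update(overrides)                  # explicit options override everything
        return params

    # e.g. params = resolve_parameters('my_job.json', do_plot=False)  # hypothetical names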

Here is an example sequence of files in which parameters are assigned and streamline computation, postprocessing and plotting are carried out:

More substantial demo analyses can be found in IndianCreek_Test2.ipynb and GuadalupeMtns1.ipynb. Larger DTM analyses are illustrated in GuadalupeMtns2.ipynb, GuadalupeMtns3.ipynb and IndianCreek3.ipynb.