Movement data in GIS #3: visualizing massive trajectory datasets

In the first two parts of the Movement Data in GIS series, I discussed modeling trajectories as LinestringM features in PostGIS to overcome some common issues of movement data in GIS and presented a way to efficiently render speed changes along a trajectory in QGIS without having to split the trajectory into shorter segments.

While visualizing individual trajectories is important, the real challenge is trying to visualize massive trajectory datasets in a way that enables further analysis. The out-of-the-box functionality of GIS is painfully limited. Except for some transparency and heatmap approaches, there is not much that can be done to help interpret “hairballs” of trajectories. Luckily, researchers in visual analytics have already put considerable effort into finding solutions for this visualization challenge. The approach I want to talk about today was published by Andrienko, N., & Andrienko, G. (2011). Spatial generalization and aggregation of massive movement data. IEEE Transactions on Visualization and Computer Graphics, 17(2), 205-219. It consists of the following main steps:

1. Extracting characteristic points from the trajectories
2. Grouping the extracted points by spatial proximity
3. Computing group centroids and corresponding Voronoi cells
4. Dividing trajectories into segments according to the Voronoi cells
5. Counting transitions from one cell to another

The authors do a great job at describing the concepts and algorithms, which made it relatively straightforward to implement them in QGIS Processing. So far, I’ve implemented the basic logic but the paper contains further suggestions for improvements. This was also my first PyQGIS project that makes use of the measurement value support in the new geometry engine. The time information stored in the m-values is used to detect stop points, which – together with start, end, and turning points – make up the characteristic points of a trajectory.
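To make the idea of characteristic points more concrete, here is a minimal plain-Python sketch. It is not the PyQGIS code from this experiment: it assumes each trajectory is a time-ordered list of (x, y, t) tuples in planar coordinates, with t in seconds taken from the m-value, and the thresholds min_angle, min_stop_s and stop_radius are hypothetical parameters chosen for illustration only.

```python
import math


def heading(p, q):
    """Direction of travel from p to q in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


def turn_angle(a, b, c):
    """Absolute change of heading at b, in degrees (0..180)."""
    diff = math.degrees(heading(b, c) - heading(a, b))
    return abs((diff + 180.0) % 360.0 - 180.0)


def characteristic_points(points, min_angle=45.0, min_stop_s=120.0,
                          stop_radius=50.0):
    """Return start, end, turning and stop points of one trajectory.

    points: list of (x, y, t) tuples ordered by time (assumed layout).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    if len(points) < 3:
        return list(points)

    keep = {0, len(points) - 1}  # start and end point

    # Turning points: heading changes by at least min_angle degrees.
    for i in range(1, len(points) - 1):
        a, b, c = points[i - 1], points[i], points[i + 1]
        if a[:2] == b[:2] or b[:2] == c[:2]:
            continue  # heading is undefined for zero-length segments
        if turn_angle(a, b, c) >= min_angle:
            keep.add(i)

    # Stop points: the object stays within stop_radius of a point
    # for at least min_stop_s seconds (time comes from the m-value).
    i = 0
    while i < len(points):
        j = i
        while j + 1 < len(points) and math.hypot(
                points[j + 1][0] - points[i][0],
                points[j + 1][1] - points[i][1]) <= stop_radius:
            j += 1
        if points[j][2] - points[i][2] >= min_stop_s:
            keep.add(i)
        i = j + 1

    return [points[k] for k in sorted(keep)]
```

In this sketch a stop is represented by the first point of the stationary run; using the run's mean position would be an equally plausible choice.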

The following animation illustrates the current state of the implementation: First the “hairball” of trajectories is rendered. Then we extract the characteristic points and group them by proximity. The big black dots are the resulting group centroids. From there, I skipped the Voronoi cells and directly counted transitions from “nearest to centroid A” to “nearest to centroid B”.
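Assigning each point to its nearest centroid is equivalent to asking which Voronoi cell the point falls into, which is why the cells themselves can be skipped. The following plain-Python sketch shows this kind of transition counting. It assumes planar (x, y) coordinates and a brute-force nearest-centroid search; the function names and the min_count parameter are hypothetical and not taken from the published scripts.

```python
import math
from collections import Counter


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (planar distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: math.hypot(point[0] - centroids[i][0],
                                 point[1] - centroids[i][1]),
    )


def count_transitions(trajectories, centroids, min_count=1):
    """Count moves from 'nearest to centroid A' to 'nearest to centroid B'.

    trajectories: iterable of point lists; only x and y are used.
    Returns {(from_index, to_index): count}, dropping flows that
    occur fewer than min_count times.
    """
    counts = Counter()
    for trajectory in trajectories:
        previous = None
        for point in trajectory:
            current = nearest_centroid(point, centroids)
            if previous is not None and current != previous:
                counts[(previous, current)] += 1
            previous = current
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

Because (A, B) and (B, A) are counted separately, the result keeps the directionality of the flows. For large datasets, a spatial index would be a sensible replacement for the brute-force search.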

From thousands of individual trajectories to a generalized representation of overall movement patterns (data credits: GeoLife project, map tiles: Stamen, map data: OSM)

The resulting visualization makes it possible to analyze flow strength as well as directionality. I have deliberately excluded all connections with a count below 10 transitions to reduce visual clutter. The cell size / distance between point groups – and therefore the level-of-detail – is one of the input parameters. In my example, I used a target cell size of approximately 2 km. This setting results in connections which follow the major roads outside the city center very well. In the city center, where the road grid is tighter, trajectories on different roads mix and the connections are less clear.

Since trajectories in this dataset are not limited to car trips, it is expected to find additional movement that is not restricted to the road network. This is particularly noticeable in the dense area in the west where many slow trajectories – most likely from walking trips – are located. The paper also covers how to ensure that connections are limited to neighboring cells by densifying the trajectories before computing step 4.
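Densifying means inserting intermediate points so that no segment can jump across an entire cell. A minimal sketch in plain Python, assuming (x, y, t) tuples in planar coordinates and linear interpolation of position and time; max_dist is a hypothetical parameter that would presumably be set well below the target cell size:

```python
import math


def densify(points, max_dist):
    """Insert interpolated points so no segment is longer than max_dist.

    points: list of (x, y, t) tuples ordered by time (assumed layout).
    Position and time are interpolated linearly along each segment.
    """
    if max_dist <= 0:
        raise ValueError("max_dist must be positive")
    if not points:
        return []

    dense = [points[0]]
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        pieces = max(1, math.ceil(length / max_dist))
        # Interior points of the segment
        for k in range(1, pieces):
            f = k / pieces
            dense.append((x0 + f * (x1 - x0),
                          y0 + f * (y1 - y0),
                          t0 + f * (t1 - t0)))
        # Keep the original end point of the segment unchanged
        dense.append((x1, y1, t1))
    return dense
```

For example, densify([(0, 0, 0), (10, 0, 100)], 4) splits the single 10-unit segment into three equal pieces and returns four points.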

Running the scripts for over 18,000 trajectories requires patience. It would be worth evaluating if the first three steps can be run with only a subsample of the data without impacting the results in a negative way.

One thing I’m not satisfied with yet is the way to specify the target cell size. While it’s possible to measure ellipsoidal distances in meters using QgsDistanceArea (irrespective of the trajectory layer’s CRS), the initial regular grid used in step 2 in order to group the extracted points has to be specified in the trajectory layer’s CRS units – quite likely degrees. Instead, it may be best to transform everything into an equidistant projection before running any calculations.
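The unit problem is easy to see in a stripped-down version of grid-based grouping. The sketch below is plain Python and only a simplified stand-in for the grouping procedure described in the paper: it bins (x, y) points into a regular grid and returns one centroid per occupied cell. The function name and the cell_size parameter are hypothetical. Since cell_size is expressed in coordinate units, a value of 2000 only means "2 km" if the coordinates have been projected to meters beforehand.

```python
import math
from collections import defaultdict


def group_centroids(points, cell_size):
    """Bin points into a regular grid and return one centroid per cell.

    points: iterable of (x, y) tuples.
    cell_size: grid spacing in the same units as the coordinates.
    Returns {(column, row): (centroid_x, centroid_y)}.
    """
    groups = defaultdict(list)
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        groups[cell].append((x, y))

    return {
        cell: (sum(p[0] for p in members) / len(members),
               sum(p[1] for p in members) / len(members))
        for cell, members in groups.items()
    }
```

With coordinates in degrees, the same call would need a cell_size such as 0.02, and the resulting cells would not be square on the ground, which is the motivation for projecting first.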

It’s good to see that PyQGIS enables us to use the information encoded in PostGIS LinestringM features to perform spatio-temporal analysis. However, working with m or z values involves a lot of v2 geometry classes which work slightly differently than their v1 counterparts. It certainly takes some getting used to. This situation might get cleaned up as part of the QGIS 3 API refactoring effort. If you can, please support work on QGIS 3. Now is the time to shape the PyQGIS API for the following years!

The source code for this experiment is available on GitHub.

This post is part of a series. Read more about movement data in GIS.

12 comments

Daniel said: 2016-11-10 21:51

This is just great. I’ve worked with the Andrienkos before, and it is good to know that we can have their approach implemented in a GIS-friendly environment. I’m working now with bike trip datasets and I’d like to test your implementation. Is it possible to use your implementation?
- underdark said: 2016-11-20 21:22
  
  Hi Daniel! Sorry I meant to write a reply much earlier but didn’t get around to it.
  I haven’t published the scripts yet. They will only work for lines with m-values. So it would be good to have a way to restrict the input layers to this geometry type. If you are interested in getting involved, I’ll try to find some time next week to publish the scripts.
  - roger said: 2017-10-13 16:31
    
    Have you published the scripts? Can I get a copy?
  - underdark said: 2017-10-13 19:43
    
    Hi Roger, here you go: https://anitagraser.com/2017/10/13/movement-data-in-gis-extra-trajectory-generalization-code-and-sample-data/
Florian Hoedt said: 2016-11-25 11:18

Hi Anita, how were the ingoing / outgoing half-arrows created? Those are really nice. Greetings, Florian
- underdark said: 2016-11-25 12:42
  
  Hi Florian, the arrow line symbol layer has an option to make the arrow heads one-sided.
  - Florian Hoedt said: 2016-12-29 10:33
    
    Hi Anita,
    I have found the one-sided versions, thanks for that tip. Do you remember my OpenSource GIS Assignment with commuter movement (https://www.nextplacelab.de/httpapps/drive-like-a-commuter/)? I would like to create print maps with multiple municipalities showing their incoming and outgoing commuter flows. I thought about using a geometry generator like you have used it in here (https://anitagraser.com/2016/12/18/details-of-good-flow-maps/) combined with the one-sided colour coded arrows of this post’s example. But it’s so much data that it will get cluttered fast. Do you have any hints for me to solve this? And are there any geometry generator tutorials, posts, or books you could recommend?
    greetings, Florian
  - underdark said: 2016-12-30 14:57
    
    Hi Florian!
    Sure I remember, great work!
    Concerning clutter, have a look at my latest post https://anitagraser.com/2016/12/30/new-style-flow-map-arrows/. Another approach, which I’m still working on, is edge bundling, but that’s still a work in progress.
    Happy new year!
roger said: 2017-10-17 10:17

Hi Anita,
Thanks for the scripts! But I still have trouble styling the arrows. Can you provide a bit more info on how you color and resize them?
Thanks!
- underdark said: 2017-10-17 12:50
  
  Are you familiar with the data-defined styling functions in QGIS? The scripts compute flow strength and store it in a field of the flow layer. This field is then used in data-defined styling. (The color is also data-defined: direction mapped to a rainbow color ramp.)
  - roger said: 2017-10-17 15:25
    
    Thanks for the tips. I tried:
    scale_linear("COUNT",1,43,1,10) for scaling the size of the arrow.
    ramp_color('Spectral',scale_linear("COUNT",1,43,0,1)) for the color ramp.
    It does not look as good as yours, but it works. Any suggestions for improvement?
    Thanks!!!
  - underdark said: 2017-10-17 18:40
    
    A maximum width of 10 is quite high. I’d go with something like scale_linear("COUNT",1,43,0.01,2)