Advertisements

Archive

Author Archives: underdark

Recently there has been some buzz on Twitter about a new moving object database (MOD) called MobilityDB that builds on PostgreSQL and PostGIS (Zimányi et al. 2019). The MobilityDB Github repo has been published in February 2019 but according to the following presentation at PgConf.Russia 2019 it has been under development for a few years:

Of course, moving object databases have been around for quite a while. The two most commonly cited MODs are HermesDB (Pelekis et al. 2008) which comes as an extension for either PostgreSQL or Oracle and is developed at the University of Piraeus and SECONDO (de Almeida et al. 2006) which is a stand-alone database system developed at the Fernuniversität Hagen. However, both MODs remain at the research prototype level and have not achieved broad adoption.

It will be interesting to see if MobilityDB will be able to achieve the goal they have set in the title of Zimányi et al. (2019) to become “a mainstream moving object database system”. It’s promising that they are building on PostGIS and using its mature spatial analysis functionality instead of reinventing the wheel. They also discuss why they decided that PostGIS trajectories (which I’ve written about in previous posts) are not the way to go:

However, the presentation does not go into detail whether there are any straightforward solutions to visualizing data stored in MobilityDB.

According to the Github readme, MobilityDB runs on Linux and needs PostGIS 2.5. They also provide an online demo as well as a Docker container with MobilityDB and all its dependencies. If you give it a try, I would love to hear about your experiences.

References

  • de Almeida, V. T., Guting, R. H., & Behr, T. (2006). Querying moving objects in secondo. In 7th International Conference on Mobile Data Management (MDM’06) (pp. 47-47). IEEE.
  • Pelekis, N., Frentzos, E., Giatrakos, N., & Theodoridis, Y. (2008). HERMES: aggregative LBS via a trajectory DB engine. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (pp. 1255-1258). ACM.
  • Zimányi, E., Sakr, M., Lesuisse, A., & Bakli, M. (2019). MobilityDB: A Mainstream Moving Object Database System. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases (pp. 206-209). ACM.
Advertisements

In the previous post, I showed how Folium can be used to create interactive maps of GeoPandas GeoDataFrames. Today’s post continues this theme. Specifically, it compares Folium to another dataviz library called hvplot. hvplot also recently added support for GeoDataFrames, so it’s interesting to see how these different solutions compare.

Minimum viable

The following snippets show the minimum code I found to put a GeoDataFrame of Points onto a map with either Folium or hvplot.

Folium does not automatically zoom to the data extent and I didn’t find a way to add the whole GeoDataFrame of Points without looping through the rows individually:

Hvplot on the other hand registers the hvplot function directly with the GeoDataFrame. This makes it as convenient to use as the original GeoPandas plot function. It also zooms to the data extent:

Standard interaction and zoom to area of interest

The following snippets ensure that the map is set to a useful extent and the map tools enable panning and zooming.

With Folium, we have to set the map center and the zoom. The map tools are Leaflet defaults, so panning and zooming work as expected:

Since hvplot does not come with mouse wheel zoom enabled by default, we need to set that:

Color by attribute

Finally, for many maps, we want to show the point location as well as an attribute value.

To create a continuous color ramp for a numeric value, we can use branca.colormap to define the marker fill color:

In hvplot, it is sufficient to specify the attribute of interest:

I’m pretty impressed with hvplot. The integration with GeoPandas is very smooth. Just don’t forget to set the geo=True parameter if you want to plot lat/lon geometries.

Folium seems less straightforward for this use case. Maybe I missed some option similar to the Choropleth function that I showed in the previous post.

GeoPandas makes it easy to create basic visualizations of GeoDataFrames:

However, if we want interactive plots, we need additional libraries. Folium (which is built on Leaflet) is a great option. However, all examples for plotting GeoDataFrames that I found focused on point or polygon data. So here is what I found to work for GeoDataFrames of LineStrings:

First, some imports:

import pandas as pd
import geopandas
import folium

Loading the data:

graph = geopandas.read_file('data/population_test-routes-geom.csv')
graph.crs = {'init' :'epsg:4326'}

Creating the map using folium.Choropleth:

m = folium.Map([48.2, 16.4], zoom_start=10)

folium.Choropleth(
    graph[graph.geometry.length>0.001],
    line_weight=3,
    line_color='blue'
).add_to(m)

m

I also tried using folium.PolyLine which seemed like the more obvious choice but does not seem to accept GeoDataFrames as input. Instead, it expects a list of coordinate pairs and of course it expects them to be in the opposite order that Shapely.LineString.coords provides … Oh the joys of geodata!

In any case, I had to limit the number of features that get plotted because Folium refuses to plot all 8778 features at once. I decided to filter by line length because drawing really short lines is pointless for my overview visualization anyway.

Last week, I had the pleasure to give a movement data analysis workshop at the OpenGeoHub summer school at the University of Münster in Germany. The workshop materials consist of three Jupyter notebooks that have been designed to also support self-study outside of a workshop setting. So you can try them out as well!

All materials are available on Github:

  • Tutorial 0 provides an introduction to the MovingPandas Trajectory class.
  • Tutorials 1 and 2 provide examples with real-world datasets covering one day of ship movement near Gothenburg and multiple years of gull migration, respectively.

Here’s a quick preview of the bird migration data analysis tutorial (click for full size):

Tutorial 2: Bird migration data analysis

You can run all three Jupyter notebooks online using MyBinder (no installations required).

Alternatively or if you want to dig deeper: installation instructions are available on movingpandas.org

The OpenGeoHub summer school this year had a strong focus on spatial analysis with R and GRASS (sometimes mixing those two together). It was great to meet @mdsumner (author of R trip) and @edzerpebesma (author of R trajectories) for what might have well been the ultimate movement data libraries geek fest. In the ultimate R / Python cross-over,  0_getting_started.Rmd

Both talks and workshops have been recorded. Here’s the introduction:

and this is the full workshop recording:

In the past, network analysis capabilities in QGIS were rather limited or not straight-forward to use. This has changed! In QGIS 3.x, we now have a wide range of network analysis tools, both for use case where you want to use your own network data, as well as use cases where you don’t have access to appropriate data or just prefer to use an existing service.

This blog post aims to provide an overview of the options:

  1. Based on local network data
    1. Default QGIS Processing network analysis tools
    2. QNEAT3 plugin
  2. Based on web services
    1. Hqgis plugin (HERE)
    2. ORS Tools plugin (openrouteservice.org)
    3. TravelTime platform plugin (TravelTime platform)

All five options provide Processing toolbox integration but not at the same level.

If you are a regular reader of this blog, you’re probably also aware of the pgRoutingLayer plugin. However, I’m not including it in this list due to its dependency on PostGIS and its pgRouting extension.

Processing network analysis tools

The default Processing network analysis tools are provided out of the box. They provide functionality to compute least cost paths and service areas (distance or time) based on your own network data. Inputs can be individual points or layers of points:

The service area tools return reachable edges and / or nodes rather than a service area polygon:

QNEAT3 plugin

The QNEAT3 (short for Qgis Network Analysis Toolbox 3) Plugin aims to provide sophisticated QGIS Processing-Toolbox algorithms in the field of network analysis. QNEAT3 is integrated in the QGIS3 Processing Framework. It offers algorithms that range from simple shortest path solving to more complex tasks like Iso-Area (aka service areas, accessibility polygons) and OD-Matrix (Origin-Destination-Matrix) computation.

QNEAT3 is an alternative for use case where you want to use your own network data.

For more details see the QNEAT3 documentation at: https://root676.github.io/index.html

Hqgis plugin

Access the HERE API from inside QGIS using your own HERE-API key. Currently supports Geocoding, Routing, POI-search and isochrone analysis.

Hqgis currently does not expose all its functionality to the Processing toolbox:

Instead, the full set of functionality is provided through the plugin GUI:

This plugin requires a HERE API key.

ORS Tools plugin

ORS Tools provides access to most of the functions of openrouteservice.org, based on OpenStreetMap. The tool set includes routing, isochrones and matrix calculations, either interactive in the map canvas or from point files within the processing framework. Extensive attributes are set for output files, incl. duration, length and start/end locations.

ORS Tools is based on OSM data. However, using this plugin still requires an openrouteservice.org API key.

TravelTime platform plugin

This plugin adds a toolbar and processing algorithms allowing to query the TravelTime platform API directly from QGIS. The TravelTime platform API allows to obtain polygons based on actual travel time using several transport modes rather, allowing for much more accurate results than simple distance calculations.

The TravelTime platform plugin requires a TravelTime platform API key.

For more details see: https://blog.traveltimeplatform.com/isochrone-qgis-plugin-traveltime

Today’s post continues where “Why you should be using PostGIS trajectories” leaves off. It’s the result of a collaboration with Eva Westermeier. I had the pleasure to supervise her internship at AIT last year and also co-supervised her Master’s thesis [0] on the topic of enriching trajectories with information about their geographic context.

Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018)

Part of the work was focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS. For example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data.

Classic point-based model vs. line-based model

The obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It’s straightforward to add context information in the point-based model but it also generates large numbers of repeating annotations. In contrast, the line data model using, for example, PostGIS trajectories (LinestringM) is more compact since trajectories can be split into segments at context borders. This creates one annotation per segment and the individual segments are convenient to analyze (as described in part #12).

Spatio-temporal interpolation as provided by the line data model offers additional advantages for the analysis of annotated segments. Contextual segments start and end at the intersection of the trajectory linestring with context polygon borders. This means that there are no gaps like in the point-based model. Consequently, while the point-based model systematically underestimates segment length and duration, the line-based approach offers more meaningful segment length and duration measurements.

Schematic illustration of a subset of an annotated trajectory in two context classes, a) systematic underestimation of length or duration in the point data model, b) full length or duration between context polygon borders in the line data model (source: Westermeier (2018))

Another issue of the point data model is that brief context changes may be missed or represented by just one point location. This makes it impossible to compute the length or duration of the respective context segment. (Of course, depending on the application, it can be desirable to ignore brief context changes and make the annotation process robust towards irrelevant changes.)

Schematic illustration of context annotation for brief context changes, a) and b)
two variants for the point data model, c) gapless annotation in the line data model (source: Westermeier (2018) based on Buchin et al. (2014))

Beyond annotations, context can also be considered directly in an analysis, for example, when computing distances between trajectories and contextual point objects. In this case, the point-based approach systematically overestimates the distances.

Schematic illustration of distance measurement from a trajectory to an external
object, a) point data model, b) line data model (source: Westermeier (2018))

The above examples show that there are some good reasons to dump the classic point-based model. However, the line-based model is not without its own issues.

Issues

Computing the context annotations for trajectory segments is tricky. The main issue is that ST_Intersection drops the M values. This effectively destroys our trajectories! There are ways to deal with this issue – and the corresponding SQL queries are published in the thesis (p. 38-40) – but it’s a real bummer. Basically, ST_Intersection only provides geometric output. Therefore, we need to reconstruct the temporal information in order to create usable trajectory segments.

Finally, while the line-based model is well suited to add context from other vector data, it is less useful for context data from continuous rasters but that was beyond the scope of this work.

Conclusion

After the promising results of my initial investigations into PostGIS trajectories, I was optimistic that context annotations would be a straightforward add-on. The line-based approach has multiple advantages when it comes to analyzing contextual segments. Unfortunately, generating these contextual segments is much less convenient and also slower than I had hoped. Originally, I had planned to turn this work into a plugin for the Processing toolbox but the results of this work motivated me to look into other solutions. You’ve already seen some of the outcomes in part #20 “Trajectools v1 released!”.

References

[0] Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.


This post is part of a series. Read more about movement data in GIS.

If you’ve been following my posts, you’ll no doubt have seen quite a few flow maps on this blog. This tutorial brings together many different elements to show you exactly how to create a flow map from scratch. It’s the result of a collaboration with Hans-Jörg Stark from Switzerland who collected the data.

The flow data

The data presented in this post stems from a survey conducted among public transport users, especially commuters (available online at: https://de.surveymonkey.com/r/57D33V6). Among other questions, the questionnair asks where the commuters start their journey and where they are heading.

The answers had to be cleaned up to correct for different spellings, spelling errors, and multiple locations in one field. This cleaning and the following geocoding step were implemented in Python. Afterwards, the flow information was aggregated to count the number of nominations of each connection between different places. Finally, these connections (edges that contain start id, destination id and number of nominations) were stored in a text file. In addition, the locations were stored in a second text file containing id, location name, and co-ordinates.

Why was this data collected?

Besides travel demand, Hans-Jörg’s survey also asks participants about their coffee consumption during train rides. Here’s how he tells the story behind the data:

As a nearly daily commuter I like to enjoy a hot coffee on my train rides. But what has bugged me for a long time is the fact the coffee or hot beverages in general are almost always served in a non-reusable, “one-use-only-and-then-throw-away” cup. So I ended up buying one of these mostly ugly and space-consuming reusable cups. Neither system seem to satisfy me as customer: the paper-cup produces a lot of waste, though it is convenient because I carry it only when I need it. With the re-usable cup I carry it all day even though most of the time it is empty and it is clumsy and consumes the limited space in bag.

So I have been looking for a system that gets rid of the disadvantages or rather provides the advantages of both approaches and I came up with the following idea: Installing a system that provides a re-usable cup that I only have with me when I need it.

In order to evaluate the potential for such a system – which would not only imply a material change of the cups in terms of hardware but also introduce some software solution with the convenience of getting back the necessary deposit that I pay as a customer and some software-solution in the back-end that handles all the cleaning, distribution to the different coffee-shops and managing a balanced stocking in the stations – I conducted a survey

The next step was the geographic visualization of the flow data and this is where QGIS comes into play.

The flow map

Survey data like the one described above is a common input for flow maps. There’s usually a point layer (here: “nodes”) that provides geographic information and a non-spatial layer (here: “edges”) that contains the information about the strength or weight of a flow between two specific nodes:

The first step therefore is to create the flow line features from the nodes and edges layers. To achieve our goal, we need to join both layers. Sounds like a job for SQL!

More specifically, this is a job for Virtual Layers: Layer | Add Layer | Add/Edit Virtual Layer

SELECT StartID, DestID, Weight, 
       make_line(a.geometry, b.geometry)
FROM edges
JOIN nodes a ON edges.StartID = a.ID
JOIN nodes b ON edges.DestID = b.ID
WHERE a.ID != b.ID 

This SQL query joins the geographic information from the nodes table to the flow weights in the edges table based on the node IDs. In the last line, there is a check that start and end node ID should be different in order to avoid zero-length lines.

By styling the resulting flow lines using data-driven line width and adding in some feature blending, it’s possible to create some half decent maps:

However, we can definitely do better. Let’s throw in some curved arrows!

The arrow symbol layer type automatically creates curved arrows if the underlying line feature has three nodes that are not aligned on a straight line.

Therefore, to turn our straight lines into curved arrows, we need to add a third point to the line feature and it has to have an offset. This can be achieved using a geometry generator and the offset_curve() function:

make_line(
   start_point($geometry),
   centroid(
      offset_curve(
         $geometry, 
         length($geometry)/-5.0
      )
   ),
   end_point($geometry)
)

Additionally, to achieve the effect described in New style: flow map arrows, we extend the geometry generator to crop the lines at the beginning and end:

difference(
   difference(
      make_line(
         start_point($geometry),
         centroid(
            offset_curve(
               $geometry, 
               length($geometry)/-5.0
            )
         ),
	 end_point($geometry)
      ),
      buffer(start_point($geometry), 0.01)
   ),
   buffer(end_point( $geometry), 0.01)
)

By applying data-driven arrow and arrow head sizes, we can transform the plain flow map above into a much more appealing map:

The two different arrow colors are another way to emphasize flow direction. In this case, orange arrows mark flows to the west, while blue flows point east.

CASE WHEN
 x(start_point($geometry)) - x(end_point($geometry)) < 0
THEN
 '#1f78b4'
ELSE
 '#ff7f00'
END

Conclusion

As you can see, virtual layers and geometry generators are a powerful combination. If you encounter performance problems with the virtual layer, it’s always possible to make it permanent by exporting it to a file. This will speed up any further visualization or analysis steps.

%d bloggers like this: