Tag Archives: Python

The latest v0.9 release is now available from conda-forge.

This release contains some really cool new algorithms:

The Kalman filter in action on the Geolife sample: smoother, less jiggly trajectories.
Top-Down Time Ratio generalization aka trajectory compression in action: reduces the number of positions along the trajectory without altering the spatiotemporal properties, such as speed, too much.

These new algorithms were contributed by Lyudmil Vladimirov and George S. Theodoropoulos.

Behind the scenes, Ray Bell took care of moving testing from Travis to Github Actions, and together we worked through the steps to ensure that the source code is now properly linted using flake8 and black.

Being able to work with so many awesome contributors has made this release really special for me. It’s great to see the project attracting more developer interest.

As always, all tutorials are available from the movingpandas-examples repository and on MyBinder:

The latest v0.8 release is now available from conda-forge.

New features include:

  • More convenient creation of TrajectoryCollection objects from (Geo)DataFrames (#137)
  • Support for different geometry column names (#112)

Last week, I also had the pleasure to speak about MovingPandas at Carto’s Spatial Data Science Conference SDSC21:

As always, all tutorials are available from the movingpandas-examples repository and on MyBinder:

The latest v0.7 release is now available from conda-forge.

New features include:

As always, all tutorials are available from the movingpandas-examples repository and on MyBinder:

After writing “Towards a template for exploring movement data” last year, I spent a lot of time thinking about how to develop a solid approach for movement data exploration that would help analysts and scientists to better understand their datasets. Finally, my search led me to the excellent paper “A protocol for data exploration to avoid common statistical problems” by Zuur et al. (2010). What they had done for the analysis of common ecological datasets was very close to what I was trying to achieve for movement data. I followed Zuur et al.’s approach of a exploratory data analysis (EDA) protocol and combined it with a typology of movement data quality problems building on Andrienko et al. (2016). Finally, I brought it all together in a Jupyter notebook implementation which you can now find on Github.

There are two options for running the notebook:

  1. The repo contains a Dockerfile you can use to spin up a container including all necessary datasets and a fitting Python environment.
  2. Alternatively, you can download the datasets manually and set up the Python environment using the provided environment.yml file.

The dataset contains over 10 million location records. Most visualizations are based on Holoviz Datashader with a sprinkling of MovingPandas for visualizing individual trajectories.

Point density map of 10 million location records, visualized using Datashader

Line density map for detecting gaps in tracks, visualized using Datashader

Example trajectory with strong jitter, visualized using MovingPandas & GeoViews


I hope this reference implementation will provide a starting point for many others who are working with movement data and who want to structure their data exploration workflow.

If you want to dive deeper, here’s the paper:

[1] Graser, A. (2021). An exploratory data analysis protocol for identifying problems in continuous movement data.Β Journal of Location Based Services. doi:10.1080/17489725.2021.1900612.

(If you don’t have institutional access to the journal, the publisher provides 50 free copies using this link. Once those are used up, just leave a comment below and I can email you a copy.)


This post is part of a series. Read more aboutΒ movement data in GIS.

Data sourcing and preparation is one of the most time consuming tasks in many spatial analyses. Even though the Austrian platform already provides a central catalog, the individual datasets still vary considerably in their accessibility or readiness for use.

OGD.AT Lab is a new repository collecting Jupyter notebooks for working with Austrian Open Government Data and other auxiliary open data sources. The notebooks illustrate different use cases, including so far:

  1. Accessing geodata from the city of Vienna WFS
  2. Downloading environmental data (heat vulnerability and air quality)
  3. Geocoding addresses and getting elevation information
  4. Exploring urban movement data

Data processing and visualization are performed using Pandas, GeoPandas, and Holoviews. GeoPandas makes it straighforward to use data from WFS. Therefore, OGD.AT Lab can provide one universal gdf_from_wfs() function which takes the desired WFS layer as an argument and returns a GeoPandas.GeoDataFrame that is ready for analysis:

Many other datasets are provided as CSV files which need to be joined with spatial datasets to use them in spatial analysis. For example, the “Urban heat vulnerability index” dataset which needs to be joined to statistical areas.


Another issue with many CSV files is that they use German number formatting, where commas are used as a decimal separater instead of dots:

Besides file access, there are also open services provided by other developers, for example, Manfred Egger developed an elevation service that provides elevation information for any point in Austria. In combination with geocoding services, such as Nominatim, this makes is possible to, for example, find the elevation for any address in Austria:

Last but not least, the first version of the mobility notebook showcases open travel time data provided by Uber Movement:

The utility functions for data access included in this repository will continue to grow as new data sources are included. Eventually, it may make sense to extract the data access function into a dedicated library, similar to geofi (Finland) or geobr (Brazil).

If you’re aware of any interesting open datasets or services that should be included in OGD.AT, feel free to reach out here or on Github through the issue tracker or by providing a pull request.

In the previous post, we explored how hvPlot and Datashader can help us to visualize large CSVs with point data in interactive map plots. Of course, the spatial distribution of points usually only shows us one part of the whole picture. Today, we’ll therefore look into how to explore other data attributes by linking other (non-spatial) plots to the map.

This functionality, referred to as “linked brushing” or “crossfiltering” is under active development and the following experiment was prompted by a recent thread on Twitter launched by @plotlygraphs announcement of HoloViews 1.14:

Turns out these features are not limited to plotly but can also be used with Bokeh and hvPlot:

Like in the previous post, this demo uses a Pandas DataFrame with 12 million rows (and HoloViews 1.13.4).

In addition to the map plot, we also create a histogram from the same DataFrame:

map_plot = df.hvplot.scatter(x='x', y='y', datashade=True, height=300, width=400)
hist_plot = df.where((df.SOG>0) & (df.SOG<50)).hvplot.hist("SOG",  bins=20, width=400, height=200) 

To link the two plots, we use HoloViews’ link_selections function:

from holoviews.selection import link_selections
linked_plots = link_selections(map_plot + hist_plot)

That’s all! We can now perform spatial filters in the map and attribute value filters in the histogram and the filters are automatically applied to the linked plots:

Linked brushing demo using ship movement data (AIS): filtering records by speed (SOG) reveals spatial patterns of fast and slow movement.

You’ve probably noticed that there is no background map in the above plot. I had to remove the background map tiles to get rid of an error in Holoviews 1.13.4. This error has been fixed in 1.14.0 but I ran into other issues with the datashaded Scatterplot.

It’s worth noting that not all plot types support linked brushing. For the complete list, please refer to

Even with all their downsides, CSV files are still a common data exchange format – particularly between disciplines with different tech stacks. Indeed, “How to Specify Data Types of CSV Columns for Use in QGIS” (originally written in 2011) is still one of the most popular posts on this blog. QGIS continues to be quite handy for visualizing CSV file contents. However, there are times when it’s just not enough, particularly when the number of rows in the CSV is in the range of multiple million. The following example uses a 12 million point CSV:

To give you an idea of the waiting times in QGIS, I’ve run the following script which loads and renders the CSV:

from datetime import datetime

def get_time():
    t2 =
    print('Done :)')

canvas = iface.mapCanvas()

print('Starting ...')

t0 =

print('Loading CSV ...')

uri = "file:///E:/Geodata/AISDK/raw_ais/aisdk_20170701.csv?type=csv&amp;xField=Longitude&amp;yField=Latitude&amp;crs=EPSG:4326&amp;"
vlayer = QgsVectorLayer(uri, "layer name you like", "delimitedtext")

t1 =
print(t1 - t0)

print('Rendering ...')


The script output shows that creating the vector layer takes 02:39 minutes and rendering it takes over 05:10 minutes:

Starting ...
2020-12-06 12:35:56.266002
Loading CSV ...
2020-12-06 12:38:35.565332
Rendering ...
2020-12-06 12:43:45.637504
Done :)

Rendered CSV file in QGIS

Panning and zooming around are no fun either since rendering takes so long. Changing from a single symbol renderer to, for example, a heatmap renderer does not improve the rendering times. So we need a different solutions when we want to efficiently explore large point CSV files.

The Pandas data analysis library is well-know for being a convenient tool for handling CSVs. However, it’s less clear how to use it as a replacement for desktop GIS for exploring large CSVs with point coordinates. My favorite solution so far uses hvPlot + HoloViews + Datashader to provide interactive Bokeh plots in Jupyter notebooks.

hvPlot provides a high-level plotting API built on HoloViews that provides a general and consistent API for plotting data in (Geo)Pandas, xarray, NetworkX, dask, and others. (Image source:

But first things first! Loading the CSV as a Pandas Dataframe takes 10.7 seconds. Pandas’ default plotting function (based on Matplotlib), however, takes around 13 seconds and only produces a static scatter plot.

Loading and plotting the CSV with Pandas

hvPlot to the rescue!

We only need two more steps to get faster and interactive map plots (plus background maps!): First, we need to reproject the lat/lon values. (There’s a warning here, most likely since some of the input lat/lon values are invalid.) Then, we replace plot() with hvplot() and voilΓ :

Plotting the CSV with Datashader

As you can see from the above GIF, the whole process barely takes 2 seconds and the resulting map plot is interactive and very responsive.

12 million points are far from the limit. As long as the Pandas DataFrame fits into memory, we are good and when the datasets get bigger than that, there are Dask DataFrames. But that’s a story for another day.

The latest v0.5 release is now available from conda-forge.

New features include:

As always, all tutorials are available on MyBinder:

Detected stops (left) and trajectory split at stops (right)

Podcasts have become huge. I’m an avid listener of podcasts myself. I particularly enjoy formats that take the time to talk about unconventional topics in detail.

My first podcast experience was on the QGIS podcast hosted by Tim Sutton in 2014. Unfortunately, it seems like the podcast episodes are not online anymore.

Recently, I had the pleasure to join the MapScaping Podcast by Daniel O’Donohue to talk about Python for Geospatial:Β 

Other guests Daniel has already interviewed include:

Another geospatial podcast I really enjoy is The Mappyist Hour by Silas and Todd. Unfortunately, it’s a bit silent there now but it’s definitely worth to listen into their episode archive. One of my favorites is Episode 9 where Linda Stevens (Hecht) discusses her career at ESRI, the future of GIS, and the role of Open Source Spatial in that future:

If you listen to and want to recommend other spatial podcasts, please share them in the comments!

This post introduces Holoviz Panel, a library that makes it possible to create really quick dashboards in notebook environments as well as more sophisticated custom interactive web apps and dashboards.

The following example shows how to use Panel to explore a dataset (a trajectory collection in this case) and different parameter settings (relating to trajectory generalization). All the Panel code we need is a dict that defines the parameters that we want to explore. Then we can use Panel’s interact function to automatically generate a dashboard for our custom plotting function:

import panel as pn

kw = dict(traj_id=(1, len(traj_collection)), 
          tolerance=(10, 100, 10), 
          generalizer=['douglas-peucker', 'min-distance'])
pn.interact(plot_generalized, **kw)

Click to view the resulting dashboard in full resolution:

The plotting function uses the parameters to generate a Holoviews plot. First it fetches a specific trajectory from the trajectory collection. Then it generalizes the trajectory using the specified parameter settings. As you can see, we can easily combine maps and other plots to visualize different aspects of the data:

def plot_generalized(traj_id=1, tolerance=10, generalizer='douglas-peucker'):
  my_traj = traj_collection.get_trajectory(traj_id).to_crs(CRS(4088))
  if generalizer=='douglas-peucker':
    generalized = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance)
    generalized = mpd.MinDistanceGeneralizer(my_traj).generalize(tolerance)
  return ( 
      title='Trajectory {} (tolerance={})'.format(, tolerance), 
      c='speed', cmap='Viridis', colorbar=True, clim=(0,20), 
      line_width=10, width=500, height=500) + 
      title='Speed histogram', width=300, height=500) 

Trajectory collections and generalization functions used in this example are part of the MovingPandas library. If you are interested in movement data analysis, you should check it out! You can find this example notebook in the MovingPandas tutorial section.

%d bloggers like this: