
Exploring large movement datasets is challenging because visualizations of movement data quickly get cluttered and hard to interpret. Therefore, we need to aggregate the data. Density maps are commonly used since they are readily available and quick to compute, but they provide only very limited insight. In contrast, meaningful aggregations that can help discover patterns are computationally expensive and therefore slow to generate.

This post serves as a starting point for a series of new approaches to exploring massive movement data. This series will summarize parts of my PhD research and – for those of you who are interested in more details – there will be links to the relevant papers.

Starting with the raw location records, we use different forms of aggregation to learn more about what information a movement dataset contains:

  1. Summarizing movement using prototypes by aggregating raw location records using our flexible M³ Massive Movement Model [1]
  2. Generating trajectories by connecting consecutive records into continuous tracks and splitting them into meaningful trajectories [2]
  3. Extracting flows by summarizing trajectory-based transitions between prototypes [3]

Besides clever aggregation approaches, massive movement datasets also require appropriate computing resources. To ensure that we can efficiently explore large datasets, we have implemented the above-mentioned aggregation steps in Spark. This enables us to run the computations on general-purpose computing clusters that can be scaled according to the dataset size.
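To give a feeling for what such a distributed aggregation looks like, here is a minimal PySpark sketch that bins raw location records into grid cells and summarizes each cell. This is an illustration of the general idea only, not the actual M³ implementation; the input schema, file name, and cell size are assumptions:

# Minimal PySpark sketch of grid-based aggregation (illustration only,
# not the M³ implementation). Assumed schema: id, t, lat, lon, speed.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
records = spark.read.csv('locations.csv', header=True, inferSchema=True)

cell_size = 0.01  # grid cell size in degrees; assumption for illustration
prototypes = (records
    .withColumn('cell_x', F.floor(F.col('lon') / cell_size))
    .withColumn('cell_y', F.floor(F.col('lat') / cell_size))
    .groupBy('cell_x', 'cell_y')
    .agg(F.count('*').alias('n_records'),
         F.avg('speed').alias('mean_speed')))
prototypes.show()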

In the next post, we’ll look at how to summarize movement using M³ prototypes. So stay tuned!

But if you don’t want to wait, these are the original papers:

[1] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). The M³ massive movement model: a distributed incrementally updatable solution for big movement data exploration. International Journal of Geographical Information Science. doi:10.1080/13658816.2020.1776293
[2] Graser, A., Dragaschnig, M., Widhalm, P., Koller, H., & Brändle, N. (2020). Exploratory Trajectory Analysis for Massive Historical AIS Datasets. In: 21st IEEE International Conference on Mobile Data Management (MDM) 2020. doi:10.1109/MDM48529.2020.00059
[3] Graser, A., Widhalm, P., & Dragaschnig, M. (2020). Extracting Patterns from Large Movement Datasets. GI_Forum – Journal of Geographic Information Science, 1-2020, 153-163. doi:10.1553/giscience2020_01_s153


This post is part of a series. Read more about movement data in GIS.

QGIS Temporal Controller is a powerful successor to TimeManager. Temporal Controller is a new core feature of the current development version and will ship with the 3.14 release. This post demonstrates two key advantages of this new temporal support:

  1. Expression support for defining start and end timestamps
  2. Integration into the PyQGIS API

These features come in very handy in many use cases. For example, they make it much easier to create animations from folders full of GPS tracks since the files can now be loaded and configured automatically:

Script & Temporal Controller in action (click for full resolution)

All tracks start at the same location but at different times. (Kudos to Andrew Fletcher for recording these tracks and sharing them with me!) To create an animation in which all tracks start simultaneously, we need to synchronize them. This synchronization can be achieved on the fly by subtracting the start time from all track timestamps using an expression:

import os  # needed for os.path.join and os.listdir below; run this in the QGIS Python console

directory = "E:/Google Drive/QGIS_Course/05_TimeManager/Example_Dayrides/"

def load_and_configure(filename):
    path = os.path.join(directory, filename)
    # build a delimited text layer URI; field_2/field_3 hold the y/x coordinates
    uri = 'file:///' + path + "?type=csv&escape=&useHeader=No&detectTypes=yes"
    uri = uri + "&crs=EPSG:4326&xField=field_3&yField=field_2"
    vlayer = QgsVectorLayer(uri, filename, "delimitedtext")
    QgsProject.instance().addMapLayer(vlayer)

    # shift each track to start at time zero by subtracting the track's
    # minimum timestamp (epoch() returns milliseconds, hence the /1000)
    mode = QgsVectorLayerTemporalProperties.ModeFeatureDateTimeStartAndEndFromExpressions
    expression = """to_datetime(field_1) -
    make_interval(seconds:=minimum(epoch(to_datetime("field_1")))/1000)
    """

    tprops = vlayer.temporalProperties()
    tprops.setStartExpression(expression)
    tprops.setEndExpression(expression) # optional
    tprops.setMode(mode)
    tprops.setIsActive(True)

for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        load_and_configure(filename)

The above script loads all CSV files from the given directory (field_1 is the timestamp, field_2 is y, and field_3 is x), sets the start and end expressions as well as the corresponding temporal control mode, and finally activates temporal rendering. The resulting config can be verified in the layer properties dialog:

To adapt this script to other datasets, it’s sufficient to change the file directory and revisit the layer uri definition as well as the field names referenced in the expression.


This post is part of a series. Read more about movement data in GIS.

Podcasts have become huge. I’m an avid listener of podcasts myself. I particularly enjoy formats that take the time to talk about unconventional topics in detail.

My first podcast experience was on the QGIS podcast hosted by Tim Sutton in 2014. Unfortunately, it seems like the podcast episodes are not online anymore.

Recently, I had the pleasure to join the MapScaping Podcast by Daniel O’Donohue to talk about Python for Geospatial: 

Other guests Daniel has already interviewed include:

Another geospatial podcast I really enjoy is The Mappyist Hour by Silas and Todd. Unfortunately, it's a bit quiet there now, but their episode archive is definitely worth a listen. One of my favorites is Episode 9, in which Linda Stevens (Hecht) discusses her career at ESRI, the future of GIS, and the role of Open Source Spatial in that future:

If you listen to and want to recommend other spatial podcasts, please share them in the comments!

TimeManager turns 10 this year. The code base has made the transition from QGIS 1.x to 2.x and now 3.x, and it would be wrong to say that it doesn't show ;-)

Now, it looks like the days of TimeManager are numbered. Four days ago, Nyall Dawson added native temporal support for vector layers to QGIS. This is part of a larger effort to add time support for rasters, meshes, and now also vectors.

The new Temporal Controller panel looks similar to TimeManager. Layers are configured through the new Temporal tab in Layer Properties. The temporal dimension can be used in expressions to create fancy time-dependent styles:


TimeManager Geolife demo converted to Temporal Controller (click for full resolution)
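For example, a hypothetical PyQGIS sketch along the following lines could set up such a time-dependent style. Assumptions: a point layer with a single symbol renderer and a datetime field named "timestamp"; the color ramp and one-hour fade window are purely illustrative:

# Hypothetical sketch: data-defined fill color based on feature age relative
# to the animation frame, via the @map_start_time variable exposed by the
# Temporal Controller. Field name "timestamp" is an assumption.
from qgis.core import QgsProperty, QgsSymbolLayer
from qgis.utils import iface

age_expression = """
ramp_color('Viridis',
  scale_linear(epoch(@map_start_time) - epoch("timestamp"),
               0, 3600000, 0, 1))
"""  # epoch() returns milliseconds, so 3600000 = 1 hour

layer = iface.activeLayer()
symbol_layer = layer.renderer().symbol().symbolLayer(0)
symbol_layer.setDataDefinedProperty(
    QgsSymbolLayer.PropertyFillColor,
    QgsProperty.fromExpression(age_expression))
layer.triggerRepaint()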

Obviously, this feature is brand new and will require polishing. Known issues listed by Nyall include limitations of supported time fields (only fields with datetime type are supported right now, strings cannot be used) and worse performance than TimeManager since features are filtered in QGIS rather than in the backend.

If you want to give the new Temporal Controller a try, you need to install the current development version, e.g. qgis-dev in OSGeo4W.


Update from May 16:

Many of the limitations above have already been addressed.

Last night, Nyall recorded a one-hour tutorial on this new feature. Enjoy:

Mapping spatial decision patterns, such as election results, is always a hot topic. That’s why we decided to include a recipe for election maps in our QGIS Map Design books. What’s new is that this recipe is now available as a free video tutorial recorded by Oliver Burdekin:

This video is just one of many recently published video tutorials that have been created by QGIS community members.

For example, Hans van der Kwast and Kurt Menke have recorded a 7-part series on QGIS for Hydrological Applications:

and Klas Karlsson's YouTube channel is also always worth a follow:

For the Pythonically inclined among you, there is also a new version of Python in QGIS on the Automating GIS-processes channel:

 

This post introduces Holoviz Panel, a library that makes it possible to create really quick dashboards in notebook environments as well as more sophisticated custom interactive web apps and dashboards.

The following example shows how to use Panel to explore a dataset (a trajectory collection in this case) and different parameter settings (relating to trajectory generalization). All the Panel code we need is a dict that defines the parameters that we want to explore. Then we can use Panel’s interact function to automatically generate a dashboard for our custom plotting function:

import panel as pn
pn.extension()  # load Panel's notebook extension

kw = dict(traj_id=(1, len(traj_collection)),
          tolerance=(10, 100, 10),
          generalizer=['douglas-peucker', 'min-distance'])
pn.interact(plot_generalized, **kw)

Click to view the resulting dashboard in full resolution:

The plotting function uses the parameters to generate a Holoviews plot. First it fetches a specific trajectory from the trajectory collection. Then it generalizes the trajectory using the specified parameter settings. As you can see, we can easily combine maps and other plots to visualize different aspects of the data:

import movingpandas as mpd
import hvplot.pandas  # registers the .hvplot accessor on (Geo)DataFrames
from pyproj import CRS

def plot_generalized(traj_id=1, tolerance=10, generalizer='douglas-peucker'):
    # fetch the requested trajectory and project it to a metric CRS
    my_traj = traj_collection.get_trajectory(traj_id).to_crs(CRS(4088))
    if generalizer == 'douglas-peucker':
        generalized = mpd.DouglasPeuckerGeneralizer(my_traj).generalize(tolerance)
    else:
        generalized = mpd.MinDistanceGeneralizer(my_traj).generalize(tolerance)
    generalized.add_speed(overwrite=True)
    # combine the trajectory map with a speed histogram
    return (
        generalized.hvplot(
            title='Trajectory {} (tolerance={})'.format(my_traj.id, tolerance),
            c='speed', cmap='Viridis', colorbar=True, clim=(0, 20),
            line_width=10, width=500, height=500) +
        generalized.df['speed'].hvplot.hist(
            title='Speed histogram', width=300, height=500)
    )

Trajectory collections and generalization functions used in this example are part of the MovingPandas library. If you are interested in movement data analysis, you should check it out! You can find this example notebook in the MovingPandas tutorial section.

MovingPandas has come a long way since 2018 when I started to experiment with GeoPandas for trajectory data handling.

This week, MovingPandas passed peer review and was approved for pyOpenSci. This technical review process was extremely helpful in ensuring code, project, and documentation quality. I would strongly recommend it to everyone working on new data science libraries!

The latest v0.3 release is now available from conda-forge.

All tutorials are available on MyBinder

New features include:

  • Support for GeoPandas 0.7
  • Trajectory collection aggregation functions to generate flow maps (see the sketch below)
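Here is a minimal sketch of how the new aggregation functionality can be used to generate a flow map. The parameter values are purely illustrative and should be tuned to the dataset:

import movingpandas as mpd
from datetime import timedelta

# Aggregate a TrajectoryCollection into flows between significant points
# (distances in meters; values are illustrative)
aggregator = mpd.TrajectoryCollectionAggregator(
    traj_collection, max_distance=1000, min_distance=100,
    min_stop_duration=timedelta(minutes=5))
flows = aggregator.get_flows_gdf()        # flow lines between clusters
clusters = aggregator.get_clusters_gdf()  # significant point clusters
flows.plot(column='weight', linewidth=2)  # 'weight' holds the trip count per flow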

 

We recently published a new paper on “Open Geospatial Tools for Movement Data Exploration” (open access). If you liked Movement data in GIS #26: towards a template for exploring movement data, you will find even more information about the context, challenges, and recent developments in this paper.

It also presents three open source stacks for movement data exploration:

  1. QGIS + PostGIS: a combination that will be familiar to most open source GIS users
  2. Jupyter + MovingPandas: less common so far, but Jupyter notebooks are quickly gaining popularity (even in the proprietary GIS world)
  3. GeoMesa + Spark: for when datasets become too big to handle using other means

and discusses their capabilities and limitations:


This post is part of a series. Read more about movement data in GIS.

In December, I wrote about GeoPandas on Databricks. Back then, I also tried to get MovingPandas working, but without luck. (While GeoPandas can be installed using Databricks' dbutils.library.installPyPI("geopandas"), this PyPI install just didn't want to work for MovingPandas.)

Now that MovingPandas is available from conda-forge, I gave it another try and … *spoiler alert* … it works!

First of all, conda support on Databricks is in beta. It’s not included in the default runtimes. At the time of writing this post, “6.0 Conda Beta” is the latest runtime with conda:

Once the cluster is up and connected to the notebook, a quick conda list shows the installed packages:

Time to install MovingPandas! I went with a 100% conda-forge installation. This takes a looong time (almost half an hour)!

When the installs are finally done, it gets serious: time to test the imports!

Success!

Now we can put the MovingPandas data structures to good use. But first we need to load some movement data:
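The loading step is shown as a screenshot in the original post; a sketch of the equivalent code looks something like this (the file path and column names are assumptions):

# Hypothetical loading sketch; file path and column layout are assumptions
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

df = pd.read_csv('/dbfs/FileStore/tables/geolife_small.csv')
gdf = gpd.GeoDataFrame(
    df, geometry=[Point(xy) for xy in zip(df.lon, df.lat)],
    crs='EPSG:4326')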

Of course, the points in this GeoDataFrame can be plotted. However, the plot isn't automatically displayed once plot() is called on the GeoDataFrame. Instead, Databricks provides a display() function to display Matplotlib figures:
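In practice, that looks something like this (a sketch, reusing the gdf from above):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
gdf.plot(ax=ax)  # draw the points onto the Matplotlib axes
display(fig)     # Databricks helper that renders Matplotlib figures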

MovingPandas also uses Matplotlib. Therefore we can use the same approach to plot the TrajectoryCollection that can be created from the GeoDataFrame:
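A sketch of that step (assumptions: gdf has a DatetimeIndex, and the trajectory id column name is hypothetical):

import movingpandas as mpd

# group the point GeoDataFrame into trajectories by an id column (assumed name);
# gdf is assumed to be indexed by timestamp
traj_collection = mpd.TrajectoryCollection(gdf, 'trajectory_id')
ax = traj_collection.plot()
display(ax.figure)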

These Matplotlib plots are nice and quick but they lack interactivity and therefore are of limited use for data exploration.

MovingPandas provides interactive plotting (including base maps) using hvplot. hvplot is based on Bokeh and, luckily, the Databricks documentation tells us that Bokeh plots can be exported to HTML and then displayed using displayHTML():
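A sketch of that export step, assuming HoloViews and Bokeh are installed on the cluster:

import holoviews as hv
from bokeh.embed import file_html
from bokeh.resources import CDN

hv.extension('bokeh')
plot = traj_collection.hvplot()  # interactive Bokeh-based plot
html = file_html(hv.render(plot), CDN, 'Trajectories')
displayHTML(html)                # Databricks helper for raw HTML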

Of course, we could achieve all this on MyBinder as well (and much more quickly). However, Databricks gets interesting once we can add (Py)Spark and distributed processing to the mix. For example, “Getting started with PySpark & GeoPandas on Databricks” shows a spatial join function that adds polygon information to a point GeoDataFrame.

A potential use case for MovingPandas would be to speed up flow map computations. The recently added aggregator functionality (currently in master only) first computes clusters of significant trajectory points and then aggregates the trajectories into flows between these clusters. Matching trajectory points to the closest cluster could be a potential use case for distributed computing. Each trajectory (or each point) can be handled independently, only the cluster locations have to be broadcast to all workers.
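A minimal PySpark sketch of this broadcast pattern might look as follows. All names are assumptions, and the naive squared-degree distance merely stands in for a proper geodesic metric:

# Hypothetical PySpark sketch: assign each point to the nearest cluster center.
# `centers` is a small list of (cluster_id, lat, lon) tuples computed beforehand.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
centers = [(0, 57.7, 11.9), (1, 57.6, 11.7)]  # placeholder cluster centers
bc_centers = spark.sparkContext.broadcast(centers)

def nearest_cluster(lat, lon):
    # naive planar distance; fine for a sketch, not for production
    return min(bc_centers.value,
               key=lambda c: (c[1] - lat) ** 2 + (c[2] - lon) ** 2)[0]

nearest_udf = udf(nearest_cluster, IntegerType())
points = spark.read.csv('points.csv', header=True, inferSchema=True)
points = points.withColumn('cluster', nearest_udf('lat', 'lon'))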

Flow map (screenshot from MovingPandas tutorial 4_generalization_and_aggregation.ipynb)

 

This post is a follow-up to the draft template for exploring movement data I wrote about in my previous post. Specifically, I want to address step 4: Exploring patterns in trajectory and event data.

The patterns I want to explore in this post are clusters of trip origins. The case study presented here is an extension of the MovingPandas ship data analysis notebook.

The analysis consists of 4 steps:

  1. Splitting continuous GPS tracks into individual trips
  2. Extracting trip origins (start locations)
  3. Clustering trip origins
  4. Exploring clusters

Since I have already removed AIS records with a speed over ground (SOG) value of zero from the dataset, we can use the split_by_observation_gap() function to split the continuous observations into individual trips. Trips that are shorter than 100 meters are automatically discarded as irrelevant clutter:

traj_collection.min_length = 100
trips = traj_collection.split_by_observation_gap(timedelta(minutes=5))

The split operation results in 302 individual trips:

Passenger vessel trajectories are blue, high-speed craft green, tankers red, and cargo vessels orange. Other vessel trajectories are gray.

To extract trip origins, we can use the get_start_locations() function. The list of column names defines which columns are carried over from the trajectory’s GeoDataFrame to the origins GeoDataFrame:

 
origins = trips.get_start_locations(['SOG', 'ShipType']) 

The following density-based clustering step is based on a blog post by Geoff Boeing and uses scikit-learn’s DBSCAN implementation:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint, Point

origins['lat'] = origins.geometry.y
origins['lon'] = origins.geometry.x
matrix = origins[['lat', 'lon']].to_numpy()

kms_per_radian = 6371.0088
epsilon = 0.1 / kms_per_radian  # 100 m search radius, expressed in radians

db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree',
            metric='haversine').fit(np.radians(matrix))
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([matrix[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

Resulting in 69 clusters.

Finally, we can add the cluster labels to the origins GeoDataFrame and plot the result:

origins['cluster'] = cluster_labels

To analyze the clusters, we can compute summary statistics of the trip origins assigned to each cluster. For example, we compute a representative (center-most) point, count the number of trips, and compute the mean speed (SOG) value:

 
def get_centermost_point(cluster):
    # find the cluster member closest to the cluster centroid
    centroid = (MultiPoint(cluster).centroid.x, MultiPoint(cluster).centroid.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return Point(tuple(centermost_point)[1], tuple(centermost_point)[0])  # (lon, lat)

centermost_points = clusters.map(get_centermost_point)
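The trip counts and mean speeds can be computed with a standard pandas group-by. A minimal sketch, building on the origins GeoDataFrame and the 'cluster' column assigned above:

# Number of trips and mean speed (SOG) per cluster
cluster_summary = origins.groupby('cluster').agg(
    n=('cluster', 'size'),
    sog=('SOG', 'mean'))
cluster_summary['geometry'] = centermost_points  # representative points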

The largest cluster with a low mean speed (indicating a docking or anchoring location) is cluster 29, which contains 43 trips from passenger vessels, high-speed craft, and an undefined vessel:

To explore the overall cluster pattern, we can plot the clusters colored by speed and scaled by the number of trips:

Besides cluster 29, this visualization reveals multiple smaller origin clusters with low speeds that indicate different docking locations in the analysis area.

Cluster locations with high speeds on the other hand indicate locations where vessels enter the analysis area. In a next step, it might be interesting to compute flows between clusters to gain insights about connections and travel times.

It’s worth noting that AIS data contains additional information, such as vessel status, that could be used to extract docking or anchoring locations. However, the workflow presented here is more generally applicable to any movement data tracks that can be split into meaningful trips.

For the full interactive ship data analysis tutorial, visit https://mybinder.org/v2/gh/anitagraser/movingpandas/binder-tag


This post is part of a series. Read more about movement data in GIS.
