Archive

Movement data in GIS

QGIS Temporal Controller is a powerful successor of TimeManager. Temporal Controller is a new core feature of the current development version and will be shipped with the 3.14 release. This post demonstrates two key advantages of this new temporal support:

  1. Expression support for defining start and end timestamps
  2. Integration into the PyQGIS API

These features come in very handy in many use cases. For example, they make it much easier to create animations from folders full of GPS tracks since the files can now be loaded and configured automatically:

Script & Temporal Controller in action

All tracks start at the same location but at different times. (Kudos to Andrew Fletcher for recording these tracks and sharing them with me!) To create an animation that shows all tracks starting simultaneously, we need to synchronize them. This synchronization can be achieved on-the-fly by subtracting the start time from all track timestamps using an expression:

import os
from qgis.core import QgsProject, QgsVectorLayer, QgsVectorLayerTemporalProperties

directory = "E:/Google Drive/QGIS_Course/05_TimeManager/Example_Dayrides/"

def load_and_configure(filename):
    path = os.path.join(directory, filename)
    uri = 'file:///' + path + "?type=csv&escape=&useHeader=No&detectTypes=yes"
    uri = uri + "&crs=EPSG:4326&xField=field_3&yField=field_2"
    vlayer = QgsVectorLayer(uri, filename, "delimitedtext")
    QgsProject.instance().addMapLayer(vlayer)

    mode = QgsVectorLayerTemporalProperties.ModeFeatureDateTimeStartAndEndFromExpressions
    expression = """to_datetime(field_1) -
    make_interval(seconds:=minimum(epoch(to_datetime("field_1")))/1000)
    """

    tprops = vlayer.temporalProperties()
    tprops.setStartExpression(expression)
    tprops.setEndExpression(expression) # optional
    tprops.setMode(mode)
    tprops.setIsActive(True)

for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        load_and_configure(filename)

The above script loads all CSV files from the given directory (field_1 is the timestamp, field_2 is y, and field_3 is x), sets the start and end expressions as well as the corresponding temporal control mode, and finally activates temporal rendering. The resulting config can be verified in the layer properties dialog:

To adapt this script to other datasets, it’s sufficient to change the file directory and revisit the layer uri definition as well as the field names referenced in the expression.
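To make the synchronization logic easier to follow outside of QGIS, here is a minimal plain-Python sketch of what the expression does: it subtracts a track's earliest timestamp from every timestamp, so each track ends up starting at the Unix epoch. The function name synchronize and the sample timestamps are made up for illustration; in the script above, the QGIS expression engine performs this step per layer.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def synchronize(timestamps):
    """Shift timestamps so that the earliest one lands on the Unix epoch."""
    start = min(timestamps)
    return [EPOCH + (t - start) for t in timestamps]

# Two hypothetical tracks recorded on different days
track_a = [datetime(2020, 5, 1, 8, 0), datetime(2020, 5, 1, 8, 30)]
track_b = [datetime(2020, 5, 3, 17, 15), datetime(2020, 5, 3, 17, 50)]

sync_a = synchronize(track_a)
sync_b = synchronize(track_b)

# Both tracks now share the same start time but keep their durations
print(sync_a[0], sync_b[0])
print(sync_a[1] - sync_a[0], sync_b[1] - sync_b[0])
```

Because the shifted timestamps start at the epoch, the animation range presumably has to cover early January 1970 rather than the original recording dates.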


This post is part of a series. Read more about movement data in GIS.

This is a guest post by Bommakanti Krishna Chaitanya @chaitan94

Introduction

This post introduces mobilitydb-sqlalchemy, a tool I’m developing to make it easier for developers to use movement data in web applications. Many web developers use Object Relational Mappers such as SQLAlchemy to read/write Python objects from/to a database.

Mobilitydb-sqlalchemy integrates the moving objects database MobilityDB into SQLAlchemy and Flask. This is an important step towards dealing with trajectory data using appropriate spatiotemporal data structures rather than plain spatial points or polylines.

To make it even better, mobilitydb-sqlalchemy also supports MovingPandas. This makes it possible to write MovingPandas trajectory objects directly to MobilityDB.

For this post, I have made a demo application which you can find live at https://mobilitydb-sqlalchemy-demo.adonmo.com/. The code for this demo app is open source and available on GitHub. Feel free to explore both the demo app and code!

In the following sections, I will explain the most important parts of this demo app to show how to use mobilitydb-sqlalchemy in your own web app. If you want to reproduce this demo, you can clone the demo repository and run “docker-compose up --build”, which automatically sets up this Docker image for you and runs the backend and frontend. Follow the instructions in README.md for more details.

Declaring your models

For the demo, we used a very simple table – with just two columns – an id and a tgeompoint column for the trip data. Using mobilitydb-sqlalchemy this is as simple as defining any regular table:

from flask_sqlalchemy import SQLAlchemy
from mobilitydb_sqlalchemy import TGeomPoint

db = SQLAlchemy()

class Trips(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint)

Note: The library also allows you to use the Trajectory class from MovingPandas. This is explained later in this tutorial.

Populating data

When adding data to the table, mobilitydb-sqlalchemy expects the data in the tgeompoint column to be a time-indexed pandas DataFrame with two columns: one for the spatial data called “geometry”, holding Shapely Point objects, and one for the temporal data called “t”, holding regular Python datetime objects.

from datetime import datetime

import pandas as pd
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

trip = Trips(trip_id=1, trip=df)
db.session.add(trip)
db.session.commit()

Writing queries

In the demo, you see two modes. Both modes were designed specifically to explain how functions defined within MobilityDB can be leveraged by our webapp.

1. All trips mode – In this mode, we extract all trip data, along with distance travelled within each trip, and the average speed in that trip, both computed by MobilityDB itself using the ‘length’, ‘speed’ and ‘twAvg’ functions. This example also shows that MobilityDB functions can be chained to form more complicated queries.


from sqlalchemy import func

trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).all()

2. Spatial query mode – In this mode, we extract only selective trip data, filtered by a user-selected region of interest. We then make a query to MobilityDB to extract only the trips which pass through the specified region. We use MobilityDB’s ‘intersects’ function to achieve this filtering at the database level itself.


trips = db.session.query(
   Trips.trip_id,
   Trips.trip,
   func.length(Trips.trip),
   func.twAvg(func.speed(Trips.trip))
).filter(
   func.intersects(Point(lat, lng).buffer(0.01).wkb, Trips.trip),
).all()
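To give a feeling for what the chained length and twAvg(speed(...)) calls in the queries above return, here is a minimal plain-Python sketch that computes a trajectory length and a time-weighted average speed by hand. It assumes planar coordinates and linear interpolation between samples, and the function name is made up for illustration; MobilityDB's own results depend on the coordinate reference system of the stored data.

```python
from datetime import datetime
from math import hypot

def length_and_twavg_speed(samples):
    """samples: list of (x, y, datetime) tuples sorted by time.

    Returns (total length, time-weighted average speed in units per second).
    """
    total_length = 0.0
    weighted_speed = 0.0
    total_seconds = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        seg_length = hypot(x1 - x0, y1 - y0)
        seg_seconds = (t1 - t0).total_seconds()
        seg_speed = seg_length / seg_seconds  # constant along the segment
        total_length += seg_length
        weighted_speed += seg_speed * seg_seconds  # weight speed by duration
        total_seconds += seg_seconds
    return total_length, weighted_speed / total_seconds

# Same sample points as in the "Populating data" section
samples = [
    (0, 0, datetime(2012, 1, 1, 8, 0, 0)),
    (2, 0, datetime(2012, 1, 1, 8, 10, 0)),
    (2, -1.9, datetime(2012, 1, 1, 8, 15, 0)),
]
length, avg_speed = length_and_twavg_speed(samples)
print(length, avg_speed)
```

With a piecewise-constant speed, the time-weighted average reduces to total length divided by total duration, which is why the slower first segment counts twice as much as the faster second one.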

Using MovingPandas Trajectory objects

Mobilitydb-sqlalchemy also provides first-class support for MovingPandas Trajectory objects; MovingPandas can be installed as an optional dependency of this library. Using the Trajectory class instead of plain DataFrames gives us much richer functionality for trajectory data, such as speed analysis, interpolation, splitting and simplification of trajectory points, and bounding box calculation. To make use of this feature, you have to set the use_movingpandas flag to True while declaring your model, as shown in the code snippet below.

class TripsWithMovingPandas(db.Model):
   __tablename__ = "trips"
   trip_id = db.Column(db.Integer, primary_key=True)
   trip = db.Column(TGeomPoint(use_movingpandas=True))

Now when you query over this table, you automatically get the data parsed into Trajectory objects without having to do anything else. This also works during insertion of data – you can directly assign your movingpandas Trajectory objects to the trip column. In the below code snippet we show how inserting and querying works with movingpandas mode.

from datetime import datetime

import movingpandas as mpd
import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

# Prepare and insert the data
# Typically it won’t be hardcoded like this, but it might be coming from 
# other data sources like a different database or maybe csv files
df = pd.DataFrame(
   [
       {"geometry": Point(0, 0), "t": datetime(2012, 1, 1, 8, 0, 0),},
       {"geometry": Point(2, 0), "t": datetime(2012, 1, 1, 8, 10, 0),},
       {"geometry": Point(2, -1.9), "t": datetime(2012, 1, 1, 8, 15, 0),},
   ]
).set_index("t")

geo_df = GeoDataFrame(df)
traj = mpd.Trajectory(geo_df, 1)

trip = TripsWithMovingPandas(trip_id=1, trip=traj)
db.session.add(trip)
db.session.commit()

# Querying over this table would automatically map the resulting tgeompoint 
# column to movingpandas’ Trajectory class
result = db.session.query(TripsWithMovingPandas).filter(
   TripsWithMovingPandas.trip_id == 1
).first()

print(result.trip.__class__)
# <class 'movingpandas.trajectory.Trajectory'>

Bonus: trajectory data serialization

Along with mobilitydb-sqlalchemy, I have recently released trajectory data serialization/compression libraries for Python and JavaScript, called trajectory and trajectory.js respectively, which are based on Google’s Encoded Polyline Algorithm Format. These libraries let you send trajectory data in a compressed format, resulting in smaller payloads when sending your data through human-readable serialization formats like JSON. In some of the internal APIs we use at Adonmo, we have seen this reduce our response sizes by more than 50%, and sometimes by up to 90%.
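For readers unfamiliar with the underlying idea, here is a minimal sketch of Google's encoded polyline scheme for plain (lat, lon) pairs: coordinates are scaled to integers, delta-encoded against the previous point, and written as printable ASCII in 5-bit chunks. This is not the API of the trajectory libraries and it ignores timestamps; the function names are made up for illustration.

```python
def _encode_value(value):
    """Encode one signed integer as a run of printable ASCII characters."""
    value = ~(value << 1) if value < 0 else (value << 1)
    chars = []
    while value >= 0x20:
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)

def encode_polyline(points, precision=5):
    """Delta-encode a list of (lat, lon) pairs into a compact string."""
    factor = 10 ** precision
    prev_lat = prev_lon = 0
    out = []
    for lat, lon in points:
        lat_i, lon_i = int(round(lat * factor)), int(round(lon * factor))
        out.append(_encode_value(lat_i - prev_lat))
        out.append(_encode_value(lon_i - prev_lon))
        prev_lat, prev_lon = lat_i, lon_i
    return "".join(out)

def decode_polyline(encoded, precision=5):
    """Inverse of encode_polyline."""
    factor = 10 ** precision
    index = lat = lon = 0
    points = []
    while index < len(encoded):
        deltas = []
        for _ in range(2):
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of this value
                    break
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        points.append((lat / factor, lon / factor))
    return points

points = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
encoded = encode_polyline(points)
decoded = decode_polyline(encoded)
print(len(encoded), "characters for", len(points), "points")
```

The savings come from the delta encoding: consecutive GPS fixes are usually close together, so most deltas fit into very few characters.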

Want to learn more about mobilitydb-sqlalchemy? Check out the quick start & documentation.



We recently published a new paper on “Open Geospatial Tools for Movement Data Exploration” (open access). If you liked Movement data in GIS #26: towards a template for exploring movement data, you will find even more information about the context, challenges, and recent developments in this paper.

It also presents three open source stacks for movement data exploration:

  1. QGIS + PostGIS: a combination that will be familiar to most open source GIS users
  2. Jupyter + MovingPandas: less common so far, but Jupyter notebooks are quickly gaining popularity (even in the proprietary GIS world)
  3. GeoMesa + Spark: for when datasets become too big to handle using other means

and discusses their capabilities and limitations:



This post is a follow-up to the draft template for exploring movement data I wrote about in my previous post. Specifically, I want to address step 4: Exploring patterns in trajectory and event data.

The patterns I want to explore in this post are clusters of trip origins. The case study presented here is an extension of the MovingPandas ship data analysis notebook.

The analysis consists of 4 steps:

  1. Splitting continuous GPS tracks into individual trips
  2. Extracting trip origins (start locations)
  3. Clustering trip origins
  4. Exploring clusters

Since I have already removed AIS records with a speed over ground (SOG) value of zero from the dataset, we can use the split_by_observation_gap() function to split the continuous observations into individual trips. Trips that are shorter than 100 meters are automatically discarded as irrelevant clutter:

from datetime import timedelta

traj_collection.min_length = 100
trips = traj_collection.split_by_observation_gap(timedelta(minutes=5))

The split operation results in 302 individual trips:

Passenger vessel trajectories are blue, high-speed craft green, tankers red, and cargo vessels orange. Other vessel trajectories are gray.
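The idea behind splitting by observation gap can be illustrated without MovingPandas. The following minimal sketch starts a new trip whenever two consecutive records are more than a given gap apart. It is not the MovingPandas implementation (for example, it ignores the minimum length filter), and the function name and sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

def split_by_gap(timestamps, gap):
    """Split a sorted list of timestamps wherever consecutive
    records are more than `gap` apart."""
    trips = []
    current = []
    for t in timestamps:
        if current and t - current[-1] > gap:
            trips.append(current)
            current = []
        current.append(t)
    if current:
        trips.append(current)
    return trips

records = [
    datetime(2017, 7, 1, 8, 0),
    datetime(2017, 7, 1, 8, 2),
    datetime(2017, 7, 1, 8, 4),
    datetime(2017, 7, 1, 9, 30),  # long pause before this record
    datetime(2017, 7, 1, 9, 31),
]
trips = split_by_gap(records, timedelta(minutes=5))
print([len(trip) for trip in trips])
```

This only works as a trip detector because the stationary records (SOG of zero) were removed beforehand, which turns every stop into a gap in the time series.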

To extract trip origins, we can use the get_start_locations() function. The list of column names defines which columns are carried over from the trajectory’s GeoDataFrame to the origins GeoDataFrame:

 
origins = trips.get_start_locations(['SOG', 'ShipType']) 

The following density-based clustering step is based on a blog post by Geoff Boeing and uses scikit-learn’s DBSCAN implementation:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from geopy.distance import great_circle
from shapely.geometry import MultiPoint

origins['lat'] = origins.geometry.y
origins['lon'] = origins.geometry.x
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same array
matrix = origins[['lat', 'lon']].to_numpy()

kms_per_radian = 6371.0088
epsilon = 0.1 / kms_per_radian

db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(matrix))
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([matrix[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

This results in 69 clusters.
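The epsilon value above deserves a short explanation: with the haversine metric, distances are angles in radians, so the 0.1 km search radius is divided by the Earth's radius in kilometers. The following minimal sketch computes the haversine angle by hand and checks two hypothetical point pairs against that threshold. The function name and coordinates are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

KMS_PER_RADIAN = 6371.0088  # mean Earth radius in km

def haversine_radians(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, as an angle in radians."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * asin(sqrt(a))

epsilon = 0.1 / KMS_PER_RADIAN  # 100 m expressed in radians

# Two hypothetical origin pairs: roughly 56 m and roughly 222 m apart
near = haversine_radians(57.70, 11.90, 57.7005, 11.90)
far = haversine_radians(57.70, 11.90, 57.7020, 11.90)

print(near * KMS_PER_RADIAN, near <= epsilon)
print(far * KMS_PER_RADIAN, far <= epsilon)
```

Since min_samples is set to 1, every origin ends up in a cluster, and origins within 100 m of each other are chained into the same one.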

Finally, we can add the cluster labels to the origins GeoDataFrame and plot the result:

origins['cluster'] = cluster_labels

To analyze the clusters, we can compute summary statistics of the trip origins assigned to each cluster. For example, we compute a representative (center-most) point, count the number of trips, and compute the mean speed (SOG) value:

 
from shapely.geometry import Point

def get_centermost_point(cluster):
    # cluster rows are (lat, lon) pairs, so the centroid is (lat, lon) as well
    c = MultiPoint(cluster).centroid
    centroid = (c.x, c.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return Point(centermost_point[1], centermost_point[0])  # Shapely expects (lon, lat)

centermost_points = clusters.map(get_centermost_point)
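The trip count and mean speed per cluster can be computed with a pandas groupby. The following is a minimal sketch on a made-up toy table; the column names cluster, SOG and ShipType follow the ones used above, but the values are invented for illustration and are not taken from the ship dataset.

```python
import pandas as pd

# Toy stand-in for the origins GeoDataFrame (values are invented)
origins = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1],
    "SOG": [0.2, 0.4, 9.0, 11.0, 10.0],
    "ShipType": ["Passenger", "Passenger", "Cargo", "Tanker", "Cargo"],
})

# One row per cluster: number of trip origins and their mean speed
summary = origins.groupby("cluster").agg(
    n=("SOG", "count"),
    sog=("SOG", "mean"),
)
print(summary)
```

Sorting such a summary by trip count and filtering for low mean speeds is one way to find candidate docking or anchoring locations.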

The largest cluster with a low mean speed (indicating a docking or anchoring location) is cluster 29, which contains 43 trips from passenger vessels, high-speed craft, and an undefined vessel:

To explore the overall cluster pattern, we can plot the clusters colored by speed and scaled by the number of trips:

Besides cluster 29, this visualization reveals multiple smaller origin clusters with low speeds that indicate different docking locations in the analysis area.

Cluster locations with high speeds on the other hand indicate locations where vessels enter the analysis area. In a next step, it might be interesting to compute flows between clusters to gain insights about connections and travel times.

It’s worth noting that AIS data contains additional information, such as vessel status, that could be used to extract docking or anchoring locations. However, the workflow presented here is more generally applicable to any movement data tracks that can be split into meaningful trips.

For the full interactive ship data analysis tutorial visit https://mybinder.org/v2/gh/anitagraser/movingpandas/binder-tag



Exploring new datasets can be challenging. A whole field, exploratory data analysis, addresses this challenge and focuses on exploring datasets, often with visual methods.

Concerning movement data in particular, there’s a comprehensive book on the visual analysis of movement by Andrienko et al. (2013) and a host of papers, such as the recent state of the art summary by Andrienko et al. (2017).

However, while the literature does provide concepts, methods, and example applications, these have not yet translated into readily available tools for analysts to use in their daily work. To fill this gap, I’m working on a template for movement data exploration implemented in Python using MovingPandas. The proposed workflow consists of five main steps:

  1. Establishing an overview by visualizing raw input data records
  2. Putting records in context by exploring information from consecutive movement data records (such as: time between records, speed, and direction)
  3. Extracting trajectories & events by dividing the raw continuous tracks into individual trajectories and/or events
  4. Exploring patterns in trajectory and event data by looking at groups of the trajectories or events
  5. Analyzing outliers by looking at potential outliers and how they may challenge preconceived assumptions about the dataset characteristics
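Step 2 of this workflow is the easiest one to illustrate in a few lines. The following minimal sketch derives the time between records, the speed, and the direction from consecutive (time, lat, lon) records using only the standard library. It is not taken from the template notebook; the function names and sample records are made up for illustration.

```python
from datetime import datetime
from math import asin, atan2, cos, degrees, radians, sin, sqrt

EARTH_RADIUS_M = 6371008.8

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlambda = radians(lon2 - lon1)
    a = sin((phi2 - phi1) / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2 in degrees (0 = north, 90 = east)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlambda = radians(lon2 - lon1)
    y = sin(dlambda) * cos(phi2)
    x = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlambda)
    return (degrees(atan2(y, x)) + 360) % 360

def consecutive_record_features(records):
    """records: list of (datetime, lat, lon) tuples sorted by time."""
    features = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(records, records[1:]):
        seconds = (t1 - t0).total_seconds()
        dist = haversine_m(lat0, lon0, lat1, lon1)
        features.append({
            "seconds": seconds,
            "speed_ms": dist / seconds if seconds > 0 else None,
            "bearing": bearing_deg(lat0, lon0, lat1, lon1),
        })
    return features

records = [
    (datetime(2019, 1, 1, 12, 0, 0), 0.0, 0.0),
    (datetime(2019, 1, 1, 12, 0, 10), 0.001, 0.0),  # moving north
    (datetime(2019, 1, 1, 12, 0, 20), 0.001, 0.001),  # then east
]
features = consecutive_record_features(records)
for f in features:
    print(f)
```

Histograms of these three derived values are a quick way to spot sampling irregularities and implausible speeds before extracting trajectories.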

To ensure a reproducible workflow, I’m designing the template as a Jupyter notebook. It combines spatial and non-spatial plots using the awesome hvPlot library:

This notebook is a work-in-progress and you can follow its development at http://exploration.movingpandas.org. Your feedback is most welcome!

 

References

  • Andrienko G, Andrienko N, Bak P, Keim D, Wrobel S (2013) Visual analytics of movement. Springer Science & Business Media.
  • Andrienko G, Andrienko N, Chen W, Maciejewski R, Zhao Y (2017) Visual Analytics of Mobility and Transportation: State of the Art and Further Research Directions. IEEE Transactions on Intelligent Transportation Systems 18(8):2232–2249, DOI 10.1109/TITS.2017.2683539

Recently there has been some buzz on Twitter about a new moving object database (MOD) called MobilityDB that builds on PostgreSQL and PostGIS (Zimányi et al. 2019). The MobilityDB GitHub repo was published in February 2019, but according to the following presentation at PgConf.Russia 2019, it has been under development for a few years:

Of course, moving object databases have been around for quite a while. The two most commonly cited MODs are HermesDB (Pelekis et al. 2008), which comes as an extension for either PostgreSQL or Oracle and is developed at the University of Piraeus, and SECONDO (de Almeida et al. 2006), which is a stand-alone database system developed at the Fernuniversität Hagen. However, both MODs remain at the research prototype level and have not achieved broad adoption.

It will be interesting to see if MobilityDB will be able to achieve the goal they have set in the title of Zimányi et al. (2019) to become “a mainstream moving object database system”. It’s promising that they are building on PostGIS and using its mature spatial analysis functionality instead of reinventing the wheel. They also discuss why they decided that PostGIS trajectories (which I’ve written about in previous posts) are not the way to go:

However, the presentation does not go into detail whether there are any straightforward solutions to visualizing data stored in MobilityDB.

According to the GitHub readme, MobilityDB runs on Linux and needs PostGIS 2.5. They also provide an online demo as well as a Docker container with MobilityDB and all its dependencies. If you give it a try, I would love to hear about your experiences.

References

  • de Almeida, V. T., Guting, R. H., & Behr, T. (2006). Querying moving objects in secondo. In 7th International Conference on Mobile Data Management (MDM’06) (pp. 47-47). IEEE.
  • Pelekis, N., Frentzos, E., Giatrakos, N., & Theodoridis, Y. (2008). HERMES: aggregative LBS via a trajectory DB engine. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (pp. 1255-1258). ACM.
  • Zimányi, E., Sakr, M., Lesuisse, A., & Bakli, M. (2019). MobilityDB: A Mainstream Moving Object Database System. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases (pp. 206-209). ACM.


Last week, I had the pleasure to give a movement data analysis workshop at the OpenGeoHub summer school at the University of Münster in Germany. The workshop materials consist of three Jupyter notebooks that have been designed to also support self-study outside of a workshop setting. So you can try them out as well!

All materials are available on GitHub:

  • Tutorial 0 provides an introduction to the MovingPandas Trajectory class.
  • Tutorials 1 and 2 provide examples with real-world datasets covering one day of ship movement near Gothenburg and multiple years of gull migration, respectively.

Here’s a quick preview of the bird migration data analysis tutorial:

Tutorial 2: Bird migration data analysis

You can run all three Jupyter notebooks online using MyBinder (no installations required).

Alternatively or if you want to dig deeper: installation instructions are available on movingpandas.org

The OpenGeoHub summer school this year had a strong focus on spatial analysis with R and GRASS (sometimes mixing those two together). It was great to meet @mdsumner (author of R trip) and @edzerpebesma (author of R trajectories) for what might well have been the ultimate movement data libraries geek fest. In the ultimate R / Python cross-over,  0_getting_started.Rmd

Both talks and workshops have been recorded. Here’s the introduction:

and this is the full workshop recording:



Today’s post continues where “Why you should be using PostGIS trajectories” leaves off. It’s the result of a collaboration with Eva Westermeier. I had the pleasure to supervise her internship at AIT last year and also co-supervised her Master’s thesis [0] on the topic of enriching trajectories with information about their geographic context.

Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018)

Part of the work was focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS. For example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data.

Classic point-based model vs. line-based model

The obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It’s straightforward to add context information in the point-based model but it also generates large numbers of repeating annotations. In contrast, the line data model using, for example, PostGIS trajectories (LinestringM) is more compact since trajectories can be split into segments at context borders. This creates one annotation per segment and the individual segments are convenient to analyze (as described in part #12).

Spatio-temporal interpolation as provided by the line data model offers additional advantages for the analysis of annotated segments. Contextual segments start and end at the intersection of the trajectory linestring with context polygon borders. This means that there are no gaps like in the point-based model. Consequently, while the point-based model systematically underestimates segment length and duration, the line-based approach offers more meaningful segment length and duration measurements.

Schematic illustration of a subset of an annotated trajectory in two context classes, a) systematic underestimation of length or duration in the point data model, b) full length or duration between context polygon borders in the line data model (source: Westermeier (2018))

Another issue of the point data model is that brief context changes may be missed or represented by just one point location. This makes it impossible to compute the length or duration of the respective context segment. (Of course, depending on the application, it can be desirable to ignore brief context changes and make the annotation process robust towards irrelevant changes.)

Schematic illustration of context annotation for brief context changes, a) and b) two variants for the point data model, c) gapless annotation in the line data model (source: Westermeier (2018) based on Buchin et al. (2014))

Beyond annotations, context can also be considered directly in an analysis, for example, when computing distances between trajectories and contextual point objects. In this case, the point-based approach systematically overestimates the distances.

Schematic illustration of distance measurement from a trajectory to an external object, a) point data model, b) line data model (source: Westermeier (2018))
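The overestimation in the point-based model is easy to reproduce. The following minimal sketch compares the distance from an external object to the nearest trajectory vertex (point data model) with the distance to the nearest trajectory segment (line data model) in planar coordinates. The function names and coordinates are made up for illustration.

```python
from math import hypot

def point_model_distance(vertices, obj):
    """Distance from obj to the nearest recorded trajectory point."""
    return min(hypot(x - obj[0], y - obj[1]) for x, y in vertices)

def _segment_distance(a, b, p):
    """Distance from point p to the line segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return hypot(px - ax, py - ay)
    # Position of the projection of p along the segment, clamped to [0, 1]
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return hypot(px - (ax + u * dx), py - (ay + u * dy))

def line_model_distance(vertices, obj):
    """Distance from obj to the nearest point on the trajectory linestring."""
    return min(_segment_distance(a, b, obj) for a, b in zip(vertices, vertices[1:]))

# A sparsely sampled trajectory passing close to an external object
trajectory = [(0, 0), (10, 0)]
obj = (5, 1)

d_point = point_model_distance(trajectory, obj)
d_line = line_model_distance(trajectory, obj)
print(d_point, d_line)
```

The sparser the sampling, the larger the gap between the two values, since the point model can never report a distance smaller than the one to the closest vertex.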

The above examples show that there are some good reasons to dump the classic point-based model. However, the line-based model is not without its own issues.

Issues

Computing the context annotations for trajectory segments is tricky. The main issue is that ST_Intersection drops the M values. This effectively destroys our trajectories! There are ways to deal with this issue – and the corresponding SQL queries are published in the thesis (p. 38-40) – but it’s a real bummer. Basically, ST_Intersection only provides geometric output. Therefore, we need to reconstruct the temporal information in order to create usable trajectory segments.
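To illustrate what reconstructing the temporal information means, here is a minimal sketch of the basic idea: given the two trajectory vertices surrounding a purely geometric intersection point, the lost M value is recovered by linear interpolation along the segment. This is not the SQL from the thesis; it assumes planar coordinates, constant speed between vertices, and M values stored as numbers such as epoch seconds. The function name is made up for illustration.

```python
from math import hypot

def interpolate_m(start, end, crossing):
    """Recover the M value (e.g. epoch seconds) at a crossing point.

    start, end: (x, y, m) vertices of the trajectory segment
    crossing:   (x, y) point on that segment, e.g. where it crosses
                a context polygon border
    """
    (x0, y0, m0), (x1, y1, m1) = start, end
    seg_len = hypot(x1 - x0, y1 - y0)
    if seg_len == 0:
        return m0
    # Fraction of the segment travelled before reaching the crossing
    fraction = hypot(crossing[0] - x0, crossing[1] - y0) / seg_len
    return m0 + fraction * (m1 - m0)

# Segment from (0, 0) at t=100 to (10, 0) at t=200, border crossed at x=2.5
m_at_border = interpolate_m((0, 0, 100), (10, 0, 200), (2.5, 0))
print(m_at_border)
```

The inconvenient part in practice is the bookkeeping: each intersection result has to be matched back to the original segment it came from before its M value can be interpolated.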

Finally, while the line-based model is well suited to add context from other vector data, it is less useful for context data from continuous rasters but that was beyond the scope of this work.

Conclusion

After the promising results of my initial investigations into PostGIS trajectories, I was optimistic that context annotations would be a straightforward add-on. The line-based approach has multiple advantages when it comes to analyzing contextual segments. Unfortunately, generating these contextual segments is much less convenient and also slower than I had hoped. Originally, I had planned to turn this work into a plugin for the Processing toolbox but the results of this work motivated me to look into other solutions. You’ve already seen some of the outcomes in part #20 “Trajectools v1 released!”.

References

[0] Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.



This post looks into the current AI hype and how it relates to geoinformatics in general and movement data analysis in GIS in particular. This is not an exhaustive review but aims to highlight some of the developments within these fields. There are a lot of references in this post, including some to previous work of mine, so you can dive deeper into this topic on your own.

I’m looking forward to reading your take on this topic in the comments!

Introduction to AI

The dream of artificial intelligence (AI) that can think like a human (or even outsmart one) reaches back to the 1950s (Fig. 1, Tandon 2016). Machine learning aims to enable AI. However, classic machine learning approaches that have been developed over the last decades (such as: decision trees, inductive logic programming, clustering, reinforcement learning, neural networks, and Bayesian networks) have failed to achieve the goal of a general AI that would rival humans. Indeed, even narrow AI (technology that can only perform specific tasks) was mostly out of reach (Copeland 2016).

However, recent increases in computing power (be it GPUs, TPUs or CPUs) and algorithmic advances, particularly those based on neural networks, have brought this dream (or nightmare) closer (Rao 2017) and are fueling the current AI hype. It should be noted that artificial neural networks (ANN) are not a new technology. In fact, they used to be rather unpopular because they require large amounts of input data and computational power. However, in 2012, Andrew Ng at Google managed to create large enough neural networks and train them with massive amounts of data, an approach now known as deep learning (Copeland 2016).

Fig. 1: The evolution of artificial intelligence, machine learning, and deep learning. (Image source: Tandon 2016)

Machine learning & GIS

GIScience or geoinformatics is not new to machine learning. The most well-known application is probably supervised image classification, as implemented in countless commercial and open tools. This approach requires labeled training and test data (Fig. 2) to learn a prediction model that can, for example, classify land cover in remote sensing imagery. Many classification algorithms have been introduced, ranging from maximum likelihood classification to clustering (Congedo 2016) and neural networks.

Fig. 2: With supervised machine learning, the algorithm learns from labeled data. (Image source: Salian 2018)

Like in other fields, neural networks have intrigued geographers and GIScientists for a long time. For example, Hewitson & Crane (1994) state that “Neural nets offer a fascinating new strategy for spatial analysis, and their application holds enormous potential for the geographic sciences.” Early uses of neural networks in GIScience include, for example, spatial interaction modeling (Openshaw 1998) and hydrological modeling of rainfall runoff (Dawson & Wilby 2001). More recently, neural networks and deep learning have enabled object recognition in georeferenced images. Most prominently, the research team at Mapillary (2016-2019) works on object recognition in street-level imagery (including fusion with other spatial data sources). Even generative adversarial networks (GANs) (Fig. 3) have found their application in GIScience: for example, Zhu et al. (2017) (at the Berkeley AI Research (BAIR) laboratory) demonstrate how GANs can generate road maps from aerial images and vice versa, and Zhu et al. (2019) generate artificial digital elevation models.

Fig. 3: In a GAN, the discriminator is shown images from both the generator and from the training dataset. The discriminator is tasked with determining which images are real, and which are fakes from the generator. (Image source: Salian 2018)

However, besides general excitement about new machine learning approaches, researchers working on spatial analysis (Openshaw & Turton 1996) caution that “conventional classifiers, as provided in statistical packages, completely ignore most of the challenges of spatial data classification and handle a few inappropriately from a geographical perspective”. For example, data transformation using principal component or factor scores is sensitive to non-normal data distribution common in geographic data and many methods ignore spatial autocorrelation completely (Openshaw & Turton 1996). And neural networks are no exception: Convolutional neural networks (CNNs) are generally regarded appropriate for any problem involving pixels or spatial representations. However, Liu et al. (2018) demonstrate that they fail even for the seemingly trivial coordinate transform problem, which requires learning a mapping between coordinates in (x, y) Cartesian space and coordinates in one-hot pixel space.

The integration of spatial data challenges into machine learning is an ongoing area of research, for example in geostatistics (Hengl & Heuvelink 2019).

Machine learning and movement data

More and more movement data of people, vehicles, goods, and animals is becoming available. Developments in intelligent transportation systems specifically have been sparked by the availability of cheap GPS receivers and many models have been built that leverage floating car data (FCD) to classify traffic situations (for example, using visual analysis (Graser et al. 2012)), predict traffic speeds (for example, using linear regression models (Graser et al. 2016)), or detect movement anomalies (for example, using Gaussian mixture models (Graser & Widhalm 2018)). Beyond transportation, Valletta et al. (2017) describe applications of machine learning in animal movement and behavior.

Of course, deep learning is making its way into movement data analysis as well. For example, Wang et al. (2018) and Kudinov (2018) trained neural networks to predict travel times in transport networks. In contrast to conventional travel time prediction models (based on street graphs with associated speeds or travel times), these are considerably more computationally intensive. Kudinov (2018), for example, used 300 million simulated trips (start and end location, start time, and trip duration) as input and “spent about eight months of running one of the GP100 cards 24-7 in a search for an efficient architecture, spatial and statistical distributions of the training set, good values for multiple hyperparameters”. More recently, Zhang et al. (2019) (at Microsoft Research Asia) used deep learning to predict flows in spatio-temporal networks. It remains to be seen if deep learning will manage to out-perform classical machine learning approaches for predictions in the transportation sector.

What would a transportation AI look like? Would it be able to drive a car and follow data-driven route recommendations (e.g. from waze.com) or would it purposefully ignore them because other – more basic – systems blindly follow them? Logistics AI might build on these kinds of systems while simultaneously optimizing large fleets of vehicles. Transport planning AI might replace transport planners by providing reliable mobility demand predictions as well as resulting traffic models for varying infrastructure and policy scenarios.

Conclusions

The opportunities for using ML in geoinformatics are extensive and have been continuously explored for a multitude of different research problems and applications (from land use classification to travel time prediction). Geoinformatics is largely playing catch-up with the quick development in machine learning (including deep learning), which promises new and previously unseen possibilities. At the same time, it is necessary that geoinformatics researchers are aware of the particularities of spatial data, for example, by developing models that take spatial autocorrelation into account. Future research in geoinformatics should incorporate learnings from geostatistics to ensure that resulting machine learning models incorporate the geographical perspective.
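As a small illustration of what "spatial autocorrelation" means in practice, the sketch below computes Moran's I, one common global measure of it. The choice of Moran's I, the four-cell toy layout, and the binary neighbor weights are assumptions made for this example; the post itself does not prescribe a particular statistic.

```python
def morans_i(values, weights):
    """Compute global Moran's I for values given a spatial weights matrix.

    weights[i][j] > 0 marks locations i and j as neighbors (0 otherwise).
    Positive results indicate that similar values cluster in space,
    negative results indicate that neighbors tend to differ.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_sum) * cross_products / squared_deviations


# Hypothetical layout: four cells in a row, each neighboring the next one
neighbor_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 5, 5], neighbor_weights)    # similar values adjacent
alternating = morans_i([1, 5, 1, 5], neighbor_weights)  # neighbors always differ
print(f"clustered: {clustered:.3f}, alternating: {alternating:.3f}")
```

A model that treats observations as independent ignores exactly this kind of structure, for example when training and test samples are split at random although neighboring samples are strongly correlated.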

References

  • Congedo, L. (2016). Semi-Automatic Classification Plugin Documentation. DOI: http://dx.doi.org/10.13140/RG.2.2.29474.02242/1
  • Copeland, M. (2016). What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning? https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
  • Dawson, C. W., & Wilby, R. L. (2001). Hydrological modelling using artificial neural networks. Progress in physical Geography, 25(1), 80-108.
  • Graser, A., Ponweiser, W., Dragaschnig, M., Brändle, N., & Widhalm, P. (2012). Assessing traffic performance using position density of sparse FCD. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on (pp. 1001-1005). IEEE.
  • Graser, A., Leodolter, M., Koller, H., & Brändle, N. (2016). Improving vehicle speed estimates using street network centrality. International Journal of Cartography. doi:10.1080/23729333.2016.1189298.
  • Graser, A., & Widhalm, P. (2018). Modelling Massive AIS Streams with Quad Trees and Gaussian Mixtures. In: Mansourian, A., Pilesjö, P., Harrie, L., & von Lammeren, R. (Eds.), 2018. Geospatial Technologies for All : short papers, posters and poster abstracts of the 21th AGILE Conference on Geographic Information Science. Lund University 12-15 June 2018, Lund, Sweden. ISBN 978-3-319-78208-9. Accessible through https://agile-online.org/index.php/conference/proceedings/proceedings-2018
  • Hengl, T., & Heuvelink, G.B.M. (2019). Workshop on Machine learning as a framework for predictive soil mapping. https://www.cvent.com/events/pedometrics-2019/custom-116-81b34052775a43fcb6616a3f6740accd.aspx?dvce=1
  • Hewitson, B., & Crane, R. G. (Eds.) (1994). Neural Nets: Applications in Geography. Springer.
  • Kudinov, D. (2018). Predicting travel times with artificial neural network and historical routes. https://community.esri.com/community/gis/applications/arcgis-pro/blog/2018/03/27/predicting-travel-times-with-artificial-neural-network-and-historical-routes
  • Liu, R., Lehman, J., Molino, P., Such, F. P., Frank, E., Sergeev, A., & Yosinski, J. (2018). An intriguing failing of convolutional neural networks and the coordconv solution. In Advances in Neural Information Processing Systems (pp. 9605-9616).
  • Mapillary Research (2016-2019) publications listed on https://research.mapillary.com/
  • Openshaw, S., & Turton, I. (1996). A parallel Kohonen algorithm for the classification of large spatial datasets. Computers & Geosciences, 22(9), 1019-1026.
  • Openshaw, S. (1998). Neural network, genetic, and fuzzy logic models of spatial interaction. Environment and Planning A, 30(10), 1857-1872.
  • Rao, R. C.S. (2017). New Product breakthroughs with recent advances in deep learning and future business opportunities. https://mse238blog.stanford.edu/2017/07/ramdev10/new-product-breakthroughs-with-recent-advances-in-deep-learning-and-future-business-opportunities/
  • Salian, I. (2018). SuperVize Me: What’s the Difference Between Supervised, Unsupervised, Semi-Supervised and Reinforcement Learning? https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/
  • Tandon, K. (2016). AI & Machine Learning: The evolution, differences and connections. https://www.linkedin.com/pulse/ai-machine-learning-evolution-differences-connections-kapil-tandon/
  • Valletta, J. J., Torney, C., Kings, M., Thornton, A., & Madden, J. (2017). Applications of machine learning in animal behaviour studies. Animal Behaviour, 124, 203-220.
  • Wang, D., Zhang, J., Cao, W., Li, J., & Zheng, Y. (2018). When will you arrive? estimating travel time based on deep neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Zhang, J., Zheng, Y., Sun, J., & Qi, D. (2019). Flow Prediction in Spatio-Temporal Networks Based on Multitask Deep Learning. IEEE Transactions on Knowledge and Data Engineering.
  • Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
  • Zhu, D., Cheng, X., Zhang, F., Yao, X., Gao, Y., & Liu, Y. (2019). Spatial interpolation using conditional generative adversarial neural networks. International Journal of Geographical Information Science, 1-24.

This post is part of a series. Read more about movement data in GIS.

MovingPandas is my attempt to provide a pure Python solution for trajectory data handling in GIS. MovingPandas provides trajectory classes and functions built on top of GeoPandas. 

To lower the entry barrier to getting started with MovingPandas, there’s now an interactive iPython notebook hosted on MyBinder. This notebook provides all the necessary imports and demonstrates how to create a Trajectory object.

Launch MyBinder for MovingPandas to get started!
