Movement data in GIS

Many of my previous posts in this series [1][2][3] have relied on PostGIS for trajectory data handling. While I love PostGIS, it feels like overkill to require a database to analyze smaller movement datasets. Wouldn’t it be great to have a pure Python solution?

If we look into moving object data literature, beyond the “trajectories are points with timestamps” perspective, which is common in GIS, we also encounter the “trajectories are time series with coordinates” perspective. I don’t know about you, but if I hear “time series” and Python, I think Pandas! In the Python Data Science Handbook, Jake VanderPlas writes:

Pandas was developed in the context of financial modeling, so as you might expect, it contains a fairly extensive set of tools for working with dates, times, and time-indexed data.

Of course, time series are one thing, but spatial data handling is another. Lucky for us, this is where GeoPandas comes in. GeoPandas has been around for a while and version 0.4 has been released in June 2018. So far, I haven’t found examples that use GeoPandas to manage movement data, so I’ve set out to give it a shot. My trajectory class uses a GeoDataFrame df for data storage. For visualization purposes, it can be converted to a LineString:

```import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString

class Trajectory():
def __init__(self, id, df, id_col):
self.id = id
self.df = df
self.id_col = id_col

def __str__(self):
return "Trajectory {1} ({2} to {3}) | Size: {0}".format(
self.df.geometry.count(), self.id, self.get_start_time(),
self.get_end_time())

def get_start_time(self):
return self.df.index.min()

def get_end_time(self):
return self.df.index.max()

def to_linestring(self):
return self.make_line(self.df)

def make_line(self, df):
if df.size > 1:
return df.groupby(self.id_col)['geometry'].apply(
lambda x: LineString(x.tolist())).values[0]
else:
raise RuntimeError('Dataframe needs at least two points to make line!')

def get_position_at(self, t):
try:
return self.df.loc[t]['geometry'][0]
except:
return self.df.iloc[self.df.index.drop_duplicates().get_loc(
t, method='nearest')]['geometry']
```

Of course, this class can be used in stand-alone Python scripts, but it can also be used in QGIS. The following script takes data from a QGIS point layer, creates a GeoDataFrame, and finally generates trajectories. These trajectories can then be added to the map as a line layer.

All we need to do to ensure that our data is ordered by time is to set the GeoDataFrame’s index to the time field. From then on, Pandas takes care of the time series aspects and we can access the index as shown in the Trajectory.get_position_at() function above.

```# Get data from a point layer
l = iface.activeLayer()
time_field_name = 't'
trajectory_id_field = 'trajectory_id'
names = [field.name() for field in l.fields()]
data = []
for feature in l.getFeatures():
my_dict = {}
for i, a in enumerate(feature.attributes()):
my_dict[names[i]] = a
x = feature.geometry().asPoint().x()
y = feature.geometry().asPoint().y()
my_dict['geometry']=Point((x,y))
data.append(my_dict)

# Create a GeoDataFrame
df = pd.DataFrame(data).set_index(time_field_name)
crs = {'init': l.crs().geographicCrsAuthId()}
geo_df = GeoDataFrame(df, crs=crs)
print(geo_df)

# Test if spatial functions work
print(geo_df.dissolve([True]*len(geo_df)).centroid)

# Create a QGIS layer for trajectory lines
vl = QgsVectorLayer("LineString", "trajectories", "memory")
vl.setCrs(l.crs()) # doesn't stop popup :(
pr = vl.dataProvider()
vl.updateFields()

df_by_id = dict(tuple(geo_df.groupby(trajectory_id_field)))
trajectories = {}
for key, value in df_by_id.items():
traj = Trajectory(key, value, trajectory_id_field)
trajectories[key] = traj
line = QgsGeometry.fromWkt(traj.to_linestring().wkt)
f = QgsFeature()
f.setGeometry(line)
f.setAttributes([key])
print(trajectories[1])

vl.updateExtents()
```

The following screenshot shows this script applied to a sample of the Geolife datasets containing 100 trajectories with a total of 236,776 points. On my notebook, the runtime is approx. 20 seconds.

So far, GeoPandas has proven to be a convenient way to handle time series with coordinates. Trying to implement some trajectory analysis tools will show if it is indeed a promising data structure for trajectories.

This is a guest post by Time Manager collaborator and Python expert, Ariadni-Karolina Alexiou.

Today we’re going to look at how to visualize the error bounds of a GPS trace in time. The goal is to do an in-depth visual exploration using QGIS and Time Manager in order to learn more about the data we have.

The Data

We have a file that contains GPS locations of an object in time, which has been created by a GPS tracker. The tracker also keeps track of the error covariance matrix for each point in time, that is, what confidence it has in the measurements it gives. Here is what the file looks like:

Error Covariance Matrix

What are those sd* fields? According to the manual: The estimated standard deviations of the solution assuming a priori error model and error parameters by the positioning options. What it basically means is that the real GPS location will be located no further than three standard deviations across north and east from the measured location, most of (99.7%) the time. A way to represent this visually is to create an ellipse that maps this area of where the real location can be.

An ellipse can be uniquely defined from the lengths of the segments a and b and its rotation angle. For more details on how to get those ellipse parameters from the covariance matrix, please see the footnote.

Ground truth data

We also happen to have a file with the actual locations (also in longitudes and latitudes) of the object for the same time frame as the GPS (also in seconds), provided through another tracking method which is more accurate in this case.

This is because, the object was me running on a rooftop in Zürich wearing several tracking devices (not just GPS), and I knew exactly which floor tiles I was hitting.

The goal is to explore, visually, the relationship between the GPS data and the actual locations in time. I hope to get an idea of the accuracy, and what can influence it.

First look

Loading the GPS data into QGIS and Time Manager, we can indeed see the GPS locations vis-a-vis the actual locations in time.

Let’s see if the actual locations that were measured independently fall inside the ellipse coverage area. To do this, we need to use the covariance data to render ellipses.

Creating the ellipses

I considered using the ellipses marker from QGIS.

It is possible to switch from Millimeter to Map Unit and edit a data defined override for symbol width, height and rotation. Symbol width would be the a parameter of the ellipse, symbol height the b parameter and rotation simply the angle. The thing is, we haven’t computed any of these values yet, we just have the error covariance values in our dataset.

Because of the re-projections and matrix calculations inherent into extracting the a, b and angle of the error ellipse at each point in time, I decided to do this calculation offline using Python and relevant libraries, and then simply add a WKT text field with a polygon representation of the ellipse to the file I had. That way, the augmented data could be re-used outside QGIS, for example, to visualize using Leaflet or similar. I could have done a hybrid solution, where I calculated a, b and the angle offline, and then used the dynamic rendering capabilities of QGIS, as well.

I also decided to dump the csv into an sqlite database with an index on the time column, to make time range queries (which Time Manager does) run faster.

Putting it all together

The code for transforming the initial GPS data csv file into an sqlite database can be found in my github along with a small sample of the file containing the GPS data.

I created three ellipses per timestamp, to represent the three standard deviations. Opening QGIS (I used version: 2.12, Las Palmas) and going to Layer>Add Layer>Add SpatialLite Layer, we see the following dialog:

After adding the layer (say, for the second standard deviation ellipse), we can add it to Time Manager like so:

We do the process three times to add the three types of ellipses, taking care to style each ellipse differently. I used transparent fill for the second and third standard deviation ellipses.

I also added the data of my  actual positions.

Here is an exported video of the trace (at a place in time where I go forward, backwards and forward again and then stay still).

Conclusions

Looking at the relationship between the actual data and the GPS data, we can see the following:

• Although the actual position differs from the measured one, the actual position always lies within one or two standard deviations of the measured position (so, inside the purple and golden ellipses).
• The direction of movement has greater uncertainty (the ellipse is elongated across the line I am running on).
• When I am standing still, the GPS position is still moving, and unfortunately does not converge to my actual stationary position, but drifts. More research is needed regarding what happens with the GPS data when the tracker is actually still.
• The GPS position doesn’t jump erratically, which can be good, however, it seems to have trouble ‘catching up’ with the actual position. This means if we’re looking to measure velocity in particular, the GPS tracker might underestimate that.

These findings are empirical, since they are extracted from a single visualization, but we have already learned some new things. We have some new ideas for what questions to ask on a large scale in the data, what additional experiments to run in the future and what limitations we may need to be aware of.

Footnote: Error Covariance Matrix calculations

The error covariance matrix is (according to the definitions of the sd* columns in the manual):

 sde * sde sign(sdne) * sdne * sdne sign(sdne) * sdne * sdne sdn * sdn

It is not a diagonal matrix, which means that the errors across the ‘north’ dimension and the ‘east’ dimension, are not exactly independent.

An important detail is that, while the position is given in longitudes and latitudes, the sdn, sde and sdne fields are in meters. To address this in the code, we convert the longitude and latitudes using UTM projection, so that they are also in meters (northings and eastings).

For more details on the mathematics used to plot the ellipses check out this article by Robert Eisele and the implementation of the ellipse calculations on my github.

Do you sometimes start writing an SQL query and around at line 50 you get the feeling that it might be getting out of hand? If so, it might be useful to start breaking it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.

To get an idea of the basic structure of a PL/pgSQL function and to proof that PostGIS datatypes work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamp:

```CREATE OR REPLACE FUNCTION AG_Duration(traj geometry)
RETURNS numeric LANGUAGE 'plpgsql'
AS \$BODY\$
BEGIN
RETURN ST_M(ST_EndPoint(traj))-ST_M(ST_StartPoint(traj));
END; \$BODY\$;
```

My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:

```CREATE OR REPLACE FUNCTION AG_DetectStops(
traj geometry,
max_size numeric,
min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry)
-- implementation follows here!
```

Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.

Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:

```FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP
-- here comes the magic!
END LOOP;
```

Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:

```IF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;
```

Putting everything together, my current implementation looks like this:

```CREATE OR REPLACE FUNCTION AG_DetectStops(traj geometry, max_size numeric, min_duration numeric)
RETURNS TABLE(sequence integer, geom geometry) LANGUAGE 'plpgsql'
AS \$BODY\$
DECLARE
pt geometry;
segment geometry;
is_stop boolean;
previously_stopped boolean;
stop_sequence integer;
p1 geometry;
BEGIN
segment := NULL;
sequence := 0;
is_stop := false;
previously_stopped := false;
p1 := NULL;
FOR pt IN SELECT (ST_DumpPoints(traj)).geom
LOOP
IF segment IS NULL AND p1 IS NULL THEN
p1 := pt;
ELSIF segment IS NULL THEN
segment := ST_MakeLine(p1,pt);
p1 := NULL;
IF ST_Length((segment)) <= max_size THEN is_stop := true; END IF;
ELSE
-- if we're in a stop, we want to grow the segment, otherwise we remove points to the specified min_duration
IF NOT is_stop THEN
WHILE ST_NPoints(segment) > 2 AND AG_Duration(ST_RemovePoint(segment,0)) >= min_duration LOOP
segment := ST_RemovePoint(segment,0);
END LOOP;
END IF;
-- a stop is identified if the segment stays within a circle of diameter = max_size
IF ST_Length((segment)) <= max_size THEN is_stop := true;  -- not necessary but faster
ELSIF ST_Distance((ST_StartPoint(segment)),(pt)) > max_size THEN is_stop := false;
ELSIF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;
ELSE is_stop := false;
END IF;
-- if we found the end of a stop, we need to check if it lasted long enough
IF NOT is_stop AND previously_stopped THEN
IF ST_M(ST_PointN(segment,ST_NPoints(segment)-1))-ST_M(ST_StartPoint(segment)) >= min_duration THEN
geom := ST_RemovePoint(segment,ST_NPoints(segment)-1);
RETURN NEXT;
sequence := sequence + 1;
segment := NULL;
p1 := pt;
END IF;
END IF;
END IF;
previously_stopped := is_stop;
END LOOP;
IF previously_stopped AND AG_Duration(segment) >= min_duration THEN
geom := segment;
RETURN NEXT;
END IF;
END;
\$BODY\$;
```

While this function is not really short, it’s so much more readable than my previous attempts of doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary but they do speed up processing.

Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stop in bigger areas.

Let’s test!

Looking for a testing framework for PL/pgSQL, I found plpgunit on Github. While I did not end up using it, I did use its examples for inspiration to write a couple of tests, e.g.

```CREATE OR REPLACE FUNCTION test.stop_at_beginning() RETURNS void LANGUAGE 'plpgsql'
AS \$BODY\$
DECLARE t0 integer; n0 integer;
BEGIN
WITH temp AS ( SELECT AG_DetectStops(
ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'),
1,1) stop
)
SELECT ST_M(ST_StartPoint((stop).geom)),
ST_NPoints((stop).geom) FROM temp INTO t0, n0;
IF t0 = 0 AND n0 = 3
THEN RAISE INFO 'PASSED - Stop at the beginning of the trajectory';
ELSE RAISE INFO 'FAILED - Stop at the beginning of the trajectory';
END IF;
END; \$BODY\$;
```

Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. Here I made heavy use of the PERFORM statement which executes the provided function but discards the results:

Update: The source code for this function is now available on https://github.com/anitagraser/postgis-spatiotemporal

Last week, I traveled to Salzburg to attend the 30th AGIT conference and co-located English-speaking GI_Forum. Like in previous year, there were a lot of mobility and transportation research related presentations. Here are my personal highlights:

This year’s keynotes touched on a wide range of issues, from Sandeep Singhal (Google Cloud Storage) who – when I asked about the big table queries he showed – stated that they are not using a spatial index but are rather brute-forcing their way through massive data sets, to Laxmi Ramasubramanian @nycplanner (Hunter College City University of New York) who cautioned against tech arrogance and tendency to ignore expertise from other fields such as urban planning:

One issue that Laxmi particularly highlighted was the fact that many local communities are fighting excessive traffic caused by apps like Waze that suggest shortcuts through residential neighborhoods. Just because we can do something with (mobility) data, doesn’t necessarily mean that we should!

Not limited to mobility but very focused on open source, Jochen Albrecht (Hunter College City University of New York) invited the audience to join his quest for a spatial decision support system based on FOSS only at https://github.com/geojochen/fosssdss

The session Spatial Perspectives on Healthy Mobility featured multiple interesting contributions, particularly by Michelle P. Fillekes who presented a framework of mobility indicators to assess daily mobility of study participants. It considers both spatial and temporal aspects of movement, as well as the movement context:

Figure from Michelle Pasquale Fillekes, Eleftheria Giannouli, Wiebren Zijlstra, Robert Weibel. Towards a Framework for Assessing Daily Mobility using GPS Data. DOI: 10.1553/giscience2018_01_s177 (under cc-by-nd)

It was also good to see that topics we’ve been working on in the past (popularity routing in this case) continue to be relevant and have been picked up in the German-speaking part of the conference:

Of course, I also presented some new work of my own, specifically my research into PostGIS trajectory datatypes which I’ve partially covered in a previous post on this blog and which is now published in Graser, A. (2018) Evaluating Spatio-temporal Data Models for Trajectories in PostGIS Databases. GI_Forum ‒ Journal of Geographic Information Science, 1-2018, 16-33. DOI: 10.1553/giscience2018_01_s16.

My introduction to GeoMesa talk failed to turn up any fellow Austrian GeoMesa users. So I’ll keep on looking and spreading the word. The most common question – and certainly no easy one at that – is how to determine the point where it becomes worth it to advance from regular databases to big data systems. It’s not just about the size of the data but also about how it is intended to be used. And of course, if you are one of those db admin whizzes who manages a distributed PostGIS setup in their sleep, you might be able to push the boundaries pretty far. On the other hand, if you already have some experience with the Hadoop ecosystem, getting started with tools like GeoMesa shouldn’t be too huge a step either. But that’s a topic for another day!

Since AGIT&GI_Forum are quite a big event with over 1,000 participants, it was not limited to movement data topics. You can find the first installment of English papers in GI_Forum 2018, Volume 1. As I understand it, there will be a second volume with more papers later this year.

This post is part of a series. Read more about movement data in GIS.

In Movement data in GIS #2: visualization I mentioned that it should be possible to label trajectory segments without having to break the original trajectory feature. While it’s not a straightforward process, it is indeed possible to create timestamp labels at desired intervals:

The main point here is that we cannot use regular labels because there would be only one label for the whole trajectory feature. Instead, we are using a marker line with a font marker:

By default, font markers only display one character from a given font but by using expressions we can make it display longer text, including datetime strings:

If you want to have a label at every node of the trajectory, the expression looks like this:

```format_date(
to_datetime('1970-01-01T00:00:00Z')+to_interval(
m(start_point(geometry_n(
segments_to_lines( \$geometry ),
@geometry_part_num)
))||' seconds'
),
'HH:mm:ss'
)
```

You probably remember those parts of the expression that extract the m value from previous posts. Note that – compared to 2016 – it is now necessary to add the segments_to_lines() function.

The m value (which stores time as seconds since Unix epoch) is then converted to datetime and finally formatted to only show time. Of course you can edit the datetime format string to also include the date.

If we only want a label every 30 seconds, we can add a case statement around that:

```CASE WHEN
m(start_point(geometry_n(
segments_to_lines( \$geometry ),
@geometry_part_num)
)) % 30 = 0
THEN
format_date(
to_datetime('1970-01-01T00:00:00Z')+to_interval(
m(start_point(geometry_n(
segments_to_lines( \$geometry ),
@geometry_part_num)
))||' seconds'
),
'HH:mm:ss'
)
END
```

This works well if the trajectory sampling interval is fairly regular. This is not always the case and that means that the above case statement wouldn’t find many nodes with a timestamp that ends in :30 or :00. In such a case, we could resort to labeling nodes based on their order in the linestring:

```CASE WHEN
@geometry_part_num  % 30 = 0
THEN
...
```

Thanks a lot to @JuergenEFischer for providing a solution for converting seconds since Unix epoch to datetime without a custom function!

Note that expressions using @geometry_part_num currently suffer from the following issue: Combination of segments_to_lines(\$geometry) and @geometry_part_num gives wrong segment numbers

This post is part of a series. Read more about movement data in GIS.

In short: both writing trajectory queries as well as executing them is considerably faster using PostGIS trajectories (as LinestringM) rather than the commonly used point-based approach.

Here are a couple of examples to give you an impression of the differences.

Spoiler alert! Trajectory queries are up to 500 times faster than comparable point-based queries.

A quick look at indexing

In both cases, we have indexed the tracker id, geometry, and time columns to speed up query processing.

The trajectory table has 3 indexes

• gist (time_range)
• gist (track gist_geometry_ops_nd)
• btree (tracker)

The point-based table has 4 indexes

• gist (pt)
• btree (trajectory_id)
• btree (tracker)
• btree (t)

Length

First, let’s see how to determine trajectory length for all observed moving objects (identified by a tracker id).

Using the point-based approach, we first need to ensure that the points are in the correct temporal order, create the lines, and finally sum up their length:

```WITH ordered AS (
SELECT trajectory_id, tracker, t, pt
FROM geolife.trajectory_pt
ORDER BY t
), tmp AS (
SELECT trajectory_id, tracker, st_makeline(pt) traj
FROM ordered
GROUP BY trajectory_id, tracker
)
SELECT tracker, round(sum(ST_Length(traj::geography)))
FROM tmp
GROUP BY tracker
ORDER BY tracker
```

With trajectories, we can go right to computing lengths:

```SELECT tracker, round(sum(ST_Length(track::geography)))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker
```

On my test system, the trajectory query run time is 22.7 sec instead of 43.0 sec for the point-based approach:

Duration

Compared to trajectory length, duration is less complicated in the point-based approach:

```WITH tmp AS (
SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
FROM geolife.trajectory_pt
GROUP BY trajectory_id, tracker
)
SELECT tracker, sum(end_time - start_time)
FROM tmp
GROUP BY tracker
ORDER BY tracker
```

Still, the trajectory query is less complex and much faster at 31 ms instead of 6.0 sec:

```SELECT tracker, sum(upper(time_range) - lower(time_range))
FROM geolife.trajectory_ext
GROUP BY tracker
ORDER BY tracker
```

Temporal filter

Extracting trajectories that occurred during a certain time frame is another common use case:

```WITH tmp AS (
SELECT trajectory_id, tracker, min(t) start_time, max(t) end_time
FROM geolife.trajectory_pt
GROUP BY trajectory_id, tracker
)
SELECT trajectory_id, tracker, start_time, end_time
FROM tmp
WHERE end_time > '2008-11-26 11:00'
AND start_time < '2008-11-26 15:00'
ORDER BY tracker
```

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 12 ms:

```SELECT id, tracker, time_range
FROM geolife.trajectory_ext
WHERE time_range && '[2008-11-26 11:00+1,2008-11-26 15:00+01]'::tstzrange
```

or equally fast (12 ms) by making use of the n-dimensional index:

```WHERE track &&&	ST_Collect(
ST_MakePointM(-180, -90, extract(epoch from '2008-11-26 11:00'::timestamptz)),
ST_MakePointM(180, 90, extract(epoch from '2008-11-26 15:00'::timestamptz))
)
```

Spatial filter

Finally, of course, let’s have a look at spatial filters, for example, trajectories that start in a certain area:

```WITH my AS (
SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894,39.97472),4326),0.0005) areaA
), tmp AS (
SELECT trajectory_id, tracker, min(t) t
FROM geolife.trajectory_pt
GROUP BY trajectory_id, tracker
)
SELECT distinct traj.tracker, traj.trajectory_id
FROM tmp
JOIN geolife.trajectory_pt traj
ON tmp.trajectory_id = traj.trajectory_id AND traj.t = tmp.t
JOIN my
ON ST_Within(traj.pt, my.areaA)
```

This point-based query takes 6.0 sec while the shorter trajectory query finishes in 488 ms:

```WITH my AS (
SELECT ST_Buffer(ST_SetSRID(ST_MakePoint(116.31894, 39.97472),4326),0.0005) areaA
)
SELECT id, tracker, ST_AsText(track)
FROM geolife.trajectory_ext
JOIN my
ON areaA && track
AND ST_Within(ST_StartPoint(track), areaA)
```

For more generic “does this trajectory intersect another geometry”, the points can also be aggregated to a linestring on the fly but that takes 21.9 sec:

I’ll be presenting more work on PostGIS trajectories at GI_Forum in Salzburg in July. In the talk, I’ll also have a look at the custom PG-Trajectory datatype. Here’s the full open-access paper:

Graser, A. (2018) Evaluating Spatio-temporal Data Models for Trajectories in PostGIS Databases. GI_Forum ‒ Journal of Geographic Information Science, 1-2018, 16-33. DOI: 10.1553/giscience2018_01_s16.

You can find my fork of the PG-Trajectory project – including all necessary fixes – on Bitbucket.

This post is part of a series. Read more about movement data in GIS.

Many of the topics I’ve covered in recent “Movement data in GIS” posts, have also been discussed at this year’s FOSS4G. Here’s a list of videos for you to learn more about the OGC Moving Features standard, modelling AIS data with FOSS, and more:

1. Introduction to the OGC Moving Features standard presented by Kyoung-Sook Kim from the Artificial Intelligence Research Center, Japan:

2. Modeling AIS data using GDAL & PostGIS presented by Morten Aronsen from the Norwegian Defence Research Establishment:

3. 3D visualization of movement data from videos presented by Anna Petrasova from the Center for Geospatial Analysis, North Carolina State University:

There are also a ton of Docker presentations on the FOSS4G2017 Vimeo channel, if you liked “Docker basics with Geodocker GeoServer”.

This post is part of a series. Read more about movement data in GIS.

MarineCadastre.gov is a great source for AIS data along the US coast. Their data formats and tools though are less open. Luckily, GDAL – and therefore QGIS – can read ESRI File Geodatabases (.gdb).

MarineCadastre.gov also offer a Track Builder script that creates lines out of the broadcast points. (It can also join additional information from the vessel and voyage layers.) We could reproduce the line creation step using tools such as Processing’s Point to path but this post will show how to create PostGIS trajectories instead.

First, we have to import the points into PostGIS using either DB Manager or Processing’s Import into PostGIS tool:

Then we can create the trajectories. I’ve opted to create a materialized view:

The first part of the query creates a temporary table called ptm (short for PointM). This step adds time stamp information to each point. The second part of the query then aggregates these PointMs into trajectories of type LineStringM.

```CREATE MATERIALIZED VIEW ais.trajectory AS
WITH ptm AS (
SELECT b.mmsi,
st_makepointm(
st_x(b.geom),
st_y(b.geom),
date_part('epoch', b.basedatetime)
) AS pt,
b.basedatetime t
ORDER BY mmsi, basedatetime
)
SELECT row_number() OVER () AS id,
st_makeline(ptm.pt) AS st_makeline,
ptm.mmsi,
min(ptm.t) AS min_t,
max(ptm.t) AS max_t
FROM ptm
GROUP BY ptm.mmsi
WITH DATA;
```

The trajectory start and end times (min_t and max_t) are optional but they can help speed up future queries.

One of the advantages of creating trajectory lines is that they render many times faster than the original points.

Of course, we end up with some artifacts at the border of the dataset extent. (Files are split by UTM zone.) Trajectories connect the last known position before the vessel left the observed area with the position of reentry. This results, for example, in vertical lines which you can see in the bottom left corner of the above screenshot.

With the trajectories ready, we can go ahead and start exploring the dataset. For example, we can visualize trajectory speed and/or create animations:

Purple trajectory segments are slow while green segments are faster

We can also perform trajectory analysis, such as trajectory generalization:

This is a first proof of concept. It would be great to have a script that automatically fetches the datasets for a specified time frame and list of UTM zones and loads them into PostGIS for further processing. In addition, it would be great to also make use of the information in the vessel and voyage tables, thus splitting up trajectories into individual voyages.

This post is part of a series. Read more about movement data in GIS.

There are multiple ways to model trajectory data. This post takes a closer look at the OGC® Moving Features Encoding Extension: Simple Comma Separated Values (CSV). This standard has been published in 2015 but I haven’t been able to find any reviews of the standard (in a GIS context or anywhere else).

The following analysis is based on the official OGC trajectory example at http://docs.opengeospatial.org/is/14-084r2/14-084r2.html#42. The header consists of two lines: the first line provides some meta information while the second defines the CSV columns. The data model is segment based. That is, each line describes a trajectory segment with at least two coordinate pairs (or triplets for 3D trajectories). For each segment, there is a start and an end time which can be specified as absolute or relative (offset) values:

```@stboundedby,urn:x-ogc:def:crs:EPSG:6.6:4326,2D,50.23 9.23,50.31 9.27,2012-01-17T12:33:41Z,2012-01-17T12:37:00Z,sec @columns,mfidref,trajectory,state,xsd:token,”type code”,xsd:integer a, 10,150,11.0 2.0 12.0 3.0,walking,1 b, 10,190,10.0 2.0 11.0 3.0,walking,2 a,150,190,12.0 3.0 10.0 3.0,walking,2 c, 10,190,12.0 1.0 10.0 2.0 11.0 3.0,vehicle,1```

Let’s look at the first data row in detail:

• `a` … trajectory id
• `10` … start time offset from `2012-01-17T12:33:41Z` in seconds
• `150` … end time offset from `2012-01-17T12:33:41Z` in seconds
• `11.0 2.0 12.0 3.0` … trajectory coordinates: x1, y1, x2, y2
• `walking` …  state
• `1`… type code

My main issues with this approach are

1. They missed the chance to use WKT notation to make the CSV easily readable by existing GIS tools.
2. As far as I can see, the data model requires a regular sampling interval because there is no way to store time stamps for intermediate positions along trajectory segments. (Irregular intervals can be stored using segments for each pair of consecutive locations.)

In the common GIS simple feature data model (which is point-based), the same data would look something like this:

```traj_id,x,y,t,state,type_code a,11.0,2.0,2012-01-17T12:33:51Z,walking,1 a,12.0,3.0,2012-01-17T12:36:11Z,walking,1 a,10.0,3.0,2012-01-17T12:36:51Z,walking,2 b,10.0,2.0,2012-01-17T12:33:51Z,walking,2 b,11.0,3.0,2012-01-17T12:36:51Z,walking,2 c,12.0,1.0,2012-01-17T12:33:51Z,vehicle,1 c,10.0,2.0,2012-01-17T12:35:21Z,vehicle,1 c,11.0,3.0,2012-01-17T12:36:51Z,vehicle,1```

The main issue here is that there has to be some application logic that knows how to translate from points to trajectory. For example, trajectory a changes from walking1 to walking2 at `2012-01-17T12:36:11Z` but we have to decide whether to store the previous or the following state code for this individual point.

An alternative to the common simple feature model is the PostGIS trajectory data model (which is LineStringM-based). For this data model, we need to convert time stamps to numeric values, e.g. 2012-01-17T12:33:41Z is 1326803621 in Unix time. In this data model, the data looks like this:

```traj_id,trajectory,state,type_code a,LINESTRINGM(11.0 2.0 1326803631, 12.0 3.0 1326803771),walking,1 a,LINESTRINGM(12.0 3.0 1326803771, 10.0 3.0 1326803811),walking,2 b,LINESTRINGM(10.0 2.0 1326803631, 11.0 3.0 1326803811),walking,2 c,LINESTRINGM(12.0 1.0 1326803631, 10.0 2.0 1326803771, 11.0 3.0 1326803811),vehicle,1```

This is very similar to the OGC data model, with the notable difference that every position is time-stamped (instead of just having segment start and end times). If one has movement data which is recorded at regular intervals, the OGC data model can be a bit more compact, but if the trajectories are sampled at irregular intervals, each point pair will have to be modeled as a separate segment.

Since the PostGIS data model is flexible, explicit, and comes with existing GIS tool support, it’s my clear favorite.

This post is part of a series. Read more about movement data in GIS.

Today’s post is a follow-up of Movement data in GIS #3: visualizing massive trajectory datasets. In that post, I summarized a concept for trajectory generalization. Now, I have published the scripts and sample data in my QGIS-Processing-tools repository on Github.

To add the trajectory generalization scripts to your Processing toolbox, you can use the Add scripts from files tool:

It is worth noting, that Add scripts from files fails to correctly import potential help files for the scripts but that’s not an issue this time around, since I haven’t gotten around to actually write help files yet.

The scripts are used in the following order:

1. Extract characteristic trajectory points
2. Group points in space
3. Compute flows between cells from trajectories

The sample project contains input data, as well as output layers of the individual tools. The only required input is a layer of trajectories, where trajectories have to be LINESTRINGM (note the M!) features:

Trajectory sample based on data provided by the GeoLife project

In Extract characteristic trajectory points, distance parameters are specified in meters, stop duration in seconds, and angles in degrees. The characteristic points contain start and end locations, as well as turns and stop locations:

The characteristic points are then clustered. In this tool, the distance has to be specified in layer units, which are degrees in case of the sample data.

Finally, we can compute flows between cells defined by these clusters:

Flow lines scaled by flow strength and cell centers scaled by counts

If you use these tools on your own data, I’d be happy so see what you come up with!

This post is part of a series. Read more about movement data in GIS.