Movement data in GIS #15: writing a PL/pgSQL stop detection function for PostGIS trajectories
Do you sometimes start writing an SQL query and around at line 50 you get the feeling that it might be getting out of hand? If so, it might be useful to start breaking it down into smaller chunks and wrap those up into custom functions. Never done that? Don’t despair! There’s an excellent PL/pgSQL tutorial on postgresqltutorial.com to get you started.
To get an idea of the basic structure of a PL/pgSQL function and to proof that PostGIS datatypes work just fine in this context, here’s a basic function that takes a trajectory geometry and outputs its duration, i.e. the difference between its last and first timestamp:
CREATE OR REPLACE FUNCTION AG_Duration(traj geometry) RETURNS numeric LANGUAGE 'plpgsql' AS $BODY$ BEGIN RETURN ST_M(ST_EndPoint(traj))-ST_M(ST_StartPoint(traj)); END; $BODY$;
My end goal for this exercise was to implement a function that takes a trajectory and outputs the stops along this trajectory. Commonly, a stop is defined as a long stay within an area with a small radius. This leads us to the following definition:
CREATE OR REPLACE FUNCTION AG_DetectStops( traj geometry, max_size numeric, min_duration numeric) RETURNS TABLE(sequence integer, geom geometry) -- implementation follows here!
Note how this function uses RETURNS TABLE to enable it to return all the stops that it finds. To add a line to the output table, we need to assign values to the sequence and geom variables and then use RETURN NEXT.
Another reason to use PL/pgSQL is that it enables us to write loops. And loops I wanted for my stop detection function! Specifically, I wanted to go through all the points in the trajectory:
FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP -- here comes the magic! END LOOP;
Eventually the function should go through the trajectory and identify all segments that stay within an area with max_size diameter for at least min_duration time. To test for the area size, we can use:
IF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true;
Putting everything together, my current implementation looks like this:
CREATE OR REPLACE FUNCTION AG_DetectStops(traj geometry, max_size numeric, min_duration numeric) RETURNS TABLE(sequence integer, geom geometry) LANGUAGE 'plpgsql' AS $BODY$ DECLARE pt geometry; segment geometry; is_stop boolean; previously_stopped boolean; stop_sequence integer; p1 geometry; BEGIN segment := NULL; sequence := 0; is_stop := false; previously_stopped := false; p1 := NULL; FOR pt IN SELECT (ST_DumpPoints(traj)).geom LOOP IF segment IS NULL AND p1 IS NULL THEN p1 := pt; ELSIF segment IS NULL THEN segment := ST_MakeLine(p1,pt); p1 := NULL; IF ST_Length((segment)) <= max_size THEN is_stop := true; END IF; ELSE segment := ST_AddPoint(segment,pt); -- if we're in a stop, we want to grow the segment, otherwise we remove points to the specified min_duration IF NOT is_stop THEN WHILE ST_NPoints(segment) > 2 AND AG_Duration(ST_RemovePoint(segment,0)) >= min_duration LOOP segment := ST_RemovePoint(segment,0); END LOOP; END IF; -- a stop is identified if the segment stays within a circle of diameter = max_size IF ST_Length((segment)) <= max_size THEN is_stop := true; -- not necessary but faster ELSIF ST_Distance((ST_StartPoint(segment)),(pt)) > max_size THEN is_stop := false; ELSIF ST_MaxDistance(segment,pt) <= max_size THEN is_stop := true; ELSE is_stop := false; END IF; -- if we found the end of a stop, we need to check if it lasted long enough IF NOT is_stop AND previously_stopped THEN IF ST_M(ST_PointN(segment,ST_NPoints(segment)-1))-ST_M(ST_StartPoint(segment)) >= min_duration THEN geom := ST_RemovePoint(segment,ST_NPoints(segment)-1); RETURN NEXT; sequence := sequence + 1; segment := NULL; p1 := pt; END IF; END IF; END IF; previously_stopped := is_stop; END LOOP; IF previously_stopped AND AG_Duration(segment) >= min_duration THEN geom := segment; RETURN NEXT; END IF; END; $BODY$;
While this function is not really short, it’s so much more readable than my previous attempts of doing this in pure SQL. Some of the lines for determining is_stop are not strictly necessary but they do speed up processing.
Performance still isn’t quite where I’d like it to be. I suspect that all the adding and removing points from linestring geometries is not ideal. In general, it’s quicker to find shorter stops in smaller areas than longer stop in bigger areas.
Let’s test!
Looking for a testing framework for PL/pgSQL, I found plpgunit on Github. While I did not end up using it, I did use its examples for inspiration to write a couple of tests, e.g.
CREATE OR REPLACE FUNCTION test.stop_at_beginning() RETURNS void LANGUAGE 'plpgsql' AS $BODY$ DECLARE t0 integer; n0 integer; BEGIN WITH temp AS ( SELECT AG_DetectStops( ST_GeometryFromText('LinestringM(0 0 0, 0 0 1, 0.1 0.1 2, 2 2 3)'), 1,1) stop ) SELECT ST_M(ST_StartPoint((stop).geom)), ST_NPoints((stop).geom) FROM temp INTO t0, n0; IF t0 = 0 AND n0 = 3 THEN RAISE INFO 'PASSED - Stop at the beginning of the trajectory'; ELSE RAISE INFO 'FAILED - Stop at the beginning of the trajectory'; END IF; END; $BODY$;
Basically, each test is yet another PL/pgSQL function that doesn’t return anything (i.e. returns void) but outputs messages about the status of the test. Here I made heavy use of the PERFORM statement which executes the provided function but discards the results:
Update: The source code for this function is now available on https://github.com/anitagraser/postgis-spatiotemporal