Advertisements

Archive

Author Archives: underdark

In previous posts, I already wrote about Trajectools and some of the functionality it provides to QGIS Processing including:

There are also tools to compute heading and speed which I only talked about on Twitter.

Trajectools is now available from the QGIS plugin repository.

The plugin includes sample data from MarineCadastre downloads and the Geolife project.

Under the hood, Trajectools depends on GeoPandas!

If you are on Windows, here’s how to install GeoPandas for OSGeo4W:

  1. OSGeo4W installer: install python3-pip
  2. Environment variables: add GDAL_VERSION = 2.3.2 (or whichever version your OSGeo4W installation currently includes)
  3. OSGeo4W shell: call C:\OSGeo4W64\bin\py3_env.bat
  4. OSGeo4W shell: pip3 install geopandas (this will error at fiona)
  5. From https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona: download Fiona-1.7.13-cp37-cp37m-win_amd64.whl
  6. OSGeo4W shell: pip3 install path-to-download\Fiona-1.7.13-cp37-cp37m-win_amd64.whl
  7. OSGeo4W shell: pip3 install geopandas
  8. (optionally) From https://www.lfd.uci.edu/~gohlke/pythonlibs/#rtree: download Rtree-0.8.3-cp37-cp37m-win_amd64.whl and pip3 install it

If you want to use this functionality outside of QGIS, head over to my movingpandas project!

Advertisements

Yesterday, I learned about a cool use case in data-driven agriculture that requires dealing with delayed measurements. As Bert mentions, for example, potatoes end up in the machines and are counted a few seconds after they’re actually taken out of the ground:

Therefore, in order to accurately map yield, we need to take this temporal offset into account.

We need to make sure that time and location stay untouched, but need to shift the potato count value. To support this use case, I’ve implemented apply_offset_seconds() for trajectories in movingpandas:

    def apply_offset_seconds(self, column, offset):
        self.df[column] = self.df[column].shift(offset, freq='1s')

The following test illustrates its use: you can see how the value column is shifted by 120 second. Geometry and time remain unchanged but the value column is shifted accordingly. In this test, we look at the row with index 2 which we access using iloc[2]:

    def test_offset_seconds(self):
        df = pd.DataFrame([
            {'geometry': Point(0, 0), 't': datetime(2018, 1, 1, 12, 0, 0), 'value': 1},
            {'geometry': Point(-6, 10), 't': datetime(2018, 1, 1, 12, 1, 0), 'value': 2},
            {'geometry': Point(6, 6), 't': datetime(2018, 1, 1, 12, 2, 0), 'value': 3},
            {'geometry': Point(6, 12), 't': datetime(2018, 1, 1, 12, 3, 0), 'value':4},
            {'geometry': Point(6, 18), 't': datetime(2018, 1, 1, 12, 4, 0), 'value':5}
        ]).set_index('t')
        geo_df = GeoDataFrame(df, crs={'init': '31256'})
        traj = Trajectory(1, geo_df)
        traj.apply_offset_seconds('value', -120)
        self.assertEqual(traj.df.iloc[2].value, 5)
        self.assertEqual(traj.df.iloc[2].geometry, Point(6, 6))

Many current movement data sources provide more or less continuous streams of object locations. For example, the AIS system provides continuous locations of vessels (mostly ships). This continuous stream of locations – let’s call it track – starts when we first record the vessel and ends with the last record. This start and end does not necessarily coincide with the start or end of a vessel voyage from one port to another. The stream start and end do not have any particular meaning. Instead, if we want to see what’s going on, we need to split the track into meaningful segments. One such segmentation – albeit a simple one – is to split tracks by day. This segmentation assumes that day/night changes affect the movement of our observed object. For many types of objects – those who mostly stay still during the night – this will work reasonably well.

For example, the following screenshot shows raw data of one particular vessel in the Boston region. By default, QGIS provides a Points to Path to convert points to lines. This tool takes one “group by” and one “order by” field. Therefore, if we want one trajectory per ship per day, we’d first have to create a new field that combines ship ID and day so that we can use this combination as a “group by” field. Additionally, the resulting lines loose all temporal information.

To simplify this workflow, Trajectools now provides a new algorithm that creates day trajectories and outputs LinestringM features. Using the Day trajectories from point layer tool, we can immediately see that our vessel of interest has been active for three consecutive days: entering our observation area on Nov 5th, moving to Boston where it stayed over night, then moving south to Weymouth on the next day, and leaving on the 7th.

Since the resulting trajectories are LinestringM features with time information stored in the M value, we can also visualize the speed of movement (as discussed in part #2 of this series):

Pandas is great for data munging and with the help of GeoPandas, these capabilities expand into the spatial realm.

With just two lines, it’s quick and easy to transform a plain headerless CSV file into a GeoDataFrame. (If your CSV is nice and already contains a header, you can skip the header=None and names=FILE_HEADER parameters.)

usecols=USE_COLS is also optional and allows us to specify that we only want to use a subset of the columns available in the CSV.

After the obligatory imports and setting of variables, all we need to do is read the CSV into a regular DataFrame and then construct a GeoDataFrame.

import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

FILE_NAME = "/temp/your.csv"
FILE_HEADER = ['a', 'b', 'c', 'd', 'e', 'x', 'y']
USE_COLS = ['a', 'x', 'y']

df = pd.read_csv(
    FILE_NAME, delimiter=";", header=None,
    names=FILE_HEADER, usecols=USE_COLS)
gdf = GeoDataFrame(
    df.drop(['x', 'y'], axis=1),
    crs={'init': 'epsg:4326'},
    geometry=[Point(xy) for xy in zip(df.x, df.y)])

It’s also possible to create the point objects using a lambda function as shown by weiji14 on GIS.SE.

PyQGIS 101: Introduction to QGIS Python programming for non-programmers has now reached the part 10 milestone!

Beyond the obligatory Hello world! example, the contents so far include:

If you’ve been thinking about learning Python programming, but never got around to actually start doing it, give PyQGIS101 a try.

I’d like to thank everyone who has already provided feedback to the exercises. Every comment is important to help me understand the pain points of learning Python for QGIS.

I recently read an article – unfortunately I forgot to bookmark it and cannot locate it anymore – that described the problems with learning to program very well: in the beginning, it’s rather slow going, you don’t know the right terminology and therefore don’t know what to google for when you run into issues. But there comes this point, when you finally get it, when the terminology becomes clearer, when you start thinking “that might work” and it actually does! I hope that PyQGIS101 will be a help along the way.

We’ve seen a lot of explorative movement data analysis in the Movement data in GIS series so far. Beyond exploration, predictive analysis is another major topic in movement data analysis. One of the most obvious movement prediction use cases is trajectory prediction, i.e. trying to predict where a moving object will be in the future. The two main categories of trajectory prediction methods I see are those that try to predict the actual path that a moving object will take versus those that only try to predict the next destination.

Today, I want to focus on prediction methods that predict the path that a moving object is going to take. There are many different approaches from simple linear prediction to very sophisticated application-dependent methods. Regardless of the prediction method though, there is the question of how to evaluate the prediction results when these methods are applied to real-life data.

As long as we work with nice, densely, and regularly updated movement data, extracting evaluation samples is rather straightforward. To predict future movement, we need some information about past movement. Based on that past movement, we can then try to predict future positions. For example, given a trajectory that is twenty minutes long, we can extract a sample that provides five minutes of past movement, as well as the actually observed position five minutes into the future:

But what if the trajectory is irregularly updated? Do we interpolate the positions at the desired five minute timestamps? Do we try to shift the sample until – by chance – we find a section along the trajectory where the updates match our desired pattern? What if location timestamps include seconds or milliseconds and we therefore cannot find exact matches? Should we introduce a tolerance parameter that would allow us to match locations with approximately the same timestamp?

Depending on the duration of observation gaps in our trajectory, it might not be a good idea to simply interpolate locations since these interpolated locations could systematically bias our evaluation. Therefore, the safest approach may be to shift the sample pattern along the trajectory until a close match (within the specified tolerance) is found. This approach is now implemented in MovingPandas’ TrajectorySampler.

def test_sample_irregular_updates(self):
    df = pd.DataFrame([
        {'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,1)},
        {'geometry':Point(0,3), 't':datetime(2018,1,1,12,3,2)},
        {'geometry':Point(0,6), 't':datetime(2018,1,1,12,6,1)},
        {'geometry':Point(0,9), 't':datetime(2018,1,1,12,9,2)},
        {'geometry':Point(0,10), 't':datetime(2018,1,1,12,10,2)},
        {'geometry':Point(0,14), 't':datetime(2018,1,1,12,14,3)},
        {'geometry':Point(0,19), 't':datetime(2018,1,1,12,19,4)},
        {'geometry':Point(0,20), 't':datetime(2018,1,1,12,20,0)}
        ]).set_index('t')
    geo_df = GeoDataFrame(df, crs={'init': '4326'})
    traj = Trajectory(1,geo_df)
    sampler = TrajectorySampler(traj, timedelta(seconds=5))
    past_timedelta = timedelta(minutes=5)
    future_timedelta = timedelta(minutes=5)
    sample = sampler.get_sample(past_timedelta, future_timedelta)
    result = sample.future_pos.wkt
    expected_result = "POINT (0 19)"
    self.assertEqual(result, expected_result)
    result = sample.past_traj.to_linestring().wkt
    expected_result = "LINESTRING (0 9, 0 10, 0 14)"
    self.assertEqual(result, expected_result)

The repository also includes a demo that illustrates how to split trajectories using a grid and finally extract samples:

 

In Movement data in GIS #16, I presented a new way to deal with trajectory data using GeoPandas and how to load the trajectory GeoDataframes as a QGIS layer. Following up on this initial experiment, I’ve now implemented a first version of an algorithm that performs a spatial analysis on my GeoPandas trajectories.

The first spatial analysis algorithm I’ve implemented is Clip trajectories by extent. Implementing this algorithm revealed a couple of pitfalls:

  • To achieve correct results, we need to compute spatial intersections between linear trajectory segments and the extent. Therefore, we need to convert our point GeoDataframe to a line GeoDataframe.
  • Based on the spatial intersection, we need to take care of computing the corresponding timestamps of the events when trajectories enter or leave the extent.
  • A trajectory can intersect the extent multiple times. Therefore, we cannot simply use the global minimum and maximum timestamp of intersecting segments.
  • GeoPandas provides spatial intersection functionality but if the trajectory contains consecutive rows without location change, these will result in zero length lines and those cause an empty intersection result.

So far, the clip result only contains the trajectory id plus a suffix indicating the sequence of the intersection segments for a specific trajectory (because one trajectory can intersect the extent multiple times). The following screenshot shows one highlighted trajectory that intersects the extent three times and the resulting clipped trajectories:

This algorithm together with the basic trajectory from points algorithm is now available in a Processing algorithm provider plugin called Processing Trajectory.

Note: This plugin depends on GeoPandas.

Note for Windows users: GeoPandas is not a standard package that is available in OSGeo4W, so you’ll have to install it manually. (For the necessary steps, see this answer on gis.stackexchange.com)

The implemented tests show how to use the Trajectory class independently of QGIS. So far, I’m only testing the spatial properties though:

def test_two_intersections_with_same_polygon(self):
    polygon = Polygon([(5,-5),(7,-5),(7,12),(5,12),(5,-5)])
    data = [{'id':1, 'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,0)},
        {'id':1, 'geometry':Point(6,0), 't':datetime(2018,1,1,12,10,0)},
        {'id':1, 'geometry':Point(10,0), 't':datetime(2018,1,1,12,15,0)},
        {'id':1, 'geometry':Point(10,10), 't':datetime(2018,1,1,12,30,0)},
        {'id':1, 'geometry':Point(0,10), 't':datetime(2018,1,1,13,0,0)}]
    df = pd.DataFrame(data).set_index('t')
    geo_df = GeoDataFrame(df, crs={'init': '31256'})
    traj = Trajectory(1, geo_df)
    intersections = traj.intersection(polygon)
    result = []
    for x in intersections:
        result.append(x.to_linestring())
    expected_result = [LineString([(5,0),(6,0),(7,0)]), LineString([(7,10),(5,10)])]
    self.assertEqual(result, expected_result) 

One issue with implementing the algorithms as QGIS Processing tools in this way is that the tools are independent of one another. That means that each tool has to repeat the expensive step of creating the trajectory objects in memory. I’m not sure this can be solved.

%d bloggers like this: