Advertisements

Archive

QGIS

Many of my previous posts in this series [1][2][3] have relied on PostGIS for trajectory data handling. While I love PostGIS, it feels like overkill to require a database to analyze smaller movement datasets. Wouldn’t it be great to have a pure Python solution?

If we look into moving object data literature, beyond the “trajectories are points with timestamps” perspective, which is common in GIS, we also encounter the “trajectories are time series with coordinates” perspective. I don’t know about you, but if I hear “time series” and Python, I think Pandas! In the Python Data Science Handbook, Jake VanderPlas writes:

Pandas was developed in the context of financial modeling, so as you might expect, it contains a fairly extensive set of tools for working with dates, times, and time-indexed data.

Of course, time series are one thing, but spatial data handling is another. Lucky for us, this is where GeoPandas comes in. GeoPandas has been around for a while and version 0.4 has been released in June 2018. So far, I haven’t found examples that use GeoPandas to manage movement data, so I’ve set out to give it a shot. My trajectory class uses a GeoDataFrame df for data storage. For visualization purposes, it can be converted to a LineString:

import pandas as pd 
from geopandas import GeoDataFrame
from shapely.geometry import Point, LineString

class Trajectory():
    def __init__(self, id, df, id_col):
        self.id = id
        self.df = df    
        self.id_col = id_col
        
    def __str__(self):
        return "Trajectory {1} ({2} to {3}) | Size: {0}".format(
            self.df.geometry.count(), self.id, self.get_start_time(), 
            self.get_end_time())
        
    def get_start_time(self):
        return self.df.index.min()
        
    def get_end_time(self):
        return self.df.index.max()
        
    def to_linestring(self):
        return self.make_line(self.df)
        
    def make_line(self, df):
        if df.size > 1:
            return df.groupby(self.id_col)['geometry'].apply(
                lambda x: LineString(x.tolist())).values[0]
        else:
            raise RuntimeError('Dataframe needs at least two points to make line!')

    def get_position_at(self, t):
        try:
            return self.df.loc[t]['geometry'][0]
        except:
            return self.df.iloc[self.df.index.drop_duplicates().get_loc(
                t, method='nearest')]['geometry']

Of course, this class can be used in stand-alone Python scripts, but it can also be used in QGIS. The following script takes data from a QGIS point layer, creates a GeoDataFrame, and finally generates trajectories. These trajectories can then be added to the map as a line layer.

All we need to do to ensure that our data is ordered by time is to set the GeoDataFrame’s index to the time field. From then on, Pandas takes care of the time series aspects and we can access the index as shown in the Trajectory.get_position_at() function above.

# Get data from a point layer
l = iface.activeLayer()
time_field_name = 't'
trajectory_id_field = 'trajectory_id' 
names = [field.name() for field in l.fields()]
data = []
for feature in l.getFeatures():
    my_dict = {}
    for i, a in enumerate(feature.attributes()):
        my_dict[names[i]] = a
    x = feature.geometry().asPoint().x()
    y = feature.geometry().asPoint().y()
    my_dict['geometry']=Point((x,y))
    data.append(my_dict)

# Create a GeoDataFrame
df = pd.DataFrame(data).set_index(time_field_name)
crs = {'init': l.crs().geographicCrsAuthId()} 
geo_df = GeoDataFrame(df, crs=crs)
print(geo_df)

# Test if spatial functions work
print(geo_df.dissolve([True]*len(geo_df)).centroid)

# Create a QGIS layer for trajectory lines
vl = QgsVectorLayer("LineString", "trajectories", "memory")
vl.setCrs(l.crs()) # doesn't stop popup :(
pr = vl.dataProvider()
pr.addAttributes([QgsField("id", QVariant.String)])
vl.updateFields() 

df_by_id = dict(tuple(geo_df.groupby(trajectory_id_field)))
trajectories = {}
for key, value in df_by_id.items():
    traj = Trajectory(key, value, trajectory_id_field)
    trajectories[key] = traj
    line = QgsGeometry.fromWkt(traj.to_linestring().wkt)
    f = QgsFeature()
    f.setGeometry(line)
    f.setAttributes([key])
    pr.addFeature(f) 
print(trajectories[1])

vl.updateExtents() 
QgsProject.instance().addMapLayer(vl)

The following screenshot shows this script applied to a sample of the Geolife datasets containing 100 trajectories with a total of 236,776 points. On my notebook, the runtime is approx. 20 seconds.

So far, GeoPandas has proven to be a convenient way to handle time series with coordinates. Trying to implement some trajectory analysis tools will show if it is indeed a promising data structure for trajectories.

Advertisements

If you follow me on Twitter, you have probably already heard that the ebook of “QGIS Map Design 2nd Edition” has now been published and we are expecting the print version to be up for sale later this month. Gretchen Peterson and I – together with our editor Gary Sherman (yes, that Gary Sherman!) – have been working hard to provide you with tons of new and improved map design workflows and many many completely new maps. By Gretchen’s count, this edition contains 23 new maps, so it’s very hard to pick a favorite!

Like the 1st edition, we provide increasingly advanced recipes in three chapters, each focusing on either layer styling, labeling, or creating print layouts. If I had to pick a favorite, I’d have to go with “Mastering Rotated Maps”, one of the advanced recipes in the print layouts chapter. It looks deceptively simple but it combines a variety of great QGIS features and clever ideas to design a map that provides information on multiple levels of detail. Besides the name inspiring rotated map items, this design combines

  • map overviews
  • map themes
  • graduated lines and polygons
  • a rotated north arrow
  • fancy leader lines

all in one:

“QGIS Map Design 2nd Edition” provides how-to instructions, as well as data and project files for each recipe. So you can jump right into it and work with the provided materials or apply the techniques to your own data.

The ebook is available at LocatePress.

If you’re are following me on Twitter, you’ve certainly already read that I’m working on PyQGIS 101 a tutorial to help GIS users to get started with Python programming for QGIS.

I’ve often been asked to recommend Python tutorials for beginners and I’ve been surprised how difficult it can be to find an engaging tutorial for Python 3 that does not assume that the reader already knows all kinds of programming concepts.

It’s been a while since I started programming, but I do teach QGIS and Python programming for QGIS to university students and therefore have some ideas of which concepts are challenging. Nonetheless, it’s well possible that I overlook something that is not self explanatory. If you’re using PyQGIS 101 and find that some points could use further explanations, please leave a comment on the corresponding page.

PyQGIS 101 is a work in progress. I’d appreciate any feedback, particularly from beginners!

In Movement data in GIS #2: visualization I mentioned that it should be possible to label trajectory segments without having to break the original trajectory feature. While it’s not a straightforward process, it is indeed possible to create timestamp labels at desired intervals:

The main point here is that we cannot use regular labels because there would be only one label for the whole trajectory feature. Instead, we are using a marker line with a font marker:

By default, font markers only display one character from a given font but by using expressions we can make it display longer text, including datetime strings:

If you want to have a label at every node of the trajectory, the expression looks like this:

format_date( 
   to_datetime('1970-01-01T00:00:00Z')+to_interval(
      m(start_point(geometry_n(
         segments_to_lines( $geometry ),
         @geometry_part_num)
      ))||' seconds'
   ),
   'HH:mm:ss'
)

You probably remember those parts of the expression that extract the m value from previous posts. Note that – compared to 2016 – it is now necessary to add the segments_to_lines() function.

The m value (which stores time as seconds since Unix epoch) is then converted to datetime and finally formatted to only show time. Of course you can edit the datetime format string to also include the date.

If we only want a label every 30 seconds, we can add a case statement around that:

CASE WHEN 
m(start_point(geometry_n(
   segments_to_lines( $geometry ),
   @geometry_part_num)
)) % 30 = 0
THEN
format_date( 
   to_datetime('1970-01-01T00:00:00Z')+to_interval(
      m(start_point(geometry_n(
         segments_to_lines( $geometry ),
         @geometry_part_num)
      ))||' seconds'
   ),
   'HH:mm:ss'
)
END

This works well if the trajectory sampling interval is fairly regular. This is not always the case and that means that the above case statement wouldn’t find many nodes with a timestamp that ends in :30 or :00. In such a case, we could resort to labeling nodes based on their order in the linestring:

CASE WHEN 
 @geometry_part_num  % 30 = 0
THEN
...

Thanks a lot to @JuergenEFischer for providing a solution for converting seconds since Unix epoch to datetime without a custom function!

Note that expressions using @geometry_part_num currently suffer from the following issue: Combination of segments_to_lines($geometry) and @geometry_part_num gives wrong segment numbers


This post is part of a series. Read more about movement data in GIS.

Remember the good old times when all parameters in Processing were mandatory?

Inputs and outputs are fixed, and optional parameters or outputs are not supported. [Graser & Olaya, 2015]

Since QGIS 2.14, this is no longer the case. Scripts, as well as models, can now have optional parameters. Here is how for QGIS 3:

When defining a Processing script parameter, the parameter’s constructor takes a boolean flag indicating whether the parameter should be optional. It’s false by default:

class qgis.core.QgsProcessingParameterNumber(
   name: str, description: str = '', 
   type: QgsProcessingParameterNumber.Type = QgsProcessingParameterNumber.Integer, 
   defaultValue: Any = None, 
   optional: bool = False,
   minValue: float = -DBL_MAX+1, maxValue: float = DBL_MAX)

(Source: http://python.qgis.org/api/core/Processing/QgsProcessingParameterNumber.html)

One standard tool that uses optional parameters is Add autoincremental field:

From Python, this algorithm can be called with or without the optional parameters:

When building a model, an optional input can be assigned to the optional parameter. To create an optional input, make sure to deactivate the mandatory checkbox at the bottom of the input parameter definition:

Then this optional input can be used in an algorithm. For example, here the numerical input optional_value is passed to the Start values at parameter:

You can get access to all available inputs by clicking the … button next to the Start values at field. In this example, I have access to values of the input layer as well as  the optional value:

Once this is set up, this is how it looks when the model is run:

You can see that the optional value is indeed Not set.

References

Graser, A., & Olaya, V. (2015). Processing: A Python Framework for the Seamless Integration of Geoprocessing Tools in QGIS. ISPRS Int. J. Geo-Inf. 2015, 4, 2219-2245. doi:10.3390/ijgi4042219.

Processing has been overhauled significantly for QGIS 3.0. Besides speed-ups, one of the most obvious changes is the way to write Processing scripts. Instead of the old Processing-specific syntax, Processing scripts for QGIS3 are purely pythonic implementations of QgsProcessingAlgorithm.

Here’s a template that you can use to develop your own algorithms:

from qgis.PyQt.QtCore import QCoreApplication, QVariant
from qgis.core import (QgsField, QgsFeature, QgsFeatureSink, QgsFeatureRequest, QgsProcessing, QgsProcessingAlgorithm, QgsProcessingParameterFeatureSource, QgsProcessingParameterFeatureSink)
                      
class ExAlgo(QgsProcessingAlgorithm):
    INPUT = 'INPUT'
    OUTPUT = 'OUTPUT'

    def __init__(self):
        super().__init__()

    def name(self):
        return "exalgo"
    
    def tr(self, text):
        return QCoreApplication.translate("exalgo", text)
        
    def displayName(self):
        return self.tr("Example script")

    def group(self):
        return self.tr("Examples")

    def groupId(self):
        return "examples"

    def shortHelpString(self):
        return self.tr("Example script without logic")

    def helpUrl(self):
        return "https://qgis.org"
        
    def createInstance(self):
        return type(self)()
  
    def initAlgorithm(self, config=None):
        self.addParameter(QgsProcessingParameterFeatureSource(
            self.INPUT,
            self.tr("Input layer"),
            [QgsProcessing.TypeVectorAnyGeometry]))
        self.addParameter(QgsProcessingParameterFeatureSink(
            self.OUTPUT,
            self.tr("Output layer"),
            QgsProcessing.TypeVectorAnyGeometry))

    def processAlgorithm(self, parameters, context, feedback):
        source = self.parameterAsSource(parameters, self.INPUT, context)
        (sink, dest_id) = self.parameterAsSink(parameters, self.OUTPUT, context,
                                               source.fields(), source.wkbType(), source.sourceCrs())

        features = source.getFeatures(QgsFeatureRequest())
        for feat in features:
            out_feat = QgsFeature()
            out_feat.setGeometry(feat.geometry())
            out_feat.setAttributes(feat.attributes())
            sink.addFeature(out_feat, QgsFeatureSink.FastInsert)

        return {self.OUTPUT: dest_id}

This script just copies the features of the input layer to the output layer without any modifications. Add your logic to the processAlgorithm() function to get started.

Use Create New Script from the Toolbox toolbar:

Paste the example script:

Once saved, the script will show up in the Processing toolbox:

Joining polygon attributes to points based on their location is a very common GIS task. In QGIS 2, QGIS’ own implementation of “Join attributes by location” was much slower than SAGA’s “Add polygon attributes to points”. Thus, installations without SAGA were out of good options.

Luckily this issue (and many more) has been fixed by the rewrite of many geoprocessing algorithms for QGIS 3! Let’s revisit the comparison:

I’m using publicly available datasets from Naturalearth: The small scale populated places (243 points) and the large scale countries (255 polygons with many nodes). Turns out that QGIS 3’s built-in tool takes a little less than two seconds while the SAGA Processing tool requires a litte less than six seconds:

Like in the previous comparison, times were measured using the Python Console:

In both tools, only the countries’ SOVEREIGNT attribute is joined to the point attribute table:

import processing
t0 = datetime.datetime.now()
print("QGIS Join attributes by location ...")
processing.runAndLoadResults(
   "qgis:joinattributesbylocation", 
   {'INPUT':'E:/Geodata/NaturalEarth/vector_v4/natural_earth_vector/110m_cultural/ne_110m_populated_places.shp',
   'JOIN':'E:/Geodata/NaturalEarth/vector_v4/natural_earth_vector/10m_cultural/ne_10m_admin_0_countries.shp',
   'PREDICATE':[5],'JOIN_FIELDS':['SOVEREIGNT'],
   'METHOD':0,'DISCARD_NONMATCHING':False,'OUTPUT':'memory:'})
t1 = datetime.datetime.now()
print("Runtime: "+str(t1-t0))
print("SAGA Add polygon attributers to points ...")
processing.runAndLoadResults("saga:addpolygonattributestopoints", 
   {'INPUT':'E:/Geodata/NaturalEarth/vector_v4/natural_earth_vector/110m_cultural/ne_110m_populated_places.shp',
   'POLYGONS':'E:/Geodata/NaturalEarth/vector_v4/natural_earth_vector/10m_cultural/ne_10m_admin_0_countries.shp',
   'FIELDS':'SOVEREIGNT','OUTPUT':'C:/Users/anita/AppData/Local/Temp/processing_8b1bbde78de5490285dd530e115cca52/099660d88bf14c54a853cc230e388e55/OUTPUT.shp'})
t2 = datetime.datetime.now()
print("Runtime: "+str(t2-t1))

It is worth noting that it takes longer if more attributes are to be joined to the point layer attribute table. For example, if the JOIN_FIELDS parameter is empty:

'JOIN_FIELDS':[]

instead of

'JOIN_FIELDS':['SOVEREIGNT']

then the the Join attributes by location takes almost 16 seconds. (The country layer contains 71 attributes after all.)

(The SAGA tool currently allows only joining one attribute at a time.)

%d bloggers like this: