Today, I’m super excited to share with you the announcement that our open source textbook “Geocomputation with Python” has finally arrived in print and is now available for purchase from Routledge.com, Amazon.com, Amazon.co.uk, and other booksellers.

“Geocomputation with Python” (or geocompy for short) covers the entire range of standard GIS operations for both vector and raster data models. Each section and chapter builds on the previous. If you’re just starting out with Python to work with geographic data, we hope that the book will be an excellent place to start.

Of course, you can still find the online version of the book at py.geocompx.org.

The book is open-source and you can find the code on GitHub. This ensures that the content is reproducible, transparent, and accessible. It also lets you interact with the project by opening issues and submitting pull requests.

Trajectools 2.4 release

By underdark

2025-01-11

GIS, Movement data in GIS, MovingPandas, Trajectools

In this new release, you will find new algorithms, default output styles, and other usability improvements, in particular for working with public transport schedules in GTFS format, including:

Added GTFS algorithms for extracting stops, fixes #43
Added default output styles for GTFS stops and segments c600060
Added Trajectory splitting at field value changes 286fdbd
Added option to add selected fields to output trajectories layer, fixes #53
Improved UI of the split by observation gap algorithm, fixes #36

Note: To use this new version of Trajectools, please upgrade your installation of MovingPandas to >= 0.21.2, e.g. using

import pip; pip.main(['install', '--upgrade', 'movingpandas'])

or

conda install movingpandas==0.21.2

Urban mobility insights with MovingPandas & CARTO in Snowflake

By underdark

2024-12-17

Big Data, Movement data in GIS, MovingPandas, spatio-temporal data

Today, I want to point out a blog post over at

https://carto.com/blog/urban-mobility-insights-with-movingpandas-carto-in-snowflake

written together with my fellow co-authors and EMERALDS project team member Argyrios Kyrgiazos.

For the technically inclined, the highlight are the presented UDFs in Snowflake to process and transform the trajectory data. For example, here’s a TemporalSplitter UDF:

CREATE OR REPLACE FUNCTION CARTO_DATABASE.CARTO.TemporalSplitter(geom ARRAY, t ARRAY, mode STRING)
RETURNS ARRAY
LANGUAGE PYTHON
RUNTIME_VERSION = 3.11
PACKAGES = ('numpy','pandas', 'geopandas','movingpandas', 'shapely')
HANDLER = 'udf'
AS $$
import numpy as np
import pandas as pd
import geopandas as gpd
import movingpandas as mpd
import shapely
from shapely.geometry import shape, mapping, Point, Polygon
from shapely.validation import make_valid
from datetime import datetime, timedelta

def udf(geom, t, mode):
    valid_df = pd.DataFrame(geom, columns=['geometry'])
    valid_df['t'] = pd.to_datetime(t)
    valid_df['geometry'] = valid_df['geometry'].apply(lambda x:shapely.wkt.loads(x))
    gdf = gpd.GeoDataFrame(valid_df, geometry='geometry', crs='epsg:4326')
    gdf = gdf.set_index('t')
    traj = mpd.Trajectory(gdf, 1)
    traj_sm = mpd.TemporalSplitter(traj).split(mode=mode)
    if len(traj_sm.trajectories)>0:
        res = traj_sm.to_point_gdf()
        res['geometry'] = res['geometry'].apply(lambda x: shapely.wkt.dumps(x))
        return res.reset_index().values
    else:
        return []
$$;

You can find the full code here: https://github.com/anitagraser/carto-research-public/tree/master/movingpandas_carto_in_snowflake

GeoParquet in QGIS – smaller & faster files for the win!

By underdark

2024-11-23

Big Data, GIS, Movement data in GIS, QGIS

5 Comments

tldr; Tired of working with large CSV files? Give GeoParquet a try!

“Parquet is a powerful column-oriented data format, built from the ground up to as a modern alternative to CSV files.” https://geoparquet.org/

(Geo)Parquet is both smaller and faster than CSV. Additionally, (Geo)Parquet columns are typed. Text, numeric values, dates, geometries retain their data types. GeoParquet also stores CRS information and support in GIS solutions is growing.

I’ll be giving a quick overview using AIS data in GeoPandas 1.0.1 (with pyarrow) and QGIS 3.38 (with GDAL 3.9.2).

File size

The example AIS dataset for this demo contains ~10 million rows with 22 columns. I’ve converted the original zipped CSV into GeoPackage and GeoParquet using GeoPandas to illustrate the huge difference in file size: ~470 MB for GeoParquet and zipped CSV, 1.6 GB for CSV, and a whopping 2.6 GB for GeoPackage:

Reading performance

Pandas and GeoPandas both support selective reading of files, i.e. we can specify the specific columns to be loaded. This does speed up reading, even from CSV files:

	Whole file	Selected columns
CSV	27.9 s	13.1 s
Geopackage	2min 12s 😵	20.2 s
GeoParquet	7.2 s	4.1 s

Indeed, reading the whole GeoPackage is getting quite painful.

Here’s the code I used for timing the read times:

As you can see, these times include the creation of the GeoPandas.GeoDataFrame.

If we don’t need a GeoDataFrame, we can read the files even faster:

Non-spatial DataFrames

GeoParquet files can be read by non-GIS tools, such as Pandas. This makes it easier to collaborate with people who may not be familiar with geospatial data stacks.

And reading plain DataFrames is much faster than creating GeoDataFrames:

But back to GIS …

GeoParquet in QGIS

In QGIS, GeoParquet files can be loaded like any other vector layer, thanks to GDAL:

Loading the GeoParquet and GeoPackage files is pretty quick, especially if we zoom into a small region of interest (even though, unfortunately, it doesn’t seem possible to restrict the columns to further speed up loading). Loading the CSV, however, is pretty painful due to the lack of spatial indexing, which becomes apparent very quickly in the direct comparison:

(You can see how slowly the red CSV points are rendering. I didn’t have the patience to include the whole process in the GIF.)

As far as I can tell, my QGIS 3.38 ‘Grenoble’ does not support writing to or editing of GeoParquet files. So I’m limited to reading GeoParquet for now.

However, seeing how much smaller GeoParquets are compared to GeoPackages (and also faster to write), I hope that we will soon get the option to export to GeoParquet.

For now, I’ll start by converting my large CSV files to GeoParquet using GeoPandas.

LLM-based spatial analysis assistants for QGIS

By underdark

2024-10-06

GIS, LLM, QGIS

After the initial ChatGPT hype in 2023 (when we saw the first LLM-backed QGIS plugins, e.g. QChatGPT and QGPT Agent), there has been a notable slump in new development. As far as I can tell, none of the early plugins are actively maintained anymore. They were nice tech demos but with limited utility.

However, in the last month, I saw two new approaches for combining LLMs with QGIS that I want to share in this post:

IntelliGeo plugin: generating PyQGIS scripts or graphical models

At the QGIS User Conference in Bratislava, I had the pleasure to attend the “Large Language Models and GIS” workshop presented by Gustavo Garcia and Zehao Lu from the the University of Twente. There, they presented the IntelliGeo Plugin which enables the automatic generation of PyQGIS scripts and graphical models.

The workshop was packed. After we installed all dependencies and the plugin, it was exciting to test the graphical model generation capabilities. During the workshop, we used OpenAI’s API but the readme also mentions support for Cohere.

I was surprised to learn that even simple graphical models are actually pretty large files. This makes it very challenging to generate and/or modify models because they take up a big part of the LLM’s context window. Therefore, I expect that the PyQGIS script generation will be easier to achieve. But, of course, model generation would be even more impressive and useful since models are easier to edit for most users than code.

Image source: https://github.com/MahdiFarnaghi/intelli_geo

ChatGeoAI: chat with PyQGIS

ChatGeoAI is an approach presented in Mansourian, A.; Oucheikh, R. (2024). ChatGeoAI: Enabling Geospatial Analysis for Public through Natural Language, with Large Language Models. ISPRS Int. J. Geo-Inf., 13, 348.

It uses a fine-tuned Llama 2 model in combination with spaCy for entity recognition and WorldKG ontology to write PyQGIS code that can perform a variety of different geospatial analysis tasks on OpenStreetMap data.

The paper is very interesting, describing the LLM fine-tuning, integration with QGIS, and evaluation of the generated code using different metrics. However, as far as I can tell, the tool is not publicly available and, therefore, cannot be tested.

Image source: https://www.mdpi.com/2220-9964/13/10/348

Are you aware of more examples that integrate QGIS with LLMs? Please share them in the comments below. I’d love to hear about them.

Trajectools tutorial: trajectory preprocessing

By underdark

2024-09-21

GIS, Movement data in GIS, MovingPandas, QGIS, spatio-temporal data

Today marks the release of Trajectools 2.3 which brings a new set of algorithms, including trajectory generalizing, cleaning, and smoothing.

To give you a quick impression of what some of these algorithms would be useful for, this post introduces a trajectory preprocessing workflow that is quite general-purpose and can be adapted to many different datasets.

We start out with the Geolife sample dataset which you can find in the Trajectools plugin directory’s sample_data subdirectory. This small dataset includes 5908 points forming 5 trajectories, based on the trajectory_id field:

We first split our trajectories by observation gaps to ensure that there are no large gaps in our trajectories. Let’s make at cut at 15 minutes:

This splits the original 5 trajectories into 11 trajectories:

When we zoom, for example, to the two trajectories in the north western corner, we can see that the trajectories are pretty noisy and there’s even a spike / outlier at the western end:

If we label the points with the corresponding speeds, we can see how unrealistic they are: over 300 km/h!

Let’s remove outliers over 50 km/h:

Better but not perfect:

Let’s smooth the trajectories to get rid of more of the jittering.

(You’ll need to pip/mamba install the optional stonesoup library to get access to this algorithm.)

Depending on the noise values we chose, we get more or less smoothing:

Let’s zoom out to see the whole trajectory again:

Feel free to pan around and check how our preprocessing affected the other trajectories, for example:

Building spatial analysis assistants using OpenAI’s Assistant API

By underdark

2024-08-29

GIS, LLM

Earlier this year, I shared my experience using ChatGPT’s Data Analyst web interface for analyzing spatiotemporal data in the post “ChatGPT Data Analyst vs. Movement Data”. The Data Analyst web interface, while user-friendly, is not equipped to handle all types of spatial data tasks, particularly those involving more complex or large-scale datasets. Additionally, because the code is executed on a remote server, we’re limited to the libraries and tools available in that environment. I’ve often encountered situations where the Data Analyst simply doesn’t have access to the necessary libraries in its Python environment, which can be frustrating if you need specific GIS functionality.

Today, we’ll therefore start to explore alternatives to ChatGPT’s Data Analyst Web Interface, specifically, the OpenAI Assistant API. Later, I plan to dive deeper into even more flexible approaches, like Langchain’s Pandas DataFrame Agents. We’ll explore these options using spatial analysis workflow, such as:

Loading a zipped shapefile and investigate its content
Finding the three largest cities in the dataset
Selecting all cities in a region, e.g. in Scandinavia from the dataset
Creating static and interactive maps

To try the code below, you’ll need an OpenAI account with a few dollars on it. While gpt-3.5-turbo is quite cheap, using gpt-4o with the Assistant API can get costly fast.

OpenAI Assistant API

The OpenAI Assistant API allows us to create a custom data analysis environment where we can interact with our spatial datasets programmatically. To write the following code, I used the assistant quickstart and related docs (yes, shockingly, ChatGPT wasn’t very helpful for writing this code).

Like with Data Analyst, we need to upload the zipped shapefile to the server to make it available to the assistant. Then we can proceed to ask it questions and task it to perform analytics and create maps.

from openai import OpenAI

client = OpenAI()

file = client.files.create(
  file=open("H:/ne_110m_populated_places_simple.zip", "rb"),
  purpose='assistants'
)

Then we can hand the file over to the assistant:

assistant = client.beta.assistants.create(
  name="GIS Analyst",
  instructions="You are a personal GIS data analyst. Write and rund code to answer geospatial analysis questions",
  tools=[{"type": "code_interpreter"}],
  model="gpt-3.5-turbo",  # or "gpt-4o"
  tool_resources={
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }  
)

Then we can start asking questions and giving our assistant tasks:

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Please load the zipped shapefile and describe the content"
)

assistant > The shapefile has been successfully loaded as a GeoDataFrame with the following details:

- The GeoDataFrame has 243 entries.
- It contains a total of 38 columns with various data types including integer, float, and object types.
- One of the columns is a geometry column representing the spatial information.

If you have any specific analysis or visualizations you would like to perform on this data, feel free to let me know!

The docs have a nice example for how to stream the answers back, e.g. into a Jupyter notebook. I’ve used it to create a small convenience function:

def ask_another_question(question):
    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question
    )
    with client.beta.threads.runs.stream(
        thread_id=thread.id,
        assistant_id=assistant.id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

ask_another_question("Please list the three largest cities")

assistant > code_interpreter

# Sort the GeoDataFrame by the population in descending order
largest_cities = gdf.sort_values(by='pop_max', ascending=False).head(3)[['name', 'pop_max']]
largest_cities
assistant > The three largest cities in the dataset, along with their maximum population numbers, are as follows:

1. Tokyo: 35,676,000
2. New York: 19,040,000
3. Mexico City: 19,028,000

These cities have the highest recorded maximum population numbers in the dataset. Let me know if you need any further information or analysis!

ask_another_question("extract all scandinavian cities ")

assistant > To extract all the Scandinavian cities from the dataset, we can filter the GeoDataFrame based on the countries typically considered part of Scandinavia. These countries include Norway, Sweden, Denmark, Finland, and sometimes Iceland. Let's filter the GeoDataFrame to extract the Scandinavian cities.
assistant > code_interpreter

# List of countries typically considered part of Scandinavia
scandinavian_countries = ['Norway', 'Sweden', 'Denmark', 'Finland', 'Iceland']

# Filter the GeoDataFrame for Scandinavian cities
scandinavian_cities = gdf[gdf['adm0name'].isin(scandinavian_countries)][['name', 'adm0name']]

scandinavian_cities.head()  # Displaying the first few Scandinavian cities
assistant > Here are some of the Scandinavian cities extracted from the dataset:

1. Reykjavík, Iceland
2. Oslo, Norway
3. Helsinki, Finland
4. København (Copenhagen), Denmark
5. Stockholm, Sweden

These are just a few examples of Scandinavian cities present in the dataset. If you would like the complete list or perform any specific analysis on these cities, feel free to let me know!

ask_another_question(
    "please create a world map with all cities (use gray circles) "
    "with scandinavian cities highlighted as red circles. "
)

assistant > To create a world map with all cities represented as gray circles and Scandinavian cities highlighted as red circles, we will use the GeoPandas library for mapping. Let's go ahead and create the map.
assistant > code_interpreter

import matplotlib.pyplot as plt

# Plot all cities with gray circles and Scandinavian cities with red circles
fig, ax = plt.subplots(figsize=(12, 8))

# Plot all cities in gray circles
gdf.plot(ax=ax, color='lightgrey', markersize=5)

# Plot Scandinavian cities in red circles
scandinavian_cities.plot(ax=ax, color='red', markersize=50, edgecolor='black', label='Scandinavian Cities')

# Set the title
ax.set_title('World Map with Scandinavian Cities Highlighted', fontsize=16)

# Add legend
ax.legend()

# Display the map
plt.show()
assistant > It seems that there was an error while plotting the map because the GeoDataFrame `scandinavian_cities` does not have the necessary numeric data to plot the map directly.
...
plt.show()

output >

assistant > Here is the world map with all cities represented as gray circles and Scandinavian cities highlighted as red circles. The map provides a visual representation of the locations of the Scandinavian cities in relation to the rest of the cities around the world. If you need any further assistance or modifications, feel free to let me know!

To load and show the image, we can use:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def show_image():
    messages = client.beta.threads.messages.list(thread_id=thread.id)

    for m in messages.data:
        if m.role == "user":
            continue
        if m.content[0].type == 'image_file':
            m.content[0].image_file.file_id
            image_data = client.files.content(messages.data[0].content[0].image_file.file_id)
            image_data_bytes = image_data.read()
            with open("./out/my-image.png", "wb") as file:
                file.write(image_data_bytes)
            image = mpimg.imread("./out/my-image.png")
            plt.imshow(image)
            plt.box(False)
            plt.xticks([])
            plt.yticks([])
            plt.show() 
            break

Asking for an interactive map in an html file works in a similar fashion.

You can see the whole analysis workflow it in action here:

This way, we can use ChatGPT to perform data analysis from the comfort of our Jupyter notebooks. However, it’s important to note that, like the Data Analyst, the code we execute with the Assistant API runs on a remote server. So, again, we are restricted to the libraries available in that server environment. This is an issue we will address next time, when we look into Langchain.

Conclusion

ChatGPT’s Data Analyst Web Interface and the OpenAI Assistant API both come with their own advantages and disadvantages.

The results can be quite random. In the Scandinavia example, every run can produce slightly different results. Sometimes the results just use different assumptions such as, e.g. Finland and Iceland being part of Scandinavia or not, other times, they can be outright wrong.

As always, I’m interested to hear your experiences and thoughts. Have you been testing the LLM plugins for QGIS when they originally came out?

MovingPandas 0.19 released!

By underdark

2024-08-23

MovingPandas

This release is the first to support GeoPandas 1.0.

Additionally, this release adds multiple new features, including:

New explore() function adds interactive folium / leaflet maps
New support for mfjson trajectory

For the full change log, check out the release page.

We have also revamped the documentation at https://movingpandas.readthedocs.io/ using the PyData Sphinx Theme:

On a related note: if you know what I need to change to get all Trajectory functions listed in the TOC on the right, please let me know.

Trajectools 2.2 released

By underdark

2024-07-12

Movement data in GIS, QGIS, Trajectools

If you downloaded Trajectools 2.1 and ran into troubles due to the introduced scikit-mobility and gtfs_functions dependencies, please update to Trajectools 2.2.

This new version makes it easier to set up Trajectools since MovingPandas is pip-installable on most systems nowadays and scikit-mobility and gtfs_functions are now truly optional dependencies. If you don’t install them, you simply will not see the extra algorithms they add:

If you encounter any other issues with Trajectools or have questions regarding its usage, please let me know in the Trajectools Discussions on Github.

New MovingPandas tutorial: taking OGC Moving Features full circle with MF-JSON

By underdark

2024-07-08

MovingPandas, OGC

Last week, I had the pleasure to meet some of the people behind the OGC Moving Features Standard Working group at the IEEE Mobile Data Management Conference (MDM2024). While chatting about the Moving Features (MF) support in MovingPandas, I realized that, after the MF-JSON update & tutorial with official sample post, we never published a complete tutorial on working with MF-JSON encoded data in MovingPandas.

The current MovingPandas development version (to be release as version 0.19) supports:

Reading MF-JSON MovingPoint (single trajectory features and trajectory collections)
Reading MF-JSON Trajectory
Writing MovingPandas Trajectories and TrajectoryCollections to MF-JSON MovingPoint

This means that we can now go full circle: reading — writing — reading.

Reading MF-JSON

Both MF-JSON MovingPoint encoding and Trajectory encoding can be read using the MovingPandas function read_mf_json(). The complete Jupyter notebook for this tutorial is available in the project repo.

Here we read one of the official MF-JSON MovingPoint sample files:

traj = mpd.read_mf_json('data/movingfeatures.json')

Writing MF-JSON

To write MF-JSON, the Trajectory and TrajectoryCollection classes provide a to_mf_json() function:

The resulting Python dictionary in MF-JSON MovingPoint encoding can then be saved to a JSON file, and then read again:

import json
with open('mf1.json', 'w') as json_file:
    json.dump(mf_json, json_file, indent=4)

Similarly, we can read any arbitrary trajectory data set and save it to MF-JSON.

For example, here we use our usual Geolife sample:

gdf = gp.read_file('data/demodata_geolife.gpkg')
tc = mpd.TrajectoryCollection(gdf, 'trajectory_id', t='t')
mf_json = tc.to_mf_json(temporal_columns=['sequence'])

And reading again

import json
with open('mf5.json', 'w') as json_file:
    json.dump(mf_json, json_file, indent=4)
tc = mpd.read_mf_json('mf5.json', traj_id_property='trajectory_id' )

Conclusion

The implemented MF-JSON support covers the basic usage of the encodings. There are some fine details in the standard, such as the distinction of time-varying attribute with linear versus step-wise interpolation, which MovingPandas currently does not support.

If you are working with movement data, I would appreciate if you can give the improved MF-JSON support a spin and report back with your experiences.

—Free and Open Source GIS Ramblings

written by Anita Graser aka Underdark

Archive

GIS

Geocomputation with Python: now in print!

Trajectools 2.4 release

Urban mobility insights with MovingPandas & CARTO in Snowflake

GeoParquet in QGIS – smaller & faster files for the win!

File size

Reading performance

Non-spatial DataFrames

GeoParquet in QGIS

More reading

LLM-based spatial analysis assistants for QGIS

IntelliGeo plugin: generating PyQGIS scripts or graphical models

ChatGeoAI: chat with PyQGIS

Trajectools tutorial: trajectory preprocessing

Building spatial analysis assistants using OpenAI’s Assistant API

OpenAI Assistant API

Conclusion

MovingPandas 0.19 released!

Trajectools 2.2 released

New MovingPandas tutorial: taking OGC Moving Features full circle with MF-JSON

Reading MF-JSON

Writing MF-JSON

And reading again

Conclusion