Archive

AI

At the end of yesterday’s TimeGPT for mobility post, we concluded that TimeGPT’s trainingset probably included a copy of the popular BikeNYC timeseries dataset and that, therefore, we were not looking at a fair comparison.

Naturally, it’s hard to find mobility timeseries datasets online that haven’t been widely disseminated and therefore may have slipped past the scrapers of foundation model builders.

So I scoured the Austrian open government data portal and came up with a bike-share dataset from Vienna.

Dataset

SharedMobility.ai dataset published by Philipp Naderer-Puiu, covering 2019-05-05 to 2019-12-31.

Here are eight of the 120 stations in the dataset. I’ve resampled the number of available bicycles to the maximum hourly value and made a cutoff mid August (before a larger data collection cap and the less busy autumn and winter seasons):

Models

To benchmark TimeGPT, I computed different baseline predictions. I used statsforecast’s HistoricAverage, SeasonalNaive, and AutoARIMA models and computed predictions for horizons of 1 hour, 12 hours, and 24 hours.

Here are examples of the 12-hour predictions:

We can see how Historic Average is pretty much a straight line of the average past value. A little more sophisticated, SeasonalNaive assumes that the future will be a repeat of the past (i.e. the previous day), which results in the shifted curve we can see in the above examples. Finally, there’s AutoARIMA which seems to do a better job than the first two models but also takes much longer to compute.

For comparison, here’s TimeGPT with 12 hours horizon:

You can find the full code in https://github.com/anitagraser/ST-ResNet/blob/570d8a1af4a10c7fb2230ccb2f203307703a9038/experiment.ipynb

Results

In the following table, you’ll find the best model highlighted in bold. Unsurprisingly, this best model is for the 1 hour horizon. The best models for 12 and 24 hours are marked in italics.

ModelHorizonRMSE
HistoricAverage17.0229
HistoricAverage127.0195
HistoricAverage247.0426
SeasonalNaive17.8703
SeasonalNaive127.7317
SeasonalNaive247.8703
AutoARIMA12.2639
AutoARIMA125.1505
AutoARIMA246.3881
TimeGPT12.3193
TimeGPT124.8383
TimeGPT245.6671

AutoARIMA and TimeGPT are pretty closely tied. Interestingly, the SeasonalNaive model performs even worse than the very simple HistoricAverage, which is an indication of the irregular nature of the observed phenomenon (probably caused by irregular restocking of stations, depending on the system operator’s decisions).

Conclusion & next steps

Overall, TimeGPT struggles much more with the longer horizons than in the previous BikeNYC experiment. The error more than doubled between the 1 hour and 12 hours prediction. TimeGPT’s prediction quality barely out-competes AutoARIMA’s for 12 and 24 hours.

I’m tempted to test AutoARIMA for the BikeNYC dataset to further complete this picture.

Of course, the SharedMobility.ai dataset has been online for a while, so I cannot be completely sure that we now have a fair comparison. For that, we would need a completely new / previously unpublished dataset.


For a more thorough write-up, head along to Graser, A. (2025). Timeseries Foundation Models for Mobility: A Benchmark Comparison with Traditional and Deep Learning Models. arXiv preprint arXiv:2504.03725.

tldr; Maybe. Preliminary results certainly are impressive.

Introduction

Crowd and flow predictions have been very popular topics in mobility data science. Traditional forecasting methods rely on classic machine learning models like ARIMA, later followed by deep learning approaches such as ST-ResNet.

More recently, foundation models for timeseries forecasting, such as TimeGPT, Chronos, and LagLlama have been introduced. A key advantage of these models is their ability to generate zero-shot predictions — meaning that they can be applied directly to new tasks without requiring retraining for each scenario.

In this post, I want to compare TimeGPTโ€™s performance against traditional approaches for predicting city-wide crowd flows.

Experiment setup

The experiment builds on the paper “Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction” by Zhang et al. (2017). The original repo referenced on the homepage does not exist anymore. Therefore, I forked: https://github.com/topazape/ST-ResNet as a starting point.

The goals of this experiment are to:

  1. Get an impression how TimeGPT predicts mobility timeseries.
  2. Compare TimeGPT to classic machine learning (ML) and deep learning (DL) models.
  3. Understand how different forecasting horizons impact predictive accuracy.

The paper presents results for two datasets (TaxiBJ and BikeNYC). The following experiment only covers BikeNYC.

You can find the full notebook at https://github.com/anitagraser/ST-ResNet/blob/079948bfbab2d512b71abc0b1aa4b09b9de94f35/experiment.ipynb

First attempt

In the first version, I applied TimeGPTโ€™s historical forecast function to generate flow predictions. However, there was an issue: the built-in historic forecast function ignores the horizon parameter, thus making it impossible to control the horizon and make a fair comparison.

Refinements

In the second version, I therefore added backtesting with customizable forecast horizon to evaluate TimeGPTโ€™s forecasts over multiple time windows.

To reproduce the original experiments as truthfully as possible, both inflows and outflows were included in the experiments.

I ran TimeGPT for different forecasting horizons: 1 hour, 12 hours, and 24 hours. (In the original paper (Zhang et al. 2017), only one-step-ahead (1 hour) forecasting is performed but it is interesting to explore the effects of the additional challenge resulting from longer forecast horizons.) Here’s an example of the 24-hour forecast:

The predictions pick up on the overall daily patterns but the peaks are certainly hit-and-miss.

For comparison, here are some results for the easier 1-hour forecast:

Not bad. Let’s run the numbers! (And by that I mean: let’s measure the error.)

Results 

The original paper provides results (RMSE, i.e. smaller is better) for multiple traditional ML models and DL models. Addition our experiments to these results, we get:

ModelRMSE
ARIMA10.56
SARIMA10.07
VAR9.92
DeepST-C8.39
DeepST-CP7.64
DeepST-CPT7.56
DeepST-CPTM7.43
ST-ResNet6.33
TimeGPT (horizon=1)5.70
TimeGPT (horizon=12)7.62
TimeGPT (horizon=24)8.93

Key takeaways

  • TimeGPT with a 1 hour horizon outperforms all ML and DL models.
  • For longer horizons, TimeGPT’s accuracy declines but remains competitive with DL approaches.
  • TimeGPTโ€™s pre-trained nature means that we can immediately make predictions without any prior training.ย 

Conclusion & next steps

These preliminary results suggest that timeseries foundation models, such as TimeGPT, are a promising tool. However, a key limitation of the presented experiment remains: since BikeNYC data has been public for a long time, it is well possible that TimeGPT has seen this dataset during its training. This raises questions about how well it generalizes to truly unseen datasets. To address this, the logical next step would be to test TimeGPT and other foundation models on an entirely new dataset to better evaluate its robustness.

We also know that DL model performance can be improved by providing more training data. It is therefore reasonable to assume that specialized DL models will outperform foundation models once they are trained with enough data. But in the absence of large-enough training datasets, foundation models can be an option.

In recent literature, we also find more specific foundation models for spatiotemporal prediction, such as UrbanGPT https://arxiv.org/abs/2403.00813, UniST https://arxiv.org/abs/2402.11838, and UrbanDiT https://arxiv.org/pdf/2411.12164. However, as far as I can tell, none of them have published the model weights.

If you want to join forces, e.g. add more datasets or test other timeseries foundation models, don’t hesitate to reach out.


Part 2: The quest for a fair TimeGPT benchmark

It’s been a while since my post on geo and the AI hype in 2019. Back then, I didn’t use the term “GeoAI”, even though it has certainly been around for a while (including, e.g., with dedicated SIGSPATIAL workshops since 2017).

GeoAI isn’t one single thing. It’s an umbrella term, including: “AI for Geo” (using AI methods in Geo, e.g. deep learning for object recognition in remote sensing images) and “Geo for AI” (integrating geographic concepts into AI models, e.g. by building spatially explicit models). [Zhang 2020] [Li et al. 2024]

Today’s post is a collection of key GeoAI developments I’m aware of. If I missed anything you are excited about, please let me know here in the comments or over on Mastodon.

Background

A week ago, I had the pleasure to attend a “Specialist Meeting” on GeoAI here in Vienna, meeting over 40 researchers from around the world, from Master students to professor emeritus. Huge props to Jano (Prof. Krzysztof Janowicz) and his team at Uni Wien for bringing this awesome group of people together.

The elephant in the room: LLMs

Unsurprisingly, LLMs and the claims they make about geography are a mayor issue due to the mistakes they make and the biases behind them. An infamous example is AI’s issue with understanding topology:

Image source: Janowicz, K. (2023). Philosophical Foundations of GeoAI: Exploring Sustainability, Diversity, and Bias in GeoAI and Spatial Data Science. arXiv e-prints, arXiv-2304.

Even if recent versions of ChatGPT (such as GTP 4o) do a better job with this specific example, this doesn’t make their answers reliable. So between the trustworthiness, reproducibility, explainability, and sustainability issues … LLMs have a long way to go. And it’s not clear whether they are going in the right direction right now.

Geospatial foundation models

Prithvi, a model developed by NASA, IBM, et al. in 2023, is one of the first geospatial foundation models. Like much of GeoAI, Prithvi deals with remote sensing data. Specifically, it is trained on Landsat and Sentinel-2 (HLS) imagery, with applications in flood mapping and wildfire prediction. And maybe best of all: the model is open-source and publicly available.

Spatiotemporal machine learning model specifications

In the general AI community, model cards have become a common way to share information about models. However, identifying the right model for spatiotemporal tasks is hard since there are no standardized descriptions in existing model catalogs (e.g. Hugging Face, DLHub or MLFlow). To address this issue, [Charette-Migneault et al. 2024] have proposed the Machine Learning Model (MLM) extension for the SpatioTemporal Asset Catalogs (STAC). But, yet again, this development is targeting models trained with remote sensing imagery.

Similarly, the OGC Training Data Markup Language for Artificial Intelligence (TrainingDML-AI) and its ISO equivalent are limited to EO as well …

Spatial knowledge graphs

For those among us working mostly with vector data, the KnowWhereGraph is an interesting development. It’s the first geo-enriched knowledge graph [Janowicz et al. 2022] that helps answer geospatial questions by integrating a variety of spatial datasets through hierarchical grids, standard region boundaries and appropriate ontology and knowledge graph schema development. However, so far, the KnowWhereGraph is mostly limited to the United States.

Explainable AI (XAI) and geo

While answers from knowledge graphs are intrinsically explainable, many other (Geo)AI solutions are built on AI approaches that result in black box models.

Graph neural networks (GNNs) have become very popular in GeoAI (including in urban analytics and mobility [Jalali et al. 2023] [Liu et al. 2024]) but their black box nature limits their practical usefulness in domains where transparency and trustworthiness are crucial. To offer insights into how model predictions are made, [Liu et al. 2024] propose a spatially explicit GeoAI-based method that combines a graph convolutional network and a graph-based XAI method, called GNNExplainer to explore the correlation between urban objects.

Reproducibility et al.

The AI hype in geo is still going strong. Journals are being flooded with paper submissions and good reviewers are hard to come by. In many geo-related venues, it is still acceptable to present an AI paper without making code or model available. (We recently discussed this issue for mobility AI specifically [Graser et al. 2024].)

I’m convinced we can and should do better: quality over quantity, moving steadily, building and fixing things.

References

After the initial ChatGPT hype in 2023 (when we saw the first LLM-backed QGIS plugins, e.g. QChatGPT and QGPT Agent), there has been a notable slump in new development. As far as I can tell, none of the early plugins are actively maintained anymore. They were nice tech demos but with limited utility.

However, in the last month, I saw two new approaches for combining LLMs with QGIS that I want to share in this post:

IntelliGeo plugin: generatingย PyQGIS scriptsย or graphical models

At the QGIS User Conference in Bratislava, I had the pleasure to attend the “Large Language Models and GIS” workshop presented by Gustavo Garcia and Zehao Lu from the the University of Twente. There, they presented the IntelliGeo Plugin which enables the automatic generation of PyQGIS scripts and graphical models.

The workshop was packed. After we installed all dependencies and the plugin, it was exciting to test the graphical model generation capabilities. During the workshop, we used OpenAI’s API but the readme also mentions support for Cohere.

I was surprised to learn that even simple graphical models are actually pretty large files. This makes it very challenging to generate and/or modify models because they take up a big part of the LLM’s context window. Therefore, I expect that the PyQGIS script generation will be easier to achieve. But, of course, model generation would be even more impressive and useful since models are easier to edit for most users than code.

Image source: https://github.com/MahdiFarnaghi/intelli_geo

ChatGeoAI:ย chat with PyQGIS

ChatGeoAI is an approach presented in Mansourian, A.; Oucheikh, R. (2024). ChatGeoAI: Enabling Geospatial Analysis for Public through Natural Language, with Large Language Models.ย ISPRS Int. J. Geo-Inf.,ย 13, 348.

It uses a fine-tuned Llama 2 model in combination with spaCy for entity recognition and WorldKG ontology to write PyQGIS code that can perform a variety of different geospatial analysis tasks on OpenStreetMap data.

The paper is very interesting, describing the LLM fine-tuning, integration with QGIS, and evaluation of the generated code using different metrics. However, as far as I can tell, the tool is not publicly available and, therefore, cannot be tested.

Image source: https://www.mdpi.com/2220-9964/13/10/348

Are you aware of more examples that integrate QGIS with LLMs? Please share them in the comments below. I’d love to hear about them.

Earlier this year, I shared my experience using ChatGPTโ€™s Data Analyst web interface for analyzing spatiotemporal data in the post “ChatGPT Data Analyst vs. Movement Data”. The Data Analyst web interface, while user-friendly, is not equipped to handle all types of spatial data tasks, particularly those involving more complex or large-scale datasets. Additionally, because the code is executed on a remote server, we’re limited to the libraries and tools available in that environment. I’ve often encountered situations where the Data Analyst simply doesnโ€™t have access to the necessary libraries in its Python environment, which can be frustrating if you need specific GIS functionality.

Today, we’ll therefore start to explore alternatives to ChatGPTโ€™s Data Analyst Web Interface, specifically, the OpenAI Assistant API. Later, I plan to dive deeper into even more flexible approaches, like Langchain’s Pandas DataFrame Agents. Weโ€™ll explore these options using spatial analysis workflow, such as:

  1. Loading a zipped shapefile and investigate its content
  2. Finding the three largest cities in the dataset
  3. Selecting all cities in a region, e.g. in Scandinavia from the dataset
  4. Creating static and interactive maps

To try the code below, you’ll need an OpenAI account with a few dollars on it. While gpt-3.5-turbo is quite cheap, using gpt-4o with the Assistant API can get costly fast.

OpenAI Assistant API

The OpenAI Assistant API allows us to create a custom data analysis environment where we can interact with our spatial datasets programmatically. To write the following code, I used the assistant quickstart and related docs (yes, shockingly, ChatGPT wasn’t very helpful for writing this code).

Like with Data Analyst, we need to upload the zipped shapefile to the server to make it available to the assistant. Then we can proceed to ask it questions and task it to perform analytics and create maps.

from openai import OpenAI

client = OpenAI()

file = client.files.create(
  file=open("H:/ne_110m_populated_places_simple.zip", "rb"),
  purpose='assistants'
)

Then we can hand the file over to the assistant:

assistant = client.beta.assistants.create(
  name="GIS Analyst",
  instructions="You are a personal GIS data analyst. Write and rund code to answer geospatial analysis questions",
  tools=[{"type": "code_interpreter"}],
  model="gpt-3.5-turbo",  # or "gpt-4o"
  tool_resources={
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }  
)

Then we can start asking questions and giving our assistant tasks:

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
  thread_id=thread.id,
  role="user",
  content="Please load the zipped shapefile and describe the content"
)
assistant > The shapefile has been successfully loaded as a GeoDataFrame with the following details:

- The GeoDataFrame has 243 entries.
- It contains a total of 38 columns with various data types including integer, float, and object types.
- One of the columns is a geometry column representing the spatial information.

If you have any specific analysis or visualizations you would like to perform on this data, feel free to let me know!

The docs have a nice example for how to stream the answers back, e.g. into a Jupyter notebook. I’ve used it to create a small convenience function:

def ask_another_question(question):
    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question
    )
    with client.beta.threads.runs.stream(
        thread_id=thread.id,
        assistant_id=assistant.id,
        event_handler=EventHandler(),
    ) as stream:
        stream.until_done()

ask_another_question("Please list the three largest cities")
assistant > code_interpreter

# Sort the GeoDataFrame by the population in descending order
largest_cities = gdf.sort_values(by='pop_max', ascending=False).head(3)[['name', 'pop_max']]
largest_cities
assistant > The three largest cities in the dataset, along with their maximum population numbers, are as follows:

1. Tokyo: 35,676,000
2. New York: 19,040,000
3. Mexico City: 19,028,000

These cities have the highest recorded maximum population numbers in the dataset. Let me know if you need any further information or analysis!
ask_another_question("extract all scandinavian cities ")
assistant > To extract all the Scandinavian cities from the dataset, we can filter the GeoDataFrame based on the countries typically considered part of Scandinavia. These countries include Norway, Sweden, Denmark, Finland, and sometimes Iceland. Let's filter the GeoDataFrame to extract the Scandinavian cities.
assistant > code_interpreter

# List of countries typically considered part of Scandinavia
scandinavian_countries = ['Norway', 'Sweden', 'Denmark', 'Finland', 'Iceland']

# Filter the GeoDataFrame for Scandinavian cities
scandinavian_cities = gdf[gdf['adm0name'].isin(scandinavian_countries)][['name', 'adm0name']]

scandinavian_cities.head()  # Displaying the first few Scandinavian cities
assistant > Here are some of the Scandinavian cities extracted from the dataset:

1. Reykjavรญk, Iceland
2. Oslo, Norway
3. Helsinki, Finland
4. Kรธbenhavn (Copenhagen), Denmark
5. Stockholm, Sweden

These are just a few examples of Scandinavian cities present in the dataset. If you would like the complete list or perform any specific analysis on these cities, feel free to let me know!
ask_another_question(
    "please create a world map with all cities (use gray circles) "
    "with scandinavian cities highlighted as red circles. "
)
assistant > To create a world map with all cities represented as gray circles and Scandinavian cities highlighted as red circles, we will use the GeoPandas library for mapping. Let's go ahead and create the map.
assistant > code_interpreter

import matplotlib.pyplot as plt

# Plot all cities with gray circles and Scandinavian cities with red circles
fig, ax = plt.subplots(figsize=(12, 8))

# Plot all cities in gray circles
gdf.plot(ax=ax, color='lightgrey', markersize=5)

# Plot Scandinavian cities in red circles
scandinavian_cities.plot(ax=ax, color='red', markersize=50, edgecolor='black', label='Scandinavian Cities')

# Set the title
ax.set_title('World Map with Scandinavian Cities Highlighted', fontsize=16)

# Add legend
ax.legend()

# Display the map
plt.show()
assistant > It seems that there was an error while plotting the map because the GeoDataFrame `scandinavian_cities` does not have the necessary numeric data to plot the map directly.
...
plt.show()

output >

assistant > Here is the world map with all cities represented as gray circles and Scandinavian cities highlighted as red circles. The map provides a visual representation of the locations of the Scandinavian cities in relation to the rest of the cities around the world. If you need any further assistance or modifications, feel free to let me know!

To load and show the image, we can use:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def show_image():
    messages = client.beta.threads.messages.list(thread_id=thread.id)

    for m in messages.data:
        if m.role == "user":
            continue
        if m.content[0].type == 'image_file':
            m.content[0].image_file.file_id
            image_data = client.files.content(messages.data[0].content[0].image_file.file_id)
            image_data_bytes = image_data.read()
            with open("./out/my-image.png", "wb") as file:
                file.write(image_data_bytes)
            image = mpimg.imread("./out/my-image.png")
            plt.imshow(image)
            plt.box(False)
            plt.xticks([])
            plt.yticks([])
            plt.show() 
            break

Asking for an interactive map in an html file works in a similar fashion.

You can see the whole analysis workflow it in action here:

This way, we can use ChatGPT to perform data analysis from the comfort of our Jupyter notebooks. However, it’s important to note that, like the Data Analyst, the code we execute with the Assistant API runs on a remote server. So, again, we are restricted to the libraries available in that server environment. This is an issue we will address next time, when we look into Langchain.

Conclusion

ChatGPTโ€™s Data Analyst Web Interface and the OpenAI Assistant API both come with their own advantages and disadvantages.

The results can be quite random. In the Scandinavia example, every run can produce slightly different results. Sometimes the results just use different assumptions such as, e.g. Finland and Iceland being part of Scandinavia or not, other times, they can be outright wrong.

As always, Iโ€™m interested to hear your experiences and thoughts. Have you been testing the LLM plugins for QGIS when they originally came out?

Today, I took ChatGPT’s Data Analyst for a spin. You’ve probably seen the fancy advertising videos: just drop in a dataset and AI does all the analysis for you?! Let’s see …

Of course, I’m not going to use some lame movie database or flower petals data. Instead, let’s go all in and test with a movement dataset.

You don’t get a second chance to make a first impression, they say. — Well, Data Analyst, you didn’t impress on the first try. How hard can it be to guess the delimiter and act accordingly?

Anyway, let’s help it a little:

That looks much better. It makes an effort to guess what the columns could mean and successfully identifies the spatiotemporal information.

Now for some spatial analysis. On first try, it didn’t want to calculate the length of the trajectories in geographic terms, but we can make it to:

It will also show the code used to get to the results:

And indeed, these are close enough to the results computed using MovingPandas:

“What about plots?” I hear you ask.

For a first try, not bad at all:

Let’s see if we can push it further:

Looks like poor Data Analyst ended up in geospatial library dependency hell ๐Ÿ˜ˆ

It’s interesting to watch it try find a solution.

Alas, no background map appears:

Not giving up yet :)

Woah, what happened here? It claims it created an interactive map in an HTML file.

And indeed it did:

This has been a very interesting experiment for me with many highs and lows. The whole process is a bit hit and miss. But when it does work, it’s fun.

I wasn’t sure what to expect with regards to Data Analyst’s spatial data processing capabilities. Looks like there are enough examples in its training data to find solutions for the basic trajectory analysis problems I asked it solve today, eventually, at least.

What’s the conclusion? Most AI marketing videos are severely overselling the capabilities of these tools. However, that doesn’t mean that they are completely useless, either. I’m looking forward to seeing the age of smaller open source models specifically trained for geospatial analysis to finally make it unnecessary for humans to memorize data analysis library syntax.