GeoParquet in QGIS – smaller & faster files for the win!
tldr; Tired of working with large CSV files? Give GeoParquet a try!
“Parquet is a powerful column-oriented data format, built from the ground up to as a modern alternative to CSV files.” https://geoparquet.org/
(Geo)Parquet is both smaller and faster than CSV. Additionally, (Geo)Parquet columns are typed: text, numeric values, dates, and geometries all retain their data types. GeoParquet also stores CRS information, and support in GIS solutions is growing.
I’ll be giving a quick overview using AIS data in GeoPandas 1.0.1 (with pyarrow) and QGIS 3.38 (with GDAL 3.9.2).
File size
The example AIS dataset for this demo contains ~10 million rows with 22 columns. I’ve converted the original zipped CSV into GeoPackage and GeoParquet using GeoPandas to illustrate the huge difference in file size: ~470 MB each for GeoParquet and the zipped CSV, 1.6 GB for the unzipped CSV, and a whopping 2.6 GB for GeoPackage.

Reading performance
Pandas and GeoPandas both support selective reading of files, i.e. we can specify which columns to load. This speeds up reading, even from CSV files:
| | Whole file | Selected columns |
| CSV | 27.9 s | 13.1 s |
| GeoPackage | 2 min 12 s 😵 | 20.2 s |
| GeoParquet | 7.2 s | 4.1 s |
Indeed, reading the whole GeoPackage is getting quite painful.
Here’s the code I used for timing the read times:

As you can see, these times include the creation of the GeoPandas.GeoDataFrame.
If we don’t need a GeoDataFrame, we can read the files even faster:
Non-spatial DataFrames
GeoParquet files can be read by non-GIS tools, such as Pandas. This makes it easier to collaborate with people who may not be familiar with geospatial data stacks.
And reading plain DataFrames is much faster than creating GeoDataFrames:

But back to GIS …
GeoParquet in QGIS
In QGIS, GeoParquet files can be loaded like any other vector layer, thanks to GDAL:

Loading the GeoParquet and GeoPackage files is pretty quick, especially if we zoom into a small region of interest (even though, unfortunately, it doesn’t seem possible to restrict the columns to further speed up loading). Loading the CSV, however, is pretty painful due to the lack of spatial indexing, which becomes apparent very quickly in the direct comparison:

As far as I can tell, my QGIS 3.38 ‘Grenoble’ does not support writing to or editing of GeoParquet files. So I’m limited to reading GeoParquet for now.
However, seeing how much smaller GeoParquets are compared to GeoPackages (and also faster to write), I hope that we will soon get the option to export to GeoParquet.
For now, I’ll start by converting my large CSV files to GeoParquet using GeoPandas.
More reading
If you’re into GeoJSON and/or PyGeoAPI, check out Joana Simoes’ post: “Navigating GeoParquet: Lessons Learned from the eMOTIONAL Cities Project”
And if you want to see a global dataset example, have a look at Matt Travis’ presentation using Overture data.
Hello!
I’m surprised to read that you found QGIS 3.38 ‘Grenoble’ doesn’t have the ability to save GeoParquet files.
Only because I was surprised this week to discover just that feature in my LTR version QGIS 3.34.9 ‘Prizren’ + GDAL 3.9.1! Perhaps it was removed? I found it in the Format dropdown under Layer > Save As.
I actually found your blog while searching for information on the indexing used (i.e. whether it indexes & bins the features based on bounding box extents).
Some operating systems / versions have it, others don’t. I don’t yet know the exact requirements. Users of conda on Ubuntu seem to not have access to GeoParquet writing yet.
It seems QGIS might be faster at reading GeoPackages if you make a tiny adjustment to them:
https://gis.stackexchange.com/questions/338478/reading-geopackage-with-large-feature-classes-is-very-slow
I’m curious whether the timings in this post are for a slower reading time or for an already optimal read.
Thanks for the pointer, Paulo. Very interesting. The reading times in the table above are for GeoPandas (and not for QGIS). Reading times in QGIS were quite OK, if I remember correctly.
And checking for the geometry type, it already is specific, i.e. POINT:
sqlite> SELECT ROWID, "table_name", "column_name", "geometry_type_name", "srs_id", "z", "m" FROM "gpkg_geometry_columns" ORDER BY ROWID
   ...> ;
1|aisdk_20170101|geom|POINT|4326|0|0