The journey continues: QgsArrowIterator is now merged! This makes it possible to iterate over QgsFeatures as Arrow batches.
This is where we are now, quoting Dewey Dunnington:
```python
import geopandas
from nanoarrow.c_array import allocate_c_array
import qgis
from qgis.core import QgsVectorLayer

# Create a vector layer
layer = QgsVectorLayer("tests/testdata/zonalstatistics/polys.shp", "layer_name", "ogr")

# Infer an Arrow schema from the layer and attach it to an iterator over its features
schema = qgis.core.QgsArrowIterator.inferSchema(layer)
it = qgis.core.QgsArrowIterator(layer.getFeatures())
it.setSchema(schema, 1)

# Allocate an empty C array with nanoarrow, export the schema into it,
# then fill it with up to 5 features
c_array = allocate_c_array()
schema.exportToAddress(c_array.schema._addr())
it.nextFeatures(5, c_array._addr())

print(geopandas.GeoDataFrame.from_arrow(c_array))
#> lev3_name geometry
#> 0 poly_1 MULTIPOLYGON (((100.37934 -0.96049, 100.37934 ...
#> 1 poly_2 MULTIPOLYGON (((100.37944 -0.96044, 100.37955 ...
#> 2 poly_3 MULTIPOLYGON (((100.37938 -0.96049, 100.37949 ...

print(geopandas.read_file("tests/testdata/zonalstatistics/polys.shp"))
#> lev3_name geometry
#> 0 poly_1 POLYGON ((100.37934 -0.96049, 100.37934 -0.960...
#> 1 poly_2 POLYGON ((100.37944 -0.96044, 100.37955 -0.960...
#> 2 poly_3 POLYGON ((100.37938 -0.96049, 100.37949 -0.960...
```
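In the example, nextFeatures(5, ...) asks for five features but the layer only has three, so a single three-row batch comes back. The idea behind an Arrow batch is that rows (features) are regrouped into columns. The pure-Python sketch below only illustrates that chunk-and-transpose idea. It is not how QgsArrowIterator is implemented (that is C++ code inside QGIS), it uses neither QGIS nor Arrow, and the helper name iter_column_batches and the placeholder geometry strings are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_column_batches(
    features: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Hypothetical helper: yield column-oriented batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(features)
    while True:
        # Pull up to batch_size rows; fewer come back near the end of the data
        rows = list(islice(it, batch_size))
        if not rows:
            return
        # Transpose rows into columns, since Arrow stores data column by column
        yield {name: [row[name] for row in rows] for name in rows[0]}


# Stand-in rows; the geometry values are placeholder strings, not real geometries
features = [{"lev3_name": f"poly_{i}", "geometry": f"<geometry {i}>"} for i in (1, 2, 3)]

for batch in iter_column_batches(features, 5):
    print(batch["lev3_name"])  # prints ['poly_1', 'poly_2', 'poly_3']
```

With a batch size of 2, the same three rows would instead come back as two batches of 2 and 1 rows.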
Further improvements are already being planned. To quote from the ticket:
“The final state after this improvement would be a compact way for Arrow Python consumers like GeoPandas to ergonomically consume a layer.” The ticket sketches two possible forms, either:
```python
geopandas.GeoDataFrame.from_arrow(qgis_layer_object)
```
or:
```python
geopandas.GeoDataFrame.from_arrow(qgis_layer_object.getArrowStream())
```
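GeoPandas' from_arrow accepts any object that implements the Arrow PyCapsule interface, meaning it has an `__arrow_c_stream__` or `__arrow_c_array__` method. One plausible way for either form to work would be for the QGIS object to expose such a method. This is an assumption, not something the ticket text quoted here specifies. The sketch below only shows the duck-typing check a consumer performs. HypotheticalLayerStream is an invented stand-in and not a QGIS API.

```python
from typing import Optional


def arrow_protocol(obj: object) -> Optional[str]:
    """Report which Arrow PyCapsule method an object exposes, if any."""
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"  # a stream of record batches (ArrowArrayStream)
    if hasattr(obj, "__arrow_c_array__"):
        return "array"  # a single ArrowSchema/ArrowArray pair
    return None


class HypotheticalLayerStream:
    """Invented stand-in for a future QGIS object; not a real QGIS class."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer must return a PyCapsule named "arrow_array_stream"
        # wrapping an ArrowArrayStream struct; this stub only marks the intent.
        raise NotImplementedError


print(arrow_protocol(HypotheticalLayerStream()))  # prints stream
print(arrow_protocol(object()))  # prints None
```

Exposing a stream, rather than a single array, would let a consumer pull a large layer batch by batch instead of materializing it all at once.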
Looking forward to seeing this develop further.