Advertisements

Getting started with GeoMesa using Geodocker

In a previous post, I showed how to use docker to run a single application (GeoServer) in a container and connect to it from your local QGIS install. Today’s post is about running a whole bunch of containers that interact with each other. More specifically, I’m using the images provided by Geodocker. The Geodocker repository provides a setup containing Accumulo, GeoMesa, and GeoServer. If you are not familiar with GeoMesa yet:

GeoMesa is an open-source, distributed, spatio-temporal database built on a number of distributed cloud data storage systems … GeoMesa aims to provide as much of the spatial querying and data manipulation to Accumulo as PostGIS does to Postgres.

The following sections show how to load data into GeoMesa, perform basic queries via command line, and finally publish data to GeoServer. The content is based largely on two GeoMesa tutorials: Geodocker: Bootstrapping GeoMesa Accumulo and Spark on AWS and Map-Reduce Ingest of GDELT, as well as Diethard Steiner’s post on Accumulo basics. The key difference is that this tutorial is written to be run locally (rather than on AWS or similar infrastructure) and that it spells out all user names and passwords preconfigured in Geodocker.

This guide was tested on Ubuntu and assumes that Docker is already installed. If you haven’t yet, you can install Docker as described in Install using the repository.

To get Geodocker set up, we need to get the code from Github and run the docker-compose command:

$ git clone https://github.com/geodocker/geodocker-geomesa.git
$ cd geodocker-geomesa/geodocker-accumulo-geomesa/
$ docker-compose up

This will take a while.

When docker-compose is finished, use a second console to check the status of all containers:

$ docker ps
CONTAINER ID        IMAGE                                     COMMAND                  CREATED             STATUS              PORTS                                        NAMES
4a238494e15f        quay.io/geomesa/accumulo-geomesa:latest   "/sbin/entrypoint...."   19 hours ago        Up 23 seconds                                                    geodockeraccumulogeomesa_accumulo-tserver_1
e2e0df3cae98        quay.io/geomesa/accumulo-geomesa:latest   "/sbin/entrypoint...."   19 hours ago        Up 22 seconds       0.0.0.0:50095->50095/tcp                     geodockeraccumulogeomesa_accumulo-monitor_1
e7056f552ef0        quay.io/geomesa/accumulo-geomesa:latest   "/sbin/entrypoint...."   19 hours ago        Up 24 seconds                                                    geodockeraccumulogeomesa_accumulo-master_1
dbc0ffa6c39c        quay.io/geomesa/hdfs:latest               "/sbin/entrypoint...."   19 hours ago        Up 23 seconds                                                    geodockeraccumulogeomesa_hdfs-data_1
20e90a847c5b        quay.io/geomesa/zookeeper:latest          "/sbin/entrypoint...."   19 hours ago        Up 24 seconds       2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp   geodockeraccumulogeomesa_zookeeper_1
997b0e5d6699        quay.io/geomesa/geoserver:latest          "/opt/tomcat/bin/c..."   19 hours ago        Up 22 seconds       0.0.0.0:9090->9090/tcp                       geodockeraccumulogeomesa_geoserver_1
c17e149cda50        quay.io/geomesa/hdfs:latest               "/sbin/entrypoint...."   19 hours ago        Up 23 seconds       0.0.0.0:50070->50070/tcp                     geodockeraccumulogeomesa_hdfs-name_1

At the time of writing this post, the Geomesa version installed in this way is 1.3.2:

$ docker exec geodockeraccumulogeomesa_accumulo-master_1 geomesa version
GeoMesa tools version: 1.3.2
Commit ID: 2b66489e3d1dbe9464a9860925cca745198c637c
Branch: 2b66489e3d1dbe9464a9860925cca745198c637c
Build date: 2017-07-21T19:56:41+0000

Loading data

First we need to get some data. The available tutorials often refer to data published by the GDELT project. Let’s download data for three days, unzip it and copy it to the geodockeraccumulogeomesa_accumulo-master_1 container for further processing:

$ wget http://data.gdeltproject.org/events/20170710.export.CSV.zip
$ wget http://data.gdeltproject.org/events/20170711.export.CSV.zip
$ wget http://data.gdeltproject.org/events/20170712.export.CSV.zip
$ unzip 20170710.export.CSV.zip
$ unzip 20170711.export.CSV.zip
$ unzip 20170712.export.CSV.zip
$ docker cp ~/Downloads/geomesa/gdelt/20170710.export.CSV geodockeraccumulogeomesa_accumulo-master_1:/tmp/20170710.export.CSV
$ docker cp ~/Downloads/geomesa/gdelt/20170711.export.CSV geodockeraccumulogeomesa_accumulo-master_1:/tmp/20170711.export.CSV
$ docker cp ~/Downloads/geomesa/gdelt/20170712.export.CSV geodockeraccumulogeomesa_accumulo-master_1:/tmp/20170712.export.CSV

Loading or importing data is called “ingesting” in Geomesa parlance. Since the format of GDELT data is already predefined (the CSV mapping is defined in geomesa-tools/conf/sfts/gdelt/reference.conf), we can ingest the data:

$ docker exec geodockeraccumulogeomesa_accumulo-master_1 geomesa ingest -c geomesa.gdelt -C gdelt -f gdelt -s gdelt -u root -p GisPwd /tmp/20170710.export.CSV
$ docker exec geodockeraccumulogeomesa_accumulo-master_1 geomesa ingest -c geomesa.gdelt -C gdelt -f gdelt -s gdelt -u root -p GisPwd /tmp/20170711.export.CSV
$ docker exec geodockeraccumulogeomesa_accumulo-master_1 geomesa ingest -c geomesa.gdelt -C gdelt -f gdelt -s gdelt -u root -p GisPwd /tmp/20170712.export.CSV

Once the data is ingested, we can have a look at the the created table by asking GeoMesa to describe the created schema:

$ docker exec geodockeraccumulogeomesa_accumulo-master_1 geomesa describe-schema -c geomesa.gdelt -f gdelt -u root -p GisPwd
INFO  Describing attributes of feature 'gdelt'
globalEventId       | String
eventCode           | String
eventBaseCode       | String
eventRootCode       | String
isRootEvent         | Integer
actor1Name          | String
actor1Code          | String
actor1CountryCode   | String
actor1GroupCode     | String
actor1EthnicCode    | String
actor1Religion1Code | String
actor1Religion2Code | String
actor2Name          | String
actor2Code          | String
actor2CountryCode   | String
actor2GroupCode     | String
actor2EthnicCode    | String
actor2Religion1Code | String
actor2Religion2Code | String
quadClass           | Integer
goldsteinScale      | Double
numMentions         | Integer
numSources          | Integer
numArticles         | Integer
avgTone             | Double
dtg                 | Date    (Spatio-temporally indexed)
geom                | Point   (Spatially indexed)

User data:
  geomesa.index.dtg     | dtg
  geomesa.indices       | z3:4:3,z2:3:3,records:2:3
  geomesa.table.sharing | false

In the background, our data is stored in Accumulo tables. For a closer look, open an interactive terminal in the Accumulo master image:

$ docker exec -i -t geodockeraccumulogeomesa_accumulo-master_1 /bin/bash

and open the Accumulo shell:

# accumulo shell -u root -p GisPwd

When we store data in GeoMesa, there is not only one table but several. Each table has a specific purpose: storing metadata, records, or indexes. All tables get prefixed with the catalog table name:

root@accumulo> tables
accumulo.metadata
accumulo.replication
accumulo.root
geomesa.gdelt
geomesa.gdelt_gdelt_records_v2
geomesa.gdelt_gdelt_z2_v3
geomesa.gdelt_gdelt_z3_v4
geomesa.gdelt_queries
geomesa.gdelt_stats

By default, GeoMesa creates three indices:
Z2: for queries with a spatial component but no temporal component.
Z3: for queries with both a spatial and temporal component.
Record: for queries by feature ID.

But let’s get back to GeoMesa …

Querying data

Now we are ready to query the data. Let’s perform a simple attribute query first. Make sure that you are in the interactive terminal in the Accumulo master image:

$ docker exec -i -t geodockeraccumulogeomesa_accumulo-master_1 /bin/bash

This query filters for a certain event id:

# geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd -q "globalEventId='671867776'"
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa
id,globalEventId:String,eventCode:String,eventBaseCode:String,eventRootCode:String,isRootEvent:Integer,actor1Name:String,actor1Code:String,actor1CountryCode:String,actor1GroupCode:String,actor1EthnicCode:String,actor1Religion1Code:String,actor1Religion2Code:String,actor2Name:String,actor2Code:String,actor2CountryCode:String,actor2GroupCode:String,actor2EthnicCode:String,actor2Religion1Code:String,actor2Religion2Code:String,quadClass:Integer,goldsteinScale:Double,numMentions:Integer,numSources:Integer,numArticles:Integer,avgTone:Double,dtg:Date,*geom:Point:srid=4326
d9e6ab555785827f4e5f03d6810bbf05,671867776,120,120,12,1,UNITED STATES,USA,USA,,,,,,,,,,,,3,-4.0,20,2,20,8.77192982456137,2007-07-13T00:00:00.000Z,POINT (-97 38)
INFO  Feature export complete to standard out in 2290ms for 1 features

If the attribute query runs successfully, we can advance to some geo goodness … that’s why we are interested in GeoMesa after all … and perform a spatial query:

# geomesa export -c geomesa.gdelt -f gdelt -u root -p GisPwd -q "CONTAINS(POLYGON ((0 0, 0 90, 90 90, 90 0, 0 0)),geom)" -m 3
Using GEOMESA_ACCUMULO_HOME = /opt/geomesa
id,globalEventId:String,eventCode:String,eventBaseCode:String,eventRootCode:String,isRootEvent:Integer,actor1Name:String,actor1Code:String,actor1CountryCode:String,actor1GroupCode:String,actor1EthnicCode:String,actor1Religion1Code:String,actor1Religion2Code:String,actor2Name:String,actor2Code:String,actor2CountryCode:String,actor2GroupCode:String,actor2EthnicCode:String,actor2Religion1Code:String,actor2Religion2Code:String,quadClass:Integer,goldsteinScale:Double,numMentions:Integer,numSources:Integer,numArticles:Integer,avgTone:Double,dtg:Date,*geom:Point:srid=4326
139346754923c07e4f6a3ee01a3f7d83,671713129,030,030,03,1,NIGERIA,NGA,NGA,,,,,LIBYA,LBY,LBY,,,,,1,4.0,16,2,16,-1.4060533085217,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)
9e8e885e63116253956e40132c62c139,671928676,042,042,04,1,NIGERIA,NGA,NGA,,,,,OPEC,IGOBUSOPC,,OPC,,,,1,1.9,5,1,5,-0.90909090909091,2017-07-10T00:00:00.000Z,POINT (5.43827 5.35886)
d6c6162d83c72bc369f68bcb4b992e2d,671817380,043,043,04,0,OPEC,IGOBUSOPC,,OPC,,,,RUSSIA,RUS,RUS,,,,,1,2.8,2,1,2,-1.59453302961275,2017-07-09T00:00:00.000Z,POINT (5.43827 5.35886)
INFO  Feature export complete to standard out in 2127ms for 3 features

Functions that can be used in export command queries/filters are (E)CQL functions from geotools for the most part. More sophisticated queries require SparkSQL.

Publishing GeoMesa tables with GeoServer

To view data in GeoServer, go to http://localhost:9090/geoserver/web. Login with admin:geoserver.

First, we create a new workspace called “geomesa”.

Then, we can create a new store of type Accumulo (GeoMesa) called “gdelt”. Use the following parameters:

instanceId = accumulo
zookeepers = zookeeper
user = root
password = GisPwd
tableName = geomesa.gdelt

Geodocker

Then we can configure a Layer that publishes the content of our new data store. It is good to check the coordinate reference system settings and insert the bounding box information:

Geodocker2

To preview the WMS, go to GeoServer’s preview:

http://localhost:9090/geoserver/geomesa/wms?service=WMS&version=1.1.0&request=GetMap&layers=geomesa:gdelt&styles=&bbox=-180.0,-90.0,180.0,90.0&width=768&height=384&srs=EPSG:4326&format=application/openlayers&TIME=2017-07-10T00:00:00.000Z/2017-07-10T01:00:00.000Z#

Which will look something like this:

Geodocker3

GeoMesa data filtered using CQL in GeoServer preview

For more display options, check the official GeoMesa tutorial.

If you check the preview URL more closely, you will notice that it specifies a time window:

&TIME=2017-07-10T00:00:00.000Z/2017-07-10T01:00:00.000Z

This is exactly where QGIS TimeManager could come in: Using TimeManager for WMS-T layers. Interoperatbility for the win!

Advertisements
2 comments
  1. Vasanth Reddy said:

    getting this error while ingesting:”ERROR Unable to connect to Zookeeper instance within timeout period. Consider increasing the Zookeeper timeout by setting the system property ‘instance.zookeeper.timeout’ (default: 5000ms).”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: