Archive

Visualization

Today, I’ve been working on some station statistics. From the trip data, I calculated incoming and outgoing trips per station as well as the station’s first day of operations. Combining this information makes it possible to calculate the average day’s “bike balance”. A balanced station has the same number of incoming and outgoing trips while an unbalanced station will either run out of bikes or empty slots for returns.

I’ve published the resulting station map on QGIS Cloud (http://qgiscloud.com/anitagraser/hubway_cloud1) where you can have a look at the bike balance values.

Additionally, I’ve created a mashup in Leaflet pulling together background tiles from Stamen and the cloud-hosted WMS for better orientation:

Today, I’ve been experimenting with a new way to visualize origin-destination pairs (ODs). The following image shows my first results:

The ideas was to add a notion of direction as well as uncertainty. The “flower petals” have a pointed origin and grow wider towards the middle. (Looking at the final result, they should probably go much narrower towards the end again.) The area covered by the petals is a simple approximation of where I’d expect the bike routes without performing any routing.

To get there, I reprojected the connection lines to EPSG:3857 and calculated connection length and line orientation using QGIS Field Calculator $length operator and the bearing formula given in QGIS Wiki:

(atan((xat(-1)-xat(0))/(yat(-1)-yat(0)))) * 180/3.14159 + (180 *(((yat(-1)-yat(0)) < 0) + (((xat(-1)-xat(0)) < 0 AND (yat(-1) - yat(0)) >0)*2)))

For the style, I created a new “flower petal” SVG symbol in Inkscape and styled it with varying transparency values: Rare connections are more transparent than popular ones. This style is applied to the connection start points. Using the advanced options “size scale” and “rotation”, it is possible to rotate the petals into the right direction as well as scale them using the previously calculated values for connection length and orientation.

Update

While the above example uses pretty wide petals this one is done with a much narrower petal. I think it’s more appropriate for the data at hand:

Most of the connections are clearly heading south east, across Charles River, except for that group of connections pointing the opposite direction, to Harvard Square.

Hubway is a bike sharing system in Boston and they are currently hosting a data visualization challenge. What a great chance to play with some real-world data!

To get started, I loaded both station Shapefile and trip CSV into a new Spatialite database. The GUI is really helpful here – everything is done in a few clicks. Afterwards, I decided to look into which station combinations are most popular. The following SQL script creates my connections table:

create table connections (
start_station_id INTEGER,
end_station_id INTEGER,
count INTEGER,
Geometry GEOMETRY);


insert into connections select 
start_station_id, 
end_station_id, 
count(*) as count, 
LineFromText('LINESTRING('||X(a.Geometry)||' '||Y(a.Geometry)||','
                          ||X(b.Geometry)||' '||Y(b.Geometry)||')') as Geometry
 from trips, stations a, stations b
where start_station_id = a.ID 
and end_station_id = b.ID
and a.ID != b.ID
and a.ID is not NULL
and b.ID is not NULL
group by start_station_id, end_station_id;

(Note: This is for Spatialite 2.4, so there is no MakeLine() method. Use MakeLine if you are using 3.0.)

For a first impression, I decided to map popular connections with more than one hundred entries. Wider lines mean more entries. The points show the station locations and they are color coded by starting letter. (I’m not yet sure if they mean anything. They seem to form groups.)

Some of the stations don’t seem to have any strong connections at all. Others are rather busy. The city center and the dark blue axis pointing west seem most popular.

I’m really looking forward to what everyone else will be finding in this dataset.

Data from various vehicles is collected for many purposes in cities worldwide. To get a feeling for just how much data is available, I created the following video using QGIS Time Manager which has been shown at the Austrian Museum of Applied Arts “MADE 4 YOU – Design for Change”. It shows one hour of taxi tracks in the city of Vienna:

If you like the video, please go to http://www.ertico.com/2012-its-video-competition-open-vote and vote for it in the category “Videos directed at the general public”.

This post continues my quest of exploring the spatial dimension of Twitter streams. I wanted to try one of the classic spatio-temporal visualization methods: Space-time cubes where the vertical axis represents time while the other two map space. Like the two previous examples, this visualization is written in pyprocessing, a Python port of the popular processing environment.

This space-time cube shows twitter trajectories that contain at least one tweet in New York Times Square. The 24-hour day starts at the bottom of the cube and continues to the top. Trajectories are colored based on the time stamp of their start tweet.

Additionally, all trajectories are also drawn in context of the coastline (data: OpenStreetMap) on the bottom of the cube.

While there doesn’t seem to be much going on in the early morning hours, we can see quite a busy coming and going during the afternoon and evening. From the bunch of vertical lines over Times Square, we can also assume that some of our tweet authors spent a considerable time at and near Times Square.

I’ve also created an animated version. Again, I recommend to watch it in HD.

After my first shot at analyzing Twitter data visually I received a lot of great feedback. Thank you!

For my new attempt, I worked on incorporating your feedback such as: filter unrealistic location changes, show connections “grow” instead of just popping up and zoom to an interesting location. The new animation therefore focuses on Manhattan – one of the places with reasonably high geotweet coverage.

The background is based on OpenStreetMap coastline data which I downloaded using QGIS OSM plugin and rendered in pyprocessing together with the geotweets. To really see what’s going on, switch to HD resolution and full screen:

It’s pretty much work-in-progress. The animation shows similar chaotic patterns seen in other’s attempts at animating tweets. To me, the distribution of tweets looks reasonable and many of the connection lines seem to actually coincide with the bridges spanning to and from Manhattan.

This work is an attempt at discovering the potential of Twitter data and at the same time learning some pyprocessing which will certainly be useful for many future tasks. The next logical step seems to be to add information about interactions between users and/or to look at the message content. Another interesting task would be to add interactivity to the visualization.

QGIS Cloud is a great cloud hosting service for QGIS Server by Sourcepole. After online registration and installation of an (experimental) plugin, QGIS projects can be uploaded to the cloud quite comfortably.

For a quick test, I tried to recreate the map from “Open Data for Physical Maps”. Right now, one of the limitations is that raster layers cannot be uploaded. Instead of the SRTM data, I therefore chose OCM landscape from OpenLayers plugin to provide some hillshade. The process of uploading data and publishing the project went smoothly and I didn’t encounter any problems.

You can explore the map online at qgiscloud.com/anitagraser/corine_austria.

Considering the complexity of the Corine dataset, rendering is quite fast – certainly much better than on my notebook.

Twitter streams are curious things, especially the spatial data part. I’ve been using Tweepy to collect tweets from the public timeline and what did I discover? Tweets can have up to three different spatial references: “coordinates”, “geo” and “place”. I’ll still have to do some more reading on how to interpret these different attributes.

For now, I have been using “coordinates” to explore the contents of a stream which was collected over a period of five hours using

stream.filter(follow=None,locations=(-180,-90,180,90))

for global coverage. In the video, each georeferenced tweet produces a new dot on the map and if the user’s coordinates change, a blue arrow is drawn:

While pretty, these long blue arrows seem rather suspicious. I’ve only been monitoring the stream for around five hours. Any cross-Atlantic would take longer than that. I’m either misinterpreting the tweets or these coordinates are fake. Seems like it is time to dive deeper into the data.

Corine Land Cover is a European program to create a land cover inventory of Europe. The data is freely available and a valuable input for many analyses. In this post, we’ll be using it to create a physical map.

For the background, reused the hillshade presented in “Mapping Open Data With Open GIS”

Instead of the standard grayscale, I defined a sand-colored colormap that looks warmer and more natural:

On top of this hillshade, I put the Corine land cover layer. Instead of the official, rather bright colors I selected a more neutral color palette and varying transparency values: Water areas are drawn with no transparency while forests are set to 50 % and built-up areas to up to 80 % transparency. I also skipped classes such as “bare rock” by setting them to be totally transparent:

On top of the land cover, I added a river dataset and styled it with the same color used for water surfaces in the Corine layer. Obviously, this is an optional step. Big rivers are visible within the land cover data too.

After adding a mask and labels, the map is ready to add the finishing touches in Print Composer: Title, explanatory text and a scale bar. I decided against adding a legend to this particular map since I hope that the color choices are intuitive enough.

If you want to create your own physical map, you can use Corine Land Cover for European regions or the National Vegetation Classification in the U.S.

This post explores some cartographic features of QGIS while mapping the river network of Tirol, Austria. All data used is freely available.

For the background, I downloaded NASA SRTM elevation data from CGIAR-CSI and created a hillshade using Terrain Analysis tools in QGIS 1.8.

To emphasize both state borders and the fact that Tirol consists of two separate areas, I created a mask using the Difference tool and styled it a transparent white.

The river network is too dense to label all rivers on an A4 map. Expression-based labels make it possible to only label selected features. For this dataset, the expression limits labeling to features with certain values of GEW_WRRL attribute:

CASE WHEN (GEW_WRRL = '10.000 km2 Fluss' OR "GEW_WRRL"= '4.000 km2 Fluss' OR "GEW_WRRL"='1.000 km2 Fluss') AND length("GEW_NAME_A") < 10 THEN "GEW_NAME_A" END

Labels of neighboring areas together with map title, descriptions and scale bar were added in Print Composer.

Working with Print Composer, it is useful to know that you can use Copy&Paste to duplicate map components and right-click to lock them from being moved. Also, every new component by default comes with a black outline and white background which can (and should) be disabled/changed in “General options”.

This is the final QGIS Print Composer output – without any further post-processing in Inkscape or Gimp: