Visualization

Yesterday, I described my process to generate a basic population density map from the city of Vienna’s open government data. At the end of that post, I described some ideas for further improvement. Today, I want to follow up on those ideas using what is known as dasymetric mapping. GIS Dictionary defines it well (much better than Wikipedia):

Dasymetric mapping is a technique in which attribute data that is organized by a large or arbitrary area unit is more accurately distributed within that unit by the overlay of geographic boundaries that exclude, restrict, or confine the attribute in question.
For example, a population attribute organized by census tract might be more accurately distributed by the overlay of water bodies, vacant land, and other land-use boundaries within which it is reasonable to infer that people do not live.

That’s exactly what I want to do: Based on subdistricts with population density values and auxiliary data – Corine Land Cover to be exact – I want to create an improved representation of population density within the city.

This is the population density map I start out with:

… and this is the Corine Land Cover dataset for the same area:

It shows built-up areas (red), parks and natural areas (green) as well as water-covered regions (blue). For further analysis, I follow the assumption that people only live in areas with Corine code 111 “Continuous urban fabric” and 112 “Discontinuous urban fabric”. Therefore, I use the Intersection tool to clip only these residential areas from the subdistrict polygons. The subdistrict population can now be distributed over these new, smaller areas (use Field Calculator) to create a more realistic visualization of population density:
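The redistribution step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Field Calculator expression I used: the function name and the example numbers are hypothetical, and areas are assumed to be in square metres.

```python
def dasymetric_density(population, residential_fragments_m2):
    """Persons per hectare, counting only the residential part of a subdistrict.

    residential_fragments_m2 holds the areas of all polygons that remain
    after intersecting the subdistrict with Corine classes 111 and 112.
    Returns None if no residential land is left, because the population
    cannot be placed anywhere under this assumption.
    """
    residential_area = sum(residential_fragments_m2)
    if residential_area <= 0:
        return None
    return population / (residential_area / 10000.0)


# Hypothetical subdistrict: 5,000 residents on 100 ha in total,
# of which two fragments (25 ha and 15 ha) are classified as urban fabric.
plain = 5000 / (1_000_000 / 10000.0)
refined = dasymetric_density(5000, [250_000, 150_000])
print(plain, refined)
```

In this made-up case, the density rises from 50 to 125 persons per hectare once the non-residential 60 ha are excluded.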

For easier comparison, I put the original density and the dasymetric map into a looping animation. Some subdistricts change their population density values quite drastically, especially in regions where big parts covered by water or rail infrastructure were removed:

Corine Land Cover is not too detailed, but I think it is still usable at this scale. One thing to note is that I used land cover data from 2006 with population data from 2012, so some areas in the outer districts will have been turned residential in the meantime. But I hope this doesn’t distort the overall picture too much.

The city of Vienna provides both subdistrict geometries and population statistics. Mapping the city’s population density should be straightforward, right? Let’s see …

We should be able to join on ZBEZ and SUB_DISTRICT_CODE, check! But what about the actual population counts? Unfortunately, there is no file which simply lists population per subdistrict. The file I found contains four lines for each subdistrict: females 2011, males 2011, females 2012 and males 2012. That’s not the perfect format for mapping general population density.

A quick way to prepare our input data is to apply a pivot table, e.g. in Open Office. The goal is to have one row per subdistrict and columns for population in 2011 and 2012:
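For readers who prefer a script to a spreadsheet, the same pivot can be sketched in Python. The row layout (subdistrict code, year, sex, count) and the sample values are assumptions for illustration; the actual file uses its own column names.

```python
from collections import defaultdict


def pivot_population(rows):
    """Collapse (code, year, sex, count) rows into one total per code and year."""
    totals = defaultdict(lambda: defaultdict(int))
    for code, year, _sex, count in rows:
        # Females and males of the same year are summed into a single value.
        totals[code][year] += count
    return {code: dict(by_year) for code, by_year in totals.items()}


# Hypothetical input: four lines for one subdistrict, as in the source file.
rows = [
    ("0101", 2011, "f", 1200),
    ("0101", 2011, "m", 1100),
    ("0101", 2012, "f", 1250),
    ("0101", 2012, "m", 1150),
]
print(pivot_population(rows))
```

Each subdistrict ends up with one entry holding a 2011 and a 2012 total, which is the shape needed for the join.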

Export as CSV, add CSVT and load into QGIS. Finally, we can join geometries and CSV table:

A quick look at the joined data confirms that each subdistrict now has a population value. But visualizing absolute values results in misleading maps. Big subdistricts with only average density will overrule smaller but much denser subdistricts:

That’s why we need to calculate population density. This is easy to do using Field Calculator. The subdistrict file already contains area values, but even if they were missing, we could calculate them using the $area operator: "pop2012" / ($area / 10000). Dividing the area in square metres by 10,000 converts it to hectares, so the result is the population density in inhabitants per ha, which finally shows which subdistricts are the most densely populated:
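To make the arithmetic behind the expression explicit, here is a small Python sketch. It assumes coordinates in a projected coordinate system with metre units and uses the shoelace formula as a stand-in for the $area operator; the function names and the example polygon are hypothetical.

```python
def polygon_area_m2(ring):
    """Shoelace formula for a simple polygon given as (x, y) pairs in metres."""
    total = 0.0
    # Pair every vertex with its successor, wrapping around to the first one.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def density_per_ha(population, area_m2):
    """Population divided by the area in hectares (1 ha = 10,000 m2)."""
    return population / (area_m2 / 10000.0)


# Hypothetical rectangular subdistrict of 200 m x 300 m with 1,200 residents.
ring = [(0, 0), (200, 0), (200, 300), (0, 300)]
area = polygon_area_m2(ring)
print(area, density_per_ha(1200, area))
```

The rectangle covers 6 ha, so 1,200 residents correspond to 200 inhabitants per hectare.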

One could argue that this is still not an accurate representation of population density: Big parts of some subdistricts are actually covered by water – especially along the Danube – and therefore uninhabited. There are also big parks which could be excluded from the subdistrict area. But that’s going to be the topic of another post.

If you want to use my results so far, you can download the GeoJSON file from Github.

Today, I’ve finished my submission for the Hubway Data Visualization Challenge. All parts of the resulting dataviz were created using open source tools. My toolbox for this work contains: QGIS, Spatialite, Inkscape, Gimp and Open Office Calc. To see the complete submission and read more about it, check the project page.

Today, I’ve been working on some station statistics. From the trip data, I calculated incoming and outgoing trips per station as well as the station’s first day of operations. Combining this information makes it possible to calculate the average day’s “bike balance”. A balanced station has the same number of incoming and outgoing trips while an unbalanced station will either run out of bikes or empty slots for returns.
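One possible way to express such a balance value is sketched below in Python. The exact definition is an assumption on my part for illustration: net trips (incoming minus outgoing) divided by the number of days since the station’s first day of operations. The function name and the numbers are hypothetical.

```python
from datetime import date


def daily_bike_balance(incoming, outgoing, first_day, last_day):
    """Average net number of bikes gained (+) or lost (-) per day of operation."""
    days_in_operation = (last_day - first_day).days + 1  # count both end days
    return (incoming - outgoing) / days_in_operation


# Hypothetical station: 1,500 arrivals and 1,200 departures over 30 days.
balance = daily_bike_balance(1500, 1200, date(2012, 1, 1), date(2012, 1, 30))
print(balance)
```

A value near zero indicates a balanced station; a clearly positive value means the station tends to fill up, and a clearly negative value means it tends to run out of bikes.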

I’ve published the resulting station map on QGIS Cloud (http://qgiscloud.com/anitagraser/hubway_cloud1) where you can have a look at the bike balance values.

Additionally, I’ve created a mashup in Leaflet pulling together background tiles from Stamen and the cloud-hosted WMS for better orientation:

Today, I’ve been experimenting with a new way to visualize origin-destination pairs (ODs). The following image shows my first results:

The idea was to add a notion of direction as well as uncertainty. The “flower petals” have a pointed origin and grow wider towards the middle. (Looking at the final result, they should probably become much narrower towards the end again.) The area covered by the petals is a simple approximation of where I’d expect the bike routes without performing any routing.

To get there, I reprojected the connection lines to EPSG:3857 and calculated connection length and line orientation using the $length operator in the QGIS Field Calculator and the bearing formula given in the QGIS Wiki:

(atan((xat(-1)-xat(0))/(yat(-1)-yat(0)))) * 180/3.14159 + (180 *(((yat(-1)-yat(0)) < 0) + (((xat(-1)-xat(0)) < 0 AND (yat(-1) - yat(0)) >0)*2)))
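The expression computes the bearing from the first vertex to the last one, clockwise from north, and the extra terms shift the atan result into the correct quadrant. As a cross-check outside QGIS, the same value can be sketched in Python with atan2, which handles the quadrants directly and also copes with lines running exactly east or west (where the expression above would divide by zero). The function name is hypothetical.

```python
import math


def bearing_degrees(x0, y0, x1, y1):
    """Bearing from (x0, y0) to (x1, y1) in degrees, clockwise from north."""
    dx = x1 - x0
    dy = y1 - y0
    # atan2(dx, dy) measures the angle from the y axis (north) towards east;
    # the modulo maps negative angles into the range of a compass bearing.
    return math.degrees(math.atan2(dx, dy)) % 360.0


# Hypothetical lines starting at the origin and heading NE, SE, SW and NW.
for end in [(1, 1), (1, -1), (-1, -1), (-1, 1)]:
    print(end, round(bearing_degrees(0, 0, *end), 1))
```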

For the style, I created a new “flower petal” SVG symbol in Inkscape and styled it with varying transparency values: Rare connections are more transparent than popular ones. This style is applied to the connection start points. Using the advanced options “size scale” and “rotation”, it is possible to rotate the petals into the right direction as well as scale them using the previously calculated values for connection length and orientation.

Update

While the above example uses pretty wide petals this one is done with a much narrower petal. I think it’s more appropriate for the data at hand:

Most of the connections are clearly heading southeast, across the Charles River, except for that group of connections pointing in the opposite direction, to Harvard Square.

Hubway is a bike sharing system in Boston and they are currently hosting a data visualization challenge. What a great chance to play with some real-world data!

To get started, I loaded both station Shapefile and trip CSV into a new Spatialite database. The GUI is really helpful here – everything is done in a few clicks. Afterwards, I decided to look into which station combinations are most popular. The following SQL script creates my connections table:

create table connections (
start_station_id INTEGER,
end_station_id INTEGER,
count INTEGER,
Geometry GEOMETRY);


insert into connections select 
start_station_id, 
end_station_id, 
count(*) as count, 
LineFromText('LINESTRING('||X(a.Geometry)||' '||Y(a.Geometry)||','
                          ||X(b.Geometry)||' '||Y(b.Geometry)||')') as Geometry
 from trips, stations a, stations b
where start_station_id = a.ID 
and end_station_id = b.ID
and a.ID != b.ID
and a.ID is not NULL
and b.ID is not NULL
group by start_station_id, end_station_id;

(Note: This is for Spatialite 2.4, so there is no MakeLine() function. Use MakeLine() if you are using 3.0.)
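To try the counting logic without a Spatialite installation, the query can be sketched with Python’s built-in sqlite3 module. This is a simplified stand-in, not the script above: the plain x and y columns and the sample rows are assumptions, and the line is built as a WKT string instead of a geometry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stations (ID INTEGER, x REAL, y REAL);
    CREATE TABLE trips (start_station_id INTEGER, end_station_id INTEGER);
    -- Hypothetical sample data: two stations and four trips.
    INSERT INTO stations VALUES (1, -71.06, 42.35), (2, -71.12, 42.37);
    INSERT INTO trips VALUES (1, 2), (1, 2), (2, 1), (1, 1);
""")

# Count trips per ordered station pair and skip round trips (a.ID = b.ID).
query = """
    SELECT t.start_station_id, t.end_station_id, COUNT(*) AS n,
           'LINESTRING(' || a.x || ' ' || a.y || ','
                         || b.x || ' ' || b.y || ')' AS wkt
    FROM trips t
    JOIN stations a ON t.start_station_id = a.ID
    JOIN stations b ON t.end_station_id = b.ID
    WHERE a.ID != b.ID
    GROUP BY t.start_station_id, t.end_station_id
    ORDER BY n DESC
"""
connections = conn.execute(query).fetchall()
for row in connections:
    print(row)
```

The round trip from station 1 back to station 1 is dropped, and the two remaining directions are counted separately.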

For a first impression, I decided to map popular connections with more than one hundred entries. Wider lines mean more entries. The points show the station locations and they are color coded by starting letter. (I’m not yet sure if they mean anything. They seem to form groups.)

Some of the stations don’t seem to have any strong connections at all. Others are rather busy. The city center and the dark blue axis pointing west seem most popular.

I’m really looking forward to what everyone else will be finding in this dataset.

Data from various vehicles is collected for many purposes in cities worldwide. To get a feeling for just how much data is available, I created the following video using QGIS Time Manager. The video has been shown at the Austrian Museum of Applied Arts as part of “MADE 4 YOU – Design for Change”. It shows one hour of taxi tracks in the city of Vienna:

If you like the video, please go to http://www.ertico.com/2012-its-video-competition-open-vote and vote for it in the category “Videos directed at the general public”.

This post continues my quest of exploring the spatial dimension of Twitter streams. I wanted to try one of the classic spatio-temporal visualization methods: Space-time cubes where the vertical axis represents time while the other two map space. Like the two previous examples, this visualization is written in pyprocessing, a Python port of the popular processing environment.

This space-time cube shows Twitter trajectories that contain at least one tweet in Times Square, New York. The 24-hour day starts at the bottom of the cube and continues to the top. Trajectories are colored based on the time stamp of their start tweet.
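The mapping from a geotagged tweet to a position in the cube can be sketched as follows. This is a simplified illustration rather than the pyprocessing code behind the image: it scales longitude and latitude linearly without any map projection, and the function name, the bounding box and the cube size are hypothetical.

```python
def to_cube(lon, lat, seconds_of_day, bbox, size=100.0):
    """Map a tweet to (x, y, z) inside a cube with the given edge length.

    bbox is (min_lon, min_lat, max_lon, max_lat); z encodes the time of day,
    with midnight at the bottom (0) and the end of the day at the top (size).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    x = (lon - min_lon) / (max_lon - min_lon) * size
    y = (lat - min_lat) / (max_lat - min_lat) * size
    z = seconds_of_day / 86400.0 * size
    return x, y, z


# Hypothetical tweet at noon in the middle of an assumed bounding box.
bbox = (-74.05, 40.70, -73.95, 40.80)
print(to_cube(-74.00, 40.75, 12 * 3600, bbox))
```

Connecting the cube positions of one user’s consecutive tweets yields a trajectory; a user who stays in one place shows up as a vertical line.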

Additionally, all trajectories are also drawn in context of the coastline (data: OpenStreetMap) on the bottom of the cube.

While there doesn’t seem to be much going on in the early morning hours, we can see quite a busy coming and going during the afternoon and evening. From the bunch of vertical lines over Times Square, we can also assume that some of our tweet authors spent a considerable time at and near Times Square.

I’ve also created an animated version. Again, I recommend watching it in HD.

After my first shot at analyzing Twitter data visually I received a lot of great feedback. Thank you!

For my new attempt, I worked on incorporating your feedback such as: filter unrealistic location changes, show connections “grow” instead of just popping up and zoom to an interesting location. The new animation therefore focuses on Manhattan – one of the places with reasonably high geotweet coverage.
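A filter for unrealistic location changes could look like the following Python sketch. It is an assumption about how such a filter might work, not the code used for the animation: consecutive tweets of one user are only connected if the implied speed stays below a threshold. The function names, the 200 km/h limit and the sample coordinates are hypothetical.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def plausible_moves(tweets, max_speed_kmh=200.0):
    """Keep consecutive tweet pairs whose implied speed is realistic.

    tweets is a time-sorted list of (timestamp_seconds, lon, lat) for one user.
    """
    kept = []
    for (t1, lon1, lat1), (t2, lon2, lat2) in zip(tweets, tweets[1:]):
        hours = (t2 - t1) / 3600.0
        if hours <= 0:
            continue  # identical or unordered time stamps give no usable speed
        if haversine_km(lon1, lat1, lon2, lat2) / hours <= max_speed_kmh:
            kept.append(((lon1, lat1), (lon2, lat2)))
    return kept


# Hypothetical user: a short move within Manhattan, then a jump to Los Angeles
# one minute later, which is dropped as unrealistic.
tweets = [(0, -73.9855, 40.7580), (1800, -73.9680, 40.7850), (1860, -118.24, 34.05)]
print(plausible_moves(tweets))
```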

The background is based on OpenStreetMap coastline data, which I downloaded using the QGIS OSM plugin and rendered in pyprocessing together with the geotweets. To really see what’s going on, switch to HD resolution and full screen:

It’s still very much a work in progress. The animation shows chaotic patterns similar to those seen in others’ attempts at animating tweets. To me, the distribution of tweets looks reasonable and many of the connection lines seem to actually coincide with the bridges spanning to and from Manhattan.

This work is an attempt at discovering the potential of Twitter data and at the same time learning some pyprocessing which will certainly be useful for many future tasks. The next logical step seems to be to add information about interactions between users and/or to look at the message content. Another interesting task would be to add interactivity to the visualization.
