Infrastructure Coverage based on Open Data

This is something I have been wanting to do for a long time: map which areas of Vienna have fast access to a certain kind of infrastructure. Now, I finally found the time and the data to perform this analysis. The data used are OSM road data (Cloudmade shapefile) for Austria and metro station coordinates for Vienna by Max Kossatz and Robert Harm.

Before importing the OSM roads into PostGIS, I cut out my area of interest and created a clean topology using GRASS v.clean.break. Once the data is loaded into the database, the assign_vertex_id() function does the rest and the network is ready for routing and distance calculations.
For the metro stations, I calculated the nearest network node using George MacKerron’s Nearest Neighbor function.
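
As a rough illustration of the snapping step (not the SQL function actually used here), the sketch below picks the closest network node for a station by brute-force planar distance. The node ids, the coordinates and the helper name `nearest_node` are made up for the example, and a projected coordinate system in metres is assumed.

```python
import math


def nearest_node(point, nodes):
    """Return the id of the network node closest to `point`.

    `nodes` maps node id -> (x, y) in a projected CRS (metres assumed).
    Brute force over all nodes; a spatial index would be used on real data.
    """
    return min(nodes, key=lambda node_id: math.dist(point, nodes[node_id]))


# Hypothetical network nodes and one hypothetical station location.
nodes = {1: (0.0, 0.0), 2: (250.0, 0.0), 3: (250.0, 250.0)}
station = (240.0, 30.0)
closest = nearest_node(station, nodes)
print(closest)
```

On a real network the same lookup would normally stay inside the database, where a spatial index keeps it fast.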

Catchments were calculated using driving_distance() function. It returns distance to a given metro station for all network nodes (up to a maximum distance). The result can be interpolated to show e.g. which areas are at most 1 km away from any metro station.
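
To make the idea concrete, here is a minimal Python sketch of a distance-limited shortest-path search. It is a plain Dijkstra with a cutoff, written for illustration only; it is not pgRouting's implementation, and the toy graph and the function name `driving_distances` are assumptions.

```python
import heapq


def driving_distances(graph, source, max_dist):
    """Dijkstra from `source`, keeping only nodes within `max_dist`.

    `graph` maps node -> list of (neighbour, edge_length) pairs.
    Returns {node: network distance from source}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, length in graph.get(u, []):
            nd = d + length
            # Skip anything beyond the cutoff and anything not improving.
            if nd <= max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy street: nodes every 250 m along a line, station at node "A".
graph = {
    "A": [("B", 250.0)],
    "B": [("A", 250.0), ("C", 250.0)],
    "C": [("B", 250.0), ("D", 250.0)],
    "D": [("C", 250.0)],
}
reached = driving_distances(graph, "A", 600.0)
print(reached)
```

Node "D" is 750 m away and therefore missing from the result, which is the same behaviour as the maximum distance described above.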

1 km catchments around metro stations in Vienna

Close-up look at the 1 km catchment zone border

Once set up, performing this analysis is reasonably fast. Instead of metro stations, any other infrastructure coverage can be analyzed easily. I could imagine this being really useful when looking for a new flat: “Find me an area close to work, a metro station and a high school.”
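
A flat-hunting query like that could be sketched as an intersection of catchments. The snippet below assumes that per-node network distances to each kind of destination have already been computed; all node ids, distances and thresholds are invented for the example.

```python
def within(distances, max_dist):
    """Return the set of nodes whose distance is at most `max_dist`."""
    return {node for node, d in distances.items() if d <= max_dist}


# Hypothetical per-node network distances in metres.
to_work = {"n1": 800.0, "n2": 2500.0, "n3": 1500.0}
to_metro = {"n1": 300.0, "n2": 400.0, "n3": 1200.0}
to_school = {"n1": 900.0, "n2": 700.0, "n3": 600.0}

# Nodes satisfying all three criteria at once.
candidates = (
    within(to_work, 2000.0)
    & within(to_metro, 500.0)
    & within(to_school, 1000.0)
)
print(candidates)
```

Each criterion gets its own threshold, so the search can be tightened or relaxed without recomputing any distances.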

The next great thing would be to have all the data needed to calculate transit travel times too. Yes, I’m looking at you, Wiener Linien!

  1. mattwigway said:

    The only problem I believe I see with this—please correct me if I’m wrong—is that I don’t believe it will split the lines at the exact point where they are, say, 400m from a metro station.

    Perhaps an example will explain. Let’s say we’re in a typical gridded downtown, with all blocks exactly 250m long, and nodes placed at every intersection but nowhere else. There is a metro station at a corner. All of the intersections one block away are clearly within 400m of the metro station, but the intersections two blocks away are not; they are 100m past the boundary of the search. But if you are 150m closer to the metro station than that intersection (i.e. 1.4 blocks from the station), then you are only 350m from the metro station, but not within the service area that this algorithm will calculate.

    • underdark said:

      Thanks for your feedback! You’re right that this approach does not clip the links when reaching max costs.

      Instead, I calculated costs for every node and interpolated the results using a TIN. The resulting cost raster can then be styled to show any zones (e.g., 500m, 1km catchment, etc.). The edges of the resulting zones can very well cut links and are not limited to the nodes. I’ll post a close-up shot later. I think this approach is ok for a high-level overview but your point is valid of course.
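
      To illustrate why interpolated zone edges can cut links, here is a simplified one-dimensional sketch: it interpolates linearly along a single link between two node costs, whereas a TIN does the same kind of linear interpolation across triangles. The function name and the numbers (taken from the 250 m grid example above) are assumptions for illustration, not the workflow actually used.

```python
def boundary_position(cost_a, cost_b, threshold):
    """Fraction along the link a -> b where the cost reaches `threshold`.

    Costs are interpolated linearly between the two end nodes.
    Returns None if the threshold is not crossed on this link.
    """
    lo, hi = sorted((cost_a, cost_b))
    if cost_a == cost_b or not (lo <= threshold <= hi):
        return None
    return (threshold - cost_a) / (cost_b - cost_a)


# Link between an intersection 250 m and one 500 m from the station.
link_length = 250.0
fraction = boundary_position(250.0, 500.0, 400.0)
offset_m = fraction * link_length  # distance from the nearer intersection
print(fraction, offset_m)
```

      With these numbers the 400 m boundary falls 150 m along the link, i.e. between the two intersections rather than on a node.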

      Best wishes

      • mattwigway said:

        Ah, I see . . . should have read your post more closely. How did you choose which metro station would serve a given node (did you calculate driving distances to all of them, then pick the shortest, choose the closest one by air distance, &c)?

      • underdark said:

        Yes, I calculated driving distances to all metro stations within 10 km of the given node. Then I picked the shortest.

        The query works like this: For every metro station node, I run driving_distance() with a maximum distance of 10 km. Then I select the minimum distance for every node in the network.
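
        The "minimum per node" step could look roughly like this in Python. It is only a sketch of the logic, not the SQL I ran; the station names, node ids and distances are invented, and each inner dictionary stands for the result of one distance-limited run from one station.

```python
def nearest_station_distance(per_station):
    """Merge per-station {node: distance} results.

    Returns {node: (shortest distance, station that provides it)}.
    """
    best = {}
    for station, distances in per_station.items():
        for node, d in distances.items():
            if node not in best or d < best[node][0]:
                best[node] = (d, station)
    return best


# Hypothetical results of two distance-limited runs (metres).
per_station = {
    "station_a": {"n1": 200.0, "n2": 900.0},
    "station_b": {"n2": 350.0, "n3": 1200.0},
}
best = nearest_station_distance(per_station)
print(best)
```

        Nodes that no run reached within the maximum distance simply do not appear in the result.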

      • mattwigway said:

        Did you do it in pure SQL, or did you use a Python &c. script?

      • underdark said:

        Everything up to the interpolation has been done in SQL.

  2. Stefan said:

    Nice one! I’ll come back to you when looking for a new flat! I will only have to include flat prices and neighbour loudness in my analysis. Thanks for the inspiration, Stefan

  3. Niket said:

    Really nice article. I have one question: you calculated the driving distance from each individual node in the network to each metro station within a 1 km range to figure out the catchment area. What difference would it make if we took each individual metro station, calculated all the nodes within a 10 km distance, and then removed duplicate nodes from the result?

    • underdark said:

      Hi Niket,
      I’m not sure I quite understand your question correctly. The function I mentioned, driving_distance(), starts at a given (metro) node and calculates distances until it reaches the specified maximum distance (10 km).
      Thanks for the note. I’ll try to clarify my comment above.

      • Niket said:

        Hi Anita,
        Oh okay. Now I get it right. Actually I got confused with the statement above: “…using driving_distance() function. It returns distance to a given metro station for all network nodes (up to a maximum distance).”
        Thank you for the clarification.

  4. mattwigway said:

    Did you include pedestrian links (tag highway=footway, for example)? I don’t know how many pedestrian links there are in Vienna, but in some US locations, this can significantly change the situation, even more so if you also consider whether it’s safe to walk on certain streets; in the US, we often have large sidewalk-less arterials. Of course mapping this largely depends on the quality of the data. Check out for more on these issues.

    • underdark said:

      For this proof-of-concept, I simply used the Cloudmade shapefile entitled roads. I’m quite certain footpaths are included since it contained links through parks and other areas where you are not allowed to drive.

      There is much to be done for a more detailed analysis. Eventually, a multi-modal network would be great that can answer “How far can I drive/walk/go by public transportation/bike on bike routes from here?”.

      • mattwigway said:

        Absolutely! I’d love to see this integrated with Mapnificent, also; maybe I’ll have some time to clone their GitHub repository at some point and see how hard that would be. (I believe they use PostGIS already, but I think that the buffer zones are calculated client side with a web worker thread.)

    • underdark said:

      Thanks George,
      I’ll have a look at your changes. Are you regularly contributing to the pgRouting project?
      Best wishes

  5. Alex said:

    Your Blog is very professional and valuable.
    I am looking for k-NN implementation and nearest neighbor function (Long/Lat)for SQL Access 2000.
    May I ask You for Help, please?
    Best Regard

    • Hi Alex,
      Sorry I cannot be of help with Access. That’s not my domain.
