Game Theory and Snow Shoveling

by mattwigway on February 26, 2015

Snow in Washington, DC

My adopted city of Washington, DC, is shoveling out from under some of the last snow of the season. At the same time, after years of debate, we’ve just enacted a law that levies fines on residents who don’t clear their sidewalks.

There’s a very good reason to fine property owners for not shoveling their sidewalks. Snow shoveling is what’s known as a multi-player prisoner’s dilemma. If everyone shovels, everyone is better off (because they can get around more easily, don’t face the danger of falls, don’t have to pay extra cleaning bills due to muddy/salty pant cuffs and skirt hems, &c.). However, any one person is better off if they don’t shovel, because they’ve saved time (I actually like shoveling my sidewalk, but let’s assume that the utility of shoveling is uniformly negative). Each individual isn’t affected much by the state of their own sidewalk, but rather by the state of everyone else’s.

We can represent this issue graphically, as Thomas Schelling does in chapter 7 of Micromotives and Macrobehavior. For simplicity’s sake, let’s assume that the only benefit to shoveled sidewalks is a quicker walk to wherever you need to go. Let’s also assume that it takes five minutes to shovel your sidewalk, and that you can save twenty minutes in transportation time if every sidewalk is shoveled, as opposed to none. Finally, let’s assume that if half of the sidewalks are shoveled, you save half as much time: ten minutes.

The graph below shows how much time is saved by people who do not shovel (red line) and those who do shovel (blue line), as a function of how many people choose to shovel. The time is the amount of time saved commuting, minus the time spent shoveling. As you can see, at every point there is an incentive not to shovel but to take advantage of your neighbors' shoveled walks. However, if everyone does that, you reach a situation where no one’s walks are shoveled and no one is saving any time. If everyone’s walks are shoveled, everyone saves time despite having to spend time shoveling (the right end of the blue line). Since the second situation is clearly preferable, we need some system to encourage people to shovel. This could take many forms: a general sense of community standards, public shaming of those who do not shovel, or a regulation with fines attached.

Plot of payoffs for shoveling your walk
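
This payoff structure is easy to sketch numerically. Below is a minimal model using the same arbitrary assumptions as above (five minutes to shovel, twenty minutes saved at full participation, linear scaling):

```python
# Toy payoff model: net minutes saved as a function of the fraction of
# sidewalks shoveled. Numbers are the arbitrary ones assumed in the text.

SHOVEL_COST = 5    # minutes spent shoveling your own walk
MAX_SAVINGS = 20   # minutes saved when every sidewalk is shoveled

def payoff(fraction_shoveled, you_shovel):
    """Net time saved, in minutes, for one person."""
    savings = MAX_SAVINGS * fraction_shoveled
    return savings - SHOVEL_COST if you_shovel else savings

# At any level of participation, not shoveling beats shoveling by exactly
# the cost of shoveling (the gap between the red and blue lines)...
print(payoff(0.5, you_shovel=False))  # 10.0
print(payoff(0.5, you_shovel=True))   # 5.0

# ...but everyone shoveling still beats no one shoveling:
print(payoff(1.0, you_shovel=True))   # 15.0
print(payoff(0.0, you_shovel=False))  # 0.0
```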

We can carry the economic theory further, and say that the amount of the fine has to be greater than the distance between the two lines, but that’s not really a meaningful thing to do. For one thing, the distance between the lines is in units of time, while the fine is in units of money, two things that convert in wildly different ways for different people. Also, we must remember that the numbers we used were completely arbitrary and furthermore not uniform across the population.

Schelling also points out that there is a minimum viable coalition size, a minimum number of people who have to cooperate to shovel their sidewalks in order to realize a net gain. This is the point where the payoff curve for shoveling reaches the payoff that everyone would get if no one shoveled. In the figure above, at about 0.2, those who choose to shovel are better off than they would have been had no one chosen to shovel. This isn’t really a meaningful number here for several reasons. As mentioned before, the assumption of uniform arbitrary payoffs is invalid. Additionally, you are not equally affected by every shoveled sidewalk. A geographically dispersed coalition may be no better than no coalition, if no member of the coalition walks on the shoveled sidewalks of any other member. A geographically concentrated coalition may be viable even at small numbers, because the sidewalks being shoveled are the sidewalks most relevant to the members. (Consider a commercial strip where the businesses cooperate to shovel their stretch of road. Even though the coalition is small, its members benefit because customers can walk up and down their commercial strip.)
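
With the toy numbers, the break-even point can be computed directly: shovelers match the zero net savings of the no-shoveling baseline when the fraction shoveled equals the shoveling cost divided by the maximum savings, which comes out to 0.25 here, close to the crossing point read off the figure:

```python
# Break-even fraction: shovelers' payoff, MAX_SAVINGS * f - SHOVEL_COST,
# equals the no-shoveling baseline of 0 when f = SHOVEL_COST / MAX_SAVINGS.
SHOVEL_COST = 5
MAX_SAVINGS = 20

min_viable_fraction = SHOVEL_COST / MAX_SAVINGS
print(min_viable_fraction)  # 0.25
```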

Thus, fines for not shoveling are a way of encouraging a socially desirable outcome, when individual motivations would result in a socially undesirable outcome.

Dijkstra's Algorithm Visualization in the Browser

by mattwigway on December 26, 2014

visualization of Dijkstra's algorithm

Dijkstra’s algorithm is a method for finding the shortest path through a graph with non-negative edge weights. It is a relatively efficient algorithm, and it is guaranteed to find the shortest path (unlike some heuristic algorithms). It’s a fairly simple algorithm, but also one that can be difficult to wrap one’s head around. Descriptions abound on the Internet, along with a number of videos and even a few interactive demonstrations, but I thought there needed to be a demonstration that was interactive, worked in modern browsers without a plugin, used a non-trivial graph, and was open source. So I wrote one. A prose description of the algorithm is included there; I hope it’s easier to understand with the interactive component. Visualizing algorithms tends to make them easier to understand, as Mike Bostock has observed.
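
For reference, the algorithm itself fits in a few lines of Python. This is my own minimal sketch, not the code behind the visualization:

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph with non-negative
    edge weights. graph maps each node to a list of (neighbor, weight)."""
    dist = {source: 0}
    pq = [(0, source)]  # priority queue of (tentative distance, node)
    settled = set()
    while pq:
        d, node = heapq.heappop(pq)
        if node in settled:
            continue  # stale queue entry; node already has a shorter path
        settled.add(node)
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float('inf')):
                dist[neighbor] = nd  # found a shorter path; relax the edge
                heapq.heappush(pq, (nd, neighbor))
    return dist

# A small undirected example graph.
graph = {
    'a': [('b', 7), ('c', 9), ('f', 14)],
    'b': [('a', 7), ('c', 10), ('d', 15)],
    'c': [('a', 9), ('b', 10), ('d', 11), ('f', 2)],
    'd': [('b', 15), ('c', 11), ('e', 6)],
    'e': [('d', 6), ('f', 9)],
    'f': [('a', 14), ('c', 2), ('e', 9)],
}
print(dijkstra(graph, 'a'))
```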

Accessibility Analysis with Python and OpenTripPlanner

by mattwigway on November 01, 2014

Walking time to the nearest grocery store for all Census blocks in Chicago.

OpenTripPlanner is a great bit of software for both customer-facing tools and analysis. Until recently, it had the capability to perform batch queries, calculating an origin-destination matrix or an aggregate measure of accessibility. Configuring this functionality, however, was somewhat awkward: it used a verbose XML format better suited to letting developers configure application components than to serving as a user-facing interface (and I say that having been one of the last defenders of this approach on the OTP mailing list).

This batch analysis tool was removed as a side effect of a restructuring and simplification of the OpenTripPlanner codebase that has been ongoing for several months. Its absence sparked a debate on the opentripplanner-dev mailing list, which broke down roughly into two camps: one arguing for something that is purely a configuration file, and another arguing for “configuration files” that are simply scripts of some sort (I argued for both camps at one point or another). To make a long story short, where that conversation stands now is that there are tentative plans to rebuild Batch Analyst using Java Preferences as a configuration file format.

In parallel, development has been ongoing on a web-based analytics framework. This is a very useful (and just plain neat) tool for accessibility analysis in a GUI-driven package, and it is exactly what the majority of those doing accessibility analysis probably need. However, coming from a research background (quantitative/computational geography), I often want tools that I can script, commit my methods to a git repo, and integrate with other tools. That said, work on this graphical interface to Analyst has driven a rethinking of how analysis is done in OTP and the creation of many useful components.

In some personal projects, I needed to be able to run batch jobs again, and I decided to try to build a quick and dirty Python library to call the OTP analysis functions. (To be fair, integrating OTP and Python was originally proposed by Tuukka Hastrup in the aforementioned thread). The result is here. It’s a Jython library that wraps up the functionality of OTP’s analysis functions in a hacker-friendly library. I decided to take a simple approach and build a library that does one thing and one thing well: creates origin-destination matrices. What you build around that is up to you. If you want a simple cumulative accessibility measure, you can sum the number of links that are below a threshold. If you want to use a more complicated accessibility measure, with distance decays and such, you can just implement some Python code to do that.
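
For example, here is roughly what those two measures look like when computed over a raw origin-destination matrix. This is plain Python with made-up travel times, not the library's API (which returns its own Matrix class, numpy being unavailable in Jython):

```python
import math

# A hypothetical origin-destination matrix of travel times in minutes:
# rows are origins, columns are destinations.
od_matrix = [
    [5, 42, 17],
    [30, 8, 55],
    [61, 47, 33],
]

def cumulative(row, threshold):
    """Cumulative measure: destinations reachable within threshold minutes."""
    return sum(1 for t in row if t <= threshold)

def gravity(row, beta=0.1):
    """Distance-decay (gravity) measure: nearer destinations count more."""
    return sum(math.exp(-beta * t) for t in row)

# Destinations reachable within 45 minutes from each origin.
print([cumulative(row, 45) for row in od_matrix])  # [3, 2, 1]
```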

The map above is the result of a demonstration of this project. It shows the walking time to the nearest grocery store from every Census block in Chicago. Here’s how I made it. First, I downloaded the binary distribution of OTP’s master (development) version from here. I grabbed OpenStreetMap data for Chicago from mapzen’s metro extracts site, and Census blocks and grocery store locations from the City of Chicago Data Portal. I built an OTP graph using the standard methods. I then edited the grocery stores file to have only latitude and longitude columns (because, internally, OTP seems to try to convert the other columns to integers for use as inputs to aggregators). I then ran this code to perform the analysis. It must be run in Jython as opposed to standard Python, the OTP jar must be on the Jython classpath, and the opentripplanner-jython module must be in Jython’s Python search path somewhere. I ran it like so:

CLASSPATH=~/opentripplanner/otp-latest-master.jar jython -J-Xmx8192m

The -J-Xmx8192m tells the Java Virtual Machine to use 8GB of RAM. If you don’t have that much, you can experiment with smaller numbers.

I’ll walk you through what the code does. It loads the graph which was previously built (which it expects to find in the graph subdirectory of the working directory), loads the destinations, links them to the graph, creates a batch processor with the origins, and then evaluates that batch processor on the destinations. The result of the call to BatchProcessor.eval() is an origin-destination matrix, with origins on the rows and destinations on the columns. Unfortunately, numpy is not available in Jython, so data is returned using the opentripplanner.batch.Matrix class.
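
As an aside, the nearest-grocery-store map above boils down to a row-wise minimum over exactly this kind of matrix: for each origin (Census block), take the smallest travel time across all destinations. A sketch with made-up numbers:

```python
# Hypothetical origin-destination travel times in minutes; rows are origins
# (Census blocks), columns are destinations (grocery stores).
od_matrix = [
    [5, 42, 17],
    [30, 8, 55],
    [61, 47, 33],
]

# Travel time to the nearest destination from each origin.
nearest = [min(row) for row in od_matrix]
print(nearest)  # [5, 8, 33]
```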

This tool helps eliminate a lot of the repeated computation in classic Batch Analyst runs. You load the graph only once, for example, and you could link the destinations only once if you were running the batch processor multiple times, say with different mode sets. You could calculate travel times to multiple destination sets without re-running the batch processor, simply by calling eval() more than once. Remember that adding additional destinations, or calculating accessibility for additional sets of destinations, is cheap; you’re just sampling points in the graph. Adding additional origins is expensive: for each origin, OTP builds a shortest path tree.

Under the hood, it uses the new Analyst framework, which calculates the travel time from each origin to every vertex in the graph and stores it in a time surface, which we can then sample inexpensively.

One caveat is that this library doesn’t yet support profile routing, although OTP does. Profile routing is a much better way of doing accessibility analysis with public transportation: it suits general queries (e.g. how long does it take to get to work?) rather than extremely specific ones (if I leave right now, how long exactly will it take me to get to work today?).

Update 2014-12-31: I added notes about memory consumption.

Using GeoTools with Multiple User Accounts

by mattwigway on October 20, 2014

I have a situation where I have multiple GeoTools applications being run on a server by different users. I was having a lot of issues with exceptions about not being able to decode CRS codes, even though I was sure I had the gt-epsg-hsql file included in my JAR, and had set up Maven correctly to include the dependency.

It turns out the issue was that gt-epsg-hsql extracts its HSQL database of projections to the system temporary directory, and if there are multiple GeoTools apps running as different users, the first one to start gets the directory and the remaining ones crash because they don’t have permission to access it.

The workaround is to use separate temporary directories for each user. The easy way to do this is TMPDIR=`mktemp -d` application, which creates a new unique temporary directory each time an application is started.

Running RStudio Server on Amazon EC2

by mattwigway on August 10, 2014

RStudio Server running on Amazon AWS, accessed via SSH tunneling

R is a great environment for statistical computing; I’ve used it in a number of projects. RStudio is hands-down the best IDE for R (there is even less debate here than there is about emacs being the best editor). Sometimes, though, I find that I need to run analyses that require more computing power than my personal computer can provide (especially since my desktop is currently in storage in California and I’m in Illinois with a circa-2007 netbook with 2GB of RAM).

Amazon EC2 is a great solution for this type of issue; it allows you to spin up powerful computers on-demand and access them over ssh. You don’t get a graphical interface, though, which precludes running RStudio Desktop. However, RStudio provides a server tool that allows you to run R on a server and access it through your browser. Configuring it on EC2, however, is a wee bit tricky because most people use public key authentication to access their instances (which is good for security), while RStudio assumes that you can log in with a password. One solution is to switch to password authentication for your entire instance, but it’s possible (and more secure) to keep the public key authentication. Here’s how to do it.

  1. Spin up an EC2 instance. I like to use the ones from Ubuntu Cloud.
  2. You can now log in using your key pair like so: ssh -i ~/.ssh/key.pem ubuntu@<aws-ip>
  3. Install a recent version of R by following the directions at CRAN (assuming you’re on Ubuntu).
  4. Download RStudio Server and install it. You may need to replace gdebi with dpkg -i because X11 is not available.
  5. Add the following line to /etc/rstudio/rserver.conf. This forces RStudio to listen only for connections on localhost, so that a public key is still needed to access it: www-address=127.0.0.1
  6. Restart RStudio Server: sudo restart rstudio-server
  7. Create a password for your login account: sudo passwd your-user-name. You won’t be able to SSH in with this (assuming that you only allow public key auth), but you’ll use it to log into RStudio.
  8. On your local machine, run ssh -N -L localhost:8787:localhost:8787 -i ~/.ssh/key.pem ubuntu@<aws-ip>. This forwards the RStudio Server session securely to your computer using SSH tunneling. Note that any user of your local machine can now access RStudio Server, although they’ll need the password you created in step 7.
  9. Go to http://localhost:8787/ and log in with the password you just created.
  10. Analyze away.

Note that you are just accessing RStudio on AWS, so you’ll need to have all of your data and R scripts on the server.

CC-BY-NC 4.0 by Matthew Wigginton Conway, 2011–2015.