Moving to Medium

. . .

by mattwigway on January 24, 2016

Here we are early in 2016, and I decided it was time to revisit how I publish my writing. I’ve had a blog for a while, but I’ve recently been realizing that the future is social, and my ideas would be easier to share if they were hosted on a social platform. With that in mind, I’m going to put new posts exclusively on my Medium profile:

Matthew Wigginton Conway

For those who prefer an old-fashioned web address, you can still visit, which will redirect you to Medium.

As always, I’ll continue to tweet all my new posts. This site will continue to be a place to host projects, etc.


Millennials and Communal Living

. . .

by mattwigway on September 12, 2015

A CityLab article from a few weeks back observes that the rate of household formation among millennials remains low. A large proportion of millennials are still living with their parents, certainly a sign of economic hardship. However, the article also notes that it is “alarming” that when millennials eventually do leave their parents’ households, they often do not form new households but instead live with other adults. The article views this as a sign of economic malaise, which indeed it is, but fails to note that there are concomitant social changes that drive millennials to live together, independent of economic concerns.

The article, and likely society at large, assumes that the reason millennials are living together is that they cannot afford to live alone, and that young adults would not choose to live with unrelated adults unless forced to in order to make ends meet. To some extent, this is true: rents are high, and many millennials struggle to afford them and live with others for this reason. However, there is a significant minority who could afford to live by themselves but choose not to. The plot below shows the distribution of personal (not household) income for millennials (ages 18-34) in the District of Columbia (one of the more expensive places to live in the country) who live in households composed of at least three 18-34 year olds, and no others. The data is derived from the 2009-2013 American Community Survey Public Use Microdata Sample.

Histogram of annual personal income of millenials who live with others

Median monthly housing costs per housing unit in the District of Columbia were $1,454 in 2013 (source). Going by the common rule of thumb that housing costs should be less than 30% of income, one finds that to afford the median housing cost in DC one must make about $58,000 per year ($1,454 × 12 ÷ 0.30 ≈ $58,160), which is the location of the black line on the above chart. 21% (+/- 4%, 95% confidence interval using the replicate weights method to calculate standard errors) of 18-34 year olds currently living with other 18-34 year olds in the District make more than that, and could potentially afford to live alone. 21% isn’t a huge number (put another way, 79% of 18-34 year olds who live together in DC couldn’t afford to live alone), but it is a significant minority.
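The threshold and the margin of error can be sketched in a few lines of Python. This is an illustration rather than the code used for this post: the function names are hypothetical, and the standard-error formula assumes the usual ACS PUMS convention of 80 replicate weights with a variance factor of 4/80.

```python
import math

def income_needed(monthly_housing_cost, max_share=0.30):
    """Annual income at which housing costs equal max_share of income."""
    return monthly_housing_cost * 12 / max_share

def replicate_weight_se(full_estimate, replicate_estimates):
    """Standard error from replicate estimates.

    Assumption: ACS PUMS convention, i.e. 80 replicate estimates and
    variance = 4/80 * sum of squared deviations from the full-sample estimate.
    """
    n = len(replicate_estimates)
    squared_devs = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return math.sqrt(4.0 / n * squared_devs)

def margin_of_error_95(standard_error):
    """Half-width of a 95% confidence interval (normal approximation)."""
    return 1.96 * standard_error

print(round(income_needed(1454)))  # 58160
```

To use the standard-error function, the share of high earners would be recomputed once with each replicate weight, and those 80 shares passed in alongside the full-sample share.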

I see several obvious explanations. One is simply that millennials don’t like to live alone. This can hardly be called out as a sign of economic malaise; rather it is a sign of social change. Millennials find social benefits in communal living, from forming lasting friendships to just having someone to hang out with in the evenings. Additionally, millennials who could afford to live alone but are choosing to live together are not just letting the savings evaporate into thin air; they are either spending them on other goods and services (a boon for the local economy) or saving them (a boon for themselves later in life).

Another possible cause has to do with where people are choosing to live. More and more, millennials are choosing to live in central areas of cities, where rents are higher and space is at a premium. Many of the people who could afford their own place somewhere in DC might not be able to afford their own place in the neighborhood in which they would like to live. Again, this is not necessarily a sign of economic malaise. There is not enough housing in the core for everyone who would like to live there to live alone, and this is reflected in housing prices. Many millennials are making a perfectly rational economic choice, choosing the benefits of living in their desired neighborhood over the sacrifice of not having their own place. The District of Columbia is fairly compact, but there are certainly lower-cost areas that may not be seen as desirable by millennials.

Finally, there may be a supply-side problem. The Census numbers are based on what people who currently live in the District pay. If you’re trying to establish a new household, you are renting or buying not from the pool of all properties, but from the pool of properties currently on the market. It’s entirely possible that the median cost of properties on the market is higher than the median cost of properties that are currently occupied; people may be less likely to leave more reasonably priced properties. Zillow, for example, finds that the median price of District of Columbia properties listed for rent there is $2,300 per month. (This discrepancy could also be due to selection bias, i.e. more-expensive-than-average properties being listed on Zillow.)

Economic hardship certainly is a huge factor in millennials choosing to live together, and certainly in choosing to continue living at home. The majority of millennials who live with other adults in DC could not afford the median housing cost in the District. However, affordability is not the only factor; there is also a significant minority of millennials who choose to live with a group of other adults, even though they could afford to live by themselves. We should be careful to recognize the effects of social change as well as economic hardship on household formation.


Game Theory and Snow Shoveling

. . .

by mattwigway on February 26, 2015

Snow in Washington, DC

My adopted city of Washington, DC, is shoveling out from under some of the last snow of the season. At the same time, we’ve just enacted a law that levies fines on residents who don’t clear their sidewalks, after years of debate.

There’s a very good reason to fine property owners for not shoveling their sidewalks. Snow shoveling is what’s known as a multi-player prisoner’s dilemma. If everyone shovels, everyone is better off (because they can get around more easily, don’t face the danger of falls, don’t have to pay extra cleaning bills due to muddy/salty pant cuffs and skirt hems, &c.). However, any one person is better off if they don’t shovel, because they’ve saved time (I actually like shoveling my sidewalk, but let’s assume that the utility for shoveling is uniformly negative). Each individual isn’t affected much by the state of their own sidewalk, but rather by the state of everyone else’s.

We can represent this issue graphically, as Thomas Schelling does in chapter 7 of Micromotives and Macrobehavior. For simplicity’s sake, let’s assume that the only benefit to shoveled sidewalks is a quicker walk to wherever you need to go. Let’s also assume that it takes five minutes to shovel your sidewalk, and that you can save twenty minutes in transportation time if every sidewalk is shoveled, as opposed to none. Finally, let’s assume that if half of the sidewalks are shoveled, you save half as much time: ten minutes.

The graph below shows how much time is saved by people who do not shovel (red line) and by those who do shovel (blue line), as a function of how many people choose to shovel. The time is the amount of time saved commuting, minus the time spent shoveling. As you can see, at every point there is an incentive not to shovel, but to take advantage of your neighbors’ shoveled walks. However, if everyone does that, you reach a situation where no one’s walks are shoveled, and no one is saving any time. If everyone’s walks are shoveled, everyone saves time despite having to spend time shoveling (the right end of the blue line). Since the second situation is clearly the preferable one, we need to have some system to encourage people to shovel. This could take many forms: a general sense of community standards, public shaming of those who do not shovel, or a regulation with fines attached.
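The two payoff lines can be written down directly. This is a minimal sketch that assumes the time saved scales linearly with the share of sidewalks shoveled (consistent with the 0, 10, and 20 minute figures above); the function and constant names are only illustrative.

```python
SHOVEL_COST_MIN = 5.0       # minutes spent shoveling your own walk
MAX_TIME_SAVED_MIN = 20.0   # minutes saved when every walk is shoveled

def payoff_free_rider(share_shoveling):
    """Net minutes saved by someone who does not shovel."""
    return MAX_TIME_SAVED_MIN * share_shoveling

def payoff_shoveler(share_shoveling):
    """Net minutes saved by someone who shovels: travel savings minus shoveling time."""
    return MAX_TIME_SAVED_MIN * share_shoveling - SHOVEL_COST_MIN

# Compare both payoffs when nobody, half, and everybody shovels.
for share in (0.0, 0.5, 1.0):
    print(share, payoff_free_rider(share), payoff_shoveler(share))
```

At every share the free rider is ahead by the five minutes of shoveling, yet everyone shoveling (15 minutes saved each) still beats no one shoveling (0 minutes saved).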

Plot of payoffs for shoveling your walk

We can carry the economic theory further, and say that the amount of the fine has to be greater than the distance between the two lines, but that’s not really a meaningful thing to do. For one thing, the distance between the lines is in units of time, while the fine is in units of money, two things that convert in wildly different ways for different people. Also, we must remember that the numbers we used were completely arbitrary and furthermore not uniform across the population.

Schelling also points out that there is a minimum viable coalition size, a minimum number of people who have to cooperate to shovel their sidewalks in order to realize a net gain. This is the point where the payoff curve for shoveling reaches the payoff that everyone would get if no one shoveled. In the figure above, at about 0.2, those who choose to shovel are better off than they would have been had no one chosen to shovel. This isn’t really a meaningful number here for several reasons. As mentioned before, the assumption of uniform arbitrary payoffs is invalid. Additionally, you are not equally affected by every shoveled sidewalk. A geographically dispersed coalition may be no better than no coalition, if no member of the coalition walks on the shoveled sidewalks of any other member. A geographically concentrated coalition may be viable even at small numbers, because the sidewalks being shoveled are the sidewalks most relevant to the members. (Consider a commercial strip where the businesses cooperate to shovel their stretch of road. Even though the coalition is small, its members benefit because customers can walk up and down their commercial strip.)
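As a rough check on the break-even point, here is a sketch that assumes time savings grow strictly linearly with the share of people shoveling, which is a simplification of the arbitrary numbers used earlier. Under that assumption the break-even share is the shoveling time divided by the maximum time saved, which lands in the same neighborhood as the approximate value read off the figure. The function name is hypothetical.

```python
def minimum_viable_coalition(shovel_cost_min, max_time_saved_min):
    """Share of shovelers at which a shoveler's net payoff reaches zero,
    i.e. matches the payoff everyone gets when no one shovels.

    Assumes savings are linear in the share of sidewalks shoveled.
    """
    return shovel_cost_min / max_time_saved_min

print(minimum_viable_coalition(5.0, 20.0))  # 0.25
```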

Thus, fines for not shoveling are a way of encouraging a socially desirable outcome, when individual motivations would result in a socially undesirable outcome.


Dijkstra's Algorithm Visualization in the Browser

. . .

by mattwigway on December 26, 2014

visualization of Dijkstra's algorithm

Dijkstra’s algorithm is a method for finding the shortest path through a graph with non-negative edge weights. It is a relatively efficient algorithm, and is guaranteed to find the shortest path (unlike some heuristic algorithms). It’s a fairly simple algorithm, but also one that can be difficult to wrap one’s head around. Descriptions abound on the Internet, and there are a number of videos, and even a few interactive demonstrations, but I thought there needed to be a demonstration that was interactive, worked in modern browsers without a plugin, used a non-trivial graph, and was open source. So I wrote one. A prose description of the algorithm is there; I hope it’s easier to understand with the interactive component. Visualizing algorithms tends to make them easier to understand, as observed by Mike Bostock.
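For readers who prefer code to prose, here is a minimal Python sketch of the algorithm using a binary heap. It is not the code behind the visualization; the adjacency-list format and the toy graph are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable vertex.

    graph maps each vertex to a list of (neighbor, weight) pairs;
    all weights must be non-negative.
    """
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: a shorter route to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist

# Toy undirected graph, written as a symmetric adjacency list.
toy_graph = {
    "A": [("B", 7), ("C", 9), ("F", 14)],
    "B": [("A", 7), ("C", 10), ("D", 15)],
    "C": [("A", 9), ("B", 10), ("D", 11), ("F", 2)],
    "D": [("B", 15), ("C", 11), ("E", 6)],
    "E": [("D", 6), ("F", 9)],
    "F": [("A", 14), ("C", 2), ("E", 9)],
}
print(dijkstra(toy_graph, "A"))
```

The heap always yields the closest unsettled vertex next, which is why the non-negative weight requirement matters: once a vertex is popped with its best distance, no later edge can make it cheaper.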


Accessibility Analysis with Python and OpenTripPlanner

. . .

by mattwigway on November 01, 2014

Walking time to the nearest grocery store for all Census blocks in Chicago.

OpenTripPlanner is a great bit of software for both customer-facing tools and analysis. Until recently, it had the capability to perform batch queries, calculating an origin-destination matrix or an aggregate measure of accessibility. Configuring this functionality, however, was somewhat awkward, as it used a verbose XML format that was more suited to allowing developers to configure application components than as a user-facing interface (and I say that having been one of the last defenders of this approach on the OTP mailing list).

This batch analysis tool was removed as a side effect of a restructuring and simplification of the OpenTripPlanner codebase that has been ongoing for several months. Its absence sparked a debate on the opentripplanner-dev mailing list, which broke down roughly into two camps: one camp arguing for something that is purely a configuration file, with another camp arguing for “configuration files” that are simply scripts of some sort (I argued for both camps at one point or another). Where that conversation lies now, to make a long story short, is that there are tentative plans to rebuild Batch Analyst using Java Preferences as a configuration file format.

In parallel, development has been ongoing on a web-based analytics framework. This is a very useful (and just plain neat) tool for accessibility analysis in a package driven by a graphical user interface, which is exactly what is needed by probably the majority of those doing accessibility analysis. However, coming from a research background (quantitative/computational geography), I often want tools that I can script, whose methods I can commit to a git repo, and that I can integrate with other tools. That said, work on this graphical interface to Analyst has driven a rethinking of how analysis is done in OTP and the creation of many useful components.

In some personal projects, I needed to be able to run batch jobs again, and I decided to try to build a quick and dirty Python library to call the OTP analysis functions. (To be fair, integrating OTP and Python was originally proposed by Tuukka Hastrup in the aforementioned thread.) The result is here. It’s a Jython library that wraps up the functionality of OTP’s analysis functions in a hacker-friendly library. I decided to take a simple approach and build a library that does one thing and does it well: creating origin-destination matrices. What you build around that is up to you. If you want a simple cumulative accessibility measure, you can count the number of links that are below a threshold. If you want to use a more complicated accessibility measure, with distance decays and such, you can just implement some Python code to do that.
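Both kinds of measure take only a few lines once you have a row of travel times. The sketch below deliberately does not use the library’s API: it treats one origin’s row as a plain Python list of travel times in seconds, with None marking unreachable destinations, and the function names and decay parameter are assumptions chosen for illustration.

```python
import math

def cumulative_accessibility(od_row, threshold_seconds):
    """Count destinations whose travel time is below the threshold."""
    return sum(1 for t in od_row if t is not None and t < threshold_seconds)

def decayed_accessibility(od_row, beta=0.001):
    """Negative-exponential decay: each destination contributes exp(-beta * t)."""
    return sum(math.exp(-beta * t) for t in od_row if t is not None)

# Two origins, three destinations; times in seconds (made-up values).
od_matrix = [
    [300, 900, 1500],
    [1200, None, 400],
]
for row in od_matrix:
    print(cumulative_accessibility(row, 600), decayed_accessibility(row))
```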

The map above is the result of a demonstration of this project. It shows the walking time to the nearest grocery store from every Census block in Chicago. Here’s how I made it. First, I downloaded the binary distribution of OTP’s master (development) version from here. I grabbed OpenStreetMap data for Chicago from mapzen’s metro extracts site, and Census blocks and grocery store locations from the City of Chicago Data Portal. I built an OTP graph using the standard methods. I then edited the grocery stores file to have only latitude and longitude columns (because, internally, OTP seems to try to convert the other columns to integers for use as inputs to aggregators). I then ran this code to perform the analysis. It must be run in Jython as opposed to standard Python, the OTP jar must be on the Jython classpath, and the opentripplanner-jython module must be in Jython’s Python search path somewhere. I ran it like so:

CLASSPATH=~/opentripplanner/otp-latest-master.jar jython -J-Xmx8192m

The -J-Xmx8192m tells the Java Virtual Machine to use 8GB of RAM. If you don’t have that much, you can experiment with smaller numbers.

I’ll walk you through what the code does. It loads the graph which was previously built (which it expects to find in the graph subdirectory of the working directory), loads the destinations, links them to the graph, creates a batch processor with the origins, and then evaluates that batch processor on the destinations. The result of the call to BatchProcessor.eval() is an origin-destination matrix, with origins on the rows and destinations on the columns. Unfortunately, numpy is not available in Jython, so data is returned using the opentripplanner.batch.Matrix class.
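With origins on the rows and destinations on the columns, the value mapped above (time to the nearest grocery store) is just a row-wise minimum. The sketch below uses a plain nested list as a stand-in for the opentripplanner.batch.Matrix class, whose methods are not reproduced here; the use of None for unreachable destinations and the function name are assumptions.

```python
def nearest_destination_time(od_matrix):
    """For each origin (row), return the smallest travel time to any destination.

    Returns None for an origin from which no destination is reachable.
    """
    result = []
    for row in od_matrix:
        reachable = [t for t in row if t is not None]
        result.append(min(reachable) if reachable else None)
    return result

# Three origins, two destinations; times in seconds (made-up values).
print(nearest_destination_time([[300, 900], [None, None], [1200, 400]]))
```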

This tool helps eliminate a lot of the repeated computation in classic batch analyst runs. You load the graph only once, for example, and you could link the destinations only once if you were running the batch processor multiple times, say with different mode sets. You could calculate travel times to multiple destination sets without re-running the batch processor, simply by calling eval() more than once. Remember that adding additional destinations, or calculating accessibility for additional sets of destinations, is cheap; you’re just sampling points in the graph. Adding additional origins is expensive: for each origin, OTP builds a shortest path tree.

Under the hood, it uses the new Analyst framework, which calculates the travel time from each origin to every vertex in the graph and stores it in a time surface, which we can then sample inexpensively.

One caveat is that this library doesn’t yet support profile routing, although OTP does. Profile routing is a much better way of doing general accessibility analysis for public transportation: it answers general queries (e.g. how long does it take to get to work) rather than extremely specific ones (if I leave right now, how long exactly will it take me to get to work today).

Update 2014-12-31: I added notes about memory consumption.


CC-BY-NC 4.0 by Matthew Wigginton Conway, 2011–2015.