Correlation matrices show up often in papers and anywhere data is being analyzed. They are useful because they succinctly summarize the observed relationships between a set of variables; this also makes them very good for exploratory data analysis.
However, a correlation matrix on its own is still a bit difficult to interpret, because it is simply a grid of numbers. For example, here is the output of the R cor() function. There’s a lot of useful information there, but it takes effort to pick out the patterns.
            x1          x2          x3          x4         x5
x1  0.00000000  0.03297151  0.85017673 -0.69401590  0.5354154
x2  0.03297151  0.00000000  0.01985976 -0.02100622  0.1290689
x3  0.85017673  0.01985976  0.00000000 -0.61088013  0.5123067
x4 -0.69401590 -0.02100622 -0.61088013  0.00000000 -0.5308175
x5  0.53541535  0.12906890  0.51230666 -0.53081745  0.0000000
This data can also be displayed visually, in a color-coded matrix. Here is exactly the same data, displayed in visual form:
In particular, this improves on Tufte’s 6th and 7th principles of data graphics: encouraging visual comparisons and “reveal[ing] the data at several levels of detail” (page 13). It is much easier to compare the correlations of different variables visually than by doing mental arithmetic to compare the numbers in the correlation matrix. The correlation matrix also presents the data only at a high level of specificity. The visual display, on the other hand, uses colors to display the general patterns in the data, while still having the numbers to display the specific relationships.
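As a rough illustration of the color-coding idea, here is a minimal sketch in Python (not the R code used for the plot in this post). It computes Pearson correlations and maps each one to a color on a diverging scale, so strong negative and strong positive relationships stand out. The function names, the red/white/blue palette, and the sample data are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

def diverging_color(r):
    """Map r in [-1, 1] to a hex color: red (negative), white (0), blue (positive).

    The palette is an assumption for this sketch, not the one used in the post.
    """
    r = max(-1.0, min(1.0, r))
    fade = round(255 * (1 - abs(r)))
    if r >= 0:
        return "#{:02x}{:02x}ff".format(fade, fade)
    return "#ff{:02x}{:02x}".format(fade, fade)

# Hypothetical data: three short columns.
data = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5], "x3": [5, 4, 3, 2, 1]}
matrix = correlation_matrix(data)
for row_name, row in matrix.items():
    print(row_name, [diverging_color(r) for r in row.values()])
```

Each cell of the matrix would then be filled with its color and labeled with its number, which keeps both the general pattern and the specific values visible.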
This idea can be executed in many different data analysis environments, but I use R. The R code used to produce the above plot follows. Calling the function on a data frame will create and display the plot, and return the
Schelling’s segregation model is an interesting model of neighborhood dynamics developed by the economist Thomas C. Schelling. It’s an agent-based model, in which agents of two groups (which could be based on income, political affiliation, race, &c.) are placed on a grid. There is some threshold for what percentage of an agent’s neighbors must be of the same group for it to be happy. For instance, agents might want 30% of their neighbors to be of the same group. If they are not happy, they move. This continues until all agents are happy.
What’s interesting about the model is that relatively low thresholds for individuals (e.g. 30%) end up leading to extreme segregation in the aggregate (indeed, Schelling’s book is called Micromotives and Macrobehavior). This segregation can be easily seen in the above screenshot: the tolerance is set at 30%, but on average 77% of each agent’s neighbors are of the same group.
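The rules above are simple enough to sketch in a few lines of Python. This is a minimal illustration, not the implementation behind the screenshot. It assumes an 8-cell neighborhood, a share of vacant cells for agents to move into, random relocation of unhappy agents, and that an agent with no neighbors counts as happy. All function names and parameter values are hypothetical.

```python
import random

def make_grid(size, vacancy, rng):
    """Fill a size x size grid with groups 'A'/'B'; None marks a vacant cell."""
    return [[None if rng.random() < vacancy else rng.choice("AB")
             for _ in range(size)] for _ in range(size)]

def same_share(grid, r, c):
    """Share of the occupied neighboring cells holding the agent's own group."""
    size = len(grid)
    groups = [grid[r + dr][c + dc]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr or dc) and 0 <= r + dr < size and 0 <= c + dc < size
              and grid[r + dr][c + dc] is not None]
    if not groups:
        return 1.0  # assumption: an agent with no neighbors is happy
    return groups.count(grid[r][c]) / len(groups)

def run(grid, threshold, rng, max_rounds=200):
    """Move unhappy agents to random vacant cells until all are happy."""
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    for _ in range(max_rounds):
        unhappy = [(r, c) for r, c in cells
                   if grid[r][c] is not None
                   and same_share(grid, r, c) < threshold]
        if not unhappy:
            break
        rng.shuffle(unhappy)
        for r, c in unhappy:
            vacant = [(i, j) for i, j in cells if grid[i][j] is None]
            # Skip if nowhere to go, or if earlier moves made this agent happy.
            if not vacant or same_share(grid, r, c) >= threshold:
                continue
            i, j = rng.choice(vacant)
            grid[i][j], grid[r][c] = grid[r][c], None
    return grid

def mean_same_share(grid):
    """Average, over all agents, of the share of same-group neighbors."""
    shares = [same_share(grid, r, c)
              for r in range(len(grid)) for c in range(len(grid))
              if grid[r][c] is not None]
    return sum(shares) / len(shares)

rng = random.Random(42)
grid = run(make_grid(20, 0.2, rng), threshold=0.3, rng=rng)
print(round(mean_same_share(grid), 2))
```

Comparing the printed average with the 30% threshold gives a quick sense of how far aggregate segregation overshoots what any individual agent asks for.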
A full description of the model can be found in Schelling’s book Micromotives and Macrobehavior, on pages 147–155 (in the 1978 edition anyhow).
Bikeshare systems have been taking off in the US of late. One of the first of these systems, Capital Bikeshare in Washington, DC, has been in operation since 2010. The automated bikeshare stations generate a wealth of information; the start and end stations and times of each trip are recorded, and are available to the public in anonymous form. This project used the approximately 4.5 million trips taken on the system from the fourth quarter of 2010 until the second quarter of 2013.
I was interested in how space and time affect the usage patterns of bikeshare systems. This data allows one to test the patterns statistically.
It is generally acknowledged that patterns of bikeshare use differ at different times of day. This makes sense; for instance, commuters may ride downtown in the morning and back to a metro station in the afternoon. To test this, eight time periods were defined: morning (6a–9a), midday (9a–3p), afternoon (3p–7p) and overnight (7p–9p) for both weekdays and weekends. These time periods match those used in the Metropolitan Washington Council of Governments travel model and add weekends. Each trip was assigned to one of these time periods, and origin-destination matrices were computed for each time period. The labels were then scrambled (preserving the number of trips in each time period, as well as the origin-destination matrix of the entire dataset). Origin-destination matrices for each randomized time period were then recomputed. Pairwise comparisons of time periods were then computed for both the observed and simulated data.
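The label-scrambling procedure is a permutation test, and a minimal Python sketch of it might look like the following. The dissimilarity measure here (the sum of absolute differences between the two periods' trip shares for each origin-destination pair) is an assumption for illustration; the statistic actually used in the project is not stated here. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def od_matrix(trips, labels, period):
    """Count trips per (origin, destination) pair within one time period."""
    return Counter(trip for trip, label in zip(trips, labels)
                   if label == period)

def dissimilarity(m1, m2):
    """Sum of absolute differences in trip shares (an assumed statistic)."""
    total1, total2 = sum(m1.values()), sum(m2.values())
    pairs = set(m1) | set(m2)
    return sum(abs(m1[p] / total1 - m2[p] / total2) for p in pairs)

def permutation_test(trips, labels, period_a, period_b, n_perm=199, seed=0):
    """Compare two periods' OD matrices against scrambled period labels.

    Shuffling the labels keeps the number of trips in each period and the
    overall origin-destination matrix unchanged.
    """
    rng = random.Random(seed)
    observed = dissimilarity(od_matrix(trips, labels, period_a),
                             od_matrix(trips, labels, period_b))
    shuffled = list(labels)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        simulated = dissimilarity(od_matrix(trips, shuffled, period_a),
                                  od_matrix(trips, shuffled, period_b))
        if simulated >= observed:
            as_extreme += 1
    # Add one to both counts so the p-value is never exactly zero.
    return observed, (as_extreme + 1) / (n_perm + 1)

# Toy data: morning trips all go A -> B, afternoon trips all go B -> A.
trips = [("A", "B")] * 50 + [("B", "A")] * 50
labels = ["morning"] * 50 + ["afternoon"] * 50
observed, p_value = permutation_test(trips, labels, "morning", "afternoon")
print(observed, p_value)
```

With eight time periods, the same function would be called once for each pair of periods.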
There is a statistically significant difference between every time period and every other time period. That is, the patterns of bikeshare use differ at different times of day and on the weekends. This is a driver of rebalancing: the system operator must move the bikes to meet differing demands throughout the day. The other driver of rebalancing would be if there is a general trend for the bikes to move to a certain area regardless of time period; this study did not address this question but it could be addressed with the data used.
Some stations are, of course, more popular than others. As it turns out, the popularities of the stations are spatially autocorrelated; that is, nearby stations tend to have similar popularities. Moran’s I is 0.78 (p < 0.05). This is not surprising; one can hypothesize several reasons for this finding. The most obvious is that there are certain areas that are more popular than others (for instance, downtown stations are probably more popular than stations in lower-density residential areas). Also, bikeshare trips require both a start and end station; stations that are popular likely have many trips to nearby stations, making those stations popular as well.
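For readers unfamiliar with Moran’s I, here is a minimal Python sketch of the standard formula, I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The binary "neighboring stations" weights and the four-station example are assumptions for illustration; the weights used in the project are not described here.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] is the spatial weight between locations i and j
    (here assumed to be 1 for neighbors and 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross_products = sum(weights[i][j] * deviations[i] * deviations[j]
                         for i in range(n) for j in range(n))
    squared = sum(d * d for d in deviations)
    return (n / weight_total) * cross_products / squared

# Hypothetical example: four stations in a row, each adjacent to the next.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], chain))   # similar neighbors: positive
print(morans_i([1, 4, 1, 4], chain))   # alternating values: negative
```

Positive values indicate that neighbors resemble each other, which is the pattern reported for station popularity.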
Further research could include looking more into the patterns of use by time period, attempting to determine the general flow of bikes at different times of day. One team has developed statistical models to inform rebalancing; however, they modeled each station individually, as the trip-level data used in this project is not available where they were working in Chicago. Origin-destination matrices like the ones computed here could potentially improve this type of model.
This research was undertaken in Dr. Stuart Sweeney’s Geography 172, Spatial Data Analysis, class in the Department of Geography at UCSB. For a more complete treatment of the project, see the full report.
Analysis of accessibility is generally undertaken in large regions, such as metropolitan areas or entire countries. Frequently it also uses macro temporal scales, as in before-and-after analysis. This analysis instead looks at micro scales, both spatial and temporal. The study area is the University of California, Santa Barbara campus and the adjoining student community of Isla Vista.
I analyzed accessibility at every hour of a typical week, so that accessibility can be compared at different times of day and on different days. This has been done before, looking at accessibility at different times of day (page 8) in the Los Angeles area. I used tighter temporal scales (one hour instead of four chunks) and also analyzed accessibility over the entire week to allow the discernment of weekly cycles.
Only accessibility to eateries was analyzed. Data were obtained from OpenStreetMap for network data and from the UCSB Interactive Campus Map for data on eatery locations. Animations of accessibility over a typical week follow; in the darker blue areas more eateries are accessible within five minutes' travel time. Five minutes was chosen as the cutoff because it is half of the walking time between the intersection of Pardall and Embarcadero Del Norte and the front of the University Center, two areas where many eateries are concentrated. A more systematic study would need to estimate this from travel data. Accessibility was analyzed for both walking and cycling.
The two animated maps show the accessibility to eateries at different times of day by different modes. The bicycle map shows much more accessibility because with a bicycle one can reach many more opportunities in 5 minutes' time. A daily cycle can easily be determined, with most (but not all) businesses closing in the late evening and opening again in the morning, creating a pulsing accessibility. The eateries on campus (the eastern portion of the maps) do not have the same span of service as the eateries in Isla Vista. On the weekends, most of the campus eateries are not open at all.
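The measure behind these maps is a cumulative-opportunities count: for a given origin and hour, how many eateries are both open and reachable within the cutoff. A minimal Python sketch follows. It assumes travel times from one origin have already been computed (in this project they came from routing over the network), and the eatery names, opening hours, and travel times are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Eatery:
    name: str
    open_hours: frozenset  # hours of the week, 0..167, with 0 = Monday 12a

def weekly_hours(days, start, end):
    """Hours of the week for the given days (0 = Monday), open start..end."""
    return frozenset(day * 24 + hour
                     for day in days for hour in range(start, end))

def accessibility(travel_minutes, eateries, hour_of_week, cutoff=5.0):
    """Count eateries that are open and within the cutoff travel time."""
    return sum(1 for e in eateries
               if hour_of_week in e.open_hours
               and travel_minutes.get(e.name, float("inf")) <= cutoff)

def weekly_profile(travel_minutes, eateries, cutoff=5.0):
    """Accessibility from one origin for each of the 168 hours of a week."""
    return [accessibility(travel_minutes, eateries, hour, cutoff)
            for hour in range(168)]

# Hypothetical eateries and precomputed travel times (minutes) from one origin.
eateries = [
    Eatery("campus cafe", weekly_hours(range(0, 5), 8, 17)),  # weekdays only
    Eatery("pizzeria", weekly_hours(range(0, 7), 11, 23)),    # every day
    Eatery("far diner", weekly_hours(range(0, 7), 0, 24)),    # always open
]
travel_minutes = {"campus cafe": 3.0, "pizzeria": 4.0, "far diner": 12.0}

profile = weekly_profile(travel_minutes, eateries)
print(profile[12], profile[5 * 24 + 12])  # Monday noon vs. Saturday noon
```

Repeating this for every origin on a grid, once per mode, would give the frames of an animated map like the ones described above.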
There are a few limitations. OpenTripPlanner’s cycling mode currently does not support bicycle parking; at UCSB, there are many bicycle parking areas where one must park before going to one’s building. At a micro scale of analysis, correctness of the network is also very important because small absolute errors can be large relative to the total length of the trip; OpenStreetMap data was improved for this project but is still not perfect, especially given construction on campus.
Further research would use behavioral data to better estimate parameters for the accessibility measure, as well as to interpret the results. Sara Matthews analyzed mode choice in trips to Humboldt State University in the context of residential location. Accessibility could be used as an independent variable in a similar analysis of mode choice.
Even in the context of comprehensive transportation models such as SimAGENT (Southern California Association of Governments) and SF-CHAMP (San Francisco County Transportation Authority), accessibility measures rendered as maps such as these are valuable. They are understandable and thus can easily be presented to non-technical decisionmakers and to the public. They also generally have more of a descriptive rather than projective role; that is, they describe current situations rather than predicting future ones. Finally, they can play a role in individual decision support; Jarrett Walker has noted the usefulness of isochrones for decision support, and these accessibility measures can play the same role. Walk Score® has recently announced understandable accessibility maps; this makes these types of measures much more available.
For a more in-depth treatment, see the full report.
I gave a presentation on the connections between Jane Jacobs and Global Cities Theory at the California Geographical Society 2013 conference. The slides from the presentation are on the Publications page.
CC-BY-NC 4.0 by Matthew Wigginton Conway, 2011-2014. Created with Jekyll and Bootstrap.