Gerrymandering math

UPDATE! The Pennsylvania Supreme Court has ruled in the LWV vs PA General Assembly case. I was an expert witness in this case, presenting an analysis based on our joint work with Alan Frieze and Maria Chikina (discussed below). The State Supreme Court has ordered PA's map redrawn on an accelerated schedule (before the 2018 midterm elections.) My expert report (along with testimony and other case documents) can be found at the second link above.

This page gives an overview of some of our work on gerrymandering. This work is on mathematics relevant to gerrymandering. In particular, we are proving mathematical theorems which are intended to be useful for the purpose of evaluating and addressing the gerrymandering problem in practice.

This work falls into two categories:

We discuss each of these below.

A protocol for drawing fairer districtings

In our paper with Ariel Procaccia and Dingli Yu, we define the following simple protocol for districting a state:

I-cut-you-freeze protocol:

Suppose we need to divide a state into n districts. Initially, we have k=0 frozen districts. On each turn:

  1. The dividing player divides the unfrozen part of the state into n-k districts and passes the result to the freezing player.
  2. The freezing player chooses one district drawn by the dividing player in this unfrozen part and freezes it. This district will be in the final districting.

The players then switch roles; so the freezing player becomes the dividing player, and they go back to step 1. This continues until all of the state is frozen, after n steps (or really n-1, since in the last step, there is no division to make).

We prove in a mathematical model of a state that the number of districts won by each party will have a reasonable and symmetric relationship with the number of voters loyal to each party in the state (e.g., it doesn't matter much who goes first, as long as the number of districts is not very small), and also that neither player can pack a specific population of voters into a single district against the wishes of the other player.

One nice thing about this kind of protocol is that it doesn't require a reconception of traditional districting requirements. For example, whatever requirements are imposed in a state currently on the people responsible for drawing its districts (e.g., that the districts are connected, are roughly equal in population, etc.) can be imposed in the context of this protocol.

Rigorously detecting gerrymandering

One key task is to be able to rigorously identify gerrymandering. It is not hard to tell that a districting favors one political party or another; for example, in 2012, the new Congressional districting of Pennsylvania resulted in Democrats winning only 5 out of the 18 seats up for election, despite winning a majority of the total Congressional vote in the state. Metrics such as the efficiency gap capture this advantage that one party may enjoy from the particular characteristics of a districting.

It has been pointed out, however, that this kind of advantage can arise naturally, without bias in how districtings are drawn, simply because of the differences in how Democrats and Republicans are distributed in various states. For example, when Democrats are more clustered in cities, it makes it more likely that "blue" districts in a state will be more heavily Democratic than "red" districts are Republican; in this scenario, Democrat voters are concentrated in fewer districts, and elect fewer representatives than would be proportional to their number.

The task then is to evaluate whether a districting not only favors a given political party, but does so to a degree that is atypical for the particular political geography of the state. For example, if we drew new district lines for Pennsylvania from scratch, would we be likely to see a districting with the same characteristics as the current one? Or is the current districting really an outlier in the space of possible districtings of Pennsylvania? Moreover, even when circumstantial evidence makes such a judgment obvious, we wish to give objective mathematical tools for the problem, which can be embraced as part of a rigorous legal framework for adjudicating gerrymandering questions.

A space of valid districtings for comparison

To make a rigorous analysis of the outlier status of a districting of a state, we need a notion of what constitutes a valid districting of a state. This allows us, in principle, to compare the current districting to the space of a valid districtings. Our analysis uses on simple criteria for validity of a districtings of a state into a prescribed number of districts, for example:

  1. Each district should be contiguous,
  2. The districts should have nearly equal population, and
  3. The districts should have reasonable shapes.

In practice, there are many possible ways of formalizing this last constraint; one simple way is to require that the total perimeter of all districts is not too large. It is also possible to add more constraints on districtings: for example that they avoid splitting certain counties, or include specified majority-minority districts. Given a notion of what constitutes a valid districting, our task is to compare hypothetical election outcomes from the present districting of a state with the those which would result from the other valid districtings of the same state. However, the space of possible valid districtings of a state is enormous, and far too large to be enumerable by a computer, even if districts are restricted to being made up of discrete units like municipal wards or census blocks. Thus the way to compare the districting of a state with the set of possible valid districtings is to make a statistical comparison. This requires being able to draw valid districtings randomly; here, we do not mean randomly as in "haphazardly", but really randomly, in the sense that we can randomly choose valid districtings in a way such that each valid districting is equally likely to be chosen.

Randomly generating districtings

Markov Chains are well-studied probabilistic processes which, among other things, can be used to generate random samples from complicated sets. The basic idea of a Markov Chain is simple: suppose I have some space of objects A I would like to draw a random sample from, but I don't have random access to the elements of the set. Instead, I begin with an arbitrary element X0 of the set (which may not be a typical random element) and make small iterative changes to the initial element X0, to produce a random sequence X0, X1, X2, X3, ... of elements of X, each slightly different, in a random way, from the one before it. This process is a Markov Chain on the space X, and it is known, under simple assumptions (for example, that each element of X can in principle be reached from every other one by a sequence of changes) that once the sequence is long enough, a term Xn in the sequence is essentially a random element of X.

Markov chains can thus be used to generate random valid districtings of a state quite easily. In this application, A is the set of all districtings of the state into the prescribed number of districts. Even though we have no direct way of accessing random elements of A, we do know at least one member of A; namely, the current districting of the state, and we can use this as X0. Small random changes can be made to districtings by swapping small areas on the boundary of a district to the neighboring district, so long as the change preserves the requirements on valid districtings. Thus by making many such swaps, we get a long sequence X0, X1, X2, X3, ..., Xn of districtings, and if n is large enough, we know that Xn is a random districting of A. Generating many such random valid districtings of the state would allow us to make a rigorous statistical statement about the status of the current districting of the state. For example, if we generate 1000 random valid districtings, and the present districting is more biased towards Republicans than all the generated districtings, than we have observed an event which should have happened with probability less than 1/1000, if the present districting was really a typical member of the space of valid districtings.

The problem with this method is that while mathematical theorems guarantee in principle that Xn is random if n is sufficiently large, they tell us nothing about how large n must be to have truly random samples, and in general, it is quite possible that n would have to be so large as to be completely impractical to generate even a single random sample. Indeed, for the districting Markov Chain, it is not even clear that it is possible to make small changes to turn any valid districting into any other, and if this is false, then one cannot generate random samples using the chain no matter how large n is. While Markov chains are used effectively across science in spite of this limitation, we have an ambitious goal: we want a method for which we can prove mathematically that the chance of a false positive (that is, saying that a districting is gerrymandered when its not) is vanishingly small, in some well-defined space of comparison districtings. We can achieve this goal with a slightly different approach, which gets at the heart of whether a districting has been drawn to intentionally favor one party.

Demonstrating bias without randomly generating districtings

Our method gives a way around this problem uses Markov Chains to demonstrate that the element X0 is an outlier in the set A, without having to run the Markov Chain for long enough to generate random samples of A. The statistical test we prove the correctness of is simple; one simply generates the sequence X0, X1, X2, X3, ..., Xk for as large a k as is desired (larger k will tend to allow better detection of bias). Now, if X0 is among the worst ε fraction of districtings observed in the Markov sequence, and ε is very small, than this would seem to indicate that X0 is somehow a very carefully chosen member of A. And indeed, Theorem 1.1 in our paper implies that this observation would have a probability of at most √ 2ε . For example, if the current districting of a state was typical of the space of valid districtings of the state, then we should have at most a 1% chance of observing that it is in the worst 1/20,000 fraction of the districtings in our Markov sequence of districtings, since 1/100=√ 2/20,000; . Such an observation would thus be statistically significant at a threshold of p=.01.

In the language of redistricting, this amounts to a test which examines whether a districting has carefully crafted partisan bias—that is, partisan bias which evaporates when small random changes are made to the districting. In the context of redistricting, the theorem from our paper proves rigorously that it is impossible for any state, with any hypothetical political geography, to have the proprerty that typical districtings have a partisan bias which evaporates quickly when small changes are made to the districtings.

Results for Pennsylvania and Wisconsin

We have applied our test to the current Congressional districting of Pennsylvania, and to the current state assembly districting of Wisconsin. In each case, we have tried various notions how to formalize the requirement that districts have reasonable shapes. The way each of these conditions works is discussed in detail to the supplement of our paper. Roughly speaking, the perimeter constraint cares about the total perimeter of all districtings, the L1 constraint cares about the how much larger the perimeter of each district is relative to what you would expect just from its area, and the L2 constraint cares about the same thing, but heavily penalizes districtings which have even a few bad districts in them. In the tables below, each entry corresponds to a Markov sequence with 240 (roughly 1 trillion) districtings, generated with the indicated parameters. The ε value, as above, is the fraction of observed districtings which were as bad as the initial districting. This is generally tiny. In fact, for some parameters, no districting in the sequence of 1 trillion districtings was as bad as the initial districting! The p value in the table is the statistical significance of the outlier status from this run, computed simply as √ 2ε .

Pennsylvania
Shape constraint (Steps) metric ε p=
Perimeter 240 ωvar 3.0974 · 10−8 2.4889 · 10−4
Perimeter 240 ωMM 5.7448 · 10−10 3.3896 · 10−5
L1 240 ωvar 5.0123 · 10−11 1.0012 · 10−5
L1 240 ωMM 5.6936 · 10−10 3.3745 · 10−5
L2 240 ωvar 8.2249 · 10−11 1.2826 · 10−5
L2 240 ωMM 6.8038 · 10−10 3.6888 · 10−5
L 240 ωvar 3.3188 · 10−13 8.1472 · 10−7
L 240 ωMM 6.9485 · 10−8 3.7279 · 10−4

The metric column gives the technical parameter used to compare the level of electoral bias between districtings; these precise choices are discussed in the supplement of our paper. Since our method is based on relative bias of districtings, however, the precise metric we use is not so important (as demonstrated by the consistency achieved from two very different metrics in the case of Pennsylvania). For our analysis of Wisconsin, the results in the following table use the efficiency gap as a metric, because of the attention given to that metric in the Gill v. Whitford case:

Wisconsin
Shape constraint (Steps) Preserve Counties? Preserve MM districts? ε p=
Perimeter 240 yes yes 2.7317 · 10−8 2.3374 · 10−4
L1 240 yes yes 1.6378 · 10−8 1.8099 · 10−4
L2 240 yes yes 1.0051 · 10−8 1.4178 · 10−4
L1 240 no no 3.5840 · 10−7 8.4664 · 10−4

For Wisconsin, we took the additional step of designing a Markov Chain which left unaltered any parts of the state belonging to counties lying entirely in one of the current districts of the Wisconsin assembly map, or any lying in one of the current Majority-Minority districts. These constraints are designed to show that the method is robust even when respecting a wide range of valid considerations map designers may have followed when designing the current districting. Our table also includes a run of the chain without these constraints.

For both states, the p values obtained are thus quite small; in principle we expect even smaller p values could be obtained with longer runs of the Markov Chain, but our relatively short runs (1 trillion steps, taking a week or two) have the advantage that they can be easily verified by third parties.

For more results on Pennsylvania, incorporating constraints on county splits and considering possible Majority Minority district issues, refer to to the report from the LWV vs PA General Assembly case.