
Accounting for Uncertainty with Empirical Bayes Smoothing

09.05.2013

One of the inherent difficulties of working with Census data is uncertainty. The Census is commonly thought of as the survey taken every ten years by just about every household in the United States, called the Decennial Census. However, in the interest of having more up-to-date information about a rapidly changing nation, the Census Bureau also produces the American Community Survey, with estimates released in one-, three-, and five-year increments. While this information is hopefully more reflective of current trends, the shorter survey periods and smaller sample sizes mean higher uncertainty and larger margins of error.

Part of my Summer of Maps tenure at Azavea involved working on a project for the Greater Philadelphia Coalition Against Hunger to identify populations in Philadelphia that were both vulnerable to hunger and eligible for SNAP benefits. Many of the best measures for assessing these two factors are only available through the American Community Survey, so uncertainty is unavoidable. While it’s possible to ignore or throw out high-error values, it’s also possible to strengthen uncertain estimates and weaken outliers through “rate smoothing”, specifically Empirical Bayes Smoothing.

Empirical Bayes Smoothing uses the population in a region as a measure of confidence in the data: the larger an area’s population, the more confidence we place in its estimated rate of events. Estimates for populous areas are left largely alone, while estimates for sparsely populated areas, whose rates carry the largest margins of error, are nudged toward the global average event rate. For the Hunger Coalition, the event being measured is the number of people who fall below an income-to-poverty ratio (IPR) of 1.5, which determines their eligibility for SNAP (formerly food stamp) benefits. The IPR divides a household’s income by the poverty threshold appropriate to its household size. For example, an IPR of 1 means household income equals the poverty line, and an IPR of 2 refers to a household that earns twice the poverty threshold.
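Written out, the global Empirical Bayes calculation looks like the following. This assumes the standard method-of-moments form of the estimator, which we understand to be what GeoDa implements. For tract i out of n, O_i is the event count (people below an IPR of 1.5), P_i is the base (total population), and r_i is the raw rate:

```latex
r_i = \frac{O_i}{P_i}, \qquad
b = \frac{\sum_{j=1}^{n} O_j}{\sum_{j=1}^{n} P_j}, \qquad
\bar{P} = \frac{1}{n}\sum_{j=1}^{n} P_j

s^2 = \frac{\sum_{j=1}^{n} P_j \,(r_j - b)^2}{\sum_{j=1}^{n} P_j}, \qquad
a = s^2 - \frac{b}{\bar{P}}

w_i = \frac{a}{a + b / P_i}, \qquad
\hat{r}_i = w_i \, r_i + (1 - w_i)\, b
```

The weight w_i approaches 1 as P_i grows, so populous tracts keep roughly their raw rate, while small tracts are pulled toward the global rate b. A negative value of a is the negative variance estimate mentioned in the notes on the GeoDa steps.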

The simplest available GIS implementation of Empirical Bayes Smoothing is in Open GeoDa, a free spatial statistical tool developed by Arizona State University and Luc Anselin, a prominent statistician. GeoDa has many advanced statistical GIS functions, some of which aren’t even available in ArcGIS. To use Empirical Bayes Smoothing:

  1. Download GeoDa from https://spatial.uchicago.edu/ with a registered (free) account.
  2. Open a shapefile with an event variable and a base variable.
     Event example: number of people per tract living below an IPR of 1.5
     Base example: total population of each tract
  3. Right-click the map and choose “Select Rates”.
  4. Select Empirical Bayes.
  5. Select the Event and Base variables, then press OK.
  6. Right-click the map and choose “Save Rates”.
  7. Click Add Variable to name a new field, then press OK.
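Outside GeoDa, the same workflow (read an event field and a base field, smooth, save the result as a new field) can be sketched in plain Python. This is a sketch, not GeoDa’s code: it assumes the global method-of-moments estimator, the field names `TRACT`, `BELOW_IPR15`, `TOTAL_POP`, and `EB_RATE` and the numbers are hypothetical, and clamping a negative variance estimate to zero is our own assumption rather than documented GeoDa behavior.

```python
import csv
import io


def empirical_bayes_rates(events, bases):
    """Global Empirical Bayes smoothed rates (method-of-moments sketch)."""
    if any(p <= 0 for p in bases):
        # Mirrors the GeoDa note: the base may not contain zero values.
        raise ValueError("every base value must be positive")
    n = len(bases)
    total_base = sum(bases)
    b = sum(events) / total_base                        # global rate
    raw = [o / p for o, p in zip(events, bases)]        # raw rate per area
    s2 = sum(p * (r - b) ** 2 for p, r in zip(bases, raw)) / total_base
    a = s2 - b / (total_base / n)                       # between-area variance
    a = max(a, 0.0)  # ASSUMPTION: clamp a negative estimate to zero
    smoothed = []
    for r, p in zip(raw, bases):
        denom = a + b / p
        w = a / denom if denom > 0 else 0.0             # weight on the raw rate
        smoothed.append(w * r + (1 - w) * b)            # shrink toward b
    return smoothed


# Hypothetical attribute table standing in for the shapefile's fields.
TABLE = """TRACT,BELOW_IPR15,TOTAL_POP
A,120,400
B,300,3000
C,5,20
D,900,6000
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))
events = [float(row["BELOW_IPR15"]) for row in rows]   # event variable
bases = [float(row["TOTAL_POP"]) for row in rows]      # base variable
for row, rate in zip(rows, empirical_bayes_rates(events, bases)):
    row["EB_RATE"] = rate  # new field, like "Save Rates" in GeoDa
```

In this made-up table, tract C (20 people) moves from a raw rate of 0.25 to roughly 0.16, close to the global rate of about 0.14, while tract D (6,000 people) stays near its raw rate of 0.15.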

Notes: The base variable may not contain any zero values. If the event values are extremely small (single digits) for many areas prior to smoothing, the calculation may produce a homogeneous map due to a negative estimate of variance, which means that the calculated rates are zero. This can sometimes be fixed by multiplying the event and base fields by the same factor before running Empirical Bayes Smoothing. The factor cancels out of each raw rate (event divided by base), so the raw rates are unchanged; it does, however, make the populations look larger, which raises the variance estimate and reduces how strongly the rates are smoothed.
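The effect of that rescaling can be checked directly against the method-of-moments formulas (assuming those are what GeoDa uses). The counts below are made up to produce a negative variance estimate; `eb_components` is a hypothetical helper name.

```python
def eb_components(events, bases):
    """Return raw rates, the global rate b, and the variance estimate a."""
    n = len(bases)
    total_base = sum(bases)
    b = sum(events) / total_base                        # global rate
    raw = [o / p for o, p in zip(events, bases)]        # raw rate per area
    s2 = sum(p * (r - b) ** 2 for p, r in zip(bases, raw)) / total_base
    a = s2 - b / (total_base / n)                       # can come out negative
    return raw, b, a


events = [1, 2, 1, 3]      # illustrative single-digit event counts
bases = [50, 60, 55, 70]   # illustrative base populations

raw1, b1, a1 = eb_components(events, bases)

k = 10  # multiply both fields by the same factor
raw10, b10, a10 = eb_components([o * k for o in events], [p * k for p in bases])

print(a1 < 0, a10 > 0)  # True True
```

The raw rates and the global rate b are identical before and after scaling; only the b / P̄ term shrinks by the factor k, which is what turns the variance estimate positive. Because the weights depend on that estimate, the amount of smoothing also changes with the factor.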

There are two other smoothing techniques within GeoDa that work differently from Empirical Bayes. The first, Spatial Empirical Bayes, smooths toward a local rather than a global average of the event rate. The local averages are based on a spatial weighting scheme that defines each area’s neighbors, and choosing that scheme well requires detailed knowledge of how the study area varies at a small scale. The second, Spatial Rate smoothing, replaces each area’s rate with a regional rate computed over the area and its neighbors, again defined by a selected weighting scheme.
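Spatial Rate smoothing is the simpler of the two to sketch: sum the events and the bases over a window made of each area and its neighbors, then divide. This assumes the window includes the area itself, as we understand GeoDa’s version does; the `neighbors` dictionary is a hypothetical stand-in for a real spatial weights file, and the four tracts are made up.

```python
def spatial_rate(events, bases, neighbors):
    """Spatial rate smoothing: the rate over each area plus its neighbors.

    neighbors maps an area index to the indices of its neighbors
    (a hypothetical contiguity structure standing in for a weights file).
    """
    smoothed = []
    for i in range(len(bases)):
        window = [i] + list(neighbors.get(i, []))  # area itself + neighbors
        o = sum(events[j] for j in window)         # events in the window
        p = sum(bases[j] for j in window)          # base in the window
        smoothed.append(o / p)
    return smoothed


# Four hypothetical tracts in a row: 0 - 1 - 2 - 3 (rook-style contiguity).
events = [120, 300, 5, 900]
bases = [400, 3000, 20, 6000]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

rates = spatial_rate(events, bases, neighbors)
```

Tract 2, with only 20 people, ends up with the rate of its whole window (1,205 events over 9,020 people) rather than its own noisy 5 out of 20. Unlike Empirical Bayes, every tract is averaged with its neighbors regardless of its own population.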

Using Empirical Bayes smoothing on American Community Survey data for all of Philadelphia will give the Hunger Coalition stronger estimates that are less affected by the large margins of error in sparsely populated tracts. This will create a more complete picture of SNAP eligibility within Philadelphia, rather than a patchwork of uncertain data.