library(tidyverse)
library(here)
library(janitor)
library(jtools)
library(gtsummary)
library(gt)
library(MatchIt) # matching
library(cobalt) # balance + love plots
library(fixest) # fast fixed effects
library(scales) # plotting
Assignment instructions
Working with classmates to troubleshoot code and concepts is encouraged. If you collaborate, list collaborators at the top of your submission.
All written responses must be written independently (in your own words).
Keep your work readable: Use clear headings and label plot elements thoughtfully.
Assignment submission (YOUR NAME): ______________________________________
Introduction
In this assignment you will do a kind of political weather forecasting, except the “storms” we care about are electoral swings that might follow local wind turbine development.
In Stokes (2015), the idea is that a policy with diffuse benefits (cleaner electricity) can create concentrated local costs (turbines nearby), and those local opponents may “send a signal” at the ballot box (i.e., NIMBYISM). Your job is to use two statistical tools:
- Matching: Can we create a more apples-to-apples comparison between precincts that did vs. did not end up near turbine proposals?
- Fixed effects + Difference-in-Differences: Can we use repeated elections to estimate how within-precinct changes in turbine exposure relate to changes in incumbent vote share?
Learning goal: Replicate the matching and fixed effects analyses from the study:
Stokes (2015): “Electoral Backlash against Climate Policy: A Natural Experiment on Retrospective Voting and Local Resistance to Public Policy.”
- Study: Stokes (2015) - Article
- Data source: Dataverse-Stokes2015
Setup: Load libraries
- Load libraries (+ install if needed)
Part 1: Study Background
1A. Dive into the details of the study design and evaluation plan
Goal: Get familiar with the study setting, environmental issue, and policy under evaluation.
1A.Q1 Summarize the environmental policy issue, the outcome of interest, and the intervention being evaluated. Be sure to include a brief description of each of the following key elements of the study (unit of analysis, outcome, treatment, comparison group):
Response: _________________________
1A.Q2 Why might turbine proposals be correlated with baseline political preferences or rural areas? Provide two plausible mechanisms, and explain why each creates confounding.
Response: _________________________
1B. Break down the causal inference strategy and identify threats to identification:
1B.Q1 What is the key identifying assumption for a fixed effects / difference-in-differences design? Explain how this assumption, when satisfied, supports a causal interpretation of the estimate:
Response: _________________________
1B.Q2 What is the reason for using a fixed effects approach from a causal inference perspective? Summarize it in the context of the study (in your own words).
Response: _________________________
1B.Q3 What part of the SUTVA assumption is most likely violated in the context of this study design (and why)?
Response: _________________________
1B.Q4 Why does spillover matter when estimating an unbiased treatment effect?
Response: _________________________
1B.Q5 How do the authors assess the risk of spillovers, and what analytic choice do they make to attempt to mitigate the risk that spillover biases the causal estimate?
Response: _________________________
Part 2: Matching
We will start by evaluating the 2007 survey (cross-sectional) data. Treatment is defined by whether a precinct is near a turbine proposal (within 3 km).
Goal: Match precincts using pre-treatment covariates and then estimate the effect of proposed wind turbines on incumbent vote share.
2A. Load data for matching
- Read in the data file stokes15_survey2007.csv
- Code precinct_id and district_id as factors
- Take a look at the data
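A minimal sketch of this step (the "data" subfolder is an assumption; adjust the path to wherever you saved the file):

```r
# Sketch: load the 2007 survey data (folder name is an assumption)
match_data <- read_csv(here("data", "stokes15_survey2007.csv")) %>%
  mutate(
    precinct_id = as.factor(precinct_id),
    district_id = as.factor(district_id)
  )
glimpse(match_data) # take a look at the data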
match_data <-
2A.Q1 Intuition check: Why match? Explain the rationale for using this method.
Response: _________________________
2B. Check imbalance (before matching)
- Create a covariate balance table comparing treated and control precincts
- Treatment indicator: proposed_turbine_3km
- Include pre-treatment covariates: log_home_val_07, p_uni_degree, log_median_inc, log_pop_denc
- Use the tbl_summary() function from the {gtsummary} package.
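One way the table could be built (a sketch; the select() call just keeps the treatment indicator and the four covariates):

```r
# Sketch: covariate balance table, stratified by treatment status
match_data %>%
  select(proposed_turbine_3km, log_home_val_07, p_uni_degree,
         log_median_inc, log_pop_denc) %>%
  tbl_summary(by = proposed_turbine_3km)
```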
match_data %>%
2B.Q1 Summarize the table output: Which covariates look balanced/imbalanced?
Response: _________________________
2B.Q2 Describe in your own words why these covariates might be expected to confound the treatment estimate:
Response (2-4 sentences): _________________________
2B.Q3 Intuition check: What type of data do you need to conduct a matching analysis?
Response: _________________________
Conduct matching estimation using the {MatchIt} package:
Learning goals:
- Approximate the Mahalanobis matching method used in Stokes (2015)
- Implement another common matching approach called
propensity score matching
2C. Mahalanobis nearest-neighbor matching
- Conduct Mahalanobis matching
- Use nearest-neighbor match without replacement using Mahalanobis distance
- Use 1-to-1 matching (match one control unit to each treatment unit)
- Extract the matched data using
match.data()
set.seed(2412026)
match_model <- matchit(
# Treatment_indicator ~ Pre_treatment_covariates
data = match_data,
method = "nearest", # Nearest neighbor matching
distance = "mahalanobis", # Mahalanobis distance
ratio = 1, # Match one control unit to one treatment unit (1:1 matching)
replace = FALSE # Control observations are not replaced
)
# Extract matched data
matched_data <- match.data(match_model)
summary(match_model)
2C.Q1 Using the summary() output: Which covariate had the largest and smallest Std. Mean Diff. before matching? Next, compare the largest/smallest Std. Mean Diff. after matching.
Response: _________________________
2D. Create a “love plot” using love.plot() ❤️
- Plot mean differences for data before & after matching across all pre-treatment covariates
- This is a quick way to evaluate how effective matching was at achieving balance.
- Make a love plot of standardized mean differences (SMDs) before vs after matching.
- Include a threshold line at 0.1.
- In the love plot, display mean.diffs
new_names <- data.frame(
old = c("log_home_val_07", "p_uni_degree", "log_median_inc", "log_pop_denc"),
new = c("Home Value (log)", "Percent University Degree",
"Median Income (log)", "Population Density (log)"))
# Love plot
love.plot(match_model, stats = "mean.diffs",
thresholds = c(m = 0.1),
var.names = new_names)
2D.Q1 Interpret the love plot in your own words:
Response: _________________________
Propensity score matching
2E. Propensity Score Matching (PSM)
- Estimate 1:1 nearest-neighbor Propensity Score Matching
- Same code as above, except change distance = "logit"
set.seed(2412026)
propensity_scores <-
Create a table displaying covariate balance using cobalt::bal.tab()
Use bal.tab() to report balance before and after matching.
bal.tab(propensity_scores,
var.names = new_names)
2E.Q1 Compare Mahalanobis vs. propensity score matching: Which method did a better job of achieving balance?
Response: _________________________
2F. Estimate an effect in the matched sample
Using the matched data (Mahalanobis method), estimate the effect of treatment on the change in incumbent vote share (change_liberal).
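A sketch of the estimation step. The weights column comes from match.data(); with 1:1 matching without replacement all weights equal 1, but including them is good practice:

```r
# Sketch: treatment effect on change in incumbent vote share, matched sample
reg_match <- lm(change_liberal ~ proposed_turbine_3km,
                data = matched_data, weights = weights)
summ(reg_match, model.fit = FALSE)
```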
reg_match <-
# summ(, model.fit = FALSE)
2F.Q1 Have you identified a causal estimate using this approach? Why or why not?
Response: _________________________
2F.Q2 When using a matching method, what is the main threat to causal identification?
Response: _________________________
2F.Q3 Describe why the treatment estimate represents the Average Treatment effect on the Treated (ATT), and explain why this is the case relative to estimating the Average Treatment Effect (ATE).
Response: _________________________
Part 3: Panel Data, Fixed Effects, and Difference-in-Differences
Data source: Dataverse-Stokes2015
3A: Read in the panel data + code variables precinct_id and year as factors
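A sketch of the loading step; the filename stokes15_panel.csv is hypothetical, so substitute the actual panel file from your Dataverse download:

```r
# Sketch: load the panel data (filename is hypothetical -- use your Dataverse file)
panel_data <- read_csv(here("data", "stokes15_panel.csv")) %>%
  mutate(
    precinct_id = as.factor(precinct_id),
    year = as.factor(year)
  )
```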
panel_data <-
# HINT: Try running `tabyl(panel_data$year)`. Review the article to make sense of the row numbers (n).
3A.Q1: Why are there 18,558 rows in panel_data?
Response: _________________________
# How many years are included in the panel?
# How many precincts are there?
3A.Q2: How many unique precincts are ever treated (i.e., proposed & operational)?
Response: _________________________
panel_data %>%
group_by(precinct_id) %>%
summarise(
ever_proposed = any(proposed_turbine == 1, na.rm = TRUE),
ever_operational = any(operational_turbine == 1, na.rm = TRUE),
.groups = "drop") %>%
summarise(
n_ever_proposed = sum(ever_proposed),
n_ever_operational = sum(ever_operational))
3B. Plot and evaluate parallel trends: Replicate Figure 2 (Stokes, 2015)
- Create indicators for whether each precinct is ever treated by 2011 (treat_p, treat_o; separate indicators for proposals and operational turbines).
- Plot mean incumbent vote share by year for treated vs. control precincts (with 95% CIs).
- Facet by turbine type (proposed & operational)
Step 1: Prepare data
trends_data <- panel_data %>%
group_by(precinct_id) %>%
mutate(
treat_p = as.integer(any(proposed_turbine == 1, na.rm = TRUE)), # ever proposed (in any year)
treat_o = as.integer(any(operational_turbine == 1, na.rm = TRUE))) %>% # ever operational (in any year)
ungroup() %>%
pivot_longer(c(treat_p, treat_o),
names_to = "turbine_type", values_to = "treat") %>%
mutate(
turbine_type = factor(turbine_type,
levels = c("treat_p", "treat_o"),
labels = c("Proposed turbines", "Operational turbines")),
status = if_else(treat == 1, "Treated", "Control"),
year = factor(year))
Step 2: Create trends plot
pd <- position_dodge(width = 0.15)
trends_data %>%
group_by(turbine_type, status, year) %>%
summarise(
mean = mean(perc_lib, na.rm = TRUE),
n = sum(!is.na(perc_lib)),
se = sd(perc_lib, na.rm = TRUE) / sqrt(n),
ci = qt(.975, df = pmax(n - 1, 1)) * se,
.groups = "drop") %>%
ggplot(aes(year, mean, color = status, group = status)) +
geom_line(position = pd, linewidth = 1.2) +
geom_point(position = pd, size = 2.6) +
geom_errorbar(
aes(ymin = mean - ci, ymax = mean + ci),
position = pd, width = .12, linewidth = .7, color = "black") +
facet_wrap(~ turbine_type, nrow = 1) +
scale_color_manual(values = c(Control = "#0072B2", Treated = "#B22222")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
coord_cartesian(ylim = c(.20, .57)) +
labs(
title = "Figure 2. Trends in the Governing Party’s Vote Share",
x = "Election Year",
y = "Liberal Party Vote Share",
color = NULL) +
theme_minimal(base_size = 14) +
theme(
panel.grid.minor = element_blank(),
legend.position = "bottom",
strip.text = element_text(face = "bold"))
3B.Q1: Write a short paragraph assessing the parallel trends assumption for each outcome.
Response (4-6 sentences): _________________________
Estimating Fixed Effects Models (DiD) for proposals
\[ Y_{it} = \alpha_0 + \beta \cdot \text{proposed\_turbine}_{it} + \gamma_i + \delta_t + \varepsilon_{it} \]
- \(Y_{it}\) is the vote share for the Liberal Party in precinct i in time t
- \(\beta\) is the treatment effect of a turbine being proposed within a precinct
- \(\gamma_i\) is the precinct fixed effect
- \(\delta_t\) is the year fixed effect
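For reference, the same model can be estimated on the full panel with {fixest} (loaded above), which absorbs the fixed effects rather than estimating dummy coefficients. A sketch; clustering standard errors by precinct is an assumption about the error structure:

```r
# Sketch: two-way fixed effects DiD with fixest (fixed effects absorbed)
model_feols <- feols(perc_lib ~ proposed_turbine | precinct_id + year,
                     data = panel_data,
                     cluster = ~precinct_id)
summary(model_feols)
```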
Example 1: Randomly sample 40 precincts
- To illustrate the “dummy variable method” of estimating fixed effects using the general lm() function, we are going to randomly sample 40 precincts (20 “treated” precincts with proposed turbines).
- If we attempted this approach with the full sample, estimating all 6,185 (n − 1) precinct-level coefficients would be impractical (it would take a long time).
set.seed(40002026)
precinct_frame <- panel_data %>%
group_by(precinct_id) %>%
summarise(
proposed_turbine_any = as.integer(any(proposed_turbine == 1, na.rm = TRUE)),
.groups = "drop"
)
ids_40 <- precinct_frame %>%
group_by(proposed_turbine_any) %>%
slice_sample(n = 20) %>%
ungroup() %>%
select(precinct_id)
sample_40_precincts <- panel_data %>%
semi_join(ids_40, by = "precinct_id")
3C: Estimate a fixed effects model using lm(), with fixed effects added for precinct and year, using the sample of 40 precincts just created.
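A sketch of the dummy-variable version on the 40-precinct sample; because precinct_id and year are factors, lm() expands them into dummies automatically:

```r
# Sketch: fixed effects via dummy variables on the small sample
model1_ff <- lm(perc_lib ~ proposed_turbine + precinct_id + year,
                data = sample_40_precincts)
summ(model1_ff, model.fit = FALSE, digits = 3)
```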
model1_ff <-
# summ(*** , model.fit = FALSE)
# summ(*** , model.fit = FALSE, [. . .])
3C.Q1: Intuition check: Is the signal-to-noise ratio for the treatment estimate greater than 2-to-1?
Response: _________________________
HINT: Add the argument digits = 3 to the summ() function above.
3C.Q2: Re-run the summ() function using the heteroscedasticity-robust standard error adjustment (robust = TRUE). Did the standard error (S.E.) estimates change? Explain why.
Response: _________________________
3C.Q3: Compare results of the model above to the findings from the fixed effects analysis in the Stokes (2015) study. Why might the results be similar or different?
Response: _________________________
3C.Q4: In your own words, explain why it is advantageous from a causal inference perspective to include year and precinct fixed effects. Explain how between-level and within-level variance is relevant to the problem of omitted variable bias (OVB).
Response (2-4 sentences): _________________________