🌬️🍃 Assignment 3: Instrumental Variable Estimation

Assignment instructions:

Working with classmates to troubleshoot code and concepts is encouraged. If you collaborate, list collaborators at the top of your submission.

All written responses must be written independently (in your own words).

Keep your work readable: Use clear headings and label plot elements thoughtfully (where applicable).

Submit both your rendered output and the Quarto (.qmd) file.

Assignment submission (YOUR NAME): ______________________________________

Introduction

In this assignment, you will replicate the instrumental variable analysis from Stokes (2015), which examined how local wind turbine projects influenced electoral outcomes. Building on the matched dataset from Assignment 2, we will use Two-Stage Least Squares (2SLS) to estimate the causal effect of having a wind turbine proposed nearby on the change in Liberal Party vote share between 2007 and 2011. The instrument used in Stokes (2015) is a measure of local wind resource (average wind power, logged), which predicts where wind turbines are proposed. By using this instrument, we aim to isolate the portion of variation in turbine placement that is as-good-as-random, helping to meet the assumptions for causal identification.

Study: Stokes, 2015 – Article

Data source: Dataverse – Stokes, 2015 replication data

Note: The estimates you obtain may not exactly match the published results in Stokes (2015) due to the alternative matching procedure used for processing the data in the previous assignment. Estimates should approximate the findings reported in Table 2 of the article.

Load packages

library(tidyverse)
library(janitor)
library(here)
library(jtools)   # for export_summs (pretty regression tables)
library(AER)      # for ivreg (2SLS estimation)

Load the matched dataset (from Assignment 2)

The matched_data has been preprocessed by matching on key covariates (e.g. pretreatment home values, education, income, population density) to improve balance between treated and control precincts. We will now use this data for the IV analysis. Make sure to re-code the precinct_id variable as a factor.

matched_data <-

Part 1: IV Identification Rationale

Intuition for Using an Instrument:

Question 1: After matching on observables, why might we still need to utilize an instrumental variable approach to identify the causal effect of turbine proposals on vote share? In other words, what potential issues remain that an IV method can help address in this context? Use specific examples from the study to illustrate threats to a causal interpretation, then explain how an IV approach is designed to mitigate those threats.

Response: _________________________

Part 2: Two-Stage Least Squares (2SLS) Step-Wise Implementation

2A. First-Stage Estimation: Regress the treatment (\(D\)) on the instrument (\(Z\))

\[D_i = \alpha_0 + \alpha_1 Z_i + \nu_i\]

Estimate the first-stage regression of the treatment on the instrument (with controls). Regress proposed_turbine_3km on log_wind_power.
Include the control variables used in Stokes (2015) for both stages: Distance to lakes, geographic coordinates (latitude & longitude) with their squares and interaction, plus district fixed effects.
After running the first stage, report the F-statistic for the instrument.
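The instrument F-statistic can be computed as a partial F-test that compares the first stage with and without the instrument. The sketch below illustrates the idea on simulated data only: the data frame `sim` and the variables `z`, `x`, `u`, and `d` are hypothetical stand-ins, not columns of matched_data, and the numbers have no connection to Stokes (2015). Note that the F-statistic printed at the bottom of `summary(lm)` tests all regressors jointly, so it is generally not the instrument F.

```r
# Simulated data only; nothing here comes from the replication files.
set.seed(42)
n <- 2000
z <- rnorm(n)   # instrument (hypothetical stand-in for log_wind_power)
x <- rnorm(n)   # one observed control (stand-in for the geographic controls)
u <- rnorm(n)   # unobserved confounder
d <- as.numeric(0.6 * z + 0.3 * x + 0.5 * u + rnorm(n) > 0)  # binary treatment

sim <- data.frame(d, z, x)

# First stage (linear probability model): treatment on instrument + control
fs_full <- lm(d ~ z + x, data = sim)

# Restricted first stage that drops the instrument
fs_restricted <- lm(d ~ x, data = sim)

# Partial F-test for the excluded instrument
f_test <- anova(fs_restricted, fs_full)
f_stat <- f_test$F[2]

# With a single instrument, the partial F equals the squared t-statistic
t_stat <- summary(fs_full)$coefficients["z", "t value"]

c(partial_F = f_stat, t_squared = t_stat^2)
```

A common rule of thumb treats a first-stage F above 10 as evidence against a weak instrument, although stricter thresholds are sometimes recommended.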

first_stage <- lm()

export_summs(first_stage, digits = 3,
             model.names = c("First stage: Proposed Turbine 3km"),
             coefs = c("(Intercept)", "log_wind_power"))

Testing Instrument Relevance

Check instrument strength (F-statistic)

Question 2A: Based on the instrument relevance test reported in the study, would you conclude the instrument is strong enough to be credible? Explain what a weak instrument would mean in this setting. Specifically, what would it suggest about compliance with Ontario’s Green Energy Act policy?

Response: _________________________

2B. Second Stage Estimation

Regress the outcome (\(Y\)) on the fitted values from the first stage (\(\hat{D}_i\))

\[Y_i = \beta_0 + \beta_1 \hat{D}_i + \epsilon_i\]

Now estimate the second stage of the 2SLS.
First, use the first-stage model to generate the predicted values of proposed_turbine_3km for each precinct (these are \(\hat{D}_i\)).
Add these predicted values as a new column in matched_data (e.g. proposed_turbine_3km_HAT).
Then, regress the outcome change_liberal (the change in Liberal vote share from 2007 to 2011) on the predicted treatment (proposed_turbine_3km_HAT), including the same controls and fixed effects as in the first stage.
Fill in the code for these steps below to obtain the second-stage regression results.
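The two steps above can be sketched on simulated data where the true treatment effect is known. Everything in this block is an assumption made for illustration: the data frame `sim`, the variables `z`, `x`, `u`, `d`, `y`, and the true effect of -2 are hypothetical, and the treatment is continuous here for simplicity, unlike the binary proposed_turbine_3km.

```r
# Simulated data only; the true effect of d on y is set to -2 by construction.
set.seed(123)
n <- 5000
z <- rnorm(n)   # instrument
x <- rnorm(n)   # observed control
u <- rnorm(n)   # unobserved confounder (affects both d and y)
d <- 0.8 * z + 0.5 * x + u + rnorm(n)   # treatment
y <- -2 * d + 1 * x + 2 * u + rnorm(n)  # outcome

sim <- data.frame(y, d, z, x)

# Naive OLS: biased because u is omitted and correlated with d
naive <- lm(y ~ d + x, data = sim)

# Step 1: first stage, then save the fitted values of the treatment
first_stage_sim <- lm(d ~ z + x, data = sim)
sim$d_hat <- fitted(first_stage_sim)

# Step 2: regress the outcome on the fitted treatment plus the same control
second_stage_sim <- lm(y ~ d_hat + x, data = sim)

c(naive_ols = coef(naive)[["d"]],
  two_stage = coef(second_stage_sim)[["d_hat"]])
```

In this simulation the two-stage estimate should land close to -2 while naive OLS is pulled toward zero by the confounder. The same controls appear in both stages, mirroring the instructions above.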

Save predicted values \(\hat{D}_i\) from first stage

matched_data$proposed_turbine_3km_HAT <-

Estimate the second-stage regression

\[\Delta LiberalVoteShare_i = \beta_0 + \beta_1 \widehat{ProposedTurbine}_i + ControlVariables + \epsilon_i\]

second_stage <- lm()

export_summs(second_stage, digits = 3, 
             model.names = c("Second stage: Change in Liberal Vote Share"),
             coefs = c("(Intercept)", "proposed_turbine_3km_HAT") )

Interpreting the 2SLS Estimate

Question 2B: Imagine you are explaining your 2SLS findings to a policymaker in Ontario. What does the estimated coefficient on proposed_turbine_3km_HAT imply about the electoral impact of a local wind turbine proposal (within 3 km) on the change in Liberal vote share?

Question 2C: Explain what it means that IV identifies a LATE in the context of the wind-turbine voting study. What specific subset of observations does the second-stage 2SLS estimate apply to, and what does it imply about interpretation and generalizability?

Response: _________________________

Part 3: IV Assumptions and Validity

Evaluate Instrument Validity

Question 3: List the four key assumptions required for the IV strategy (2SLS) to identify a causal effect, and briefly explain what each one means in the context of this study. (Hint: think about what conditions a valid instrument must satisfy: relevance, exclusion, …)

Response: _________________________

Part 4: Estimate 2SLS using `AER::ivreg()`

📜 See the documentation for specification details: AER package vignette example

Tip

Syntax for specifying 2SLS using ivreg():

ivreg(Y ~ D + CONTROLS | Z + CONTROLS, data = DATA)

The second-stage (structural) predictors, i.e. the treatment and the controls, go after the ~ symbol
The first-stage predictors, i.e. the instrument and the controls, go after the | symbol
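As a minimal sketch of this syntax, the block below runs `ivreg()` on simulated data and compares it with a manual two-step fit. The data frame `sim` and the variables `z`, `x`, `u`, `d`, and `y` are hypothetical and unrelated to matched_data. The point estimates should agree, but the standard errors from a manual second-stage `lm()` are not valid, because they are computed from residuals that use the fitted treatment instead of the observed one; `ivreg()` corrects for this.

```r
library(AER)  # provides ivreg()

# Simulated data only; the true effect of d on y is set to -2 by construction.
set.seed(123)
n <- 5000
z <- rnorm(n)   # instrument
x <- rnorm(n)   # observed control
u <- rnorm(n)   # unobserved confounder
d <- 0.8 * z + 0.5 * x + u + rnorm(n)   # treatment
y <- -2 * d + 1 * x + 3 * u + rnorm(n)  # outcome

sim <- data.frame(y, d, z, x)

# 2SLS in one call: structural regressors before |, instruments after |
fit_iv <- ivreg(y ~ d + x | z + x, data = sim)

# Manual two-step version for comparison
first <- lm(d ~ z + x, data = sim)
sim$d_hat <- fitted(first)
second <- lm(y ~ d_hat + x, data = sim)

# Point estimates match; standard errors do not
est_iv     <- coef(fit_iv)[["d"]]
est_manual <- coef(second)[["d_hat"]]
se_iv      <- sqrt(diag(vcov(fit_iv)))[["d"]]
se_manual  <- sqrt(diag(vcov(second)))[["d_hat"]]

rbind(ivreg = c(estimate = est_iv, se = se_iv),
      manual = c(estimate = est_manual, se = se_manual))
```

Note that the control `x` appears on both sides of the `|`: exogenous controls act as their own instruments.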

fit_2sls <- ivreg()

export_summs(fit_2sls, digits = 3,
             model.names = c("Change in Liberal Vote Share")
             # Optional: add `, coefs = c("(Intercept)", "proposed_turbine_3km")`
             # after model.names to show only selected coefficients
             )

Robustness checking strategies utilized in Stokes, 2015

Question 4: Choose two robustness checks from the paper that the authors use to increase confidence in their causal identification strategy. For each one, summarize the logic and findings from the robustness check in your own words:

Response: _________________________
