library(tidyverse)
library(janitor)
library(here)
library(jtools) # for export_summs (pretty regression tables)
library(AER) # for ivreg (2SLS estimation)
Assignment instructions:
Working with classmates to troubleshoot code and concepts is encouraged. If you collaborate, list collaborators at the top of your submission.
All written responses must be written independently (in your own words).
Keep your work readable: Use clear headings and label plot elements thoughtfully (where applicable).
Submit both your rendered output and the Quarto (.qmd) file.
Assignment submission (YOUR NAME): ______________________________________
Introduction
In this assignment, you will replicate the instrumental variable analysis from Stokes (2015), which examined how local wind turbine projects influenced electoral outcomes. Building on the matched dataset from Assignment 2, we will use Two-Stage Least Squares (2SLS) to estimate the causal effect of having a wind turbine proposed nearby on the change in Liberal Party vote share between 2007 and 2011. The instrument used in Stokes (2015) is a measure of local wind resource (average wind power, logged), which predicts where wind turbines are proposed. By using this instrument, we aim to isolate the part of the variation in turbine proposals that is driven by wind resource, and is therefore plausibly as-good-as-random with respect to voters' preferences, rather than by factors that also shape voting.
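To make the intuition concrete, here is a minimal simulation on made-up data (not the Stokes replication files). An unobserved confounder `U` pushes both the treatment `D` and the outcome `Y`, while the instrument `Z` (a hypothetical stand-in for log wind power) affects `Y` only through `D`. The naive OLS slope absorbs the confounding, while the simple IV ratio cov(Z, Y) / cov(Z, D) should land near the true effect of -2. All variable names and coefficients here are illustrative assumptions.

```r
# Simulated illustration only: hypothetical data, not the Stokes (2015) data
set.seed(42)
n <- 10000
U <- rnorm(n)                                # unobserved confounder
Z <- rnorm(n)                                # instrument (stand-in for log wind power)
D <- as.numeric(0.8 * Z + U + rnorm(n) > 0)  # binary treatment, driven by Z and U
Y <- -2 * D + 1.5 * U + rnorm(n)             # true causal effect of D on Y is -2

# Naive OLS: biased because U is omitted and correlated with D
ols_est <- unname(coef(lm(Y ~ D))["D"])

# Simple IV (Wald-type) estimator: uses only the variation in D induced by Z
iv_est <- cov(Z, Y) / cov(Z, D)

round(c(ols = ols_est, iv = iv_est), 2)
```

With one instrument, no controls, and an intercept, 2SLS reduces to exactly this ratio; the steps below add controls and fixed effects.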
Study: Stokes, 2015 (Article)
Data source: Dataverse (Stokes, 2015 replication data)
Load packages
Load the matched dataset (from Assignment 2)
The matched_data dataset has been preprocessed by matching on key covariates (e.g., pre-treatment home values, education, income, population density) to improve balance between treated and control precincts. We will now use these data for the IV analysis. Make sure to re-code the precinct_id variable as a factor.
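A minimal sketch of the factor recoding, assuming dplyr and a hypothetical three-row tibble named `toy` in place of matched_data. Storing the ID as a factor makes R treat it as a categorical label rather than a number that could be averaged or used as a continuous regressor.

```r
library(dplyr)

# Hypothetical toy data standing in for matched_data; precinct_id is stored as a number
toy <- tibble(precinct_id = c(1001, 1002, 1003))

# Re-code precinct_id as a factor (a categorical label)
toy <- toy |>
  mutate(precinct_id = factor(precinct_id))

class(toy$precinct_id)
```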
matched_data <- 
Part 1: IV Identification Rationale
Intuition for Using an Instrument:
Question 1: After matching on observables, why might we still need an instrumental variable approach to identify the causal effect of turbine proposals on vote share? In other words, what potential issues remain that an IV method can help address in this context? Use specific examples from the study to illustrate threats to a causal interpretation, then explain how an IV approach is designed to mitigate those threats.
Response: _________________________
Part 2: Two-Stage Least Squares (2SLS) Step-Wise Implementation
2A. First-Stage Estimation: Regress the treatment (\(D\)) on the instrument (\(Z\))
\[D_i = \alpha_0 + \alpha_1 Z_i + u_i\]
- Estimate the first-stage regression of the treatment on the instrument (with controls): regress proposed_turbine_3km on log_wind_power.
- Include the control variables used in Stokes (2015) in both stages: distance to lakes, geographic coordinates (latitude and longitude) with their squares and interaction, plus district fixed effects.
- After running the first stage, report the F-statistic for the instrument.
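The first-stage specification can be sketched on simulated data as follows. This is an illustration only: the data frame `sim` and the control names `dist_lake`, `lat`, `lon`, and `district` are hypothetical placeholders, not the column names in the replication data. The formula shows one way to write squared terms (`I()`), an interaction (`lat:lon`), and fixed effects (a factor variable) inside `lm()`.

```r
# Simulated stand-in data; dist_lake, lat, lon, district are hypothetical names
set.seed(1)
n <- 2000
sim <- data.frame(
  log_wind_power = rnorm(n),
  dist_lake      = runif(n, 0, 50),
  lat            = runif(n, 42, 46),
  lon            = runif(n, -83, -76),
  district       = factor(sample(paste0("D", 1:20), n, replace = TRUE))
)

# Treatment: proposals are more likely where the (simulated) wind resource is higher
sim$proposed_turbine_3km <- as.numeric(
  0.9 * sim$log_wind_power - 0.02 * sim$dist_lake + rnorm(n) > 0
)

# First stage: treatment on instrument + controls.
# I(lat^2) and I(lon^2) add squared terms, lat:lon adds their interaction,
# and district (already a factor) adds district fixed effects.
first_stage_sim <- lm(
  proposed_turbine_3km ~ log_wind_power + dist_lake +
    lat + lon + I(lat^2) + I(lon^2) + lat:lon + district,
  data = sim
)

# Estimate, standard error, t value, and p-value for the instrument
coef(summary(first_stage_sim))["log_wind_power", ]
```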
first_stage <- lm()
export_summs(first_stage, digits = 3,
             model.names = c("First stage: Proposed Turbine 3km"),
             coefs = c("(Intercept)", "log_wind_power"))
Testing Instrument Relevance
Check instrument strength (F-statistic)
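One way to get the F-statistic for the excluded instrument is a partial F-test comparing the first stage with and without the instrument, sketched here with `anova()` on simulated data (`sim`, `dist_lake`, and `district` are hypothetical). With a single instrument, this F equals the squared t-statistic on the instrument. These use conventional (non-robust) standard errors; if the study reports a clustered or robust statistic, the numbers will not match exactly.

```r
# Simulated stand-in data; dist_lake and district are hypothetical names
set.seed(2)
n <- 2000
sim <- data.frame(
  log_wind_power = rnorm(n),
  dist_lake      = runif(n, 0, 50),
  district       = factor(sample(paste0("D", 1:20), n, replace = TRUE))
)
sim$proposed_turbine_3km <- as.numeric(0.9 * sim$log_wind_power + rnorm(n) > 0)

# Unrestricted first stage (with the instrument) and restricted one (without it)
fs_full       <- lm(proposed_turbine_3km ~ log_wind_power + dist_lake + district, data = sim)
fs_restricted <- lm(proposed_turbine_3km ~ dist_lake + district, data = sim)

# Partial F-test for excluding the instrument; row 2 compares the two models
f_test <- anova(fs_restricted, fs_full)
first_stage_F <- f_test$F[2]

# With one instrument, F equals the squared t value on that instrument
t_sq <- coef(summary(fs_full))["log_wind_power", "t value"]^2
c(F = first_stage_F, t_squared = t_sq)
```

A common rule of thumb treats a first-stage F below about 10 as a warning sign of a weak instrument.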
Question 2A: Based on the instrument relevance test reported in the study, would you conclude the instrument is strong enough to be credible? Explain what a weak instrument would mean in this setting. Specifically, what would it suggest about compliance with Ontario's Green Energy Act policy?
Response: _________________________
2B. Second-Stage Estimation
Regress the outcome (\(Y\)) on the fitted values from the first stage (\(\hat{D}_i\))
\[Y_i = \beta_0 + \beta_1 \hat{D}_i + \epsilon_i\]
- Now estimate the second stage of the 2SLS.
- First, use the first-stage model to generate the predicted values of proposed_turbine_3km for each precinct (these are \(\hat{D}_i\)).
- Add these predicted values as a new column in matched_data (e.g. proposed_turbine_3km_HAT).
- Then, regress the outcome change_liberal (the change in Liberal vote share from 2007 to 2011) on the predicted treatment (proposed_turbine_3km_HAT), including the same controls and fixed effects as in the first stage.
- Fill in the code for these steps below to obtain the second-stage regression results.
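The manual two-step procedure can be sketched on simulated data. The data are made up (true effect of -0.1), and `sim`, `U`, and `dist_lake` are hypothetical names; only the variable names taken from the steps above mirror the assignment. Using `predict(..., newdata = )` rather than `fitted()` returns one prediction per row even if the first stage drops rows with missing values, so the new column lines up with the data frame.

```r
# Simulated stand-in data; U (confounder) and dist_lake are hypothetical
set.seed(3)
n <- 5000
U <- rnorm(n)
sim <- data.frame(
  log_wind_power = rnorm(n),
  dist_lake      = runif(n, 0, 50)
)
sim$proposed_turbine_3km <- as.numeric(0.9 * sim$log_wind_power + U + rnorm(n) > 0)
sim$change_liberal <- -0.1 * sim$proposed_turbine_3km + 0.05 * U + rnorm(n, sd = 0.05)

# Stage 1: treatment on instrument + controls
stage1 <- lm(proposed_turbine_3km ~ log_wind_power + dist_lake, data = sim)

# Save the fitted treatment (D-hat) as a new column, one value per row
sim$proposed_turbine_3km_HAT <- predict(stage1, newdata = sim)

# Stage 2: outcome on D-hat + the same controls
stage2 <- lm(change_liberal ~ proposed_turbine_3km_HAT + dist_lake, data = sim)
coef(stage2)["proposed_turbine_3km_HAT"]
```

The coefficient from this manual second stage is the 2SLS point estimate, but the standard errors reported by `lm()` are not valid 2SLS standard errors, because they are computed from residuals that use the fitted treatment instead of the observed one.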
Save predicted values \(\hat{D}_i\) from the first stage
matched_data$proposed_turbine_3km_HAT <- 
Estimate the second-stage regression
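\[\Delta \text{LiberalVoteShare}_i = \beta_0 + \beta_1 \widehat{\text{ProposedTurbine}}_i + \boldsymbol{\gamma}' \mathbf{W}_i + \epsilon_i\]
where \(\mathbf{W}_i\) collects the control variables and district fixed effects.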
second_stage <- lm()
export_summs(second_stage, digits = 3,
             model.names = c("Second stage: Change in Liberal Vote Share"),
             coefs = c("(Intercept)", "proposed_turbine_3km_HAT"))
Interpreting the 2SLS Estimate
Question 2B: Imagine you are explaining your 2SLS findings to a policymaker in Ontario. What does the estimated coefficient on proposed_turbine_3km_HAT imply about the electoral impact of a local wind turbine proposal (within 3 km) on Liberal vote share?
Question 2C: Explain what it means that IV identifies a local average treatment effect (LATE) in the context of the wind-turbine voting study. Which specific subset of observations does the 2SLS estimate apply to, and what does this imply for interpretation and generalizability?
Response: _________________________
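The LATE idea can be illustrated with a stylized simulation, assuming a binary instrument and three hypothetical compliance types (compliers, always-takers, never-takers) with no defiers. The Wald ratio (reduced form divided by first stage) recovers the compliers' effect of -3, not the average effect among all treated units, which here also includes always-takers with an effect of -8. Stokes' instrument is continuous, so this is a simplification; with a continuous instrument, 2SLS broadly recovers a weighted average of effects for units whose treatment responds to the instrument.

```r
# Stylized simulation with hypothetical compliance types (not the study's data)
set.seed(4)
n <- 100000
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.4, 0.2, 0.4))
Z <- rbinom(n, 1, 0.5)                      # binary instrument, randomly assigned

# Treatment take-up by type (monotonicity: no defiers)
D <- ifelse(type == "always", 1, ifelse(type == "never", 0, Z))

# Heterogeneous effects: compliers -3, always-takers -8 (never-takers are never treated)
effect <- ifelse(type == "complier", -3, ifelse(type == "always", -8, 0))
Y <- effect * D + rnorm(n)

# Wald estimator: reduced-form difference divided by first-stage difference
wald <- (mean(Y[Z == 1]) - mean(Y[Z == 0])) /
        (mean(D[Z == 1]) - mean(D[Z == 0]))

# Average effect among everyone actually treated (compliers with Z = 1 + always-takers)
att <- mean(effect[D == 1])

round(c(wald = wald, avg_effect_on_treated = att), 2)
```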
Part 3: IV Assumptions and Validity
Evaluate Instrument Validity
Question 3: List the four key assumptions required for the IV strategy (2SLS) to identify a causal effect, and briefly explain what each one means in the context of this study. (Hint: think about the conditions a valid instrument must satisfy: relevance, exclusion, …)
Response: _________________________
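A quick simulation on made-up data shows why the exclusion restriction matters. If the instrument also affects the outcome directly (here, a hypothetical direct effect of 0.5 from `Z` to `Y`), the IV ratio is no longer centered on the true effect of -2. Nothing in the data alone flags this; with a single instrument, the restriction has to be argued from the study design.

```r
# Simulated illustration only; all names and coefficients are hypothetical
set.seed(5)
n <- 20000
U <- rnorm(n)                                # unobserved confounder
Z <- rnorm(n)                                # instrument
D <- as.numeric(0.8 * Z + U + rnorm(n) > 0)  # treatment

# Same outcome model, with and without a direct Z -> Y path
Y_valid    <- -2 * D + 1.5 * U + rnorm(n)            # exclusion holds
Y_violated <- -2 * D + 0.5 * Z + 1.5 * U + rnorm(n)  # exclusion violated

# Simple IV ratio estimator
iv <- function(y, d, z) cov(z, y) / cov(z, d)

round(c(exclusion_holds    = iv(Y_valid, D, Z),
        exclusion_violated = iv(Y_violated, D, Z)), 2)
```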
Part 4: Estimate 2SLS using AER::ivreg()
See the documentation for specification details: AER package vignette example
Syntax for specifying 2SLS using ivreg():
ivreg(Y ~ D + CONTROLS | Z + CONTROLS, data = matched_data)
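- The regressors of the outcome (second-stage) equation, i.e. the treatment \(D\) and the controls, go after the ~ symbol.
- The instruments, i.e. \(Z\) plus the same exogenous controls (the first-stage predictors), go after the | symbol.
- Pass the data by name (data = ...): the second positional argument of ivreg() is instruments, not data.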
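The following sketch fits the same simulated model with `ivreg()` and with the manual two-step approach, assuming the AER package is installed; the data and the control name `dist_lake` are hypothetical. The point estimates should agree exactly, while the standard errors differ. AER's `summary()` method for ivreg objects also accepts `diagnostics = TRUE`, which adds a weak-instruments test among other diagnostics.

```r
library(AER)  # provides ivreg()

# Simulated stand-in data; U and dist_lake are hypothetical
set.seed(6)
n <- 5000
U <- rnorm(n)
sim <- data.frame(log_wind_power = rnorm(n), dist_lake = runif(n, 0, 50))
sim$proposed_turbine_3km <- as.numeric(0.9 * sim$log_wind_power + U + rnorm(n) > 0)
sim$change_liberal <- -0.1 * sim$proposed_turbine_3km + 0.05 * U + rnorm(n, sd = 0.05)

# 2SLS in one call: outcome-equation regressors before |, instruments + controls after |
fit_iv <- ivreg(change_liberal ~ proposed_turbine_3km + dist_lake |
                  log_wind_power + dist_lake, data = sim)

# Manual two-step for comparison
stage1 <- lm(proposed_turbine_3km ~ log_wind_power + dist_lake, data = sim)
sim$proposed_turbine_3km_HAT <- predict(stage1, newdata = sim)
stage2 <- lm(change_liberal ~ proposed_turbine_3km_HAT + dist_lake, data = sim)

# Same point estimate, different standard errors
se_iv     <- summary(fit_iv)$coefficients["proposed_turbine_3km", "Std. Error"]
se_manual <- summary(stage2)$coefficients["proposed_turbine_3km_HAT", "Std. Error"]

c(est_ivreg  = unname(coef(fit_iv)["proposed_turbine_3km"]),
  est_manual = unname(coef(stage2)["proposed_turbine_3km_HAT"]),
  se_ivreg   = se_iv,
  se_manual  = se_manual)
```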
fit_2sls <- ivreg()
export_summs(fit_2sls, digits = 3,
model.names = c("Change in Liberal Vote Share"),
#coefs = c("(Intercept)", "proposed_turbine_3km")
)
Robustness checking strategies used in Stokes (2015)
Question 4: Choose two robustness checks from the paper that the authors use to increase confidence in their causal identification strategy. For each one, summarize the logic and findings from the robustness check in your own words:
Response: _________________________