Author
Affiliation

UCL Social Research Institute

Required packages

```r
pkgs <- c("plm", "feisr", "sandwich", "texreg", "tidyr", "haven", "dplyr", "ggplot2", "ggforce")
lapply(pkgs, require, character.only = TRUE)
```

Session info

```r
sessionInfo()
```
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggforce_0.4.1  ggplot2_3.4.2  dplyr_1.1.2    haven_2.5.3    tidyr_1.3.0   
[6] texreg_1.38.6  sandwich_3.0-2 feisr_1.3.0    plm_2.6-3     

loaded via a namespace (and not attached):
 [1] utf8_1.2.3          generics_0.1.3      dreamerr_1.2.3     
 [4] lattice_0.21-8      hms_1.1.3           digest_0.6.32      
 [7] magrittr_2.0.3      evaluate_0.21       grid_4.3.1         
[10] fastmap_1.1.1       jsonlite_1.8.5      Matrix_1.5-4.1     
[13] Formula_1.2-5       httr_1.4.6          purrr_1.0.1        
[16] fansi_1.0.4         scales_1.2.1        tweenr_2.0.2       
[19] numDeriv_2016.8-1.1 lfe_2.9-0           Rdpack_2.4         
[22] cli_3.6.1           rlang_1.1.1         polyclip_1.10-4    
[25] rbibutils_2.2.13    miscTools_0.6-28    munsell_0.5.0      
[28] withr_2.5.0         yaml_2.3.7          fixest_0.11.1      
[31] tools_4.3.1         parallel_4.3.1      bdsmatrix_1.3-6    
[34] colorspace_2.1-0    forcats_1.0.0       maxLik_1.5-2       
[37] vctrs_0.6.3         R6_2.5.1            zoo_1.8-12         
[40] lifecycle_1.0.3     htmlwidgets_1.6.2   MASS_7.3-60        
[43] pkgconfig_2.0.3     gtable_0.3.3        pillar_1.9.0       
[46] glue_1.6.2          Rcpp_1.0.10         collapse_1.9.6     
[49] xfun_0.39           tibble_3.2.1        lmtest_0.9-40      
[52] tidyselect_1.2.0    rstudioapi_0.14     knitr_1.43         
[55] farver_2.1.1        xtable_1.8-4        htmltools_0.5.5    
[58] nlme_3.1-162        rmarkdown_2.23      compiler_4.3.1     

Load data

For this exercise, we use a real-world data set. Rather than constructing our own data, we take a shortcut and use the data from the replication package of Hospido (2012). The replication package can be found here.

This is an unbalanced panel from the PSID with 32,066 observations on 2,066 individuals for the period 1968–1993. It consists of male household heads aged 25–55 with at least 9 years of usable wage data.

```r
# Load the Stata file
data.df <- read_dta("_data/h-data.dta")

# Put the identifiers first (avoid naming an object `names`, which masks the base function)
vars <- names(data.df)
vars <- c("pid", "year", setdiff(vars, c("pid", "year")))
data.df <- data.df[, vars]

# Sort by person and year
data.df <- data.df[order(data.df$pid, data.df$year), ]
```
| Variable name | Description |
|---|---|
| pid | INDIVIDUAL IDENTIFIER |
| year | YEAR OF INTERVIEW |
| age | AGE OF INDIVIDUAL |
| white | WHITE DUMMY |
| dropout | DROPOUT DUMMY |
| grad | GRADUATE DUMMY |
| college | COLLEGE DUMMY |
| married | MARRIED DUMMY |
| child | NUMBER OF CHILDREN |
| fsize | FAMILY SIZE |
| hours | YEARLY HOURS OF WORK |
| logwages | LOG OF REAL ANNUAL WAGES |
| changejob | JOB CHANGE DUMMY |
| ten1 | TENURE DUMMY less than a year |
| ten2 | TENURE DUMMY a year |
| ten3 | TENURE DUMMY 2-3 years |
| ten4 | TENURE DUMMY 4 through 9 years |
| ten5 | TENURE DUMMY 10 through 19 years |
| ten6 | TENURE DUMMY 20 years or more |
| profes | PROFESSIONAL, TECHNICAL, AND KINDRED WORKERS DUMMY |
| admin | MANAGERS AND ADMINISTRATORS DUMMY |
| sales | CLERICAL AND SALES WORKERS DUMMY |
| crafts | CRAFTSMAN AND KINDRED WORKERS DUMMY |
| operat | OPERATIVES WORKERS DUMMY |
| servic | LABORERS AND SERVICES WORKERS DUMMY |
| smsa | SMSA (Standard Metropolitan Statistical Area) DUMMY |
| neast | NORTH-EAST DUMMY |
| ncentr | NORTH-CENTRAL DUMMY |
| south | SOUTH DUMMY |
| west | WEST DUMMY |
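
Before working with the panel, it can help to check its dimensions (number of observations, number of individuals, whether it is balanced). The following is a minimal sketch on a hypothetical toy data frame `toy.df` that only mimics the `pid` and `year` columns; the same lines should work on `data.df` once it is loaded.

```r
# Hypothetical toy panel (not the Hospido data): three persons, unequal numbers of waves
toy.df <- data.frame(
  pid  = c(1, 1, 1, 2, 2, 3, 3, 3),
  year = c(1968, 1969, 1970, 1968, 1969, 1969, 1970, 1971)
)

n_obs <- nrow(toy.df)                 # number of person-year observations
n_id  <- length(unique(toy.df$pid))   # number of individuals
obs_per_id <- table(toy.df$pid)       # observations per person

# A panel is balanced if every person is observed in every year
is_balanced <- all(table(toy.df$year) == n_id) &&
  length(unique(obs_per_id)) == 1

n_obs
n_id
range(toy.df$year)
is_balanced
```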

Exercise 1

Download and load the data.

Have a look at the data.

  • How many observations do we have in 1968? How many in 1990?

  • What is the average age in 1968? What was it in 1984? And how is this possible?

  • At what age did the individual with ID “5790002” become a father?

  • Please calculate the average age for each person.

  • Please calculate the lagged age (the age in the previous period).
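
The tasks above mostly come down to grouped summaries and within-person transformations. The sketch below shows one possible way to do them with `dplyr` on a hypothetical toy data frame (`toy.df`, with made-up values and the person ID `1` standing in for a real `pid`); it is an illustration of the operations, not the solution for the actual data.

```r
library(dplyr)

# Hypothetical toy panel with the same column names as data.df
toy.df <- data.frame(
  pid   = c(1, 1, 1, 2, 2, 3, 3, 3),
  year  = c(1968, 1969, 1970, 1968, 1969, 1969, 1970, 1971),
  age   = c(30, 31, 32, 40, 41, 26, 27, 28),
  child = c(0, 1, 1, 2, 2, 0, 0, 1)
)

# Number of observations per year
table(toy.df$year)

# Average age by year
toy.df %>%
  group_by(year) %>%
  summarise(mean_age = mean(age), n = n())

# Age at which one person first reports a child in the household
first_child_age <- toy.df %>%
  filter(pid == 1, child > 0) %>%
  summarise(age_first_child = min(age)) %>%
  pull(age_first_child)

# Person-specific mean age and lagged age.
# Note: dplyr::lag() shifts by row within person, so with gaps between
# waves the "previous period" is the previous observed row, not year - 1.
toy.df <- toy.df %>%
  arrange(pid, year) %>%
  group_by(pid) %>%
  mutate(age_mean = mean(age),
         age_lag  = dplyr::lag(age)) %>%
  ungroup()
```

Calling `dplyr::lag()` explicitly avoids picking up the `lag()` methods from `stats` or `plm` when several packages are attached.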

Exercise 2

To get a feel for the data, let us estimate some models.

  • What is the correlation between age and wage? Please use different estimators to determine different types of correlations: Pooled, Between, FE, RE, CRE.
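
As a rough illustration of how the five estimators can be set up with `plm`, the sketch below uses simulated data (`sim.df`, with hypothetical values) rather than the PSID extract. The CRE model is specified here as a random-effects model that additionally includes the person-specific mean of age (a Mundlak-type specification); this is one common way to implement it, assumed here for illustration.

```r
library(plm)

# Simulated balanced panel (hypothetical values, not the Hospido data)
set.seed(123)
n_id <- 50
n_t  <- 6
sim.df <- data.frame(
  pid  = rep(1:n_id, each = n_t),
  year = rep(1968:(1968 + n_t - 1), times = n_id)
)
alpha     <- rnorm(n_id)                          # unobserved person effect
start_age <- sample(25:40, n_id, replace = TRUE)  # age in the first wave
sim.df$age      <- start_age[sim.df$pid] + (sim.df$year - 1968)
sim.df$logwages <- 9 + 0.03 * sim.df$age + alpha[sim.df$pid] +
  rnorm(nrow(sim.df), sd = 0.3)

# Person-specific mean of age, needed for the CRE specification
sim.df$age_mean <- ave(sim.df$age, sim.df$pid)

idx <- c("pid", "year")
pols.mod <- plm(logwages ~ age, data = sim.df, index = idx, model = "pooling")
be.mod   <- plm(logwages ~ age, data = sim.df, index = idx, model = "between")
fe.mod   <- plm(logwages ~ age, data = sim.df, index = idx, model = "within")
re.mod   <- plm(logwages ~ age, data = sim.df, index = idx, model = "random")
cre.mod  <- plm(logwages ~ age + age_mean, data = sim.df, index = idx,
                model = "random")

# Coefficient on age under each estimator
age_coefs <- sapply(
  list(POLS = pols.mod, BE = be.mod, FE = fe.mod, RE = re.mod, CRE = cre.mod),
  function(m) unname(coef(m)["age"])
)
round(age_coefs, 4)
```

In a balanced panel such as this simulated one, the CRE coefficient on `age` coincides with the FE coefficient, which is a useful sanity check; in an unbalanced panel the two will generally differ slightly.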

Exercise 3

Can we use this dataset to replicate our earlier analysis of the marital wage premium? What might be a problem here? (Tip: have a look at the marriage variable.)

However, we can do something similar: we want to investigate whether there is a fatherhood wage premium. In other words, do men experience an increase in wages when they become fathers?

* Restrict the age at the start (first wage) to people aged 25-35



* Use the number of children to construct a binary indicator of whether there is a child in the household or not



* Make sure we start only with men who are not yet fathers in the first period.

Note: this feels like dropping a lot of information! However, it makes sense if we want to correctly identify the effect of interest.

* Do we need to drop observations where people go from child to no child?
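
The sample restrictions listed so far can be sketched as follows. This uses a hypothetical toy data frame (`toy.df`) and hypothetical helper columns (`child_d`, `age_first`, `child_first`, `reversal`); dropping all observations from the first child-to-no-child transition onwards is only one possible answer to the question above, shown here as an assumption.

```r
library(dplyr)

# Hypothetical toy panel: four men observed for four years each
toy.df <- data.frame(
  pid   = rep(1:4, each = 4),
  year  = rep(1980:1983, times = 4),
  age   = c(26:29, 40:43, 30:33, 28:31),
  child = c(0, 0, 1, 2,   0, 1, 1, 1,   1, 1, 2, 2,   0, 1, 0, 0)
)

sample.df <- toy.df %>%
  arrange(pid, year) %>%
  group_by(pid) %>%
  mutate(
    child_d     = as.numeric(child > 0),  # any child in the household
    age_first   = first(age),             # age at first observation
    child_first = first(child_d),         # fatherhood status at first observation
    # running count of 1 -> 0 transitions (child no longer in the household)
    reversal    = cumsum(child_d < dplyr::lag(child_d, default = first(child_d)))
  ) %>%
  ungroup() %>%
  filter(age_first >= 25, age_first <= 35,  # starting age 25-35
         child_first == 0,                  # not yet a father at the start
         reversal == 0)                     # keep only rows before any reversal

sample.df
```

In this toy example, person 2 is dropped because of his starting age, person 3 because he is already a father in the first period, and person 4 keeps only the two observations before the child leaves the household.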



* Calculate the effect of having a child on the wage of men (including controls if reasonable). 



* Calculate effects for POLS, RE, and FE (if you have some extra time, also FEIS). (Hint: `feis` needs an object of class `data.frame` as input data.)



* Compare the models using cluster-robust standard errors (and `screenreg`).
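
One way to set up the comparison is sketched below on simulated data (`sim.df`, hypothetical values in which men with higher unobserved earnings potential are more likely to become fathers). Standard errors are clustered by person using `vcovHC()` from `plm`, and the p-values passed to `screenreg()` rely on a normal approximation, which is an assumption made here for simplicity.

```r
library(plm)
library(texreg)

# Simulated panel (hypothetical): selection into fatherhood on the person effect
set.seed(42)
n_id <- 100
n_t  <- 8
sim.df <- data.frame(pid  = rep(1:n_id, each = n_t),
                     year = rep(1:n_t, times = n_id))
alpha      <- rnorm(n_id)
birth_year <- ifelse(alpha + rnorm(n_id) > 0,
                     sample(3:7, n_id, replace = TRUE), Inf)
start_age  <- sample(25:35, n_id, replace = TRUE)
sim.df$age      <- start_age[sim.df$pid] + sim.df$year - 1
sim.df$child_d  <- as.numeric(sim.df$year >= birth_year[sim.df$pid])
sim.df$logwages <- 9 + 0.05 * sim.df$child_d + 0.02 * sim.df$age +
  0.5 * alpha[sim.df$pid] + rnorm(nrow(sim.df), sd = 0.2)

fm  <- logwages ~ child_d + age
idx <- c("pid", "year")
mods <- list(
  POLS = plm(fm, data = sim.df, index = idx, model = "pooling"),
  RE   = plm(fm, data = sim.df, index = idx, model = "random"),
  FE   = plm(fm, data = sim.df, index = idx, model = "within")
)

# Standard errors clustered by person (Arellano-type estimator)
cl_se <- lapply(mods, function(m) {
  sqrt(diag(vcovHC(m, method = "arellano", cluster = "group")))
})
# Matching p-values based on a normal approximation
cl_p <- mapply(function(m, se) 2 * pnorm(-abs(coef(m) / se)),
               mods, cl_se, SIMPLIFY = FALSE)

screenreg(unname(mods),
          custom.model.names = names(mods),
          override.se = unname(cl_se),
          override.pvalues = unname(cl_p),
          digits = 3)
```

Because fatherhood is positively selected on the person effect in this simulation, the POLS estimate of `child_d` is biased upwards relative to FE, which mirrors the kind of pattern the exercise asks you to look for.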



* Interpret the results 
  • Try to perform a placebo test: what happens if you use the “lead” of becoming father. Why is this an interesting test?
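
A placebo test of this kind can be sketched as follows, again on simulated data (`sim.df`, hypothetical values with no anticipation effect built in). The lead `child_lead` is next year's value of the child indicator within the same person; with the contemporaneous indicator also in the model, its coefficient picks up wage changes in the year before the first child arrives.

```r
library(plm)

# Simulated balanced panel (hypothetical), true effect only from the birth year onwards
set.seed(7)
n_id <- 200
n_t  <- 8
sim.df <- data.frame(pid  = rep(1:n_id, each = n_t),
                     year = rep(1:n_t, times = n_id))
alpha      <- rnorm(n_id)
birth_year <- sample(c(3:7, Inf), n_id, replace = TRUE)
sim.df$child_d  <- as.numeric(sim.df$year >= birth_year[sim.df$pid])
sim.df$logwages <- 9 + 0.05 * sim.df$child_d + alpha[sim.df$pid] +
  rnorm(nrow(sim.df), sd = 0.2)

# Lead of the treatment within person (data must be sorted by pid and year).
# This shifts by row, so it assumes consecutive years without gaps.
sim.df <- sim.df[order(sim.df$pid, sim.df$year), ]
sim.df$child_lead <- ave(sim.df$child_d, sim.df$pid,
                         FUN = function(x) c(x[-1], NA))

# FE model with the actual treatment and its lead
placebo.mod <- plm(logwages ~ child_d + child_lead, data = sim.df,
                   index = c("pid", "year"), model = "within")
summary(placebo.mod)
```

If wages already rise before the birth (a clearly non-zero coefficient on `child_lead`), this would point to anticipation or to selection on wage trajectories rather than a causal effect of fatherhood.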

Exercise 4

Can we use one of the new event-study approaches, such as the Callaway and Sant'Anna estimator?

* Preprocess data (treatment group and timing indicator)

* Estimate the model using `att_gt`

* Show the group-time specific estimates (use `est_method = "ipw"` and only a restricted set of controls)

* Interpret the results 
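
The preprocessing and estimation steps can be sketched as below. The data are simulated (`sim.df`, hypothetical values), the column names `treat_timing` and `treat_group` are made up for this illustration, and the sketch assumes the `did` package is installed (it is not in the package list at the top); the estimation step is skipped otherwise. For simplicity no covariates are used (`xformla = ~ 1`); with the real data this would be replaced by a small set of controls.

```r
# Simulated balanced panel (hypothetical): two treatment cohorts and a never-treated group
set.seed(99)
n_id <- 60
n_t  <- 6
sim.df <- data.frame(pid  = rep(1:n_id, each = n_t),
                     year = rep(1:n_t, times = n_id))
alpha      <- rnorm(n_id)
birth_year <- rep(c(3, 5, Inf), each = 20)
sim.df$child_d  <- as.numeric(sim.df$year >= birth_year[sim.df$pid])
sim.df$logwages <- 9 + 0.05 * sim.df$child_d + alpha[sim.df$pid] +
  rnorm(nrow(sim.df), sd = 0.2)

# Timing indicator: first year with a child, 0 for never-treated persons
first_treat <- tapply(
  ifelse(sim.df$child_d == 1, sim.df$year, NA), sim.df$pid,
  function(x) if (all(is.na(x))) 0 else min(x, na.rm = TRUE)
)
sim.df$treat_timing <- as.numeric(first_treat[as.character(sim.df$pid)])
# Treatment group indicator: ever treated during the observation window
sim.df$treat_group  <- as.numeric(sim.df$treat_timing > 0)

# Group-time average treatment effects (only run if `did` is available)
if (requireNamespace("did", quietly = TRUE)) {
  cs.mod <- did::att_gt(yname = "logwages",
                        tname = "year",
                        idname = "pid",
                        gname = "treat_timing",
                        xformla = ~ 1,
                        data = sim.df,
                        control_group = "nevertreated",
                        est_method = "ipw")
  summary(cs.mod)
}
```

With the unbalanced PSID extract, `att_gt()` additionally offers the `allow_unbalanced_panel` argument, and `control_group = "notyettreated"` is an alternative comparison group worth considering.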

References

Hospido, L. 2012. “Modelling Heterogeneity and Dynamics in the Volatility of Individual Wages.” Journal of Applied Econometrics 27 (3): 386–414. https://doi.org/10.1002/jae.1204.