Exercises

Author

Affiliation

UCL Social Research Institute

Required packages

pkgs <- c("plm", "feisr", "sandwich", "texreg", "tidyr", "haven", "dplyr", "ggplot2", "ggforce") 
lapply(pkgs, require, character.only = TRUE)

Session info

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggforce_0.4.1  ggplot2_3.4.2  dplyr_1.1.2    haven_2.5.3    tidyr_1.3.0   
[6] texreg_1.38.6  sandwich_3.0-2 feisr_1.3.0    plm_2.6-3     

loaded via a namespace (and not attached):
 [1] utf8_1.2.3          generics_0.1.3      dreamerr_1.2.3     
 [4] lattice_0.21-8      hms_1.1.3           digest_0.6.32      
 [7] magrittr_2.0.3      evaluate_0.21       grid_4.3.1         
[10] fastmap_1.1.1       jsonlite_1.8.5      Matrix_1.5-4.1     
[13] Formula_1.2-5       httr_1.4.6          purrr_1.0.1        
[16] fansi_1.0.4         scales_1.2.1        tweenr_2.0.2       
[19] numDeriv_2016.8-1.1 lfe_2.9-0           Rdpack_2.4         
[22] cli_3.6.1           rlang_1.1.1         polyclip_1.10-4    
[25] rbibutils_2.2.13    miscTools_0.6-28    munsell_0.5.0      
[28] withr_2.5.0         yaml_2.3.7          fixest_0.11.1      
[31] tools_4.3.1         parallel_4.3.1      bdsmatrix_1.3-6    
[34] colorspace_2.1-0    forcats_1.0.0       maxLik_1.5-2       
[37] vctrs_0.6.3         R6_2.5.1            zoo_1.8-12         
[40] lifecycle_1.0.3     htmlwidgets_1.6.2   MASS_7.3-60        
[43] pkgconfig_2.0.3     gtable_0.3.3        pillar_1.9.0       
[46] glue_1.6.2          Rcpp_1.0.10         collapse_1.9.6     
[49] xfun_0.39           tibble_3.2.1        lmtest_0.9-40      
[52] tidyselect_1.2.0    rstudioapi_0.14     knitr_1.43         
[55] farver_2.1.1        xtable_1.8-4        htmltools_0.5.5    
[58] nlme_3.1-162        rmarkdown_2.23      compiler_4.3.1

Load data

For this exercise, we use a real-world data set. Rather than constructing our own data, we take a shortcut and use the data from the replication package of Hospido (2012). The replication package can be found here.

This is an unbalanced panel from the PSID with 32,066 observations on 2,066 individuals for the period 1968–1993. It consists of male household heads aged 25–55 with at least 9 years of usable wage data.

# Load stata file
data.df <- read_dta("_data/h-data.dta")

# Put the identifiers first: pid, year, then all remaining variables
vars <- c("pid", "year", setdiff(names(data.df), c("pid", "year")))
data.df <- data.df[, vars]

# Sort rows by person and year
data.df <- data.df[order(data.df$pid, data.df$year), ]

variable name	description
pid	INDIVIDUAL IDENTIFIER
year	YEAR OF INTERVIEW
age	AGE OF INDIVIDUAL
white	WHITE DUMMY
dropout	DROPOUT DUMMY
grad	GRADUATE DUMMY
college	COLLEGE DUMMY
married	MARRIED DUMMY
child	NUMBER OF CHILDREN
fsize	FAMILY SIZE
hours	YEARLY HOURS OF WORK
logwages	LOG OF REAL ANNUAL WAGES
changejob	JOB CHANGE DUMMY
ten1	TENURE DUMMY less than a year
ten2	TENURE DUMMY a year
ten3	TENURE DUMMY 2-3 years
ten4	TENURE DUMMY 4 through 9 years
ten5	TENURE DUMMY 10 through 19 years
ten6	TENURE DUMMY 20 years or more
profes	PROFESSIONAL, TECHNICAL, AND KINDRED WORKERS DUMMY
admin	MANAGERS AND ADMINISTRATORS DUMMY
sales	CLERICAL AND SALES WORKERS DUMMY
crafts	CRAFTSMAN AND KINDRED WORKERS DUMMY
operat	OPERATIVES WORKERS DUMMY
servic	LABORERS AND SERVICES WORKERS DUMMY
smsa	SMSA (Standard Metropolitan Statistical Area) DUMMY
neast	NORTH-EAST DUMMY
ncentr	NORTH-CENTRAL DUMMY
south	SOUTH DUMMY
west	WEST DUMMY

Exercise 1

Download and load the data.

Have a look at the data.

How many observations do we have in 1968? How many in 1990?
What is the average age in 1968? What was it in 1984? And how is this possible?
At which age did the individual with ID “5790002” become a father?
Please calculate the average age for each person.
Please calculate the lagged age (the age in the previous period).
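
The operations asked for above (counts by year, person-specific means, within-person lags) can be sketched in base R on a small toy panel. The data frame `toy` and its values are hypothetical and only illustrate the mechanics; they are not taken from the PSID file.

```r
# Toy unbalanced panel (hypothetical values, not the PSID data)
toy <- data.frame(
  pid  = c(1, 1, 1, 2, 2, 3),
  year = c(1968, 1969, 1970, 1968, 1970, 1969),
  age  = c(30, 31, 32, 40, 42, 27)
)
toy <- toy[order(toy$pid, toy$year), ]

# Number of observations and average age per year
n_by_year        <- table(toy$year)
mean_age_by_year <- tapply(toy$age, toy$year, mean)

# Person-specific mean of age, repeated on every row of that person
toy$age_mean <- ave(toy$age, toy$pid, FUN = mean)

# Lagged age: the value from the previous observed row of the same person
toy$age_lag <- ave(toy$age, toy$pid, FUN = function(x) c(NA, head(x, -1)))

toy
```

Note that in an unbalanced panel this row-based lag refers to the previous observed period, which is not necessarily the previous calendar year (person 2 above has a gap in 1969).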

Exercise 2

To get familiar with the data, let us estimate some models.

What is the correlation between age and wage? Please use different estimators to determine different types of correlations: Pooled, Between, FE, RE, CRE.
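
As a rough illustration of how the five estimators differ, the following sketch fits them with `plm` on simulated data. The data-generating process, the variable names (`id`, `time`, `x`, `y`, `x_mean`) and the true slope of 0.5 are assumptions made for this example only. The CRE model is specified here in the Mundlak form, that is, a random-effects model that additionally includes the person-specific mean of the regressor.

```r
library(plm)

set.seed(123)
n_id <- 50
n_t  <- 6
sim <- data.frame(
  id   = rep(seq_len(n_id), each = n_t),
  time = rep(seq_len(n_t), times = n_id)
)

# Hypothetical data-generating process: x is correlated with the person effect
alpha <- rnorm(n_id)
sim$x <- 0.8 * alpha[sim$id] + rnorm(nrow(sim))
sim$y <- 1 + 0.5 * sim$x + alpha[sim$id] + rnorm(nrow(sim))

# Person-specific mean of x, needed for the CRE (Mundlak) specification
sim$x_mean <- ave(sim$x, sim$id)

idx  <- c("id", "time")
pols <- plm(y ~ x, data = sim, index = idx, model = "pooling")
be   <- plm(y ~ x, data = sim, index = idx, model = "between")
fe   <- plm(y ~ x, data = sim, index = idx, model = "within")
re   <- plm(y ~ x, data = sim, index = idx, model = "random")
cre  <- plm(y ~ x + x_mean, data = sim, index = idx, model = "random")

# Coefficient on x across estimators
round(c(POLS = coef(pols)[["x"]], BE = coef(be)[["x"]], FE = coef(fe)[["x"]],
        RE = coef(re)[["x"]], CRE = coef(cre)[["x"]]), 3)
```

In a balanced panel such as this simulated one, the CRE coefficient on `x` coincides with the FE coefficient, while the pooled, between and RE estimates also pick up the correlation between `x` and the person effect.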

Exercise 3

Can we use this dataset to replicate our earlier analysis of the marital wage premium? What might be a problem here? (Tip: have a look at the marriage variable.)

However, we can do something similar: we want to investigate whether there is a fatherhood wage premium. In other words, do men experience an increase in wages when they become fathers?

* Restrict the sample to people aged 25-35 at the start (first observed wage).

* Use the number of children to construct a binary indicator of whether there is a child in the household or not.

* Make sure we start only with men who are not yet fathers in the first period.

Note: this feels like dropping a lot of information! However, it makes sense if we want to correctly identify the effect of interest.

* Do we need to drop observations where people go from child to no child?

* Calculate the effect of having a child on the wage of men (including controls if reasonable).

* Calculate effects for POLS, RE, and FE (if you have some extra time, also FEIS). (Hint: `feis` needs an object of class `data.frame` as input data.)

* Compare using cluster-robust standard errors (and `screenreg`).

* Interpret the results.
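
The sample restrictions in the list above can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy panel; the helper columns `father`, `age_first`, `father_first` and `reverts` are names introduced here, not variables in the data set.

```r
# Toy panel with hypothetical values (not the PSID data)
toy <- data.frame(
  pid   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1970:1973, 1970:1972, 1970:1972),
  age   = c(26:29, 30:32, 40:42),
  child = c(0, 0, 1, 2, 1, 1, 2, 0, 0, 0)
)
toy <- toy[order(toy$pid, toy$year), ]

# Binary indicator: at least one child in the household
toy$father <- as.integer(toy$child > 0)

# Values at each person's first observation
toy$age_first    <- ave(toy$age, toy$pid, FUN = function(x) x[1])
toy$father_first <- ave(toy$father, toy$pid, FUN = function(x) x[1])

# Flag persons whose indicator ever switches back from 1 to 0
toy$reverts <- ave(toy$father, toy$pid,
                   FUN = function(x) as.integer(any(diff(x) < 0)))

# Keep men aged 25-35 and childless when first observed
sub <- toy[toy$age_first >= 25 & toy$age_first <= 35 & toy$father_first == 0, ]
sub
```

Here only person 1 remains: person 2 is already a father in the first period and person 3 is older than 35 at the start. Whether persons flagged by `reverts` should be dropped is the question posed in the exercise, so the sketch only flags them.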

Try to perform a placebo test: what happens if you use the “lead” of becoming a father? Why is this an interesting test?
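
One possible way to set up such a placebo test is sketched below on simulated data: the lead of the fatherhood indicator is added to an FE model, and standard errors are clustered by person. The data-generating process (a wage increase of 0.1 from the first birth on, no anticipation) and the names `sim`, `father_lead` and `fe_placebo` are assumptions for illustration only.

```r
library(plm)
library(sandwich)

set.seed(42)
n_id <- 100
n_t  <- 8
sim <- data.frame(
  pid  = rep(seq_len(n_id), each = n_t),
  year = rep(seq_len(n_t), times = n_id)
)

# Hypothetical process: wages rise by 0.1 from the first birth on
birth <- sample(3:10, n_id, replace = TRUE)  # values above n_t: never observed as father
alpha <- rnorm(n_id)                         # time-constant person effect
sim$father  <- as.integer(sim$year >= birth[sim$pid])
sim$logwage <- 2 + 0.1 * sim$father + alpha[sim$pid] + rnorm(nrow(sim), sd = 0.3)

# Lead: fatherhood status in the next observed period (NA in the last period)
sim$father_lead <- ave(sim$father, sim$pid, FUN = function(x) c(x[-1], NA))

# FE model with the lead as placebo regressor (rows with NA are dropped)
fe_placebo <- plm(logwage ~ father + father_lead, data = sim,
                  index = c("pid", "year"), model = "within")

# Cluster-robust standard errors, clustered by person
se <- sqrt(diag(vcovHC(fe_placebo, cluster = "group")))
round(cbind(estimate = coef(fe_placebo), se = se), 3)
```

Because the simulated wages do not react before the birth, the coefficient on `father_lead` should be close to zero here. In real data, a sizeable lead coefficient would point to anticipation or to selection into fatherhood on wage trends.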

Exercise 4

Can we use one of the new event-study approaches, such as the Callaway and Sant'Anna estimator?

* Preprocess data (treatment group and timing indicator)

* Estimate the model using `att_gt` from the `did` package

* Show the group-time specific estimates (use `est_method = "ipw"` and only a restricted set of controls)

* Interpret the results
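
The preprocessing step can be sketched as follows: `att_gt` expects a group variable holding the first treated period for each unit, with 0 for never-treated units. The simulated data and the names `g` and `treat_group` are assumptions for this example. The `did` package is not among the packages loaded above, so the call is only executed if it is installed.

```r
set.seed(1)
n_id <- 200
n_t  <- 6
sim <- data.frame(
  pid  = rep(seq_len(n_id), each = n_t),
  year = rep(1980 + seq_len(n_t), times = n_id)
)

# Hypothetical timing of first birth; Inf marks men who never become fathers
first_birth <- sample(c(1983, 1984, 1985, Inf), n_id, replace = TRUE)
sim$father  <- as.integer(sim$year >= first_birth[sim$pid])
sim$logwage <- 2 + 0.1 * sim$father + rnorm(n_id)[sim$pid] +
  rnorm(nrow(sim), sd = 0.3)

# Timing indicator: first year observed as father, 0 if never treated
sim$g <- ave(sim$year * sim$father, sim$pid,
             FUN = function(x) if (any(x > 0)) min(x[x > 0]) else 0)

# Treatment group indicator: ever observed as father
sim$treat_group <- as.integer(sim$g > 0)

# Group-time average treatment effects (only if the did package is available)
if (requireNamespace("did", quietly = TRUE)) {
  cs <- did::att_gt(yname = "logwage", tname = "year", idname = "pid",
                    gname = "g", xformla = ~ 1, data = sim,
                    est_method = "ipw", control_group = "nevertreated")
  summary(cs)
}
```

The simulated panel is balanced. For the unbalanced PSID panel, the `allow_unbalanced_panel` argument of `att_gt` is likely to be relevant, and `xformla` would hold the restricted set of controls instead of `~ 1`.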

References

Hospido, L. 2012. “Modelling Heterogeneity and Dynamics in the Volatility of Individual Wages.” Journal of Applied Econometrics 27 (3): 386–414. https://doi.org/10.1002/jae.1204.