Panel Data Introduction

Tobias Rüttenauer

This course

Outline

  1. Introduction Panel Data
  2. Variance Components
  3. Estimators: FE & RE & Diff-in-Diff
  4. Dynamic Diff-in-Diff
  5. Fixed Effects Individual Slopes

Me

Tobias Rüttenauer

t.ruttenauer@ucl.ac.uk

Lecturer in Quantitative Social Science at UCL

Environmental Sociology

Further Materials

This material in Handbook format: Rüttenauer and Kapelle (2024)

Slides by Josef Brüderl and Volker Ludwig. See also Brüderl and Ludwig (2015).

Books:

  • Intuitive: Allison (2009)

  • Comprehensive and formal: Wooldridge (2010)

  • For R experts: Croissant and Millo (2019)

  • General introductions to causal estimation techniques: Angrist and Pischke (2015), Cunningham (2021), Firebaugh (2008), Huntington-Klein (2021)

The books by Cunningham (2021) and Huntington-Klein (2021) are freely available online!

Why panel data analysis?

In the empirical social sciences, we are often interested in causal research questions: we want to investigate questions of cause and effect.

However, randomized controlled trials (RCT) are often infeasible (e.g. effects of education, marriage, pregnancy).

A potential middle ground between an experiment and a purely cross-sectional comparison of different units: “compare alike with alike”.

With panel data, we observe the same unit (person, region, or country) repeatedly over time. We can then compare not only two different units to each other, but also a unit at an earlier stage to the same unit at a later stage.

Panel Data Structure

Cross-sectional data is usually organized as a matrix in which rows represent observations (individuals) and columns hold the variables. In panel data settings, we need to add the dimension of time. There are two ways to do so:

  • Long format: \(N \times T\) observations (rows), with variables “id” and “time”.

  • Wide format: \(N\) observations (rows), and \(T \times K\) variables, with one variable for each time period.
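As a minimal sketch of the two layouts, the following base R code converts a small made-up panel between long and wide format with reshape(). The data frame `long` and its values are invented for illustration and are not part of the course data.

```r
# Toy long-format panel: N = 2 units, T = 3 periods, K = 1 variable
# (values are made up for illustration)
long <- data.frame(
  id   = rep(c(1, 2), each = 3),
  time = rep(2001:2003, times = 2),
  wage = c(10, 11, 12, 20, 21, 22)
)

# Long -> wide: one row per unit, one wage column per time period
wide <- reshape(long,
                idvar = "id", timevar = "time", v.names = "wage",
                direction = "wide", sep = "_")

# Wide -> long: stack the period-specific columns again
long2 <- reshape(wide,
                 idvar = "id",
                 varying = c("wage_2001", "wage_2002", "wage_2003"),
                 v.names = "wage", timevar = "time", times = 2001:2003,
                 direction = "long")

# Restore the usual unit-by-time ordering
long2 <- long2[order(long2$id, long2$time), ]
```

Here `long` has \(N \times T = 6\) rows, while `wide` has \(N = 2\) rows and \(T \times K = 3\) wage columns in addition to the id.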

Panel Data Structure

Let’s have a look at the “Males” data of the plm package.

library("plm")
data("Males")
head(Males[,1:5], n = 16)
   nr year school exper union
1  13 1980     14     1    no
2  13 1981     14     2   yes
3  13 1982     14     3    no
4  13 1983     14     4    no
5  13 1984     14     5    no
6  13 1985     14     6    no
7  13 1986     14     7    no
8  13 1987     14     8    no
9  17 1980     13     4    no
10 17 1981     13     5    no
11 17 1982     13     6    no
12 17 1983     13     7    no
13 17 1984     13     8    no
14 17 1985     13     9    no
15 17 1986     13    10    no
16 17 1987     13    11    no

Panel Data Structure

Moreover, there are two types of panel data:

  • Balanced: Contains information for each unit at each time period

  • Unbalanced: Some units have missing information at some time periods

is.pbalanced(Males, index = c("nr", "year"))
[1] TRUE
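To make the definition concrete, here is a rough base R sketch of what such a check amounts to: a panel is balanced when every unit-period combination is observed. The helper `is_balanced()` and the toy data are hypothetical and only for illustration; this is not how plm implements is.pbalanced().

```r
# Hypothetical helper: TRUE if every id-time combination appears in the data
is_balanced <- function(data, id, time) {
  n_cells <- nrow(unique(data[, c(id, time)]))
  n_cells == length(unique(data[[id]])) * length(unique(data[[time]]))
}

# Toy balanced panel: 3 units observed in each of 4 periods
panel <- data.frame(
  id   = rep(1:3, each = 4),
  time = rep(2001:2004, times = 3)
)

# Dropping a single unit-period observation makes the panel unbalanced
unbalanced <- panel[-5, ]

# Number of observations per unit in each version
table(panel$id)
table(unbalanced$id)

is_balanced(panel, "id", "time")
is_balanced(unbalanced, "id", "time")
```

The per-unit counts from table() are often the quickest way to see which units cause the imbalance.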

Panel Attrition

Source: SOEP Documentation

Panel surveys

Some examples

The Comparative Panel File provides an infrastructure for harmonising data across various panel surveys. See Turek, Kalmijn, and Leopold (2021).

Panel transformation

A nice feature of panel data is that we can apply within-person transformations. For instance, we can calculate lags and leads, or the first differences of a variable.

Note

Always make sure the data is sorted by unit and time before you do!

Panel transformation

Person-specific summary values

# Order data
Males <- Males[order(Males$nr, Males$year),]

# Person specific means
Males$m_wage <- ave(Males$wage,
                    Males$nr,
                    FUN = function(x) mean(x, na.rm = TRUE))
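The same ave() pattern works for other person-specific summaries. The sketch below uses a small invented data frame (not the Males data) to compute the person mean, the number of observations per person, and the deviation of each observation from the person mean.

```r
# Toy panel with two persons and three years each (made-up values)
df <- data.frame(
  id   = rep(c("a", "b"), each = 3),
  year = rep(2001:2003, times = 2),
  y    = c(1, 2, 3, 10, 20, 30)
)

# Person-specific mean, repeated on every row of that person
df$m_y <- ave(df$y, df$id, FUN = function(x) mean(x, na.rm = TRUE))

# Number of non-missing observations per person
df$n_obs <- ave(df$y, df$id, FUN = function(x) sum(!is.na(x)))

# Deviation of each observation from the person-specific mean
df$dm_y <- df$y - df$m_y
```

Because ave() returns a vector of the same length as its input, the results can be assigned directly as new columns without merging.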

Panel transformation

Lag and first difference

# Order data
Males <- Males[order(Males$nr, Males$year),]

# Lag (last year's value); dplyr::lag() requires the dplyr package to be installed
Males$lag_wage <- ave(Males$wage,
                      Males$nr,
                      FUN = function(x) dplyr::lag(x, n = 1))

# First difference (this year's value minus last year's value)
Males$fd_wage <- ave(Males$wage,
                      Males$nr,
                      FUN = function(x) x - dplyr::lag(x, n = 1))
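Leads can be computed in the same way as lags. The following sketch uses base R only, on a deliberately unsorted toy data frame, to show why the sorting step comes first. The helpers `lag1()` and `lead1()` are hypothetical names introduced here. Shifting by row position gives last year's value only when each unit has consecutive periods without gaps, as in the balanced Males data.

```r
# Toy panel, deliberately unsorted (made-up values)
df <- data.frame(
  id   = c(1, 2, 1, 2, 1, 2),
  year = c(2002, 2001, 2001, 2003, 2003, 2002),
  y    = c(12, 20, 10, 26, 15, 23)
)

# Sort by unit and time before any shifting
df <- df[order(df$id, df$year), ]

# Hypothetical helpers: shift a vector by one position within a unit
lag1  <- function(x) c(NA, head(x, -1))   # previous value
lead1 <- function(x) c(tail(x, -1), NA)   # next value

# Apply the shifts separately within each unit
df$lag_y  <- ave(df$y, df$id, FUN = lag1)
df$lead_y <- ave(df$y, df$id, FUN = lead1)

# First difference: current value minus previous value
df$fd_y <- df$y - df$lag_y
```

The first observation of each unit has no lag and the last has no lead, so both are NA by construction.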

Panel transformation

head(Males[, c("nr", "year", "wage", "m_wage", "lag_wage", "fd_wage")], n = 16)
   nr year       wage   m_wage   lag_wage     fd_wage
1  13 1980  1.1975402 1.255652         NA          NA
2  13 1981  1.8530600 1.255652  1.1975402  0.65551979
3  13 1982  1.3444617 1.255652  1.8530600 -0.50859832
4  13 1983  1.4332133 1.255652  1.3444617  0.08875166
5  13 1984  1.5681251 1.255652  1.4332133  0.13491174
6  13 1985  1.6998909 1.255652  1.5681251  0.13176586
7  13 1986 -0.7202626 1.255652  1.6998909 -2.42015352
8  13 1987  1.6691879 1.255652 -0.7202626  2.38945049
9  17 1980  1.6759624 1.637786         NA          NA
10 17 1981  1.5183982 1.637786  1.6759624 -0.15756420
11 17 1982  1.5591905 1.637786  1.5183982  0.04079228
12 17 1983  1.7254101 1.637786  1.5591905  0.16621961
13 17 1984  1.6220223 1.637786  1.7254101 -0.10338777
14 17 1985  1.6085883 1.637786  1.6220223 -0.01343405
15 17 1986  1.5723854 1.637786  1.6085883 -0.03620286
16 17 1987  1.8203339 1.637786  1.5723854  0.24794844

References

Allison, Paul David. 2009. Fixed Effects Regression Models. Vol. 160. Quantitative Applications in the Social Sciences. Los Angeles: Sage.
Angrist, Joshua David, and Jörn-Steffen Pischke. 2015. Mastering ’Metrics: The Path from Cause to Effect. Princeton: Princeton University Press.
Brüderl, Josef, and Volker Ludwig. 2015. “Fixed-Effects Panel Regression.” In The Sage Handbook of Regression Analysis and Causal Inference, edited by Henning Best and Christof Wolf, 327–57. Los Angeles: Sage.
Croissant, Yves, and Giovanni Millo. 2019. Panel Data Econometrics with R. Hoboken, NJ: John Wiley and Sons.
Cunningham, Scott. 2021. Causal Inference: The Mixtape. New Haven and London: Yale University Press.
Firebaugh, Glenn. 2008. Seven Rules for Social Research. Princeton, N.J. and Woodstock: Princeton University Press.
Huntington-Klein, Nick. 2021. The Effect: An Introduction to Research Design and Causality. Boca Raton: Chapman & Hall/CRC.
Rüttenauer, Tobias, and Nicole Kapelle. 2024. “Panel Data Analysis.” https://doi.org/10.31235/osf.io/3mfzq.
Turek, Konrad, Matthijs Kalmijn, and Thomas Leopold. 2021. “The Comparative Panel File: Harmonized Household Panel Surveys from Seven Countries.” European Sociological Review 37 (3): 505–23. https://doi.org/10.1093/esr/jcab006.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge, Mass.: MIT Press.