4  Spatial Regression Models

\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\plim}{\operatornamewithlimits{plim}} \newcommand{\diag}{\mathrm{diag}} \]

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-3        spatialreg_1.2-9 
[4] Matrix_1.5-4.1    spdep_1.2-8       spData_2.3.0     
[7] mapview_2.11.0    sf_1.0-13        

loaded via a namespace (and not attached):
 [1] xfun_0.39          raster_3.6-23      htmlwidgets_1.6.2 
 [4] lattice_0.21-8     vctrs_0.6.3        tools_4.3.1       
 [7] crosstalk_1.2.0    LearnBayes_2.15.1  generics_0.1.3    
[10] parallel_4.3.1     sandwich_3.0-2     stats4_4.3.1      
[13] tibble_3.2.1       proxy_0.4-27       fansi_1.0.4       
[16] pkgconfig_2.0.3    KernSmooth_2.23-21 satellite_1.0.4   
[19] RColorBrewer_1.1-3 leaflet_2.1.2      webshot_0.5.5     
[22] lifecycle_1.0.3    compiler_4.3.1     deldir_1.0-9      
[25] munsell_0.5.0      terra_1.7-39       leafsync_0.1.0    
[28] codetools_0.2-19   stars_0.6-1        htmltools_0.5.5   
[31] class_7.3-22       pillar_1.9.0       MASS_7.3-60       
[34] classInt_0.4-9     lwgeom_0.2-13      wk_0.7.3          
[37] abind_1.4-5        boot_1.3-28.1      multcomp_1.4-25   
[40] nlme_3.1-162       tidyselect_1.2.0   digest_0.6.32     
[43] mvtnorm_1.2-2      dplyr_1.1.2        splines_4.3.1     
[46] fastmap_1.1.1      grid_4.3.1         colorspace_2.1-0  
[49] expm_0.999-7       cli_3.6.1          magrittr_2.0.3    
[52] base64enc_0.1-3    dichromat_2.0-0.1  XML_3.99-0.14     
[55] survival_3.5-5     utf8_1.2.3         TH.data_1.1-2     
[58] leafem_0.2.0       e1071_1.7-13       scales_1.2.1      
[61] sp_1.6-1           rmarkdown_2.23     zoo_1.8-12        
[64] png_0.1-8          coda_0.19-4        evaluate_0.21     
[67] knitr_1.43         tmaptools_3.1-1    s2_1.1.4          
[70] rlang_1.1.1        Rcpp_1.0.10        glue_1.6.2        
[73] DBI_1.1.3          rstudioapi_0.14    jsonlite_1.8.5    
[76] R6_2.5.1           units_0.8-2       

Reload data from pervious session

load("_data/msoa2_spatial.RData")

There are various techniques to model spatial dependence and spatial processes (LeSage and Pace 2009). Here, we will just cover a few of the most common techniques / econometric models. One advantage of the most basic spatial model (SLX) is that this method can easily be incorporated in a variety of other methodologies, such as machine learning approaches.

For more in-depth materials see LeSage and Pace (2009) and Kelejian and Piras (2017). Franzese and Hays (2007), Halleck Vega and Elhorst (2015), LeSage (2014), Rüttenauer (2022), Rüttenauer (2024) and Wimpy, Whitten, and Williams (2021) provide article-length introductions.

4.1 Why do we need spatial regression models

4.1.1 Non-spatial OLS

Let us start with a linear model, where \(\bm y\) is the outcome or dependent variable (\(N \times 1\)), \(\bm X\) are various exogenous covariates (\(N \times k\)), and \(\bm \varepsilon\) (\(N \times 1\)) is the error term. We are usually interested in the coefficient vector \(\bm \beta\) (\(k \times 1\)) and its insecurity estimates.

\[ {\bm y}={\bm X}{\bm \beta}+ {\bm \varepsilon} \] The work-horse for estimating \(\bm \beta\) in the social science is the OLS estimator (Wooldridge 2010).

\[ \hat{\beta}=({\bm X}^\intercal{\bm X})^{-1}{\bm X}^\intercal{\bm y}. \]

OLS assumptions I
  1. \(\Exp(\epsilon_i|\bm X_i) = 0\): for every value of \(X\), the average / expectation of the error term \(\bm \varepsilon\) equals zero – put differently: the error term is independent of \(X\),

  2. the observations of the sample are independent and identically distributed (i.i.d),

  3. the fourth moments of the variables \(\bm X_i\) and \(Y_i\) are positive and definite – put differently: extreme values / outliers are very very rare,

  4. \(\text{rank}(\bm X) = K\): the matrix \(\bm X\) has full rank – put differently: no perfect multicollinearity between the covariates,

OLS assumptions II
  1. \(\Var(\varepsilon|x) = \sigma^2\): the error terms \(\varepsilon\) are homoskedastic / have the same variance given any value of the explanatory variable,

  2. \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\): the error terms \(\varepsilon\) are normally distributed (conditional on the explanatory variables \(X_i\)).

Question

Which of the six assumptions above may be violated by spatial dependence?

4.1.2 Problem of ignoring spatial dependence

Does spatial dependence influence the results / coefficient estimates of non-spatial regression models, or in other words: is ignoring spatial dependence harmful?

I’ve heard different answers, ranging from “It only affects the standard errors” to “it always introduces bias”. As so often, the true (or best?) answer is somewhere in the middle: it depends (Betz, Cook, and Hollenbach 2020; Cook, Hays, and Franzese 2020; Pace and LeSage 2010; Rüttenauer 2022).

The easiest way to think of it is analogous to the omit variable bias (Betz, Cook, and Hollenbach 2020; Cook, Hays, and Franzese 2020):

\[ plim~\hat{\beta}_{OLS}= \beta + \gamma \frac{\Cov(\bm x, \bm z)}{\Var(\bm x)}, \]

where \(z\) is some omit variable, and \(\gamma\) is the conditional effect of \(\bm z\) on \(\bm y\). Now imagine that the neighbouring values of the dependent variable \(\bm W \bm y\) are autocorrelated to focal unit which we denote with \(\rho > 0\), and that the covariance between the focal unit’s exogenous covariates and \(\bm W \bm y\) is not zero. Then we will have an omitted variable bias due to spatial dependence:

\[ plim~\hat{\beta}_{OLS}= \beta + \rho \frac{\Cov(\bm x, \bm W \bm y)}{\Var(\bm x)} \neq \beta, \]

For completeness, the entire bias is a bit more complicated (Pace and LeSage 2010; Rüttenauer 2022) and looks like:

\[ plim~\hat{\beta}=\frac{\sum_{ij}({\bm M}(\delta){\bm M}(\delta)^\intercal\circ{\bm M}(\rho))_{ij}} {\tr({\bm M}(\delta){\bm M}(\delta)^\intercal)}\beta \\ +\frac{\sum_{ij}({\bm M}(\delta){\bm M}(\delta)^\intercal\circ{\bm M}(\rho){\bm W})_{ij}} {\tr({\bm M}(\delta){\bm M}(\delta)^\intercal)}\theta, \] where \(\circ\) denotes the Hadamard product, \({\bm M}(\delta)=({\bm I}_N-\delta{\bm W})^{-1}\), and \({\bm M}(\rho)=({\bm I}_N-\rho{\bm W})^{-1}\).

(Don’t worry, no need to learn by hard!!)

Essentially, the non-spatial OLS estimator \(\beta_{OLS}\) is biased in the presence of either (Pace and LeSage 2010; Rüttenauer 2022):

  • Spatial autocorrelation in the dependent variable (\(\rho\neq0\)) and spatial autocorrelation in the covariate (\(\delta\neq0\)). This bias increases with \(\rho\), \(\delta\), and \(\beta\).

  • Local spatial spillover effects (\(\theta\neq0\)) and spatial autocorrelation in the covariate (\(\delta\neq0\)). This is analogous to the omitted variable bias resulting from the omission of \({\bm W} {\bm x}\). It increases with \(\theta\) and \(\delta\), but additionally with \(\rho\) if \(\theta\neq0\) and \(\delta\neq0\).

  • An omitted variable and \(\mathrm{E}({\bm \varepsilon}|{\bm x})\neq0\). This non-spatial omitted variable bias \(\gamma\) is amplified by spatial dependence in the disturbances (\(\lambda\)) and spatial autocorrelation in the dependent variable (\(\rho\)), but also increases with positive values of \(\delta\) if either \(\rho\neq 0\) or \(\lambda\neq 0\). Obviously, it also increases with \(\gamma\).

4.2 Spatial Regression Models

Broadly, spatial dependence or clustering in some characteristic can be the result of three different processes:

flowchart TD
  id1{Spatial dependence} --> A[Spatial interdepence]
  id1 --> B[Spillovers from covariates]
  id1 --> C[Clustering unobservables]

Strictly speaking, there are some other possibilities too, such as measurement error or the wrong choice on the spatial level. For instance, imagine we have a city-specific characteristic (e.g. public spending) allocated to neighbourhood units. Obviously, this will introduce heavy autocorrelation on the neigbourhood level by construction.

There are three basic ways of incorporating spatial dependence, which then can be further combined. As before, the \(N \times N\) spatial weights matrix \(\bm W\) defines the spatial relationship between units.

4.2.1 Spatial Error Model (SEM)

  • Clustering on Unobservables

\[ \begin{split} {\bm y}&=\alpha{\bm \iota}+{\bm X}{\bm \beta}+{\bm u},\\ {\bm u}&=\lambda{\bm W}{\bm u}+{\bm \varepsilon} \end{split} \]

\(\lambda\) denotes the strength of the spatial correlation in the errors of the model: your errors influence my errors.

  • \(> 0\): positive error dependence,
  • \(< 0\): negative error dependence,
  • \(= 0\): traditional OLS model.

\(\lambda\) is defined in the range \([-1, +1]\).

4.2.2 Spatial Autoregressive Model (SAR)

  • Interdependence

\[ {\bm y}=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+ {\bm \varepsilon} \]

\(\rho\) denotes the strength of the spatial correlation in the dependent variable (spatial autocorrelation): your outcome influences my outcome.

  • \(> 0\): positive spatial dependence,
  • \(< 0\): negative spatial dependence,
  • \(= 0\): traditional OLS model.

\(\rho\) is defined in the range \([-1, +1]\).

4.2.3 Spatially lagged X Model (SLX)

  • Spillovers in Covariates

\[ {\bm y}=\alpha{\bm \iota}+{\bm X}{\bm \beta}+{\bm W}{\bm X}{\bm \theta}+ {\bm \varepsilon} \]

\(\theta\) denotes the strength of the spatial spillover effects from covariate(s) on the dependent variable: your covariates influence my outcome.

\(\theta\) is basically like any other coefficient from a covariate. It is thus not bound to any range.

Moreover, there are models combining two sets of the above specifications.

4.2.4 Spatial Durbin Model (SDM)

  • Interdependence
  • Spillovers in Covariates

\[ {\bm y}=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+{\bm W}{\bm X}{\bm \theta}+ {\bm \varepsilon} \]

4.2.5 Spatial Durbin Error Model (SDEM)

  • Clustering on Unobservables
  • Spillovers in Covariates

\[ \begin{split} {\bm y}&=\alpha{\bm \iota}+{\bm X}{\bm \beta}+{\bm W}{\bm X}{\bm \theta}+ {\bm u},\\ {\bm u}&=\lambda{\bm W}{\bm u}+{\bm \varepsilon} \end{split} \]

4.2.6 Combined Spatial Autocorrelation Model (SAC)

  • Clustering on Unobservables
  • Interdependence

\[ \begin{split} {\bm y}&=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+ {\bm u},\\ {\bm u}&=\lambda{\bm W}{\bm u}+{\bm \varepsilon} \end{split} \]

4.2.7 General Nesting Spatial Model (GNS)

  • Clustering on Unobservables
  • Interdependence
  • Spillovers in Covariates

\[ \begin{split} {\bm y}&=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+{\bm W}{\bm X}{\bm \theta}+ {\bm u},\\ {\bm u}&=\lambda{\bm W}{\bm u}+{\bm \varepsilon} \end{split} \]

Manski’s reflection problem

The General Nesting Spatial Model (GNS) is only weakly (or not?) identifiable (Gibbons and Overman 2012).

It’s analogous to Manski’s reflection problem on neighbourhood effects (Manski 1993): If people in the same group behave similar, this can be because a) imitating behaviour of the group, b) exogenous characteristics of the group influence the behaviour, and c) members of the same group are exposed to the same external circumstances. We just cannot separate those in observational data.

Note that all of these models assume different data generating processes (DGP) leading to the spatial pattern. Although there are specifications tests, it is generally not possible to let the data decide which one is the true underlying DGP (Cook, Hays, and Franzese 2020; Rüttenauer 2022). However, there might be theoretical reasons to guide the model specification (Cook, Hays, and Franzese 2020).

Just because SAR is probably the most commonly used model does not make it the best choice. In contrast, various studies (Halleck Vega and Elhorst 2015; Rüttenauer 2022; Wimpy, Whitten, and Williams 2021) highlight the advantages of the relative simple SLX model. Moreover, this specification can basically be incorporated in any other statistical method.

4.2.8 A note on missings

Missing values create a problem in spatial data analysis. For instance, in a local spillover model with an average of 10 neighbours, two initial missing values will lead to 20 missing values in the spatially lagged variable. For global spillover models, one initial missing will ‘flow’ through the neighbourhood system until the cutoff point (and create an excess amount of missings).

Depending on the data, units with missings can either be dropped and omitted from the initial weights creation, or we need to impute the data first, e.g. using interpolation or Kriging.

4.3 Mini Example

Let’s try to make sense of this. We rely on a mini example using a few units in Camden

sub.spdf <- msoa.spdf[c(172, 175, 178, 179, 181, 182), ]
mapview(sub.spdf)

We then construct queens neighbours, and have a look at the resulting non-normalized matrix \(\bm W\).

queens.nb <- poly2nb(sub.spdf, queen = TRUE, snap = 1)
W <- nb2mat(queens.nb, style = "B")
W
    [,1] [,2] [,3] [,4] [,5] [,6]
172    0    0    1    0    0    0
175    0    0    0    1    0    1
178    1    0    0    1    1    0
179    0    1    1    0    1    1
181    0    0    1    1    0    1
182    0    1    0    1    1    0
attr(,"call")
nb2mat(neighbours = queens.nb, style = "B")

We have selected 6 units. So, \(\bm W\) is a \(6 \times 6\) matrix. we see that observation 1 has one neighbour: observation 3. Observation 2 has two neighbours: observation 4 and observation 6. The diagonal is zero: no unit is a neighbour of themselves.

No we row-normalize this matrix.

queens.lw <- nb2listw(queens.nb,
                      style = "W")
W_rn <- listw2mat(queens.lw)
W_rn
         [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
172 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
175 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
178 0.3333333 0.0000000 0.0000000 0.3333333 0.3333333 0.0000000
179 0.0000000 0.2500000 0.2500000 0.0000000 0.2500000 0.2500000
181 0.0000000 0.0000000 0.3333333 0.3333333 0.0000000 0.3333333
182 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000

No every single weight \(w_{ij}\) is divided by the total number of neighbours \(n_i\) of the focal unit. For observation 1, observation 3 is the only neighbour, thus a weight = 1. For observation two, both neighbours have a weight of 1/2. For observation 3 (with three neighbours) each neighbour got a weight of 1/3.

Question

What happens if we multiply this matrix \(\bm W\) with a \(N \times 1\) vector \(\bm y\) or \(\bm x\)?

A short reminder on matrix multiplication.

\[ \bm W * \bm y = \begin{bmatrix} w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\\ w_{31} & w_{32} & w_{33} \end{bmatrix} * \begin{bmatrix} y_{11} \\ y_{21} \\ y_{31} \end{bmatrix}\\ = \begin{bmatrix} w_{11}y_{11} + w_{12}y_{21} + w_{13}y_{31}\\ w_{21}y_{11} + w_{22}y_{21} + w_{23}y_{31}\\ w_{31}y_{11} + w_{32}y_{21} + w_{33}y_{31} \end{bmatrix} \]

Each line of \(\bm W * \bm y\) just gives a weighted average of the other \(y\)-values \(y_j\) in the sample. In case of the row-normalization, each neighbour gets the same weight \(\frac{1}{n_i}\). This is simply the mean of \(y_j\) of the neighbours in case of a row-normalized contiguity weights matrix.

Note that the mean interpretation is only valid with row-normalization. What would we get with inverse-distance based weights?

Let’s look at this in our example

y <- sub.spdf$med_house_price
x <- sub.spdf$pubs_count

W_rn
         [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
172 0.0000000 0.0000000 1.0000000 0.0000000 0.0000000 0.0000000
175 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.5000000
178 0.3333333 0.0000000 0.0000000 0.3333333 0.3333333 0.0000000
179 0.0000000 0.2500000 0.2500000 0.0000000 0.2500000 0.2500000
181 0.0000000 0.0000000 0.3333333 0.3333333 0.0000000 0.3333333
182 0.0000000 0.3333333 0.0000000 0.3333333 0.3333333 0.0000000
y
[1] 376812.5 414625.0 713125.0 322750.0 495000.0 364000.0
x
[1] 1 3 3 1 9 7
W_rn_y <- W_rn %*% y
W_rn_x <- W_rn %*% x
W_rn_y
        [,1]
172 713125.0
175 343375.0
178 398187.5
179 496687.5
181 466625.0
182 410791.7
W_rn_x
        [,1]
172 3.000000
175 4.000000
178 3.666667
179 5.500000
181 3.666667
182 4.333333

Let’s check if our interpretation is true

W_rn_y[1] == y[3]
[1] TRUE
W_rn_y[2] == mean(y[c(4, 6)])
[1] TRUE
W_rn_y[4] == mean(y[c(2, 3, 5, 6)])
[1] TRUE

4.4 Real Example

First, we need the a spatial weights matrix.

# Contiguity (Queens) neighbours weights
queens.nb <- poly2nb(msoa.spdf, 
                     queen = TRUE, 
                     snap = 1) # we consider points in 1m distance as 'touching'
queens.lw <- nb2listw(queens.nb,
                      style = "W")

We can estimate spatial models using spatialreg.

4.4.1 SAR

Lets estimate a spatial SAR model using the lagsarlm() with contiguity weights. We use median house value as depended variable, and include population density (POPDEN), the air pollution (no2), and the share of ethnic minorities (per_mixed, per_asian, per_black, per_other).

mod_1.sar <- lagsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                        per_mixed + per_asian + per_black + per_other,  
                      data = msoa.spdf, 
                      listw = queens.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)

Call:
lagsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.5281789 -0.1220524 -0.0099245  0.0992203  1.0936745 

Type: lag 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  3.17383180  0.29041604  10.9286 < 2.2e-16
log(no2)     0.39705423  0.04452880   8.9168 < 2.2e-16
log(POPDEN) -0.05583014  0.01242876  -4.4920 7.055e-06
per_mixed    0.01851577  0.00579832   3.1933  0.001407
per_asian   -0.00228346  0.00045876  -4.9775 6.442e-07
per_black   -0.01263650  0.00100282 -12.6009 < 2.2e-16
per_other   -0.00161419  0.00289082  -0.5584  0.576582

Rho: 0.66976, LR test value: 473.23, p-value: < 2.22e-16
Asymptotic standard error: 0.025311
    z-value: 26.461, p-value: < 2.22e-16
Wald statistic: 700.19, p-value: < 2.22e-16

Log likelihood: 196.7203 for lag model
ML residual variance (sigma squared): 0.035402, (sigma: 0.18815)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: -375.44, (AIC for lm: 95.786)
LM test for residual autocorrelation
test value: 8.609, p-value: 0.0033451

This looks pretty much like a conventional model output, with some additional information: a highly significant mod_1.sar$rho of 0.67 indicates strong positive spatial autocorrelation.

Remember that is the coefficient for the term \(\bm y = \rho \bm W \bm y \ldots\). It is bound to be below 1 for positive autocorrelation.

In substantive terms, house prices in the focal unit positively influence house prices in neighbouring units, which again influences house prices among the neighbours of these neighbours, and so on (we’ll get back to this).

Warning

The coefficients of covariates in a SAR model are not marginal or partial effects, because of the spillovers and feedback loops in \(\bm y\) (see below)!

From the coefficient, we can only interpret the direction: there’s a positive effect of air pollution and a negative effect of population sensitivity, and so on…

4.4.2 SEM

SEM models can be estimated using errorsarlm().

mod_1.sem <- errorsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) +
                          per_mixed + per_asian + per_black + per_other,  
                        data = msoa.spdf, 
                        listw = queens.lw,
                        Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)

Call:
errorsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.581785 -0.105218 -0.012758  0.094430  0.913425 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) 12.92801104  0.35239139  36.6865 < 2.2e-16
log(no2)     0.15735296  0.10880727   1.4462 0.1481317
log(POPDEN) -0.08316270  0.01254315  -6.6301 3.354e-11
per_mixed   -0.03377962  0.00811054  -4.1649 3.115e-05
per_asian   -0.00413115  0.00096849  -4.2656 1.994e-05
per_black   -0.01653816  0.00126741 -13.0488 < 2.2e-16
per_other   -0.01693012  0.00462999  -3.6566 0.0002556

Lambda: 0.88605, LR test value: 623.55, p-value: < 2.22e-16
Asymptotic standard error: 0.015803
    z-value: 56.068, p-value: < 2.22e-16
Wald statistic: 3143.6, p-value: < 2.22e-16

Log likelihood: 271.8839 for error model
ML residual variance (sigma squared): 0.026911, (sigma: 0.16405)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: NA (not available for weighted model), (AIC for lm: 95.786)

In this case mod_1.sem$lambda gives us the spatial parameter. A highly significant lambda of 0.89 indicates that the errors are highly spatially correlated (e.g. due to correlated unobservables). Again, $= 1 $ would be the maximum.

In spatial error models, we can interpret the coefficients directly, as in a conventional linear model.

4.4.3 SLX

SLX models can either be estimated with lmSLX() directly, or by creating \(\bm W \bm X\) manually and plugging it into any available model-fitting function.

mod_1.slx <- lmSLX(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                     per_mixed + per_asian + per_black + per_other,  
                   data = msoa.spdf, 
                   listw = queens.lw, 
                   Durbin = TRUE) # use a formula to lag only specific covariates
summary(mod_1.slx)

Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50809 -0.16605 -0.01817  0.13055  1.09039 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     10.582440   0.153862  68.779  < 2e-16 ***
log.no2.        -0.440727   0.181063  -2.434  0.01511 *  
log.POPDEN.     -0.076840   0.017345  -4.430 1.05e-05 ***
per_mixed       -0.033042   0.011298  -2.925  0.00353 ** 
per_asian       -0.002381   0.001474  -1.615  0.10655    
per_black       -0.016229   0.001801  -9.009  < 2e-16 ***
per_other       -0.020391   0.006564  -3.107  0.00195 ** 
lag.log.no2.     0.993602   0.199370   4.984 7.38e-07 ***
lag.log.POPDEN.  0.113262   0.028752   3.939 8.76e-05 ***
lag.per_mixed    0.126069   0.014294   8.820  < 2e-16 ***
lag.per_asian   -0.003828   0.001661  -2.305  0.02140 *  
lag.per_black   -0.018054   0.002241  -8.056 2.30e-15 ***
lag.per_other    0.048139   0.007971   6.039 2.20e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2262 on 970 degrees of freedom
Multiple R-squared:  0.653, Adjusted R-squared:  0.6487 
F-statistic: 152.1 on 12 and 970 DF,  p-value: < 2.2e-16

In SLX models, we can simply interpret the coefficients of direct and indirect (spatially lagged) covariates.

For instance, lets look at population density:

Interpretaion SLX
  1. A high population density in the focal unit is related to lower house prices (a 1% increase in population density decreses house prices by -0.08%), but

  2. A high population density in the neighbouring areas is related to higher house prices (while keeping population density in the focal unit constant). A 1% increase in the average population density across the adjacent neighbourhoods increases house prices in the focal unit by 0.11%)

Potential interpretation: areas with a low population density in central regions of the city (high pop density in surrounding neighbourhoods) have higher house prices. We could try testing this interpretation by including the distance to the city center as a control.

Also note how the air pollution coefficient has changed here, with a negative effect in the focal unit and positive one amonf the neighbouring units.

An alternative way of estimating the same model is lagging the covariates first.

# Loop through vars and create lagged variables
msoa.spdf$log_POPDEN <- log(msoa.spdf$POPDEN)
msoa.spdf$log_no2 <- log(msoa.spdf$no2)
msoa.spdf$log_med_house_price <- log(msoa.spdf$med_house_price)

vars <- c("log_med_house_price", "log_no2", "log_POPDEN", 
          "per_mixed", "per_asian", "per_black", "per_other",
          "per_owner", "per_social", "pubs_count")
for(v in vars){
  msoa.spdf[, paste0("w.", v)] <- lag.listw(queens.lw,
                                            var = st_drop_geometry(msoa.spdf)[, v])
}

# Alternatively:
w_vars <- create_WX(st_drop_geometry(msoa.spdf[, vars]),
                    listw = queens.lw,
                    prefix = "w")

head(w_vars)
  w.log_med_house_price w.log_no2 w.log_POPDEN w.per_mixed
1              12.98382  3.843750     4.662014    4.748368
2              12.28730  3.098960     3.300901    3.978275
3              12.21207  3.206338     4.009795    3.997487
4              12.18176  3.169934     3.630360    2.759082
5              12.11159  3.221203     3.993660    3.930061
6              12.08393  3.217865     3.876070    3.419488
  w.per_asian w.per_black w.per_other w.per_owner w.per_social
1   23.899916    7.879758   3.2080074    25.75738     33.85580
2   19.951593   10.451828   1.6368986    66.42278     15.75042
3   20.793559   12.965863   1.7526693    58.72637     21.38169
4    7.633439   12.135478   0.6992118    66.52519     19.70500
5   12.791140   16.108948   1.3817357    53.05539     29.44022
6    8.997514   15.312652   0.9611710    59.49460     23.81126
  w.pubs_count
1    8.5454545
2    0.6666667
3    0.2857143
4    0.2000000
5    0.4000000
6    0.1666667

And subsequently we use those new variables in a linear model.

mod_1.lm <- lm (log(med_house_price) ~ log(no2) + log(POPDEN) + 
                  per_mixed + per_asian + per_black + per_other +
                  w.log_no2 + w.log_POPDEN +
                  w.per_mixed + w.per_asian + w.per_black + w.per_other,
                data = msoa.spdf)
summary(mod_1.lm)

Call:
lm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other + w.log_no2 + 
    w.log_POPDEN + w.per_mixed + w.per_asian + w.per_black + 
    w.per_other, data = msoa.spdf)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50809 -0.16605 -0.01817  0.13055  1.09039 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  10.582440   0.153862  68.779  < 2e-16 ***
log(no2)     -0.440727   0.181063  -2.434  0.01511 *  
log(POPDEN)  -0.076840   0.017345  -4.430 1.05e-05 ***
per_mixed    -0.033042   0.011298  -2.925  0.00353 ** 
per_asian    -0.002381   0.001474  -1.615  0.10655    
per_black    -0.016229   0.001801  -9.009  < 2e-16 ***
per_other    -0.020391   0.006564  -3.107  0.00195 ** 
w.log_no2     0.993602   0.199370   4.984 7.38e-07 ***
w.log_POPDEN  0.113262   0.028752   3.939 8.76e-05 ***
w.per_mixed   0.126069   0.014294   8.820  < 2e-16 ***
w.per_asian  -0.003828   0.001661  -2.305  0.02140 *  
w.per_black  -0.018054   0.002241  -8.056 2.30e-15 ***
w.per_other   0.048139   0.007971   6.039 2.20e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2262 on 970 degrees of freedom
Multiple R-squared:  0.653, Adjusted R-squared:  0.6487 
F-statistic: 152.1 on 12 and 970 DF,  p-value: < 2.2e-16

Looks pretty similar to lmSLX() results, and it should! A big advantage of the SLX specification is that we can use the lagged variables in basically all methods which take variables as inputs, such as non-linear models, matching algorithms, and machine learning tools.

Moreover, using the lagged variables gives a high degree of freedom. For instance, we could (not saying that it necessarily makes sense):

  • Use different weights matrices for different variables

  • Include higher order neighbours using nblag() (with an increasing number of orders we go towards a more global model, but we estimate a coefficient for each spillover, instead of estimating just one)

  • Use machine learning techniques to determine the best fitting weights specification.

4.4.4 SDEM

SDEM models can be estimated using errorsarlm() with the additional option Durbin = TRUE.

mod_1.sdem <- errorsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) +
                          per_mixed + per_asian + per_black + per_other,  
                        data = msoa.spdf, 
                        listw = queens.lw,
                        Durbin = TRUE) # we could here extend to SDEM
summary(mod_1.sdem)

Call:
errorsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.617795 -0.106380 -0.014832  0.095826  0.927446 

Type: error 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error  z value  Pr(>|z|)
(Intercept)     10.4422703  0.3652148  28.5921 < 2.2e-16
log(no2)        -0.2057493  0.1264914  -1.6266 0.1038248
log(POPDEN)     -0.0769743  0.0132094  -5.8272 5.635e-09
per_mixed       -0.0222406  0.0079705  -2.7904 0.0052649
per_asian       -0.0037484  0.0010054  -3.7284 0.0001927
per_black       -0.0179751  0.0012383 -14.5161 < 2.2e-16
per_other       -0.0150218  0.0044895  -3.3460 0.0008199
lag.log(no2)     1.0004491  0.1739833   5.7503 8.911e-09
lag.log(POPDEN) -0.0054241  0.0327802  -0.1655 0.8685763
lag.per_mixed    0.0669699  0.0169349   3.9545 7.668e-05
lag.per_asian   -0.0018566  0.0015957  -1.1635 0.2446368
lag.per_black   -0.0079949  0.0024833  -3.2195 0.0012842
lag.per_other    0.0273378  0.0087430   3.1268 0.0017671

Lambda: 0.76173, LR test value: 455.7, p-value: < 2.22e-16
Asymptotic standard error: 0.024949
    z-value: 30.531, p-value: < 2.22e-16
Wald statistic: 932.15, p-value: < 2.22e-16

Log likelihood: 300.847 for error model
ML residual variance (sigma squared): 0.027504, (sigma: 0.16584)
Number of observations: 983 
Number of parameters estimated: 15 
AIC: NA (not available for weighted model), (AIC for lm: -117.99)

And this SDEM can be interpreted like a combination of SEM and SLX.

First, we still see highly significant auto-correlation in the error term. However, it’s lower in magnitude now that we also include the \(\bm W X\) terms.

Second, the coefficients tell a smimilar story as in the SLX (use the same interpretation), but some coefficient magnitudes have become smaller.

4.4.5 SDM

SDM models can be estimated using lagsarlm() with the additional option Durbin = TRUE.

mod_1.sdm <- lagsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                        per_mixed + per_asian + per_black + per_other,  
                      data = msoa.spdf, 
                      listw = queens.lw,
                      Durbin = TRUE) # we could here extend to SDM
summary(mod_1.sdm)

Call:
lagsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.614314 -0.107947 -0.013509  0.092234  0.917398 

Type: mixed 
Coefficients: (asymptotic standard errors) 
                  Estimate Std. Error  z value  Pr(>|z|)
(Intercept)      2.7843426  0.2944721   9.4554 < 2.2e-16
log(no2)        -0.3112762  0.1308101  -2.3796 0.0173312
log(POPDEN)     -0.0802866  0.0125213  -6.4120 1.436e-10
per_mixed       -0.0368998  0.0081596  -4.5223 6.118e-06
per_asian       -0.0033726  0.0010636  -3.1711 0.0015189
per_black       -0.0159770  0.0013006 -12.2848 < 2.2e-16
per_other       -0.0209743  0.0047369  -4.4279 9.516e-06
lag.log(no2)     0.4880923  0.1456778   3.3505 0.0008067
lag.log(POPDEN)  0.0781188  0.0207600   3.7629 0.0001679
lag.per_mixed    0.0640880  0.0104646   6.1243 9.110e-10
lag.per_asian    0.0017665  0.0012101   1.4598 0.1443498
lag.per_black    0.0070487  0.0017938   3.9295 8.511e-05
lag.per_other    0.0284822  0.0057774   4.9299 8.226e-07

Rho: 0.73126, LR test value: 501.83, p-value: < 2.22e-16
Asymptotic standard error: 0.025889
    z-value: 28.246, p-value: < 2.22e-16
Wald statistic: 797.86, p-value: < 2.22e-16

Log likelihood: 323.9111 for mixed model
ML residual variance (sigma squared): 0.026633, (sigma: 0.1632)
Number of observations: 983 
Number of parameters estimated: 15 
AIC: -617.82, (AIC for lm: -117.99)
LM test for residual autocorrelation
test value: 36.704, p-value: 1.3747e-09

And this SDM can be interpreted like a combination of SAR and SLX.

First, there’s still substantial auto-correlation in \(\bm y\), and this has become even stronger as compared to SAR.

Second, we can interpret the direction of the effect, but we cannot interpret the coefficient as marginal effects.

4.5 Estimation

Note that most of the spatial model specifications can not be estimated by Least Squares (LS), as using (constrained) LS estimators for models containing a spatially lagged dependent variable or disturbance leads to inconsistent results (Anselin and Bera 1998; Franzese and Hays 2007). However, an extensive amount of econometric literature discusses different estimation methods based on (quasi-) maximum likelihood (Anselin 1988; Lee 2004; Ord 1975) or instrumental variable approaches using generalized methods of moments (Drukker, Egger, and Prucha 2013; Kelejian and Prucha 1998, 2010), in which the endogenous lagged variables can be instrumented by \(q\) higher order lags of the exogenous regressors \(({\bm X}, {\bm W \bm X}, {\bm W^2 \bm X},..., {\bm W^q \bm X})\) (Kelejian and Prucha 1998).

4.5.1 Simulataneity bias

Remember what is happening when we estimate a spatial auto-regressive model.

Note the circular process here: My \(X\) influences my \(Y\), which then influences your \(Y\), which then influences my \(Y\) again. We write this as

\[ {\bm y}=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+ {\bm \varepsilon}. \]

If we ignore \({\bm X}{\bm \beta}\) and write the pure auto-regressive term in its reduce form, we get:

\[ \bm y =\left(\bm I_n - \rho\bm W\right)^{-1}\varepsilon, \]

and the spatial lag term is

\[ \bm W \bm y =\bm W\left(\bm I_n - \rho\bm W\right)^{-1}\varepsilon. \]

The OLS estimator for the spatial lag term then is

\[ \hat{\rho}_{OLS} = \left[\underbrace{\left(\bm W\bm y \right)^\top}_{(1\times n)}\underbrace{\left(\bm W\bm y \right)}_{(n\times 1)}\right]^{-1}\underbrace{\left(\bm W\bm y \right)^\top}_{(1\times n)}\underbrace{\bm y}_{(n\times 1)}. \]

It can then be shown that the OLS estimators equals

\[ \hat{\rho}_{OLS} = \rho + \left[\left(\bm W\bm y \right)^\top\left(\bm W\bm y \right)\right]^{-1}\left(\bm W\bm y \right)^\top\varepsilon \\ = \rho + \left(\sum_{i = 1}^n \bm y_{Li}^2\right)^{-1}\left(\sum_{i = 1}^{n}\bm y_{Li}\epsilon_i\right), \]

with \(\bm y_{Li}\) defined as the \(i\)th element of the spatial lag operator \(\bm W\bm y = \bm y_L\). It can further be shown that the second part of the equation \(\neq 0\), which demonstrates that OLS gives a biased estimate of \(\rho\) (Franzese and Hays 2007; Sarrias 2023).

Warning

Do not estimate spatial lags of the dependent variable in OLS. It will suffer from simultaneity bias.

4.5.2 Instrumental variable

A potential way of estimating spatial lag /SAR models is 2SLS (Kelejian and Prucha 1998).

We start with our standard model

\[ {\bm y}=\alpha{\bm \iota}+\rho{\bm W}{\bm y}+{\bm X}{\bm \beta}+ {\bm \varepsilon}. \]

As we have seen above, there is a problem of simultaneity: the “covariate” \({\bm W}{\bm y}\) is endogenous. One way of dealing with this endogeneity problem is the Instrumental Variable approach.

So, the question is what are good instruments \(\bm H\) for \({\bm W}{\bm y}\)? As we have specified the mode, we are sure that \({\bm X}\) determines \({\bm y}\). Thus, it must be true that \({\bm W}{\bm X}\) and \({\bm W}^2{\bm X},\ldots, {\bm W}^l{\bm X}\) determines \({\bm W}{\bm y}\).

Note that \({\bm W}^l\) denotes higher orders of \({\bm W}\). So \({\bm W}^2\) are the second order neighbours (neighbours of neighbours), and \({\bm W}^3\) are the third order neighbours (the neighbours of my neighbour’s neighbours), and so on…

We will discuss this in more detail later, but note for now that the reduced form of the SAR always contains a series of higher order neighbours.

\[ ({\bm I_N}-\rho {\bm W})^{-1}\beta_k =({\bm I_N} + \rho{\bm W} + \rho^2{\bm W}^2 + \rho^3{\bm W}^3 + ...)\beta_k = ({\bm I_N} + \sum_{h=1}^\infty \rho^h{\bm W}^h)\beta_k . \]

Thus, Kelejian and Prucha (1998) suggested to use a set of lagged covariates as instruments for \(\bm W \bm Y\):

\[ \bm H = \bm X, \bm W\bm X, \bm W^2\bm X, ... , \bm W^l\bm X, \]

where \(l\) is a pre-defined number for the higher order neighbours included. In practice, \(l\) is usually restricted to \(l=2\).

This has further been developed by, for instance, using a (truncated) power series as instruments (Kelejian, Prucha, and Yuzefovich 2004):

\[ \bm H =\left[\bm X, \bm W\left(\sum_{l = 1}^{\infty}\rho^{l}\bm W^l\right)\bm X \bm\beta\right]. \]

We can estimate this using the pacakge spatialreg with the function stsls(),

mod_1.sls <- stsls(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                     per_mixed + per_asian + per_black + per_other,  
                   data = msoa.spdf, 
                   listw = queens.lw,
                   robust = TRUE, #  heteroskedasticity robust SEs
                   W2X = TRUE) # Second order neighbours are included as instruments (else only first)
summary(mod_1.sls)

Call:
stsls(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, robust = TRUE, W2X = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.5464924 -0.1238002 -0.0052299  0.0989150  1.0793093 

Coefficients: 
               Estimate HC0 std. Error z value  Pr(>|z|)
Rho          0.71004211     0.04678235 15.1776 < 2.2e-16
(Intercept)  2.73582523     0.50997823  5.3646 8.113e-08
log(no2)     0.37752751     0.04920257  7.6729 1.688e-14
log(POPDEN) -0.05710992     0.01684036 -3.3913 0.0006957
per_mixed    0.01634307     0.00588488  2.7771 0.0054842
per_asian   -0.00205426     0.00045905 -4.4750 7.640e-06
per_black   -0.01166456     0.00128557 -9.0734 < 2.2e-16
per_other   -0.00280423     0.00332302 -0.8439 0.3987377

Residual variance (sigma squared): 0.035213, (sigma: 0.18765)

4.6 Maximum Likelihood

Maximum Likelihood estimation of spatial models is the most common way of estimation. The procedure to estimate SAR models via ML is based on Ord (1975) and Anselin (1988). If you are interested in more details, please look at the more extensive 3-day workshop or have a look into Sarrias (2023).

The package spatialreg Pebesma and Bivand (2023) provides a series of functions to calculate the ML estimator for all spatial models we have considered.

Table from Pebesma and Bivand (2023):

model model name maximum likelihood estimation function
SEM spatial error errorsarlm(..., Durbin=FALSE)
SEM spatial error spautolm(..., family="SAR")
SDEM spatial Durbin error errorsarlm(..., Durbin=TRUE)
SLM spatial lag lagsarlm(..., Durbin=FALSE)
SDM spatial Durbin lagsarlm(..., Durbin=TRUE)
SAC spatial autoregressive combined sacsarlm(..., Durbin=FALSE)
GNM general nested sacsarlm(..., Durbin=TRUE)

ML SAR

mod_1.sar <- lagsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                        per_mixed + per_asian + per_black + per_other,  
                      data = msoa.spdf, 
                      listw = queens.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)

Call:
lagsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.5281789 -0.1220524 -0.0099245  0.0992203  1.0936745 

Type: lag 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  3.17383180  0.29041604  10.9286 < 2.2e-16
log(no2)     0.39705423  0.04452880   8.9168 < 2.2e-16
log(POPDEN) -0.05583014  0.01242876  -4.4920 7.055e-06
per_mixed    0.01851577  0.00579832   3.1933  0.001407
per_asian   -0.00228346  0.00045876  -4.9775 6.442e-07
per_black   -0.01263650  0.00100282 -12.6009 < 2.2e-16
per_other   -0.00161419  0.00289082  -0.5584  0.576582

Rho: 0.66976, LR test value: 473.23, p-value: < 2.22e-16
Asymptotic standard error: 0.025311
    z-value: 26.461, p-value: < 2.22e-16
Wald statistic: 700.19, p-value: < 2.22e-16

Log likelihood: 196.7203 for lag model
ML residual variance (sigma squared): 0.035402, (sigma: 0.18815)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: -375.44, (AIC for lm: 95.786)
LM test for residual autocorrelation
test value: 8.609, p-value: 0.0033451

ML SEM

mod_1.sem <- errorsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) +
                          per_mixed + per_asian + per_black + per_other,  
                        data = msoa.spdf, 
                        listw = queens.lw,
                        Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)

Call:
errorsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.581785 -0.105218 -0.012758  0.094430  0.913425 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) 12.92801104  0.35239139  36.6865 < 2.2e-16
log(no2)     0.15735296  0.10880727   1.4462 0.1481317
log(POPDEN) -0.08316270  0.01254315  -6.6301 3.354e-11
per_mixed   -0.03377962  0.00811054  -4.1649 3.115e-05
per_asian   -0.00413115  0.00096849  -4.2656 1.994e-05
per_black   -0.01653816  0.00126741 -13.0488 < 2.2e-16
per_other   -0.01693012  0.00462999  -3.6566 0.0002556

Lambda: 0.88605, LR test value: 623.55, p-value: < 2.22e-16
Asymptotic standard error: 0.015803
    z-value: 56.068, p-value: < 2.22e-16
Wald statistic: 3143.6, p-value: < 2.22e-16

Log likelihood: 271.8839 for error model
ML residual variance (sigma squared): 0.026911, (sigma: 0.16405)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: NA (not available for weighted model), (AIC for lm: 95.786)