6  Spatial Regression Models: Estimation

This section is strongly based on Sarrias (2023), though much less detailed than the original.

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-3        spatialreg_1.2-9 
[4] Matrix_1.5-4.1    spdep_1.2-8       spData_2.2.2     
[7] mapview_2.11.0    sf_1.0-13        

loaded via a namespace (and not attached):
 [1] xfun_0.39          raster_3.6-20      htmlwidgets_1.6.2 
 [4] lattice_0.21-8     vctrs_0.6.3        tools_4.3.1       
 [7] crosstalk_1.2.0    LearnBayes_2.15.1  generics_0.1.3    
[10] parallel_4.3.1     stats4_4.3.1       sandwich_3.0-2    
[13] tibble_3.2.1       proxy_0.4-27       fansi_1.0.4       
[16] pkgconfig_2.0.3    KernSmooth_2.23-21 satellite_1.0.4   
[19] RColorBrewer_1.1-3 leaflet_2.1.2      webshot_0.5.4     
[22] lifecycle_1.0.3    compiler_4.3.1     deldir_1.0-9      
[25] munsell_0.5.0      terra_1.7-29       leafsync_0.1.0    
[28] codetools_0.2-19   stars_0.6-1        htmltools_0.5.5   
[31] class_7.3-22       pillar_1.9.0       MASS_7.3-60       
[34] classInt_0.4-9     lwgeom_0.2-13      wk_0.7.3          
[37] abind_1.4-5        boot_1.3-28.1      multcomp_1.4-24   
[40] nlme_3.1-162       tidyselect_1.2.0   digest_0.6.31     
[43] mvtnorm_1.2-2      dplyr_1.1.2        splines_4.3.1     
[46] fastmap_1.1.1      grid_4.3.1         colorspace_2.1-0  
[49] expm_0.999-7       cli_3.6.1          magrittr_2.0.3    
[52] base64enc_0.1-3    dichromat_2.0-0.1  XML_3.99-0.14     
[55] survival_3.5-5     utf8_1.2.3         TH.data_1.1-2     
[58] leafem_0.2.0       e1071_1.7-13       scales_1.2.1      
[61] sp_1.6-1           rmarkdown_2.22     zoo_1.8-12        
[64] png_0.1-8          coda_0.19-4        evaluate_0.21     
[67] knitr_1.43         tmaptools_3.1-1    s2_1.1.4          
[70] rlang_1.1.1        Rcpp_1.0.10        glue_1.6.2        
[73] DBI_1.1.3          jsonlite_1.8.5     R6_2.5.1          
[76] units_0.8-2       

Reload data from previous session

load("_data/msoa2_spatial.RData")

Note that most spatial model specifications cannot be estimated by Least Squares (LS), as using (constrained) LS estimators for models containing a spatially lagged dependent variable or disturbance leads to inconsistent results (Anselin and Bera 1998; Franzese and Hays 2007). However, an extensive econometric literature discusses alternative estimation methods based on (quasi-) maximum likelihood (Anselin 1988; Lee 2004; Ord 1975) or instrumental variable approaches using the generalized method of moments (Drukker, Egger, and Prucha 2013; Kelejian and Prucha 1998, 2010), in which the endogenous lagged variables are instrumented by \(q\) higher-order lags of the exogenous regressors \(({\boldsymbol{\mathbf{X}}}, {\boldsymbol{\mathbf{W}} \boldsymbol{\mathbf{X}}}, {\boldsymbol{\mathbf{W}}^2 \boldsymbol{\mathbf{X}}},..., {\boldsymbol{\mathbf{W}}^q \boldsymbol{\mathbf{X}}})\) (Kelejian and Prucha 1998).

6.1 Simultaneity bias

Recall what happens when we estimate a spatial autoregressive model.

Note the circular process here: my \(X\) influences my \(Y\), which then influences your \(Y\), which in turn influences my \(Y\) again. We write this as

\[ \begin{equation} {\boldsymbol{\mathbf{y}}}=\alpha{\boldsymbol{\mathbf{\iota}}}+\rho{\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{y}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+ {\boldsymbol{\mathbf{\varepsilon}}}. \end{equation} \]

If we ignore \({\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}\) and write the pure autoregressive term in its reduced form, we get:

\[ \boldsymbol{\mathbf{y}} =\left(\boldsymbol{\mathbf{I}}_n - \rho\boldsymbol{\mathbf{W}}\right)^{-1}\varepsilon, \]

and the spatial lag term is

\[ \boldsymbol{\mathbf{W}} \boldsymbol{\mathbf{y}} =\boldsymbol{\mathbf{W}}\left(\boldsymbol{\mathbf{I}}_n - \rho\boldsymbol{\mathbf{W}}\right)^{-1}\varepsilon. \]

The OLS estimator for the spatial lag term then is

\[ \hat{\rho}_{OLS} = \left[\underbrace{\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)^\top}_{(1\times n)}\underbrace{\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)}_{(n\times 1)}\right]^{-1}\underbrace{\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)^\top}_{(1\times n)}\underbrace{\boldsymbol{\mathbf{y}}}_{(n\times 1)}. \]

It can then be shown that the OLS estimator equals

\[ \begin{split} \hat{\rho}_{OLS} &= \rho + \left[\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)^\top\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)\right]^{-1}\left(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} \right)^\top\boldsymbol{\mathbf{\varepsilon}} \\ &= \rho + \left(\sum_{i = 1}^n \boldsymbol{\mathbf{y}}_{Li}^2\right)^{-1}\left(\sum_{i = 1}^{n}\boldsymbol{\mathbf{y}}_{Li}\varepsilon_i\right), \end{split} \]

with \(\boldsymbol{\mathbf{y}}_{Li}\) defined as the \(i\)th element of the spatial lag operator \(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} = \boldsymbol{\mathbf{y}}_L\). It can further be shown that the second term of this equation is \(\neq 0\), which demonstrates that OLS yields a biased estimate of \(\rho\) (Franzese and Hays 2007; Sarrias 2023).
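A small simulation makes this bias visible. The following is a minimal sketch on a hypothetical 20 x 20 grid (not our MSOA data): we generate \(\boldsymbol{\mathbf{y}}\) from a pure SAR process with known \(\rho = 0.5\) and then regress \(\boldsymbol{\mathbf{y}}\) on \(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}}\) by OLS.

library(spdep) # cell2nb(), nb2listw(), listw2mat()

set.seed(123)
nb <- cell2nb(20, 20)                 # rook neighbours on a toy 20 x 20 grid
lw <- nb2listw(nb, style = "W")       # row-standardised weights
W  <- listw2mat(lw)                   # dense weights matrix

rho <- 0.5                            # true autoregressive parameter
n   <- 400
eps <- rnorm(n)
y   <- solve(diag(n) - rho * W, eps)  # reduced form: y = (I - rho W)^{-1} eps
Wy  <- W %*% y                        # spatial lag of y

coef(lm(y ~ Wy - 1))                  # OLS estimate of rho; typically exceeds 0.5

Because \(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}}\) is correlated with \(\varepsilon\), the OLS estimate of \(\rho\) is systematically pushed away from the true value, exactly as the non-zero second term above predicts.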

Warning

Do not estimate models with a spatial lag of the dependent variable by OLS. The estimates will suffer from simultaneity bias.

6.2 Instrumental variable

A potential way of estimating spatial lag / SAR models is two-stage least squares (2SLS) (Kelejian and Prucha 1998).

We start with our standard model

\[ {\boldsymbol{\mathbf{y}}}=\alpha{\boldsymbol{\mathbf{\iota}}}+\rho{\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{y}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+ {\boldsymbol{\mathbf{\varepsilon}}}. \]

As we have seen above, there is a problem of simultaneity: the “covariate” \({\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{y}}}\) is endogenous. One way of dealing with this endogeneity problem is the Instrumental Variable approach.

So, the question is: what are good instruments \(\boldsymbol{\mathbf{H}}\) for \({\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{y}}}\)? As we have specified the model, we are sure that \({\boldsymbol{\mathbf{X}}}\) determines \({\boldsymbol{\mathbf{y}}}\). Thus, it must be true that \({\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{X}}}\) and \({\boldsymbol{\mathbf{W}}}^2{\boldsymbol{\mathbf{X}}},\ldots, {\boldsymbol{\mathbf{W}}}^l{\boldsymbol{\mathbf{X}}}\) determine \({\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{y}}}\).

Note that \({\boldsymbol{\mathbf{W}}}^l\) denotes higher orders of \({\boldsymbol{\mathbf{W}}}\). So \({\boldsymbol{\mathbf{W}}}^2\) are the second order neighbours (neighbours of neighbours), and \({\boldsymbol{\mathbf{W}}}^3\) are the third order neighbours (the neighbours of my neighbour’s neighbours), and so on…
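Higher-order neighbour sets can be inspected with nblag() from spdep, which operates on a neighbours list rather than a listw object. A minimal sketch, assuming the queen contiguity neighbours list underlying queens.lw is available as an object (hypothetically named queens.nb here):

queens.lag <- nblag(queens.nb, maxlag = 3)  # hypothetical queens.nb; orders 1 to 3
summary(queens.lag[[2]])                    # second order: neighbours of neighbours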

We will discuss this in more detail later, but note for now that the reduced form of the SAR always contains a series of higher order neighbours.

\[ ({\boldsymbol{\mathbf{I}}_N}-\rho {\boldsymbol{\mathbf{W}}})^{-1}\beta_k =({\boldsymbol{\mathbf{I}}_N} + \rho{\boldsymbol{\mathbf{W}}} + \rho^2{\boldsymbol{\mathbf{W}}}^2 + \rho^3{\boldsymbol{\mathbf{W}}}^3 + ...)\beta_k = ({\boldsymbol{\mathbf{I}}_N} + \sum_{h=1}^\infty \rho^h{\boldsymbol{\mathbf{W}}}^h)\beta_k . \]

Thus, Kelejian and Prucha (1998) suggested using a set of spatially lagged covariates as instruments for \(\boldsymbol{\mathbf{W}} \boldsymbol{\mathbf{y}}\):

\[ \boldsymbol{\mathbf{H}} = \boldsymbol{\mathbf{X}}, \boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{X}}, \boldsymbol{\mathbf{W}}^2\boldsymbol{\mathbf{X}}, ... , \boldsymbol{\mathbf{W}}^l\boldsymbol{\mathbf{X}}, \]

where \(l\) is a pre-defined number for the higher order neighbours included. In practice, \(l\) is usually restricted to \(l=2\).

This has been developed further, for instance by using a (truncated) power series as instruments (Kelejian, Prucha, and Yuzefovich 2004):

\[ \boldsymbol{\mathbf{H}} =\left[\boldsymbol{\mathbf{X}}, \boldsymbol{\mathbf{W}}\left(\sum_{l = 1}^{\infty}\rho^{l}\boldsymbol{\mathbf{W}}^l\right)\boldsymbol{\mathbf{X}} \boldsymbol{\mathbf{\beta}}\right]. \]
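Before using the canned routine below, the mechanics can be sketched by hand. This is an illustration only, assuming msoa.spdf and queens.lw from the previous session and complete cases; note that the naive second-stage standard errors are not valid, which is one reason to prefer stsls().

y  <- log(msoa.spdf$med_house_price)
X  <- model.matrix(~ log(no2) + log(POPDEN) + per_mixed + per_asian +
                     per_black + per_other, data = msoa.spdf)
Wy <- lag.listw(queens.lw, y)                    # endogenous spatial lag
# Instruments H: first- and second-order spatial lags of the covariates
WX  <- apply(X[, -1], 2, function(x) lag.listw(queens.lw, x))
W2X <- apply(WX, 2, function(x) lag.listw(queens.lw, x))
# First stage: predict Wy from X, WX, W^2 X
Wy_hat <- fitted(lm(Wy ~ X[, -1] + WX + W2X))
# Second stage: replace Wy with its prediction (SEs here are not corrected)
coef(lm(y ~ Wy_hat + X[, -1]))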

In practice, we estimate this using the package spatialreg with the function stsls().

mod_1.sls <- stsls(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                     per_mixed + per_asian + per_black + per_other,  
                   data = msoa.spdf, 
                   listw = queens.lw,
                   robust = TRUE, #  heteroskedasticity robust SEs
                   W2X = TRUE) # Second order neighbours are included as instruments (else only first)
summary(mod_1.sls)

Call:
stsls(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, robust = TRUE, W2X = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.5464924 -0.1238002 -0.0052299  0.0989150  1.0793093 

Coefficients: 
               Estimate HC0 std. Error z value  Pr(>|z|)
Rho          0.71004211     0.04678235 15.1776 < 2.2e-16
(Intercept)  2.73582523     0.50997823  5.3646 8.113e-08
log(no2)     0.37752751     0.04920257  7.6729 1.688e-14
log(POPDEN) -0.05710992     0.01684036 -3.3913 0.0006957
per_mixed    0.01634307     0.00588488  2.7771 0.0054842
per_asian   -0.00205426     0.00045905 -4.4750 7.640e-06
per_black   -0.01166456     0.00128557 -9.0734 < 2.2e-16
per_other   -0.00280423     0.00332302 -0.8439 0.3987377

Residual variance (sigma squared): 0.035213, (sigma: 0.18765)

6.3 Generalized Method of Moments

The Generalized Method of Moments (GMM) provides a way of estimating spatial error / SEM models. A motivation for GMM was that Maximum Likelihood was computationally infeasible for large samples and its consistency had not been formally established. Kelejian and Prucha (1999) thus proposed a generalized moments estimator for the SEM.

We start with the model

\[ \begin{equation} \begin{split} {\boldsymbol{\mathbf{y}}}&=\alpha{\boldsymbol{\mathbf{\iota}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+{\boldsymbol{\mathbf{u}}},\\ {\boldsymbol{\mathbf{u}}}&=\lambda{\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{u}}}+{\boldsymbol{\mathbf{\varepsilon}}} \end{split} \end{equation} \]

The key issue here is to find a consistent estimator for \(\lambda\). However, we usually do not want to draw inference about \(\lambda\) itself; we only need it to consistently estimate \(\boldsymbol{\mathbf{\beta}}\). Kelejian and Prucha (1999) thus treat \(\lambda\) as a pure nuisance parameter.

In essence, the GMM works as follows (Sarrias 2023):

  1. First, obtain a consistent estimate of \(\boldsymbol{\mathbf{\beta}}\), say \(\widetilde{\boldsymbol{\mathbf{\beta}}}\), using either OLS or non-linear least squares (NLS).

  2. Use this estimate to obtain an estimate of \(\boldsymbol{\mathbf{u}}\), say \(\widehat{\boldsymbol{\mathbf{u}}}\).

  3. Use \(\widehat{\boldsymbol{\mathbf{u}}}\) to estimate \(\lambda\), say \(\widehat{\lambda}\), using

\[ (\widehat{\lambda}_{NLS, n}, \widehat{\sigma}^2_{NLS, n}) = \mathrm{argmin} \left\lbrace \boldsymbol{\mathbf{\upsilon}}_n(\lambda, \sigma^2)^\top\boldsymbol{\mathbf{\upsilon}}_n(\lambda, \sigma^2): \lambda \in [-a, a], \sigma^2\in [0, b]\right\rbrace, \]

  4. Estimate \(\boldsymbol{\mathbf{\beta}}\) by feasible GLS using

\[ \begin{split} \widehat{\boldsymbol{\mathbf{\beta}}}_{FGLS}(\widehat{\lambda}) &=\left[\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{\Omega}}(\widehat{\lambda})^{-1}\boldsymbol{\mathbf{X}}\right]^{-1}\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{\Omega}}(\widehat{\lambda})^{-1}\boldsymbol{\mathbf{y}},\\ \boldsymbol{\mathbf{\Omega}}(\lambda) &= (\boldsymbol{\mathbf{I}} - \lambda\boldsymbol{\mathbf{W}})^{-1}(\boldsymbol{\mathbf{I}} - \lambda\boldsymbol{\mathbf{W}}^\top)^{-1}. \end{split} \]

For more detail, see for instance Kelejian and Piras (2017), chapter 2.2.4, or Sarrias (2023).
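Before turning to the canned routine, steps 1 and 2 can be sketched directly (assuming msoa.spdf and queens.lw from the previous session):

ols <- lm(log(med_house_price) ~ log(no2) + log(POPDEN) +
            per_mixed + per_asian + per_black + per_other,
          data = msoa.spdf)
u_hat  <- residuals(ols)                 # step 2: consistent estimate of u
Wu_hat <- lag.listw(queens.lw, u_hat)    # spatial lag of the residuals
cor(u_hat, Wu_hat)                       # strong correlation suggests lambda != 0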

We can calculate the estimator using GMerrorsar() from spatialreg.

mod_1.gmm <- GMerrorsar(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                     per_mixed + per_asian + per_black + per_other,  
                   data = msoa.spdf, 
                   listw = queens.lw,
                   se.lambda = TRUE) # Provide standard error for lambda
summary(mod_1.gmm)

Call:
GMerrorsar(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, se.lambda = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.8275979 -0.1840855 -0.0096616  0.1610019  1.2270026 

Type: GM SAR estimator
Coefficients: (GM standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) 11.07612114  0.26596129  41.6456 < 2.2e-16
log(no2)     0.67758095  0.08620995   7.8597 3.775e-15
log(POPDEN) -0.08006377  0.01464953  -5.4653 4.622e-08
per_mixed   -0.01307831  0.00894766  -1.4616    0.1438
per_asian   -0.00521983  0.00090937  -5.7400 9.465e-09
per_black   -0.01957288  0.00134527 -14.5494 < 2.2e-16
per_other   -0.00521695  0.00489760  -1.0652    0.2868

Lambda: 0.69344 (standard error): 0.071248 (z-value): 9.7328
Residual variance (sigma squared): 0.037126, (sigma: 0.19268)
GM argmin sigma squared: 0.037239
Number of observations: 983 
Number of parameters estimated: 9 

6.4 Maximum likelihood estimation

6.4.1 ML SAR

Maximum likelihood (ML) estimation is the most common way of estimating spatial models. The procedure for estimating SAR models via ML is based on Ord (1975) and Anselin (1988).

Starting with

\[ \begin{split} \boldsymbol{\mathbf{y}} = \rho \boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} + \boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta }}+ \varepsilon, \\ \varepsilon \sim \mathcal{N}(\boldsymbol{\mathbf{0}}_n , \sigma^2\boldsymbol{\mathbf{I}}_n), \end{split} \]

and its reduced form

\[ \begin{split} {\boldsymbol{\mathbf{y}}} =({\boldsymbol{\mathbf{I}}_N}-\rho {\boldsymbol{\mathbf{W}}})^{-1}({\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+ {\boldsymbol{\mathbf{\varepsilon}}}), \\ {\boldsymbol{\mathbf{y}}} =\boldsymbol{\mathbf{A}}^{-1}({\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+ {\boldsymbol{\mathbf{\varepsilon}}}), \end{split} \]

where \(\boldsymbol{\mathbf{A}} = ({\boldsymbol{\mathbf{I}}_N}-\rho {\boldsymbol{\mathbf{W}}})\).

The ML estimator then chooses the parameters \(\hat\rho\), \(\hat{\boldsymbol{\mathbf{\beta}}}\), and \(\hat\sigma\) that maximize the probability of observing the sample, based on the log-likelihood function

\[ \begin{split} \ell (\boldsymbol{\mathbf{\theta}}) &= \log\left| \boldsymbol{\mathbf{A}}\right| - \frac{n\log(2\pi)}{2} - \frac{n\log(\sigma^2)}{2} - \frac{1}{2\sigma^2}(\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}}-\boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta}})^\top (\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}}-\boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta}}) \\ &= \log\left| \boldsymbol{\mathbf{A}}\right| - \frac{n\log(2\pi)}{2} - \frac{n\log(\sigma^2)}{2} - \frac{1}{2\sigma^2}\left[\boldsymbol{\mathbf{y}}^\top \boldsymbol{\mathbf{A}}^\top\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}} - 2\left(\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}}\right)^\top\boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta }}+ \boldsymbol{\mathbf{\beta}}^\top\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta}}\right]. \end{split} \]

ML estimation of the SAR model works as follows (Sarrias 2023):

  1. Perform the two auxiliary regressions of \(\boldsymbol{\mathbf{y}}\) and \(\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}}\) on \(\boldsymbol{\mathbf{X}}\) to obtain the estimators \(\widehat{\boldsymbol{\mathbf{\beta}}}_O\) and \(\widehat{\boldsymbol{\mathbf{\beta}}}_L\): \[ \begin{split} \widehat{\boldsymbol{\mathbf{\beta}}}_{ML}(\rho) &= \left(\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{X}}\right)^{-1}\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{y}} - \rho\left(\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{X}}\right)^{-1}\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}}, \\ &= \widehat{\boldsymbol{\mathbf{\beta}}}_O -\rho \widehat{\boldsymbol{\mathbf{\beta}}}_L. \end{split} \]

  2. Use \(\widehat{\boldsymbol{\mathbf{\beta}}}_O\) and \(\widehat{\boldsymbol{\mathbf{\beta}}}_L\) to compute the residuals \[ \varepsilon_O = \boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\widehat{\boldsymbol{\mathbf{\beta}}}_O\,\,\mbox{and} \;\; \varepsilon_L = \boldsymbol{\mathbf{W}}\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\widehat{\boldsymbol{\mathbf{\beta}}}_L. \]

  3. Obtain an estimate of \(\rho\) by numerical optimization, maximizing the concentrated likelihood

\[ \ell(\rho)=-\frac{n}{2}-\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\left[\frac{\left(\varepsilon_O - \rho\varepsilon_L\right)^\top\left(\varepsilon_O - \rho\varepsilon_L\right)}{n}\right] + \log\left|\boldsymbol{\mathbf{I}}_n - \rho\boldsymbol{\mathbf{W}}\right|, \]

  4. Plug the estimate \(\widehat{\rho}\) back into the expressions for \(\boldsymbol{\mathbf{\beta}}\) and \(\sigma^2\):

\[ \begin{split} \widehat{\boldsymbol{\mathbf{\beta}}}_{ML}(\widehat{\rho}) &= \left(\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{X}}\right)^{-1}\boldsymbol{\mathbf{X}}^\top\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}}, \\ \widehat{\sigma}^2_{ML}(\widehat{\rho}) &= \frac{\left(\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\widehat{\boldsymbol{\mathbf{\beta}}}_{ML}\right)^\top\left(\boldsymbol{\mathbf{A}}\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\widehat{\boldsymbol{\mathbf{\beta}}}_{ML}\right)}{n}. \end{split} \]
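The concentrated likelihood can also be coded up directly. A minimal sketch, assuming msoa.spdf and queens.lw from the previous session, dropping the additive constants (they do not affect the maximizer) and ignoring the exact admissible interval for \(\rho\); lagsarlm() below does all of this properly:

y  <- log(msoa.spdf$med_house_price)
X  <- model.matrix(~ log(no2) + log(POPDEN) + per_mixed + per_asian +
                     per_black + per_other, data = msoa.spdf)
Wy  <- lag.listw(queens.lw, y)
e_O <- residuals(lm(y ~ X - 1))      # step 1/2: residuals of y on X
e_L <- residuals(lm(Wy ~ X - 1))     # step 1/2: residuals of Wy on X
W   <- as(queens.lw, "CsparseMatrix")
n   <- length(y)
conc_ll <- function(rho) {           # concentrated log-likelihood, constants dropped
  e <- e_O - rho * e_L
  as.numeric(determinant(Matrix::Diagonal(n) - rho * W)$modulus) -
    (n / 2) * log(sum(e^2) / n)
}
optimize(conc_ll, interval = c(-0.99, 0.99), maximum = TRUE)$maximum

The maximizer should be close to the Rho reported by lagsarlm() below.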

6.4.2 ML SEM

We can also use ML to estimate the spatial error / SEM model of the form

\[ \begin{equation} \begin{split} {\boldsymbol{\mathbf{y}}}&=\alpha{\boldsymbol{\mathbf{\iota}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+{\boldsymbol{\mathbf{u}}},\\ {\boldsymbol{\mathbf{u}}}&=\lambda{\boldsymbol{\mathbf{W}}}{\boldsymbol{\mathbf{u}}}+{\boldsymbol{\mathbf{\varepsilon}}},\\ \varepsilon &\sim \mathcal{N}(\boldsymbol{\mathbf{0}}_n , \sigma^2\boldsymbol{\mathbf{I}}_n). \end{split} \end{equation} \] Its reduced form is given by

\[ \begin{equation} \begin{split} {\boldsymbol{\mathbf{y}}}&=\alpha{\boldsymbol{\mathbf{\iota}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+({\boldsymbol{\mathbf{I}}_N}-\lambda {\boldsymbol{\mathbf{W}}})^{-1}{\boldsymbol{\mathbf{\varepsilon}}},\\ {\boldsymbol{\mathbf{y}}}&=\alpha{\boldsymbol{\mathbf{\iota}}}+{\boldsymbol{\mathbf{X}}}{\boldsymbol{\mathbf{\beta}}}+\boldsymbol{\mathbf{B}}^{-1}{\boldsymbol{\mathbf{\varepsilon}}}, \end{split} \end{equation} \]

where \(\boldsymbol{\mathbf{B}} = ({\boldsymbol{\mathbf{I}}_N}-\lambda {\boldsymbol{\mathbf{W}}})\).

Note that the OLS estimates of the SEM model are unbiased, provided there is no omitted variable bias. However, even in that case, they are inefficient if \(\lambda \neq 0\).

The log-likelihood function is given by

\[ \begin{split} \ell(\boldsymbol{\mathbf{\theta}}) = - \frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2)-\frac{(\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta}})^\top \boldsymbol{\mathbf{\Omega}}(\lambda) (\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{X}}\boldsymbol{\mathbf{\beta}})}{2\sigma^2} + \log\left|\boldsymbol{\mathbf{I}}_n - \lambda \boldsymbol{\mathbf{W}}\right|, \\ \boldsymbol{\mathbf{\Omega}}(\lambda) = \boldsymbol{\mathbf{B}}^\top \boldsymbol{\mathbf{B}} = \left(\boldsymbol{\mathbf{I}}_n-\lambda\boldsymbol{\mathbf{W}}\right)^\top \left(\boldsymbol{\mathbf{I}}_n-\lambda\boldsymbol{\mathbf{W}}\right) \end{split} \]

Based on Anselin and Bera (1998), ML estimation of the SEM follows this procedure (Sarrias 2023):

  1. Carry out an OLS regression of \(\boldsymbol{\mathbf{B}}\boldsymbol{\mathbf{y}}\) on \(\boldsymbol{\mathbf{B}}\boldsymbol{\mathbf{X}}\); get \(\widehat{\boldsymbol{\mathbf{\beta}}}_{OLS}\)

  2. Compute the initial set of residuals \(\widehat{\boldsymbol{\mathbf{\varepsilon}}}_{OLS} = \boldsymbol{\mathbf{B}}\boldsymbol{\mathbf{y}} - \boldsymbol{\mathbf{B}}\boldsymbol{\mathbf{X}}\widehat{\boldsymbol{\mathbf{\beta}}}_{OLS}\)

  3. Given \(\widehat{\boldsymbol{\mathbf{\varepsilon}}}_{OLS}\), find \(\widehat{\lambda}\) that maximizes the concentrated likelihood

\[ \ell(\lambda)= \mbox{const} - \frac{n}{2}\log\left[\frac{1}{n}\widehat{\boldsymbol{\mathbf{\varepsilon}}}^\top\boldsymbol{\mathbf{B}}^\top\boldsymbol{\mathbf{B}} \widehat{\boldsymbol{\mathbf{\varepsilon}}}\right] + \log\left|\boldsymbol{\mathbf{B}}\right|. \]

  4. If the convergence criterion is met, proceed; otherwise repeat steps 1, 2 and 3.

  5. Given \(\widehat{\lambda}\), estimate \(\widehat{\boldsymbol{\mathbf{\beta}}}(\lambda)\) by GLS and obtain a new vector of residuals, \(\widehat{\boldsymbol{\mathbf{\varepsilon}}}(\lambda)\)

  6. Given \(\widehat{\boldsymbol{\mathbf{\varepsilon}}}(\lambda)\) and \(\widehat{\lambda}\), estimate \(\widehat{\sigma}(\lambda)\).
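The GLS step (step 5) is just an OLS regression on spatially filtered variables. A minimal sketch, assuming msoa.spdf and queens.lw; lambda_hat is a hypothetical plug-in value here (close to the GM estimate above), not the result of the iteration:

lambda_hat <- 0.69                        # hypothetical plug-in, cf. mod_1.gmm
y  <- log(msoa.spdf$med_house_price)
X  <- model.matrix(~ log(no2) + log(POPDEN) + per_mixed + per_asian +
                     per_black + per_other, data = msoa.spdf)
By <- y - lambda_hat * lag.listw(queens.lw, y)   # B y = (I - lambda W) y
BX <- X - lambda_hat * apply(X, 2, function(x) lag.listw(queens.lw, x))
coef(lm(By ~ BX - 1))                            # beta_GLS(lambda)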

The package spatialreg (Pebesma and Bivand 2023) provides a series of functions to compute the ML estimator for all spatial models we have considered.

Table from Pebesma and Bivand (2023):

model   model name                        maximum likelihood estimation function
SEM     spatial error                     errorsarlm(..., Durbin=FALSE)
SEM     spatial error                     spautolm(..., family="SAR")
SDEM    spatial Durbin error              errorsarlm(..., Durbin=TRUE)
SLM     spatial lag                       lagsarlm(..., Durbin=FALSE)
SDM     spatial Durbin                    lagsarlm(..., Durbin=TRUE)
SAC     spatial autoregressive combined   sacsarlm(..., Durbin=FALSE)
GNM     general nested                    sacsarlm(..., Durbin=TRUE)

ML SAR

mod_1.sar <- lagsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) + 
                        per_mixed + per_asian + per_black + per_other,  
                      data = msoa.spdf, 
                      listw = queens.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)

Call:
lagsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.5281789 -0.1220524 -0.0099245  0.0992203  1.0936745 

Type: lag 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  3.17383180  0.29041604  10.9286 < 2.2e-16
log(no2)     0.39705423  0.04452880   8.9168 < 2.2e-16
log(POPDEN) -0.05583014  0.01242876  -4.4920 7.055e-06
per_mixed    0.01851577  0.00579832   3.1933  0.001407
per_asian   -0.00228346  0.00045876  -4.9775 6.442e-07
per_black   -0.01263650  0.00100282 -12.6009 < 2.2e-16
per_other   -0.00161419  0.00289082  -0.5584  0.576582

Rho: 0.66976, LR test value: 473.23, p-value: < 2.22e-16
Asymptotic standard error: 0.025311
    z-value: 26.461, p-value: < 2.22e-16
Wald statistic: 700.19, p-value: < 2.22e-16

Log likelihood: 196.7203 for lag model
ML residual variance (sigma squared): 0.035402, (sigma: 0.18815)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: -375.44, (AIC for lm: 95.786)
LM test for residual autocorrelation
test value: 8.609, p-value: 0.0033451

ML SEM

mod_1.sem <- errorsarlm(log(med_house_price) ~ log(no2) + log(POPDEN) +
                          per_mixed + per_asian + per_black + per_other,  
                        data = msoa.spdf, 
                        listw = queens.lw,
                        Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)

Call:
errorsarlm(formula = log(med_house_price) ~ log(no2) + log(POPDEN) + 
    per_mixed + per_asian + per_black + per_other, data = msoa.spdf, 
    listw = queens.lw, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.581785 -0.105218 -0.012758  0.094430  0.913425 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error  z value  Pr(>|z|)
(Intercept) 12.92801104  0.35239139  36.6865 < 2.2e-16
log(no2)     0.15735296  0.10880727   1.4462 0.1481317
log(POPDEN) -0.08316270  0.01254315  -6.6301 3.354e-11
per_mixed   -0.03377962  0.00811054  -4.1649 3.115e-05
per_asian   -0.00413115  0.00096849  -4.2656 1.994e-05
per_black   -0.01653816  0.00126741 -13.0488 < 2.2e-16
per_other   -0.01693012  0.00462999  -3.6566 0.0002556

Lambda: 0.88605, LR test value: 623.55, p-value: < 2.22e-16
Asymptotic standard error: 0.015803
    z-value: 56.068, p-value: < 2.22e-16
Wald statistic: 3143.6, p-value: < 2.22e-16

Log likelihood: 271.8839 for error model
ML residual variance (sigma squared): 0.026911, (sigma: 0.16405)
Number of observations: 983 
Number of parameters estimated: 9 
AIC: NA (not available for weighted model), (AIC for lm: 95.786)