9 Exercises II

\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \]

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

sessionInfo()

R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_4.1          spatialreg_1.3-6 
[4] Matrix_1.7-3      spdep_1.3-13      spData_2.3.4     
[7] mapview_2.11.2    sf_1.0-21        

loaded via a namespace (and not attached):
 [1] xfun_0.52               raster_3.6-32          
 [3] htmlwidgets_1.6.4       lattice_0.22-7         
 [5] leaflet.providers_2.0.0 tools_4.5.1            
 [7] crosstalk_1.2.1         LearnBayes_2.15.1      
 [9] parallel_4.5.1          stats4_4.5.1           
[11] sandwich_3.1-1          proxy_0.4-27           
[13] KernSmooth_2.23-26      data.table_1.17.6      
[15] satellite_1.0.5         RColorBrewer_1.1-3     
[17] leaflet_2.2.2           lifecycle_1.0.4        
[19] compiler_4.5.1          farver_2.1.2           
[21] deldir_2.0-4            microbenchmark_1.5.0   
[23] terra_1.8-54            leafsync_0.1.0         
[25] codetools_0.2-20        leaflegend_1.2.1       
[27] stars_0.6-8             htmltools_0.5.8.1      
[29] class_7.3-23            MASS_7.3-65            
[31] classInt_0.4-11         lwgeom_0.2-14          
[33] wk_0.9.4                boot_1.3-31            
[35] abind_1.4-8             multcomp_1.4-28        
[37] nlme_3.1-168            digest_0.6.37          
[39] mvtnorm_1.3-3           splines_4.5.1          
[41] fastmap_1.2.0           grid_4.5.1             
[43] colorspace_2.1-1        cli_3.6.5              
[45] logger_0.4.0            magrittr_2.0.3         
[47] maptiles_0.10.0         base64enc_0.1-3        
[49] XML_3.99-0.18           dichromat_2.0-0.1      
[51] cols4all_0.8            survival_3.8-3         
[53] leafem_0.2.4            TH.data_1.1-3          
[55] e1071_1.7-16            scales_1.4.0           
[57] sp_2.2-0                rmarkdown_2.29         
[59] zoo_1.8-14              png_0.1-8              
[61] coda_0.19-4.1           evaluate_1.0.4         
[63] knitr_1.50              tmaptools_3.2          
[65] s2_1.1.9                rlang_1.1.6            
[67] Rcpp_1.0.14             glue_1.8.0             
[69] DBI_1.2.3               rstudioapi_0.17.1      
[71] jsonlite_2.0.0          R6_2.6.1               
[73] spacesXYZ_1.6-0         units_0.8-7

Reload data from pervious session

load("_data/msoa2_spatial.RData")

9.1 Environmental inequality

We would like to investigate the following descriptive research question: Are immigrant minorities in London exposed to higher levels of pollution? One could for instance do the following: use pollution as the outcome, and test if demographics (e.g. the share of immigrant minorities) predicts the level of pollution. We also want to consider the spatial structure: is it only the focal unit that matters or spatial clusters with high shares of minorities?

1) Define a neigbours weights object of your choice

Let’s assume a typical neighbourhood would be 2.5km in diameter, so we can use distance based neighbours.

coords <- st_centroid(msoa.spdf)

Warning: st_centroid assumes attributes are constant over
geometries

# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)

Warning in dnearneigh(coords, d1 = 0, d2 = 2500): neighbour object
has 6 sub-graphs

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15266 
Percentage nonzero weights: 1.579859 
Average number of links: 15.53001 
4 regions with no links:
158, 463, 478, 505
6 disjoint connected subgraphs
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 4  5  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 
22 23 24 25 26 27 28 29 30 31 32 33 34 
25 19 38 29 32 38 26 16 20 10  8  1  2 
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# There are some empty neighbour sets. Lets impute those with the nearest neighbour.
k2.nb <- knearneigh(coords, k = 1)

# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15270 
Percentage nonzero weights: 1.580273 
Average number of links: 15.53408 
6 disjoint connected subgraphs
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 9  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25 
23 24 25 26 27 28 29 30 31 32 33 34 
19 38 29 32 38 26 16 20 10  8  1  2 
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")

2) Estimate the extent of spatial auto-correlation in air pollution

moran.test(msoa.spdf$no2, listw = dist_15.lw)


    Moran I test under randomisation

data:  msoa.spdf$no2  
weights: dist_15.lw    

Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.891520698      -0.001018330       0.000187411

3) Estimate a Spatial SAR regression model

mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)


Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.2140485 -0.0267085 -0.0021421  0.0238337  0.3505513 

Type: lag 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  -1.7004e-02  1.8122e-02 -0.9383  0.348110
per_mixed     3.4376e-04  1.4758e-03  0.2329  0.815810
per_asian    -8.5205e-05  1.1494e-04 -0.7413  0.458507
per_black    -4.2754e-04  2.3468e-04 -1.8218  0.068484
per_other     1.9693e-03  7.4939e-04  2.6279  0.008591
per_nonUK_EU  8.9027e-04  3.9638e-04  2.2460  0.024703
per_nonEU     1.8460e-03  3.5159e-04  5.2506 1.516e-07
log(POPDEN)   1.8650e-02  2.7852e-03  6.6963 2.138e-11

Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
    z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16

Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16

4) Estimate a Spatial SEM regression model

mod_1.sem <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)


Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.207751 -0.027321 -0.004065  0.023338  0.353188 

Type: error 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)   2.52996667  0.36112709  7.0058 2.457e-12
per_mixed     0.00368862  0.00222547  1.6575  0.097427
per_asian    -0.00019513  0.00024168 -0.8074  0.419428
per_black    -0.00041368  0.00034777 -1.1895  0.234235
per_other     0.00295571  0.00114515  2.5811  0.009849
per_nonUK_EU  0.00154967  0.00059535  2.6030  0.009243
per_nonEU     0.00097237  0.00053695  1.8109  0.070153
log(POPDEN)   0.02738580  0.00326415  8.3899 < 2.2e-16

Lambda: 0.99598, LR test value: 1964.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0018225
    z-value: 546.48, p-value: < 2.22e-16
Wald statistic: 298640, p-value: < 2.22e-16

Log likelihood: 1543.617 for error model
ML residual variance (sigma squared): 0.0020635, (sigma: 0.045426)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3067.2, (AIC for lm: -1104.3)

5) Estimate a Spatial SLX regression model

mod_1.slx <- lmSLX(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw)
summary(mod_1.slx)


Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Coefficients:
                  Estimate     Std. Error   t value    
(Intercept)         1.943e+00    4.115e-02    4.720e+01
per_mixed           4.910e-03    5.282e-03    9.296e-01
per_asian          -1.357e-03    5.779e-04   -2.348e+00
per_black          -1.421e-03    8.321e-04   -1.708e+00
per_other           4.510e-05    2.726e-03    1.655e-02
per_nonUK_EU        1.821e-04    1.436e-03    1.269e-01
per_nonEU           1.605e-03    1.272e-03    1.262e+00
log.POPDEN.         4.202e-02    7.736e-03    5.432e+00
lag.per_mixed      -1.251e-02    7.626e-03   -1.641e+00
lag.per_asian       1.080e-03    6.956e-04    1.552e+00
lag.per_black      -1.395e-03    1.306e-03   -1.069e+00
lag.per_other       1.915e-02    4.229e-03    4.528e+00
lag.per_nonUK_EU    1.038e-02    2.406e-03    4.314e+00
lag.per_nonEU       7.459e-03    1.891e-03    3.945e+00
lag.log.POPDEN.     2.332e-01    1.400e-02    1.666e+01
                  Pr(>|t|)   
(Intercept)        2.535e-253
per_mixed           3.528e-01
per_asian           1.907e-02
per_black           8.795e-02
per_other           9.868e-01
per_nonUK_EU        8.991e-01
per_nonEU           2.073e-01
log.POPDEN.         7.064e-08
lag.per_mixed       1.011e-01
lag.per_asian       1.210e-01
lag.per_black       2.856e-01
lag.per_other       6.707e-06
lag.per_nonUK_EU    1.767e-05
lag.per_nonEU       8.560e-05
lag.log.POPDEN.     5.786e-55

6) Estimate a Spatial Durbin regression model

mod_1.dub <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dub)


Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1854009 -0.0263818 -0.0020816  0.0229647  0.3321974 

Type: mixed 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)      -0.00409824  0.01983728 -0.2066   0.83633
per_mixed         0.00434535  0.00218712  1.9868   0.04695
per_asian        -0.00028620  0.00023959 -1.1945   0.23227
per_black        -0.00056734  0.00034455 -1.6466   0.09964
per_other         0.00222708  0.00112918  1.9723   0.04857
per_nonUK_EU      0.00085417  0.00059478  1.4361   0.15097
per_nonEU         0.00095220  0.00052681  1.8075   0.07069
log(POPDEN)       0.02649122  0.00320358  8.2693 2.220e-16
lag.per_mixed    -0.00475294  0.00315799 -1.5051   0.13231
lag.per_asian     0.00024092  0.00028983  0.8312   0.40584
lag.per_black     0.00025812  0.00054125  0.4769   0.63344
lag.per_other    -0.00074506  0.00176141 -0.4230   0.67230
lag.per_nonUK_EU  0.00094549  0.00100320  0.9425   0.34595
lag.per_nonEU     0.00130970  0.00078547  1.6674   0.09544
lag.log(POPDEN)  -0.02526415  0.00588517 -4.2928 1.764e-05

Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
    z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16

Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16

7) Estimate a Spatial Durbin Error regression model

mod_1.dube <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dube)


Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1839285 -0.0254426 -0.0027042  0.0216084  0.2944840 

Type: error 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)       2.64939215  0.24748370 10.7053 < 2.2e-16
per_mixed         0.00553333  0.00223688  2.4737  0.013373
per_asian        -0.00017156  0.00024183 -0.7094  0.478062
per_black        -0.00057947  0.00034426 -1.6832  0.092334
per_other         0.00203392  0.00112534  1.8074  0.070701
per_nonUK_EU      0.00086254  0.00058902  1.4644  0.143091
per_nonEU         0.00135822  0.00053480  2.5397  0.011096
log(POPDEN)       0.02716824  0.00354239  7.6695 1.732e-14
lag.per_mixed     0.00107140  0.00819322  0.1308  0.895960
lag.per_asian    -0.00060616  0.00070080 -0.8649  0.387069
lag.per_black    -0.00191733  0.00130997 -1.4636  0.143291
lag.per_other     0.01014125  0.00496979  2.0406  0.041293
lag.per_nonUK_EU  0.00925620  0.00217624  4.2533 2.106e-05
lag.per_nonEU     0.00563564  0.00185541  3.0374  0.002386
lag.log(POPDEN)  -0.01370891  0.01128957 -1.2143  0.224634

Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
    z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16

Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3110.1, (AIC for lm: -1584.3)

8) Sneak preview on tomorrow: Which of the spatial model specifications about would you choose / prefer in a real world example?

9) Please calculate the spatially lagged value of the median house price.

# Use lag.listw to lag variable
splag <- lag.listw(dist_15.lw,
                   var = msoa.spdf$med_house_price)

msoa.spdf$w.med_house_price <- splag

# Compare
summary(msoa.spdf$med_house_price)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 126494  224406  264188  315562  357937 1601000

summary(msoa.spdf$w.med_house_price)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 160320  233656  285488  314505  352936  794942

10) Can you use the results of the previous task to run a non-linear SLX model, where you predict if an MSOA is within the ulez zone based on the house prices? Can you make sense of the result?

### Calculate SLX logit
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price,
                 data = msoa.spdf,
                 family = "binomial")
summary(mod_2.log)


Call:
glm(formula = ulez ~ med_house_price + w.med_house_price, family = "binomial", 
    data = msoa.spdf)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -5.920e+00  5.575e-01 -10.619   <2e-16 ***
med_house_price    1.610e-06  1.167e-06   1.379   0.1678    
w.med_house_price  4.902e-06  1.931e-06   2.538   0.0111 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 261.49  on 982  degrees of freedom
Residual deviance: 229.12  on 980  degrees of freedom
AIC: 235.12

Number of Fisher Scoring iterations: 7

### Distance to city center
# Define centre
centre <- st_as_sf(data.frame(lon = -0.128120855701165, 
                              lat = 51.50725909644806),
                   coords = c("lon", "lat"), 
                   crs = 4326)
# Reproject
centre <- st_transform(centre, crs = st_crs(msoa.spdf))
# Calculate distance
msoa.spdf$dist_centre <- as.numeric(st_distance(msoa.spdf, centre)) / 1000
# hist(msoa.spdf$dist_centre)

### Add distance to city centre
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price + dist_centre,
                 data = msoa.spdf,
                 family = "binomial")

Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(mod_2.log)


Call:
glm(formula = ulez ~ med_house_price + w.med_house_price + dist_centre, 
    family = "binomial", data = msoa.spdf)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        1.135e+01  2.707e+00   4.195 2.72e-05 ***
med_house_price    6.766e-06  2.832e-06   2.390  0.01686 *  
w.med_house_price -1.810e-05  5.806e-06  -3.117  0.00182 ** 
dist_centre       -2.831e+00  5.652e-01  -5.009 5.47e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 261.488  on 982  degrees of freedom
Residual deviance:  56.872  on 979  degrees of freedom
AIC: 64.872

Number of Fisher Scoring iterations: 12