8  Exercises II

\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \]

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-4        spatialreg_1.3-4 
[4] Matrix_1.7-0      spdep_1.3-5       spData_2.3.1     
[7] mapview_2.11.2    sf_1.0-16        

loaded via a namespace (and not attached):
 [1] xfun_0.45          raster_3.6-26      htmlwidgets_1.6.4 
 [4] lattice_0.22-6     tools_4.4.1        crosstalk_1.2.1   
 [7] LearnBayes_2.15.1  parallel_4.4.1     stats4_4.4.1      
[10] sandwich_3.1-0     proxy_0.4-27       KernSmooth_2.23-24
[13] satellite_1.0.5    RColorBrewer_1.1-3 leaflet_2.2.2     
[16] lifecycle_1.0.4    compiler_4.4.1     deldir_2.0-4      
[19] munsell_0.5.1      terra_1.7-78       codetools_0.2-20  
[22] leafsync_0.1.0     stars_0.6-5        htmltools_0.5.8.1 
[25] class_7.3-22       MASS_7.3-60.2      classInt_0.4-10   
[28] lwgeom_0.2-14      wk_0.9.1           abind_1.4-5       
[31] boot_1.3-30        multcomp_1.4-25    nlme_3.1-164      
[34] digest_0.6.35      mvtnorm_1.2-5      splines_4.4.1     
[37] fastmap_1.2.0      grid_4.4.1         colorspace_2.1-0  
[40] cli_3.6.2          magrittr_2.0.3     base64enc_0.1-3   
[43] dichromat_2.0-0.1  XML_3.99-0.16.1    survival_3.6-4    
[46] leafem_0.2.3       TH.data_1.1-2      e1071_1.7-14      
[49] scales_1.3.0       sp_2.1-4           rmarkdown_2.27    
[52] zoo_1.8-12         png_0.1-8          coda_0.19-4.1     
[55] evaluate_0.24.0    knitr_1.47         tmaptools_3.1-1   
[58] s2_1.1.6           rlang_1.1.4        Rcpp_1.0.12       
[61] glue_1.7.0         DBI_1.2.3          rstudioapi_0.16.0 
[64] jsonlite_1.8.8     R6_2.5.1           units_0.8-5       

Reload data from pervious session

load("_data/msoa2_spatial.RData")

8.1 Environmental inequality

How would you investigate the following descriptive research question: Are immigrant minorities in London exposed to higher levels of pollution? Also consider the spatial structure. What’s your dependent and what is your independent variable?

1) Define a neigbours weights object of your choice

Assume a typical neighbourhood would be 2.5km in diameter

coords <- st_centroid(msoa.spdf)
Warning: st_centroid assumes attributes are constant over
geometries
# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)

summary(dist_15.nb)
Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15266 
Percentage nonzero weights: 1.579859 
Average number of links: 15.53001 
4 regions with no links:
158 463 478 505
6 disjoint connected subgraphs
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 4  5  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 
22 23 24 25 26 27 28 29 30 31 32 33 34 
25 19 38 29 32 38 26 16 20 10  8  1  2 
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# There are some empty neighbour sets. Lets impute those with the nearest neighbour.
k2.nb <- knearneigh(coords, k = 1)

# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]

summary(dist_15.nb)
Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15270 
Percentage nonzero weights: 1.580273 
Average number of links: 15.53408 
2 disjoint connected subgraphs
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 9  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25 
23 24 25 26 27 28 29 30 31 32 33 34 
19 38 29 32 38 26 16 20 10  8  1  2 
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")

2) Estimate the extent of spatial auto-correlation in air pollution

moran.test(msoa.spdf$no2, listw = dist_15.lw)

    Moran I test under randomisation

data:  msoa.spdf$no2  
weights: dist_15.lw    

Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.891520698      -0.001018330       0.000187411 

3) Estimate a Spatial SAR regression model

mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)

Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.2140485 -0.0267085 -0.0021421  0.0238337  0.3505513 

Type: lag 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  -1.7004e-02  1.8122e-02 -0.9383  0.348110
per_mixed     3.4376e-04  1.4758e-03  0.2329  0.815810
per_asian    -8.5205e-05  1.1494e-04 -0.7413  0.458507
per_black    -4.2754e-04  2.3468e-04 -1.8218  0.068484
per_other     1.9693e-03  7.4939e-04  2.6279  0.008591
per_nonUK_EU  8.9027e-04  3.9638e-04  2.2460  0.024703
per_nonEU     1.8460e-03  3.5159e-04  5.2506 1.516e-07
log(POPDEN)   1.8650e-02  2.7852e-03  6.6963 2.138e-11

Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
    z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16

Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16

4) Estimate a Spatial SEM regression model

mod_1.sem <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)

Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.207751 -0.027321 -0.004065  0.023338  0.353188 

Type: error 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)   2.52996667  0.36112709  7.0058 2.457e-12
per_mixed     0.00368862  0.00222547  1.6575  0.097427
per_asian    -0.00019513  0.00024168 -0.8074  0.419428
per_black    -0.00041368  0.00034777 -1.1895  0.234235
per_other     0.00295571  0.00114515  2.5811  0.009849
per_nonUK_EU  0.00154967  0.00059535  2.6030  0.009243
per_nonEU     0.00097237  0.00053695  1.8109  0.070153
log(POPDEN)   0.02738580  0.00326415  8.3899 < 2.2e-16

Lambda: 0.99598, LR test value: 1964.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0018225
    z-value: 546.48, p-value: < 2.22e-16
Wald statistic: 298640, p-value: < 2.22e-16

Log likelihood: 1543.617 for error model
ML residual variance (sigma squared): 0.0020635, (sigma: 0.045426)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3067.2, (AIC for lm: -1104.3)

5) Estimate a Spatial SLX regression model

mod_1.slx <- lmSLX(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw)
summary(mod_1.slx)

Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
    data = as.data.frame(x), weights = weights)

Coefficients:
                  Estimate     Std. Error   t value    
(Intercept)         1.943e+00    4.115e-02    4.720e+01
per_mixed           4.910e-03    5.282e-03    9.296e-01
per_asian          -1.357e-03    5.779e-04   -2.348e+00
per_black          -1.421e-03    8.321e-04   -1.708e+00
per_other           4.510e-05    2.726e-03    1.655e-02
per_nonUK_EU        1.821e-04    1.436e-03    1.269e-01
per_nonEU           1.605e-03    1.272e-03    1.262e+00
log.POPDEN.         4.202e-02    7.736e-03    5.432e+00
lag.per_mixed      -1.251e-02    7.626e-03   -1.641e+00
lag.per_asian       1.080e-03    6.956e-04    1.552e+00
lag.per_black      -1.395e-03    1.306e-03   -1.069e+00
lag.per_other       1.915e-02    4.229e-03    4.528e+00
lag.per_nonUK_EU    1.038e-02    2.406e-03    4.314e+00
lag.per_nonEU       7.459e-03    1.891e-03    3.945e+00
lag.log.POPDEN.     2.332e-01    1.400e-02    1.666e+01
                  Pr(>|t|)   
(Intercept)        2.535e-253
per_mixed           3.528e-01
per_asian           1.907e-02
per_black           8.795e-02
per_other           9.868e-01
per_nonUK_EU        8.991e-01
per_nonEU           2.073e-01
log.POPDEN.         7.064e-08
lag.per_mixed       1.011e-01
lag.per_asian       1.210e-01
lag.per_black       2.856e-01
lag.per_other       6.707e-06
lag.per_nonUK_EU    1.767e-05
lag.per_nonEU       8.560e-05
lag.log.POPDEN.     5.786e-55

6) Estimate a Spatial Durbin regression model

mod_1.dub <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dub)

Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1854009 -0.0263818 -0.0020816  0.0229647  0.3321974 

Type: mixed 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)      -0.00409824  0.01983728 -0.2066   0.83633
per_mixed         0.00434535  0.00218712  1.9868   0.04695
per_asian        -0.00028620  0.00023959 -1.1945   0.23227
per_black        -0.00056734  0.00034455 -1.6466   0.09964
per_other         0.00222708  0.00112918  1.9723   0.04857
per_nonUK_EU      0.00085417  0.00059478  1.4361   0.15097
per_nonEU         0.00095220  0.00052681  1.8075   0.07069
log(POPDEN)       0.02649122  0.00320358  8.2693 2.220e-16
lag.per_mixed    -0.00475294  0.00315799 -1.5051   0.13231
lag.per_asian     0.00024092  0.00028983  0.8312   0.40584
lag.per_black     0.00025812  0.00054125  0.4769   0.63344
lag.per_other    -0.00074506  0.00176141 -0.4230   0.67230
lag.per_nonUK_EU  0.00094549  0.00100320  0.9425   0.34595
lag.per_nonEU     0.00130970  0.00078547  1.6674   0.09544
lag.log(POPDEN)  -0.02526415  0.00588517 -4.2928 1.764e-05

Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
    z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16

Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16

7) Estimate a Spatial Durbin Error regression model

mod_1.dube <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dube)

Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1839285 -0.0254426 -0.0027042  0.0216084  0.2944840 

Type: error 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)       2.64939215  0.24748370 10.7053 < 2.2e-16
per_mixed         0.00553333  0.00223688  2.4737  0.013373
per_asian        -0.00017156  0.00024183 -0.7094  0.478062
per_black        -0.00057947  0.00034426 -1.6832  0.092334
per_other         0.00203392  0.00112534  1.8074  0.070701
per_nonUK_EU      0.00086254  0.00058902  1.4644  0.143091
per_nonEU         0.00135822  0.00053480  2.5397  0.011096
log(POPDEN)       0.02716824  0.00354239  7.6695 1.732e-14
lag.per_mixed     0.00107140  0.00819322  0.1308  0.895960
lag.per_asian    -0.00060616  0.00070080 -0.8649  0.387069
lag.per_black    -0.00191733  0.00130997 -1.4636  0.143291
lag.per_other     0.01014125  0.00496979  2.0406  0.041293
lag.per_nonUK_EU  0.00925620  0.00217624  4.2533 2.106e-05
lag.per_nonEU     0.00563564  0.00185541  3.0374  0.002386
lag.log(POPDEN)  -0.01370891  0.01128957 -1.2143  0.224634

Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
    z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16

Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3110.1, (AIC for lm: -1584.3)

8.1.1 8) Sneak preview on tomorrow: Which of the spatial model specifications about would you choose / prefer in a real world example?

8.1.2 9) Please calculate the spatially lagged value of the median house price.

# Use lag.listw to lag variable
splag <- lag.listw(dist_15.lw,
                   var = msoa.spdf$med_house_price)

msoa.spdf$w.med_house_price <- splag

# Compare
summary(msoa.spdf$med_house_price)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 126494  224406  264188  315562  357937 1601000 
summary(msoa.spdf$w.med_house_price)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 160320  233656  285488  314505  352936  794943 

8.1.3 10) Can you use the results of the previous task to run a non-linear SLX model, where you predict if an MSOA is within the ulez zone based on the house prices? Can you make sense of the result?

### Calculate SLX logit
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price,
                 data = msoa.spdf,
                 family = "binomial")
summary(mod_2.log)

Call:
glm(formula = ulez ~ med_house_price + w.med_house_price, family = "binomial", 
    data = msoa.spdf)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -6.029e+00  5.704e-01 -10.570  < 2e-16 ***
med_house_price    1.572e-06  1.179e-06   1.333  0.18248    
w.med_house_price  5.107e-06  1.955e-06   2.612  0.00899 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 254.47  on 982  degrees of freedom
Residual deviance: 221.56  on 980  degrees of freedom
AIC: 227.56

Number of Fisher Scoring iterations: 7
### Distance to city center
# Define centre
centre <- st_as_sf(data.frame(lon = -0.128120855701165, 
                              lat = 51.50725909644806),
                   coords = c("lon", "lat"), 
                   crs = 4326)
# Reproject
centre <- st_transform(centre, crs = st_crs(msoa.spdf))
# Calculate distance
msoa.spdf$dist_centre <- as.numeric(st_distance(msoa.spdf, centre)) / 1000
# hist(msoa.spdf$dist_centre)

### Add distance to city centre
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price + dist_centre,
                 data = msoa.spdf,
                 family = "binomial")
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_2.log)

Call:
glm(formula = ulez ~ med_house_price + w.med_house_price + dist_centre, 
    family = "binomial", data = msoa.spdf)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        1.228e+01  3.042e+00   4.038 5.38e-05 ***
med_house_price    7.024e-06  2.981e-06   2.357   0.0184 *  
w.med_house_price -1.883e-05  6.177e-06  -3.048   0.0023 ** 
dist_centre       -3.170e+00  6.769e-01  -4.684 2.82e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 254.465  on 982  degrees of freedom
Residual deviance:  50.485  on 979  degrees of freedom
AIC: 58.485

Number of Fisher Scoring iterations: 12