8 Exercise I

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-3        spatialreg_1.2-9 
[4] Matrix_1.5-4.1    spdep_1.2-8       spData_2.2.2     
[7] mapview_2.11.0    sf_1.0-13        

loaded via a namespace (and not attached):
 [1] xfun_0.39          raster_3.6-20      htmlwidgets_1.6.2 
 [4] lattice_0.21-8     vctrs_0.6.3        tools_4.3.1       
 [7] crosstalk_1.2.0    LearnBayes_2.15.1  generics_0.1.3    
[10] parallel_4.3.1     sandwich_3.0-2     stats4_4.3.1      
[13] tibble_3.2.1       proxy_0.4-27       fansi_1.0.4       
[16] pkgconfig_2.0.3    KernSmooth_2.23-21 satellite_1.0.4   
[19] RColorBrewer_1.1-3 leaflet_2.1.2      webshot_0.5.4     
[22] lifecycle_1.0.3    compiler_4.3.1     deldir_1.0-9      
[25] munsell_0.5.0      terra_1.7-29       leafsync_0.1.0    
[28] codetools_0.2-19   stars_0.6-1        htmltools_0.5.5   
[31] class_7.3-22       pillar_1.9.0       MASS_7.3-60       
[34] classInt_0.4-9     lwgeom_0.2-13      wk_0.7.3          
[37] abind_1.4-5        boot_1.3-28.1      multcomp_1.4-24   
[40] nlme_3.1-162       tidyselect_1.2.0   digest_0.6.31     
[43] mvtnorm_1.2-2      dplyr_1.1.2        splines_4.3.1     
[46] fastmap_1.1.1      grid_4.3.1         colorspace_2.1-0  
[49] expm_0.999-7       cli_3.6.1          magrittr_2.0.3    
[52] base64enc_0.1-3    dichromat_2.0-0.1  XML_3.99-0.14     
[55] survival_3.5-5     utf8_1.2.3         TH.data_1.1-2     
[58] leafem_0.2.0       e1071_1.7-13       scales_1.2.1      
[61] sp_1.6-1           rmarkdown_2.22     zoo_1.8-12        
[64] png_0.1-8          coda_0.19-4        evaluate_0.21     
[67] knitr_1.43         tmaptools_3.1-1    s2_1.1.4          
[70] rlang_1.1.1        Rcpp_1.0.10        glue_1.6.2        
[73] DBI_1.1.3          rstudioapi_0.14    jsonlite_1.8.5    
[76] R6_2.5.1           units_0.8-2

Reload data from pervious session

load("_data/msoa2_spatial.RData")

8.1 Environmental inequality

How would you investigate the following descriptive research question: Are ethnic (and immigrant) minorities in London exposed to higher levels of pollution? Also consider the spatial structure. What’s your dependent and whats your independent variable?

1) Define a neigbours weights object of your choice

Assume a typical neighbourhood would be 1.5km in diameter

coords <- st_centroid(msoa.spdf)

Warning: st_centroid assumes attributes are constant over
geometries

# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15266 
Percentage nonzero weights: 1.579859 
Average number of links: 15.53001 
4 regions with no links:
158 463 478 505
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 4  5  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 
22 23 24 25 26 27 28 29 30 31 32 33 34 
25 19 38 29 32 38 26 16 20 10  8  1  2 
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# There are some mpty one. Lets impute with the nearest neighbour
k2.nb <- knearneigh(coords, k = 1)

# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15270 
Percentage nonzero weights: 1.580273 
Average number of links: 15.53408 
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 9  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25 
23 24 25 26 27 28 29 30 31 32 33 34 
19 38 29 32 38 26 16 20 10  8  1  2 
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")

2) Estimate the extent of spatial auto-correlation

moran.test(msoa.spdf$no2, listw = dist_15.lw)


    Moran I test under randomisation

data:  msoa.spdf$no2  
weights: dist_15.lw    

Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.891520698      -0.001018330       0.000187411

3) Estimate a spatial SAR regression model

Estimate a spatial autoregressive SAR model

mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)


Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.2140485 -0.0267085 -0.0021421  0.0238337  0.3505513 

Type: lag 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  -1.7004e-02  1.8122e-02 -0.9383  0.348110
per_mixed     3.4376e-04  1.4758e-03  0.2329  0.815810
per_asian    -8.5205e-05  1.1494e-04 -0.7413  0.458507
per_black    -4.2754e-04  2.3468e-04 -1.8218  0.068484
per_other     1.9693e-03  7.4939e-04  2.6279  0.008591
per_nonUK_EU  8.9027e-04  3.9638e-04  2.2460  0.024703
per_nonEU     1.8460e-03  3.5159e-04  5.2506 1.516e-07
log(POPDEN)   1.8650e-02  2.7852e-03  6.6963 2.138e-11

Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
    z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16

Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16

Have a look into the true multiplier matrix \(({\boldsymbol{\mathbf{I}}_N}-\rho {\boldsymbol{\mathbf{W}}})^{-1}\beta_k\)

W <- listw2mat(dist_15.lw)
I <- diag(dim(W)[1])

rho <- unname(mod_1.sar$rho)

M <- solve(I - rho*W)

M[1:10, 1:10]

                1           2           3           4           5
 [1,] 1.164650997 0.002433319 0.004089559 0.004034508 0.006545994
 [2,] 0.010706605 1.407336301 0.643881932 0.370049927 0.464794934
 [3,] 0.011246286 0.402426207 1.474021599 0.429011868 0.641526285
 [4,] 0.008875918 0.185024963 0.343209495 1.684533322 0.614086824
 [5,] 0.012000989 0.193664556 0.427684190 0.511739020 1.560840834
 [6,] 0.010741524 0.192552594 0.452940016 0.631452476 0.672787841
 [7,] 0.012779708 0.141953871 0.299247377 0.418234186 0.616895800
 [8,] 0.014769006 0.125781189 0.253122442 0.295553039 0.500919513
 [9,] 0.011708131 0.147549264 0.309080773 0.568442619 0.629156269
[10,] 0.009937859 0.152900148 0.306652041 0.727001926 0.553973310
                6           7          8           9          10
 [1,] 0.004882511 0.005808958 0.00872714 0.005854065 0.003613767
 [2,] 0.385105188 0.283907742 0.32703109 0.324608380 0.244640236
 [3,] 0.566175019 0.374059222 0.41132397 0.424986063 0.306652041
 [4,] 0.631452476 0.418234186 0.38421895 0.625286881 0.581601541
 [5,] 0.560656534 0.514079833 0.54266281 0.576726579 0.369315540
 [6,] 1.571175245 0.558170218 0.46513922 0.661184961 0.543820047
 [7,] 0.558170218 1.475511568 0.58520461 0.614170880 0.463886540
 [8,] 0.357799398 0.450157392 1.46638195 0.474994894 0.272339890
 [9,] 0.601077237 0.558337164 0.56135760 1.581077095 0.517983092
[10,] 0.679775059 0.579858174 0.44255232 0.712226751 1.560083138

Create an \(N \times N\) effects matrix. What is the effect of unit 6 on unit 10?

# For beta 1

beta <- mod_1.sar$coefficients

effM <- beta[2] * M

effM[1:10, 1:10]

                 1            2            3            4
 [1,] 4.003610e-04 8.364789e-07 1.405829e-06 1.386904e-06
 [2,] 3.680507e-06 4.837866e-04 2.213411e-04 1.272085e-04
 [3,] 3.866028e-06 1.383382e-04 5.067103e-04 1.474773e-04
 [4,] 3.051190e-06 6.360427e-05 1.179819e-04 5.790759e-04
 [5,] 4.125465e-06 6.657422e-05 1.470209e-04 1.759156e-04
 [6,] 3.692511e-06 6.619197e-05 1.557029e-04 2.170684e-04
 [7,] 4.393158e-06 4.879813e-05 1.028694e-04 1.437724e-04
 [8,] 5.077000e-06 4.323860e-05 8.701349e-05 1.015994e-04
 [9,] 4.024792e-06 5.072160e-05 1.062497e-04 1.954081e-04
[10,] 3.416243e-06 5.256102e-05 1.054148e-04 2.499145e-04
                 5            6            7            8
 [1,] 2.250254e-06 1.678414e-06 1.996890e-06 3.000045e-06
 [2,] 1.597781e-04 1.323839e-04 9.759625e-05 1.124204e-04
 [3,] 2.205314e-04 1.946286e-04 1.285868e-04 1.413969e-04
 [4,] 2.110988e-04 2.170684e-04 1.437724e-04 1.320793e-04
 [5,] 5.365554e-04 1.927315e-04 1.767203e-04 1.865460e-04
 [6,] 2.312779e-04 5.401079e-04 1.918768e-04 1.598965e-04
 [7,] 2.120644e-04 1.918768e-04 5.072225e-04 2.011702e-04
 [8,] 1.721963e-04 1.229973e-04 1.547463e-04 5.040841e-04
 [9,] 2.162790e-04 2.066266e-04 1.919342e-04 1.929725e-04
[10,] 1.904341e-04 2.336798e-04 1.993323e-04 1.521320e-04
                 9           10
 [1,] 2.012396e-06 1.242270e-06
 [2,] 1.115875e-04 8.409764e-05
 [3,] 1.460934e-04 1.054148e-04
 [4,] 2.149489e-04 1.999316e-04
 [5,] 1.982558e-04 1.269561e-04
 [6,] 2.272892e-04 1.869438e-04
 [7,] 2.111277e-04 1.594658e-04
 [8,] 1.632845e-04 9.361968e-05
 [9,] 5.435118e-04 1.780621e-04
[10,] 2.448354e-04 5.362949e-04

# "Effect" of unit 6 on unit 10
effM[10, 6]

           6 
0.0002336798

Calculate and interpret the summary impact measures.

mod_1.sar.imp <- impacts(mod_1.sar, listw = dist_15.lw, R = 300)
summary(mod_1.sar.imp)

Impact measures (lag, exact):
                    Direct     Indirect        Total
per_mixed     0.0004939013  0.010385844  0.010879745
per_asian    -0.0001224192 -0.002574253 -0.002696672
per_black    -0.0006142789 -0.012917166 -0.013531445
per_other     0.0028294759  0.059498722  0.062328198
per_nonUK_EU  0.0012791011  0.026897166  0.028176267
per_nonEU     0.0026523198  0.055773451  0.058425770
log(POPDEN)   0.0267960076  0.563471199  0.590267206
========================================================
Simulation results ( variance matrix):
Direct:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                   Mean        SD  Naive SE Time-series SE
per_mixed     0.0006040 0.0020920 1.208e-04      1.208e-04
per_asian    -0.0001103 0.0001560 9.007e-06      1.035e-05
per_black    -0.0006271 0.0003298 1.904e-05      1.904e-05
per_other     0.0027658 0.0010967 6.332e-05      6.332e-05
per_nonUK_EU  0.0012667 0.0005646 3.260e-05      3.260e-05
per_nonEU     0.0026330 0.0004942 2.853e-05      3.103e-05
log(POPDEN)   0.0268165 0.0037907 2.189e-04      2.653e-04

2. Quantiles for each variable:

                   2.5%        25%        50%        75%     97.5%
per_mixed    -3.555e-03 -0.0006939  0.0006818  2.010e-03 4.614e-03
per_asian    -4.052e-04 -0.0002075 -0.0001140 -1.541e-05 1.825e-04
per_black    -1.223e-03 -0.0008542 -0.0006202 -4.248e-04 4.855e-07
per_other     5.201e-04  0.0020690  0.0027611  3.533e-03 4.875e-03
per_nonUK_EU  6.608e-05  0.0008741  0.0012604  1.684e-03 2.267e-03
per_nonEU     1.644e-03  0.0023112  0.0026410  2.932e-03 3.589e-03
log(POPDEN)   2.006e-02  0.0242190  0.0268230  2.966e-02 3.408e-02

========================================================
Indirect:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.012078 0.045516 0.0026279      0.0026279
per_asian    -0.002418 0.003570 0.0002061      0.0002369
per_black    -0.013473 0.007693 0.0004441      0.0004602
per_other     0.059114 0.025766 0.0014876      0.0014876
per_nonUK_EU  0.027127 0.013340 0.0007702      0.0007702
per_nonEU     0.056823 0.016381 0.0009458      0.0010557
log(POPDEN)   0.573386 0.120109 0.0069345      0.0080970

2. Quantiles for each variable:

                  2.5%       25%       50%        75%     97.5%
per_mixed    -0.075534 -0.014592  0.013192  0.0405609 1.008e-01
per_asian    -0.010291 -0.004422 -0.002205 -0.0003546 3.968e-03
per_black    -0.030384 -0.018381 -0.012752 -0.0086602 1.556e-05
per_other     0.012396  0.043711  0.057869  0.0716045 1.180e-01
per_nonUK_EU  0.001391  0.017889  0.027007  0.0347052 5.276e-02
per_nonEU     0.033511  0.046229  0.053537  0.0655623 9.040e-02
log(POPDEN)   0.394650  0.485552  0.557876  0.6467093 8.672e-01

========================================================
Total:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.012682 0.047576 0.0027468      0.0027468
per_asian    -0.002529 0.003722 0.0002149      0.0002471
per_black    -0.014100 0.007999 0.0004618      0.0004778
per_other     0.061880 0.026737 0.0015437      0.0015437
per_nonUK_EU  0.028394 0.013849 0.0007996      0.0007996
per_nonEU     0.059456 0.016742 0.0009666      0.0010819
log(POPDEN)   0.600202 0.121985 0.0070428      0.0082708

2. Quantiles for each variable:

                  2.5%      25%       50%       75%     97.5%
per_mixed    -0.079021 -0.01537  0.013886  0.042492 1.060e-01
per_asian    -0.010702 -0.00465 -0.002321 -0.000369 4.168e-03
per_black    -0.031483 -0.01929 -0.013426 -0.009016 1.604e-05
per_other     0.012907  0.04572  0.060653  0.074784 1.228e-01
per_nonUK_EU  0.001457  0.01881  0.028109  0.036437 5.446e-02
per_nonEU     0.035310  0.04858  0.056231  0.068118 9.358e-02
log(POPDEN)   0.416004  0.51273  0.584256  0.672491 9.010e-01

4) Is SAR the right model choice or would you rather estimate a different model?

How do results change once you specify a spatial Durbin model?

I am using a spatial Durbin error model here.

mod_1.durb <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE)

summary(mod_1.durb)


Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1839285 -0.0254426 -0.0027042  0.0216084  0.2944840 

Type: error 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)       2.64939215  0.24748370 10.7053 < 2.2e-16
per_mixed         0.00553333  0.00223688  2.4737  0.013373
per_asian        -0.00017156  0.00024183 -0.7094  0.478062
per_black        -0.00057947  0.00034426 -1.6832  0.092334
per_other         0.00203392  0.00112534  1.8074  0.070701
per_nonUK_EU      0.00086254  0.00058902  1.4644  0.143091
per_nonEU         0.00135822  0.00053480  2.5397  0.011096
log(POPDEN)       0.02716824  0.00354239  7.6695 1.732e-14
lag.per_mixed     0.00107140  0.00819322  0.1308  0.895960
lag.per_asian    -0.00060616  0.00070080 -0.8649  0.387069
lag.per_black    -0.00191733  0.00130997 -1.4636  0.143291
lag.per_other     0.01014125  0.00496979  2.0406  0.041293
lag.per_nonUK_EU  0.00925620  0.00217624  4.2533 2.106e-05
lag.per_nonEU     0.00563564  0.00185541  3.0374  0.002386
lag.log(POPDEN)  -0.01370891  0.01128957 -1.2143  0.224634

Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
    z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16

Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: NA (not available for weighted model), (AIC for lm: -1584.3)

mod_1.durb.imp <- impacts(mod_1.durb, listw = dist_15.lw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)

Impact measures (SDEM, glht, n):
                    Direct      Indirect         Total
per_mixed     0.0055333298  0.0010713981  0.0066047279
per_asian    -0.0001715584 -0.0006061567 -0.0007777151
per_black    -0.0005794704 -0.0019173261 -0.0024967965
per_other     0.0020339226  0.0101412504  0.0121751729
per_nonUK_EU  0.0008625423  0.0092561971  0.0101187394
per_nonEU     0.0013582217  0.0056356426  0.0069938643
log(POPDEN)   0.0271682392 -0.0137089088  0.0134593305
========================================================
Standard errors:
                   Direct     Indirect        Total
per_mixed    0.0022368774 0.0081932151 0.0089412446
per_asian    0.0002418282 0.0007008041 0.0007540247
per_black    0.0003442649 0.0013099673 0.0013639632
per_other    0.0011253352 0.0049697880 0.0051074709
per_nonUK_EU 0.0005890159 0.0021762395 0.0022231333
per_nonEU    0.0005348024 0.0018554128 0.0020196122
log(POPDEN)  0.0035423870 0.0112895670 0.0132006518
========================================================
Z-values:
                 Direct   Indirect     Total
per_mixed     2.4736849  0.1307665  0.738681
per_asian    -0.7094226 -0.8649446 -1.031419
per_black    -1.6832108 -1.4636442 -1.830545
per_other     1.8073926  2.0405801  2.383797
per_nonUK_EU  1.4643785  4.2532989  4.551567
per_nonEU     2.5396703  3.0374063  3.462974
log(POPDEN)   7.6694724 -1.2142989  1.019596

p-values:
             Direct     Indirect   Total     
per_mixed    0.013373   0.8959600  0.46010070
per_asian    0.478062   0.3870692  0.30234457
per_black    0.092334   0.1432912  0.06716843
per_other    0.070701   0.0412926  0.01713506
per_nonUK_EU 0.143091   2.1064e-05 5.3248e-06
per_nonEU    0.011096   0.0023862  0.00053424
log(POPDEN)  1.7319e-14 0.2246336  0.30792015