11  Exercises III

\[ \newcommand{\tr}{\mathrm{tr}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\plim}{\operatornamewithlimits{plim}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Exp}{\mathrm{E}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\irow}[1]{% \begin{pmatrix}#1\end{pmatrix} } \]

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite", 
          "plm", "lfe", "splm", "SDPDmod")
lapply(pkgs, require, character.only = TRUE)

Session info

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] SDPDmod_0.0.5     splm_1.6-5        lfe_3.0-0        
 [4] plm_2.6-4         viridisLite_0.4.2 tmap_3.3-4       
 [7] spatialreg_1.3-4  Matrix_1.7-0      spdep_1.3-5      
[10] spData_2.3.1      mapview_2.11.2    sf_1.0-16        

loaded via a namespace (and not attached):
 [1] fastmap_1.2.0       leaflet_2.2.2       TH.data_1.1-2      
 [4] dotCall64_1.1-1     fixest_0.12.1       XML_3.99-0.16.1    
 [7] digest_0.6.35       lifecycle_1.0.4     dreamerr_1.4.0     
[10] LearnBayes_2.15.1   survival_3.6-4      terra_1.7-78       
[13] magrittr_2.0.3      compiler_4.4.1      rlang_1.1.4        
[16] tools_4.4.1         collapse_2.0.14     knitr_1.47         
[19] htmlwidgets_1.6.4   sp_2.1-4            classInt_0.4-10    
[22] RColorBrewer_1.1-3  multcomp_1.4-25     abind_1.4-5        
[25] KernSmooth_2.23-24  numDeriv_2016.8-1.1 leafsync_0.1.0     
[28] grid_4.4.1          stats4_4.4.1        xtable_1.8-4       
[31] e1071_1.7-14        leafem_0.2.3        colorspace_2.1-0   
[34] scales_1.3.0        MASS_7.3-60.2       dichromat_2.0-0.1  
[37] cli_3.6.2           mvtnorm_1.2-5       rmarkdown_2.27     
[40] miscTools_0.6-28    generics_0.1.3      RSpectra_0.16-1    
[43] rstudioapi_0.16.0   tmaptools_3.1-1     bdsmatrix_1.3-7    
[46] DBI_1.2.3           proxy_0.4-27        stringr_1.5.1      
[49] splines_4.4.1       stars_0.6-5         parallel_4.4.1     
[52] s2_1.1.6            stringmagic_1.1.2   base64enc_0.1-3    
[55] boot_1.3-30         sandwich_3.1-0      jsonlite_1.8.8     
[58] Formula_1.2-5       crosstalk_1.2.1     units_0.8-5        
[61] spam_2.10-0         glue_1.7.0          lwgeom_0.2-14      
[64] codetools_0.2-20    stringi_1.8.4       deldir_2.0-4       
[67] raster_3.6-26       lmtest_0.9-40       munsell_0.5.1      
[70] htmltools_0.5.8.1   satellite_1.0.5     R6_2.5.1           
[73] wk_0.9.1            maxLik_1.5-2.1      Rdpack_2.6         
[76] evaluate_0.24.0     lattice_0.22-6      rbibutils_2.2.16   
[79] png_0.1-8           class_7.3-22        Rcpp_1.0.12        
[82] coda_0.19-4.1       nlme_3.1-164        xfun_0.45          
[85] zoo_1.8-12         

Reload data from pervious session

load("_data/msoa2_spatial.RData")

11.1 Environmental inequality (continued)

Let’s use the same neighbours weights definition as before:

coords <- st_centroid(msoa.spdf)
Warning: st_centroid assumes attributes are constant over
geometries
# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)

summary(dist_15.nb)
Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15266 
Percentage nonzero weights: 1.579859 
Average number of links: 15.53001 
4 regions with no links:
158 463 478 505
6 disjoint connected subgraphs
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 4  5  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 
22 23 24 25 26 27 28 29 30 31 32 33 34 
25 19 38 29 32 38 26 16 20 10  8  1  2 
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# There are some empty neighbour sets. Lets impute those with the nearest neighbour.
k2.nb <- knearneigh(coords, k = 1)

# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]

summary(dist_15.nb)
Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15270 
Percentage nonzero weights: 1.580273 
Average number of links: 15.53408 
2 disjoint connected subgraphs
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 9  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25 
23 24 25 26 27 28 29 30 31 32 33 34 
19 38 29 32 38 26 16 20 10  8  1  2 
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")

and estiamte the spatial SAR model:

mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)

Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.2140485 -0.0267085 -0.0021421  0.0238337  0.3505513 

Type: lag 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  -1.7004e-02  1.8122e-02 -0.9383  0.348110
per_mixed     3.4376e-04  1.4758e-03  0.2329  0.815810
per_asian    -8.5205e-05  1.1494e-04 -0.7413  0.458507
per_black    -4.2754e-04  2.3468e-04 -1.8218  0.068484
per_other     1.9693e-03  7.4939e-04  2.6279  0.008591
per_nonUK_EU  8.9027e-04  3.9638e-04  2.2460  0.024703
per_nonEU     1.8460e-03  3.5159e-04  5.2506 1.516e-07
log(POPDEN)   1.8650e-02  2.7852e-03  6.6963 2.138e-11

Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
    z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16

Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16

1) Please calculate the true multiplier matrix of this SAR model.

The multiplier matrix is given by \(({\bm I_N}-\rho {\bm W})^{-1}\).

W <- listw2mat(dist_15.lw)
I <- diag(dim(W)[1])

rho <- unname(mod_1.sar$rho)

M <- solve(I - rho*W)

M[1:10, 1:10]
                1           2           3           4           5
 [1,] 1.164650997 0.002433319 0.004089559 0.004034508 0.006545994
 [2,] 0.010706605 1.407336301 0.643881932 0.370049927 0.464794934
 [3,] 0.011246286 0.402426208 1.474021599 0.429011868 0.641526285
 [4,] 0.008875918 0.185024963 0.343209495 1.684533322 0.614086824
 [5,] 0.012000989 0.193664556 0.427684190 0.511739020 1.560840834
 [6,] 0.010741524 0.192552594 0.452940016 0.631452476 0.672787841
 [7,] 0.012779708 0.141953871 0.299247377 0.418234186 0.616895800
 [8,] 0.014769006 0.125781189 0.253122442 0.295553039 0.500919513
 [9,] 0.011708131 0.147549264 0.309080773 0.568442619 0.629156269
[10,] 0.009937859 0.152900148 0.306652041 0.727001926 0.553973310
                6           7          8           9          10
 [1,] 0.004882511 0.005808958 0.00872714 0.005854065 0.003613767
 [2,] 0.385105188 0.283907742 0.32703109 0.324608381 0.244640237
 [3,] 0.566175019 0.374059222 0.41132397 0.424986063 0.306652041
 [4,] 0.631452476 0.418234186 0.38421895 0.625286881 0.581601541
 [5,] 0.560656534 0.514079833 0.54266281 0.576726580 0.369315540
 [6,] 1.571175245 0.558170218 0.46513922 0.661184961 0.543820047
 [7,] 0.558170218 1.475511568 0.58520461 0.614170880 0.463886540
 [8,] 0.357799398 0.450157392 1.46638195 0.474994894 0.272339890
 [9,] 0.601077237 0.558337164 0.56135760 1.581077095 0.517983092
[10,] 0.679775059 0.579858175 0.44255232 0.712226751 1.560083138

2) Create an N x N effects matrix for the effect of the non-EU citizens. What is the effect of unit 6 on unit 10? Why is this larger than the effect of unit 5 on unit 8?

# For beta 1

beta <- mod_1.sar$coefficients

effM <- beta["per_nonEU"] * M

effM[1:10, 1:10]
                 1            2            3            4
 [1,] 2.149995e-03 4.492010e-06 7.549498e-06 7.447872e-06
 [2,] 1.976484e-05 2.598002e-03 1.188633e-03 6.831278e-04
 [3,] 2.076112e-05 7.428958e-04 2.721106e-03 7.919740e-04
 [4,] 1.638532e-05 3.415639e-04 6.335792e-04 3.109720e-03
 [5,] 2.215433e-05 3.575129e-04 7.895231e-04 9.446918e-04
 [6,] 1.982931e-05 3.554602e-04 8.361464e-04 1.165688e-03
 [7,] 2.359188e-05 2.620528e-04 5.524233e-04 7.720780e-04
 [8,] 2.726421e-05 2.321974e-04 4.672747e-04 5.456034e-04
 [9,] 2.161370e-05 2.723822e-04 5.705762e-04 1.049369e-03
[10,] 1.834571e-05 2.822601e-04 5.660926e-04 1.342076e-03
                 5            6            7            8
 [1,] 1.208418e-05 9.013321e-06 1.072358e-05 1.611067e-05
 [2,] 8.580311e-04 7.109204e-04 5.241057e-04 6.037132e-04
 [3,] 1.184285e-03 1.045183e-03 6.905291e-04 7.593214e-04
 [4,] 1.133630e-03 1.165688e-03 7.720780e-04 7.092844e-04
 [5,] 2.881378e-03 1.034996e-03 9.490131e-04 1.001778e-03
 [6,] 1.241995e-03 2.900456e-03 1.030406e-03 8.586666e-04
 [7,] 1.138816e-03 1.030406e-03 2.723857e-03 1.080312e-03
 [8,] 9.247186e-04 6.605128e-04 8.310095e-04 2.707003e-03
 [9,] 1.161449e-03 1.109614e-03 1.030714e-03 1.036290e-03
[10,] 1.022658e-03 1.254893e-03 1.070443e-03 8.169703e-04
                 9           10
 [1,] 1.080685e-05 6.671166e-06
 [2,] 5.992408e-04 4.516162e-04
 [3,] 7.845422e-04 5.660926e-04
 [4,] 1.154306e-03 1.073661e-03
 [5,] 1.064662e-03 6.817721e-04
 [6,] 1.220575e-03 1.003915e-03
 [7,] 1.133785e-03 8.563541e-04
 [8,] 8.768606e-04 5.027509e-04
 [9,] 2.918735e-03 9.562187e-04
[10,] 1.314801e-03 2.879979e-03
# "Effect" of unit 6 on unit 10
effM[10, 6]
          6 
0.001254893 
# "Effect" of unit 5 on unit 8
effM[8, 5]
           5 
0.0009247186 

3) Calculate and interpret the summary impact measures of the SAR model.

mod_1.sar.imp <- impacts(mod_1.sar, listw = dist_15.lw, R = 300)
summary(mod_1.sar.imp)
Impact measures (lag, exact):
                    Direct     Indirect        Total
per_mixed     0.0004939013  0.010385844  0.010879745
per_asian    -0.0001224192 -0.002574253 -0.002696672
per_black    -0.0006142789 -0.012917166 -0.013531445
per_other     0.0028294759  0.059498722  0.062328198
per_nonUK_EU  0.0012791011  0.026897166  0.028176267
per_nonEU     0.0026523198  0.055773451  0.058425771
log(POPDEN)   0.0267960076  0.563471199  0.590267206
========================================================
Simulation results ( variance matrix):
Direct:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                   Mean        SD  Naive SE Time-series SE
per_mixed     0.0001745 0.0020780 1.200e-04      1.200e-04
per_asian    -0.0001336 0.0001714 9.896e-06      8.335e-06
per_black    -0.0005954 0.0003387 1.956e-05      1.956e-05
per_other     0.0028443 0.0011054 6.382e-05      6.382e-05
per_nonUK_EU  0.0012857 0.0005820 3.360e-05      3.360e-05
per_nonEU     0.0026483 0.0004929 2.846e-05      2.846e-05
log(POPDEN)   0.0272698 0.0038311 2.212e-04      2.212e-04

2. Quantiles for each variable:

                   2.5%        25%        50%        75%     97.5%
per_mixed    -0.0038905 -0.0014682  0.0001532  1.625e-03 3.956e-03
per_asian    -0.0004527 -0.0002502 -0.0001377 -3.469e-05 1.923e-04
per_black    -0.0012608 -0.0008239 -0.0006091 -3.778e-04 4.875e-05
per_other     0.0006565  0.0021010  0.0028389  3.588e-03 5.182e-03
per_nonUK_EU  0.0002459  0.0008845  0.0013231  1.641e-03 2.462e-03
per_nonEU     0.0015766  0.0023563  0.0026501  2.920e-03 3.706e-03
log(POPDEN)   0.0196959  0.0251516  0.0271791  2.961e-02 3.481e-02

========================================================
Indirect:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.003658 0.046712 0.0026969      0.0026969
per_asian    -0.002957 0.003998 0.0002308      0.0002308
per_black    -0.012914 0.008211 0.0004741      0.0004741
per_other     0.061433 0.028135 0.0016244      0.0016244
per_nonUK_EU  0.027138 0.013036 0.0007527      0.0006604
per_nonEU     0.057503 0.018086 0.0010442      0.0010442
log(POPDEN)   0.584320 0.133273 0.0076945      0.0076945

2. Quantiles for each variable:

                  2.5%       25%       50%        75%    97.5%
per_mixed    -0.083244 -0.029530  0.002813  0.0325991 0.089390
per_asian    -0.010917 -0.005318 -0.002799 -0.0006719 0.004353
per_black    -0.030136 -0.017562 -0.012604 -0.0074698 0.001150
per_other     0.013233  0.042098  0.059148  0.0771860 0.124014
per_nonUK_EU  0.004334  0.018699  0.026498  0.0333457 0.054834
per_nonEU     0.028556  0.046141  0.053447  0.0660868 0.100683
log(POPDEN)   0.369991  0.497071  0.565468  0.6474343 0.932730

========================================================
Total:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.003832 0.048743 0.0028142      0.0028142
per_asian    -0.003090 0.004164 0.0002404      0.0002404
per_black    -0.013509 0.008522 0.0004920      0.0004920
per_other     0.064278 0.029108 0.0016805      0.0016805
per_nonUK_EU  0.028423 0.013557 0.0007827      0.0006880
per_nonEU     0.060151 0.018452 0.0010653      0.0010653
log(POPDEN)   0.611590 0.135087 0.0077993      0.0077993

2. Quantiles for each variable:

                  2.5%       25%       50%        75%    97.5%
per_mixed    -0.087299 -0.030723  0.002957  0.0341656 0.092695
per_asian    -0.011336 -0.005578 -0.002933 -0.0007075 0.004572
per_black    -0.031225 -0.018446 -0.013258 -0.0078015 0.001199
per_other     0.013743  0.043935  0.061758  0.0807672 0.129270
per_nonUK_EU  0.004594  0.019548  0.027903  0.0349452 0.057126
per_nonEU     0.030439  0.048930  0.056091  0.0688583 0.104002
log(POPDEN)   0.391342  0.525207  0.595474  0.6767114 0.961312

4) Is SAR the right model choice or would you rather estimate a different model? Please run a Durbin model and caculate its impact summary measures

# Spatial Dubrbin model
mod_1.durb <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE)

summary(mod_1.durb)

Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1854009 -0.0263818 -0.0020816  0.0229647  0.3321974 

Type: mixed 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)      -0.00409824  0.01983728 -0.2066   0.83633
per_mixed         0.00434535  0.00218712  1.9868   0.04695
per_asian        -0.00028620  0.00023959 -1.1945   0.23227
per_black        -0.00056734  0.00034455 -1.6466   0.09964
per_other         0.00222708  0.00112918  1.9723   0.04857
per_nonUK_EU      0.00085417  0.00059478  1.4361   0.15097
per_nonEU         0.00095220  0.00052681  1.8075   0.07069
log(POPDEN)       0.02649122  0.00320358  8.2693 2.220e-16
lag.per_mixed    -0.00475294  0.00315799 -1.5051   0.13231
lag.per_asian     0.00024092  0.00028983  0.8312   0.40584
lag.per_black     0.00025812  0.00054125  0.4769   0.63344
lag.per_other    -0.00074506  0.00176141 -0.4230   0.67230
lag.per_nonUK_EU  0.00094549  0.00100320  0.9425   0.34595
lag.per_nonEU     0.00130970  0.00078547  1.6674   0.09544
lag.log(POPDEN)  -0.02526415  0.00588517 -4.2928 1.764e-05

Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
    z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16

Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16
# Impact measures of the Durbin Error model
mod_1.durb.imp <- impacts(mod_1.durb, listw = dist_15.lw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)
Impact measures (mixed, exact):
                    Direct     Indirect        Total
per_mixed     0.0040597904 -0.027843988 -0.023784197
per_asian    -0.0003101210 -0.002332322 -0.002642443
per_black    -0.0007447486 -0.017299184 -0.018043932
per_other     0.0030823781  0.083398408  0.086480787
per_nonUK_EU  0.0019115634  0.103104372  0.105015935
per_nonEU     0.0022824096  0.129706442  0.131988851
log(POPDEN)   0.0269491699  0.044653927  0.071603096
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                   Direct   Indirect      Total
per_mixed    0.0026426092 0.15237064 0.15383800
per_asian    0.0002435009 0.03368906 0.03377057
per_black    0.0004087996 0.05626414 0.05648197
per_other    0.0014460702 0.28832432 0.28912971
per_nonUK_EU 0.0007026514 0.08873903 0.08906055
per_nonEU    0.0007841476 0.21793605 0.21846836
log(POPDEN)  0.0048294250 0.44047304 0.44363907

Simulated z-values:
                Direct    Indirect       Total
per_mixed     1.502545 -0.09423347 -0.06752413
per_asian    -1.330223 -0.15862593 -0.16783455
per_black    -1.869693 -0.43856895 -0.45040979
per_other     2.234748  0.37949852  0.38961840
per_nonUK_EU  2.741003  1.30070454  1.31763432
per_nonEU     3.020399  0.73967594  0.74871477
log(POPDEN)   5.564437  0.09572275  0.15561371

Simulated p-values:
             Direct    Indirect Total  
per_mixed    0.1329565 0.92492  0.94616
per_asian    0.1834448 0.87396  0.86671
per_black    0.0615264 0.66097  0.65241
per_other    0.0254339 0.70432  0.69682
per_nonUK_EU 0.0061252 0.19336  0.18763
per_nonEU    0.0025244 0.45950  0.45403
log(POPDEN)  2.63e-08  0.92374  0.87634

5) Please repeat with a Durbin Error model. Why are the impacts here idenptical to the coefficients?

# Spatial Dubrbin model
mod_1.durbe <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = TRUE)

summary(mod_1.durbe)

Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = TRUE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.1839285 -0.0254426 -0.0027042  0.0216084  0.2944840 

Type: error 
Coefficients: (asymptotic standard errors) 
                    Estimate  Std. Error z value  Pr(>|z|)
(Intercept)       2.64939215  0.24748370 10.7053 < 2.2e-16
per_mixed         0.00553333  0.00223688  2.4737  0.013373
per_asian        -0.00017156  0.00024183 -0.7094  0.478062
per_black        -0.00057947  0.00034426 -1.6832  0.092334
per_other         0.00203392  0.00112534  1.8074  0.070701
per_nonUK_EU      0.00086254  0.00058902  1.4644  0.143091
per_nonEU         0.00135822  0.00053480  2.5397  0.011096
log(POPDEN)       0.02716824  0.00354239  7.6695 1.732e-14
lag.per_mixed     0.00107140  0.00819322  0.1308  0.895960
lag.per_asian    -0.00060616  0.00070080 -0.8649  0.387069
lag.per_black    -0.00191733  0.00130997 -1.4636  0.143291
lag.per_other     0.01014125  0.00496979  2.0406  0.041293
lag.per_nonUK_EU  0.00925620  0.00217624  4.2533 2.106e-05
lag.per_nonEU     0.00563564  0.00185541  3.0374  0.002386
lag.log(POPDEN)  -0.01370891  0.01128957 -1.2143  0.224634

Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
    z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16

Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983 
Number of parameters estimated: 17 
AIC: -3110.1, (AIC for lm: -1584.3)
# Impact measures of the Durbin model
mod_1.durbe.imp <- impacts(mod_1.durbe, listw = dist_15.lw, R = 300)
summary(mod_1.durbe.imp, zstats = TRUE, short = TRUE)
Impact measures (SDEM, glht, n):
                    Direct      Indirect         Total
per_mixed     0.0055333298  0.0010713981  0.0066047279
per_asian    -0.0001715584 -0.0006061567 -0.0007777151
per_black    -0.0005794704 -0.0019173261 -0.0024967965
per_other     0.0020339226  0.0101412504  0.0121751729
per_nonUK_EU  0.0008625423  0.0092561971  0.0101187394
per_nonEU     0.0013582217  0.0056356426  0.0069938643
log(POPDEN)   0.0271682392 -0.0137089088  0.0134593305
========================================================
Standard errors:
                   Direct     Indirect        Total
per_mixed    0.0022368774 0.0081932151 0.0089412446
per_asian    0.0002418282 0.0007008041 0.0007540247
per_black    0.0003442649 0.0013099673 0.0013639632
per_other    0.0011253352 0.0049697880 0.0051074709
per_nonUK_EU 0.0005890159 0.0021762395 0.0022231333
per_nonEU    0.0005348024 0.0018554128 0.0020196122
log(POPDEN)  0.0035423870 0.0112895670 0.0132006518
========================================================
Z-values:
                 Direct   Indirect     Total
per_mixed     2.4736849  0.1307665  0.738681
per_asian    -0.7094226 -0.8649446 -1.031419
per_black    -1.6832108 -1.4636442 -1.830545
per_other     1.8073926  2.0405801  2.383797
per_nonUK_EU  1.4643785  4.2532989  4.551567
per_nonEU     2.5396703  3.0374063  3.462974
log(POPDEN)   7.6694724 -1.2142989  1.019596

p-values:
             Direct     Indirect   Total     
per_mixed    0.013373   0.8959600  0.46010070
per_asian    0.478062   0.3870692  0.30234457
per_black    0.092334   0.1432912  0.06716843
per_other    0.070701   0.0412926  0.01713506
per_nonUK_EU 0.143091   2.1064e-05 5.3248e-06
per_nonEU    0.011096   0.0023862  0.00053424
log(POPDEN)  1.7319e-14 0.2246336  0.30792015

11.2 Inkar data: the effect of regional characteristics on life expectancy

Below, we read and transform some characteristics of the INKAR data on German counties.

load("_data/inkar2.Rdata")

Variables are

Variable Description
“Kennziffer” ID
“Raumeinheit” Name
“Aggregat” Level
“year” Year
“poluation_density” Population Density
“median_income” Median Household income (only for 2020)
“gdp_in1000EUR” Gross Domestic Product in 1000 euros
“unemployment_rate” Unemployment rate
“share_longterm_unemployed” Share of longterm unemployed (among unemployed)
“share_working_indutry” Share of employees in undistrial sector
“share_foreigners” Share of foreign nationals
“share_college” Share of school-finishers with college degree
“recreational_space” Recreational space per inhabitant
“car_density” Density of cars
“life_expectancy” Life expectancy

11.3 County shapes

kreise.spdf <- st_read(dsn = "_data/vg5000_ebenen_1231",
                       layer = "VG5000_KRS")
Reading layer `VG5000_KRS' from data source 
  `C:\work\Lehre\Geodata_Spatial_Regression\_data\vg5000_ebenen_1231' 
  using driver `ESRI Shapefile'
Simple feature collection with 400 features and 24 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280353.1 ymin: 5235878 xmax: 921261.6 ymax: 6101302
Projected CRS: ETRS89 / UTM zone 32N

1) Please map the life expectancy across Germany

  1. Merge data with the shape file (as with conventional data)
# Merge
inkar_2020.spdf <- merge(kreise.spdf, inkar.df[inkar.df$year == 2020, ], 
                         by.x = "AGS", by.y = "Kennziffer")
  1. Create a map of life-expectancy
cols <- viridis(n = 100, direction = -1, option = "G")

mp1 <-  tm_shape(inkar_2020.spdf) + 
  tm_fill(col = "life_expectancy", 
          style = "cont", # algorithm to def cut points
          palette = cols, # colours
          stretch.palette = TRUE,
          title = "in years"
          ) +
  tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
  tm_layout(frame = FALSE,
            legend.frame = TRUE, legend.bg.color = TRUE,
            legend.position = c("right", "bottom"),
            legend.outside = FALSE,
            main.title = "Life expectancy", 
            main.title.position = "center",
            main.title.size = 1.6,
            legend.title.size = 0.8,
            legend.text.size = 0.8)

mp1

2) Chose some variables that could predict life expectancy. See for instance the following paper.

3) Generate a neighbours object (e.g. the 10 nearest neighbours).

# nb <- poly2nb(kreise.spdf, row.names = "ags", queen = TRUE)
knn <- knearneigh(st_centroid(kreise.spdf), k = 10)
Warning: st_centroid assumes attributes are constant over
geometries
nb <- knn2nb(knn, row.names = kreise.spdf$ags)
listw <- nb2listw(nb, style = "W")

4) Estimate a cross-sectional spatial model for the year 2020 and calculate the impacts.

### Use a spatial Durbin Error model

# Spec formula
fm <- life_expectancy ~ median_income + unemployment_rate + share_college + car_density

# Estimate error model with Durbin = TRUE 
mod_1.durb <- errorsarlm(fm,  
                      data = inkar_2020.spdf, 
                      listw = listw,
                      Durbin = TRUE)

summary(mod_1.durb)

Call:
errorsarlm(formula = fm, data = inkar_2020.spdf, listw = listw, 
    Durbin = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.343988 -0.349564  0.013309  0.333105  1.819014 

Type: error 
Coefficients: (asymptotic standard errors) 
                         Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            8.4970e+01  1.4366e+00  59.1460 < 2.2e-16
median_income          5.4013e-04  8.2285e-05   6.5642 5.233e-11
unemployment_rate     -3.8970e-01  2.0095e-02 -19.3923 < 2.2e-16
share_college          6.7806e-03  3.2502e-03   2.0862  0.036957
car_density           -3.2042e-03  4.9774e-04  -6.4376 1.214e-10
lag.median_income      4.9282e-04  1.8112e-04   2.7209  0.006510
lag.unemployment_rate -3.4685e-02  4.5454e-02  -0.7631  0.445415
lag.share_college     -1.7066e-03  7.0324e-03  -0.2427  0.808257
lag.car_density       -5.2210e-03  1.7540e-03  -2.9766  0.002915

Lambda: 0.57895, LR test value: 48.146, p-value: 3.9563e-12
Asymptotic standard error: 0.069524
    z-value: 8.3274, p-value: < 2.22e-16
Wald statistic: 69.345, p-value: < 2.22e-16

Log likelihood: -305.6855 for error model
ML residual variance (sigma squared): 0.26001, (sigma: 0.50991)
Number of observations: 400 
Number of parameters estimated: 11 
AIC: 633.37, (AIC for lm: 679.52)
# Calculate impacts (which is unnecessary in this case)
mod_1.durb.imp <- impacts(mod_1.durb, listw = listw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)
Impact measures (SDEM, glht, n):
                         Direct      Indirect        Total
median_income      0.0005401288  0.0004928218  0.001032951
unemployment_rate -0.3896966886 -0.0346850580 -0.424381747
share_college      0.0067806080 -0.0017065790  0.005074029
car_density       -0.0032042420 -0.0052210239 -0.008425266
========================================================
Standard errors:
                        Direct     Indirect        Total
median_income     8.228463e-05 0.0001811232 0.0001813201
unemployment_rate 2.009540e-02 0.0454539601 0.0455753576
share_college     3.250157e-03 0.0070323541 0.0069549498
car_density       4.977410e-04 0.0017540488 0.0018942466
========================================================
Z-values:
                      Direct   Indirect      Total
median_income       6.564152  2.7209207  5.6968335
unemployment_rate -19.392336 -0.7630811 -9.3116493
share_college       2.086240 -0.2426754  0.7295565
car_density        -6.437570 -2.9765557 -4.4478191

p-values:
                  Direct     Indirect  Total     
median_income     5.233e-11  0.0065100 1.2205e-08
unemployment_rate < 2.22e-16 0.4454150 < 2.22e-16
share_college     0.036957   0.8082569 0.46566   
car_density       1.214e-10  0.0029151 8.6747e-06

5) Calculate the spatial lagged variables for your covariates (e.g. use create_WX(), which needs a non-spatial df as input) .

# Extract covariate names
covars <- attr(terms(fm),"term.labels")

w_vars <- create_WX(st_drop_geometry(inkar_2020.spdf)[, covars],
                    listw = listw,
                    prefix = "w")

inkar_2020.spdf <- cbind(inkar_2020.spdf, w_vars)

6) Can you run a spatial machine learning model? (for instance, using randomForest)?

randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
# Train
rf.mod <- randomForest(life_expectancy ~ median_income + unemployment_rate + share_college + car_density +
                         w.median_income + w.unemployment_rate + w.share_college + w.car_density,
                       data = st_drop_geometry(inkar_2020.spdf), 
                       ntree = 1000,
                       importance = TRUE)

# Inspect the mechanics of the model
importance(rf.mod)
                     %IncMSE IncNodePurity
median_income       33.24305      41.24505
unemployment_rate   67.21736     118.99576
share_college       23.17769      25.85066
car_density         25.46301      33.12425
w.median_income     36.07124      57.94637
w.unemployment_rate 26.67292      50.25835
w.share_college     16.17076      24.40972
w.car_density       23.68573      30.67772
varImpPlot(rf.mod)

You could even go further and use higher order neighbours (e.g. nblag(queens.nb, maxlag = 3)) to check the importance of direct neighbours and the neighbours neighbours and so on …

11.4 Esimate an FE model with SLX specification

  1. Loops over years to generate WX
# We use gdp instead of median income (which is only available in recent year)
fm <- life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density

# All years where we have a balanced sample
years <- unique(inkar.df$year[which(complete.cases(inkar.df[, all.vars(fm)]))])

# All variables we want ot lag
vars <- all.vars(fm)

# create listw with the correct rownames (ID = Kennziffer)
kreise.spdf$Kennziffer <- kreise.spdf$ags
knn <- knearneigh(st_centroid(kreise.spdf), k = 10)
nb <- knn2nb(knn, row.names = kreise.spdf$Kennziffer)
listw <- nb2listw(nb, style = "W")

for(y in years){
  # Select singe year
  tmp <- inkar.df[inkar.df$year == y ,]
  # Select variables and make df
  x <- st_drop_geometry(tmp[, vars])
  # Add ID as rownames (we retreive them again later)
  rownames(x) <- tmp$Kennziffer
  # Perform lag transformation (rownames contian ids)
  w.tmp <- create_WX(as.matrix(x),
                    listw = listw,
                    prefix = "w",
                    zero.policy = TRUE) # NAs will get zero
  w.tmp <- as.data.frame(w.tmp)
  
  # add indices back
  w.tmp$Kennziffer <- row.names(w.tmp)
  w.tmp$year <- y
  
  if(y == years[1]){
    w.inkar.df <- w.tmp
  }else{
    w.inkar.df <- rbind(w.inkar.df, w.tmp)
  }
}

head(w.inkar.df)
      w.life_expectancy w.gdp_in1000EUR w.unemployment_rate
01001            77.386         3866035              10.257
01002            77.355         3812976              10.394
01003            77.237        10728945              11.666
01004            77.458         4586244               9.999
01051            77.291         4270208              10.007
01053            77.119        11012351              11.878
      w.share_college w.car_density Kennziffer year
01001          18.558       518.092      01001 1998
01002          20.389       516.400      01002 1998
01003          23.075       497.344      01003 1998
01004          20.798       516.580      01004 1998
01051          18.957       520.985      01051 1998
01053          23.625       501.522      01053 1998
# Merge back 

inkar.df <- merge(inkar.df, w.inkar.df, by = c("Kennziffer", "year"))
  1. Estimate a twoways FE SLX panel model
slx.fe <- felm(life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density +
                 w.gdp_in1000EUR + w.unemployment_rate + w.share_college + w.car_density
                | Kennziffer + year | 0 | Kennziffer,
              data = inkar.df)

summary(slx.fe)

Call:
   felm(formula = life_expectancy ~ gdp_in1000EUR + unemployment_rate +      share_college + car_density + w.gdp_in1000EUR + w.unemployment_rate +      w.share_college + w.car_density | Kennziffer + year | 0 |      Kennziffer, data = inkar.df) 

Residuals:
     Min       1Q   Median       3Q      Max 
-1.62945 -0.17351  0.00156  0.17930  1.58230 

Coefficients:
                      Estimate Cluster s.e. t value Pr(>|t|)   
gdp_in1000EUR        1.370e-08    4.323e-09   3.170  0.00164 **
unemployment_rate    4.875e-04    1.127e-02   0.043  0.96553   
share_college        2.565e-03    1.818e-03   1.411  0.15909   
car_density          4.277e-04    3.351e-04   1.276  0.20254   
w.gdp_in1000EUR      3.397e-08    1.107e-08   3.069  0.00230 **
w.unemployment_rate -2.848e-02    1.239e-02  -2.299  0.02203 * 
w.share_college     -4.753e-04    2.506e-03  -0.190  0.84966   
w.car_density        1.038e-03    8.283e-04   1.254  0.21072   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2957 on 8770 degrees of freedom
Multiple R-squared(full model): 0.9602   Adjusted R-squared: 0.9582 
Multiple R-squared(proj model): 0.02528   Adjusted R-squared: -0.0224 
F-statistic(full model, *iid*):492.7 on 429 and 8770 DF, p-value: < 2.2e-16 
F-statistic(proj model): 4.508 on 8 and 399 DF, p-value: 2.929e-05 
  1. Estimate a twoways FE SAR panel model (use spml())
### Estimate model
sar.fe <- spml(life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density, 
               data = inkar.df, 
               index = c("Kennziffer", "year"), 
               listw = listw, 
               model= "within",
               effect= "twoways",
               lag = TRUE, 
               spatial.error = "none"
               )

summary(sar.fe)
Spatial panel fixed effects lag model
 

Call:
spml(formula = life_expectancy ~ gdp_in1000EUR + unemployment_rate + 
    share_college + car_density, data = inkar.df, index = c("Kennziffer", 
    "year"), listw = listw, model = "within", effect = "twoways", 
    lag = TRUE, spatial.error = "none")

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-1.56935018 -0.16490692  0.00062493  0.16729792  1.38374051 

Spatial autoregressive coefficient:
       Estimate Std. Error t-value  Pr(>|t|)    
lambda  0.47997    0.01653  29.037 < 2.2e-16 ***

Coefficients:
                     Estimate  Std. Error t-value  Pr(>|t|)    
gdp_in1000EUR      1.2031e-08  1.4786e-09  8.1369 4.056e-16 ***
unemployment_rate -1.0767e-02  2.0558e-03 -5.2375 1.627e-07 ***
share_college      1.8501e-03  7.0611e-04  2.6202  0.008788 ** 
car_density        3.4915e-04  1.1950e-04  2.9218  0.003480 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. Estimate the summary impacts.
sar.fe.imp <- impacts(sar.fe, listw = listw, time = length(years), R = 200)
summary(sar.fe.imp, zstats = TRUE, short = TRUE)
Impact measures (lag, trace):
                         Direct      Indirect         Total
gdp_in1000EUR      1.236588e-08  1.076990e-08  2.313578e-08
unemployment_rate -1.106695e-02 -9.638614e-03 -2.070556e-02
share_college      1.901619e-03  1.656190e-03  3.557809e-03
car_density        3.588594e-04  3.125439e-04  6.714033e-04
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
                        Direct     Indirect        Total
gdp_in1000EUR     1.580503e-09 1.533227e-09 3.034055e-09
unemployment_rate 2.082529e-03 1.870369e-03 3.904303e-03
share_college     7.529830e-04 6.709583e-04 1.419872e-03
car_density       1.365610e-04 1.231071e-04 2.588810e-04

Simulated z-values:
                     Direct  Indirect     Total
gdp_in1000EUR      7.893393  7.065978  7.682551
unemployment_rate -5.331485 -5.150828 -5.311299
share_college      2.595160  2.530844  2.572206
car_density        2.554172  2.467431  2.520689

Simulated p-values:
                  Direct     Indirect   Total     
gdp_in1000EUR     2.8866e-15 1.5949e-12 1.5543e-14
unemployment_rate 9.7413e-08 2.5934e-07 1.0885e-07
share_college     0.0094547  0.011379   0.010105  
car_density       0.0106441  0.013609   0.011713