11 Exercises III
\[ \newcommand{\tr}{\mathrm{tr}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\plim}{\operatornamewithlimits{plim}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\Exp}{\mathrm{E}} \newcommand{\Cov}{\mathrm{Cov}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\irow}[1]{% \begin{pmatrix}#1\end{pmatrix} } \]
Required packages
Session info
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] SDPDmod_0.0.5 splm_1.6-5 lfe_3.0-0
[4] plm_2.6-4 viridisLite_0.4.2 tmap_3.3-4
[7] spatialreg_1.3-4 Matrix_1.7-0 spdep_1.3-5
[10] spData_2.3.1 mapview_2.11.2 sf_1.0-16
loaded via a namespace (and not attached):
[1] fastmap_1.2.0 leaflet_2.2.2 TH.data_1.1-2
[4] dotCall64_1.1-1 fixest_0.12.1 XML_3.99-0.16.1
[7] digest_0.6.35 lifecycle_1.0.4 dreamerr_1.4.0
[10] LearnBayes_2.15.1 survival_3.6-4 terra_1.7-78
[13] magrittr_2.0.3 compiler_4.4.1 rlang_1.1.4
[16] tools_4.4.1 collapse_2.0.14 knitr_1.47
[19] htmlwidgets_1.6.4 sp_2.1-4 classInt_0.4-10
[22] RColorBrewer_1.1-3 multcomp_1.4-25 abind_1.4-5
[25] KernSmooth_2.23-24 numDeriv_2016.8-1.1 leafsync_0.1.0
[28] grid_4.4.1 stats4_4.4.1 xtable_1.8-4
[31] e1071_1.7-14 leafem_0.2.3 colorspace_2.1-0
[34] scales_1.3.0 MASS_7.3-60.2 dichromat_2.0-0.1
[37] cli_3.6.2 mvtnorm_1.2-5 rmarkdown_2.27
[40] miscTools_0.6-28 generics_0.1.3 RSpectra_0.16-1
[43] rstudioapi_0.16.0 tmaptools_3.1-1 bdsmatrix_1.3-7
[46] DBI_1.2.3 proxy_0.4-27 stringr_1.5.1
[49] splines_4.4.1 stars_0.6-5 parallel_4.4.1
[52] s2_1.1.6 stringmagic_1.1.2 base64enc_0.1-3
[55] boot_1.3-30 sandwich_3.1-0 jsonlite_1.8.8
[58] Formula_1.2-5 crosstalk_1.2.1 units_0.8-5
[61] spam_2.10-0 glue_1.7.0 lwgeom_0.2-14
[64] codetools_0.2-20 stringi_1.8.4 deldir_2.0-4
[67] raster_3.6-26 lmtest_0.9-40 munsell_0.5.1
[70] htmltools_0.5.8.1 satellite_1.0.5 R6_2.5.1
[73] wk_0.9.1 maxLik_1.5-2.1 Rdpack_2.6
[76] evaluate_0.24.0 lattice_0.22-6 rbibutils_2.2.16
[79] png_0.1-8 class_7.3-22 Rcpp_1.0.12
[82] coda_0.19-4.1 nlme_3.1-164 xfun_0.45
[85] zoo_1.8-12
Reload data from pervious session
load("_data/msoa2_spatial.RData")
11.1 Environmental inequality (continued)
Let’s use the same neighbours weights definition as before:
coords <- st_centroid(msoa.spdf)
Warning: st_centroid assumes attributes are constant over
geometries
# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)
summary(dist_15.nb)
Neighbour list object:
Number of regions: 983
Number of nonzero links: 15266
Percentage nonzero weights: 1.579859
Average number of links: 15.53001
4 regions with no links:
158 463 478 505
6 disjoint connected subgraphs
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
4 5 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26
22 23 24 25 26 27 28 29 30 31 32 33 34
25 19 38 29 32 38 26 16 20 10 8 1 2
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# There are some empty neighbour sets. Lets impute those with the nearest neighbour.
k2.nb <- knearneigh(coords, k = 1)
# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]
summary(dist_15.nb)
Neighbour list object:
Number of regions: 983
Number of nonzero links: 15270
Percentage nonzero weights: 1.580273
Average number of links: 15.53408
2 disjoint connected subgraphs
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
9 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25
23 24 25 26 27 28 29 30 31 32 33 34
19 38 29 32 38 26 16 20 10 8 1 2
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")
and estiamte the spatial SAR model:
mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)
Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = FALSE)
Residuals:
Min 1Q Median 3Q Max
-0.2140485 -0.0267085 -0.0021421 0.0238337 0.3505513
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7004e-02 1.8122e-02 -0.9383 0.348110
per_mixed 3.4376e-04 1.4758e-03 0.2329 0.815810
per_asian -8.5205e-05 1.1494e-04 -0.7413 0.458507
per_black -4.2754e-04 2.3468e-04 -1.8218 0.068484
per_other 1.9693e-03 7.4939e-04 2.6279 0.008591
per_nonUK_EU 8.9027e-04 3.9638e-04 2.2460 0.024703
per_nonEU 1.8460e-03 3.5159e-04 5.2506 1.516e-07
log(POPDEN) 1.8650e-02 2.7852e-03 6.6963 2.138e-11
Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16
Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983
Number of parameters estimated: 10
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16
1) Please calculate the true multiplier matrix of this SAR model.
The multiplier matrix is given by \(({\bm I_N}-\rho {\bm W})^{-1}\).
W <- listw2mat(dist_15.lw)
I <- diag(dim(W)[1])
rho <- unname(mod_1.sar$rho)
M <- solve(I - rho*W)
M[1:10, 1:10]
1 2 3 4 5
[1,] 1.164650997 0.002433319 0.004089559 0.004034508 0.006545994
[2,] 0.010706605 1.407336301 0.643881932 0.370049927 0.464794934
[3,] 0.011246286 0.402426208 1.474021599 0.429011868 0.641526285
[4,] 0.008875918 0.185024963 0.343209495 1.684533322 0.614086824
[5,] 0.012000989 0.193664556 0.427684190 0.511739020 1.560840834
[6,] 0.010741524 0.192552594 0.452940016 0.631452476 0.672787841
[7,] 0.012779708 0.141953871 0.299247377 0.418234186 0.616895800
[8,] 0.014769006 0.125781189 0.253122442 0.295553039 0.500919513
[9,] 0.011708131 0.147549264 0.309080773 0.568442619 0.629156269
[10,] 0.009937859 0.152900148 0.306652041 0.727001926 0.553973310
6 7 8 9 10
[1,] 0.004882511 0.005808958 0.00872714 0.005854065 0.003613767
[2,] 0.385105188 0.283907742 0.32703109 0.324608381 0.244640237
[3,] 0.566175019 0.374059222 0.41132397 0.424986063 0.306652041
[4,] 0.631452476 0.418234186 0.38421895 0.625286881 0.581601541
[5,] 0.560656534 0.514079833 0.54266281 0.576726580 0.369315540
[6,] 1.571175245 0.558170218 0.46513922 0.661184961 0.543820047
[7,] 0.558170218 1.475511568 0.58520461 0.614170880 0.463886540
[8,] 0.357799398 0.450157392 1.46638195 0.474994894 0.272339890
[9,] 0.601077237 0.558337164 0.56135760 1.581077095 0.517983092
[10,] 0.679775059 0.579858175 0.44255232 0.712226751 1.560083138
2) Create an N x N effects matrix for the effect of the non-EU citizens. What is the effect of unit 6 on unit 10? Why is this larger than the effect of unit 5 on unit 8?
# For beta 1
beta <- mod_1.sar$coefficients
effM <- beta["per_nonEU"] * M
effM[1:10, 1:10]
1 2 3 4
[1,] 2.149995e-03 4.492010e-06 7.549498e-06 7.447872e-06
[2,] 1.976484e-05 2.598002e-03 1.188633e-03 6.831278e-04
[3,] 2.076112e-05 7.428958e-04 2.721106e-03 7.919740e-04
[4,] 1.638532e-05 3.415639e-04 6.335792e-04 3.109720e-03
[5,] 2.215433e-05 3.575129e-04 7.895231e-04 9.446918e-04
[6,] 1.982931e-05 3.554602e-04 8.361464e-04 1.165688e-03
[7,] 2.359188e-05 2.620528e-04 5.524233e-04 7.720780e-04
[8,] 2.726421e-05 2.321974e-04 4.672747e-04 5.456034e-04
[9,] 2.161370e-05 2.723822e-04 5.705762e-04 1.049369e-03
[10,] 1.834571e-05 2.822601e-04 5.660926e-04 1.342076e-03
5 6 7 8
[1,] 1.208418e-05 9.013321e-06 1.072358e-05 1.611067e-05
[2,] 8.580311e-04 7.109204e-04 5.241057e-04 6.037132e-04
[3,] 1.184285e-03 1.045183e-03 6.905291e-04 7.593214e-04
[4,] 1.133630e-03 1.165688e-03 7.720780e-04 7.092844e-04
[5,] 2.881378e-03 1.034996e-03 9.490131e-04 1.001778e-03
[6,] 1.241995e-03 2.900456e-03 1.030406e-03 8.586666e-04
[7,] 1.138816e-03 1.030406e-03 2.723857e-03 1.080312e-03
[8,] 9.247186e-04 6.605128e-04 8.310095e-04 2.707003e-03
[9,] 1.161449e-03 1.109614e-03 1.030714e-03 1.036290e-03
[10,] 1.022658e-03 1.254893e-03 1.070443e-03 8.169703e-04
9 10
[1,] 1.080685e-05 6.671166e-06
[2,] 5.992408e-04 4.516162e-04
[3,] 7.845422e-04 5.660926e-04
[4,] 1.154306e-03 1.073661e-03
[5,] 1.064662e-03 6.817721e-04
[6,] 1.220575e-03 1.003915e-03
[7,] 1.133785e-03 8.563541e-04
[8,] 8.768606e-04 5.027509e-04
[9,] 2.918735e-03 9.562187e-04
[10,] 1.314801e-03 2.879979e-03
# "Effect" of unit 6 on unit 10
effM[10, 6]
6
0.001254893
# "Effect" of unit 5 on unit 8
effM[8, 5]
5
0.0009247186
3) Calculate and interpret the summary impact measures of the SAR model.
mod_1.sar.imp <- impacts(mod_1.sar, listw = dist_15.lw, R = 300)
summary(mod_1.sar.imp)
Impact measures (lag, exact):
Direct Indirect Total
per_mixed 0.0004939013 0.010385844 0.010879745
per_asian -0.0001224192 -0.002574253 -0.002696672
per_black -0.0006142789 -0.012917166 -0.013531445
per_other 0.0028294759 0.059498722 0.062328198
per_nonUK_EU 0.0012791011 0.026897166 0.028176267
per_nonEU 0.0026523198 0.055773451 0.058425771
log(POPDEN) 0.0267960076 0.563471199 0.590267206
========================================================
Simulation results ( variance matrix):
Direct:
Iterations = 1:300
Thinning interval = 1
Number of chains = 1
Sample size per chain = 300
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
per_mixed 0.0001745 0.0020780 1.200e-04 1.200e-04
per_asian -0.0001336 0.0001714 9.896e-06 8.335e-06
per_black -0.0005954 0.0003387 1.956e-05 1.956e-05
per_other 0.0028443 0.0011054 6.382e-05 6.382e-05
per_nonUK_EU 0.0012857 0.0005820 3.360e-05 3.360e-05
per_nonEU 0.0026483 0.0004929 2.846e-05 2.846e-05
log(POPDEN) 0.0272698 0.0038311 2.212e-04 2.212e-04
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
per_mixed -0.0038905 -0.0014682 0.0001532 1.625e-03 3.956e-03
per_asian -0.0004527 -0.0002502 -0.0001377 -3.469e-05 1.923e-04
per_black -0.0012608 -0.0008239 -0.0006091 -3.778e-04 4.875e-05
per_other 0.0006565 0.0021010 0.0028389 3.588e-03 5.182e-03
per_nonUK_EU 0.0002459 0.0008845 0.0013231 1.641e-03 2.462e-03
per_nonEU 0.0015766 0.0023563 0.0026501 2.920e-03 3.706e-03
log(POPDEN) 0.0196959 0.0251516 0.0271791 2.961e-02 3.481e-02
========================================================
Indirect:
Iterations = 1:300
Thinning interval = 1
Number of chains = 1
Sample size per chain = 300
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
per_mixed 0.003658 0.046712 0.0026969 0.0026969
per_asian -0.002957 0.003998 0.0002308 0.0002308
per_black -0.012914 0.008211 0.0004741 0.0004741
per_other 0.061433 0.028135 0.0016244 0.0016244
per_nonUK_EU 0.027138 0.013036 0.0007527 0.0006604
per_nonEU 0.057503 0.018086 0.0010442 0.0010442
log(POPDEN) 0.584320 0.133273 0.0076945 0.0076945
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
per_mixed -0.083244 -0.029530 0.002813 0.0325991 0.089390
per_asian -0.010917 -0.005318 -0.002799 -0.0006719 0.004353
per_black -0.030136 -0.017562 -0.012604 -0.0074698 0.001150
per_other 0.013233 0.042098 0.059148 0.0771860 0.124014
per_nonUK_EU 0.004334 0.018699 0.026498 0.0333457 0.054834
per_nonEU 0.028556 0.046141 0.053447 0.0660868 0.100683
log(POPDEN) 0.369991 0.497071 0.565468 0.6474343 0.932730
========================================================
Total:
Iterations = 1:300
Thinning interval = 1
Number of chains = 1
Sample size per chain = 300
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
per_mixed 0.003832 0.048743 0.0028142 0.0028142
per_asian -0.003090 0.004164 0.0002404 0.0002404
per_black -0.013509 0.008522 0.0004920 0.0004920
per_other 0.064278 0.029108 0.0016805 0.0016805
per_nonUK_EU 0.028423 0.013557 0.0007827 0.0006880
per_nonEU 0.060151 0.018452 0.0010653 0.0010653
log(POPDEN) 0.611590 0.135087 0.0077993 0.0077993
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
per_mixed -0.087299 -0.030723 0.002957 0.0341656 0.092695
per_asian -0.011336 -0.005578 -0.002933 -0.0007075 0.004572
per_black -0.031225 -0.018446 -0.013258 -0.0078015 0.001199
per_other 0.013743 0.043935 0.061758 0.0807672 0.129270
per_nonUK_EU 0.004594 0.019548 0.027903 0.0349452 0.057126
per_nonEU 0.030439 0.048930 0.056091 0.0688583 0.104002
log(POPDEN) 0.391342 0.525207 0.595474 0.6767114 0.961312
4) Is SAR the right model choice or would you rather estimate a different model? Please run a Durbin model and caculate its impact summary measures
# Spatial Dubrbin model
mod_1.durb <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = TRUE)
summary(mod_1.durb)
Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1854009 -0.0263818 -0.0020816 0.0229647 0.3321974
Type: mixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.00409824 0.01983728 -0.2066 0.83633
per_mixed 0.00434535 0.00218712 1.9868 0.04695
per_asian -0.00028620 0.00023959 -1.1945 0.23227
per_black -0.00056734 0.00034455 -1.6466 0.09964
per_other 0.00222708 0.00112918 1.9723 0.04857
per_nonUK_EU 0.00085417 0.00059478 1.4361 0.15097
per_nonEU 0.00095220 0.00052681 1.8075 0.07069
log(POPDEN) 0.02649122 0.00320358 8.2693 2.220e-16
lag.per_mixed -0.00475294 0.00315799 -1.5051 0.13231
lag.per_asian 0.00024092 0.00028983 0.8312 0.40584
lag.per_black 0.00025812 0.00054125 0.4769 0.63344
lag.per_other -0.00074506 0.00176141 -0.4230 0.67230
lag.per_nonUK_EU 0.00094549 0.00100320 0.9425 0.34595
lag.per_nonEU 0.00130970 0.00078547 1.6674 0.09544
lag.log(POPDEN) -0.02526415 0.00588517 -4.2928 1.764e-05
Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16
Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16
# Impact measures of the Durbin Error model
mod_1.durb.imp <- impacts(mod_1.durb, listw = dist_15.lw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)
Impact measures (mixed, exact):
Direct Indirect Total
per_mixed 0.0040597904 -0.027843988 -0.023784197
per_asian -0.0003101210 -0.002332322 -0.002642443
per_black -0.0007447486 -0.017299184 -0.018043932
per_other 0.0030823781 0.083398408 0.086480787
per_nonUK_EU 0.0019115634 0.103104372 0.105015935
per_nonEU 0.0022824096 0.129706442 0.131988851
log(POPDEN) 0.0269491699 0.044653927 0.071603096
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
Direct Indirect Total
per_mixed 0.0026426092 0.15237064 0.15383800
per_asian 0.0002435009 0.03368906 0.03377057
per_black 0.0004087996 0.05626414 0.05648197
per_other 0.0014460702 0.28832432 0.28912971
per_nonUK_EU 0.0007026514 0.08873903 0.08906055
per_nonEU 0.0007841476 0.21793605 0.21846836
log(POPDEN) 0.0048294250 0.44047304 0.44363907
Simulated z-values:
Direct Indirect Total
per_mixed 1.502545 -0.09423347 -0.06752413
per_asian -1.330223 -0.15862593 -0.16783455
per_black -1.869693 -0.43856895 -0.45040979
per_other 2.234748 0.37949852 0.38961840
per_nonUK_EU 2.741003 1.30070454 1.31763432
per_nonEU 3.020399 0.73967594 0.74871477
log(POPDEN) 5.564437 0.09572275 0.15561371
Simulated p-values:
Direct Indirect Total
per_mixed 0.1329565 0.92492 0.94616
per_asian 0.1834448 0.87396 0.86671
per_black 0.0615264 0.66097 0.65241
per_other 0.0254339 0.70432 0.69682
per_nonUK_EU 0.0061252 0.19336 0.18763
per_nonEU 0.0025244 0.45950 0.45403
log(POPDEN) 2.63e-08 0.92374 0.87634
5) Please repeat with a Durbin Error model. Why are the impacts here idenptical to the coefficients?
# Spatial Dubrbin model
mod_1.durbe <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = TRUE)
summary(mod_1.durbe)
Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1839285 -0.0254426 -0.0027042 0.0216084 0.2944840
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.64939215 0.24748370 10.7053 < 2.2e-16
per_mixed 0.00553333 0.00223688 2.4737 0.013373
per_asian -0.00017156 0.00024183 -0.7094 0.478062
per_black -0.00057947 0.00034426 -1.6832 0.092334
per_other 0.00203392 0.00112534 1.8074 0.070701
per_nonUK_EU 0.00086254 0.00058902 1.4644 0.143091
per_nonEU 0.00135822 0.00053480 2.5397 0.011096
log(POPDEN) 0.02716824 0.00354239 7.6695 1.732e-14
lag.per_mixed 0.00107140 0.00819322 0.1308 0.895960
lag.per_asian -0.00060616 0.00070080 -0.8649 0.387069
lag.per_black -0.00191733 0.00130997 -1.4636 0.143291
lag.per_other 0.01014125 0.00496979 2.0406 0.041293
lag.per_nonUK_EU 0.00925620 0.00217624 4.2533 2.106e-05
lag.per_nonEU 0.00563564 0.00185541 3.0374 0.002386
lag.log(POPDEN) -0.01370891 0.01128957 -1.2143 0.224634
Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16
Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3110.1, (AIC for lm: -1584.3)
# Impact measures of the Durbin model
mod_1.durbe.imp <- impacts(mod_1.durbe, listw = dist_15.lw, R = 300)
summary(mod_1.durbe.imp, zstats = TRUE, short = TRUE)
Impact measures (SDEM, glht, n):
Direct Indirect Total
per_mixed 0.0055333298 0.0010713981 0.0066047279
per_asian -0.0001715584 -0.0006061567 -0.0007777151
per_black -0.0005794704 -0.0019173261 -0.0024967965
per_other 0.0020339226 0.0101412504 0.0121751729
per_nonUK_EU 0.0008625423 0.0092561971 0.0101187394
per_nonEU 0.0013582217 0.0056356426 0.0069938643
log(POPDEN) 0.0271682392 -0.0137089088 0.0134593305
========================================================
Standard errors:
Direct Indirect Total
per_mixed 0.0022368774 0.0081932151 0.0089412446
per_asian 0.0002418282 0.0007008041 0.0007540247
per_black 0.0003442649 0.0013099673 0.0013639632
per_other 0.0011253352 0.0049697880 0.0051074709
per_nonUK_EU 0.0005890159 0.0021762395 0.0022231333
per_nonEU 0.0005348024 0.0018554128 0.0020196122
log(POPDEN) 0.0035423870 0.0112895670 0.0132006518
========================================================
Z-values:
Direct Indirect Total
per_mixed 2.4736849 0.1307665 0.738681
per_asian -0.7094226 -0.8649446 -1.031419
per_black -1.6832108 -1.4636442 -1.830545
per_other 1.8073926 2.0405801 2.383797
per_nonUK_EU 1.4643785 4.2532989 4.551567
per_nonEU 2.5396703 3.0374063 3.462974
log(POPDEN) 7.6694724 -1.2142989 1.019596
p-values:
Direct Indirect Total
per_mixed 0.013373 0.8959600 0.46010070
per_asian 0.478062 0.3870692 0.30234457
per_black 0.092334 0.1432912 0.06716843
per_other 0.070701 0.0412926 0.01713506
per_nonUK_EU 0.143091 2.1064e-05 5.3248e-06
per_nonEU 0.011096 0.0023862 0.00053424
log(POPDEN) 1.7319e-14 0.2246336 0.30792015
11.2 Inkar data: the effect of regional characteristics on life expectancy
Below, we read and transform some characteristics of the INKAR data on German counties.
load("_data/inkar2.Rdata")
Variables are
Variable | Description |
---|---|
“Kennziffer” | ID |
“Raumeinheit” | Name |
“Aggregat” | Level |
“year” | Year |
“poluation_density” | Population Density |
“median_income” | Median Household income (only for 2020) |
“gdp_in1000EUR” | Gross Domestic Product in 1000 euros |
“unemployment_rate” | Unemployment rate |
“share_longterm_unemployed” | Share of longterm unemployed (among unemployed) |
“share_working_indutry” | Share of employees in undistrial sector |
“share_foreigners” | Share of foreign nationals |
“share_college” | Share of school-finishers with college degree |
“recreational_space” | Recreational space per inhabitant |
“car_density” | Density of cars |
“life_expectancy” | Life expectancy |
11.3 County shapes
kreise.spdf <- st_read(dsn = "_data/vg5000_ebenen_1231",
layer = "VG5000_KRS")
Reading layer `VG5000_KRS' from data source
`C:\work\Lehre\Geodata_Spatial_Regression\_data\vg5000_ebenen_1231'
using driver `ESRI Shapefile'
Simple feature collection with 400 features and 24 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 280353.1 ymin: 5235878 xmax: 921261.6 ymax: 6101302
Projected CRS: ETRS89 / UTM zone 32N
1) Please map the life expectancy across Germany
- Merge data with the shape file (as with conventional data)
# Merge
inkar_2020.spdf <- merge(kreise.spdf, inkar.df[inkar.df$year == 2020, ],
by.x = "AGS", by.y = "Kennziffer")
- Create a map of life-expectancy
cols <- viridis(n = 100, direction = -1, option = "G")
mp1 <- tm_shape(inkar_2020.spdf) +
tm_fill(col = "life_expectancy",
style = "cont", # algorithm to def cut points
palette = cols, # colours
stretch.palette = TRUE,
title = "in years"
) +
tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_layout(frame = FALSE,
legend.frame = TRUE, legend.bg.color = TRUE,
legend.position = c("right", "bottom"),
legend.outside = FALSE,
main.title = "Life expectancy",
main.title.position = "center",
main.title.size = 1.6,
legend.title.size = 0.8,
legend.text.size = 0.8)
mp1
2) Chose some variables that could predict life expectancy. See for instance the following paper.
3) Generate a neighbours object (e.g. the 10 nearest neighbours).
# nb <- poly2nb(kreise.spdf, row.names = "ags", queen = TRUE)
knn <- knearneigh(st_centroid(kreise.spdf), k = 10)
Warning: st_centroid assumes attributes are constant over
geometries
nb <- knn2nb(knn, row.names = kreise.spdf$ags)
listw <- nb2listw(nb, style = "W")
4) Estimate a cross-sectional spatial model for the year 2020 and calculate the impacts.
### Use a spatial Durbin Error model
# Spec formula
fm <- life_expectancy ~ median_income + unemployment_rate + share_college + car_density
# Estimate error model with Durbin = TRUE
mod_1.durb <- errorsarlm(fm,
data = inkar_2020.spdf,
listw = listw,
Durbin = TRUE)
summary(mod_1.durb)
Call:
errorsarlm(formula = fm, data = inkar_2020.spdf, listw = listw,
Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-1.343988 -0.349564 0.013309 0.333105 1.819014
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.4970e+01 1.4366e+00 59.1460 < 2.2e-16
median_income 5.4013e-04 8.2285e-05 6.5642 5.233e-11
unemployment_rate -3.8970e-01 2.0095e-02 -19.3923 < 2.2e-16
share_college 6.7806e-03 3.2502e-03 2.0862 0.036957
car_density -3.2042e-03 4.9774e-04 -6.4376 1.214e-10
lag.median_income 4.9282e-04 1.8112e-04 2.7209 0.006510
lag.unemployment_rate -3.4685e-02 4.5454e-02 -0.7631 0.445415
lag.share_college -1.7066e-03 7.0324e-03 -0.2427 0.808257
lag.car_density -5.2210e-03 1.7540e-03 -2.9766 0.002915
Lambda: 0.57895, LR test value: 48.146, p-value: 3.9563e-12
Asymptotic standard error: 0.069524
z-value: 8.3274, p-value: < 2.22e-16
Wald statistic: 69.345, p-value: < 2.22e-16
Log likelihood: -305.6855 for error model
ML residual variance (sigma squared): 0.26001, (sigma: 0.50991)
Number of observations: 400
Number of parameters estimated: 11
AIC: 633.37, (AIC for lm: 679.52)
# Calculate impacts (which is unnecessary in this case)
mod_1.durb.imp <- impacts(mod_1.durb, listw = listw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)
Impact measures (SDEM, glht, n):
Direct Indirect Total
median_income 0.0005401288 0.0004928218 0.001032951
unemployment_rate -0.3896966886 -0.0346850580 -0.424381747
share_college 0.0067806080 -0.0017065790 0.005074029
car_density -0.0032042420 -0.0052210239 -0.008425266
========================================================
Standard errors:
Direct Indirect Total
median_income 8.228463e-05 0.0001811232 0.0001813201
unemployment_rate 2.009540e-02 0.0454539601 0.0455753576
share_college 3.250157e-03 0.0070323541 0.0069549498
car_density 4.977410e-04 0.0017540488 0.0018942466
========================================================
Z-values:
Direct Indirect Total
median_income 6.564152 2.7209207 5.6968335
unemployment_rate -19.392336 -0.7630811 -9.3116493
share_college 2.086240 -0.2426754 0.7295565
car_density -6.437570 -2.9765557 -4.4478191
p-values:
Direct Indirect Total
median_income 5.233e-11 0.0065100 1.2205e-08
unemployment_rate < 2.22e-16 0.4454150 < 2.22e-16
share_college 0.036957 0.8082569 0.46566
car_density 1.214e-10 0.0029151 8.6747e-06
5) Calculate the spatial lagged variables for your covariates (e.g. use create_WX(), which needs a non-spatial df as input) .
6) Can you run a spatial machine learning model? (for instance, using randomForest
)?
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
# Train
rf.mod <- randomForest(life_expectancy ~ median_income + unemployment_rate + share_college + car_density +
w.median_income + w.unemployment_rate + w.share_college + w.car_density,
data = st_drop_geometry(inkar_2020.spdf),
ntree = 1000,
importance = TRUE)
# Inspect the mechanics of the model
importance(rf.mod)
%IncMSE IncNodePurity
median_income 33.24305 41.24505
unemployment_rate 67.21736 118.99576
share_college 23.17769 25.85066
car_density 25.46301 33.12425
w.median_income 36.07124 57.94637
w.unemployment_rate 26.67292 50.25835
w.share_college 16.17076 24.40972
w.car_density 23.68573 30.67772
varImpPlot(rf.mod)
You could even go further and use higher order neighbours (e.g. nblag(queens.nb, maxlag = 3)
) to check the importance of direct neighbours and the neighbours neighbours and so on …
11.4 Esimate an FE model with SLX specification
- Loops over years to generate WX
# We use gdp instead of median income (which is only available in recent year)
fm <- life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density
# All years where we have a balanced sample
years <- unique(inkar.df$year[which(complete.cases(inkar.df[, all.vars(fm)]))])
# All variables we want ot lag
vars <- all.vars(fm)
# create listw with the correct rownames (ID = Kennziffer)
kreise.spdf$Kennziffer <- kreise.spdf$ags
knn <- knearneigh(st_centroid(kreise.spdf), k = 10)
nb <- knn2nb(knn, row.names = kreise.spdf$Kennziffer)
listw <- nb2listw(nb, style = "W")
for(y in years){
# Select singe year
tmp <- inkar.df[inkar.df$year == y ,]
# Select variables and make df
x <- st_drop_geometry(tmp[, vars])
# Add ID as rownames (we retreive them again later)
rownames(x) <- tmp$Kennziffer
# Perform lag transformation (rownames contian ids)
w.tmp <- create_WX(as.matrix(x),
listw = listw,
prefix = "w",
zero.policy = TRUE) # NAs will get zero
w.tmp <- as.data.frame(w.tmp)
# add indices back
w.tmp$Kennziffer <- row.names(w.tmp)
w.tmp$year <- y
if(y == years[1]){
w.inkar.df <- w.tmp
}else{
w.inkar.df <- rbind(w.inkar.df, w.tmp)
}
}
head(w.inkar.df)
w.life_expectancy w.gdp_in1000EUR w.unemployment_rate
01001 77.386 3866035 10.257
01002 77.355 3812976 10.394
01003 77.237 10728945 11.666
01004 77.458 4586244 9.999
01051 77.291 4270208 10.007
01053 77.119 11012351 11.878
w.share_college w.car_density Kennziffer year
01001 18.558 518.092 01001 1998
01002 20.389 516.400 01002 1998
01003 23.075 497.344 01003 1998
01004 20.798 516.580 01004 1998
01051 18.957 520.985 01051 1998
01053 23.625 501.522 01053 1998
- Estimate a twoways FE SLX panel model
slx.fe <- felm(life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density +
w.gdp_in1000EUR + w.unemployment_rate + w.share_college + w.car_density
| Kennziffer + year | 0 | Kennziffer,
data = inkar.df)
summary(slx.fe)
Call:
felm(formula = life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density + w.gdp_in1000EUR + w.unemployment_rate + w.share_college + w.car_density | Kennziffer + year | 0 | Kennziffer, data = inkar.df)
Residuals:
Min 1Q Median 3Q Max
-1.62945 -0.17351 0.00156 0.17930 1.58230
Coefficients:
Estimate Cluster s.e. t value Pr(>|t|)
gdp_in1000EUR 1.370e-08 4.323e-09 3.170 0.00164 **
unemployment_rate 4.875e-04 1.127e-02 0.043 0.96553
share_college 2.565e-03 1.818e-03 1.411 0.15909
car_density 4.277e-04 3.351e-04 1.276 0.20254
w.gdp_in1000EUR 3.397e-08 1.107e-08 3.069 0.00230 **
w.unemployment_rate -2.848e-02 1.239e-02 -2.299 0.02203 *
w.share_college -4.753e-04 2.506e-03 -0.190 0.84966
w.car_density 1.038e-03 8.283e-04 1.254 0.21072
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2957 on 8770 degrees of freedom
Multiple R-squared(full model): 0.9602 Adjusted R-squared: 0.9582
Multiple R-squared(proj model): 0.02528 Adjusted R-squared: -0.0224
F-statistic(full model, *iid*):492.7 on 429 and 8770 DF, p-value: < 2.2e-16
F-statistic(proj model): 4.508 on 8 and 399 DF, p-value: 2.929e-05
- Estimate a twoways FE SAR panel model (use
spml()
)
### Estimate model
sar.fe <- spml(life_expectancy ~ gdp_in1000EUR + unemployment_rate + share_college + car_density,
data = inkar.df,
index = c("Kennziffer", "year"),
listw = listw,
model= "within",
effect= "twoways",
lag = TRUE,
spatial.error = "none"
)
summary(sar.fe)
Spatial panel fixed effects lag model
Call:
spml(formula = life_expectancy ~ gdp_in1000EUR + unemployment_rate +
share_college + car_density, data = inkar.df, index = c("Kennziffer",
"year"), listw = listw, model = "within", effect = "twoways",
lag = TRUE, spatial.error = "none")
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-1.56935018 -0.16490692 0.00062493 0.16729792 1.38374051
Spatial autoregressive coefficient:
Estimate Std. Error t-value Pr(>|t|)
lambda 0.47997 0.01653 29.037 < 2.2e-16 ***
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
gdp_in1000EUR 1.2031e-08 1.4786e-09 8.1369 4.056e-16 ***
unemployment_rate -1.0767e-02 2.0558e-03 -5.2375 1.627e-07 ***
share_college 1.8501e-03 7.0611e-04 2.6202 0.008788 **
car_density 3.4915e-04 1.1950e-04 2.9218 0.003480 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Estimate the summary impacts.
sar.fe.imp <- impacts(sar.fe, listw = listw, time = length(years), R = 200)
summary(sar.fe.imp, zstats = TRUE, short = TRUE)
Impact measures (lag, trace):
Direct Indirect Total
gdp_in1000EUR 1.236588e-08 1.076990e-08 2.313578e-08
unemployment_rate -1.106695e-02 -9.638614e-03 -2.070556e-02
share_college 1.901619e-03 1.656190e-03 3.557809e-03
car_density 3.588594e-04 3.125439e-04 6.714033e-04
========================================================
Simulation results ( variance matrix):
========================================================
Simulated standard errors
Direct Indirect Total
gdp_in1000EUR 1.580503e-09 1.533227e-09 3.034055e-09
unemployment_rate 2.082529e-03 1.870369e-03 3.904303e-03
share_college 7.529830e-04 6.709583e-04 1.419872e-03
car_density 1.365610e-04 1.231071e-04 2.588810e-04
Simulated z-values:
Direct Indirect Total
gdp_in1000EUR 7.893393 7.065978 7.682551
unemployment_rate -5.331485 -5.150828 -5.311299
share_college 2.595160 2.530844 2.572206
car_density 2.554172 2.467431 2.520689
Simulated p-values:
Direct Indirect Total
gdp_in1000EUR 2.8866e-15 1.5949e-12 1.5543e-14
unemployment_rate 9.7413e-08 2.5934e-07 1.0885e-07
share_college 0.0094547 0.011379 0.010105
car_density 0.0106441 0.013609 0.011713