9 Exercises II
\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \]
Required packages
Session info
R version 4.6.0 (2026-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] viridisLite_0.4.3 tmap_4.4 spatialreg_1.4-3
[4] Matrix_1.7-5 spdep_1.4-2 spData_2.3.5
[7] mapview_2.11.4 sf_1.1-1
loaded via a namespace (and not attached):
[1] xfun_0.57 raster_3.6-32
[3] htmlwidgets_1.6.4 lattice_0.22-9
[5] tools_4.6.0 crosstalk_1.2.2
[7] LearnBayes_2.15.2 generics_0.1.4
[9] parallel_4.6.0 stats4_4.6.0
[11] sandwich_3.1-1 proxy_0.4-29
[13] KernSmooth_2.23-26 satellite_1.0.6
[15] data.table_1.18.4 RColorBrewer_1.1-3
[17] leaflet_2.2.3 lifecycle_1.0.5
[19] compiler_4.6.0 farver_2.1.2
[21] deldir_2.0-4 microbenchmark_1.5.0
[23] terra_1.9-27 leafsync_0.1.0
[25] codetools_0.2-20 leaflegend_1.2.8
[27] stars_0.7-2 htmltools_0.5.9
[29] marginaleffects_0.32.0 class_7.3-23
[31] MASS_7.3-65 classInt_0.4-11
[33] lwgeom_0.2-16 wk_0.9.5
[35] abind_1.4-8 boot_1.3-32
[37] multcomp_1.4-30 nlme_3.1-169
[39] digest_0.6.39 mvtnorm_1.3-7
[41] splines_4.6.0 fastmap_1.2.0
[43] grid_4.6.0 colorspace_2.1-2
[45] cli_3.6.6 logger_0.4.2
[47] magrittr_2.0.5 maptiles_0.11.0
[49] base64enc_0.1-6 XML_3.99-0.23
[51] cols4all_0.10 survival_3.8-6
[53] leafem_0.2.5 TH.data_1.1-5
[55] e1071_1.7-17 scales_1.4.0
[57] backports_1.5.1 sp_2.2-1
[59] rmarkdown_2.31 otel_0.2.0
[61] zoo_1.8-15 png_0.1-9
[63] coda_0.19-4.1 evaluate_1.0.5
[65] knitr_1.51 tmaptools_3.3
[67] s2_1.1.11 rlang_1.2.0
[69] Rcpp_1.1.1-1.1 glue_1.8.1
[71] DBI_1.3.0 rstudioapi_0.18.0
[73] jsonlite_2.0.0 spacesXYZ_1.6-0
[75] R6_2.6.1 units_1.0-1
Reload data from pervious session
load("_data/msoa2_spatial.RData")9.1 Environmental inequality
We would like to investigate the following descriptive research question: Are immigrant minorities in London exposed to higher levels of pollution? One could for instance do the following: use pollution as the outcome, and test if demographics (e.g. the share of immigrant minorities) predicts the level of pollution. We also want to consider the spatial structure: is it only the focal unit that matters or spatial clusters with high shares of minorities?
1) Define a neigbours weights object the includes everything within a given distance as neighbours.
Let’s assume a typical neighbourhood would be 2.5km in diameter, so we can use distance based neighbours.
coords <- st_centroid(msoa.spdf)Warning: st_centroid assumes attributes are constant over
geometries
# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)Warning in dnearneigh(coords, d1 = 0, d2 = 2500): neighbour object
has 6 sub-graphs
summary(dist_15.nb)Neighbour list object:
Number of regions: 983
Number of nonzero links: 15266
Percentage nonzero weights: 1.579859
Average number of links: 15.53001
4 regions with no links:
158, 463, 478, 505
6 disjoint connected subgraphs
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
4 5 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26
22 23 24 25 26 27 28 29 30 31 32 33 34
25 19 38 29 32 38 26 16 20 10 8 1 2
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
Neighbour list object:
Number of regions: 983
Number of nonzero links: 15270
Percentage nonzero weights: 1.580273
Average number of links: 15.53408
6 disjoint connected subgraphs
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
9 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25
23 24 25 26 27 28 29 30 31 32 33 34
19 38 29 32 38 26 16 20 10 8 1 2
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")2) Estimate the level of spatial auto-correlation in air pollution using the spatial weights matrix you have created above.
moran.test(msoa.spdf$no2, listw = dist_15.lw)
Moran I test under randomisation
data: msoa.spdf$no2
weights: dist_15.lw
Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.891520698 -0.001018330 0.000187411
4) Estimate a Spatial SEM regression model for the same model as above.
Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = FALSE)
Residuals:
Min 1Q Median 3Q Max
-0.207751 -0.027321 -0.004065 0.023338 0.353188
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.52996667 0.36112709 7.0058 2.457e-12
per_mixed 0.00368862 0.00222547 1.6575 0.097427
per_asian -0.00019513 0.00024168 -0.8074 0.419428
per_black -0.00041368 0.00034777 -1.1895 0.234235
per_other 0.00295571 0.00114515 2.5811 0.009849
per_nonUK_EU 0.00154967 0.00059535 2.6030 0.009243
per_nonEU 0.00097237 0.00053695 1.8109 0.070153
log(POPDEN) 0.02738580 0.00326415 8.3899 < 2.2e-16
Lambda: 0.99598, LR test value: 1964.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0018225
z-value: 546.48, p-value: < 2.22e-16
Wald statistic: 298640, p-value: < 2.22e-16
Log likelihood: 1543.617 for error model
ML residual variance (sigma squared): 0.0020635, (sigma: 0.045426)
Number of observations: 983
Number of parameters estimated: 10
AIC: -3067.2, (AIC for lm: -1104.3)
5) Estimate a Spatial SLX regression model for the same model as above.
Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
data = as.data.frame(x), weights = weights)
Coefficients:
Estimate Std. Error t value
(Intercept) 1.943e+00 4.115e-02 4.720e+01
per_mixed 4.910e-03 5.282e-03 9.296e-01
per_asian -1.357e-03 5.779e-04 -2.348e+00
per_black -1.421e-03 8.321e-04 -1.708e+00
per_other 4.510e-05 2.726e-03 1.655e-02
per_nonUK_EU 1.821e-04 1.436e-03 1.269e-01
per_nonEU 1.605e-03 1.272e-03 1.262e+00
log.POPDEN. 4.202e-02 7.736e-03 5.432e+00
lag.per_mixed -1.251e-02 7.626e-03 -1.641e+00
lag.per_asian 1.080e-03 6.956e-04 1.552e+00
lag.per_black -1.395e-03 1.306e-03 -1.069e+00
lag.per_other 1.915e-02 4.229e-03 4.528e+00
lag.per_nonUK_EU 1.038e-02 2.406e-03 4.314e+00
lag.per_nonEU 7.459e-03 1.891e-03 3.945e+00
lag.log.POPDEN. 2.332e-01 1.400e-02 1.666e+01
Pr(>|t|)
(Intercept) 2.535e-253
per_mixed 3.528e-01
per_asian 1.907e-02
per_black 8.795e-02
per_other 9.868e-01
per_nonUK_EU 8.991e-01
per_nonEU 2.073e-01
log.POPDEN. 7.064e-08
lag.per_mixed 1.011e-01
lag.per_asian 1.210e-01
lag.per_black 2.856e-01
lag.per_other 6.707e-06
lag.per_nonUK_EU 1.767e-05
lag.per_nonEU 8.560e-05
lag.log.POPDEN. 5.786e-55
6) Estimate a Spatial Durbin regression model for the same model as above.
Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1854009 -0.0263818 -0.0020816 0.0229647 0.3321974
Type: mixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.00409824 0.01983728 -0.2066 0.83633
per_mixed 0.00434535 0.00218712 1.9868 0.04695
per_asian -0.00028620 0.00023959 -1.1945 0.23227
per_black -0.00056734 0.00034455 -1.6466 0.09964
per_other 0.00222708 0.00112918 1.9723 0.04857
per_nonUK_EU 0.00085417 0.00059478 1.4361 0.15097
per_nonEU 0.00095220 0.00052681 1.8075 0.07069
log(POPDEN) 0.02649122 0.00320358 8.2693 2.220e-16
lag.per_mixed -0.00475294 0.00315799 -1.5051 0.13231
lag.per_asian 0.00024092 0.00028983 0.8312 0.40584
lag.per_black 0.00025812 0.00054125 0.4769 0.63344
lag.per_other -0.00074506 0.00176141 -0.4230 0.67230
lag.per_nonUK_EU 0.00094549 0.00100320 0.9425 0.34595
lag.per_nonEU 0.00130970 0.00078547 1.6674 0.09544
lag.log(POPDEN) -0.02526415 0.00588517 -4.2928 1.764e-05
Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16
Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16
7) Estimate a Spatial Durbin Error regression model for the same model as above.
Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1839285 -0.0254426 -0.0027042 0.0216084 0.2944840
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.64939215 0.24748370 10.7053 < 2.2e-16
per_mixed 0.00553333 0.00223688 2.4737 0.013373
per_asian -0.00017156 0.00024183 -0.7094 0.478062
per_black -0.00057947 0.00034426 -1.6832 0.092334
per_other 0.00203392 0.00112534 1.8074 0.070701
per_nonUK_EU 0.00086254 0.00058902 1.4644 0.143091
per_nonEU 0.00135822 0.00053480 2.5397 0.011096
log(POPDEN) 0.02716824 0.00354239 7.6695 1.732e-14
lag.per_mixed 0.00107140 0.00819322 0.1308 0.895960
lag.per_asian -0.00060616 0.00070080 -0.8649 0.387069
lag.per_black -0.00191733 0.00130997 -1.4636 0.143291
lag.per_other 0.01014125 0.00496979 2.0406 0.041293
lag.per_nonUK_EU 0.00925620 0.00217624 4.2533 2.106e-05
lag.per_nonEU 0.00563564 0.00185541 3.0374 0.002386
lag.log(POPDEN) -0.01370891 0.01128957 -1.2143 0.224634
Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16
Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3110.1, (AIC for lm: -1584.3)
8) Sneak preview on future session: Which of the spatial model specifications about would you choose / prefer in a real world example?
9) If we model spatial dependence in statistical models, it is sometimes helpful to calculate spatial lags by hand. Please calculate the spatial lag value of the median house price. See lag.listw().
# Use lag.listw to lag variable
splag <- lag.listw(dist_15.lw,
var = msoa.spdf$med_house_price)
msoa.spdf$w.med_house_price <- splag
# Compare
summary(msoa.spdf$med_house_price) Min. 1st Qu. Median Mean 3rd Qu. Max.
126494 224406 264188 315562 357937 1601000
summary(msoa.spdf$w.med_house_price) Min. 1st Qu. Median Mean 3rd Qu. Max.
160320 233656 285488 314505 352936 794942
10) Can you use the results of the previous task to run a non-linear SLX model, where you predict if an MSOA is within the ulez zone based on the house prices? Can you make sense of the result?
Call:
glm(formula = ulez ~ med_house_price + w.med_house_price, family = "binomial",
data = msoa.spdf)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.029e+00 5.704e-01 -10.570 < 2e-16 ***
med_house_price 1.572e-06 1.179e-06 1.333 0.18248
w.med_house_price 5.107e-06 1.955e-06 2.612 0.00899 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 254.47 on 982 degrees of freedom
Residual deviance: 221.56 on 980 degrees of freedom
AIC: 227.56
Number of Fisher Scoring iterations: 7
### Distance to city center
# Define centre
centre <- st_as_sf(data.frame(lon = -0.128120855701165,
lat = 51.50725909644806),
coords = c("lon", "lat"),
crs = 4326)
# Reproject
centre <- st_transform(centre, crs = st_crs(msoa.spdf))
# Calculate distance
msoa.spdf$dist_centre <- as.numeric(st_distance(msoa.spdf, centre)) / 1000
# hist(msoa.spdf$dist_centre)
### Add distance to city centre
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price + dist_centre,
data = msoa.spdf,
family = "binomial")Warning: glm.fit: Angepasste Wahrscheinlichkeiten mit numerischem
Wert 0 oder 1 aufgetreten
summary(mod_2.log)
Call:
glm(formula = ulez ~ med_house_price + w.med_house_price + dist_centre,
family = "binomial", data = msoa.spdf)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.228e+01 3.042e+00 4.038 5.38e-05 ***
med_house_price 7.024e-06 2.981e-06 2.357 0.0184 *
w.med_house_price -1.883e-05 6.177e-06 -3.048 0.0023 **
dist_centre -3.170e+00 6.769e-01 -4.684 2.82e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 254.465 on 982 degrees of freedom
Residual deviance: 50.485 on 979 degrees of freedom
AIC: 58.485
Number of Fisher Scoring iterations: 12