8 Exercises II
\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \]
Required packages
Session info
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-4 spatialreg_1.3-4
[4] Matrix_1.7-0 spdep_1.3-5 spData_2.3.1
[7] mapview_2.11.2 sf_1.0-16
loaded via a namespace (and not attached):
[1] xfun_0.45 raster_3.6-26 htmlwidgets_1.6.4
[4] lattice_0.22-6 tools_4.4.1 crosstalk_1.2.1
[7] LearnBayes_2.15.1 parallel_4.4.1 stats4_4.4.1
[10] sandwich_3.1-0 proxy_0.4-27 KernSmooth_2.23-24
[13] satellite_1.0.5 RColorBrewer_1.1-3 leaflet_2.2.2
[16] lifecycle_1.0.4 compiler_4.4.1 deldir_2.0-4
[19] munsell_0.5.1 terra_1.7-78 codetools_0.2-20
[22] leafsync_0.1.0 stars_0.6-5 htmltools_0.5.8.1
[25] class_7.3-22 MASS_7.3-60.2 classInt_0.4-10
[28] lwgeom_0.2-14 wk_0.9.1 abind_1.4-5
[31] boot_1.3-30 multcomp_1.4-25 nlme_3.1-164
[34] digest_0.6.35 mvtnorm_1.2-5 splines_4.4.1
[37] fastmap_1.2.0 grid_4.4.1 colorspace_2.1-0
[40] cli_3.6.2 magrittr_2.0.3 base64enc_0.1-3
[43] dichromat_2.0-0.1 XML_3.99-0.16.1 survival_3.6-4
[46] leafem_0.2.3 TH.data_1.1-2 e1071_1.7-14
[49] scales_1.3.0 sp_2.1-4 rmarkdown_2.27
[52] zoo_1.8-12 png_0.1-8 coda_0.19-4.1
[55] evaluate_0.24.0 knitr_1.47 tmaptools_3.1-1
[58] s2_1.1.6 rlang_1.1.4 Rcpp_1.0.12
[61] glue_1.7.0 DBI_1.2.3 rstudioapi_0.16.0
[64] jsonlite_1.8.8 R6_2.5.1 units_0.8-5
Reload data from pervious session
load("_data/msoa2_spatial.RData")
8.1 Environmental inequality
How would you investigate the following descriptive research question: Are immigrant minorities in London exposed to higher levels of pollution? Also consider the spatial structure. What’s your dependent and what is your independent variable?
1) Define a neigbours weights object of your choice
Assume a typical neighbourhood would be 2.5km in diameter
coords <- st_centroid(msoa.spdf)
Warning: st_centroid assumes attributes are constant over
geometries
# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)
summary(dist_15.nb)
Neighbour list object:
Number of regions: 983
Number of nonzero links: 15266
Percentage nonzero weights: 1.579859
Average number of links: 15.53001
4 regions with no links:
158 463 478 505
6 disjoint connected subgraphs
Link number distribution:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
4 5 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26
22 23 24 25 26 27 28 29 30 31 32 33 34
25 19 38 29 32 38 26 16 20 10 8 1 2
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# There are some empty neighbour sets. Lets impute those with the nearest neighbour.
k2.nb <- knearneigh(coords, k = 1)
# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]
summary(dist_15.nb)
Neighbour list object:
Number of regions: 983
Number of nonzero links: 15270
Percentage nonzero weights: 1.580273
Average number of links: 15.53408
2 disjoint connected subgraphs
Link number distribution:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
9 9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25
23 24 25 26 27 28 29 30 31 32 33 34
19 38 29 32 38 26 16 20 10 8 1 2
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links
# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")
2) Estimate the extent of spatial auto-correlation in air pollution
moran.test(msoa.spdf$no2, listw = dist_15.lw)
Moran I test under randomisation
data: msoa.spdf$no2
weights: dist_15.lw
Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.891520698 -0.001018330 0.000187411
3) Estimate a Spatial SAR regression model
mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)
Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = FALSE)
Residuals:
Min 1Q Median 3Q Max
-0.2140485 -0.0267085 -0.0021421 0.0238337 0.3505513
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7004e-02 1.8122e-02 -0.9383 0.348110
per_mixed 3.4376e-04 1.4758e-03 0.2329 0.815810
per_asian -8.5205e-05 1.1494e-04 -0.7413 0.458507
per_black -4.2754e-04 2.3468e-04 -1.8218 0.068484
per_other 1.9693e-03 7.4939e-04 2.6279 0.008591
per_nonUK_EU 8.9027e-04 3.9638e-04 2.2460 0.024703
per_nonEU 1.8460e-03 3.5159e-04 5.2506 1.516e-07
log(POPDEN) 1.8650e-02 2.7852e-03 6.6963 2.138e-11
Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16
Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983
Number of parameters estimated: 10
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16
4) Estimate a Spatial SEM regression model
mod_1.sem <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = FALSE) # we could here extend to SDEM
summary(mod_1.sem)
Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = FALSE)
Residuals:
Min 1Q Median 3Q Max
-0.207751 -0.027321 -0.004065 0.023338 0.353188
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.52996667 0.36112709 7.0058 2.457e-12
per_mixed 0.00368862 0.00222547 1.6575 0.097427
per_asian -0.00019513 0.00024168 -0.8074 0.419428
per_black -0.00041368 0.00034777 -1.1895 0.234235
per_other 0.00295571 0.00114515 2.5811 0.009849
per_nonUK_EU 0.00154967 0.00059535 2.6030 0.009243
per_nonEU 0.00097237 0.00053695 1.8109 0.070153
log(POPDEN) 0.02738580 0.00326415 8.3899 < 2.2e-16
Lambda: 0.99598, LR test value: 1964.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0018225
z-value: 546.48, p-value: < 2.22e-16
Wald statistic: 298640, p-value: < 2.22e-16
Log likelihood: 1543.617 for error model
ML residual variance (sigma squared): 0.0020635, (sigma: 0.045426)
Number of observations: 983
Number of parameters estimated: 10
AIC: -3067.2, (AIC for lm: -1104.3)
5) Estimate a Spatial SLX regression model
mod_1.slx <- lmSLX(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw)
summary(mod_1.slx)
Call:
lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
data = as.data.frame(x), weights = weights)
Coefficients:
Estimate Std. Error t value
(Intercept) 1.943e+00 4.115e-02 4.720e+01
per_mixed 4.910e-03 5.282e-03 9.296e-01
per_asian -1.357e-03 5.779e-04 -2.348e+00
per_black -1.421e-03 8.321e-04 -1.708e+00
per_other 4.510e-05 2.726e-03 1.655e-02
per_nonUK_EU 1.821e-04 1.436e-03 1.269e-01
per_nonEU 1.605e-03 1.272e-03 1.262e+00
log.POPDEN. 4.202e-02 7.736e-03 5.432e+00
lag.per_mixed -1.251e-02 7.626e-03 -1.641e+00
lag.per_asian 1.080e-03 6.956e-04 1.552e+00
lag.per_black -1.395e-03 1.306e-03 -1.069e+00
lag.per_other 1.915e-02 4.229e-03 4.528e+00
lag.per_nonUK_EU 1.038e-02 2.406e-03 4.314e+00
lag.per_nonEU 7.459e-03 1.891e-03 3.945e+00
lag.log.POPDEN. 2.332e-01 1.400e-02 1.666e+01
Pr(>|t|)
(Intercept) 2.535e-253
per_mixed 3.528e-01
per_asian 1.907e-02
per_black 8.795e-02
per_other 9.868e-01
per_nonUK_EU 8.991e-01
per_nonEU 2.073e-01
log.POPDEN. 7.064e-08
lag.per_mixed 1.011e-01
lag.per_asian 1.210e-01
lag.per_black 2.856e-01
lag.per_other 6.707e-06
lag.per_nonUK_EU 1.767e-05
lag.per_nonEU 8.560e-05
lag.log.POPDEN. 5.786e-55
6) Estimate a Spatial Durbin regression model
mod_1.dub <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dub)
Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1854009 -0.0263818 -0.0020816 0.0229647 0.3321974
Type: mixed
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.00409824 0.01983728 -0.2066 0.83633
per_mixed 0.00434535 0.00218712 1.9868 0.04695
per_asian -0.00028620 0.00023959 -1.1945 0.23227
per_black -0.00056734 0.00034455 -1.6466 0.09964
per_other 0.00222708 0.00112918 1.9723 0.04857
per_nonUK_EU 0.00085417 0.00059478 1.4361 0.15097
per_nonEU 0.00095220 0.00052681 1.8075 0.07069
log(POPDEN) 0.02649122 0.00320358 8.2693 2.220e-16
lag.per_mixed -0.00475294 0.00315799 -1.5051 0.13231
lag.per_asian 0.00024092 0.00028983 0.8312 0.40584
lag.per_black 0.00025812 0.00054125 0.4769 0.63344
lag.per_other -0.00074506 0.00176141 -0.4230 0.67230
lag.per_nonUK_EU 0.00094549 0.00100320 0.9425 0.34595
lag.per_nonEU 0.00130970 0.00078547 1.6674 0.09544
lag.log(POPDEN) -0.02526415 0.00588517 -4.2928 1.764e-05
Rho: 0.98286, LR test value: 1536.9, p-value: < 2.22e-16
Asymptotic standard error: 0.0051804
z-value: 189.73, p-value: < 2.22e-16
Wald statistic: 35997, p-value: < 2.22e-16
Log likelihood: 1576.566 for mixed model
ML residual variance (sigma squared): 0.001969, (sigma: 0.044374)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3119.1, (AIC for lm: -1584.3)
LM test for residual autocorrelation
test value: 103.97, p-value: < 2.22e-16
7) Estimate a Spatial Durbin Error regression model
mod_1.dube <- errorsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
+ per_nonUK_EU + per_nonEU + log(POPDEN),
data = msoa.spdf,
listw = dist_15.lw,
Durbin = TRUE) # Extend here to Durbin
summary(mod_1.dube)
Call:
errorsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black +
per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf,
listw = dist_15.lw, Durbin = TRUE)
Residuals:
Min 1Q Median 3Q Max
-0.1839285 -0.0254426 -0.0027042 0.0216084 0.2944840
Type: error
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.64939215 0.24748370 10.7053 < 2.2e-16
per_mixed 0.00553333 0.00223688 2.4737 0.013373
per_asian -0.00017156 0.00024183 -0.7094 0.478062
per_black -0.00057947 0.00034426 -1.6832 0.092334
per_other 0.00203392 0.00112534 1.8074 0.070701
per_nonUK_EU 0.00086254 0.00058902 1.4644 0.143091
per_nonEU 0.00135822 0.00053480 2.5397 0.011096
log(POPDEN) 0.02716824 0.00354239 7.6695 1.732e-14
lag.per_mixed 0.00107140 0.00819322 0.1308 0.895960
lag.per_asian -0.00060616 0.00070080 -0.8649 0.387069
lag.per_black -0.00191733 0.00130997 -1.4636 0.143291
lag.per_other 0.01014125 0.00496979 2.0406 0.041293
lag.per_nonUK_EU 0.00925620 0.00217624 4.2533 2.106e-05
lag.per_nonEU 0.00563564 0.00185541 3.0374 0.002386
lag.log(POPDEN) -0.01370891 0.01128957 -1.2143 0.224634
Lambda: 0.99424, LR test value: 1527.8, p-value: < 2.22e-16
Asymptotic standard error: 0.0024921
z-value: 398.96, p-value: < 2.22e-16
Wald statistic: 159170, p-value: < 2.22e-16
Log likelihood: 1572.051 for error model
ML residual variance (sigma squared): 0.0019546, (sigma: 0.044211)
Number of observations: 983
Number of parameters estimated: 17
AIC: -3110.1, (AIC for lm: -1584.3)
8.1.1 8) Sneak preview on tomorrow: Which of the spatial model specifications about would you choose / prefer in a real world example?
8.1.2 9) Please calculate the spatially lagged value of the median house price.
# Use lag.listw to lag variable
splag <- lag.listw(dist_15.lw,
var = msoa.spdf$med_house_price)
msoa.spdf$w.med_house_price <- splag
# Compare
summary(msoa.spdf$med_house_price)
Min. 1st Qu. Median Mean 3rd Qu. Max.
126494 224406 264188 315562 357937 1601000
summary(msoa.spdf$w.med_house_price)
Min. 1st Qu. Median Mean 3rd Qu. Max.
160320 233656 285488 314505 352936 794943
8.1.3 10) Can you use the results of the previous task to run a non-linear SLX model, where you predict if an MSOA is within the ulez zone based on the house prices? Can you make sense of the result?
### Calculate SLX logit
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price,
data = msoa.spdf,
family = "binomial")
summary(mod_2.log)
Call:
glm(formula = ulez ~ med_house_price + w.med_house_price, family = "binomial",
data = msoa.spdf)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.029e+00 5.704e-01 -10.570 < 2e-16 ***
med_house_price 1.572e-06 1.179e-06 1.333 0.18248
w.med_house_price 5.107e-06 1.955e-06 2.612 0.00899 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 254.47 on 982 degrees of freedom
Residual deviance: 221.56 on 980 degrees of freedom
AIC: 227.56
Number of Fisher Scoring iterations: 7
### Distance to city center
# Define centre
centre <- st_as_sf(data.frame(lon = -0.128120855701165,
lat = 51.50725909644806),
coords = c("lon", "lat"),
crs = 4326)
# Reproject
centre <- st_transform(centre, crs = st_crs(msoa.spdf))
# Calculate distance
msoa.spdf$dist_centre <- as.numeric(st_distance(msoa.spdf, centre)) / 1000
# hist(msoa.spdf$dist_centre)
### Add distance to city centre
mod_2.log <- glm(ulez ~ med_house_price + w.med_house_price + dist_centre,
data = msoa.spdf,
family = "binomial")
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod_2.log)
Call:
glm(formula = ulez ~ med_house_price + w.med_house_price + dist_centre,
family = "binomial", data = msoa.spdf)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.228e+01 3.042e+00 4.038 5.38e-05 ***
med_house_price 7.024e-06 2.981e-06 2.357 0.0184 *
w.med_house_price -1.883e-05 6.177e-06 -3.048 0.0023 **
dist_centre -3.170e+00 6.769e-01 -4.684 2.82e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 254.465 on 982 degrees of freedom
Residual deviance: 50.485 on 979 degrees of freedom
AIC: 58.485
Number of Fisher Scoring iterations: 12