6 Exercise II

\[ \newcommand{\Exp}{\mathrm{E}} \newcommand\given[1][]{\:#1\vert\:} \newcommand{\Cov}{\mathrm{Cov}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\rank}{\mathrm{rank}} \newcommand{\bm}[1]{\boldsymbol{\mathbf{#1}}} \]

Required packages

pkgs <- c("sf", "mapview", "spdep", "spatialreg", "tmap", "viridisLite") # note: load spdep first, then spatialreg
lapply(pkgs, require, character.only = TRUE)

Session info

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8 
[2] LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] viridisLite_0.4.2 tmap_3.3-3        spatialreg_1.2-9 
[4] Matrix_1.5-4.1    spdep_1.2-8       spData_2.3.0     
[7] mapview_2.11.0    sf_1.0-13        

loaded via a namespace (and not attached):
 [1] xfun_0.39          raster_3.6-23      htmlwidgets_1.6.2 
 [4] lattice_0.21-8     vctrs_0.6.3        tools_4.3.1       
 [7] crosstalk_1.2.0    LearnBayes_2.15.1  generics_0.1.3    
[10] parallel_4.3.1     sandwich_3.0-2     stats4_4.3.1      
[13] tibble_3.2.1       proxy_0.4-27       fansi_1.0.4       
[16] pkgconfig_2.0.3    KernSmooth_2.23-21 satellite_1.0.4   
[19] RColorBrewer_1.1-3 leaflet_2.1.2      webshot_0.5.5     
[22] lifecycle_1.0.3    compiler_4.3.1     deldir_1.0-9      
[25] munsell_0.5.0      terra_1.7-39       leafsync_0.1.0    
[28] codetools_0.2-19   stars_0.6-1        htmltools_0.5.5   
[31] class_7.3-22       pillar_1.9.0       MASS_7.3-60       
[34] classInt_0.4-9     lwgeom_0.2-13      wk_0.7.3          
[37] abind_1.4-5        boot_1.3-28.1      multcomp_1.4-25   
[40] nlme_3.1-162       tidyselect_1.2.0   digest_0.6.32     
[43] mvtnorm_1.2-2      dplyr_1.1.2        splines_4.3.1     
[46] fastmap_1.1.1      grid_4.3.1         colorspace_2.1-0  
[49] expm_0.999-7       cli_3.6.1          magrittr_2.0.3    
[52] base64enc_0.1-3    dichromat_2.0-0.1  XML_3.99-0.14     
[55] survival_3.5-5     utf8_1.2.3         TH.data_1.1-2     
[58] leafem_0.2.0       e1071_1.7-13       scales_1.2.1      
[61] sp_1.6-1           rmarkdown_2.23     zoo_1.8-12        
[64] png_0.1-8          coda_0.19-4        evaluate_0.21     
[67] knitr_1.43         tmaptools_3.1-1    s2_1.1.4          
[70] rlang_1.1.1        Rcpp_1.0.10        glue_1.6.2        
[73] DBI_1.1.3          rstudioapi_0.14    jsonlite_1.8.5    
[76] R6_2.5.1           units_0.8-2

Reload data from pervious session

load("_data/msoa2_spatial.RData")

6.1 Environmental inequality

How would you investigate the following descriptive research question: Are ethnic (and immigrant) minorities in London exposed to higher levels of pollution? Also consider the spatial structure. What’s your dependent and whats your independent variable?

1) Define a neigbours weights object of your choice

Assume a typical neighbourhood would be 2.5km in diameter

coords <- st_centroid(msoa.spdf)

Warning: st_centroid assumes attributes are constant over
geometries

# Neighbours within 3km distance
dist_15.nb <- dnearneigh(coords, d1 = 0, d2 = 2500)

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15266 
Percentage nonzero weights: 1.579859 
Average number of links: 15.53001 
4 regions with no links:
158 463 478 505
Link number distribution:

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
 4  5  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 
22 23 24 25 26 27 28 29 30 31 32 33 34 
25 19 38 29 32 38 26 16 20 10  8  1  2 
5 least connected regions:
160 469 474 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# There are some mpty one. Lets impute with the nearest neighbour
k2.nb <- knearneigh(coords, k = 1)

# Replace zero
nolink_ids <- which(card(dist_15.nb) == 0)
dist_15.nb[card(dist_15.nb) == 0] <- k2.nb$nn[nolink_ids, ]

summary(dist_15.nb)

Neighbour list object:
Number of regions: 983 
Number of nonzero links: 15270 
Percentage nonzero weights: 1.580273 
Average number of links: 15.53408 
Link number distribution:

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 
 9  9 23 19 26 36 31 53 39 61 63 59 48 42 35 24 31 28 30 27 26 25 
23 24 25 26 27 28 29 30 31 32 33 34 
19 38 29 32 38 26 16 20 10  8  1  2 
9 least connected regions:
158 160 463 469 474 478 505 597 959 with 1 link
2 most connected regions:
565 567 with 34 links

# listw object with row-normalization
dist_15.lw <- nb2listw(dist_15.nb, style = "W")

2) Estimate the extent of spatial auto-correlation

moran.test(msoa.spdf$no2, listw = dist_15.lw)


    Moran I test under randomisation

data:  msoa.spdf$no2  
weights: dist_15.lw    

Moran I statistic standard deviate = 65.197, p-value <
2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.891520698      -0.001018330       0.000187411

3) Estimate a spatial SAR regression model

Estimate a spatial autoregressive SAR model

mod_1.sar <- lagsarlm(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                      Durbin = FALSE) # we could here extend to SDM
summary(mod_1.sar)


Call:
lagsarlm(formula = log(no2) ~ per_mixed + per_asian + per_black + 
    per_other + per_nonUK_EU + per_nonEU + log(POPDEN), data = msoa.spdf, 
    listw = dist_15.lw, Durbin = FALSE)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.2140485 -0.0267085 -0.0021421  0.0238337  0.3505513 

Type: lag 
Coefficients: (asymptotic standard errors) 
                Estimate  Std. Error z value  Pr(>|z|)
(Intercept)  -1.7004e-02  1.8122e-02 -0.9383  0.348110
per_mixed     3.4376e-04  1.4758e-03  0.2329  0.815810
per_asian    -8.5205e-05  1.1494e-04 -0.7413  0.458507
per_black    -4.2754e-04  2.3468e-04 -1.8218  0.068484
per_other     1.9693e-03  7.4939e-04  2.6279  0.008591
per_nonUK_EU  8.9027e-04  3.9638e-04  2.2460  0.024703
per_nonEU     1.8460e-03  3.5159e-04  5.2506 1.516e-07
log(POPDEN)   1.8650e-02  2.7852e-03  6.6963 2.138e-11

Rho: 0.9684, LR test value: 2002.5, p-value: < 2.22e-16
Asymptotic standard error: 0.0063124
    z-value: 153.41, p-value: < 2.22e-16
Wald statistic: 23535, p-value: < 2.22e-16

Log likelihood: 1562.401 for lag model
ML residual variance (sigma squared): 0.0020568, (sigma: 0.045352)
Number of observations: 983 
Number of parameters estimated: 10 
AIC: -3104.8, (AIC for lm: -1104.3)
LM test for residual autocorrelation
test value: 108.97, p-value: < 2.22e-16

Have a look into the true multiplier matrix \(({\bm I_N}-\rho {\bm W})^{-1}\beta_k\)

W <- listw2mat(dist_15.lw)
I <- diag(dim(W)[1])

rho <- unname(mod_1.sar$rho)

M <- solve(I - rho*W)

M[1:10, 1:10]

                1           2           3           4           5
 [1,] 1.164650997 0.002433319 0.004089559 0.004034508 0.006545994
 [2,] 0.010706605 1.407336301 0.643881932 0.370049927 0.464794934
 [3,] 0.011246286 0.402426207 1.474021599 0.429011868 0.641526285
 [4,] 0.008875918 0.185024963 0.343209495 1.684533322 0.614086824
 [5,] 0.012000989 0.193664556 0.427684190 0.511739020 1.560840834
 [6,] 0.010741524 0.192552594 0.452940016 0.631452476 0.672787841
 [7,] 0.012779708 0.141953871 0.299247377 0.418234186 0.616895800
 [8,] 0.014769006 0.125781189 0.253122442 0.295553039 0.500919513
 [9,] 0.011708131 0.147549264 0.309080773 0.568442619 0.629156269
[10,] 0.009937859 0.152900148 0.306652041 0.727001926 0.553973310
                6           7          8           9          10
 [1,] 0.004882511 0.005808958 0.00872714 0.005854065 0.003613767
 [2,] 0.385105188 0.283907742 0.32703109 0.324608380 0.244640236
 [3,] 0.566175019 0.374059222 0.41132397 0.424986063 0.306652041
 [4,] 0.631452476 0.418234186 0.38421895 0.625286881 0.581601541
 [5,] 0.560656534 0.514079833 0.54266281 0.576726579 0.369315540
 [6,] 1.571175245 0.558170218 0.46513922 0.661184961 0.543820047
 [7,] 0.558170218 1.475511568 0.58520461 0.614170880 0.463886540
 [8,] 0.357799398 0.450157392 1.46638195 0.474994894 0.272339890
 [9,] 0.601077237 0.558337164 0.56135760 1.581077095 0.517983092
[10,] 0.679775059 0.579858174 0.44255232 0.712226751 1.560083138

Create an \(N \times N\) effects matrix. What is the effect of unit 6 on unit 10?

# For beta 1

beta <- mod_1.sar$coefficients

effM <- beta[2] * M

effM[1:10, 1:10]

                 1            2            3            4
 [1,] 4.003610e-04 8.364789e-07 1.405829e-06 1.386904e-06
 [2,] 3.680507e-06 4.837866e-04 2.213411e-04 1.272085e-04
 [3,] 3.866028e-06 1.383382e-04 5.067103e-04 1.474773e-04
 [4,] 3.051190e-06 6.360427e-05 1.179819e-04 5.790759e-04
 [5,] 4.125465e-06 6.657422e-05 1.470209e-04 1.759156e-04
 [6,] 3.692511e-06 6.619197e-05 1.557029e-04 2.170684e-04
 [7,] 4.393158e-06 4.879813e-05 1.028694e-04 1.437724e-04
 [8,] 5.077000e-06 4.323860e-05 8.701349e-05 1.015994e-04
 [9,] 4.024792e-06 5.072160e-05 1.062497e-04 1.954081e-04
[10,] 3.416243e-06 5.256102e-05 1.054148e-04 2.499145e-04
                 5            6            7            8
 [1,] 2.250254e-06 1.678414e-06 1.996890e-06 3.000045e-06
 [2,] 1.597781e-04 1.323839e-04 9.759625e-05 1.124204e-04
 [3,] 2.205314e-04 1.946286e-04 1.285868e-04 1.413969e-04
 [4,] 2.110988e-04 2.170684e-04 1.437724e-04 1.320793e-04
 [5,] 5.365554e-04 1.927315e-04 1.767203e-04 1.865460e-04
 [6,] 2.312779e-04 5.401079e-04 1.918768e-04 1.598965e-04
 [7,] 2.120644e-04 1.918768e-04 5.072225e-04 2.011702e-04
 [8,] 1.721963e-04 1.229973e-04 1.547463e-04 5.040841e-04
 [9,] 2.162790e-04 2.066266e-04 1.919342e-04 1.929725e-04
[10,] 1.904341e-04 2.336798e-04 1.993323e-04 1.521320e-04
                 9           10
 [1,] 2.012396e-06 1.242270e-06
 [2,] 1.115875e-04 8.409764e-05
 [3,] 1.460934e-04 1.054148e-04
 [4,] 2.149489e-04 1.999316e-04
 [5,] 1.982558e-04 1.269561e-04
 [6,] 2.272892e-04 1.869438e-04
 [7,] 2.111277e-04 1.594658e-04
 [8,] 1.632845e-04 9.361968e-05
 [9,] 5.435118e-04 1.780621e-04
[10,] 2.448354e-04 5.362949e-04

# "Effect" of unit 6 on unit 10
effM[10, 6]

           6 
0.0002336798

Estimate a spatial autoregressive SLX model

mod_1.slx <- lmSLX(log(no2) ~ per_mixed + per_asian + per_black + per_other
                      + per_nonUK_EU + per_nonEU  + log(POPDEN),  
                      data = msoa.spdf, 
                      listw = dist_15.lw,
                  Durbin = TRUE)

Calculate and interpret the summary impact measures for SAR and SLX.

mod_1.sar.imp <- impacts(mod_1.sar, listw = dist_15.lw, R = 300)
summary(mod_1.sar.imp)

Impact measures (lag, exact):
                    Direct     Indirect        Total
per_mixed     0.0004939013  0.010385844  0.010879745
per_asian    -0.0001224192 -0.002574253 -0.002696672
per_black    -0.0006142789 -0.012917166 -0.013531445
per_other     0.0028294759  0.059498722  0.062328198
per_nonUK_EU  0.0012791011  0.026897166  0.028176267
per_nonEU     0.0026523198  0.055773451  0.058425770
log(POPDEN)   0.0267960076  0.563471199  0.590267206
========================================================
Simulation results ( variance matrix):
Direct:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                   Mean        SD  Naive SE Time-series SE
per_mixed     0.0006290 0.0021515 1.242e-04      1.242e-04
per_asian    -0.0001198 0.0001745 1.007e-05      9.506e-06
per_black    -0.0006360 0.0003383 1.953e-05      1.953e-05
per_other     0.0028917 0.0011012 6.358e-05      6.358e-05
per_nonUK_EU  0.0012287 0.0005869 3.388e-05      3.388e-05
per_nonEU     0.0026579 0.0005215 3.011e-05      3.011e-05
log(POPDEN)   0.0268259 0.0039352 2.272e-04      2.272e-04

2. Quantiles for each variable:

                   2.5%        25%        50%        75%     97.5%
per_mixed    -3.827e-03 -0.0007547  0.0006964  2.161e-03 4.419e-03
per_asian    -4.861e-04 -0.0002168 -0.0001131 -5.023e-06 1.970e-04
per_black    -1.298e-03 -0.0008304 -0.0006485 -4.257e-04 6.152e-05
per_other     9.995e-04  0.0020405  0.0028719  3.625e-03 5.070e-03
per_nonUK_EU  3.855e-05  0.0008247  0.0012489  1.635e-03 2.382e-03
per_nonEU     1.665e-03  0.0023344  0.0026625  2.934e-03 3.719e-03
log(POPDEN)   1.916e-02  0.0242278  0.0268096  2.959e-02 3.369e-02

========================================================
Indirect:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.012864 0.050341 0.0029064      0.0029064
per_asian    -0.002726 0.004235 0.0002445      0.0002285
per_black    -0.013862 0.008827 0.0005097      0.0005097
per_other     0.062871 0.029156 0.0016833      0.0016833
per_nonUK_EU  0.026055 0.012921 0.0007460      0.0007460
per_nonEU     0.058273 0.021248 0.0012267      0.0012267
log(POPDEN)   0.579798 0.138802 0.0080137      0.0080137

2. Quantiles for each variable:

                 2.5%      25%       50%        75%    97.5%
per_mixed    -0.08134 -0.01628  0.014219  4.466e-02 0.093136
per_asian    -0.01218 -0.00489 -0.002383 -9.328e-05 0.004075
per_black    -0.02880 -0.01909 -0.013133 -8.606e-03 0.001124
per_other     0.02104  0.04230  0.059895  7.821e-02 0.119970
per_nonUK_EU  0.00100  0.01768  0.025476  3.338e-02 0.053705
per_nonEU     0.03309  0.04539  0.055598  6.570e-02 0.097426
log(POPDEN)   0.36831  0.48458  0.563189  6.568e-01 0.853630

========================================================
Total:

Iterations = 1:300
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 300 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

                  Mean       SD  Naive SE Time-series SE
per_mixed     0.013493 0.052405 0.0030256      0.0030256
per_asian    -0.002846 0.004401 0.0002541      0.0002375
per_black    -0.014498 0.009134 0.0005274      0.0005274
per_other     0.065763 0.030096 0.0017376      0.0017376
per_nonUK_EU  0.027283 0.013461 0.0007772      0.0007772
per_nonEU     0.060931 0.021644 0.0012496      0.0012496
log(POPDEN)   0.606624 0.140899 0.0081348      0.0081348

2. Quantiles for each variable:

                  2.5%       25%       50%       75%    97.5%
per_mixed    -0.084945 -0.017213  0.014974  0.046660 0.097314
per_asian    -0.012624 -0.005098 -0.002513 -0.000098 0.004275
per_black    -0.029797 -0.020036 -0.013809 -0.009053 0.001185
per_other     0.022118  0.044524  0.062740  0.081491 0.124774
per_nonUK_EU  0.001039  0.018489  0.026746  0.035081 0.056067
per_nonEU     0.035101  0.048061  0.058173  0.068653 0.101046
log(POPDEN)   0.390419  0.511445  0.591754  0.683240 0.884687

For SLX, you can just interpret the coefficients. Impacts will give you the same results.

4) Is SAR the right model choice or would you rather estimate a different model?

How do results change once you specify a spatial Durbin model?
Please calculate and interpret the impacts for a spatial Durbin model.

6.2 Life Expecatancy in Germany

Below, we read and transform some characteristics of the INKAR data on German counties.

load("_data/inkar2.Rdata")

Variables are

Variable	Description
“Kennziffer”	ID
“Raumeinheit”	Name
“Aggregat”	Level
“year”	Year
“poluation_density”	Population Density
“median_income”	Median Household income (only for 2020)
“gdp_in1000EUR”	Gross Domestic Product in 1000 euros
“unemployment_rate”	Unemployment rate
“share_longterm_unemployed”	Share of longterm unemployed (among unemployed)
“share_working_indutry”	Share of employees in undistrial sector
“share_foreigners”	Share of foreign nationals
“share_college”	Share of school-finishers with college degree
“recreational_space”	Recreational space per inhabitant
“car_density”	Density of cars
“life_expectancy”	Life expectancy

And we get the respective county shapes:

kreise.spdf <- st_read(dsn = "_data/vg5000_ebenen_1231",
                       layer = "VG5000_KRS")

Reading layer `VG5000_KRS' from data source 
  `C:\work\Lehre\Geodata_Spatial_Regression_short\_data\vg5000_ebenen_1231' 
  using driver `ESRI Shapefile'
Simple feature collection with 400 features and 24 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280353.1 ymin: 5235878 xmax: 921261.6 ymax: 6101302
Projected CRS: ETRS89 / UTM zone 32N

6.2.1 1) Merge data with the shape file (as with conventional data)

# Merge
inkar_2020.spdf <- merge(kreise.spdf, inkar.df[inkar.df$year == 2020, ], 
                         by.x = "AGS", by.y = "Kennziffer")

6.2.2 2) Create a map of life-expectancy

cols <- viridis(n = 100, direction = -1, option = "G")

mp1 <-  tm_shape(inkar_2020.spdf) + 
  tm_fill(col = "life_expectancy", 
          style = "cont", # algorithm to def cut points
          palette = cols, # colours
          stretch.palette = TRUE,
          title = "in years"
          ) +
  tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
  tm_layout(frame = FALSE,
            legend.frame = TRUE, legend.bg.color = TRUE,
            legend.position = c("right", "bottom"),
            legend.outside = FALSE,
            main.title = "Life expectancy", 
            main.title.position = "center",
            main.title.size = 1.6,
            legend.title.size = 0.8,
            legend.text.size = 0.8)

mp1

6.2.3 3) Chose some variables that could predict life expectancy. See for instance the following paper.

6.2.4 4) Generate a neighbours object (e.g. the 10 nearest neighbours).

# nb <- poly2nb(kreise.spdf, row.names = "ags", queen = TRUE)
knn <- knearneigh(st_centroid(kreise.spdf), k = 10)

Warning: st_centroid assumes attributes are constant over
geometries

nb <- knn2nb(knn, row.names = kreise.spdf$ags)
listw <- nb2listw(nb, style = "W")

6.2.5 5) Estimate a cross-sectional spatial model for the year 2020 and calculate the impacts.

### Use a spatial Durbin Error model

# Spec formula
fm <- life_expectancy ~ median_income + unemployment_rate + share_college + car_density

# Estimate error model with Durbin = TRUE 
mod_1.durb <- errorsarlm(fm,  
                      data = inkar_2020.spdf, 
                      listw = listw,
                      Durbin = TRUE)

summary(mod_1.durb)


Call:
errorsarlm(formula = fm, data = inkar_2020.spdf, listw = listw, 
    Durbin = TRUE)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.343988 -0.349564  0.013309  0.333105  1.819014 

Type: error 
Coefficients: (asymptotic standard errors) 
                         Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            8.4970e+01  1.4366e+00  59.1460 < 2.2e-16
median_income          5.4013e-04  8.2285e-05   6.5642 5.233e-11
unemployment_rate     -3.8970e-01  2.0095e-02 -19.3923 < 2.2e-16
share_college          6.7806e-03  3.2502e-03   2.0862  0.036957
car_density           -3.2042e-03  4.9774e-04  -6.4376 1.214e-10
lag.median_income      4.9282e-04  1.8112e-04   2.7209  0.006510
lag.unemployment_rate -3.4685e-02  4.5454e-02  -0.7631  0.445415
lag.share_college     -1.7066e-03  7.0324e-03  -0.2427  0.808256
lag.car_density       -5.2210e-03  1.7540e-03  -2.9766  0.002915

Lambda: 0.57895, LR test value: 48.146, p-value: 3.9563e-12
Asymptotic standard error: 0.069524
    z-value: 8.3274, p-value: < 2.22e-16
Wald statistic: 69.345, p-value: < 2.22e-16

Log likelihood: -305.6855 for error model
ML residual variance (sigma squared): 0.26001, (sigma: 0.50991)
Number of observations: 400 
Number of parameters estimated: 11 
AIC: NA (not available for weighted model), (AIC for lm: 679.52)

# Calculate impacts (which is unnecessary in this case)
mod_1.durb.imp <- impacts(mod_1.durb, listw = listw, R = 300)
summary(mod_1.durb.imp, zstats = TRUE, short = TRUE)

Impact measures (SDEM, glht, n):
                         Direct      Indirect        Total
median_income      0.0005401288  0.0004928219  0.001032951
unemployment_rate -0.3896966870 -0.0346850573 -0.424381744
share_college      0.0067806074 -0.0017065824  0.005074025
car_density       -0.0032042421 -0.0052210252 -0.008425267
========================================================
Standard errors:
                        Direct     Indirect        Total
median_income     8.228463e-05 0.0001811231 0.0001813201
unemployment_rate 2.009540e-02 0.0454539507 0.0455753453
share_college     3.250157e-03 0.0070323527 0.0069549479
car_density       4.977409e-04 0.0017540485 0.0018942462
========================================================
Z-values:
                      Direct   Indirect      Total
median_income       6.564152  2.7209218  5.6968356
unemployment_rate -19.392336 -0.7630812 -9.3116518
share_college       2.086240 -0.2426759  0.7295561
car_density        -6.437570 -2.9765569 -4.4478205

p-values:
                  Direct     Indirect  Total     
median_income     5.233e-11  0.0065100 1.2205e-08
unemployment_rate < 2.22e-16 0.4454149 < 2.22e-16
share_college     0.036957   0.8082565 0.46566   
car_density       1.214e-10  0.0029151 8.6746e-06

6.2.6 6) Calculate the spatial lagged variables for your covariates (e.g. use create_WX(), which needs a non-spatial df as input) .

# Extract covariate names
covars <- attr(terms(fm),"term.labels")

w_vars <- create_WX(st_drop_geometry(inkar_2020.spdf)[, covars],
                    listw = listw,
                    prefix = "w")

inkar_2020.spdf <- cbind(inkar_2020.spdf, w_vars)

6.2.7 6) Can you run a spatial machine learning model? (for instance, using `randomForest`)?

library(randomForest)

Warning: package 'randomForest' was built under R version 4.3.2

randomForest 4.7-1.1

Type rfNews() to see new features/changes/bug fixes.

# Train
rf.mod <- randomForest(life_expectancy ~ median_income + unemployment_rate + share_college + car_density +
                         w.median_income + w.unemployment_rate + w.share_college + w.car_density,
                       data = st_drop_geometry(inkar_2020.spdf), 
                       ntree = 1000,
                       importance = TRUE)

# Inspect the mechanics of the model
importance(rf.mod)

                     %IncMSE IncNodePurity
median_income       33.60659      40.78685
unemployment_rate   64.62806     114.71155
share_college       22.82387      26.27735
car_density         23.51385      32.96287
w.median_income     36.80049      58.60537
w.unemployment_rate 25.87530      53.53837
w.share_college     17.79590      23.29938
w.car_density       22.88666      30.58426

varImpPlot(rf.mod)

You could even go further and use higher order neighbours (e.g. nblag(queens.nb, maxlag = 3)) to check the importance of direct neighbours and the neighbours neighbours and so on …

# Create higher order NB object
listw.lag <- nblag(nb, maxlag = 3)


# Create listwise of 1st, 2nd and 3rd order neighbours
listw.lw1 <- nb2listw(listw.lag[[1]], style = "W")
listw.lw2 <- nb2listw(listw.lag[[2]], style = "W")
listw.lw3 <- nb2listw(listw.lag[[3]], style = "W")

# Create lagged X
w_vars2 <- create_WX(st_drop_geometry(inkar_2020.spdf)[, covars],
                    listw = listw.lw2,
                    prefix = "w2")

w_vars3 <- create_WX(st_drop_geometry(inkar_2020.spdf)[, covars],
                    listw = listw.lw3,
                    prefix = "w3")

inkar_2020.spdf <- cbind(inkar_2020.spdf, w_vars2, w_vars3)

# Train
rf.mod <- randomForest(life_expectancy ~ median_income + unemployment_rate + share_college + car_density +
                         w.median_income + w.unemployment_rate + w.share_college + w.car_density +
                         w2.median_income + w2.unemployment_rate + w2.share_college + w2.car_density +
                         w3.median_income + w3.unemployment_rate + w3.share_college + w3.car_density,
                       data = st_drop_geometry(inkar_2020.spdf), 
                       ntree = 1000,
                       importance = TRUE)

# Inspect the mechanics of the model
importance(rf.mod)

                       %IncMSE IncNodePurity
median_income        27.698772     23.107581
unemployment_rate    61.408198    111.652254
share_college        22.900494     17.460207
car_density          19.633700     16.759375
w.median_income      28.809540     36.364268
w.unemployment_rate  16.679323     24.988638
w.share_college      11.884784      9.512635
w.car_density        17.127785     13.527587
w2.median_income     21.241519     19.556860
w2.unemployment_rate 14.932238     24.966208
w2.share_college     14.712295     12.168590
w2.car_density       11.148194     12.264224
w3.median_income     16.685965     16.188840
w3.unemployment_rate 17.785560     25.537481
w3.share_college      8.743832     10.353950
w3.car_density        8.991503     12.893037

varImpPlot(rf.mod)