::p_load(sf, tidyverse, tmap, spdep, funModeling, purrr, here) pacman
Take Home Ex-1
Overview
Step 1: Load the required Packages
sf : Simple features for R. We import this package to help us read the aspatial and geospatial data.
tidyverse: This package help us to transform and better present the data.
tmap: We use this package to plot thematic maps
spdep: We use this package to help us obtain the spatial weights.
funModeling: We use this package to help us with EDA
here: Helps us generate a path to a specific directory on the root
Load the packages:
Step 2: Import the Data
Import Nigeria water point data-file:
Generate a path:
We use the here function to generate a specific file path on the root folder.
<- here("data", "dataNigeria", "geospatial")
shapefile_path shapefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial"
Some useful link for the CRS:
st_read() belongs to the sf package. It reads simple features from file or database. Simple features or simple feature access refers to formal standard ISO 19125-1:2004 that describes how real world data can be represented in computers, with emphasis on the spatial geometry of these objects. Link below:
To find the CRS of the shapefile, open the .prj file as a text. It will tell you which projection system is being used.
Read the shapefile using st_read() belonging to the sf package:
The data read will be saved as a simple feature data table.
We will use the filter() function of dplyr package to filter only rows for Nigeria
# wp <- st_read(
# dsn = shapefile_path,
# layer = "geo_export",
# crs = 4326) %>%
# filter(clean_coun == "Nigeria")
Generate the save path using here function:
<- here("data", "dataNigeria", "geospatial", "wp_nga.rds")
savefile_path savefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/wp_nga.rds"
We will next save the file using write_rds() of the tidyverse package:
rds is a native data format of R.
# wp_ng <- write_rds(wp, savefile_path)
Import Nigeria geo-boundary file:
Next we will make the path to the geo boundary file:
<- here("data", "dataNigeria", "boundary")
shapefile_path shapefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/boundary"
Next we will Import the Nigeria LGA Boundary Data with st_read() function:
The imported data will be saved as a simple features dataset.
<- st_read(
nga dsn = shapefile_path,
layer = "geoBoundaries-NGA-ADM2",
crs = 4326)
Reading layer `geoBoundaries-NGA-ADM2' from data source
`D:\f4sared\ISSS624\data\dataNigeria\boundary' using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
Downsize further the wp_nga data:
Load the previously saved data:
We will select specific columns using select().
# final <- read_rds(rdsfile_path) %>% select(1:2, 14:17, 23)
Create the path for saving the file
# savefile_path <- here("data", "dataNigeria", "geospatial", "wp_nga_v2.rds")
# savefile_path
Save the file:
# write_rds(final, savefile_path)
Step 3: Data wrangling
Visualize Initial distribution
Generate path to rds file saved previously:
<- here("data", "dataNigeria", "geospatial","wp_nga_v2.rds")
rdsfile_path rdsfile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/wp_nga_v2.rds"
Load the rds file with read_rds() function of the tidyverse package:
We will also make use of the piping to replace the “na” values with “unknown”.
mutate() is a function of the dplyr package.
<- read_rds(rdsfile_path) %>%
wp_nga mutate(status_cle = replace_na(status_cle, "Unknown"))
Check the CRS of the spatial datafile with st_crs():
st_crs(wp_nga)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Use the freq() of the funModeling package to show the distribution percentage of status_cle:
freq(data=wp_nga,
input = 'status_cle')
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.
status_cle frequency percentage cumulative_perc
1 Functional 45883 48.29 48.29
2 Non-Functional 29385 30.93 79.22
3 Unknown 10656 11.22 90.44
4 Functional but needs repair 4579 4.82 95.26
5 Non-Functional due to dry season 2403 2.53 97.79
6 Functional but not in use 1686 1.77 99.56
7 Abandoned/Decommissioned 234 0.25 99.81
8 Abandoned 175 0.18 99.99
9 Non functional due to dry season 7 0.01 100.00
Filter for functional water-points:
Here we will use the filter() function from the dplyr package to select “functional” rows only:
We use the %in% to denote the membership in the group of strings.
<- wp_nga %>%
wpt_functional filter(status_cle %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))
Then we will plot with freq() function from funModeling to show the distribution:
freq(data=wpt_functional,
input = 'status_cle')
status_cle frequency percentage cumulative_perc
1 Functional 45883 87.99 87.99
2 Functional but needs repair 4579 8.78 96.77
3 Functional but not in use 1686 3.23 100.00
Filter for non-functional
Filter for non-functional rows:
Use %in% for to select rows that fall into the specific categories.
<- wp_nga %>%
wpt_nonfunctional filter(status_cle %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional",
"Non functional due to dry season",
"Non-Functional due to dry season"))
Plot the distribution with the freq() function:
freq(data=wpt_nonfunctional,
input = 'status_cle')
status_cle frequency percentage cumulative_perc
1 Non-Functional 29385 91.25 91.25
2 Non-Functional due to dry season 2403 7.46 98.71
3 Abandoned/Decommissioned 234 0.73 99.44
4 Abandoned 175 0.54 99.98
5 Non functional due to dry season 7 0.02 100.00
Filter for unknown:
Lastly we filter for the rows that have unknown status:
<- wp_nga %>%
wpt_unknown filter(status_cle == "Unknown")
Perform data manipulation (Point-In-Polygon):
Using st_intersects, we will be able to create a list of rows from wp_nga that intersects each row of nga.
For the intersection to work, st_intersect will check if each point falls within the polygon of nga.
Next we use the lengths() function to count the number of instances. Then we append to a new column.
We repeat this step across all 3 categories of Functional, Non-Functional & Unknown:
<- nga %>%
nga_wp mutate(`total wpt` = lengths(
st_intersects(nga, wp_nga))) %>%
mutate(`wpt functional` = lengths(
st_intersects(nga, wpt_functional))) %>%
mutate(`wpt non-functional` = lengths(
st_intersects(nga, wpt_nonfunctional))) %>%
mutate(`wpt unknown` = lengths(
st_intersects(nga, wpt_unknown)))
Next, using the mutate() function of dplyr, we will create 2 new columns:
pct_functional = `wpt functional`/`total wpt`
pct_non-functional = `wpt non-functional`/`total wpt`
We will then use select() of dplyr to retain the fields that we require.
<- nga_wp %>%
nga_wp mutate(pct_functional = `wpt functional`/`total wpt`) %>%
mutate(`pct_non-functional` = `wpt non-functional`/`total wpt`)
We will then create a save file path:
<- here("data", "dataNigeria", "geospatial", "nga_wp.rds")
savefile_path savefile_path
[1] "D:/f4sared/ISSS624/data/dataNigeria/geospatial/nga_wp.rds"
Next we will save this final dataframe using write_rds() of tidyverse package:
write_rds(nga_wp, savefile_path)
Step 4: Plot distribution of water point types
Plot the distribution of the water points using qtm() package of tmap:
Here we will add the additional settings to better adjust the size of the legend.
This is done via the tm_layout() function.
<- read_rds(savefile_path)
nga_wp
<- qtm(nga_wp, fill = "total wpt") +
total tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp, fill = "wpt functional") +
wp_functional tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp, fill = "wpt non-functional") +
wp_nonfunctional tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp, fill = "wpt unknown") +
unknown tm_layout(legend.height = 0.4,legend.width = 0.4)
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=2, ncol=2)
Step 5: Transform to EPSG 26391
We will use the st_crs() function to check the current EPSG of the sf datafile:
st_crs(nga_wp)
Coordinate Reference System:
User input: EPSG:4326
wkt:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Next, we will use st_transform() to project to the new crs 26391:
<- st_transform(nga_wp, crs = 26391) nga_wp_26391
We then check the new CRS:
st_crs(nga_wp_26391)
Coordinate Reference System:
User input: EPSG:26391
wkt:
PROJCRS["Minna / Nigeria West Belt",
BASEGEOGCRS["Minna",
DATUM["Minna",
ELLIPSOID["Clarke 1880 (RGS)",6378249.145,293.465,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4263]],
CONVERSION["Nigeria West Belt",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",4,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",4.5,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.99975,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",230738.26,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Nigeria - onshore west of 6°30'E, onshore and offshore shelf."],
BBOX[3.57,2.69,13.9,6.5]],
ID["EPSG",26391]]
Once again, we will plot the map to check if everything has been projected correctly:
<- qtm(nga_wp_26391, fill = "total wpt") +
total tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp_26391, fill = "wpt functional") +
wp_functional tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp_26391, fill = "wpt non-functional") +
wp_nonfunctional tm_layout(legend.height = 0.4,legend.width = 0.4)
<- qtm(nga_wp_26391, fill = "wpt unknown") +
unknown tm_layout(legend.height = 0.4,legend.width = 0.4)
tmap_arrange(total, wp_functional, wp_nonfunctional, unknown, asp=2, ncol=2)
Step 6: Thematic mapping with qtm() & tmap()
Plotting with qtm()
Set the tmap mode to “view” for interactive mode. Else set to “plot”.
qtm() represents quick plotting with tmap package:
tmap_mode("plot")
tmap mode set to plotting
qtm(nga_wp_26391,
fill = "wpt non-functional")
Plotting with tmap()
Jenks:
Jenks: Identify groups with similar values and maximizes the difference between them.
Using interactive tmap(), we will plot using “jenks” classification:
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(nga_wp_26391)+
tm_fill(c("wpt non-functional", "wpt functional"),
style = "jenks") +
tm_borders(alpha = 0.5) +
tmap_style("white")
tmap style set to "white"
other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"
Using the “jenks” style, we are able to identify visually some areas with high value already.
Quantile:
Splits variables in quantiles, Consequently there are same number of observations in each interval.
Using interactive tmap(), we will plot using “quantile” classification:
tm_shape(nga_wp_26391)+
tm_fill(c("wpt non-functional", "wpt functional"),
style = "quantile") +
tm_borders(alpha = 0.5) +
tmap_style("white")
tmap style set to "white"
other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"
When using “quantile” style, we observed a surge in the darker area. Having a closer look at the quantile, we can see that the last quantile has a much bigger band, this mean that quantile may not be too suitable.
Equal:
We will then consider an alternative “equal”:
tm_shape(nga_wp_26391)+
tm_fill(c("wpt non-functional", "wpt functional"),
style = "equal") +
tm_borders(alpha = 0.5) +
tmap_style("white")
tmap style set to "white"
other available styles are: "gray", "natural", "cobalt", "col_blind", "albatross", "beaver", "bw", "classic", "watercolor"
This distribution above matches that of the “jenks” style. We can spot similar patterns on the plot.
We will use bbox to get better visuals:
https://stackoverflow.com/questions/60892033/how-do-you-position-the-title-and-legend-in-tmap
tmap_mode("plot")
tmap mode set to plotting
# make some bbox magic
<- st_bbox(nga_wp_26391)
bbox_new <- bbox_new$xmax - bbox_new$xmin # range of x values
xrange <- bbox_new$ymax - bbox_new$ymin # range of y values
yrange # bbox_new[1] <- bbox_new[1] - (0.25 * xrange) # xmin - left
3] <- bbox_new[3] + (0.75 * xrange) # xmax - right
bbox_new[# bbox_new[2] <- bbox_new[2] - (0.25 * yrange) # ymin - bottom
4] <- bbox_new[4] + (0.01 * yrange) # ymax - top
bbox_new[<- bbox_new %>% # take the bounding box ...
bbox_new st_as_sfc() # ... and make it a sf polygon
tm_shape(nga_wp_26391, bbox = bbox_new)+
tm_fill("wpt non-functional",
style = "jenks",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.5) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Jenks Plot of WPT Non-Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "center"),
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.2, position= c("center", "bottom")) +
tm_grid(lwd = 0.1, alpha = 0.2)
tmap_mode("plot")
tmap mode set to plotting
# make some bbox magic
<- st_bbox(nga_wp_26391)
bbox_new <- bbox_new$xmax - bbox_new$xmin # range of x values
xrange <- bbox_new$ymax - bbox_new$ymin # range of y values
yrange # bbox_new[1] <- bbox_new[1] - (0.25 * xrange) # xmin - left
3] <- bbox_new[3] + (0.75 * xrange) # xmax - right
bbox_new[# bbox_new[2] <- bbox_new[2] - (0.25 * yrange) # ymin - bottom
4] <- bbox_new[4] + (0.01 * yrange) # ymax - top
bbox_new[<- bbox_new %>% # take the bounding box ...
bbox_new st_as_sfc() # ... and make it a sf polygon
tm_shape(nga_wp_26391, bbox = bbox_new)+
tm_fill("wpt functional",
style = "jenks",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.5) +
tm_borders(alpha = 0.5) +
tm_layout(main.title = "Jenks Plot of WPT Non-Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "center"),
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.2, position= c("center", "bottom")) +
tm_grid(lwd = 0.1, alpha = 0.2)
Step 7: Prepare the Spatial Weights
Usually for geospatial weights preparation, there are 2 options:
Contiguity based weights
- Useful when the polygons are of similar size
Distance based weights
Fixed distance
K-nearest neighbors (Adaptive distance)
Inversed-Distance
For our analysis, we will consider using the fixed distance based weights because the polygons size is not spread evenly across the map of Nigeria.
Prepare the coordinates
In order to work with fixed distance weight matrix, we need to prepare the coordinates of the center of gravity (COG) for all the polygons.
To achieve this, we will need to pass the column “geometry” into the function st_centroid() from the sf package. This function will calculate the COG and return us the coordinates accordingly. We will also make use of the map_dbl() function from the purrr package to apply the st_centroid() function to row.
Longitude:
<- map_dbl(nga_wp_26391$geometry, ~st_centroid(.x)[[1]]) longitude
Latitude:
<- map_dbl(nga_wp_26391$geometry, ~st_centroid(.x)[[2]]) latitude
Next, we will use the function of cbind() from the base package to bind our coordinates together.
Bind long & lat:
<- cbind(longitude, latitude) coords
Check if output is correct:
head(coords)
longitude latitude
[1,] 549364.0 123694.9
[2,] 547123.4 120376.5
[3,] 1189496.9 1059770.9
[4,] 489057.4 534262.6
[5,] 593718.2 113824.1
[6,] 642618.7 251222.3
Determine cut off distance:
In order to ensure that each polygon has at least 1 neighbor, we need to determine the cut off distance. To do this, we will first run the k-nearest neighbor with k=1. This is achieved with the function of knearneigh() of the spdep package.
We run the code as follow:
<- knn2nb(knearneigh(coords)) k1
After obtaining k1, a list of 774 polygons where each row shows the nearest neighbor polygon ID, we next need to calculate the distance between all of them using nbdists() of spdep.
Next, in order to calculate the summary, we need then to unlist() the output of nbdists(). This is then followed by using the summary() function which reports to us the max distance for us to use.
We run the code as follow:
<- unlist(nbdists(k1, coords, longlat = FALSE))
k1dists summary(k1dists)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2669 12834 20304 22084 27783 72139
From the above summary, it shows that the minimum distance for every polygon to have a neighbor is 71.661, thus we will use the value 72140.
Compute the fixed distance neighbor list:
With the above information, we will then compute the fixed distance weight matrix using dnearneigh() function from spdep. For this to work, we will need the coordinate of the polygons, min dist & max dist.
We will run the code as follow, this will give us a nb (neighbor) object data wm_d72:
<- dnearneigh(coords, 0, 72140, longlat = FALSE)
nb_d72 nb_d72
Neighbour list object:
Number of regions: 774
Number of nonzero links: 18130
Percentage nonzero weights: 3.026327
Average number of links: 23.42377
Next, we will display our results for viewing:
str(nb_d72)
List of 774
$ : int [1:63] 2 5 10 25 55 66 68 103 122 181 ...
$ : int [1:62] 1 5 10 25 55 66 68 103 122 181 ...
$ : int [1:2] 261 447
$ : int [1:10] 12 20 257 263 446 454 466 641 690 695
$ : int [1:56] 1 2 55 66 104 136 137 169 184 202 ...
$ : int [1:21] 9 14 18 19 56 170 217 218 330 337 ...
$ : int [1:19] 8 15 22 176 177 214 281 282 283 295 ...
$ : int [1:32] 7 15 22 49 176 177 214 275 276 277 ...
$ : int [1:26] 6 18 19 56 66 77 103 104 217 218 ...
$ : int [1:64] 1 2 23 25 66 103 181 190 191 203 ...
$ : int [1:22] 26 27 43 68 126 157 190 191 204 336 ...
$ : int [1:11] 4 135 257 263 401 417 429 446 454 690 ...
$ : int [1:13] 31 37 38 40 94 211 320 393 436 471 ...
$ : int [1:24] 6 170 193 194 195 217 309 310 311 362 ...
$ : int [1:27] 7 8 22 32 49 51 62 82 176 177 ...
$ : int [1:37] 30 38 39 41 44 45 70 71 120 124 ...
$ : int [1:34] 28 29 35 72 172 173 178 179 182 275 ...
$ : int [1:29] 6 9 19 56 66 77 103 104 217 218 ...
$ : int [1:41] 6 9 18 25 56 66 77 103 104 181 ...
$ : int [1:7] 4 106 239 263 419 454 466
$ : int [1:9] 60 61 162 269 484 520 578 596 626
$ : int [1:31] 7 8 15 32 49 51 62 82 176 177 ...
$ : int [1:64] 10 25 52 53 54 56 58 77 78 79 ...
$ : int [1:5] 123 476 527 673 761
$ : int [1:68] 1 2 10 19 23 54 56 66 77 103 ...
$ : int [1:30] 11 27 43 68 157 190 191 204 336 370 ...
$ : int [1:24] 11 26 43 68 157 191 204 336 370 371 ...
$ : int [1:43] 17 29 35 70 71 124 172 173 178 179 ...
$ : int [1:45] 17 28 35 70 71 124 172 173 178 179 ...
$ : int [1:30] 16 38 39 40 41 44 45 175 185 186 ...
$ : int [1:13] 13 37 94 158 210 211 212 289 308 561 ...
$ : int [1:28] 15 22 49 51 62 82 177 196 207 214 ...
$ : int [1:29] 47 111 130 142 145 155 166 219 227 233 ...
$ : int [1:11] 42 86 104 136 137 213 375 553 559 733 ...
$ : int [1:33] 17 28 29 72 172 173 178 179 182 275 ...
$ : int [1:8] 50 107 247 408 432 455 681 759
$ : int [1:21] 13 31 38 39 40 41 186 192 197 198 ...
$ : int [1:25] 13 16 30 37 39 40 41 44 186 192 ...
$ : int [1:27] 16 30 37 38 40 41 44 185 186 192 ...
$ : int [1:21] 13 30 37 38 39 41 44 186 192 211 ...
$ : int [1:22] 16 30 37 38 39 40 44 45 186 192 ...
$ : int [1:20] 34 86 136 137 184 202 285 286 375 499 ...
$ : int [1:19] 11 26 27 68 122 126 157 190 191 246 ...
$ : int [1:27] 16 30 38 39 40 41 45 70 175 186 ...
$ : int [1:27] 16 30 41 44 70 175 187 188 192 290 ...
$ : int [1:12] 119 380 387 417 423 429 438 459 521 656 ...
$ : int [1:24] 33 111 127 130 155 166 227 234 238 242 ...
$ : int [1:12] 64 65 74 113 131 265 386 407 428 482 ...
$ : int [1:30] 8 15 22 32 51 62 82 176 177 207 ...
$ : int [1:4] 36 107 409 432
$ : int [1:27] 15 22 32 49 62 82 177 207 214 284 ...
$ : int [1:47] 23 53 54 57 58 77 78 79 80 165 ...
$ : int [1:37] 23 52 54 57 58 78 79 80 165 189 ...
$ : int [1:58] 23 25 52 53 56 57 58 77 78 79 ...
$ : int [1:33] 1 2 5 68 122 157 169 184 190 208 ...
$ : int [1:51] 6 9 18 19 23 25 54 66 77 78 ...
$ : int [1:35] 52 53 54 58 78 79 80 165 189 197 ...
$ : int [1:37] 23 52 53 54 57 78 79 165 189 197 ...
$ : int [1:5] 128 129 493 700 748
$ : int [1:14] 21 61 158 269 310 311 561 563 578 589 ...
$ : int [1:11] 21 60 162 268 269 484 578 589 592 596 ...
$ : int [1:28] 15 22 32 49 51 82 177 196 207 214 ...
$ : int [1:5] 384 416 467 765 772
$ : int [1:7] 48 65 74 113 131 265 407
$ : int [1:11] 48 64 74 109 113 265 386 407 683 701 ...
$ : int [1:47] 1 2 5 9 10 18 19 25 56 103 ...
$ : int [1:26] 72 120 124 179 182 304 305 339 346 347 ...
$ : int [1:30] 1 2 11 26 27 43 55 122 157 190 ...
$ : int [1:8] 140 146 248 274 473 500 512 513
$ : int [1:44] 16 28 29 44 45 71 120 124 172 173 ...
$ : int [1:50] 16 28 29 70 120 124 172 173 175 178 ...
$ : int [1:19] 17 35 67 182 361 374 378 404 566 567 ...
$ : int [1:6] 361 374 377 404 665 666
$ : int [1:14] 48 64 65 109 113 116 251 265 672 683 ...
$ : int [1:15] 110 229 255 258 272 373 382 398 422 433 ...
$ : int [1:9] 254 287 427 459 470 547 647 677 751
$ : int [1:55] 9 18 19 23 25 52 54 56 78 79 ...
$ : int [1:51] 23 52 53 54 56 57 58 77 79 80 ...
$ : int [1:57] 23 52 53 54 56 57 58 77 78 80 ...
$ : int [1:39] 23 52 53 54 57 77 78 79 165 189 ...
$ : int [1:19] 99 145 227 233 242 255 270 426 449 483 ...
$ : int [1:21] 15 22 32 49 51 62 177 207 214 297 ...
$ : int [1:6] 132 258 383 414 529 767
$ : int [1:3] 148 437 692
$ : int [1:38] 101 105 130 142 145 155 156 219 235 242 ...
$ : int [1:17] 34 42 136 137 184 202 285 286 499 538 ...
$ : int [1:19] 147 149 151 221 226 245 267 399 410 415 ...
$ : int [1:5] 150 489 648 700 714
$ : int [1:11] 100 107 159 260 408 463 542 674 676 681 ...
$ : int 237
$ : int [1:3] 160 271 406
$ : int [1:11] 95 119 390 391 392 423 487 642 656 668 ...
$ : int [1:3] 354 607 665
$ : int [1:7] 13 31 158 436 561 596 709
$ : int [1:10] 92 390 391 392 405 423 469 656 708 770
$ : int [1:17] 97 108 139 167 168 350 389 403 412 420 ...
$ : int [1:13] 96 108 114 139 147 168 389 403 420 451 ...
$ : int [1:4] 153 231 432 696
$ : int [1:18] 81 145 154 167 227 233 255 270 426 449 ...
[list output truncated]
- attr(*, "class")= chr "nb"
- attr(*, "region.id")= chr [1:774] "1" "2" "3" "4" ...
- attr(*, "call")= language dnearneigh(x = coords, d1 = 0, d2 = 72140, longlat = FALSE)
- attr(*, "dnn")= num [1:2] 0 72140
- attr(*, "bounds")= chr [1:2] "GE" "LE"
- attr(*, "nbtype")= chr "distance"
- attr(*, "sym")= logi TRUE
Plot the fixed distance neighbor list:
Using the plot() function, we will then visualize the neighbors that we have identified.
plot(nga_wp_26391$geometry, border="black", axes = TRUE)
plot(nb_d72, coords, add=TRUE, col="red")
Step 8: Plotting Spatial Lag
Create row standardized weight matrix
We will use the nb2listw() function of spdep package to convert the nb list to weights. We set the style to “W” in order to perform the row-standardized steps.
We run the code as follow:
<- nb2listw(nb_d72, style="W", zero.policy = TRUE) swm_d72
Compute spatial lag with row-standardized weights
Now we will use the lag.listw() function from spdep to return us the lag list. As an input, we will provide the row-standardized weight matrix and the variables “wpt non-functional” & wpt functional.
Lag for non-functional
We run the code as follow:
<- list(nga_wp_26391$shapeName, lag.listw(swm_d72, nga_wp_26391$`wpt non-functional`))
lag.list <- as.data.frame(lag.list)
lag.res_NF colnames(lag.res_NF) <- c("shapeName", "lag_NF")
$lag_NF <- lag.res_NF$lag_NF nga_wp_26391
We will then plot the lagged variable:
<- qtm(nga_wp_26391, "wpt non-functional") +
non_func tm_layout(main.title = "WPT Non-Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
legend.width = 0.3,
legend.height = 0.25,
frame = TRUE)
<- qtm(nga_wp_26391, "lag_NF")+
lag_non_func tm_layout(main.title = "Lagged WPT Non-Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
legend.width = 0.3,
legend.height = 0.25,
frame = TRUE)
tmap_arrange(non_func, lag_non_func, asp=1, ncol=2)
Lag for functional
We run the code as follow:
<- list(nga_wp_26391$shapeName, lag.listw(swm_d72, nga_wp_26391$`wpt functional`))
lag.list <- as.data.frame(lag.list)
lag.res_F colnames(lag.res_F) <- c("shapeName", "lag_F")
$lag_F <- lag.res_F$lag_F nga_wp_26391
<- qtm(nga_wp_26391, "wpt functional") +
non_func tm_layout(main.title = "WPT Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
legend.width = 0.3,
legend.height = 0.25,
frame = TRUE)
<- qtm(nga_wp_26391, "lag_F")+
lag_non_func tm_layout(main.title = "Lagged WPT Functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
legend.width = 0.3,
legend.height = 0.25,
frame = TRUE)
tmap_arrange(non_func, lag_non_func, asp=1, ncol=2)
Step 9: Global Measure of Spatial Auto Correlation
Global Moran’s I
To get an indication of the global spatial clustering auto-correlation, we will perform the Moran’s I test. This is an analytic approach.
WPT Non-Functional Moran’s I test:
moran.test(nga_wp_26391$`wpt non-functional`,
listw=swm_d72,
zero.policy = TRUE,
na.action=na.omit)
Moran I test under randomisation
data: nga_wp_26391$`wpt non-functional`
weights: swm_d72
Moran I statistic standard deviate = 22.437, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.3264059335 -0.0012936611 0.0002133094
The global Moran’s I statistic is >> 0 and the p-value is statistically significant. This shows that there is indication of clustering of the non-functional water point.
WPT Functional Moran’s I test:
moran.test(nga_wp_26391$`wpt functional`,
listw=swm_d72,
zero.policy = TRUE,
na.action=na.omit)
Moran I test under randomisation
data: nga_wp_26391$`wpt functional`
weights: swm_d72
Moran I statistic standard deviate = 34.908, p-value < 2.2e-16
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.5041687788 -0.0012936611 0.0002096643
The global Moran’s I statistic is >> 0 and the p-value is statistically significant. This shows that there is indication of clustering of the functional water point.
Monte Carlo Moran’s I
While the Moran’s I test is fast since it works analytically, we will need to perform Monte Carlo Moran’s I. This method allows us to perform simulation by generating many random datasets across multiple simulations.
Our Moran’s I should fall to either extremes of the Moran’s I histogram from the simulation. This shows that our Moran’s I value did not occur because of randomization.
For the Monte Carlo Moran’s I test, we will use the moran.mc() function from the spdep package.
WPT Non-Functional Moran’s I test:
set.seed(1234)
= moran.mc(nga_wp_26391$`wpt non-functional`,
bperm_NFlistw=swm_d72,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
bperm_NF
Monte-Carlo simulation of Moran I
data: nga_wp_26391$`wpt non-functional`
weights: swm_d72
number of simulations + 1: 1000
statistic = 0.32641, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Here we can see that our results are statistically significant with p value of 0.001. Also the Moran’s I calculated is similar to the analytic test.
We will visualize the simulation results from Monte Carlo as follow:
hist(bperm_NF$res,
freq=TRUE,
breaks=20,
xlab="Simulated Moran's I for Non-Functional")
abline(v=0,
col="red")
WPT Functional Moran’s I test:
set.seed(1234)
= moran.mc(nga_wp_26391$`wpt functional`,
bperm_Flistw=swm_d72,
nsim=999,
zero.policy = TRUE,
na.action=na.omit)
bperm_F
Monte-Carlo simulation of Moran I
data: nga_wp_26391$`wpt functional`
weights: swm_d72
number of simulations + 1: 1000
statistic = 0.50417, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater
Here we can see that our results are statistically significant with p value of 0.001. Also the Moran’s I calculated is similar to the analytic test.
We will visualize the simulation results from Monte Carlo as follow:
hist(bperm_F$res,
freq=TRUE,
breaks=20,
xlab="Simulated Moran's I for Non-Functional")
abline(v=0,
col="red")
Step 10: Correlogram for Global Moran’s I
In the earlier section, we did the computation for Moran’s I and Monte Carlo Moran’s I. Both of them are a global measure of spatial autocorrelation. For both the non-functional and functional water point, we observe the presence of spatial correlation.
Another way for us to examine the spatial autocorrelation is by using correlogram. These diagram provide useful insights to how the Moran’s I or Geary’s C changes as the spatial lag/distance increase.
With the increase of spatial lad / distance, more neighbors are now included within the weight matrix. This then have further impact on the measure of spatial correlation.
For this section, we will use sp.correlogram() of spdep package. This function will require the neighbor object that we have calculated precious in step 7. We also need to state the style input. We will choose “W” for row-standardization.
We run the correlogram for the non-functional water point as follow:
<- sp.correlogram(nb_d72,
MI_corr_NF $`wpt non-functional`,
nga_wp_26391order=6,
method="I",
style="W",
zero.policy=TRUE)
plot(MI_corr_NF)
We check the output for statistical significance:
print(MI_corr_NF)
Spatial correlogram for nga_wp_26391$`wpt non-functional`
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (774) 3.2641e-01 -1.2937e-03 2.1331e-04 22.4373 < 2.2e-16
2 (772) 1.7632e-01 -1.2970e-03 1.0426e-04 17.3957 < 2.2e-16
3 (772) 7.5578e-02 -1.2970e-03 7.8364e-05 8.6841 < 2.2e-16
4 (772) -8.3811e-03 -1.2970e-03 7.0841e-05 -0.8417 0.4
5 (772) -7.8042e-02 -1.2970e-03 6.4305e-05 -9.5703 < 2.2e-16
6 (772) -5.8045e-02 -1.2970e-03 5.6307e-05 -7.5626 3.952e-14
1 (774) ***
2 (772) ***
3 (772) ***
4 (772)
5 (772) ***
6 (772) ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the non-functional water points, our Moran’s I decrease with increase lag order. The p values for the first 3 rows are statistically significant.
We run the correlogram for the functional water point as follow:
<- sp.correlogram(nb_d72,
MI_corr_F $`wpt functional`,
nga_wp_26391order=10,
method="I",
style="W",
zero.policy=TRUE)
plot(MI_corr_F)
We check the output for statistical significance:
print(MI_corr_F)
Spatial correlogram for nga_wp_26391$`wpt functional`
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 (774) 5.0417e-01 -1.2937e-03 2.0966e-04 34.9081 < 2e-16
2 (772) 3.8586e-01 -1.2970e-03 1.0247e-04 38.2455 < 2e-16
3 (772) 3.2289e-01 -1.2970e-03 7.7024e-05 36.9384 < 2e-16
4 (772) 2.3024e-01 -1.2970e-03 6.9640e-05 27.7456 < 2e-16
5 (772) 1.3859e-01 -1.2970e-03 6.3217e-05 17.5942 < 2e-16
6 (772) 6.0259e-02 -1.2970e-03 5.5351e-05 8.2738 < 2e-16
7 (772) -1.4690e-02 -1.2970e-03 4.9489e-05 -1.9038 0.05694
8 (772) -1.0019e-01 -1.2970e-03 4.8999e-05 -14.1278 < 2e-16
9 (772) -1.5706e-01 -1.2970e-03 5.4069e-05 -21.1829 < 2e-16
10 (772) -1.7121e-01 -1.2970e-03 5.9742e-05 -21.9828 < 2e-16
1 (774) ***
2 (772) ***
3 (772) ***
4 (772) ***
5 (772) ***
6 (772) ***
7 (772) .
8 (772) ***
9 (772) ***
10 (772) ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For the functional water points, our Moran’s I decrease more slowly lag order. The p values for the first 6 rows are statistically significant.
Step 11: Local Measure of Spatial Auto Correlation
While global Moran’s I is a useful statistic to tell us the presence of spatial auto-correlation across the map, it provides little info about the locality of the auto correlation.
The is where local Moran’s I come in to save the day. A high local Moran’s I statistic indicates similar values clustering around a polygon which is a good indication of spatial auto correlation.
Non-Functional Water Points (Local Moran’s I)
We will use the function order() to create an ordered name list of the areas. This is then followed by the function localmoran() from spdep. Our variable of interest and row-standardized weight matrix is required.
<- localmoran(nga_wp_26391$`wpt non-functional`, swm_d72)
localMI_NF head(localMI_NF)
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.361394136 -9.995243e-04 1.128237e-02 3.4117747 0.0006454144
2 0.074414950 -4.092463e-05 4.705097e-04 3.4325327 0.0005979717
3 1.258199847 -1.627684e-03 6.280738e-01 1.5896655 0.1119102304
4 -0.006652507 -5.427505e-05 4.151689e-03 -0.1024036 0.9184363392
5 0.082615173 -2.590965e-04 3.325093e-03 1.4372021 0.1506605779
6 0.006672593 -1.538445e-07 5.523369e-06 2.8392431 0.0045220690
<- data.frame(localMI_NF)
localMI_NF colnames(localMI_NF)[5] ="Pr"
$lc_Ii_NF <- localMI_NF$'Ii'
nga_wp_26391$lc_Pr_NF <- localMI_NF$'Pr' nga_wp_26391
<- tm_shape(nga_wp_26391) +
localMI.map_NF tm_fill(col = "lc_Ii_NF",
style = "jenks",
title = "local moran statistics") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Plot of Local Moran's I",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
<- tm_shape(nga_wp_26391) +
pvalue.map_NF tm_fill(col = "lc_Pr_NF",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Plot of Local Moran's I P-Value",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tmap_arrange(localMI.map_NF, pvalue.map_NF, asp=1, ncol=2)
Variable(s) "lc_Ii_NF" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Functional Water Points (Local Moran’s I)
We perform the same steps for Functional water points:
<- localmoran(nga_wp_26391$`wpt functional`, swm_d72)
localMI_F head(localMI_F)
Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
1 0.43151603 -7.191834e-04 0.008120236 4.7966255 1.613609e-06
2 0.27475350 -2.904635e-04 0.003338620 4.7601280 1.934703e-06
3 0.69235062 -8.956670e-04 0.345864093 1.1787856 2.384836e-01
4 0.05590525 -3.884365e-04 0.029702941 0.3266329 7.439455e-01
5 0.33277612 -3.884365e-04 0.004984321 4.7190630 2.369335e-06
6 0.05909213 -4.231402e-05 0.001519106 1.5172126 1.292130e-01
<- data.frame(localMI_F)
localMI_F colnames(localMI_F)[5] ="Pr"
$lc_Ii_F <- localMI_F$'Ii'
nga_wp_26391$lc_Pr_F <- localMI_F$'Pr' nga_wp_26391
<- tm_shape(nga_wp_26391) +
localMI.map_F tm_fill(col = "lc_Ii_F",
style = "jenks",
title = "local moran statistics") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Plot of Local Moran's I",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
<- tm_shape(nga_wp_26391) +
pvalue.map_F tm_fill(col = "lc_Pr_F",
breaks=c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
palette="-Blues",
title = "local Moran's I p-values") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "Plot of Local Moran's I P-Value",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tmap_arrange(localMI.map_F, pvalue.map_F, asp=1, ncol=2)
Variable(s) "lc_Ii_F" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Step 12: Moran Scatterplot
Moran’s scatterplot is a useful tool for us to visualize the grouping of the polygons when we plot their variable of interest against the lagged variables. In the case there we observe spatial clustering of the variables across a cluster of polygons, we will expect the individual value of each polygon to be highly similar to the lagged variable. The lagged variable in this case refers to the average of the variable across the neighbors of the polygon.
Thus, points in the top-right and bottom-left of the Moran’s scatter plot represent polygons whose variable is highly similar with that of their neighbors. This is a good indication of spatial autocorrelation and clustering. On the other hand, points in the top-left and bottom-right of the scatter plot show polygon whose variable is significantly different from the average variable across the neighbors. This is an indication of negative autocorrelation or outlier.
The slope of the repression line in the Moran’s scatterplot represents the estimation of the global Moran’s I
Scatterplot for Non-Functional water points
To plot the Moran’s I scatterplot, we will use the moran.plot() function from spdep. We will need to provide the sf dataframe with the variable column along with the weight matrix. In this case, we will use our row-standardized weight matrix.
<- moran.plot(nga_wp_26391$`wpt non-functional`, swm_d72,
NF_scatter labels=as.character(nga_wp_26391$shapeName),
xlab="wpt non-functional",
ylab="Spatially lagged wpt non-functional")
To help with comparison and ease of reading the scatterplot, it helps to perform some scaling. For this we will use the scale() function with divides the column by the largest value. We then use the piping function to save it as a vector before adding it to our dataframe.
We perform the following step:
<- scale(nga_wp_26391$`wpt non-functional`) %>% as.vector
scaled_NF $scaled_NF <- scaled_NF nga_wp_26391
We then plot the scatterplot once more:
<- moran.plot(nga_wp_26391$scaled_NF, swm_d72,
NF_scatter_scaled labels=as.character(nga_wp_26391$shapeName),
xlab="wpt non-functional",
ylab="Spatially lagged wpt non-functional")
After scaling, we can see now that the axis is re-positioned to (0,0). From this scatterplot, we can see easily that there groups of polygons whose value exhibit clustering behavior. Interestingly, there are quite a number of outliers in the top-left and bottom right quadrant.
Scatterplot for Functional water points
We do the same for Functional water pints:
<- moran.plot(nga_wp_26391$`wpt functional`, swm_d72,
F_scatter labels=as.character(nga_wp_26391$shapeName),
xlab="wpt functional",
ylab="Spatially lagged wpt functional")
We perform scaling as follow:
<- scale(nga_wp_26391$`wpt functional`) %>% as.vector
scaled_F $scaled_F <- scaled_F nga_wp_26391
We plot the scaled Moran’s I scatterplot as follow:
<- moran.plot(nga_wp_26391$scaled_F, swm_d72,
F_scatter_scaled labels=as.character(nga_wp_26391$shapeName),
xlab="wpt non-functional",
ylab="Spatially lagged wpt non-functional")
We see here significant number of points in the top-left and bottom right quadrant which is indicative of the spatial auto correlation and clustering. Similarly, we also see outliers in the plot.
Step 13: LISA map
While Moran’s I scatterplot provides a useful insight into the distribution of the points over 4 different types of quadrant, we have no idea about the location of each point relative to space. Thus it makes sense to indicate this on the map which then allows us to identify regions with significant spatial correlation and clustering.
LISA: Non-Functional water point
To achieve this, we will first create a vector to hold the different categories. We do this with a the function vector(), there should be 774 reflecting 774 polygons:
<- vector(mode="numeric",length=nrow(localMI_NF)) quadrant_NF
Previously in step 8, we have already computed the spatial lag based on row-standardized weight matrix. We view the value once more:
$lag_NF[1:20] nga_wp_26391
[1] 26.34921 26.08065 0.00000 42.80000 34.75000 64.28571 31.15789 28.18750
[9] 46.53846 23.23438 13.90909 49.36364 52.38462 32.16667 19.74074 67.89189
[17] 46.11765 40.75862 36.82927 43.57143
We then prepare a numeric vector to hold variable of interest that is mean adjusted:
<- nga_wp_26391$`wpt non-functional` - mean(nga_wp_26391$`wpt non-functional`) DV_NF
We also prepare a numeric vector of the local Moran’s I value:
<- localMI_NF[,1] - mean(localMI_NF[,1]) LM_I_NF
We will also set the significance level, that help us filter away rows with non-significant P values:
<- 0.05 signif
Next we will work to fill up the value of the quadrant_NF:
<0 & LM_I_NF>0] <- 1
quadrant_NF[DV_NF >0 & LM_I_NF<0] <- 2
quadrant_NF[DV_NF <0 & LM_I_NF<0] <- 3
quadrant_NF[DV_NF >0 & LM_I_NF>0] <- 4 quadrant_NF[DV_NF
For rows that are not statistically significant, we will set to 0:
5]>signif] <- 0 quadrant_NF[localMI_NF[,
Finally we will add our vector quadrant_NF to our main sf data object nga_wp_26391. We then follow this with a tmap plot taking note to use the style of category this time.
In addition, we will make use of the earlier plotted local Moran’s I map for comparison, we do this by calling the function tamp_arrange:
$quadrant_NF <- quadrant_NF
nga_wp_26391<- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
colors <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
clusters
<- tm_shape(nga_wp_26391) +
LISA_NF tm_fill(col = "quadrant_NF",
style = "cat",
palette = colors[c(sort(unique(quadrant_NF)))+1],
labels = clusters[c(sort(unique(quadrant_NF)))+1],
popup.vars = c("")) +
tm_borders(alpha=0.5) +
tm_layout(main.title = "Plot of LISA Non Functional WP",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tm_view(set.zoom.limits = c(11,17)) #This line is not needed since not using interactive
$tm_layout
$tm_layout$set.zoom.limits
[1] 11 17
$tm_layout$style
[1] NA
attr(,"class")
[1] "tm"
tmap_arrange(LISA_NF, localMI.map_NF, asp=1, ncol=2)
Variable(s) "lc_Ii_NF" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
From this LISA plot above, we can see a clear relationship between the local Moran’s I plot earlier and the newly plotted LISA plot. We can see 5 hots spots in red, these 5 regions correspond to areas with high local Moran’s I value.
LISA: Functional Water Point
Same as above, we create a vector:
<- vector(mode="numeric",length=nrow(localMI_F)) quadrant_F
We check the earlier calculated lag values:
$lag_F[1:20] nga_wp_26391
[1] 20.50794 20.41935 0.00000 59.10000 18.19643 93.80952 37.36842 43.56250
[9] 61.03846 17.04688 25.31818 58.63636 41.53846 50.29167 35.48148 63.18919
[17] 67.73529 48.44828 40.53659 87.85714
We combine the earlier steps into a single chunk of code:
<- nga_wp_26391$`wpt functional` - mean(nga_wp_26391$`wpt functional`)
DV_F <- localMI_F[,1] - mean(localMI_F[,1])
LM_I_F <0 & LM_I_F>0] <- 1
quadrant_F[DV_F >0 & LM_I_F<0] <- 2
quadrant_F[DV_F <0 & LM_I_F<0] <- 3
quadrant_F[DV_F >0 & LM_I_F>0] <- 4
quadrant_F[DV_F 5]>signif] <- 0 quadrant_F[localMI_F[,
Once again, we will plot the value:
$quadrant_F <- quadrant_F
nga_wp_26391<- c("#ffffff", "#2c7bb6", "#abd9e9", "#fdae61", "#d7191c")
colors <- c("insignificant", "low-low", "low-high", "high-low", "high-high")
clusters
<- tm_shape(nga_wp_26391) +
LISA_F tm_fill(col = "quadrant_F",
style = "cat",
palette = colors[c(sort(unique(quadrant_F)))+1],
labels = clusters[c(sort(unique(quadrant_F)))+1],
popup.vars = c("")) +
tm_borders(alpha=0.5) +
tm_layout(main.title = "Plot of LISA Functional WP",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tm_view(set.zoom.limits = c(11,17)) #This line is not needed since not using interactive
$tm_layout
$tm_layout$set.zoom.limits
[1] 11 17
$tm_layout$style
[1] NA
attr(,"class")
[1] "tm"
#
tmap_arrange(LISA_F, localMI.map_F, asp=1, ncol=2)
Variable(s) "lc_Ii_F" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Again, we see that the hotpsots from the LISA map matches the regions of high local Moran’s I for the functional water points.
Step 14: Hot spot and Cold Spot with Getis & Ord’s G-Statistics G*
In the earlier steps, the Local Moran’s I help us to get a feel of the spatial correlation and clustering behavior. This is normally calculated with the value of the polygon itself excluded.
However sometimes besides clustering, it is also useful to know the hot spot and cold spot. This is because 2 groups of polygons can exhibit the same intensity of clustering, but the intensity of their value is different.
Getis & Ord thus swoops in to save the day. This is also another local measure of spatial correlation but this time it takes into account the value inside the polygon it self. Thus a polygon with a high intensity value will be now less affected by neighboring polygon’s with less intensive values.
One of the key requirements to calculate the G* will be to obtain the distance based weight matrix. We have prepared this matrix earlier in step 8.
Non-Functional water point G*
We compute the G* statistics using localG() function from spdep:
<- localG(nga_wp_26391$`wpt non-functional`, swm_d72)
gi_NF head(gi_NF)
[1] -3.4117747 -3.4325327 -1.5896655 0.1024036 -1.4372021 2.8392431
Next we will append the calculated values to the main sf dataframe:
$gi_NF <- gi_NF nga_wp_26391
Lastly, we will use the tmap package to plot the G* statistics along side the non-functioning water point. Here we will use the “jenks” style as it group values that are most similar together:
<- tm_shape(nga_wp_26391) +
gdppc_NF tm_fill(col = "wpt non-functional",
style = "jenks",
title = "No. of wpt non-functional") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "wpt non-functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
<-tm_shape(nga_wp_26391) +
Gimap_NF tm_fill(col = "gi_NF",
style = "jenks",
palette="-RdBu",
title = "local G*") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "wpt non-functional G*",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tmap_arrange(gdppc_NF, Gimap_NF, asp=1, ncol=2)
Variable(s) "gi_NF" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
When compared to the plot on the number of non-functioning waterpoints, we see that the G* captures the hotspot patterns greatly. In addition, it also pinpointed the region of cold spots on the map.
Functional water point G*
Here we will perform the same steps as above into 1 code chunk:
<- localG(nga_wp_26391$`wpt functional`, swm_d72)
gi_F $gi_F <- gi_F nga_wp_26391
We till then plot the G* for Functional water point as follow:
<- tm_shape(nga_wp_26391) +
gdppc_F tm_fill(col = "wpt functional",
style = "jenks",
title = "No. of wpt functional") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "wpt functional",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
<-tm_shape(nga_wp_26391) +
Gimap_F tm_fill(col = "gi_F",
style = "jenks",
palette="-RdBu",
title = "local G*") +
tm_borders(alpha = 0.5)+
tm_layout(main.title = "wpt functional G*",
main.title.position = "center",
main.title.size = 1,
legend.position = c("right", "bottom"),
frame = TRUE)
tmap_arrange(gdppc_F, Gimap_F, asp=1, ncol=2)
Variable(s) "gi_F" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
From this plot, we can see a clear hotspot at the top of the map. Interestingly we also observe a outer area with lower hotspot intensity.