Abstract

This study is a reproduction of one part of:

Lyimo, E. H., Mayengo, G. M., Castico, D. J., Nguma, D., Martin, E. H., Hariohay, K. M., Kisingo, A. W., Lucas, J., Kimambo, N., & Holler, J. (2023). Makuyuni Wildlife Corridor: Analysis of the Effects of Socioecological Interactions and Changing Land Use on Movement Patterns of Large Mammal Species [Draft]. https://docs.google.com/document/d/1-RP3a5NUJcFMskoWzuZzMz4x-JI5DeKz/.

Specifically, this reproduction study is mostly concerned with the work of Justin Lucas, who added “extension data” and results to the original study. Lucas produced a least cost surface, an accumulated cost surface, and a wildlife corridor visualization for Makuyuni Wildlife Corridor in Tanzania, the study area of Lyimo et al. (2023). To do so, he built a “least cost model” in QGIS Model Builder. We aimed to reproduce the least cost surface portion of the extension results in R by following the QGIS model workflow as closely as possible. We found that while we were able to produce a least cost raster in R, the cost values did not exactly match with those in the raster by Lucas. This discrepancy may be attributed to differences in computational environment and may not carry a simple solution. We hope that this reproduction study is able to provide the start of a translation of the results of the original study to R.

Study metadata

Original study spatio-temporal metadata

  • Spatial Coverage: Makuyuni Wildlife Corridor, Tanzania
  • Spatial Resolution: 10 meter
  • Spatial Reference System: EPSG 32736
  • Temporal Coverage: 2023
  • Temporal Resolution: Not applicable

Study design

This is a reproduction study of Lyimo et al. (2023), an unpublished manuscript in which the authors create and analyze least cost pathways for wildlife for Makuyuni Wildlife Corridor in Tanzania.

The original study is observational in that the authors use a model to estimate the cost of wildlife movement and establish transects from which to observe wildlife presence and location for model evaluation. The authors do not indicate modifications of the environment in the study area nor delineation of experimental groups for animal individuals (FIU 2023).

One of the main portions of the workflow for Lyimo et al. (2023) is a least cost model created in QGIS by Justin Lucas, one of the study authors. This model produces a least cost surface by incorporating assigned costs for major roads, secondary roads, land cover, proximity to buildings, and the location of a buffer zone around the study area. The model frequently involves reprojection, rasterization, and reclassification of input files, which mostly contain vector data. One of the final steps for the creation of the least cost surface is a raster calculator function which uses the 5 types of data above to produce cost values. While the model includes information on start and end regions, we did not find it to contain steps to produce an accumulated cost raster or least cost pathways, which are features of the Lyimo et al. (2023). We chose to focus on reproducing only the least cost surface raster.

Both authors of the reproduction study worked together to reproduce the QGIS model steps in RStudio. After difficulties staying within one package, we opted to use whichever spatial analysis packages would allow for a close reproduction of the QGIS functions. After Smith and Holler (2023), our hypothesis for this reproduction study is that the reproduced least cost surface raster (created in R) and the output of the QGIS model will not be identical. We will evaluate the differences between these rasters through map algebra. In other words, an output raster created from the subtraction of our R-based raster from the original QGIS-based raster will contain values other than 0, indicating that there are differences in cost values between the rasters.

Materials and procedure

Dr. Joseph Holler, one of the study authors, provided us access to a Google shared drive which contained relevant files for the least cost model. One important document in this environment was the “content directory”, in which Lucas indicated which files in the “extension data” folder served as inputs to the QGIS model. We downloaded files to desktop computers which had QGIS version 13.28 installed and ran the model to evaluate its outputs. We determined the least cost raster output to be the same raster as the Google Drive file entitled “DD_Cost_6.tif”. This was thus the file we aimed to reproduce.

RStudio analysis was conducted on a 2020 Macbook Air running macOS Ventura version 13.4. After tests with the raster and stars packages, we concluded that we would pursue a reproduction of the raster using both of these packages. Steps in the workflow below are intended to follow the QGIS model steps. We indicate any corresponding steps before each code section.

Computational environment

Data and variables

Most input data sources for the study are secondary. The least cost output of the QGIS model is primary in that it was created by Lucas.

Least cost raster (DD_Cost_6.tif)

  • Abstract: This is one of the outputs of the QGIS model created by Justin Lucas. It is a least cost raster of the
  • Spatial Coverage: The data source pertains to the study site and a buffer zone surrounding the site.
  • Spatial Resolution: 10 m
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: 2023
  • Temporal Resolution: Not applicable
  • Lineage: This file is equivalent to the output of the QGIS model. The model included on the Google Drive as “finalmodel.model3”. Running the model will produce a file named “DDclipped_cost”. We assume that one instance of this file was named “DD_Cost_6” at some point before being uploaded to the Google Drive, since cost values are identical in both files.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.
  • Variables: The band shows the calculated least cost for an animal to travel across a given pixel.

Secondary data sources for the study are delineated below.

Landcover (landcover.tif)

  • Abstract: This is a land cover raster for the entirety of Tanzania.
  • Spatial Coverage: The data source pertains to the whole country.
  • Spatial Resolution: 4.77 m
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the following paper: Song, L., Estes, A. B., & Estes, L. D. (2023). A super-ensemble approach to map land cover types with high resolution over data-sparse African savanna landscapes. International Journal of Applied Earth Observation and Geoinformation, 116, 103152. https://doi.org/10.1016/j.jag.2022.103152
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.
  • Variables: Band 1 is land cover. The classification scheme is established by Song et al. (2023).
    • 1 (orange): cropland
    • 2 (dark green): forest/dense tree
    • 3 (light green): grassland
    • 4 (green): shrubland
    • 5 (blue): water
    • 6 (gray): built-up
    • 7 (tan): bareland
    • 8 (teal): wetland

Bounding box (bounding_box.shp)

  • Abstract: This is a shapefile which was “used to create [a] buffer to prevent edge effects”, per Lucas. It extends past the boundaries of adjusted_studysite when the shapefiles are visualized together.
  • Spatial Coverage: The data source pertains to land surrounding Makuyuni Wildlife Corridor and the corridor itself.
  • Spatial Resolution: Unknown
  • Spatial Extent: 833295.2141958434367552,9585634.2357019279152155 : 860913.8547548535279930,9617039.8727052193135023
  • Spatial Reference System: EPSG:32736 - WGS 84 / UTM zone 36S
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.
  • Variables: The shapefile is filled with a Simple Fill and has no land cover data.

Pixel size

  • Abstract: This is a variable within the QGIS model which is used in specifying resolution when rasterizing vector data. It is stated to have a default value of 70 (presumed meters), but we noted that the text of the original study stated that resolution was 10x10 meters. We therefore did not use this input but are recording its presence for transparency.
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.
  • Variables: Pixel size has a default value of 70 meters and maximum value of 100 meters.

Initial clip (initial_clip.shp)

  • Abstract: Indicated to be the combination of Microsoft AI building generation and OpenStreetMap buildings.
  • Spatial Coverage: The study site.
  • Spatial Extent: 24.9389406999999999,-33.9686420999999967 : 36.7868238000000005,-1.4968789000000000
  • Spatial Reference System: EPSG:4326 - WGS 84
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.

Secondary roads (secondary_roads.shp)

  • Abstract: This is a shapefile which is described as containing “all tracks and roads that were within the study area aside from the two major highways” by Justin Lucas. The shapefile contains line geometries and has 176 total features.
  • Spatial Coverage: The data source pertains to land surrounding Makuyuni Wildlife Corridor and the corridor itself.
  • Spatial Resolution: Unknown
  • Spatial Extent: 4002985.2809690786525607,-417528.5057486270670779 : 4055363.5504161105491221,-386600.4248493338818662
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas. Lucas states that this shapefile was sourced from OpenStreetMap.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.

Major roads (major_roads.shp)

  • Abstract: This is a shapefile which contains “two highways that run through the study area”, per Justin Lucas.
  • Spatial Coverage: The data source pertains to the entirety of the extent of the highways, which extend far past the study site and outside of Tanzania to the north and south.
  • Spatial Extent: 24.9389406999999999,-33.9686420999999967 : 36.7868238000000005,-1.4968789000000000
  • Spatial Reference System: EPSG:4326 - WGS 84
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas. Lucas states that this shapefile is sourced from OpenStreetMap.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.

Buildings (merged_buildings.shp)

  • Abstract: Indicated to be the combination of Microsoft AI building generation and OpenStreetMap buildings.
  • Spatial Coverage: The study site.
  • Spatial Extent: 24.9389406999999999,-33.9686420999999967 : 36.7868238000000005,-1.4968789000000000
  • Spatial Reference System: EPSG:4326 - WGS 84
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.

Bomas (bomas.shp)

  • Abstract: This is a shapefile which ostensibly contains locations of bomas, or livestock enclosures, within the wildlife corridor. It is also referred to with the term “rural”.
  • Spatial Coverage: The data source pertains to land within the wildlife corridor.
  • Spatial Extent: 4007605.1797135984525084,-412580.2386067652842030 : 4032954.7106412472203374,-387445.8448925596894696
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: Unknown, presumed 2023
  • Temporal Resolution: Not applicable
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: This was shared by the original authors.
  • Constraints: Unknown
  • Data Quality: No planned quality assessment.

Note: while “start_region.shp” and “end_region.shp” are included as inputs to the QGIS model, we did not work with these files due to their relevance to an accumulated cost surface raster rather than a least cost surface raster (i.e., there do not need to be defined start and end regions for a determination of cost). As mentioned above, the QGIS model does not contain all steps to create such an output.

Workflow

  1. Clip landcover by extent of bounding box.

Inputs: “landcover”, “bounding_box” Outputs: unnamed intermediate output “lc_study_clip” Corresponding QGIS workflow function: Clip landcover by study site

Planned deviation: In the QGIS model, the landcover input file is clipped to the initial clip file before being reprojected and subsequently clipped to the extent of the bounding box file. We judged that we could avoid clipping twice by clipping to the extent of the bounds before reprojecting.

# Read in landcover .tif file under the raster package
landcover <- here("data", "raw", "public", "landcover (1).tif")  %>% raster()

# Read in bounding_box shapefile under the stars package
bounds <- here("data", "raw", "public", "bounding_box", "bounding_box.shp") %>% st_read()
## Reading layer `bounding_box' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/bounding_box/bounding_box.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S
# Change the CRS of the bounds to that of landcover to enable clipping
bounds <- bounds %>% st_transform(proj4string(landcover)) 

# Clip landcover to bounds extent
lc_study_clip <- mask(crop(landcover, extent(bounds)), bounds)
  1. Warp (reproject) landcover.

Inputs: lc_study_clip Outputs: unnamed intermediate output “initial_raster_clip_proj” Corresponding QGIS workflow function: Warp (reproject)

Unplanned deviation: The landcover raster was resampled to a constructed raster with a resolution of 10x10 m to account for the pixel size input. Within the QGIS workflow, the pixel size has a minimum value of 70 and a maximum value of 100 (assumed units are meters). In contrast, the manuscript states that rasters had a resolution of 10x10 m. Pixel size is a variable whose value can be altered when running the least cost model, and it is assumed that Lucas used a value of 10 when creating the least cost raster contained within the “DD_Cost_6” file (Prof. Holler, personal communication, 12/11/2023).

lc_study_clip_proj <- projectRaster(lc_study_clip, crs=32736)

# Get resolution
#res(lc_study_clip_proj)

# Create new raster with default pixel size value of 10
pixel_default <- raster(resolution=c(10, 10), crs=proj4string(lc_study_clip_proj), ext=extent(lc_study_clip_proj))

# Resample to change the resolution to 10x10
lc_study_clip_proj <- resample(lc_study_clip_proj, pixel_default)

# Get new resolution
#res(lc_study_clip_proj)

#plot(lc_study_clip_proj)
  1. Reclassify landcover by table.

Inputs: “lc_study_clip_proj” Outputs: unnamed intermediate “lc_reclass” Corresponding QGIS workflow function: Reclassify landcover by table

Note: The workflow diagram on Google Drive contained a different reclassification table for landcover. The table in QGIS was instead used as a reference due to its greater relevance to the behavior of the model.

# Create 3x10 table of values with which to reclassify
# Data source: the QGIS workflow

# Create columns of the data table
c1 <- c(0,1,2,3,4,5,6,7,8,9)
c2 <- c(0,2,3,4,5,6,7,8,9,255)
c3 <- c(0,0.75,0.01,0.25,0.01,0.15,1,0.1,0.15,0)

# Create a matrix
reclass_table <- as.matrix(data.frame(c1, c2, c3))

# Reclassify raster with right=F
lc_reclass <- reclassify(lc_study_clip_proj, reclass_table, right=F)

4a. Reproject secondary roads to CRS=32736.

Inputs: secondary_roads Outputs: unnamed intermediate “secondary_roads_proj” Corresponding QGIS workflow function: Reproject Secondary Roads

# Read in shapefile under stars package
secondary_roads <- here("data", "raw", "public", "secondary_roads", "secondary_roads.shp") %>% st_read()
## Reading layer `secondary_roads' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/secondary_roads/secondary_roads.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 176 features and 19 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 4002985 ymin: -417528.5 xmax: 4055364 ymax: -386600.4
## Projected CRS: WGS 84 / Pseudo-Mercator
# Reproject
secondary_roads_proj <- st_transform(secondary_roads, crs=32736) %>% st_crop(st_bbox(lc_study_clip_proj))

4b. Rasterize secondary roads.

Inputs: “secondary_roads_proj” Outputs: unnamed intermediate “secondary_roads_raster” Corresponding QGIS workflow function: Rasterize secondary roads

Unplanned deviation: As the function st_rasterize changed the resolution of the secondary_roads object to over 100x100, it was deemed necessary to create a blank stars raster (“blank”) with a resolution of 10x10 to use as a template within the function. The code used to create the template and rasterize using it is sourced from Holler et al. (2023).

# Create blank stars raster template
b <- st_bbox(c(xmin=833170.1, xmax=861050.1, ymin=9585512.5, ymax=9617152.5), crs=st_crs(32736)) %>% st_as_sfc()
blank <- st_as_stars(b, dx=10, dy=10)
blank[[1]][] = NA

# Rasterize in stars package
secondary_roads_raster <- st_rasterize(secondary_roads_proj, blank)

# Use st_as_raster function to create a raster object
# This function is sourced from user cbrown5 on GitHub

# Load in cbrown5's function
st_as_raster <- function(rstars){
  rext <- st_bbox(rstars)
  raster(t(rstars[[1]]), xmn = rext[1], xmx = rext[3],
                   ymn = rext[2], ymx=rext[4],
                   crs = st_crs(rstars)$proj4string)
}

# Apply function to stars raster
secondary_roads_raster <- st_as_raster(secondary_roads_raster)

# Clip raster to extent of study landcover
secondary_roads_raster <- setExtent(secondary_roads_raster, extent(lc_study_clip_proj))

# Resample raster in order to match the dimensions of landcover
secondary_roads_raster <- resample(secondary_roads_raster, lc_study_clip_proj)

# Reclassify values to achieve a binary raster
secondary_roads_raster <- secondary_roads_raster %>% 
  reclassify(cbind(NA, 0)) %>% 
  reclassify(cbind(1, Inf, 1))

5a. Reproject major roads.

Inputs: “major_roads_vector” Outputs: unnamed intermediate “major_roads_proj” Corresponding QGIS workflow function: Reproject major roads

# Read in major_roads shapefile under stars
major_roads <- here("data", "raw", "public", "major_roads_vector", "major_roads_vector.shp") %>% st_read()
## Reading layer `major_roads_vector' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/major_roads_vector/major_roads_vector.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 11 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 24.93894 ymin: -33.96864 xmax: 36.78682 ymax: -1.496879
## Geodetic CRS:  WGS 84
# Reproject 
major_roads_proj <- major_roads %>% st_transform(crs=32736) %>% st_crop(st_bbox(lc_study_clip_proj))

5b. Rasterize and buffer major roads.

Inputs: “major_roads_buffer” Outputs: unnamed intermediate “major_roads_raster” Corresponding QGIS workflow function: Buffer; Rasterize major roads

# Rasterize stars object with blank template
major_roads_raster <- st_rasterize(major_roads_proj, blank)

# Convert to raster package object
major_roads_raster <- st_as_raster(major_roads_raster)

# Buffer by 300 meters
major_roads_buffer <- buffer(major_roads_raster, width=300)

# Resample to match dimensions of landcover
major_roads_raster_proj <- resample(major_roads_buffer, lc_study_clip_proj)

# Reclassify to achieve binary raster
major_roads_raster_proj <- major_roads_raster_proj %>% 
  reclassify(cbind(NA, 0)) %>% 
  reclassify(cbind(1, Inf, 1))

6a. Reproject rural data (bomas).

Inputs: “Bomas” Outputs: unnamed intermediate “bomas_proj” Corresponding QGIS workflow function: Reproject Rural

# Read in bomas shapefile under stars
bomas <- here("data", "raw", "public", "Bomas", "Bomas.shp") %>% st_read()
## Reading layer `Bomas' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/Bomas/Bomas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1525 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4007605 ymin: -412580.2 xmax: 4032955 ymax: -387445.8
## Projected CRS: WGS 84 / Pseudo-Mercator
# Reproject and fix invalid geometries
bomas_proj <- bomas %>% st_transform(crs=32736) %>% st_make_valid() %>% st_crop(st_bbox(lc_study_clip_proj))

6b. Reproject buildings.

Inputs: “merged_buildings” Outputs: unnamed intermediate “buildings_proj” Corresponding QGIS workflow function: Reproject Buildings

# Read in buildings shapefile under stars
buildings <- here("data", "raw", "public", "merged_buildings", "merged_buildings.shp") %>% st_read()
## Reading layer `merged_buildings' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/merged_buildings/merged_buildings.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 9775 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4007554 ymin: -414103 xmax: 4033053 ymax: -387288.1
## Projected CRS: WGS 84 / Pseudo-Mercator
# Reproject
buildings_proj <- buildings %>% st_transform(crs=32736) %>% st_crop(st_bbox(lc_study_clip_proj))

6c. Merge vector layers.

Note: the following concept was inspired by feedback from Dr. Joseph Holler.

# Limit columns to just geometry
bomas_proj <- bomas_proj[,2]
buildings_proj <- buildings_proj[,c(13)]

# Bind rows together
bbmerge <- bind_rows(bomas_proj, buildings_proj)

6d. Rasterize the merged layer.

# Rasterize stars object with template
# Note: the following line was inspired by feedback from Dr. Joseph Holler.
buildings_merged_raster <- st_rasterize(bbmerge, blank)

# Convert to raster package object
buildings_merged_raster <- st_as_raster(buildings_merged_raster)

# Resample to match resolution of landcover
buildings_merged_raster <- resample(buildings_merged_raster, lc_study_clip_proj)

# Clip to extent of study landcover
buildings_merged_raster <- setExtent(buildings_merged_raster, extent(lc_study_clip_proj))

6e. Create a proximity raster.

Inputs: “buildings_merged_raster” Outputs: unnamed intermediate “buildings_prox” Corresponding QGIS workflow function: Proximity raster (SAGA function)

buildings_prox <- distance(buildings_merged_raster)

6g. Reclassify proximity raster by table.

Inputs: “buildings_prox” Outputs: unnamed intermediate “buildings_prox_reclass” Corresponding QGIS workflow function: Reclassify by table

# Create 6x3 table of values with which to reclassify
# Data source: the QGIS workflow

# Create columns
c1_prox <- c(0,1,50,100,150,200)
c2_prox <- c(1,50,100,150,200,Inf)
c3_prox <- c(1,0.8,0.6,0.4,0.2,0)

# Create matrix using columns
reclass_table_prox <- as.matrix(data.frame(c1_prox, c2_prox, c3_prox))

# Reclassify raster
buildings_prox_reclass <- reclassify(buildings_prox, reclass_table_prox, right=F)

#plot(buildings_prox_reclass)
  1. Rasterize buffer zone.

Inputs: “buffer_region” Outputs: unnamed intermediate “buffer_raster” Corresponding QGIS workflow function: Rasterize buffer zone

# Read in shapefile under stars
buffer <- here("data", "raw", "public", "buffer_region", "buffer_region.shp") %>% st_read()
## Reading layer `buffer_region' from data source 
##   `/Users/a/Documents/GitHub/geog323-wildlife-corridor/data/raw/public/buffer_region/buffer_region.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S
# Reproject
buffer_proj <- buffer %>% st_transform(crs=32736) %>% st_crop(st_bbox(lc_study_clip_proj))

# rasterize with stars
buffer_raster <- st_rasterize(buffer_proj, blank)

# convert to raster object
buffer_raster <- st_as_raster(buffer_raster)

# Resample to resolution of landcover
buffer_raster <- resample(buffer_raster, lc_study_clip_proj)

# Clip to extent of study landcover and reclassify 
buffer_raster <- buffer_raster %>% 
  setExtent(extent(lc_study_clip_proj)) %>%
  reclassify(cbind(1, Inf, 1)) %>%
  reclassify(cbind(NA, 0))
  1. Compute cost surface.

Inputs: “major_roads_raster” [A], “secondary_roads_raster” [B], “buildings_prox_reclass” [C], “lc_reclass” [D], “buffer_raster” [E] Outputs: unnamed intermediate “cost_unclip” Corresponding QGIS workflow function: Raster calculator cost

A <- major_roads_raster_proj
B <- secondary_roads_raster
C <- buildings_prox_reclass
D <- lc_reclass
E <- buffer_raster

cost_new <- (A+(0.25*B)+C+D+E)/4
  1. Compare cost surface with output from original extension study.

Inputs: “DD_Cost_6”, “DDclipped_cost” Output: “diff” Corresponding QGIS workflow function: N/A

# Read in "DD_Cost_6.tif", the presumed cost raster output of the QGIS workflow, under raster
cost_original <- here("data", "derived", "public", "DD_Cost_6.tif") %>% raster()

# Resample to appropriate resolution
cost_original <- resample(cost_original, pixel_default)

# Resample cost raster to appropriate resolution
cost_new <- resample(cost_new, pixel_default)

# Overlay the rasters and subtract the reproduction cost raster from the original cost raster
diff <- overlay(cost_original, cost_new, fun=function(cost_original, cost_new){return(cost_original-cost_new)})

Figure 1

plot(cost_new)

10a. Visualize the difference raster using tmap.

# Set tmap mode
tmap_mode("view")

# Create tmap based on ggplot above
cost_diff_tmap_buffer <- tm_shape(diff) + tm_raster(
    palette = "Oranges",
    style = "fixed",
    breaks = c(-1, 0, 1, 500),
    #labels = c("-1", "0", "1", "500"),
    midpoint = NA,
    title = "Vulnerability: Original - Reproduction",
    legend.show=T
)

10b. Make buffer zone values NA to visualize cost values more simply; plot more meaningful tmap.

# Reclassify values above 1 to NA
diff_nobuffer <- diff %>% 
  reclassify(cbind(1, Inf, NA))

# Convert to stars to see descriptive statistics
cost_diff_stars <- st_as_stars(diff_nobuffer)

cost_diff_stars
## stars object with 2 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
##         Min.       1st Qu.        Median        Mean       3rd Qu.      Max.
## layer  -0.31 -5.587935e-11 -6.938894e-18 0.002889722 -6.938894e-18 0.3173826
##         NA's
## layer  72754
## dimension(s):
##   from   to  offset delta                       refsys x/y
## x    1 2788  833170    10 +proj=utm +zone=36 +south... [x]
## y    1 3164 9617153   -10 +proj=utm +zone=36 +south... [y]
# Set tmap mode
tmap_mode("view")

# Create tm_raster
cost_diff_tmap <- tm_shape(diff_nobuffer) + tm_raster(
    palette = "PuOr",
    style = "cont",
    breaks = c(-1, 0, 1),
    #labels = c("-1", "0", "1", "500"),
    midpoint = 0,
    title = "Least Cost: Original - Reproduction",
    legend.show=T
)

Figure 2

cost_diff_tmap

Results

One result of this workflow is the creation of a cost surface raster (Figure 1; cost_new) for the Makuyuni Wildlife Corridor study area as defined in the original study. This output is intended to display the unitless cost for a large mammal to travel across a given pixel of the study area. For ease of viewing the range of non-buffer zone cost values in cost_new, we reclassified the buffer zone so that values within the buffer had values of 1 instead of 999, the value in the original workflow. With the high values of the buffer zone removed, we can visualize the range of the new raster more easily. Cost values outside the buffer zone range from 0.0025 to 0.7625 (the exact maximum was determined via visualization in QGIS, but the R scale may also be referenced). Although buffer zone values did not end up as all being 1 due to the raster calculation needed compute the least cost surface, it is still quite obvious where the buffer zone boundaries lie; there is almost always a sharp transition from the buffer zone to the interior. The buffer zone can be ignored for the subsequent summary of cost values. One can interpret the cost raster through the plots of other landscape fixtures such as major roads, secondary roads, and buildings. Cost is clearly higher on major roads and the areas closest to buildings, which is to be expected based on the concept that these elements pose a greater challenge for animal traversal than undeveloped land. The very highest cost values appear to be located in an area consisting of the overlap of the major roads buffer and a group of buildings; this location is found near the intersection of the two major road segments in the northeast of the study area. The lowest cost values accordingly appear for pixels which were classified as less costly during the landcover reclassification step of the workflow and which do not overlap with a road or building.

The difference raster (Figure 2; cost_diff_tmap), specifically one with buffer zone values reclassified, is the most salient output of the workflow and is more relevant for the discussion to follow. This raster is a comparison of the original and reproduction versions of the least cost raster. We removed the buffer zone values from the final version of the visualization, for they are mostly pertinent to an accumulated cost raster rather than a least cost raster. Figure 2 accordingly shows a raster without the buffer zone. Orange pixels contain difference values below 0 and represent locations where the cost is higher in the reproduction compared to the original raster (e.g., 0.5 - 0.7). Purple pixels contain difference values above 0 and represent locations where the cost is lower in the reproduction compared to the original raster (e.g., 0.9 - 0.2). White pixels represent areas where there was no (or very little) difference between the rasters. We observe that while some of the raster appears to show little to no difference, there are many distinct spots with positive difference values and seemingly fewer distinct spots with negative values. A review of inputs to the workflow suggests that the spots are aligned with buildings.

Discussion

The steps of QGIS workflow were possible to reproduce with some slight deviations. One limitation for reproducibility involves the pixel size input; it would be beneficial to have a version of the model where the pixel size is set at 10 meters, which is the value ultimately used in the analysis. It is unclear from looking at the model steps which value was input into the model in order to produce the cost raster. In addition, the reproducibility of the model could be improved via inclusion of more steps to produce the accumulated cost raster draft present in the manuscript. As the model exists now, start and end regions are rasterized as separate steps from the creation of the least cost raster, but cost pathways are not produced.

With respect to the main output of this workflow, we would ideally desire to see the difference raster be entirely composed of pixels with values of 0. That this is not the case establishes that the reproduction of the least cost raster in R did not create an identical raster to the output of the QGIS model. The results thus fail to reject the hypothesis set out in the Study Design section of this document. This outcome is somewhat troubling for an evaluation of the reproduction as one that was entirely successful, and it should thus be examined further.

As mentioned in the Results section, the pixels which have particularly low or high difference values appear to correspond to certain buildings. As a general rule, it would seem that for distinct spots of positive values, the reclassified building proximity raster created by the GIS model shows a building that is not present in the equivalent raster created in this workflow; there is a higher cost at these locations in the original raster. Similarly, spots of negative values would seem to correspond to areas where there is a building signature in the reclassified building proximity raster which is not present in the equivalent intermediate output of the QGIS function. This second class of phenomenon appears rarer than the first, which suggests that the QGIS workflow may have produced a reclassified building proximity layer with more buildings (and therefore more high-valued pixels) than the equivalent output in R.

In particular, the source of the error could be traced to the step in which the merged_buildings shapefile is rasterized. Based on an investigation of the intermediate outputs of the QGIS model, it is likely that there are some differences in what vector data is successfully translated to raster data. This hypothesized discrepancy ultimately leads to perceivable differences in the cost of traveling across many pixels in the study area. Differences with this particular step may be due to the computational environment in which the shapefile is rasterized; Dr. Holler, an author of the manuscript, implies that the differences could be addressed with extensive knowledge of GDAL commands which could be added as specifications in the QGIS model (Dr. Holler, personal communication, 12/13/2023). For this report, in which we believe we have reproduced the broad steps for rasterizing the buildings raster (e.g., using a raster template with the proper resolution), the specifics of why the discrepancy exists remain somewhat unclear.

Conclusion

As we were not able to produce an least cost raster in R which is identical to the output of the QGIS workflow, it would not be possible to term this reproduction study a full success. We do not characterize the difficulties experienced as a result of the buildings layer as the fault of the original workflow; rather, it appears that the issue could be considered related to the computational environment and the differences between the QGIS functions used in the model and the R functions which we selected. It is entirely possible that different approaches using R may produce a more similar cost raster (for example, with the terra package instead of the raster package), but we feel we have attempted the reproduction using clearly defined and legitimate steps.

We finally acknowledge the work of Justin Lucas in creating the QGIS model and celebrate his work to create a reproducible approach using a workflow diagram and content directory. Lucas documented inputs to the model thoroughly, which was a significant benefit to our analysis. We also thank Dr. Joseph Holler for his help with project issues.

Atallah, the author responsible for this version of the report, would also like to acknowledge co-author Lutz as having been a partner throughout project work. CRediT roles: Both authors fall under Conceptualization, Data Curation, Investigation, Methodology, Software, Validation, and Visualization. Atallah facilitated Supervision through communication with Dr. Holler and wrote the Abstract, Study Design, beginning of Materials and Procedure, all “deviation” annotations to the workflow, Results, Discussion, and Conclusion individually for this version of the report.

References

Lyimo, E. H., Mayengo, G. M., Castico, D. J., Nguma, D., Martin, E. H., Hariohay, K. M., Kisingo, A. W., Lucas, J., Kimambo, N., & Holler, J. (2023). Makuyuni Wildlife Corridor: Analysis of the Effects of Socioecological Interactions and Changing Land Use on Movement Patterns of Large Mammal Species [Draft].

Holler, J., Udoh, K., An-Pham, D., & Middlebury Open GIScience Classes. (2023). Reproduction Analysis of Malcomb et al 2014 [Unpublished].