Geographical detectors and 'GD' R Package: Q & A
The geographical detectors (GD) model is a powerful statistical method for analysing spatial heterogeneity and identifying the determinants of geographical phenomena. The 'GD' R package provides an efficient implementation of this model with comprehensive tools for spatial analysis.
R package "GD" Tutorial
Installation
Install the 'GD' R package from CRAN:
install.packages("GD")
library("GD")
Citation
"The OPGD model was performed using 'GD' R package (Song et al., 2020)."
Reference
Song, Y., Wang, J., Ge, Y. & Xu, C. (2020). "An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data." GIScience & Remote Sensing, 57(5), 593–610. DOI: 10.1080/15481603.2020.1760434
Q & A
The spatial unit should be defined before modelling, because it determines the number of spatial observations. According to Section 2.1 (Optimal spatial data discretization) in Song et al. (2021), "there are two applicable strategies based on the number of observations and practical requirements." For a relatively small data set, the recommended number of breaks is about 3 to 6. For a relatively large data set, the recommended break numbers are an integer sequence from about 3 to 22, and the recommended discretization method is the quantile break.
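The break-number strategy can be explored by hand before running the package. The following is a minimal base-R sketch, not the GD package's own implementation: "q_stat" and "quantile_disc" are hypothetical helper names, the data are simulated, and the Q value is assumed to take the usual form of one minus the ratio of within-strata to total sum of squares (Wang et al., 2016).

```r
# Q statistic: 1 - (within-strata sum of squares) / (total sum of squares)
q_stat <- function(y, strata) {
  ssw <- sum(sapply(split(y, strata, drop = TRUE),
                    function(v) sum((v - mean(v))^2)))
  sst <- sum((y - mean(y))^2)
  1 - ssw / sst
}

# Quantile discretization of a continuous variable into n classes
quantile_disc <- function(x, n) {
  brks <- unique(quantile(x, probs = seq(0, 1, length.out = n + 1),
                          na.rm = TRUE))
  cut(x, breaks = brks, include.lowest = TRUE)
}

# Simulated data (an assumption for illustration only)
set.seed(1)
x <- runif(200)
y <- 2 * x + rnorm(200, sd = 0.3)

# Small data set: try break numbers 3 to 6 and compare Q values
n_breaks <- 3:6
q_by_n <- sapply(n_breaks, function(n) q_stat(y, quantile_disc(x, n)))
names(q_by_n) <- n_breaks
best_n <- n_breaks[which.max(q_by_n)]
q_by_n
best_n
```

For a large data set, the same loop could be run with "n_breaks <- 3:22". Choosing the break number purely by the largest Q value is only one possible criterion, so the result should be checked against practical requirements.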
GD works for large datasets. According to Section 5 (Discussion) in Song et al. (2020), "When the sample size reaches 1000, 10 000, 100 000, only 0.05 s, 0.14 s, and 1.55 s are used for simultaneous computation of four parts of geographical detectors by the GD package, respectively." If a run takes too long and fails to return results, please stop it and check the data and the modelling steps. When applying GD to a very large dataset, please pay attention to the spatial distribution and density distribution of the data, and to the spatial data discretization step.
When GD fails to return results, one common cause is that the observation data contains "NA". Please remove NA values before computation. If the issue persists, please run spatial data discretization and the GD models separately. If the data contains no NA but the GD R package still can't return results, some explanatory variables may contain too many identical values. In that case all observations in a spatial zone can be identical, so the standard deviation is zero and GD can't return results. You may address this by: (1) increasing the size of spatial units to reduce the number of observations; (2) using quantile breaks; or (3) using higher resolution data. Please make sure that there are at least two observations with different values within each spatial zone. It is recommended to run spatial discretization manually with "quantile breaks" to check the basic characteristics of the variable data. The Geographically Optimal Zones-based Heterogeneity (GOZH) model is recommended if you would like to try the improved GD models (see the list of advanced GD models below).
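The two checks described here (NA removal, and at least two different values per zone) can be sketched in base R as follows. The data frame, its column names "y" and "zone", and the object names are hypothetical and serve only to illustrate the checks.

```r
# Toy data: response y and one explanatory variable already
# discretized into spatial zones (hypothetical values)
dat <- data.frame(
  y    = c(1.2, NA, 3.1, 2.8, 4.0, 5.5, 5.5),
  zone = c("a", "a", "a", "b", "b", "c", "c")
)

# Step 1: drop every row that contains NA in any column
dat_clean <- dat[complete.cases(dat), ]

# Step 2: count distinct response values within each zone
n_distinct <- vapply(split(dat_clean$y, dat_clean$zone),
                     function(v) length(unique(v)), integer(1))

# Zones with fewer than two distinct values have zero standard deviation
bad_zones <- names(n_distinct)[n_distinct < 2]
bad_zones  # "c"
```

Any zone listed in "bad_zones" would need a different discretization (for example quantile breaks) or larger spatial units before modelling.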
Currently, there are the following advanced GD models:
| Model | Description | Publication | Software |
|---|---|---|---|
| OPGD | Characterising spatial heterogeneity, identifying geographical factors and interactive impacts | Song et al. (2020) | R package "GD". Tutorial |
| IDSA | Estimating power of interactive determinants from spatial perspective | Song et al. (2021) | R package "IDSA" |
| GHM | Characterizing local and stratified heterogeneity, improving interpolation | Luo, Song*, et al. (2023) | |
| GOZH | Identifying determinants across large study areas using optimal zones and Ω-index | Luo, Song*, et al. (2022) | |
| RGD | Robust estimation of PD values | Zhang, Song*, et al. (2022) | |
Review: Guo, J., Wang, J., Xu, C., & Song, Y. (2022). Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 59(1), 1660–1677.
The results are saved as a list of data frames. Option 1: print the results and copy them to Excel. Option 2: use "lapply" and "write.csv" to export them to .csv files.
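Option 2 can be sketched in base R as below. The "results" list is a hypothetical stand-in with made-up element names and values; a real GD result may be nested, in which case the relevant data frames would need to be extracted first.

```r
# Hypothetical stand-in for a result: a named list of data frames
results <- list(
  factor_detector = data.frame(variable = c("x1", "x2"),
                               qv = c(0.42, 0.17)),
  risk_means      = data.frame(zone = c("a", "b", "c"),
                               meanrisk = c(1.1, 2.3, 3.0))
)

# Output folder (a temporary directory is used here for illustration)
out_dir <- file.path(tempdir(), "gd_results")
dir.create(out_dir, showWarnings = FALSE)

# Write one .csv file per list element, named after the element
out_files <- lapply(names(results), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(results[[nm]], path, row.names = FALSE)
  path
})
unlist(out_files)
```

Replacing "out_dir" with a project folder keeps the exported files after the R session ends.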
The primary reason is that the effects of variables are not additive: the Q value of an interaction can be nonlinearly enhanced or weakened relative to the sum of the Q values of the individual variables.
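A small constructed example may make the non-additivity concrete. This is a base-R sketch under stated assumptions, not the package's interaction detector: "q_stat" is a hypothetical helper that computes the Q value as one minus the ratio of within-strata to total sum of squares, and the data are artificial. The response depends only on the combination of the two variables, so each variable alone explains nothing while their interaction explains everything.

```r
# Q statistic: 1 - (within-strata sum of squares) / (total sum of squares)
q_stat <- function(y, strata) {
  ssw <- sum(sapply(split(y, strata, drop = TRUE),
                    function(v) sum((v - mean(v))^2)))
  sst <- sum((y - mean(y))^2)
  1 - ssw / sst
}

# Two categorical variables and a response that is 1 only when they differ
x1 <- rep(c("low", "low", "high", "high"), times = 5)
x2 <- rep(c("low", "high", "low", "high"), times = 5)
y  <- as.numeric(x1 != x2)

q1  <- q_stat(y, x1)                    # Q value of x1 alone
q2  <- q_stat(y, x2)                    # Q value of x2 alone
q12 <- q_stat(y, interaction(x1, x2))   # Q value of the crossed strata

c(q1 = q1, q2 = q2, q12 = q12)  # q1 = 0, q2 = 0, q12 = 1
```

Here the interaction Q value (1) exceeds the sum of the individual Q values (0), which corresponds to a nonlinear enhancement.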
Please use RStudio and click the "last figure" button.
Drag the plotting space in RStudio to a larger area, then run the plotting code again.
The data set is not a "real data.frame". Use: data <- as.data.frame(data)
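The conversion can be checked with "class". In the sketch below, "my_tbl" is a hypothetical extra class used to imitate an object that inherits from data.frame without being a plain one; the column names and values are made up.

```r
# Hypothetical object that inherits from data.frame but carries an extra class
dat_raw <- structure(
  data.frame(y = c(1.5, 2.0, 3.2), x = c("a", "b", "a")),
  class = c("my_tbl", "data.frame")
)
class(dat_raw)  # "my_tbl" "data.frame"

# Convert to a plain data.frame before modelling
dat <- as.data.frame(dat_raw)
class(dat)      # "data.frame"
```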
Advanced Models
The GD framework has been extended with several advanced models to address specific research needs:
OPGD (Optimal Parameters-based Geographical Detector)
An enhanced version of the original GD model that automatically optimizes discretization parameters to maximize the explanatory power of spatial factors. This model provides better accuracy in identifying spatial heterogeneity and is particularly effective for continuous variables.
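As an illustration of the parameter optimization described above, the following sketch follows the usage pattern of the package's "gdm" function with the "ndvi_40" example data. It assumes that both are available in the installed version and that the argument names match; please check "?gdm" before relying on it. The call is skipped if the package is not installed.

```r
# Candidate discretization methods and break numbers to search over
discmethod <- c("equal", "natural", "quantile", "geometric", "sd")
discitv <- 4:7

if (requireNamespace("GD", quietly = TRUE)) {
  library("GD")
  data("ndvi_40", package = "GD")
  # Continuous variables are discretized with the best combination of
  # method and break number; categorical variables are used as they are
  ndvigdm <- gdm(NDVIchange ~ Climatezone + Mining + Tempchange + GDP,
                 continuous_variable = c("Tempchange", "GDP"),
                 data = ndvi_40,
                 discmethod = discmethod, discitv = discitv)
  print(ndvigdm)
}
```

Widening "discitv" increases the search space and the run time, so the break-number range should be chosen with the sample size in mind.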
IDSA (Interactive Detector for Spatial Associations)
Extends GD to explicitly model and quantify interactive effects between multiple spatial variables. This model is essential for understanding how different spatial factors interact to influence geographical phenomena.
GHM (Generalized Heterogeneity Model)
Characterizes both local and stratified heterogeneity patterns, improving spatial interpolation and prediction accuracy. Particularly useful for applications requiring localized heterogeneity assessment.
GOZH (Geographically Optimal Zones-based Heterogeneity)
Identifies spatial determinants across large study areas by automatically defining optimal spatial zones. The Ω-index provides a new perspective on stratified heterogeneity.
RGD (Robust Geographical Detector)
A robust version of GD that provides more reliable estimates of the power of determinants (PD), particularly when dealing with outliers or non-normally distributed data.
References
- Song, Y., Wang, J., Ge, Y., & Xu, C. (2020). An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data. GIScience & Remote Sensing, 57(5), 593–610.
- Wang, J.F., Li, X.H., et al. (2010). Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun Region, China. International Journal of Geographical Information Science, 24(1), 107–127.
- Wang, J.F., Zhang, T.L., & Fu, B.J. (2016). A measure of spatial stratified heterogeneity. Ecological Indicators, 67, 250–256.
- Luo, P., Song, Y.*, et al. (2023). A generalized heterogeneity model for spatial interpolation. International Journal of Geographical Information Science, 37(3), 634–659.
- Guo, J., Wang, J., Xu, C., & Song, Y. (2022). Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 59(1), 1660–1677.
- Song, Y., & Wu, P. (2021). An interactive detector for spatial associations. International Journal of Geographical Information Science.
- Song, Y., Wright, G., Wu, P., et al. (2018). Segment-Based Spatial Analysis for assessing road infrastructure impact on nearby land use. Remote Sensing, 10(11), 1696.
- Luo, P., Song, Y.*, et al. (2022). Identifying determinants of spatio-temporal disparities in soil moisture of the Northern Hemisphere using a geographically optimal zones-based heterogeneity model. ISPRS Journal of Photogrammetry and Remote Sensing, 185, 111–128.
- Zhang, Z., Song, Y.*, & Wu, P. (2022). Robust geographical detector. International Journal of Applied Earth Observation and Geoinformation, 109, 102782.
- Song, Y., Wu, P., et al. (2020). A Spatial Heterogeneity-Based Segmentation Model for evaluating LiDAR point cloud quality. IEEE Transactions on Intelligent Transportation Systems.
- Luo, P., Song, Y.*, & Wu, P. (2021). Spatial disparities in trade-offs among multiple ecosystem services: A case study in the Yangtze River Economic Belt. GIScience & Remote Sensing.
- Zhang, Z., Song, Y.*, et al. (2023). Spatial disparity of urban performance with economic-social-environmental data fusion. GIScience & Remote Sensing, 60(1), 2167567.