Geographical detectors and 'GD' R Package: Q & A

The geographical detectors (GD) model is a powerful statistical method for spatial heterogeneity analysis and identifying the determinants of geographical phenomena. The 'GD' R package provides an efficient implementation of this model with comprehensive tools for spatial analysis.

R package "GD" Tutorial


Installation

Install the 'GD' R package from CRAN:

install.packages("GD")
library("GD")

Citation

"The OPGD model was performed using 'GD' R package (Song et al., 2020)."

Reference

Song, Y., Wang, J., Ge, Y. & Xu, C. (2020). "An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data." GIScience & Remote Sensing, 57(5), 593–610. DOI: 10.1080/15481603.2020.1760434

Note: Our research team has published a series of articles on new methods in spatial stratified heterogeneity theory. Please refer to the reference list at the end of this page.

Q & A

Q1: What is the recommended number of observations for spatial analysis using Geodetector (GD) or OPGD models? How many breaks are recommended for spatial data discretization?

The spatial unit should be defined before modelling, because the number of spatial observations depends on it. According to Section 2.1 Optimal spatial data discretization in Song et al. (2020), "there are two applicable strategies based on the number of observations and practical requirements." For a relatively small data set, the recommended number of breaks is from 3 to 6 (or so). For a relatively large data set, the recommended break number is an integer sequence from 3 to 22 (or so), and the recommended discretization method is the quantile break.
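The idea of trying a sequence of break numbers with quantile breaks can be sketched in base R. This is only an illustration under stated assumptions: `q_stat` and `quantile_strata` are hypothetical helpers written for this example (they are not functions of the "GD" package), the data are simulated, and q is taken as 1 - SSW/SST (within-strata sum of squares over total sum of squares). In practice the package's own discretization functions and gdm would be used instead.

```r
# Hypothetical helper (not part of the GD package): q statistic of a
# response y for a given stratification, q = 1 - SSW / SST.
q_stat <- function(y, strata) {
  strata <- droplevels(as.factor(strata))
  sst <- sum((y - mean(y))^2)
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  1 - ssw / sst
}

# Hypothetical helper: quantile discretization of x into (at most) n intervals.
quantile_strata <- function(x, n) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n + 1)))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Simulated data standing in for a small data set.
set.seed(1)
x <- runif(500)
y <- sin(2 * pi * x) + rnorm(500, sd = 0.3)

# Candidate break numbers for a relatively small data set.
n_breaks <- 3:8
q_by_n <- sapply(n_breaks, function(n) q_stat(y, quantile_strata(x, n)))
names(q_by_n) <- n_breaks

# Break number with the highest q value among the candidates.
best_n <- n_breaks[which.max(q_by_n)]
print(round(q_by_n, 3))
print(best_n)
```

For a large data set the same loop could run over a longer sequence of break numbers; the range used here is only an example.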

Q2: Do GD/OPGD models and the "GD" R package work for large datasets? How long does the computation take?

GD works for large datasets. According to the analysis in Section 5 Discussion in Song et al. (2020), "When the sample size reaches 1000, 10 000, 100 000, only 0.05 s, 0.14 s, and 1.55 s are used for simultaneous computation of four parts of geographical detectors by the GD package, respectively." If the computation takes too long and fails to return results, please stop it and check the data and the modelling steps. If you apply GD to a very large dataset, please pay attention to the spatial distribution and density distribution of the data, and to the spatial data discretization used during modelling.

Q3: The GD package runs well for most variables, but it returns no results for a few variables even after running for a long time. What is the issue?

This sometimes happens because the observation data contain NA values. Please remove NA values before computation. If the issue persists, please run the spatial data discretization and the GD models separately. If the data contain no NA values but the GD R package still returns no results, some explanatory variables may contain too many identical values. In that case all observations in a spatial zone can be identical, so the standard deviation is zero and GD cannot return results. If your case has this issue, you may address it by: (1) increasing the size of spatial units to reduce the number of observations; (2) using quantile breaks; or (3) trying higher resolution data. Please make sure that each spatial zone contains at least two observations with different values. It is recommended to run the spatial discretization manually with "quantile breaks" to check the basic characteristics of the variable data. The Geographically Optimal Zones-based Heterogeneity (GOZH) model is recommended if you would like to try the improved GD models (see Q4).
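The checks described above can be run by hand before modelling. The following base R sketch is an illustration only: the data frame is made up, `constant_strata` is a hypothetical helper (not part of the "GD" package), and the zone check is applied to the response variable as an assumption.

```r
# Toy data standing in for a real data set (values are made up).
dat <- data.frame(
  y  = c(1.2, 3.4, NA, 2.2, 5.1, 4.8, 3.3, 2.9),
  x1 = c(0, 0, 0, 0, 0, 0, 1, 2)  # many identical values
)

# 1. Remove rows that contain NA before any computation.
dat_clean <- dat[complete.cases(dat), ]

# 2. Run quantile breaks manually. Duplicated break values show that the
#    variable has too many identical values for the requested number of
#    intervals (3 intervals need 4 distinct break values).
brks <- quantile(dat_clean$x1, probs = seq(0, 1, length.out = 4))
n_unique_breaks <- length(unique(brks))

# 3. Hypothetical helper: names of zones in which the response has fewer
#    than two distinct values (zero standard deviation within the zone).
constant_strata <- function(y, strata) {
  n_distinct <- tapply(y, strata, function(v) length(unique(v)))
  names(n_distinct)[n_distinct < 2]
}

strata <- cut(dat_clean$x1, breaks = unique(brks), include.lowest = TRUE)
print(n_unique_breaks)
print(constant_strata(dat_clean$y, strata))
```

If the number of unique breaks is smaller than requested, or if any zone is reported as constant, the discretization settings or the spatial unit probably need to be revisited before running the GD models.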

Q4: Are there any advanced GD models for more accurate and effective modelling?

Currently, there are the following advanced GD models:

| Model | Description | Publication | Software |
| --- | --- | --- | --- |
| OPGD | Characterising spatial heterogeneity, identifying geographical factors and interactive impacts | Song et al. (2020) | R package "GD" (Tutorial) |
| IDSA | Estimating power of interactive determinants from spatial perspective | Song & Wu (2021) | R package "IDSA" |
| GHM | Characterizing local and stratified heterogeneity, improving interpolation | Luo, Song*, et al. (2023) | |
| GOZH | Identifying determinants across large study areas using optimal zones and Ω-index | Luo, Song*, et al. (2022) | |
| RGD | Robust estimation of PD values | Zhang, Song*, et al. (2022) | |

Review: Guo, J., Wang, J., Xu, C., & Song, Y. (2022). Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 59(1), 1660–1677. Link

Q5: How can I export results to a data.frame or a spreadsheet?

The outcomes are saved as a list of data frames. Option 1: print the results and copy them to Excel. Option 2: use "lapply" and "write.csv" to export them to .csv files.
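Option 2 might look like the following sketch. The `results` object here is a made-up stand-in: its element names and columns are illustrative and are not the actual structure returned by the package, which may be nested and need adapting. The output directory is a temporary folder chosen for the example.

```r
# Stand-in for a result object: a named list of data frames.
# Element and column names are illustrative assumptions.
results <- list(
  factor      = data.frame(variable = c("x1", "x2"), qv = c(0.42, 0.17)),
  interaction = data.frame(var1 = "x1", var2 = "x2", qv = 0.55)
)

# Folder for the exported files (a temporary directory in this example).
out_dir <- file.path(tempdir(), "gd_results")
dir.create(out_dir, showWarnings = FALSE)

# Keep only the elements that are data frames, then write one .csv per element.
is_df <- vapply(results, is.data.frame, logical(1))
paths <- lapply(names(results)[is_df], function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(results[[nm]], path, row.names = FALSE)
  path
})

print(unlist(paths))
```

Each .csv file can then be opened directly in a spreadsheet program.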

Q6: Why is the sum of the q values of two variables not equal to the q value of their interaction?

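The primary reason is that variables usually do not act independently. Compared with the q values of the individual variables, the q value of the interaction can be nonlinearly enhanced (larger than their sum), enhanced but smaller than their sum, or weakened. It equals the sum of the individual q values only when the two variables are independent.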
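The comparison between an interaction q value and the individual q values can be written as a small classifier. This is a sketch: `interaction_type` is a hypothetical helper, not a function of the "GD" package, and the category labels are assumed to follow the interaction types described for geographical detectors in Wang et al. (2010). The tolerance `tol` is an arbitrary choice for the equality case.

```r
# Hypothetical helper: classify the interaction of two variables from their
# individual q values (q1, q2) and the q value of the interaction (q12).
interaction_type <- function(q1, q2, q12, tol = 1e-8) {
  q_sum <- q1 + q2
  if (abs(q12 - q_sum) < tol) return("Independent")        # q12 = q1 + q2
  if (q12 > q_sum)            return("Enhance, nonlinear") # q12 > q1 + q2
  if (q12 > max(q1, q2))      return("Enhance, bi-")       # max < q12 < sum
  if (q12 < min(q1, q2))      return("Weaken, nonlinear")  # q12 < min
  "Weaken, uni-"                                           # min <= q12 <= max
}

print(interaction_type(0.2, 0.3, 0.6))   # larger than the sum
print(interaction_type(0.2, 0.3, 0.4))   # between the maximum and the sum
print(interaction_type(0.2, 0.3, 0.25))  # between the minimum and the maximum
```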

Q7: When I plot the results of "gdm" or the spatial discretization functions, multiple figures are returned, but I can only see the last one. How can I find the previous figures?

Please use RStudio and click the back arrow (previous plot) button in the Plots pane to step through the earlier figures.

Q8: When I plot figures, some text or elements overlap with the legends. How can I solve this issue?

Drag the plotting pane in RStudio to a larger size, then run the plotting code again.

Q9: Everything in my call to the gdm function looks correct, but it returns an error that the continuous variable names are not matched. How can I solve this?

The data set is not a "real data.frame". Convert it before calling gdm: data <- as.data.frame(data)

Advanced Models

The GD framework has been extended with several advanced models to address specific research needs:

OPGD (Optimal Parameters-based Geographical Detector)

An enhanced version of the original GD model that automatically optimizes discretization parameters to maximize the explanatory power of spatial factors. This model provides better accuracy in identifying spatial heterogeneity and is particularly effective for continuous variables.

IDSA (Interactive Detector for Spatial Associations)

Extends GD to explicitly model and quantify interactive effects between multiple spatial variables. This model is essential for understanding how different spatial factors interact to influence geographical phenomena.

GHM (Generalized Heterogeneity Model)

Characterizes both local and stratified heterogeneity patterns, improving spatial interpolation and prediction accuracy. Particularly useful for applications requiring localized heterogeneity assessment.

GOZH (Geographically Optimal Zones-based Heterogeneity)

Identifies spatial determinants across large study areas by automatically defining optimal spatial zones. The Ω-index provides a new perspective on stratified heterogeneity.

RGD (Robust Geographical Detector)

A robust version of GD that provides more reliable PD value estimates, particularly when dealing with outliers or non-normally distributed data.

References

  1. Song, Y., Wang, J.F., Ge, Y., et al. (2020). An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis. GIScience & Remote Sensing, 57(5), 593–610. Link
  2. Wang, J.F., Li, X.H., et al. (2010). Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun Region, China. International Journal of Geographical Information Science, 24(1), 107–127.
  3. Wang, J.F., Zhang, T.L., & Fu, B.J. (2016). A measure of spatial stratified heterogeneity. Ecological Indicators, 67, 250–256.
  4. Luo, P., Song, Y.*, et al. (2023). A generalized heterogeneity model for spatial interpolation. International Journal of Geographical Information Science, 37(3), 634–659. Link
  5. Guo, J., Wang, J., Xu, C., & Song, Y. (2022). Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 59(1), 1660–1677. Link
  6. Song, Y., & Wu, P. (2021). An interactive detector for spatial associations. International Journal of Geographical Information Science. Link
  7. Song, Y., Wright, G., Wu, P., et al. (2018). Segment-based spatial analysis for assessing road infrastructure performance using monitoring observations and remote sensing data. Remote Sensing, 10(11), 1696. Link
  8. Luo, P., Song, Y.*, et al. (2022). Identifying determinants of spatio-temporal disparities in soil moisture of the Northern Hemisphere using a geographically optimal zones-based heterogeneity model. ISPRS Journal of Photogrammetry and Remote Sensing, 185, 111–128. Link
  9. Zhang, Z., Song, Y.*, & Wu, P. (2022). Robust geographical detector. International Journal of Applied Earth Observation and Geoinformation, 109, 102782. Link
  10. Song, Y., Wu, P., et al. (2020). A spatial heterogeneity-based segmentation model for analyzing road deterioration network data in multi-scale infrastructure systems. IEEE Transactions on Intelligent Transportation Systems. Link
  11. Luo, P., Song, Y.*, & Wu, P. (2021). Spatial disparities in trade-offs: economic and environmental impacts of road infrastructure on continental level. GIScience & Remote Sensing. Link
  12. Zhang, Z., Song, Y.*, et al. (2023). Spatial disparity of urban performance from a scaling perspective: a study of industrial features associated with economy, infrastructure, and innovation. GIScience & Remote Sensing, 60(1), 2167567. Link