Geodetector model and “GD” R package: Q & A

Optimal Parameters-based Geographical Detectors (OPGD) Model in R

## install R package "GD"
install.packages("GD")
library("GD")

To cite the “GD” R package in publications, please use:
“The OPGD model was performed using ‘GD’ R package (Song et al., 2020).”

Song, Y., Wang, J., Ge, Y. & Xu, C. (2020) “An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data”, GIScience & Remote Sensing. 57(5), 593-610. doi: 10.1080/15481603.2020.1760434.

Our research team has published a series of articles on new methods based on spatial stratified heterogeneity theory in top journals. Please refer to the reference list at the end of this page.

If you have a question, please send me an email. I will reply as soon as possible.

Q1: What is the recommended number of observations for spatial analysis using Geodetector (GD) or OPGD (optimal parameters-based geographical detector) models? How many breaks are recommended for spatial data discretization?

A1: The spatial unit should be defined before modelling, because the number of spatial observations depends on the spatial unit. According to Section 2.1 Optimal spatial data discretization in Song et al. (2021), “there are two applicable strategies based on the number of observations and practical requirements.” For a relatively small data set, the recommended number of breaks is from 3 to 6 (or so). For a relatively large data set, the recommended number of breaks is an integer sequence from 3 to 22 (or so), and the recommended discretization method is the quantile break.
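As a rough illustration of the quantile strategy described above, the following base R sketch splits a variable into 3 to 6 intervals and counts the observations in each interval. It is not the code used inside the “GD” package. The variable x, the helper quantile_strata, and the simulated data are assumptions made for this example, and the “number of breaks” is interpreted here as the number of intervals.

```r
# Minimal base R sketch of quantile discretization (not the GD package's code).
set.seed(42)
x <- rexp(200)   # hypothetical continuous explanatory variable

quantile_strata <- function(x, n) {
  # n + 1 quantile cut points give n intervals
  brks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # unique() guards against duplicated cut points caused by tied values
  cut(x, breaks = unique(brks), include.lowest = TRUE)
}

# Try 3 to 6 intervals, the range suggested above for a small data set
strata_list <- lapply(3:6, function(n) quantile_strata(x, n))
names(strata_list) <- paste0("n", 3:6)

# Observations per interval for each candidate number of intervals
counts <- lapply(strata_list, table)
print(counts)
```

With quantile breaks the intervals hold roughly equal numbers of observations, which makes it easy to see whether any candidate leaves an interval too sparsely populated.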

Q2: Do GD/OPGD models and the “GD” R package work for large datasets? How long does the computation take?

A2: Yes, GD works for large datasets. According to the analysis in Section 5 Discussion in Song et al. (2020), “When the sample size reaches 1000, 10 000, 100 000, only 0.05 s, 0.14 s, and 1.55 s are used for simultaneous computation of four parts of geographical detectors by the GD package, respectively.” If the computation takes too long and fails to return results, please stop it and check the data and the modelling steps.
If you apply GD to a very large dataset, please pay attention to the spatial distribution and density distribution of the data, and to the spatial data discretization during modelling. Real-world data is usually very complex. Please make sure you have an in-depth understanding of the geographical attribute data before spatial modelling.

Q3: The GD package runs well for most variables, but for a few variables it runs for a long time without returning results. What is the issue?

A3: This sometimes happens because the observation data contains “NA” values. Please remove them before computation. If the issue persists, please run the spatial data discretization and the GD models separately.
If the data doesn’t contain NA but the GD R package still can’t return results, some explanatory variables may contain too many identical values. In that case all observations in a spatial zone can be identical, the standard deviation is zero, and GD can’t return results. If your case has this issue, you may address it through the following approaches: (1) increasing the size of the spatial units to reduce the number of observations; (2) using quantile breaks; and (3) using higher-resolution data for this explanatory variable to avoid too many identical values.
Please make sure that there are at least two observations (more are recommended) with different values within each of the spatial zones derived from spatial discretization.
It is recommended to run spatial discretization manually with “quantile breaks” to check the basic characteristics of the variable data. If spatial data discretization runs well, GD should also work well.
If the number of observations is higher than 1000, it is recommended to use “quantile breaks” for modelling, because the data set is large enough and the quantile break is more reliable than equal, natural, standard deviation, and geometrical breaks.
The Geographically Optimal Zones-based Heterogeneity (GOZH) model is recommended if you would like to try the improved GD models (see Q4).
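The checks described in this answer can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal procedure: the data frame dat, its columns y and x, and the choice of four quantile intervals are hypothetical, and the distinct-value check is applied here to the response y (the same check works for any column).

```r
# Hypothetical data: y is the response, x an explanatory variable with many
# identical values (30 zeros) and two missing values
set.seed(1)
dat <- data.frame(y = rnorm(60), x = c(rep(0, 30), runif(28), NA, NA))

# 1. Drop rows that contain NA in any modelling column
dat <- dat[complete.cases(dat), ]

# 2. Quantile discretization into (at most) four intervals
raw_brks <- quantile(dat$x, probs = seq(0, 1, length.out = 5))
has_ties <- anyDuplicated(raw_brks) > 0    # TRUE signals many identical values
dat$zone <- cut(dat$x, breaks = unique(raw_brks), include.lowest = TRUE)

# 3. Per-zone diagnostics: size and number of distinct response values
by_zone <- split(dat$y, dat$zone)
check <- data.frame(
  zone     = names(by_zone),
  n        = vapply(by_zone, length, integer(1)),
  distinct = vapply(by_zone, function(v) length(unique(v)), integer(1)),
  row.names = NULL
)
check$ok <- check$n >= 2 & check$distinct >= 2
print(has_ties)
print(check)
```

In this toy data the tied quantile cut points collapse four requested intervals into two zones, which is the kind of symptom worth looking for before running the full model.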

Q4: Are there any advanced GD models for more accurate and effective modelling?

A4: Currently, there are the following advanced GD models:

Model: Optimal Parameters-based Geographical Detector (OPGD)
Description: OPGD is used for characterising spatial heterogeneity, identifying geographical factors and interactive impacts of factors, and estimating risks.
Publication: Song et al. (2020). Related publications: Song et al. (2018), Luo, Song*, et al. (2021)
Software: R package “GD” (tutorial available)

Model: Interactive Detector for Spatial Associations (IDSA)
Description: IDSA is used for estimating the power of interactive determinants (PID) from a spatial perspective. The IDSA model considers spatial heterogeneity, spatial autocorrelation, and spatial fuzzy overlay of multiple explanatory variables for calculating PID.
Publication: Song et al. (2021)
Software: R package “IDSA”

Model: Generalized Heterogeneity Model (GHM)
Description: GHM is used for characterizing local and stratified heterogeneity within variables and to improve interpolation accuracy.
Publication: Luo, Song*, et al. (2023)

Model: Geographically Optimal Zones-based Heterogeneity (GOZH)
Description: GOZH is used for identifying individual and interactive determinants of geographical attributes (e.g., global soil moisture) across a large study area. GOZH can identify optimal spatial zones and compute the maximum power of determinant (PD) values using an Ω-index.
Publication: Luo, Song*, et al. (2022)

Model: Robust Geographical Detector (RGD)
Description: The RGD model is used for the robust estimation of PD values.
Publication: Zhang, Song*, et al. (2022)

You can also refer to the review: Guo, J., Wang, J., Xu, C., & Song, Y. (2022). Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 59(1), 1660-1677.

Q5: How can I export results to a data.frame or spreadsheet?

A5: The outcomes of GD/OPGD models are saved as a list of data frames. There are two options to export results. First, you can print the results and copy them to Excel. Second, you can use “lapply” and “write.csv” to export the list of data frames to .csv files.
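The second option might look like the sketch below. The list results is a hypothetical stand-in for a model outcome, with made-up element names and values; the real object may be nested differently, in which case any element that is itself a list needs to be unpacked before writing. The output folder is also an assumption.

```r
# Hypothetical stand-in for a GD/OPGD outcome: a named list of data frames
results <- list(
  factor_detector = data.frame(variable = c("x1", "x2"), qv = c(0.31, 0.12)),
  interaction     = data.frame(var1 = "x1", var2 = "x2", qv = 0.40)
)

out_dir <- file.path(tempdir(), "gd_results")   # change to your own folder
dir.create(out_dir, showWarnings = FALSE)

# Write one .csv file per list element, named after the element
paths <- lapply(names(results), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(results[[nm]], path, row.names = FALSE)
  path
})
print(unlist(paths))
```

Each file can then be opened directly in a spreadsheet program.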

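Q6: Why is the sum of the q values of two variables not equal to the q value of the interaction of the two variables?

A6: The two are not expected to be equal. The q value of an interaction is computed from the overlay of the two variables, and the difference between it and the sum of the individual q values reflects how the variables enhance or weaken each other, often nonlinearly.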
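A small worked example may help. The sketch below computes the q statistic as one minus the ratio of the within-strata sum of squares to the total sum of squares, and then labels the interaction using the categories commonly used with geographical detectors. The helper names q_stat and classify_interaction and the toy data are assumptions for illustration; this is not the “GD” package's implementation.

```r
# q statistic: 1 - (within-strata sum of squares) / (total sum of squares)
q_stat <- function(y, strata) {
  ssw <- sum(tapply(y, strata, function(v) sum((v - mean(v))^2)))
  sst <- sum((y - mean(y))^2)
  1 - ssw / sst
}

# Compare the interaction q value with the individual q values and their sum
classify_interaction <- function(q1, q2, q12) {
  if (q12 < min(q1, q2)) {
    "nonlinear-weaken"
  } else if (q12 < max(q1, q2)) {
    "uni-variable weaken"
  } else if (isTRUE(all.equal(q12, q1 + q2))) {
    "independent"
  } else if (q12 > q1 + q2) {
    "nonlinear-enhance"
  } else {
    "bi-variable enhance"
  }
}

# Hypothetical toy data: y depends only on the combination of x1 and x2
x1 <- rep(c("a", "b"), each = 4)
x2 <- rep(c("u", "v"), times = 4)
y  <- c(1, 2, 1, 2, 2, 1, 2, 1)

q1  <- q_stat(y, x1)                                 # x1 alone explains nothing
q2  <- q_stat(y, x2)                                 # x2 alone explains nothing
q12 <- q_stat(y, interaction(x1, x2, drop = TRUE))   # overlay of both variables

print(c(q1 = q1, q2 = q2, q12 = q12))
print(classify_interaction(q1, q2, q12))
```

In this deliberately extreme example neither variable explains y on its own, yet their overlay explains it completely, so the interaction q value is far from the sum of the individual q values.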

Q7: When I plot the results of “gdm” or the spatial discretization functions, multiple figures are returned, but I can only see the last one. How can I find the previous figures?

A7: Please use RStudio and click the arrow buttons in the Plots pane to browse back through the previous figures.
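Outside RStudio, one possible workaround is to send all figures to a multi-page PDF so that none is overwritten. The sketch below uses simple base R plots as stand-ins for the model figures and assumes that the plotting function draws on the current graphics device; the output path is hypothetical.

```r
out_file <- file.path(tempdir(), "gd_figures.pdf")   # hypothetical output path

# Open a PDF device; width and height are in inches
pdf(out_file, width = 8, height = 6, onefile = TRUE)

# Stand-ins for the model figures; each high-level plot call starts a new page
plot(1:10, main = "Figure 1")
hist(c(1, 2, 2, 3, 3, 3), main = "Figure 2")
barplot(c(a = 0.3, b = 0.1), main = "Figure 3")

dev.off()   # close the device so the file is written completely
```

The resulting file contains one figure per page and can be opened in any PDF viewer.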

Q8: When I plot figures, some text or elements overlap with legends or other elements. How can I solve this issue?

A8: Please drag the plotting pane in RStudio to a relatively large area, and then run the plotting code again. The size of the plots changes with the size of the pane.

Q9: In the gdm function, everything looks fine, but it returns an error that the continuous variable names do not match the data.frame. How can I solve this issue?

A9: This is because the data set is not a true data.frame. Please use the following code to convert it to a data.frame.
data <- as.data.frame(data)

References

Song, Y., Wang, J.F., Ge, Y., et al. An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: cases with different types of spatial data. GIScience & Remote Sensing, 2020. 57(5): 593-610.

Wang, J. F., Li, X. H., Christakos, G., Liao, Y. L., et al. Geographical detectors‐based health risk assessment and its application in the neural tube defects study of the Heshun Region, China. International Journal of Geographical Information Science, 2010. 24(1), 107-127.

Wang, J. F., Zhang, T. L., & Fu, B. J. A measure of spatial stratified heterogeneity. Ecological Indicators, 2016. 67, 250-256.

Luo, P., Song, Y.*, Zhu, D., Cheng, J., & Meng, L. A generalized heterogeneity model for spatial interpolation. International Journal of Geographical Information Science, 2023. 37(3): 634-659.

Guo, J., Wang, J., Xu, C., & Song, Y. Modeling of spatial stratified heterogeneity. GIScience & Remote Sensing, 2022. 59(1), 1660-1677.

Song, Y., Wu, P. An interactive detector for spatial associations. International Journal of Geographical Information Science, 2021.

Song, Y., Wright, G., Wu, P., Thatcher, D., et al. Segment-Based Spatial Analysis for Assessing Road Infrastructure Performance Using Monitoring Observations and Remote Sensing Data. Remote Sensing, 2018. 10(11): 1696.

Luo, P., Song, Y.*, Huang, X., Ma, H., et al. Identifying determinants of spatio-temporal disparities in soil moisture of the Northern Hemisphere using a geographically optimal zones-based heterogeneity model. ISPRS Journal of Photogrammetry and Remote Sensing, 2022. 185, 111-128.

Zhang, Z., Song, Y.*, & Wu, P. Robust geographical detector. International Journal of Applied Earth Observation and Geoinformation, 2022. 109, 102782.

Song, Y., Wu, P., Gilmore, D., et al. A Spatial Heterogeneity-Based Segmentation Model for Analyzing Road Deterioration Network Data in Multi-Scale Infrastructure Systems. IEEE Transactions on Intelligent Transportation Systems, 2020.

Luo, P., Song, Y.*, Wu, P. Spatial disparities in trade-offs: economic and environmental impacts of road infrastructure on continental level. GIScience & Remote Sensing, 2021.

Zhang, Z., Song, Y.*, Archer, N. and Wu, P. Spatial disparity of urban performance from a scaling perspective: a study of industrial features associated with economy, infrastructure, and innovation. GIScience & Remote Sensing. 2023. 60(1), p.2167567.

BibTeX

@article{song2020optimal,
  title={An optimal parameters-based geographical detector model enhances geographic characteristics of explanatory variables for spatial heterogeneity analysis: Cases with different types of spatial data},
  author={Song, Yongze and Wang, Jinfeng and Ge, Yong and Xu, Chengdong},
  journal={GIScience \& Remote Sensing},
  volume={57},
  number={5},
  pages={593--610},
  year={2020},
  publisher={Taylor \& Francis}
}

@article{wang2010geographical,
  title={Geographical detectors-based health risk assessment and its application in the neural tube defects study of the Heshun Region, China},
  author={Wang, Jinfeng and Li, Xinhai and Christakos, George and Liao, Yilan and Zhang, Wei and Gu, Xingfa and Zheng, Xiaoying},
  journal={International Journal of Geographical Information Science},
  volume={24},
  number={1},
  pages={107--127},
  year={2010},
  publisher={Taylor \& Francis}
}

@article{wang2016measure,
  title={A measure of spatial stratified heterogeneity},
  author={Wang, Jinfeng and Zhang, Tonglin and Fu, Bojie},
  journal={Ecological Indicators},
  volume={67},
  pages={250--256},
  year={2016},
  publisher={Elsevier}
}

@article{luo2023generalized,
  title={A generalized heterogeneity model for spatial interpolation},
  author={Luo, P and Song, Yongze and Zhu, D and Cheng, J and Meng, L},
  journal={International Journal of Geographical Information Science},
  volume={37},
  number={3},
  pages={634--659},
  year={2023},
  publisher={Taylor \& Francis}
}

@article{guo2022modeling,
  title={Modeling of spatial stratified heterogeneity},
  author={Guo, J and Wang, J and Xu, C and Song, Y},
  journal={GIScience \& Remote Sensing},
  volume={59},
  number={1},
  pages={1660--1677},
  year={2022},
  publisher={Taylor \& Francis}
}

@article{song2021interactive,
  title={An interactive detector for spatial associations},
  author={Song, Yongze and Wu, P},
  journal={International Journal of Geographical Information Science},
  year={2021},
  publisher={Taylor \& Francis}
}

@article{song2018segment,
  title={Segment-Based Spatial Analysis for Assessing Road Infrastructure Performance Using Monitoring Observations and Remote Sensing Data},
  author={Song, Yongze and Wright, G and Wu, P and Thatcher, D and others},
  journal={Remote Sensing},
  volume={10},
  number={11},
  pages={1696},
  year={2018},
  publisher={MDPI}
}

@article{luo2022identifying,
  title={Identifying determinants of spatio-temporal disparities in soil moisture of the Northern Hemisphere using a geographically optimal zones-based heterogeneity model},
  author={Luo, P and Song, Yongze and Huang, X and Ma, H and others},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={185},
  pages={111--128},
  year={2022},
  publisher={Elsevier}
}

@article{zhang2022robust,
  title={Robust geographical detector},
  author={Zhang, Z and Song, Yongze and Wu, P},
  journal={International Journal of Applied Earth Observation and Geoinformation},
  volume={109},
  pages={102782},
  year={2022},
  publisher={Elsevier}
}

@article{song2020spatial,
  title={A Spatial Heterogeneity-Based Segmentation Model for Analyzing Road Deterioration Network Data in Multi-Scale Infrastructure Systems},
  author={Song, Yongze and Wu, P and Gilmore, D and others},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  year={2020},
  publisher={IEEE}
}

@article{luo2021spatial,
  title={Spatial disparities in trade-offs: economic and environmental impacts of road infrastructure on continental level},
  author={Luo, P and Song, Yongze and Wu, P},
  journal={GIScience \& Remote Sensing},
  year={2021},
  publisher={Taylor \& Francis}
}

@article{zhang2023spatial,
  title={Spatial disparity of urban performance from a scaling perspective: a study of industrial features associated with economy, infrastructure, and innovation},
  author={Zhang, Z and Song, Yongze and Archer, N and Wu, P},
  journal={GIScience \& Remote Sensing},
  volume={60},
  number={1},
  pages={2167567},
  year={2023},
  publisher={Taylor \& Francis}
}
