Model-based geostatistics for global public health using R

Author

Emanuele Giorgi
Claudio Fronterre

Published

August 9, 2025

Preface

Model-based geostatistics (MBG) has become an established modelling paradigm across several scientific disciplines, including global public health, which is the focus of this book. Much of the early development of MBG took place in data-rich contexts, particularly in the analysis of satellite imagery, where continuous outcomes dominate. In public health applications, however, data often take the form of counts or binary outcomes, for example the number of disease cases in a given place and time, or the infection status of an individual based on a diagnostic test. This shift in data type raises challenges, both methodological and computational, that are less prominent in data-rich settings with continuous responses. This book aims to fill this gap by providing an in-depth illustration of how geostatistical models for different types of outcome data, particularly count data, can and, in our view, should be developed in answer to public health problems.

One of the central issues in analysing geostatistical data in public health is identifying the right level of model complexity. In principle, sophisticated models could better capture the nuances of real-world data, but in practice we often have to adopt simpler models than we might wish, given the constraints of the data and the need for empirical justification. This book offers practical examples and guidance on how to navigate the balance between model complexity and empirical support. The material presented is a synthesis of our cumulative experience of geostatistical modelling through collaborations with epidemiologists and public health scientists, mainly in low- and middle-income countries.

This book is part of the R Series, and our emphasis is on the “how to do it” rather than the theoretical underpinnings. The intended reader is someone who has at least a basic understanding of linear regression models (more details on prerequisites are given in the introduction). You do not need a degree in statistics, but if you have never studied the subject before, you may find this book a difficult read. For quantitatively minded readers, however, the book, together with suggested readings, will provide a strong foundation for the analysis of geostatistical data. Each chapter places theoretical material in its final section, which can be skipped if desired, though reading it will do no harm.

The companion software for this book is the RiskMap package, which we developed in parallel with the text. RiskMap is the successor to PrevMap (Giorgi and Diggle 2017); if you are a loyal PrevMap user, it is probably time to say goodbye and make the transition to RiskMap. RiskMap is robust and offers a range of useful functionalities, though we do not claim it is the most computationally efficient implementation of geostatistical models; several other packages serve that role. Instead, our priority has been to provide one of the easiest interfaces for learning geostatistics in R. RiskMap can be a good starting point, and if speed becomes critical, you are free to move to other implementations. That said, we are continuing to improve RiskMap, and our goal is to make it fast enough that you will not feel the need to switch.

The aim of this first edition of the book is to equip readers with the R skills required to implement what we call a “standard” geostatistical analysis, covering data exploration, model fitting, prediction, and validation. By “standard” we mean the simplest class of geostatistical models, which form the foundation for more advanced approaches. There is little point in learning more complex techniques before mastering these basics. This book can be used alongside its companion, Diggle and Giorgi (2019), which provides a fuller account of the underlying theory and extends to models beyond those presented in this first edition.

Our approach throughout is based on maximum likelihood estimation, without the use of priors on model parameters. All the models presented are rooted in Gaussian process or random-effects formulations, but we avoid specifying arbitrarily chosen non-informative priors. In our applied work, we have not yet encountered situations where informative priors were genuinely available. For this reason, we prefer to maximise the likelihood and let the data speak directly. We do not believe that this approach produces meaningfully different results from fully Bayesian methods that are based on the use of diffuse priors for the parameters of a geostatistical model.

We are first and foremost grateful to our mentor and “statistical father,” Peter Diggle, whose wisdom and inspirational approach to modelling have been the foundation of our work. We owe him a debt of gratitude that can never truly be repaid, and dedicating this book to him is the least we can do. We also thank all of our students—past, present, and future—at MSc, PhD and postdoctoral levels, whose questions and feedback have continually shaped our thinking. Finally, we are grateful to our many colleagues and collaborators, without whom this book would not exist.

Thank you for using this book. We hope you enjoy reading it, and we invite you to get in touch with us if you have questions or comments.

Amazigo, U. 2008. “The African Programme for Onchocerciasis Control (APOC).” Ann Trop Med Parasitol 102 (Suppl 1): 19–22. https://doi.org/10.1179/136485908X337436.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.

Bhattachan, Abhinav, Rachel Barrios, Laura Harrington, Valerie A. Paz-Soldan, and A. Marm Kilpatrick. 2023. “Drought-Associated Reductions in Urban Mosquito Abundance During California’s Historic Drought.” Science of The Total Environment 891: 164519. https://doi.org/10.1016/j.scitotenv.2023.164519.

Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Springer.

Bolin, David, and Jonas Wallin. 2023. “Local Scale Invariance and Robustness of Proper Scoring Rules.” Statistical Science 38 (1): 140–59. https://doi.org/10.1214/22-STS864.

Bolling, Bethany G., Christopher M. Barker, Chester G. Moore, W. John Pape, and Lars Eisen. 2009. “Seasonal Patterns for Entomological Measures of Risk for Exposure to Culex Vectors and West Nile Virus in Relation to Human Disease Cases in Northeastern Colorado.” Journal of Medical Entomology 46 (6): 1519–31. https://doi.org/10.1603/033.046.0641.

Bowman, A. W. 1997. Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford Statistical Science Series 18. Oxford; New York: Clarendon Press; Oxford University Press.

Breslow, N. E., and D. G. Clayton. 1993. “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88: 9–25.

Brooks, Steve, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng. 2011. Handbook of Markov Chain Monte Carlo. CRC Press.

California Department of Public Health. 2025. “California Mosquito-Borne Virus Surveillance and Response Plan.” Online.

Centers for Disease Control and Prevention. 2024. “West Nile Virus Surveillance and Control Guidelines.” CDC guidance. https://www.cdc.gov/west-nile-virus/php/surveillance-and-control-guidelines/index.html.

Chilès, J-P, and P. Delfiner. 2016. Geostatistics (Second Edition). Hoboken: Wiley.

Christensen, OF, GO Roberts, and M Sköld. 2006. “Robust Markov Chain Monte Carlo Methods for Spatial Generalized Linear Mixed Models.” Journal of Computational and Graphical Statistics 15 (1): 1–17.

Christensen, Ole F. 2004. “Monte Carlo Maximum Likelihood in Model-Based Geostatistics.” Journal of Computational and Graphical Statistics 13 (3): 702–18.

City of Boulder. 2025. “West Nile Virus — Estimating Risk to People: The Vector Index.” https://bouldercolorado.gov/west-nile-virus.

City of Fort Collins. 2014. “West Nile Virus Program Manual.” City of Fort Collins. https://www.fcgov.com/westnile/pdf/wnv_program_manual.pdf.

———. 2025. “Local Data — West Nile Virus.” https://www.fcgov.com/westnile/local-data.

Cowles, Mary Kathryn, and Bradley P. Carlin. 1996. “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review.” Journal of the American Statistical Association 91 (434): 883–904.

Cressie, N. A. C. 1991. Statistics for Spatial Data. New York: Wiley.

Cressie, Noel. 1985. “Fitting Variogram Models by Weighted Least Squares.” Mathematical Geology 17 (5): 563–86.

Czado, Claudia, Tilmann Gneiting, and Leonhard Held. 2009. “Predictive Model Assessment for Count Data.” Biometrics 65 (4): 1254–61. https://doi.org/10.1111/j.1541-0420.2009.01191.x.

Dawid, A. P. 1984. “Statistical Theory: The Prequential Approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–92.

Diggle, P. J., and E. Giorgi. 2019. Model-Based Geostatistics for Global Public Health: Methods and Applications. Chapman and Hall/CRC Interdisciplinary Statistics Series. Milton: Chapman and Hall/CRC.

Diggle, P. J., J. A. Tawn, and R. A. Moyeed. 1998. “Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (3): 299–350. https://doi.org/10.1111/1467-9876.00113.

Diggle, Peter, and Paulo Justiniano Ribeiro. 2007. Model-Based Geostatistics. Springer Series in Statistics. Springer.

Dobson, A. J., and A. Barnett. 2008. An Introduction to Generalized Linear Models. 3rd ed. Chapman and Hall/CRC.

Efron, Bradley. 1979. “Bootstrap Methods: Another Look at the Jackknife.” The Annals of Statistics 7 (1): 1–26.

Evans, Jeffrey S., and Melanie A. Murphy. 2021. spatialEco. https://github.com/jeffreyevans/spatialEco.

Fernández, J. A, A Rey, and A Carballeira. 2000. “An Extended Study of Heavy Metal Deposition in Galicia (NW Spain) Based on Moss Analysis.” Science of The Total Environment 254 (1): 31–44. https://doi.org/10.1016/S0048-9697(00)00431-9.

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.

Gelman, Andrew, and Donald B Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.

Geweke, John. 1992. “Evaluating the Accuracy of Sampling‐based Approaches to the Calculation of Posterior Moments.” Edited by Jose M. Bernardo, James O. Berger, A. Philip Dawid, and Adrian F. M. Smith, 169–93.

Geyer, Charles J. 1991. “Markov Chain Monte Carlo Maximum Likelihood.” Journal of Computational and Graphical Statistics 1 (4): 39–55.

———. 1994. “Likelihood and Exponential Families.” Department of Statistics, University of Minnesota.

———. 1996. “Markov Chain Monte Carlo Maximum Likelihood.” Department of Statistics, University of Minnesota.

———. 2019. “Monte Carlo Methods in MCMC.” In Handbook of MCMC, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, 3–48. CRC Press.

Ghana Statistical Service, Ghana Health Service, and ICF International. 2015. Ghana Demographic and Health Survey 2014. Rockville, Maryland, USA: Ghana Statistical Service, Ghana Health Service, and ICF International. https://dhsprogram.com/publications/publication-FR307-DHS-Final-Reports.cfm.

Giorgi, Emanuele, and Peter J. Diggle. 2017. “PrevMap: An R Package for Prevalence Mapping.” Journal of Statistical Software 78 (8): 1–29. https://doi.org/10.18637/jss.v078.i08.

Gneiting, Tilmann, Fadoua Balabdaoui, and Adrian E Raftery. 2007. “Probabilistic Forecasts, Calibration and Sharpness.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69 (2): 243–68.

Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102 (477): 359–78. https://doi.org/10.1198/016214506000001437.

Harrell, Frank E. 2015. Regression Modeling Strategies. 2nd ed. New York: Springer.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer Series in Statistics. New York, NY, USA: Springer New York Inc.

Hastings, W. K. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57 (1): 97–109.

Johnson, Olatunji, Claudio Fronterre, Benjamin Amoah, Antonio Montresor, Emanuele Giorgi, Nicholas Midzi, Masceline Jenipher Mutsaka-Makuvaza, et al. 2021. “Model-Based Geostatistical Methods Enable Efficient Design and Analysis of Prevalence Surveys for Soil-Transmitted Helminth Infection and Other Neglected Tropical Diseases.” Clinical Infectious Diseases 72 (Supplement_3): S172–79. https://doi.org/10.1093/cid/ciab192.

Jones, Roderick C., Kingsley N. Weaver, Shamika Smith, Claudia Blanco, Cristina Flores, Kevin Gibbs, Daniel Markowski, and John-Paul Mutebi. 2011. “Use of the Vector Index and Geographic Information System to Prospectively Inform West Nile Virus Interventions.” Journal of the American Mosquito Control Association 27 (3): 315–19. https://doi.org/10.2987/10-6098.1.

Katz, Elizabeth, and Bill & Melinda Gates Foundation. 2020. “Gender and Malaria Evidence Review.” Bill & Melinda Gates Foundation. https://www.gatesgenderequalitytoolbox.org/wp-content/uploads/BMGF_Malaria-Review_FC.pdf.

Kilpatrick, A. Marm, and W. John Pape. 2013. “Predicting Human West Nile Virus Infections with Mosquito Surveillance Data.” American Journal of Epidemiology 178 (5): 829–35. https://doi.org/10.1093/aje/kwt046.

Kramer, Laura D., Jun Li, and Pei-Yong Shi. 2007. “West Nile Virus.” The Lancet Neurology 6 (2): 171–81. https://doi.org/10.1016/S1474-4422(07)70030-3.

Krige, D. G. 1951. “A Statistical Approach to Some Basic Mine Valuation Problems on the Witwatersrand.” Journal of the Chemical, Metallurgical and Mining Society of South Africa 52: 119–39.

Kyomuhangi, Irene, Tarekegn A. Abeku, Matthew J. Kirby, Gezahegn Tesfaye, and Emanuele Giorgi. 2021. “Understanding the Effects of Dichotomization of Continuous Outcomes on Geostatistical Inference.” Spatial Statistics 42: 100424. https://doi.org/10.1016/j.spasta.2020.100424.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2020. Geocomputation with R. London, England: CRC Press. https://r.geocompx.org/.

Lucas, Tim C. D., Anita K. Nandi, Rosalind E. Howes, Daniel J. Weiss, Ewan Cameron, Nick Golding, and Peter W. Gething. 2020. “malariaAtlas: An R Interface to Global Malariometric Data Hosted by the Malaria Atlas Project.” Wellcome Open Research 5: 74. https://doi.org/10.12688/wellcomeopenres.15987.1.

Mahoney, Michael J, Lucas K Johnson, Julia Silge, Hannah Frick, Max Kuhn, and Colin M Beier. 2023. “Assessing the Performance of Spatial Cross-Validation Approaches for Models of Spatially Structured Data.” https://doi.org/10.48550/arXiv.2303.07334.

Matern, B. 2013. Spatial Variation. Lecture Notes in Statistics. Springer New York. https://books.google.co.uk/books?id=HrbSBwAAQBAJ.

Matheron, G. 1963. “Principles of Geostatistics.” Economic Geology 58: 1246–66.

Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” The Journal of Chemical Physics 21 (6): 1087–92.

Neal, Radford M. 2003. “Slice Sampling.” The Annals of Statistics 31 (3): 705–67.

Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society A 135: 370–84.

World Health Organization. 2024. “Indicator Metadata Registry: Child Malnutrition—Underweight Among Children Under Five Years of Age (Weight-for-Age <-2 SD).” https://www.who.int/data/gho/indicator-metadata-registry/imr-details/27.

Pawitan, Yudi. 2001. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford; New York: Clarendon Press; Oxford University Press.

Petersen, Lyle R., Aaron C. Brault, and Roger S. Nasci. 2013. “West Nile Virus: Review of the Literature.” JAMA 310 (3): 308–15. https://doi.org/10.1001/jama.2013.8042.

Puranik, Amitha, Peter J. Diggle, Maurice R. Odiere, Katherine Gass, Stella Kepha, Collins Okoyo, Charles Mwandawiro, et al. 2024. “Understanding the Impact of Covariates on the Classification of Implementation Units for Soil-Transmitted Helminths Control: A Case Study from Kenya.” BMC Medical Research Methodology 24 (1): 294. https://doi.org/10.1186/s12874-024-02420-1.

Ripley, B. D. 1981. Spatial Statistics. New York: Wiley.

Robert, Christian P, and George Casella. 2004. Monte Carlo Statistical Methods. 2nd ed. Springer.

Roberts, Gareth O., and Richard L. Tweedie. 1996. “Exponential Convergence of Langevin Distributions and Their Discrete Approximations.” Bernoulli 2 (4): 341–63.

Ross, Sheldon. 2013. A First Course in Probability. 9th ed. Harlow: Pearson Education UK.

Rossky, Peter J., J. D. Doll, and Harold L. Friedman. 1978. “Brownian Dynamics as Smart Monte Carlo Simulation.” The Journal of Chemical Physics 69 (10): 4628–33.

Rue, H., S. Martino, and N. Chopin. 2009. “Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. https://doi.org/10.1111/j.1467-9868.2008.00700.x.

Sacramento-Yolo Mosquito and Vector Control District. 2018. “2018 Annual Report.” Online.

Shmueli, Galit. 2010. “To Explain or to Predict?” Statistical Science 25 (3): 289–310.

Smith, David L, Carlos A Guerra, Robert W Snow, and Simon I Hay. 2007. “Standardizing Estimates of the Plasmodium Falciparum Parasite Rate.” Malaria Journal 6 (1): 131.

Stein, Michael L. 1999. Interpolation of Spatial Data: Some Theory for Kriging. Springer Series in Statistics. New York, NY: Springer.

Stevenson, Jennifer C., Gillian H. Stresman, Caroline W. Gitonga, et al. 2013. “Reliability of School Surveys in Estimating Geographic Variation in Malaria Transmission in the Western Kenyan Highlands.” PLOS ONE 8 (10): e77641. https://doi.org/10.1371/journal.pone.0077641.

Tene Fossog, Billy, Diego Ayala, Pelayo Acevedo, Pierre Kengne, Ignacio Ngomo Abeso Mebuy, Boris Makanga, Julie Magnus, et al. 2015. “Habitat Segregation and Ecological Character Displacement in Cryptic African Malaria Mosquitoes.” Evolutionary Applications 8 (4): 326–45. https://doi.org/10.1111/eva.12242.

Tobler, W. R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46: 234–40.

United Nations Statistics Division, World Health Organization, and UNICEF. 2025. “SDG Indicator 2.2.1 Metadata: Prevalence of Stunting (Height-for-Age <-2 SD) Among Children Under Five Years of Age.” https://unstats.un.org/sdgs/metadata/files/Metadata-02-02-01.pdf.

Watson, G. S. 1971. “Trend-Surface Analysis.” Mathematical Geology 3: 215–26.

———. 1972. “Trend Surface Analysis and Spatial Correlation.” Geological Society of America Special Paper 146: 39–46.

Weisberg, Sanford. 2014. Applied Linear Regression. 4th ed. Hoboken NJ: Wiley. http://z.umn.edu/alr4ed.

WHO. 2006. “WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development.” Geneva: WHO. https://www.who.int/publications/i/item/924154693X.

Yin, Hui, Yutong Wang, Yuxin Wang, Zihan Li, Yujie Li, and Yuxin Wang. 2024. “A Rapid Review of Clustering Algorithms.” arXiv Preprint arXiv:2401.07389.

Zhang, Hao. 2002. “On Estimation and Prediction for Spatial Generalized Linear Mixed Models.” Biometrics 58 (1): 129–36.

———. 2004. “Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics.” Journal of the American Statistical Association 99 (465): 250–61.

Zouré, Honorat GM, Mounkaila Noma, Afework H Tekle, Uche V Amazigo, Peter J Diggle, Emanuele Giorgi, and Jan HF Remme. 2014. “Geographic Distribution of Onchocerciasis in the 20 Participating Countries of the African Programme for Onchocerciasis Control: (2) Pre-Control Endemicity Levels and Estimated Number Infected.” Parasites & Vectors 7 (1): 326.