Model-based geostatistics for global public health using R

Author

Emanuele Giorgi
Claudio Fronterre

Published

August 9, 2025

Preface

Model-based geostatistics (MBG) has become an established modelling paradigm across several scientific disciplines, including global public health, which is the focus of this book. Much of the early development of MBG took place in data-rich contexts, particularly in the analysis of satellite imagery, where continuous outcomes dominate. In public health applications, however, data often take the form of counts or binary outcomes: for example, the number of disease cases in a given place and time, or the infection status of an individual based on a diagnostic test. This shift in data type raises methodological and computational challenges that are less prominent in data-rich settings with continuous responses. This book aims to fill this gap by providing an in-depth illustration of how geostatistical models for different types of outcome data, particularly count data, can, and in our view should, be developed to answer public health questions.

One of the central issues in analysing geostatistical data in public health is identifying the right level of model complexity. In principle, sophisticated models could better capture the nuances of real-world data, but in practice we often have to adopt simpler models than we might wish, given the constraints of the data and the need for empirical justification. This book offers practical examples and guidance on how to balance model complexity against empirical support. The material presented is a synthesis of our cumulative experience of geostatistical modelling through collaborations with epidemiologists and public health scientists, mainly in low- and middle-income countries.

This book is part of the R Series, and our emphasis is on the “how to do it” rather than the theoretical underpinnings. The intended reader is someone who has at least a basic understanding of linear regression models (more details on prerequisites are given in the introduction). You do not need a degree in statistics, but if you have never studied the subject before, you may find this book a difficult read. For quantitatively minded readers, however, the book, together with suggested readings, will provide a strong foundation for the analysis of geostatistical data. Each chapter places theoretical material in its final section, which can be skipped if desired, though reading it will do no harm.

The companion software for this book is the RiskMap package, which we developed in parallel with the text. RiskMap is the successor to PrevMap (Giorgi and Diggle 2017); if you are a loyal PrevMap user, it is probably time to say goodbye and make the transition to RiskMap. RiskMap is robust and offers a wide range of functionality, though we do not claim it is the most computationally efficient implementation of geostatistical models; several other packages serve that role. Instead, our priority has been to provide one of the easiest interfaces for learning geostatistics in R. RiskMap can be a good starting point, and if speed becomes critical, you are free to move to other implementations. That said, we are continuing to improve RiskMap, and our goal is to make it fast enough that you will not feel the need to switch.

The aim of this first edition is to equip readers with the R skills required to implement what we call a “standard” geostatistical analysis, covering data exploration, model fitting, prediction, and validation. By “standard” we mean the simplest class of geostatistical models, which form the foundation for more advanced approaches. There is little point in learning more complex techniques before mastering these basics. This book can be used alongside its companion, P. J. Diggle and Giorgi (2019), which provides a fuller account of the underlying theory and covers models beyond those presented in this first edition.

Our approach throughout is based on maximum likelihood estimation, without the use of priors on model parameters. All the models presented are rooted in Gaussian process or random-effects formulations, but we avoid specifying arbitrarily chosen non-informative priors. In our applied work, we have yet to encounter a situation in which genuinely informative priors were available. For this reason, we prefer to maximise the likelihood and let the data speak directly. We do not believe that this approach produces meaningfully different results from fully Bayesian methods that use diffuse priors for the parameters of a geostatistical model.

We are first and foremost grateful to our mentor and “statistical father,” Peter Diggle, whose wisdom and inspirational approach to modelling have been the foundation of our work. We owe him a debt of gratitude that can never truly be repaid, and dedicating this book to him is the least we can do. We also thank all of our students—past, present, and future—at MSc, PhD and postdoctoral levels, whose questions and feedback have continually shaped our thinking. Finally, we are grateful to our many colleagues and collaborators, without whom this book would not exist.

Thank you for using this book. We hope you enjoy reading it, and we invite you to get in touch with us if you have questions or comments.