« In view of the current pandemic, locating disease clusters, understanding why some diseases concentrate in certain areas, and finding solutions to them matter more than ever.
In their article Weighted Normal Spatial Scan Statistic for Heterogeneous Population Data, published in the Journal of the American Statistical Association in 2009, Lan Huang, Ram C. Tiwari, Zhaohui Zou, Martin Kulldorff and Eric J. Feuer propose a statistical method of the same name for the first step: detecting clusters.
The method relies on continuous measures of the disease: a potential cluster is flagged when those measures are unusually high or unusually low. Unlike other spatial scan statistics, this one focuses on the geographical distribution of regional measures, that is, on aggregated data. A detected cluster is a collection of geographic units with high or low regional measures that directly reflect the behavior of the cells rather than that of the individuals inside them.
Working with aggregated data increases the difficulty of the problem, because the variance is no longer the same for all observations. To simplify, the authors consider a circular area whose center is assumed known, so that its only parameter is its radius. They also introduce weights that represent the uncertainty of the regional measures or the sample size (number of observed cases). For example, hospitals do not all admit the same number of patients, which can affect the variances. To test their method, they ran several simulations and applied it to real data, namely:
- 1988-2002 stage I and II lung cancer survival data in LA county (diagnosis of survival rate)
- 1999-2003 breast cancer age-adjusted mortality rate data in the U.S.
Several issues arise here:
• a statistical test is used because the exhaustive search method, which consists of testing each zone one by one to find the one with the most extreme measures, would be far too slow and too expensive to implement
• using aggregated data makes the problem even more complex, as there is no known probability distribution for such data
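To make the idea of scanning circular zones with weighted regional measures concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' exact formulation: each regional measure x_i is assumed Gaussian with variance sigma^2 / w_i, the log-likelihood ratio compares one common mean against separate means inside and outside the zone, and circles are centered on each unit. The function names `weighted_llr` and `scan_circles`, the `max_fraction` parameter and the toy grid data are hypothetical. Significance testing (typically done by Monte Carlo replication) is omitted.

```python
import numpy as np

def weighted_llr(x, w, inside):
    """Log-likelihood ratio of 'different mean inside the zone' vs 'one common mean'.

    Assumes x_i ~ N(mu, sigma^2 / w_i); `inside` is a boolean mask of the zone.
    """
    n = len(x)
    mu0 = np.average(x, weights=w)                      # weighted mean under H0
    rss0 = np.sum(w * (x - mu0) ** 2)                   # weighted residual sum, H0
    mu_in = np.average(x[inside], weights=w[inside])    # weighted mean inside
    mu_out = np.average(x[~inside], weights=w[~inside]) # weighted mean outside
    rss1 = (np.sum(w[inside] * (x[inside] - mu_in) ** 2)
            + np.sum(w[~inside] * (x[~inside] - mu_out) ** 2))
    return 0.5 * n * np.log(rss0 / rss1)

def scan_circles(coords, x, w, max_fraction=0.5):
    """Try every circle centered on a unit and return (best LLR, unit indices)."""
    n = len(x)
    best_llr, best_zone = -np.inf, None
    for c in range(n):
        # Units sorted by distance to the center: growing the radius adds them one by one.
        order = np.argsort(np.linalg.norm(coords - coords[c], axis=1))
        for k in range(1, int(max_fraction * n) + 1):
            inside = np.zeros(n, dtype=bool)
            inside[order[:k]] = True
            llr = weighted_llr(x, w, inside)
            if llr > best_llr:
                best_llr, best_zone = llr, np.flatnonzero(inside)
    return best_llr, best_zone

# Toy data (made up for illustration): a 5 x 5 grid of units with a
# high-valued cluster planted in one corner.
coords = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.1, size=25)        # regional measures
x[[0, 1, 5, 6]] += 10.0                  # planted cluster: units 0, 1, 5, 6
w = 1.0 + (np.arange(25) % 3)            # heterogeneous weights, e.g. sample sizes

llr, zone = scan_circles(coords, x, w)
print(llr, zone)
```

Even on this tiny grid the scan evaluates a few hundred candidate zones, which hints at why examining every possible zone exhaustively becomes costly on real maps.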
In our own article, we explain most of their results as well as the issues involved. We first describe how the analysis would work on individual data and with the exhaustive search method. We then detail their method and apply it to a real case: the rate of premature births across 94 departments of metropolitan France in 2018. From the spatial discrepancies in premature births, we aim to detect the environmental factors that may cause these high-risk births. »