The ever-present ghost of data quality in SDMs

Those who know me are likely well aware of my interest in species distribution models (SDMs). In particular, I’ve spent years working on how we can improve these models using higher-resolution data, such as microclimate or anthropogenic-disturbance data.

This quest for higher-resolution SDMs, however, has to overcome a few important data-related issues that can’t be fixed simply by increasing the resolution of the maps used as explanatory variables. In a review just published in Ecography, we discuss three of these issues: sample size, positional uncertainty and sampling bias. Indeed, you can have microclimate data at the highest resolution possible, but if your species data suffer from any of these three issues, your model will not perform anywhere close to what you might have hoped for.

Sampling bias, sample size and positional uncertainty – the three characteristics of the looming ghost of data quality that might affect the performance of your SDMs. All three of them are affected by species ecology and the environment.

Positional uncertainty

Case in point: positional uncertainty. When building SDMs, we often think of our species observations as points on the map. Often, however, they are not; they are more like smudges. Depending on the data, positional errors can range from just a few meters (e.g., GPS inaccuracies) up to a kilometer (e.g., aggregated data from global databases) or even more (e.g., historical data with poor location information, such as some herbarium records). Failing to account for that uncertainty (i.e., working with falsely comforting points rather than the smudges on your map) can distort the apparent correlations between species observations and environmental data. The size and importance of this error also vary between species. For example, for mobile species it is often much harder to pinpoint an exact location, while deep-sea organisms are often located using less accurate acoustic positioning.
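To get a feel for why a "smudge" matters more as predictor resolution increases, the sketch below counts how many grid cells a record could plausibly fall in, given its positional uncertainty. It assumes a square grid, a circular uncertainty zone centred on the recorded coordinates, and a record sitting exactly on a cell centre; the function name and the 25 m "microclimate" grid size are hypothetical choices for illustration, while the 5 m and 1 km uncertainties echo the examples above.

```python
import math

def cells_in_uncertainty_circle(radius_m: float, cell_size_m: float) -> int:
    """Count grid cells whose centres fall within a circular uncertainty
    radius around a record (assumed to sit exactly on a cell centre)."""
    n = math.ceil(radius_m / cell_size_m)  # cells to scan in each direction
    r2 = radius_m * radius_m
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * cell_size_m, j * cell_size_m  # offset to cell centre (m)
            if dx * dx + dy * dy <= r2:  # compare squared distances
                count += 1
    return count

# Illustrative uncertainties against two hypothetical grid resolutions.
for radius in (5, 1000):
    for cell in (25, 1000):
        print(f"{radius:>5} m uncertainty on a {cell:>5} m grid: "
              f"{cells_in_uncertainty_circle(radius, cell)} cell(s)")
```

Under these assumptions, a 5 m GPS error stays within a single cell on either grid, whereas a 1 km uncertainty spreads over roughly 5,000 cells of a 25 m grid but only five cells of a 1 km grid. The finer the predictors, the more environmental variation hides inside the smudge.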

Three categories of factors driving positional uncertainty: the resolution and configuration of the spatial predictors (e.g., micro- versus macroclimate data; see the paper for more details), recording techniques and data processing (e.g., GPS accuracy), and species ecology and site characteristics (e.g., lower accuracy for large animals, limited GPS accuracy under forest canopies or in cities).

Sampling bias

A similar issue exists with sampling bias. Often enough, we feel reassured by big numbers, with models built using thousands of points looking soothingly trustworthy. Here again, however, these numbers could create false confidence.

Species observations often show strong spatial bias, with many points located close to each other and big gaps in between. Positive sampling biases have typically been reported towards easily accessible areas (e.g., close to roads, rivers and urban settlements), protected areas, more populated areas, and charismatic species, leading to both spatial and taxonomic biases. Uneven data-sharing practices make this issue even worse. These problems are not limited to citizen-science data; at larger scales, they also affect data collected by researchers, who are similarly biased towards locations that are more accessible, more interesting, or more likely to attract funding.
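One way to see how thousands of clustered records can overstate the information in a dataset is to count how many distinct grid cells they actually occupy. The sketch below uses grid-based spatial thinning (keeping one record per cell), a widely used mitigation step that is not necessarily the approach recommended in the review. It assumes coordinates in metres in a projected coordinate system; the function name, the 1 km cell size and the data are all hypothetical.

```python
import math

def thin_by_grid(points, cell_size_m):
    """Grid-based spatial thinning: keep the first record in each square cell."""
    kept = {}
    for x, y in points:
        cell = (math.floor(x / cell_size_m), math.floor(y / cell_size_m))
        kept.setdefault(cell, (x, y))  # later records in the same cell are dropped
    return list(kept.values())

# Hypothetical data: 900 records bunched along a road, 4 scattered elsewhere.
roadside = [(100 + i % 10, 50 + i // 10) for i in range(900)]
scattered = [(2500.0 * k, 7000.0) for k in range(1, 5)]
records = roadside + scattered

thinned = thin_by_grid(records, cell_size_m=1000)
print(len(records), "records ->", len(thinned), "occupied 1 km cells")
# prints: 904 records -> 5 occupied 1 km cells
```

The 904 records collapse to just five occupied cells, so the roadside cluster contributes far less independent information than its raw count suggests. Thinning also discards data, which trades sampling bias against sample size, the third issue discussed in the review.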

Clear recommendations

Importantly, our review goes beyond simply discussing these problems with SDM data. We made a point of providing clear, hands-on suggestions for dealing with these issues at every step along the way. These suggestions are summarized in the figure below.

With that, we hope this review can become a helpful guide for anyone working in the amazing but treacherous world of species distribution modelling. With our review in hand, the data should not play further unexpected tricks on you!

Read the full review, including all its recommendations, in Ecography.
