The dark secrets of dark diversity

Dark diversity. A term that sounds sufficiently dramatic to catch the attention of many an ecologist. But it’s also a rewarding concept to explore: instead of the common ‘diversity’, which looks at the diversity of species/genes/traits present at a certain location or in a certain region, dark diversity focuses on those species that are NOT there. Even more importantly, it focuses on those species that are not there, but SHOULD have BEEN there. The hidden masses, the forgotten ones, those that have been lost.

The field of dark diversity tries to explain why certain species are not where they should be, in the hope that this can give us a better understanding of community dynamics and of the risks of biodiversity loss.

Intrigued? We were too, so we decided to estimate the dark diversity of our study region in northern Scandinavia, in the area surrounding the Abisko Research Station, as part of DarkDivNet, the dark diversity network. That work – led by master’s student Lore Hostens – has now been published (here)!

Which plant species are missing here and why? We put the concept of dark diversity to the test in our favourite research arena in the north of the Scandes mountains

We got to work by monitoring the species that were present, and then scanned the literature for different methods to estimate dark diversity based on those. That’s when things started to go dark (pun intended). Indeed, there were important decisions to make: which method to choose? There was a whole bunch available, many of them with several additions, adjustments, or nuances around them.

Would it matter which method we chose? Now, YES, it would! Soon enough, this question grew into our main research question: how much did the outcome differ between the different dark diversity estimation methods? Would conclusions still hold when switching from method A to B? Brace yourselves, as we are entering the muddy terrain of incomparable indices.

Schematic overview of three of the main approaches used to estimate dark diversity from the habitat-specific species pool (SP; the sum of species that are present with those that should have been present). (a) Theoretical concept of dark diversity, where the dark diversity is the non-observed set of species in a certain location, after filtering the regional species pool based on abiotic, dispersal and biotic interaction limitations. (b) Dark diversity is calculated using climatic filtering of the regional species pool (e.g., using climatic niche models to estimate which species could occur at a certain location). (c) Commonly used co-occurrence-based methods, which integrate both abiotic and interaction filters, yet don’t always include the dispersal filter, as assessments of dark diversity of a species in a plot are independent of distance to its source population.

Basically, there are (at least) three main methods to estimate dark diversity, depending on what filters one incorporates to estimate which species should and should not be present at a site. These are summarized in the figure above. Theoretically, one would exclude all species that 1) cannot occur there due to a mismatch with their environmental niche (too cold, too warm, too acidic…), 2) cannot occur there because they can’t reach the place (too far away), and 3) are outcompeted by other species at the site (too weak, or incompatible). Unfortunately, every method has a different way of dealing with those filters.
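To make the co-occurrence family of methods a bit more concrete: a classic ingredient is Beals smoothing, which estimates the probability that a species occurs in a plot from how often it co-occurs with the species actually recorded there; species that are absent despite a high probability end up in the dark diversity. A minimal sketch in Python (the paper compares several method variants; the function names, the 0.5 threshold, and the toy matrix below are mine, not taken from the paper):

```python
import numpy as np

def beals_smoothing(X):
    """Beals smoothing: estimated probability of each species occurring in
    each plot, based on its co-occurrence with the species observed there.
    X: plots x species presence/absence matrix (0/1)."""
    X = np.asarray(X, dtype=float)
    N = X.sum(axis=0)                        # total occurrences per species
    M = X.T @ X                              # joint occurrence counts M[j, k]
    np.fill_diagonal(M, 0.0)                 # exclude the target species itself
    with np.errstate(divide="ignore", invalid="ignore"):
        A = np.where(N > 0, M / N, 0.0)      # A[j, k] ~ P(j present | k present)
    S = X.sum(axis=1, keepdims=True) - X     # plot richness, excluding species j
    with np.errstate(divide="ignore", invalid="ignore"):
        P = np.where(S > 0, (X @ A.T) / S, 0.0)
    return P                                 # P[i, j]: probability of j in plot i

def dark_diversity(X, threshold=0.5):
    """Flag species as dark diversity: absent from the plot, yet with a high
    co-occurrence-based probability of belonging there."""
    P = beals_smoothing(X)
    return (P >= threshold) & (np.asarray(X) == 0)
```

Even in this toy version, the arbitrary threshold already hints at where the method-dependence comes from.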

So, many of these dark diversity estimates are theoretically substantially different. What is more, even when choosing a certain path, there are still a myriad of decisions one has to take that could affect the outcome. It should thus come as no surprise that in our case study, conclusions were entirely overturned by switching from one method to the other.

Variation in dark diversity as explained by different habitat-specific parameters, and analyzed with four different dark-diversity estimation methods. If one squints, one could assume that elevation was on average the dominant driver, but the main take-home message is clearly that every method gives an entirely different result.

The why and how of these differences are explained in detail in the paper, which you can find here.

In this story, I just want to ensure that you take one lesson home, and it’s the following: the concept of dark diversity is very intriguing, and intuitively makes a lot of sense, but be extremely wary of the methodological decisions that underlie it. If you use just one method to estimate your dark diversity, you might be building your conclusions entirely on loose sand. What’s more, I wouldn’t be surprised if these cautionary words extend to many other concepts in ecology that are supported by mathematical indices. Your index only tells you exactly what it measures, nothing more and nothing less, and the decisions you make along the way can have fundamental effects on your outcomes. So please, stay wary of the mathematics underneath your ecology, as they are more important than you might wish!

Note: we are no statisticians. We are humble ecologists enthusiastically coming into a problem and ending up in the mud, and just want to warn others not to get stuck!

Posted in Science, Sweden | Tagged , , , , , , , | Leave a comment

How to get the word out?

Twitter/X, Mastodon or LinkedIn, what works best for communicating scientific findings? I got the numbers for you!

This website is almost ten years old, and still going strong. I am far from done with its main goal, which is the communication of our – I dare say important – scientific findings to a broad audience.

For a long time now, however, I have relied to a significant extent on Twitter to get the word out about new stories on this website. Twitter was, at least for reaching scientific collaborators in the broad sense of the term, the best way to do so.

For a while now, Twitter has been in some unwanted turmoil, and many colleagues have jumped the sinking ship. I haven’t jumped yet – not even after its renaming to the ominous ‘X’. Perhaps selfish of me, but I did not want to get off before I found an alternative way of getting the word out. A scientific publication that nobody knows about might as well not have been written.

I have kept an eye out for alternatives, and tried a few, just to see what could work. Most promising, initially, was Mastodon, a network that in the days of the ‘free fall’ of Twitter was lauded as its best alternative. Unfortunately, although there was a lot of buzz around it, I didn’t really find ‘my people’ there, as reflected in my poor following (see the table below): on Twitter I reach over 2000 people, on Mastodon only 111.

Table summarizing the likes and impressions a series of different posts got on Twitter/X, LinkedIn and Mastodon, as well as the number of followers I have on each platform (on the right). The table focuses on a selection of different topics, all linking to a blogpost on http://www.the3dlab.org, and all posted in a similar manner (yet with slightly different messages) on each of the three platforms. The ‘best performance’ for each post has been underlined. Note that Mastodon doesn’t show the impressions (and by default hides the likes) to avoid exactly what I’m trying to do here…

Much later, I realized that the best alternative might actually already exist, as many of these new ‘start-up’ social media platforms just don’t find the momentum they need. An existing platform that I had long been overlooking was LinkedIn. While it is mostly seen as an online CV, it also has a community, and a good space for posting updates like this one.

I had been using LinkedIn passively for years, but decided to ramp up my activity on there. Not that much later, I had already grown my following to 430. Interestingly, while on Twitter/X most of that following consisted of ecologists, on LinkedIn I was connected to the whole spectrum of professional society, largely thanks to the many people I met along the way, all the way back from primary school, and through our citizen science projects.

So, it is time for numbers! For a few months, I promoted each and every story on http://www.the3dlab.org on all three of these platforms in the same way, and collected the numbers. The output is in the table above, for a varied set of posts: an introduction of myself, a story with pictures from the field, a fascinating microclimate paper, the start of our citizen science project on sound, and the call for data for our growing SoilTemp database.

The conclusion is clear: both for interactions (here I looked at Likes) and impressions, X is still overpowering the others, for almost any kind of content that I want to bring. Despite my efforts, Mastodon has remained a social desert: my posts simply don’t reach anyone who cares (note that Mastodon doesn’t show me the impressions, but even if lots of people had seen the posts, they did not interact with them).

LinkedIn is a decent second, however. What’s interesting: for the launch of that new project, which is largely in a new scientific field for me and thus less interesting to the ‘crowd’ I have built up on Twitter, LinkedIn even worked better! In general, LinkedIn is at least not the wasteland that Mastodon is.

So, what to conclude:

  1. Although everyone has been saying Twitter is dead and buried, it’s still the best place for me to reach a wide audience. Some of my most viral tweets even came after Twitter’s informal burial (not in the table above).
  2. LinkedIn is the only one of the many options that has actually given me the feeling that my story gets heard, and picked up by relevant people, even outside of academia. That makes it, to me, a more exciting platform now than Twitter/X.
  3. Getting a new social media platform off the ground is super hard, even when everyone agrees that they want to get rid of the old establishment. Mastodon is simply not worth the effort for me, and I will abandon it again.
  4. What about Facebook/Instagram/TikTok? Facebook is too focused on personal news for me to regularly post my scientific findings there – it bores the audience too quickly. Instagram and TikTok are not platforms for sharing links to this blog; they would require me to rework my communication strategy too drastically.
  5. So please, everyone, join me on LinkedIn! I think we could make it into one of the most interesting platforms for science communication, if you all join in!

Where are they NOT?

We ecologists and biogeographers all want to know so badly where species are and are not living. This quest lies at the very heart of our discipline, as it provides invaluable insights into how global changes are impacting biodiversity across the planet. For this quest, we are relying on a vast array of models collectively known as ‘Habitat Suitability Models’ (HSM), which serve as our guiding compass in predicting exactly that.

Now, while there are countless ways to improve (or screw up) those models, their efficacy ultimately hinges on the quality of data we input. This, in itself, presents its own set of challenges. Here in this story, we delve into one pivotal problem concerning this data, in light of a new paper (Da Re et al. 2023) that just came out in Methods in Ecology & Evolution (MEE).

The crux of the matter lies in the fact that it is considerably more straightforward to determine where a species currently resides than to pinpoint where it does not. Many of the easiest methods for recording species observations, such as the popular iNaturalist app, primarily furnish information about where species are found.

However, the shadowy realm of where a species is absent poses a greater challenge. To ascertain the areas devoid of a particular species, more intricate monitoring techniques become necessary. These techniques often involve the establishment of vegetation monitoring plots, which allow scientists to systematically survey an area and deduce the absence of the species of interest. Nevertheless, even with these more labour-intensive tools, certainty in declaring the absence of a species can remain elusive – but that’s a separate story in its own right.

Obtaining presence-absence data is much more labour-intensive than presence-only data, as you have to ensure you have looked everywhere. Picture: vegetation monitoring plot in northern Sweden

Distribution models need ‘absences’ to run, however. Thus, in situations where actual absence data from the field is scarce, a common practice is to generate what are known as “pseudo-absences.” Essentially, this entails selecting a set of locations where a species was not observed and treating them as surrogate absence points. However, the pivotal aspect we address in this story today is that the method used to choose these pseudo-absences can significantly impact the quality of your model.

In our recent paper featured in MEE, we introduce a new way to select these pseudo-absences: not just randomly in space. Instead of a haphazard geographical selection, our method, termed the ‘uniform’ sampling approach, strategically identifies absence points in the environmental space. Why? The rationale behind the approach lies in the fact that HSMs explicitly link species observations to environmental conditions (e.g., climate) to predict where a species can and cannot be. Importantly, these environmental variables often exhibit a non-random distribution across the landscape.

The Uniform approach in action, shown here for a ‘virtual’ species, generated for testing (a). We created a PCA of all (or a random sample of all) points in the environmental space (b), and used a kernel around the presences to delineate the environmental space in which the species was present (c). Then we uniformly sampled absences outside that kernel by sampling points within each grid cell of the PCA (d). The result was a set of points with environmental characteristics (e), as well as a physical location in the geographic space (f).

For example, let’s consider a scenario where the climate exhibits remarkable homogeneity across vast lowland areas but presents steep gradients in mountainous terrain. If one were to randomly select points in such a landscape to gather pseudo-absences, there would be a disproportionate oversampling of lowland climatic conditions. Consequently, this could lead to a skewed dataset, ultimately compromising the accuracy of the resulting Habitat Suitability Models (HSMs).

Sampling the absences across the range of climatic conditions instead, as we propose here, serves as an effective remedy to this sample location bias (i.e., sampling skewed towards the most prevalent habitats within the geographical space, as in the example mentioned earlier) and reduces so-called class overlap (i.e., overlap between environmental conditions associated with species presences and pseudo-absences).
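For intuition, the core of the idea can be sketched in a few lines of Python. This is a deliberate simplification, not the actual USE implementation: instead of a kernel density around the presences, this toy version simply skips every grid cell in the PCA space that already contains a presence, and draws at most one candidate absence from each remaining cell:

```python
import numpy as np

def uniform_pseudo_absences(env, presence_idx, n_bins=10, seed=None):
    """Toy environmental-space ('uniform') pseudo-absence sampling: project
    sites onto the first two PCA axes, overlay a regular grid, and draw at
    most one candidate absence per cell that contains no presence.
    env: sites x variables array; presence_idx: row indices of presences."""
    rng = np.random.default_rng(seed)
    Z = (env - env.mean(axis=0)) / env.std(axis=0)     # standardize variables
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)   # PCA via SVD
    scores = Z @ Vt[:2].T                              # first two PC scores
    # assign every site to a cell of an n_bins x n_bins grid in PCA space
    cells = []
    for d in range(2):
        edges = np.linspace(scores[:, d].min(), scores[:, d].max(), n_bins + 1)
        cells.append(np.clip(np.digitize(scores[:, d], edges) - 1, 0, n_bins - 1))
    cell_id = cells[0] * n_bins + cells[1]
    occupied = set(cell_id[presence_idx])              # cells holding presences
    absences = []
    for c in np.unique(cell_id):
        if c in occupied:
            continue                                   # skip 'presence' cells
        candidates = np.where(cell_id == c)[0]
        absences.append(rng.choice(candidates))        # one site per empty cell
    return np.array(absences)
```

Because each occupied region of the environmental space contributes at most one point, homogeneous lowlands no longer dominate the pseudo-absence set, whatever their geographical extent.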

Easy to say that, of course, but in that freshly published paper we (or mainly: Daniele, Enrico and Manuele, the smart minds behind the paper) put these ideas to the test. The findings resoundingly endorse our approach: the ‘uniform’ environmental sampling method significantly reduces sample location bias and class overlap without sacrificing predictive performance. As such, it ensures that we can gather pseudo-absences adequately representing the environmental conditions available across the study area.

One of several figures in the paper hammering home the message that the Uniform approach is an improvement. Here, the reduction of class overlap is shown as compared with two other sampling methods in the geographical space.

Importantly, we go further than just sharing those insights. We also provide an R-package with the essential functions to implement the Uniform sampling method in your own workflow. So, if you find yourself grappling with the challenges posed by presence-only species observation data when fueling your models, we encourage you to explore the new ‘USE’-package to collect a fair bunch of pseudo-absences!


The true thermal niche of forest plant species

I might have mentioned this before*, but microclimate is crucial to improve our estimates of species distributions. As species react to micro- rather than macroclimate, and the two are only very weakly correlated at the local scale, ignoring microclimate could give highly erroneous species distribution estimates.

Conceptual representation of why microclimate matters for species distribution modelling. The use of macro- rather than microclimate data introduces a systematic bias (bottom middle), with the actual response curve being significantly different in shape.

Now, these things are easy to say, of course, and easy to argue theoretically. It’s another thing altogether to actually prove them with real data. There are increasingly many regional studies doing just that**; however, not many are around to show that microclimate also matters at the large scale!

There is an argument for the hypothesis that it wouldn’t matter: across a whole continent like Europe, climatic gradients are so vast that the difference between macro- and microclimate could, in theory, be overwhelmed by that macroclimatic gradient. That would make using microclimate data obsolete.

Nice try, paragraph just above, but the news is out that it actually DOES matter, even at that scale! That news comes in the shape of a new paper by lab member and SoilTemp forest data cruncher Stef Haesen, just published in Ecology Letters.

Forest understory microclimate is driven by both topography and vegetation cover. Picture: a valley with bluebells in a Flemish forest in early spring

What he did was compare the performance of species distribution models (SDMs) built with micro- and macroclimate data. That microclimate data came from ForestClim, the European-wide high-resolution gridded microclimate product of forest understory temperatures (which he ALSO made, what a hero!).

The Ecology Letters paper now elegantly shows that microclimate-based SDMs at high spatial resolution outperformed models using both macroclimate, and microclimate data at coarser resolution. Additionally, macroclimate-based models introduced a systematic bias in modelled species response curves, which could result in erroneous range shift predictions.

A bit of a funny ‘spaghetti’-plot showing how microclimate-based models outperform macroclimate or aggregated microclimate-based models (with model performance here quantified using the ‘Continuous Boyce Index’, CBI’). Spaghettis depict the performance of models for each forest species that we modelled, the black line is the average.
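For those unfamiliar with the metric: a Boyce index compares how often presences fall into each habitat-suitability window with how much of the background landscape that window covers, and then correlates this predicted-to-expected (P/E) ratio with suitability. A rough numpy-only sketch of the idea (the window settings and function names are mine, not those used in the paper):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (no ties expected in this toy example)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

def boyce_index(suit_presence, suit_background, n_windows=20, width=0.1):
    """Sketch of a continuous Boyce index: correlate habitat suitability
    with the predicted-to-expected (P/E) ratio of presences, computed in
    overlapping suitability windows across the background range."""
    lo, hi = suit_background.min(), suit_background.max()
    half = width * (hi - lo) / 2
    mids = np.linspace(lo + half, hi - half, n_windows)
    pe = np.full(n_windows, np.nan)
    for i, m in enumerate(mids):
        p = ((suit_presence >= m - half) & (suit_presence <= m + half)).mean()
        e = ((suit_background >= m - half) & (suit_background <= m + half)).mean()
        if e > 0:
            pe[i] = p / e      # >1: presences over-represented in this window
    ok = ~np.isnan(pe)
    return spearman(mids[ok], pe[ok])
```

A model whose high-suitability areas really do hold more presences scores close to 1; a model no better than random hovers around 0.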

In practice, the macroclimate models were – as predicted – unable to identify warm and cold refugia at the range edges of species distributions, the areas where microclimate was likely to be most important.

Modelled distribution for the typical forest plant Paris quadrifolia across Europe, with the black dots being its observations, and the blue-green-yellow gradient the modelled probability of occurrence. Circular maps on the right show model predictions at the cold (top) and warm (bottom) distributional limits, where the species occurs more in warmer (top) and cooler (bottom) refugia, respectively.

These findings elegantly show that, yes, microclimate is critical for SDMs, even at the continental scale. More importantly, perhaps, is the fact that if we want to use such models to find out where to conserve biodiversity, microclimate data is even more crucial: conservation often targets species at the edge of their distribution (refugia like these identified in the paper are increasingly at the forefront of conservation), where macroclimate-based models are thus performing the worst.

Paris quadrifolia at its northernmost limit in northern Norway, where the species clearly prefers warmer microhabitats

* Just joking; I effectively mention this every week or so! Even more, already in my PhD I had a whole paper dedicated to this point!

** Yes, even I wrote a bunch, like this one!


The vanguard

Here and there across the city of Antwerp, curious boxes with funny black noses are starting to appear. While their presence for now remains subtle, it heralds the exciting beginning of a new impending roller coaster ride of discoveries!

These boxes are smart sound sensors, designed to measure the variety of sounds in the urban context. They are the forerunners of a large citizen science project on sound and its impact on our lives; a collaboration between the University of Antwerp, the University Hospital (UZA) and media partner De Morgen, and I am proud and happy to be in charge of the scientific roll-out of this project.

The sound sensor is designed by ASAsense, a Belgian company, and it’s a nice, smart box: it sends data over the internet in real time to our database, but it also has a smart algorithm on board, which allows it to identify the sources of the sound it hears.

We’ve embarked on a preliminary journey ahead of the grand project scheduled for later this year. Our primary goal now is to capture the diverse urban soundscape using these sensors. We aim to collect data encompassing a wide range of sounds. Our team, including a group of dedicated students, will meticulously classify these sounds. This data will then become the basis for training machine learning models within the sensors to recognize specific sound patterns in our main project.

Our first foray into the real world yesterday already garnered media attention, including a detailed article in our media partner, De Morgen, and an engaging interview on regional radio and television.

Now it’s full speed ahead to the main project coming up soon!


A correction and a warning

  1. A correction

Finally, we got to publish something that was very long overdue: the necessary correction to our ‘Global maps of soil temperature’. A correction, indeed, as we had identified an error in the analyses that had to be rectified.

So, what happened? When calculating the monthly mean temperatures of each of the in-situ temperature time series from the SoilTemp database, I accidentally shifted these microclimate time series forward by half a month by using faulty R code. Or, in other words: I thought I had found a smart way to summarize the data to monthly values, but I didn’t… As this coding error did not occur when computing the corresponding monthly mean temperatures from the ERA5 macroclimate data, we ended up calculating our temperature offsets with half a month of temporal mismatch. The result was that the microclimatic offsets for, say, June were calculated using the microclimate data from half of June and half of May instead.
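The corrigendum describes the error, but the exact R code is not shown; here is a minimal Python illustration (with a hypothetical temperature series) of how such a half-month shift can sneak in, for example via a 30-day mean centered on the first of the month instead of a true calendar-month mean:

```python
import numpy as np
import pandas as pd

# A hypothetical daily soil-temperature series with a clear seasonal cycle
days = pd.date_range("2020-01-01", "2020-12-31", freq="D")
temp = pd.Series(10 * np.sin(2 * np.pi * (days.dayofyear - 100) / 365),
                 index=days)

# Correct: plain calendar-month means
correct = temp.resample("MS").mean()

# Faulty variant (an illustration, not the actual R code): a 30-day mean
# centered on the first day of each month, which mixes roughly half of the
# previous month into each 'monthly' value: a half-month shift
shifted = (temp.rolling(30, center=True, min_periods=15).mean()
               .resample("MS").first())

# Seasonal (monthly) values disagree, while the annual mean barely moves
print("max monthly difference:", (correct - shifted).abs().max())
print("annual mean difference:", abs(correct.mean() - shifted.mean()))
```

Note how the toy example reproduces the pattern we found: annual summaries are barely affected, while individual monthly values can be off by degrees.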

Such a tiny error could have pretty major implications, so the moment we discovered this, we immediately dove back into the data to rerun our analyses. We were both lucky and unlucky. First, lucky: most of the analyses in the paper were at the yearly level, and there the implications of shifting the data by two weeks were minor: the corrected mean annual soil temperature was estimated to be on average only 0.006°C higher than the original one, with a Root Mean Square Error (RMSE) between the old and new map of just 0.330°C (Corrigendum Figure 1). Consequently, all conclusions in the main text of the published paper about biome-specific patterns in mean annual temperature remained unaffected (see below for details).

Difference between the modeled mean annual temperature in the topsoil layer (SBio1) following the corrected (new) calculation versus the original (old) calculation were fairly minor. (a) Pixel-level differences in temperature (new minus old). (b) Temperature differences (new minus old) as a function of SBio1, showing more consistent lower temperatures in cold climates following the corrected calculations. (c) Histogram of errors in mean annual temperature.

Due to the nature of the error (a half-month shift in soil temperature time series), implications for seasonal bioclimatic variables were larger, however, especially in cold environments. That’s the unlucky part, as we had made our bioclimatic variables openly available, and people were thus using erroneous maps. We made sure to rectify that as soon as possible, and updated our maps on Zenodo, where one should use ‘version 2’.

The difference between the modeled maximum temperature of the warmest month following the corrected (new) calculation versus the original (old) calculation was substantially larger than in the figure above.

The urgency to update the paper itself was lower, given the minor impact on the findings, but we wanted to do that as well, so in the meantime the paper carried the necessary warning. That corrigendum is now online, bringing this saga to an end.

So how to prevent such errors in the future? I don’t know… This was a paper seen by so many people; this was data and code I had shared with several others. But the error was such a minor thing, looking reasonable at first glance, and the resulting data and patterns all seemed so plausible, that it was hard to spot. I guess I can only say: be as open as you can, share your data, share your code, and let people look at it all. The error came to light after a few back-and-forths with the lead author of a sister paper (Haesen et al. 2021, which also got a correction), who wanted to redo some calculations using new data for a follow-up analysis, and could not reproduce my numbers. That made me rerun my own numbers, and discover the mistake.

  2. A warning

So are the maps now perfect? Far from it! I want to take this opportunity to highlight another example of an issue that is still in the maps. It’s less an error than a limitation of our data and analysis, and one that we can only correct by rerunning the analyses with a much larger dataset.

A while ago, a data user contacted us with a question: some parts of the global map of bioclimatic variable 3 (SBIO3) seemed impossible: SBIO3 is the isothermality, which is simply put the mean diurnal range (variation within a day, SBIO2) divided by the annual range (variation over a year, SBIO7).

Due to the nature of that index, it cannot go below zero, as that would mean that one of these two ranges is negative, which would imply a higher minimum than maximum temperature. Impossible!

Bicolor map of SBIO3, highlighting in green where impossible negative values were observed.

Now, it turned out that in a few cases, especially in the tropics, SBIO3 was indeed negative (see the map; around 3% of points across the globe)! This, in turn, was the result of a few negative values in SBIO2, the diurnal range. These can occur in our models in areas with very little difference between the daily minimum and maximum, such as in warm and wet regions like the tropics. There, they are most likely the result of extrapolation by our machine learning models of the underlying variables. Indeed, we did not inform any model of the fact that SBIO2 should never be below 0, as we calculated this range simply from the separately modelled minima and maxima. Especially in very warm and wet areas – where diurnal ranges are low – the models might therefore have extrapolated beyond what is possible in reality.

Such errors are amplified by the fact that SBIO3 is a derivative variable: it is calculated based on SBIO2 and SBIO7, with SBIO2 in turn being calculated based on our modelled minima and maxima. Each layer adds another opportunity for error, with the end result being less trustworthy than the input data. What is more, the models of minima and maxima themselves are the results of in-situ measurements and environmental explanatory layers, all in turn with their own errors.

So, while global modelling has great potential, one should never forget that such assessments – as so many – have inherent errors resulting from amplified uncertainties.

So what to do? When using SBIO3, it is best to mask out these areas with impossible values. You can mask those erroneous pixels out directly, or get rid of all areas with potential uncertainties stemming from extrapolation of the model. We provide a mask for the latter, called ‘PCA_int_ext_5_15cm’, in the repository. With a very stringent threshold of 0.95 – meaning the model extrapolates for at least 5% of the environmental space covered by the data – most of the erroneous areas are masked out, along with several more that might have had enough accurate measurements to allow reliable modelling.
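In practice, both masking options are one-liners once the layers are loaded. The toy numpy arrays below stand in for the real GeoTIFFs (in a real workflow you would read them with a raster library); the values are made up, and the extrapolation array plays the role of the ‘PCA_int_ext_5_15cm’ layer:

```python
import numpy as np

# Toy 2 x 2 grids standing in for the global rasters on Zenodo:
# SBIO2 = mean diurnal range, SBIO7 = annual range
sbio2 = np.array([[ 2.0,  0.5],
                  [-0.3,  1.2]])       # one impossible, negative diurnal range
sbio7 = np.array([[20.0, 15.0],
                  [18.0, 22.0]])
extrapolation = np.array([[0.01, 0.12],
                          [0.30, 0.04]])   # fraction of extrapolated space

# Isothermality: diurnal range divided by annual range
sbio3 = sbio2 / sbio7

# Option 1: mask only the pixels with impossible (negative) values
sbio3_masked = np.where(sbio3 < 0, np.nan, sbio3)

# Option 2: additionally drop pixels where the model extrapolates for more
# than 5% of the environmental space (the stringent 0.95 threshold)
sbio3_strict = np.where((sbio3 < 0) | (extrapolation > 0.05), np.nan, sbio3)
```

The strict mask removes more pixels than strictly necessary, which is exactly the trade-off described above: safer maps at the cost of some usable area.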

Areas in green on this map are extrapolating for at least 5% of the environmental predictor layers, which could potentially result in such errors as described above – or at least a higher chance for those.

 
