The paper in Nature Microbiology is here: http://go.nature.com/2AhASzB
It is well known that we live in a microbial world – with microbes all over our bodies, in our homes, in the air we breathe, and in the ground we walk on. In the soil, these bacteria, fungi and other eukaryotes help plants grow, cycle water and important nutrients, and keep our ecosystems functioning. Researchers all over the world are using high-throughput sequencing to study these important microbes, but keeping the corresponding data catalogued and organized is a challenge, especially if we are to use the data to respond to questions of global change. In our recent study, we tackle this challenge by bringing together disparate soil bacterial datasets from over 1900 soil samples collected from 21 countries spread across the world.
The origins of our “mega-meta-analysis”, as we came to call it, were not so straightforward and were truly the result of a multifaceted collaboration. The idea first arose during my non-traditional postdoc position as the Executive Director of the Global Soil Biodiversity Initiative (GSBI). Under the leadership of Prof. Diana Wall (head of the GSBI), I traveled around the world giving a voice to the importance of soil biodiversity for the sustainability of ecosystem services. Along the way we realized that global efforts to keep track of soil biodiversity data were lacking and that there was no framework to synthesize these data. I therefore applied to the German Synthesis Centre for Biodiversity Sciences (sDIV) in Leipzig, Germany, and was granted the opportunity to organize sOILDIV, a workshop to begin building a framework to improve our understanding of the distribution of global soil biodiversity. While we made progress, many questions remained after the meeting, specifically how to handle the enormous amount of sequence data. So my colleague and co-author Dr. Franciska de Vries picked up the torch and organized a second workshop, this time funded by the British Ecological Society.
That brings us to May 2015. In sunny, windy, rainy Manchester, UK, we met with over 20 scientists and more than 40 sequence datasets to figure out whether we could answer meaningful ecological questions from merged, disparate sequence data. We found what many who have done a meta-analysis have found: raw data are tough to track down. However, Rob Griffiths of the Centre for Ecology & Hydrology, UK, spoke up and suggested that instead of using raw sequence data we try the unconventional approach of combining data using the taxonomy names in the original taxa abundance tables (aka OTU tables).
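To give a feel for what combining OTU tables by taxonomy name can look like, here is a minimal sketch in Python. It is not the code used in the study. The data layout, the function names (`collapse_to_rank`, `merge_studies`), the semicolon-separated taxonomy strings and the toy counts are all assumptions made for illustration. The idea is that OTU identifiers mean nothing outside their own study, whereas a taxonomy name is shared, so counts are summed by name down to a chosen rank and the per-study tables are then joined on that name.

```python
from collections import defaultdict


def collapse_to_rank(otu_table, taxonomy, depth):
    """Sum OTU counts that share the same taxonomy name down to `depth` ranks.

    otu_table: {otu_id: {sample_id: count}}   (hypothetical layout)
    taxonomy:  {otu_id: "Phylum;Class;Order"} (hypothetical layout)
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for otu_id, counts in otu_table.items():
        ranks = [r.strip() for r in taxonomy[otu_id].split(";")]
        if len(ranks) < depth:
            continue  # not classified to this depth; simply skipped in this sketch
        name = ";".join(ranks[:depth])
        for sample, count in counts.items():
            collapsed[name][sample] += count
    return collapsed


def merge_studies(studies, depth):
    """Join per-study tables on taxonomy name.

    Returns {taxon_name: {"study_id:sample_id": count}}.
    """
    merged = defaultdict(dict)
    for study_id, (otu_table, taxonomy) in studies.items():
        for name, counts in collapse_to_rank(otu_table, taxonomy, depth).items():
            for sample, count in counts.items():
                merged[name][f"{study_id}:{sample}"] = count
    return {name: dict(samples) for name, samples in merged.items()}


# Toy, made-up counts; OTU ids differ between studies, taxonomy names do not.
studies = {
    "study_A": (
        {"otu1": {"s1": 10, "s2": 0}, "otu2": {"s1": 5, "s2": 7}},
        {"otu1": "Proteobacteria;Alphaproteobacteria;Rhizobiales",
         "otu2": "Proteobacteria;Alphaproteobacteria;Rhizobiales"},
    ),
    "study_B": (
        {"x9": {"t1": 3}, "x7": {"t1": 8}},
        {"x9": "Proteobacteria;Alphaproteobacteria;Rhizobiales",
         "x7": "Acidobacteria;Acidobacteriia;Acidobacteriales"},
    ),
}

merged = merge_studies(studies, depth=3)  # depth=3 means Order in this toy layout
for name, samples in sorted(merged.items()):
    print(name, samples)
```

A taxon missing from a study simply has no entries for that study's samples in the merged table. A real pipeline would also have to reconcile the different naming conventions used by different reference databases, which this sketch ignores.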
Enter co-lead author Dr. Chris Knight. Using his expertise in quantitative techniques and computational models, Chris combined the OTU tables and, using machine learning models (‘random forests’), began searching for the particular groups of bacteria (taxa) that structured the community; in other words, the taxa most relevant to indicating changes in the soil. We found that some bacteria always show up in soil, no matter where it was collected. But other bacteria are more picky, and those are the ones we should pay attention to. These are the bacteria that characterise different sorts of soil and may be able to tell you when your soil is changing or in trouble.
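The distinction between bacteria that turn up everywhere and those that are picky can be illustrated with something far simpler than a random forest. The sketch below is not the analysis from the paper: it is a plain prevalence tally (the fraction of samples in which a taxon is detected), and the function names, the 0.9 cut-off, the sample labels and the counts are all hypothetical.

```python
def prevalence(table, samples):
    """Fraction of samples in which each taxon has a non-zero count.

    table: {taxon_name: {sample_id: count}} (hypothetical layout)
    """
    n = len(samples)
    return {
        taxon: sum(1 for s in samples if counts.get(s, 0) > 0) / n
        for taxon, counts in table.items()
    }


def split_by_prevalence(table, samples, threshold=0.9):
    """Split taxa into 'ubiquitous' and 'picky' using an arbitrary cut-off."""
    prev = prevalence(table, samples)
    ubiquitous = sorted(t for t, p in prev.items() if p >= threshold)
    picky = sorted(t for t, p in prev.items() if p < threshold)
    return ubiquitous, picky


# Toy, made-up data with placeholder taxon and sample names.
samples = ["soil_1", "soil_2", "soil_3", "soil_4"]
table = {
    "order_A": {"soil_1": 40, "soil_2": 25, "soil_3": 3, "soil_4": 12},
    "order_B": {"soil_1": 30, "soil_4": 18},
    "order_C": {"soil_3": 22},
}

ubiquitous, picky = split_by_prevalence(table, samples)
print("ubiquitous:", ubiquitous)
print("picky:", picky)
```

A tally like this only says where a taxon is present. Working out which of the picky taxa actually distinguish one kind of soil from another is the harder question, and that is the part the random forest models were used for.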
The process for identifying such taxa is not straightforward. When different researchers study different soils in different places around the world, the unifying question is: what are the differences in the bacterial communities among the soils? But that’s not the only thing that changes among separate studies. Each researcher uses different methods and techniques that influence the data and make it difficult to extract patterns and answer questions. We wanted to get at the real patterns in community structure across all the different soils in the different studies, while accounting for study differences. Our approach allows that, and provides important insight into how communities differ across soils, as well as a way of unifying these data. We found that while it is sometimes the large, high-level groupings of bacteria (‘Phyla’) that most obviously differ among studies, it is often lower-level groupings (‘Orders’, for instance) that are most important in the community structure of particular soils. So an important outcome of this work was an approach that gave a picture of interesting biological patterns, despite all kinds of technical differences among studies.
This study provides a roadmap for bringing together and analysing microbial sequence data that have been collected by different people using different methods. It also gives insight into the bacteria that make a soil a soil, and the bacteria that might give us clues about how our soils are functioning and responding to global change. But, in addition to these important scientific findings, this study also demonstrates the power of collaboration, because it brought together researchers who would not normally work on a project together.
Here is a look back at the fun of working with scientists (especially Franciska):