Understanding the microbiological profile of soil, its biochemical properties and its fertility is crucial to addressing global challenges including food security and climate change. Soils are the most complex biological systems on Earth, but understanding their microbiomes in detail requires analysis of huge amounts of data. This is a computationally demanding exercise that needs skills in data analytics to provide real insight. Rothamsted Research needed to speed up their metagenomics data processing and identify methods that could provide new insight into soil health.
The IBM Research team developed highly parallelised workflows suitable for processing large-scale metagenomics datasets on the Hartree Centre’s high performance computing (HPC) systems. Using their expertise in data science, the team devised methods to distribute datasets, ultimately reducing the run time from days to hours. By further developing APIs – application programming interfaces – to mine a range of diverse public databases, the team created a rich, integrated knowledgebase. This was used to develop machine learning models capable of offering a detailed picture of biochemical activities across soil samples and their connection with overall soil health.
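The case study does not describe how the datasets were actually distributed, so the following is only a minimal sketch of the general idea: split the reads into chunks, process each chunk independently, then merge the partial results. Every name here (`chunked`, `count_kmers`, `parallel_kmer_counts`) is hypothetical, the k-mer counting task is an illustrative stand-in for a real metagenomics step, and threads stand in for HPC nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_kmers(reads, k=3):
    """Count every k-mer in one chunk of reads (the per-worker task)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def parallel_kmer_counts(reads, k=3, chunk_size=2, workers=4):
    """Scatter chunks to workers, then merge the partial counts."""
    chunks = chunked(reads, chunk_size)
    # Assumption: threads are a stand-in for HPC nodes; a real workflow
    # would more likely use separate processes or a batch scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: count_kmers(chunk, k), chunks)
        total = Counter()
        for partial in partials:
            total.update(partial)
    return total


# Toy sequences for illustration only, not real soil metagenomics reads.
reads = ["ACGTAC", "GTACGT", "TTACGA", "ACGACG"]
result = parallel_kmer_counts(reads)
```

Because each chunk is processed independently and the merge is a simple sum, the distributed result matches a single-pass count over all reads. That independence is what allows this kind of workload to spread across many workers.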
This work provided an HPC-enabled solution to the data processing bottleneck experienced in soil metagenomics. The computational and data analytics capabilities of the Hartree Centre complemented the expertise of Rothamsted Research to create a highly integrated and effective collaboration, providing novel insights towards the understanding of soil health. These tools - developed as part of the Innovation Return on Research (IROR) programme - can now be applied to any microbiome dataset and are transferable across a broad range of industry challenges.
"Working with the Hartree Centre has been extremely positive, allowing us to broaden our horizons - imagining more complex, larger studies and helping us establish world-leading research in areas critical to food security and global climate."
"Working within a small, multidisciplinary team allowed us to be agile, trying out several different approaches to tackle problems ranging from data analysis to visualisation. This meant we were able to choose the approach that met our requirements best."
At a glance
- Processed large datasets on high performance computing platforms, reducing run time from days to hours
- Machine learning methods provided a detailed view of soil properties by analysing the microbiome
- Led to a better understanding of factors affecting soil health
- Resulted in a set of transferable tools that can be applied to any microbiome dataset