Using big data to streamline early stage drug development
09 Jan 2020



We're part of ADDoPT, an academic-industrial knowledge base collaboration with the UK pharmaceutical industry to accelerate advanced digital design techniques and streamline development.


[Image: pill bottle. Credit: iStock]


Understanding and predicting the solubility of drug compounds early in the development cycle will help the pharmaceutical industry to refine candidate selection, offering a way for molecular variants to be ranked against each other without the need for expensive experimentation. This gives greater control and accuracy, speeding up drug design. Solid-state packing - how molecules orient themselves with respect to one another within the crystal - must also be considered if digital modelling is to provide effective insight into drug development.


Lattice energy - the energy with which molecules are bound within a crystal - provides a strong indication of key drug properties when it can be predicted, but broader data sets enhanced by atomistic modelling are needed to cover the strongly packed crystals typically found in drug molecules. The Hartree Centre team are working with ADDoPT project partners, using atomistic modelling, big data and machine learning to analyse a database of 60,000 known crystal structures. They have created a practical model that predicts 3D crystal properties from a 2D structure alone by linking molecular descriptors of each structure to lattice energy, underlining the importance of solid-state packing in solubility.
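The idea of linking molecular descriptors to lattice energy can be illustrated with a minimal sketch. This is not the ADDoPT model: the article does not say which descriptors or learning algorithm the team used, so the three descriptors, the plain least-squares fit and every number below are assumptions made up for illustration.

```python
# Illustrative only: descriptors, weights and energies are invented,
# and ordinary least squares stands in for the unspecified ML method.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_least_squares(descriptors, energies):
    """Return [intercept, w1, w2, ...] minimising the squared error."""
    rows = [[1.0] + list(d) for d in descriptors]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, energies)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, descriptor):
    """Estimate lattice energy from a descriptor tuple."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], descriptor))


# Hypothetical 2D descriptors: (molecular weight, H-bond donors, ring count)
training_descriptors = [
    (150.0, 1, 1), (180.0, 2, 1), (220.0, 1, 2),
    (260.0, 3, 2), (300.0, 2, 3), (340.0, 4, 3),
]
# Synthetic "lattice energies" in kJ/mol, generated from an invented rule
# so that the fit has a known answer to recover.
TRUE_WEIGHTS = [-20.0, -0.25, -8.0, -5.0]
training_energies = [predict(TRUE_WEIGHTS, d) for d in training_descriptors]

weights = fit_least_squares(training_descriptors, training_energies)
new_molecule = (240.0, 2, 2)  # descriptors of an unseen (hypothetical) molecule
estimated_energy = predict(weights, new_molecule)
```

A real model trained on tens of thousands of crystal structures would need far richer descriptors and a validated learning method; the sketch only shows the shape of the workflow, from 2D-derived descriptors to a lattice energy estimate.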


The model will ultimately streamline drug development processes by enabling researchers to understand particle and bulk powder properties digitally, reducing the number of candidate compounds that require physical laboratory testing. Pfizer have evaluated and developed the model and applied it to 1,500 drug structures, showing strong correlation with lattice energies calculated from crystal structures. This approach to predictive digital design means fewer prototypes, reducing expensive material consumption and drug development timescales. By combining big data tools and adopting a cross-industry approach, researchers can continue to validate and shape the model as drug development evolves, enabling the future of digital drug design.
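One common way to quantify how well predicted values track calculated ones is the Pearson correlation coefficient. The sketch below assumes that measure; the article does not state which statistic was used in the evaluation, and the energy values are invented placeholders, not project data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented lattice energies in kJ/mol, for illustration only.
calculated = [-95.0, -110.0, -123.0, -140.0, -158.0]  # from crystal structures
predicted = [-98.0, -107.0, -125.0, -137.0, -160.0]   # from a 2D-based model

r = pearson_r(predicted, calculated)
```

A value of r close to 1 indicates that the predictions rise and fall with the calculated energies; it does not by itself show that the absolute values agree, so an error measure is usually reported alongside it.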

We’ve been working on lattice energy prediction at Pfizer for several years, more recently through ADDoPT. We can now show the power of big data in this area, which is catalysed by strong industry and academic collaboration.

Robert Docherty

This project has been both challenging - with many hours spent tackling complex problems - and rewarding, with a sense of achievement when solutions have been found. It was great to work in a large team with other researchers in academia and industry, and there’s no doubt that new collaborations will emerge in the future.

Dawn Geatches & Rebecca Mackenzie
Science and Technology Facilities Council

At a glance

  • Drug compound solubility is important in the early stages of drug development
  • Solubility depends on molecular packing within a crystal and the crystal lattice energy
  • Statistical methods are combined with atomistic calculations to build machine learning models
  • Cross-community effort to extract value from large chemical datasets
  • New multi-method modelling blueprint for exploration of many chemical properties
