Identifying patterns in structural biology

Our new annotator tool pyResid will make it easier for researchers to identify relevant work in the academic literature.


The pattern-based annotator, pyResid, has been developed by Robert Firth from our Data Science group and is currently being tested on submissions to Europe PubMed Central via the European Bioinformatics Institute (EBI) in Cambridge. The tool has already been run on more than 30,000 papers in the existing archive, and it is hoped that it will form part of the submission process for all upcoming papers.

In structural biology, researchers may be focused on just a couple of interactions or substitutions involving any one of the 22 protein-forming (proteinogenic) amino acids, at a single position on a protein structure that may be several thousand residues long. Combined with the growing volume of academic literature in this field, this makes it increasingly difficult to identify relevant work.

Because pyResid can use an API to interact with open-access full-text publications as well as existing databases such as the EBI’s Protein Data Bank in Europe (PDBe), residues found within the text can be matched to their parent protein.
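To give a flavour of what pattern-based annotation can look like, here is a minimal Python sketch. It is an illustration only and not pyResid's actual implementation: the function names, the regular expression and the example sequence are all assumptions made for this example. It finds simple residue mentions such as "Arg12" or "His-57" in a sentence, then checks whether a mention is consistent with a protein sequence that is assumed to have been retrieved already from a database.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
# Selenocysteine and pyrrolysine are left out to keep the sketch short.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Hypothetical pattern: a three-letter residue name, an optional hyphen
# or space, then a position number (e.g. "Arg12", "His-57", "Ser 195").
RESIDUE_PATTERN = re.compile(
    r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d+)\b"
)


def find_residue_mentions(text):
    """Return (residue_name, position) pairs found in the text."""
    return [
        (match.group(1), int(match.group(2)))
        for match in RESIDUE_PATTERN.finditer(text)
    ]


def matches_sequence(mention, sequence):
    """Check a mention against a one-letter protein sequence (1-indexed)."""
    name, position = mention
    if not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == THREE_TO_ONE[name]


sentence = ("Mutation of Arg12 and His-57 abolished binding, "
            "whereas Ser 195 was unaffected.")
mentions = find_residue_mentions(sentence)
```

A real annotator would need to cope with far more variation, such as one-letter codes, substitutions written as "Arg12Ala" and residue ranges, which this sketch deliberately ignores.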

Although pyResid is initially a proof of concept, we hope that by taking advantage of advances in natural language parsing speeds and the ability to link up existing datasets, our open-source tool can help us gain further insight and value by identifying patterns in research.

This work was carried out as part of the West-Life project in collaboration with the Science and Technology Facilities Council’s (STFC) Scientific Computing Department.
