The pattern-based annotator pyResid has been developed by Robert Firth from our Data Science group and is currently being tested on submissions to Europe PubMed Central via the European Bioinformatics Institute (EBI) in Cambridge. The tool has already been run on more than 30,000 papers in the existing archive, and it is hoped that it will form part of the submission process for all upcoming papers.
In structural biology, researchers may be focused on just a couple of interactions or substitutions involving any one of the 22 protein-creating amino acids, at a single position on a protein structure that may be several thousand residues long. This, combined with the growing volume of academic literature in the field, makes it increasingly difficult to identify relevant work.
The ability to use an API to interact with open-access full-text publications, as well as existing databases such as the EBI's Protein Data Bank in Europe (PDBe), means that when residues are found within the text, they can be matched up to the parent protein.
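As a rough illustration of what finding residues within text can look like, the sketch below uses a regular expression to pick out mentions such as "Arg-45" or "His123". It is not pyResid's actual implementation: the pattern, the list of three-letter codes and the function name find_residues are assumptions made for this example, and the step of matching each residue to its parent protein through an API is left out.

```python
import re

# Three-letter codes for the 20 standard amino acids plus selenocysteine (Sec)
# and pyrrolysine (Pyl). Assumed for illustration; not taken from pyResid.
THREE_LETTER_CODES = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
    "Sec", "Pyl",
]

# Hypothetical pattern: a three-letter code, an optional hyphen or space,
# then a position of one to four digits.
RESIDUE_PATTERN = re.compile(
    r"\b(" + "|".join(THREE_LETTER_CODES) + r")[- ]?(\d{1,4})\b"
)


def find_residues(text):
    """Return (amino_acid, position) pairs for each residue mention in text."""
    return [
        (match.group(1), int(match.group(2)))
        for match in RESIDUE_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Mutation of Arg-45 and His123 disrupted binding to Asp 7."
    for amino_acid, position in find_residues(sentence):
        print(amino_acid, position)
```

A real annotator would need to handle many more ways of writing a residue, such as one-letter codes, full names like "arginine 45" and substitutions like "R45A", which is part of why a dedicated tool is useful.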
Although pyResid is initially a proof of concept, we hope that by taking advantage of advances in natural language parsing speeds and the ability to link up existing datasets, our open-source tool can be used to help us gain further insight and value by identifying patterns in research.
This work was carried out as part of the West-Life project in collaboration with the Science and Technology Facilities Council's (STFC) Scientific Computing Department.