Colorado PROFILES, The Colorado Clinical and Translational Sciences Institute (CCTSI)

Automated Literature Mining for Validation of High-Throughput Function Prediction

The function of millions of proteins remains unknown, and automated protein function prediction systems have a poor record of performance. We will test hypotheses about protein functional sites by validating high-throughput predictions derived from computational biology techniques through a novel automated system that will mine the literature for targeted information relevant to those predictions. The impact of our work will be to enable large-scale, validated annotation of protein function and, in turn, to facilitate progress in drug discovery for the treatment of disease.

High-throughput experiments and bioinformatics techniques are creating an exploding volume of data with which we hope to transcribe the genetic blueprints of life. Targeted experiments are required to validate biomedical discoveries from these sources. Fortunately, the information to confirm or refute a prediction is often already available in an existing publication, and the biologist can take advantage of this supporting evidence for validation. However, the sheer volume of predictions from high-throughput methods exceeds the capacity of researchers to perform even the necessary literature searches. This gap in capacity must be addressed using automated literature mining methods that perform comparably to a human expert; indeed, development of such methods is a grand challenge of modern biology.

We will mine the full-text literature to validate computational predictions of functional sites in proteins. The innovations in our approach include: (1) using computational predictions as the context for a literature search; (2) information extraction of protein functional sites from full-text journal publications; (3) high-throughput text mining; and (4) using primary information in protein databases to evaluate the methods.
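
As a rough illustration of the idea of using predictions as the context for mining text, the sketch below checks whether predicted functional residues are mentioned in a passage. It is not the project's system: the regular expression, the function names (extract_residue_mentions, validate_predictions), and the representation of a site as an (amino acid, position) pair are all assumptions made for this example, and a real system would need far more robust entity recognition.

```python
import re

# Three-letter codes for the 20 standard amino acids.
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
       "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Hypothetical pattern: matches mentions such as "Ser195", "His-57", "Asp 102".
RESIDUE_MENTION = re.compile(r"\b(" + AA3 + r")[- ]?(\d+)\b")


def extract_residue_mentions(text):
    """Return the set of (amino_acid, position) pairs mentioned in text."""
    return {(aa, int(pos)) for aa, pos in RESIDUE_MENTION.findall(text)}


def validate_predictions(predicted_sites, text):
    """Split predicted sites into those mentioned in text and those not.

    A mention is only weak evidence; it says the residue is discussed,
    not that the passage confirms the predicted function.
    """
    mentioned = extract_residue_mentions(text)
    supported = sorted(s for s in predicted_sites if s in mentioned)
    unsupported = sorted(s for s in predicted_sites if s not in mentioned)
    return supported, unsupported


if __name__ == "__main__":
    passage = ("The catalytic triad of chymotrypsin consists of "
               "His57, Asp-102 and Ser 195.")
    predictions = {("His", 57), ("Ser", 195), ("Gly", 193)}
    supported, unsupported = validate_predictions(predictions, passage)
    print("supported:", supported)
    print("unsupported:", unsupported)
```

Treating an unmatched prediction as "unsupported" rather than "refuted" is a deliberate choice in this sketch: the absence of a mention in one passage says nothing about whether the prediction is wrong.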

Understanding of protein function is a critical bottleneck in the progress of biomedical research. It is time to truly integrate the biological literature into the protein function prediction problem. By doing so, we will enable a critical advance in high-throughput protein function prediction.


Copyright © 2022 The Regents of the University of Colorado, a body corporate. All rights reserved. (Harvard PROFILES RNS software version: 2.11.1)