

BioRAT: extracting biological information from full-length papers




Free text-mining software by Corney DP et al., University College London, UK. A large and growing amount of information is published every year in scientific journals. In the biomedical field there are thousands of journals, each publishing many issues annually, and any one of them could contain information relevant to a researcher. No one has time to read every paper. One solution is a software tool that searches publications and identifies important results in response to the user's queries.

Reference:

BioRAT: extracting biological information from full-length papers.

Corney DP, Buxton BF, Langdon WB, Jones DT.

Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK. d.corney@cs.ucl.ac.uk

MOTIVATION: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE.

RESULTS: We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
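To compare the two result sets on a single scale, the reported precision and recall can be combined into an F1 score (the harmonic mean of the two). The abstract itself does not report F1, so the sketch below is only a derived illustration; the function name f1_score and the dictionary layout are assumptions made here, not part of BioRAT.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was extracted
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, converted from percentages to fractions.
reported = {
    "abstracts": {"precision": 0.5507, "recall": 0.2031},
    "full-length papers": {"precision": 0.5125, "recall": 0.4360},
}

# F1 is derived here for illustration; it is not a number reported by the authors.
f1_by_source = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, f1 in f1_by_source.items():
    print(f"{name}: F1 = {f1:.3f}")
```

Under this measure the full-length papers score noticeably higher than the abstracts, because recall roughly doubles while precision drops only slightly.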

Publication Types:
Evaluation Studies

PMID: 15231534 [PubMed - indexed for MEDLINE]

Last update 27-Feb-2005.


Related resources

Genepredictions database for protein function prediction
Data mining in bioinformatics using Weka
Dragon TF Association Miner
FigSearch: a figure legend indexing and classification system
BITOLA - Biomedical Discovery Support System
MILANO - Microarray Literature-based Annotation
iProLINK: integrated Protein Literature, INformation and Knowledge
Data Mining and Text Mining for Bioinformatics European Workshop 2003