CiteSeerX

Revisiting Again Document Length Hypotheses: TREC-2004 Genomics Track Experiments at Patolis (0)

by Sumio Fujita

Results 1 - 9 of 9

TREC 2004 genomics track overview

by William R. Hersh, Ravi Teja Bhupatiraju, Laura Ross, Phoebe Johnson, Aaron M. Cohen, Dale F. Kraemer - In Proc. of the 13th Text REtrieval Conference, 2004
Cited by 32 (2 self)
... tasks. The first task was a standard ad hoc retrieval task using topics obtained from real biomedical research scientists and documents from a large subset of the MEDLINE bibliographic database. The second task focused on categorization of full-text documents, simulating the task of curators of the Mouse Genome Informatics (MGI) system and consisting of three subtasks. One subtask focused on the triage of articles likely to have experimental evidence warranting the assignment of GO terms, while the other two subtasks focused on the assignment of the three top-level GO categories. The track had 33 participating groups. 1. Motivations and Background: The goal of the TREC Genomics Track is to create ...

An application of text categorization methods to gene ontology annotation

by Kazuhiro Seki, Javed Mostafa - Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2005
Cited by 8 (1 self)
This paper describes an application of IR and text categorization methods to a highly practical problem in biomedicine, specifically, Gene Ontology (GO) annotation. GO annotation is a major activity in most model organism database projects and annotates gene functions using a controlled vocabulary. As a first step toward automatic GO annotation, we aim to assign GO domain codes given a specific gene and an article in which the gene appears, which is one of the task challenges at the TREC 2004 Genomics Track. We approached the task with careful consideration of the specialized terminology and paid special attention to dealing with various forms of gene synonyms, so as to exhaustively locate the occurrences of the target gene. We extracted the words around the gene occurrences and used them to represent the gene for GO domain code annotation. As a classifier, we adopted a variant of k-Nearest Neighbor (kNN) with supervised term weighting schemes to improve the performance, placing our method among the top-performing systems in the TREC official evaluation. Moreover, we demonstrate that the proposed framework can be successfully applied to another task of the Genomics Track, with results comparable to those of the best-performing system. Categories and Subject Descriptors H.2.4 [Database management]: Systems—Textual databases; H.3.1 [Information storage and retrieval]: Content Analysis and Indexing—Abstracting
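The kNN classifier with supervised term weighting described in this abstract can be illustrated with a small sketch. It is not the authors' system: chi-square is assumed as the supervised term weight (one common choice among several), the GO domain labels BP/MF/CC and the training contexts are hypothetical, and class scores are assumed to be the summed cosine similarities of the k nearest neighbours.

```python
import math
from collections import Counter

def chi_square(term, doc_sets, labels, cls):
    """Chi-square association between `term` and class `cls` (2x2 contingency table)."""
    n = len(doc_sets)
    a = sum(1 for doc, y in zip(doc_sets, labels) if term in doc and y == cls)      # present, in class
    b = sum(1 for doc, y in zip(doc_sets, labels) if term in doc and y != cls)      # present, other class
    c = sum(1 for doc, y in zip(doc_sets, labels) if term not in doc and y == cls)  # absent, in class
    d = n - a - b - c                                                               # absent, other class
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def term_weights(docs, labels):
    """Supervised weight per term (assumption): its maximum chi-square over all classes."""
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    return {t: max(chi_square(t, doc_sets, labels, c) for c in set(labels)) for t in vocab}

def vectorize(tokens, weights):
    """Term frequency scaled by the supervised weight; unseen terms get weight 0."""
    return {t: n * weights.get(t, 0.0) for t, n in Counter(tokens).items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(tokens, docs, labels, k=3):
    """Label `tokens` by summing cosine similarities of its k nearest training docs per class."""
    weights = term_weights(docs, labels)
    q = vectorize(tokens, weights)
    nearest = sorted(
        ((cosine(q, vectorize(doc, weights)), y) for doc, y in zip(docs, labels)),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, y in nearest:
        votes[y] += sim
    return votes.most_common(1)[0][0]

# Hypothetical training data: words around a gene mention, labeled with a GO domain.
TRAIN_DOCS = [
    ["kinase", "activity", "phosphorylates"],
    ["binds", "dna", "activity"],
    ["localized", "nucleus", "membrane"],
    ["membrane", "localized", "cytoplasm"],
    ["regulates", "apoptosis", "pathway"],
    ["involved", "apoptosis", "signaling"],
]
TRAIN_LABELS = ["MF", "MF", "CC", "CC", "BP", "BP"]

print(knn_classify(["localized", "membrane", "protein"], TRAIN_DOCS, TRAIN_LABELS))  # CC
```

Weighting terms by how strongly they associate with a class lets discriminative context words such as "localized" dominate the similarity, while words unseen in training contribute nothing.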

Trec genomics special issue overview

by William Hersh, Ellen Voorhees - Inf. Retr. (Springer)
Cited by 8 (0 self)
Abstract not found

An Empirical Study of Tokenization Strategies for Biomedical Information Retrieval

by Jing Jiang, Chengxiang Zhai
Cited by 4 (0 self)
Due to the great variation of biological names in biomedical text, appropriate tokenization is an important preprocessing step for biomedical information retrieval. Despite its importance, there has been little study on the evaluation of various tokenization strategies for biomedical text. In this work, we conducted a careful, systematic evaluation of a set of tokenization heuristics on all the available TREC biomedical text collections for ad hoc document retrieval, using two representative retrieval methods and a pseudo relevance feedback method. We also studied the effect of stemming and stop word removal on the retrieval performance. As expected, our experimental results show that tokenization can significantly affect the retrieval accuracy; appropriate tokenization can improve the performance by up to 96%, measured by mean average precision (MAP). In particular, we show that different query types require different tokenization heuristics, stemming is effective only for certain queries, and stop word removal in general does not improve the retrieval performance on biomedical text.
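To make "tokenization heuristics" concrete, the sketch below shows how a few simple rules let spelling variants of a gene name such as "MIP-1-alpha", "MIP1alpha" and "MIP 1 alpha" map to the same tokens. The three heuristics and the `GREEK` table are illustrative assumptions, not the exact set evaluated in the paper.

```python
import re

# Hypothetical normalization table for spelled-out Greek letters.
GREEK = {"alpha": "a", "beta": "b", "gamma": "g"}

def tokenize(text, break_alnum=True, normalize_greek=False):
    """Illustrative biomedical tokenizer (heuristics are assumptions, not the paper's)."""
    out = []
    # Heuristic 1: lowercase, then treat any non-alphanumeric character
    # (hyphen, slash, parenthesis, space, ...) as a token break.
    for tok in re.split(r"[^a-z0-9]+", text.lower()):
        if not tok:
            continue  # skip empty strings produced by leading/trailing punctuation
        # Heuristic 2: split at letter/digit boundaries, e.g. "mip1" -> "mip", "1".
        parts = re.findall(r"[a-z]+|[0-9]+", tok) if break_alnum else [tok]
        for part in parts:
            # Heuristic 3 (optional): map spelled-out Greek letters to one letter.
            out.append(GREEK.get(part, part) if normalize_greek else part)
    return out

for variant in ["MIP-1-alpha", "MIP1alpha", "MIP 1 alpha"]:
    print(variant, "->", tokenize(variant))  # token list is ['mip', '1', 'alpha'] each time
```

Because query types differ, such heuristics are best exposed as switches (here `break_alnum` and `normalize_greek`) so that each can be evaluated separately, in the spirit of the comparison the abstract describes.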

Identifying relevant full-text articles for GO annotation without MeSH terms

by Chih Lee, Wen-juan Hou, Hsin-hsi Chen - Proceedings of the Thirteenth Text Retrieval Conference (TREC 2004), 2004
Cited by 3 (1 self)
Gene Ontology (GO) is a controlled vocabulary. Given a gene product, GO enables scientists to clearly and unambiguously describe the specific molecular functions of the gene product, the specific biological processes in which it is involved, and the specific cellular components to which it is localized. In this paper, we present our approach to identifying which papers have experimental evidence warranting annotation with GO codes. The training data set contains 375 relevant full-text articles and 5,462 irrelevant ones, and the test data set contains 420 positive full-text articles and 5,623 negative ones. We regarded this problem as a binary classification problem and employed Support Vector Machines (SVMs) to distinguish positive articles from negative ones. The title, abstract, figure/table captions, and three standard sections (Results, Discussion, and Conclusion) were the targets of feature extraction. Without incorporating MeSH (Medical Subject Headings) terms as part of the features, our system achieved 0.381 in the Normalized Utility measure.
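The abstract reports a Normalized Utility score without defining the measure. The sketch below assumes the TREC 2004 triage definition U_norm = U_raw / U_max, with U_raw = u_r·TP + u_nr·FP, U_max = u_r × (number of positive documents), and weights u_r = 20 and u_nr = -1; these weights are an assumption to be checked against the track overview. The TP/FP counts in the example are hypothetical and are not the paper's run.

```python
def normalized_utility(tp, fp, total_positives, u_r=20.0, u_nr=-1.0):
    """Normalized utility for a triage run.

    Assumed definition (TREC 2004 triage): U_raw = u_r*TP + u_nr*FP,
    U_max = u_r * total_positives, U_norm = U_raw / U_max.
    """
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    u_raw = u_r * tp + u_nr * fp   # reward true positives, penalize false positives
    u_max = u_r * total_positives  # utility of a run returning exactly the positives
    return u_raw / u_max

# Hypothetical run on a test set with 420 positive articles.
print(normalized_utility(tp=210, fp=420, total_positives=420))  # 0.45
```

Under these assumed weights and the 420 positive test articles, U_max = 20 × 420 = 8400, so each true positive adds 20/8400 and each false positive subtracts 1/8400; the measure therefore tolerates many false positives in exchange for recall.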

Enhancing access to the Bibliome: the TREC 2004 Genomics Track

by William R Hersh, Ravi Teja Bhupatiraju, Laura Ross, Phoebe Roberts, Aaron M Cohen, Dale F Kraemer, 2006
Cited by 3 (1 self)
Background: The goal of the TREC Genomics Track is to improve information retrieval in the area of genomics by creating test collections that will allow researchers to improve and better understand failures of their systems. The 2004 track included an ad hoc retrieval task, simulating use of a search engine to obtain documents about biomedical topics. This paper describes the Genomics Track of the Text Retrieval Conference (TREC) 2004, a forum for evaluation of IR research systems, where retrieval in the genomics domain has recently begun to be assessed. Results: A total of 27 research groups submitted 47 different runs. The most effective runs, as measured by the primary evaluation measure of mean average precision (MAP), used a combination of domain-specific and general techniques. The best MAP obtained by any run was 0.4075. Techniques that expanded queries with gene name lists as well as words from related articles had the best efficacy. However, many runs performed more poorly than a simple baseline run, indicating that careful selection of system features is essential. Conclusion: Various approaches to ad hoc retrieval provide a diversity of efficacy. The TREC Genomics Track and its test collection resources provide tools that allow improvement in ...
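Mean average precision (MAP), the primary measure behind the 0.4075 figure above, averages a per-topic average precision over all topics. A minimal sketch, assuming binary relevance judgments and counting relevant documents that are never retrieved as contributing zero; the document ids and judgments are hypothetical.

```python
def average_precision(ranking, relevant):
    """Average of precision@rank over the ranks of relevant documents.

    Relevant documents that are never retrieved contribute 0, so the sum is
    divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def mean_average_precision(runs):
    """MAP over topics; `runs` is a list of (ranking, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical two-topic example.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6", "d7"}),              # AP = (1/2) / 2; d7 never retrieved
]
print(round(mean_average_precision(runs), 4))  # 0.5417
```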

... text biomedical documents

by William R Hersh, Aaron M Cohen - BioMed Central, 2005
This is an Open Access article distributed under the terms of the Creative Commons Attribution License

Report on the TREC 2006 Experiment: Genomics Track

by P. Ruch, A. Jimeno Yepes, F. Ehrler, J. Gobeill, I. Tbahriti
In previous TREC Genomics competitions, ad hoc experiments were based on MEDLINE corpora (about 4.5 million records in 2005). This year, the collection was replaced by a collection of about 160,000 full-text articles.

unknown title

by Dr. Ir. D. Hiemstra, B. G. Van Borssum Waalkes
Over the last 20 years, genomics research has gained a lot of interest. Every year, millions of articles are published and stored in databases. Researchers around the world want to be able to search for information about, e.g., genes, diseases and enzymes. At present, there are no search methods available that give researchers a viable and efficient way to search for information about genomics data. This report discusses how information can be found using a desktop PC and a widely available database system. It will describe how the documents are found, as well as the precision and recall of a query. Using several well-known information retrieval methods, such as Boolean retrieval, TF*IDF and stemming, the effects of these search methods will be tested and compared to each other. The effects these methods have on the overall results of the system will be evaluated, and the system will be compared to other systems that use the same documents and questions. After all the results have been evaluated, a few hints will be given for ways to improve the system.
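The contrast between Boolean retrieval and TF*IDF ranking mentioned in this report can be shown with a toy example. The sketch assumes pre-tokenized documents and the plain tf × log(N/df) weighting; it is not the report's implementation, the documents are hypothetical, and stemming is left out (it would be applied to the tokens before either function runs).

```python
import math
from collections import Counter

def boolean_and(query_terms, docs):
    """Boolean retrieval: ids of documents containing every query term, unranked."""
    return [i for i, doc in enumerate(docs) if all(t in doc for t in query_terms)]

def tfidf_rank(query_terms, docs):
    """Rank documents by the sum over query terms of tf * log(N / df).

    Documents with score 0 (no matching term, or only terms that occur in
    every document) are dropped; ties are broken by document id.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(tf[t] * math.log(n / df[t]) for t in query_terms if df[t])
        if score > 0:
            scored.append((-score, i))
    return [i for _, i in sorted(scored)]

# Hypothetical pre-tokenized documents.
DOCS = [
    ["brca1", "mutation", "breast", "cancer"],
    ["brca1", "gene", "expression"],
    ["p53", "mutation", "apoptosis"],
]
print(boolean_and(["brca1", "mutation"], DOCS))  # [0]
print(tfidf_rank(["brca1", "mutation"], DOCS))   # [0, 1, 2]
```

Boolean AND returns only the document matching both terms, while TF*IDF also ranks the partial matches below it, which is what makes precision and recall trade off differently between the two methods.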

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University