Results 1 - 5 of 5
Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches
, 1993
Cited by 42 (0 self)
As large on-line corpora become more prevalent, a number of attempts have been made to automatically extract thesaurus-like relations directly from text using knowledge-poor methods. In the absence of any specific application, comparing the results of these attempts is difficult. Here we propose an evaluation method using gold standards, i.e., pre-existing hand-compiled resources, as a means of comparing extraction techniques. Using this evaluation method, we compare two semantic extraction techniques which produce similar word lists, one using syntactic context of words, and the other using windows of heuristically tagged words. The two techniques are very similar except that in one case selective natural language processing, a partial syntactic analysis, is performed. On a 4 megabyte corpus, syntactic contexts produce significantly better results against the gold standards for the most characteristic words in the corpus, while windows produce better results for rare words.
Complementing WordNet with Roget's and Corpus-based Thesauri for Information Retrieval
- EACL'99
, 1999
Cited by 4 (1 self)
This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Words and
Re-thinking bargaining theory
- Jour. of Natural Language Processing
, 1997
Cited by 1 (0 self)
This paper proposes the use of multiple thesaurus types for query expansion in information retrieval. A hand-crafted thesaurus, a corpus-based co-occurrence-based thesaurus, and a syntactic-relation-based thesaurus are combined and used as a tool for query expansion. A simple word sense disambiguation is performed to avoid misleading expansion terms. Experiments using the TREC-7 collection proved that this method could improve information retrieval performance significantly. Failure analysis was done on the cases in which the proposed method fails to improve retrieval effectiveness. We found that queries containing negative statements and multiple aspects might cause problems in the proposed method.
A Semantic Graph Model for Text Representation and Matching in Document Mining
, 2006
Cited by 1 (0 self)
The explosive growth in the number of documents produced daily necessitates the development of effective alternatives to explore, analyze, and discover knowledge from documents. Document mining research work has emerged to devise automated means to discover and analyze useful information from documents. This work has been mainly concerned with constructing text representation models, developing distance measures to estimate similarities between documents, and utilizing that in mining processes such as document clustering, document classification, information retrieval, information filtering, and information extraction. Conventional text representation methodologies consider documents as bags of words and ignore the meanings and ideas their authors want to convey. It is this
The Exploration and Analysis of Using Multiple . . .
- Jour. of Natural Language Processing
, 2000
This paper proposes the use of multiple thesaurus types for query expansion in information retrieval. A hand-crafted thesaurus, a corpus-based co-occurrence-based thesaurus, and a syntactic-relation-based thesaurus are combined and used as a tool for query expansion. A simple word sense disambiguation is performed to avoid misleading expansion terms. Experiments using the TREC-7 collection proved that this method could improve information retrieval performance significantly. Failure analysis was done on the cases in which the proposed method fails to improve retrieval effectiveness. We found that queries containing negative statements and multiple aspects might cause problems in the proposed method.

