GrawlTCQ: Terminology and Corpora Building by Ranking Simultaneously Terms, Queries and Documents using Graph Random Walks
Abstract
In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a random walk with restart algorithm to propagate relevance. We have evaluated GrawlTCQ on an AFP English corpus of 57,441 news articles across 10 categories. For corpus building, GrawlTCQ outperforms the Boot-CaT tool, which is widely used in the domain: for 1,000 retrieved documents, it improves mean precision by 25%. GrawlTCQ has also been shown to be faster and more robust than Boot-CaT over iterations.
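To make "random walk with restart" concrete, here is a minimal power-iteration sketch over a small graph linking queries, terms and documents. It is not GrawlTCQ's implementation: the unweighted undirected edges, the uniform restart over seed nodes, the restart probability of 0.15, and the `q:`/`t:`/`d:` node prefixes are all illustrative assumptions. The walker spreads relevance from the seeds along links and, with probability `restart`, jumps back to them, so nodes close to the seeds score highest.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.15, iters=200, tol=1e-12):
    """Power iteration for random walk with restart (RWR).

    edges:   iterable of (u, v) pairs, treated as undirected, unweighted links
             (an assumption; a real system would likely weight them)
    seeds:   set of nodes the walker restarts from, with uniform probability
    restart: probability of jumping back to the seeds at each step (assumed value)
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)
    for _ in range(iters):
        new = {n: restart * restart_vec[n] for n in nodes}
        # Each node splits its remaining mass evenly among its neighbors.
        for u in nodes:
            share = (1.0 - restart) * scores[u] / len(neighbors[u])
            for v in neighbors[u]:
                new[v] += share
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy graph: one query, three terms, three documents.
edges = [
    ("q:sql injection", "t:sql"),
    ("q:sql injection", "t:injection"),
    ("t:sql", "d:1"),
    ("t:injection", "d:1"),
    ("t:sql", "d:2"),
    ("t:football", "d:3"),
]
scores = random_walk_with_restart(edges, seeds={"t:sql"})
docs = sorted((n for n in scores if n.startswith("d:")), key=scores.get, reverse=True)
print(docs)
```

Because queries, terms and documents share one graph, a single walk ranks all three node types at once; filtering by prefix, as in the last lines, extracts one ranked list. Here `d:1` outranks `d:2` because it is reachable from the seed through two terms, while `d:3` sits in a disconnected component and scores zero.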
Reviewing and Evaluating Automatic Term Recognition Techniques
Abstract
Automatic Term Recognition (ATR) is defined as the task of identifying domain-specific terms in technical corpora. Termhood-based approaches measure the degree to which a candidate term refers to a domain-specific concept. Unithood-based approaches measure the attachment strength of a candidate term's constituents. These methods have been evaluated using different, often incompatible evaluation schemes and datasets. This paper provides an overview and a thorough evaluation of state-of-the-art ATR methods under a common evaluation framework, i.e., common corpora and a common evaluation method. Our contributions are twofold: (1) we compare a number of different ATR methods, showing that termhood-based methods generally achieve superior performance; (2) we show that the number of independent occurrences of a candidate term is the most effective source for estimating term nestedness, improving ATR performance.
Key words: automatic term recognition, ATR, term extraction
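As an illustration of how a termhood score can account for nestedness, the sketch below implements the classic C-value measure of Frantzi et al. This is a swapped-in, well-known example, not the evaluation code or the independent-occurrence estimator from this paper, and the candidate terms and frequencies are made up. For a candidate term $a$ with $|a|$ words and frequency $f(a)$, C-value is $\log_2|a| \cdot f(a)$ if $a$ is not nested, and otherwise $\log_2|a| \cdot \bigl(f(a) - \tfrac{1}{|T_a|}\sum_{b \in T_a} f(b)\bigr)$, where $T_a$ is the set of longer candidates containing $a$.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freqs):
    """C-value termhood scores for multi-word candidate terms.

    freqs: dict mapping a candidate term (tuple of words) to its corpus frequency.
    Occurrences inside longer candidates are discounted by subtracting the
    mean frequency of the candidates that contain the term.
    """
    scores = {}
    for term, f in freqs.items():
        nesting = [freqs[other] for other in freqs if contains(other, term)]
        weight = math.log2(len(term))
        if nesting:
            scores[term] = weight * (f - sum(nesting) / len(nesting))
        else:
            scores[term] = weight * f
    return scores


# Hypothetical candidate frequencies.
freqs = {
    ("soft", "contact", "lens"): 5,
    ("hard", "contact", "lens"): 3,
    ("contact", "lens"): 12,
    ("real", "time"): 4,
}
scores = c_value(freqs)
print(scores[("contact", "lens")])  # 1.0 * (12 - (5 + 3) / 2) = 8.0
```

"contact lens" appears 12 times, but 8 of those occurrences fall inside the two longer candidates, so C-value discounts them by their mean (4) before weighting. Note that $\log_2 1 = 0$, so the original C-value assigns zero to single-word candidates.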

