Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
Cited by 65 (0 self)
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
Methods of Automatic Term Recognition - A Review
, 1996
Cited by 24 (1 self)
Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e. studies in automatic recognition of significant elements for indexing mainly carried out in information retrieval circles, and current research in automatic term recognition in the field of computational linguistics. Keywords Automatic term recognition, au...
A Statistical Translation Tool With Aligned Texts
, 1996
"Dilemma" is a simple tool for manual translation of natural language. It is designed to help translators achieve higher quality in their work, with less effort. The data consists of previously translated texts that concern the same domain as the text to be translated. The general idea is to extract terminological and syntactic information from the previously translated texts to maintain consistency, which is often a quality criterion, in the new translations. To calculate likely translation candidates, several language-independent aspects are measured. All aspects are based on theories and assumptions about general texts in European languages. This paper describes the current Dilemma system and its capacities, concentrating on the new functionality. It also includes a discussion about possible future improvements. Introduced in this, the second version of Dilemma, are among other things a more general alignment procedure, stop list handling and checking of spelling similarity. Ev...
An Algorithm for Predicting the Relationship between Lemmas and Corpus Size
, 2000
Much research on natural language processing (NLP), computational linguistics and lexicography has relied on linguistic corpora. In recent years, many organizations around the world have been constructing their own large corpora to achieve corpus representativeness and/or linguistic comprehensiveness. However, there is no reliable guideline as to how large machine-readable corpus resources should be compiled to develop practical NLP software and/or complete dictionaries for human and computational use. In order to shed some new light on this issue, we shall reveal the flaws of several previous studies aiming to predict corpus size, especially those using pure regression or curve-fitting methods. To overcome these flaws, we shall contrive a new mathematical tool: a piecewise curve-fitting algorithm, and next, suggest how to determine the tolerance error of the algorithm for good prediction, using a specific corpus. Finally, we shall illustrate experimentally that the algorithm presented is valid, accurate and very reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, compiling methodology, corpus representativeness and linguistic comprehensiveness.

