wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web
- Computational Linguistics, 2003
A controlled-corpus experiment in authorship identification by cross-entropy
- Literary and Linguistic Computing, 2003
Cited by 15 (3 self)
Abstract. This paper describes an experiment in authorship identification, and more generally document classification, on a preexisting Dutch corpus of university writings. By measuring linguistic distances with a cross-entropy technique, which is sensitive not only to the distributions of language features but also to their relative intersequencing, classification judgments can be made with great sensitivity, significance, confidence, and accuracy. In particular, despite the designed difficulty of the Dutch corpus used, the technique was still able to reliably detect not only authorship, but also subtle features of register, topic, and even the educational attainments of the author. We present evidence suggesting that this technique outperforms better-known techniques such as function word principal components analysis or linear discriminant analysis, and we suggest ways in which performance can be improved.
Learning to Translate: A Psycholinguistic Approach to the Induction of Grammars and Transfer Functions
- 1995
Cited by 2 (1 self)
... identified many constraints on the form and processing of human languages. By incorporating these constraints into a language learning system, it is possible to build a system that learns to translate (infers functions and grammars for machine translation) from an aligned bilingual corpus of sentences using understandable, symbolic linguistic principles and representations. This work focuses on one particular constraint, the Marker Hypothesis, which is shown to be powerful, understandable, and computationally accessible. This hypothesis has been incorporated into a family of systems that infer such transfer functions using standard multivariate optimization techniques. These systems have been tested on a variety of language pairs and corpora, demonstrating the language and corpus independence of this approach. Furthermore, the design principles are in theory independent of any particular inference technique or grammatical representation and reflect only the constraints of the Marke ...

