Lattice Based Language Models, 1997
Cited by 10 (1 self)
This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experimental results on the SWITCHBOARD corpus show a 6.5% perplexity reduction over a word trigram model. Project sponsored by the National Security Agency under Grant No. MDA904-97-10006. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation hereon. Current address: Dépt. Math., Université Jean Monnet, 23, rue P. Michelon, 42023 S...
Probabilistic Finite-State Machines - Part I
Cited by 9 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked, including computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition, and machine translation. In Part I of this paper we survey these generative objects and study their definitions and properties. In Part II, we study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
Lexical Space: Learning and using continuous linguistic representations, 1996
Cited by 1 (1 self)
A large part of linguistic knowledge is the knowledge of a representational system for syntactic categories. Traditionally, such a system has been viewed as consisting of a fixed number of discrete classes with little or no internal structure. In this thesis, a new method for the syntactic categorization of words is proposed that is based on the representation of syntactic similarity in a continuous space: the Lexical Space. The continuous nature of the representation offers a new possibility for representing fine-grained lexical information for syntactic processing in a similarity-based framework. Two main questions are asked in this thesis. First, is it possible to acquire a sensible organization of words in the Lexical Space on the basis of simple distributional information? Second, how can the resulting word representations be used in a natural language parser? The acquisition of a syntactic category system is often seen as a bootstrapping from semantic into syntactic cat...

