Results 1  10
of
429
Generation and Synchronous TreeAdjoining Grammars
, 1990
"... Treeadjoining grammars (TAG) have been proposed as a formalism for generation based on the intuition that the extended domain of syntactic locality that TAGs provide should aid in localizing semantic dependencies as well, in turn serving as an aid to generation from semantic representations. We dem ..."
Abstract

Cited by 741 (42 self)
 Add to MetaCart
(Show Context)
Treeadjoining grammars (TAG) have been proposed as a formalism for generation based on the intuition that the extended domain of syntactic locality that TAGs provide should aid in localizing semantic dependencies as well, in turn serving as an aid to generation from semantic representations. We demonstrate that this intuition can be made concrete by using the formalism of synchronous treeadjoining grammars. The use of synchronous TAGs for generation provides solutions to several problems with previous approaches to TAG generation. Furthermore, the semantic monotonicity requirement previously advocated for generation gram mars as a computational aid is seen to be an inherent property of synchronous TAGs.
A maximum likelihood approach to continuous speech recognition
 IEEE Trans. Pattern Anal. Machine Intell
, 1983
"... AbstractSpeech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the ..."
Abstract

Cited by 448 (9 self)
 Add to MetaCart
AbstractSpeech recognition is formulated as a problem of maximum likelihood decoding. This formulation requires statistical models of the speech production process. In this paper, we describe a number of statistical models for use in speech recognition. We give special attention to determining the parameters for such models from sparse data. We also describe two decoding methods, one appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks. To illustrate the usefulness of the methods described, we review a number of decoding results that have been obtained with them. Index TermsMarkov models, maximum likelihood, parameter estimation, speech recognition, statistical models. I.
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
 Computer, Speech and Language
, 1996
"... An adaptive statistical languagemodel is described, which successfullyintegrates long distancelinguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's h ..."
Abstract

Cited by 280 (12 self)
 Add to MetaCart
An adaptive statistical languagemodel is described, which successfullyintegrates long distancelinguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
How to Timestamp a Digital Document
 Journal of Cryptology
, 1991
"... The prospect of a world in which all text, audio, picture, and video documents are in digital form on easily modifiable media raises the issue of how to certify when a document was created or last changed. The problem is to timestamp the data, not the medium. We propose computationally practical ..."
Abstract

Cited by 243 (5 self)
 Add to MetaCart
(Show Context)
The prospect of a world in which all text, audio, picture, and video documents are in digital form on easily modifiable media raises the issue of how to certify when a document was created or last changed. The problem is to timestamp the data, not the medium. We propose computationally practical procedures for digital timestamping of such documents so that it is infeasible for a user either to backdate or to forwarddate his document, even with the collusion of a timestamping service. Our procedures maintain complete privacy of the documents themselves, and require no recordkeeping by the timestamping service. Appeared, with minor editorial changes, in Journal of Cryptology, Vol. 3, No. 2, pp. 99111, 1991. 0 Time's glory is to calm contending kings, To unmask falsehood, and bring truth to light, To stamp the seal of time in aged things, To wake the morn, and sentinel the night, To wrong the wronger till he render right. The Rape of Lucrece, l. 941 1 Introduction ...
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
 Machine Learning
, 1996
"... . We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions gene ..."
Abstract

Cited by 208 (17 self)
 Add to MetaCart
(Show Context)
. We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KLdivergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs, can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in humanmachine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
Natural language processing (almost) from scratch. arXiv:1103.0398v1
, 2011
"... We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including partofspeech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid taskspecific eng ..."
Abstract

Cited by 193 (15 self)
 Add to MetaCart
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including partofspeech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid taskspecific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting manmade input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
Two decades of statistical language modeling: Where do we go from here
 Proceedings of the IEEE
, 2000
"... Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here ..."
Abstract

Cited by 192 (1 self)
 Add to MetaCart
(Show Context)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data. 1. OUTLINE Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents. Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rulebased approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4]. SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.
Grammatical Category Disambiguation by Statistical Optimization
 COMPUTATIONAL LINGUISTICS
, 1988
"... [This paper focuses on the]... task of [partofspeech] disambiguation, and particularly on a new algorithm called VOLSUNGA, which avoids syntacticlevel analysis, yields about 96% accuracy, and runs in far less time and space than previous attempts. The most recent previous algorithm runs in NP (No ..."
Abstract

Cited by 179 (0 self)
 Add to MetaCart
[This paper focuses on the]... task of [partofspeech] disambiguation, and particularly on a new algorithm called VOLSUNGA, which avoids syntacticlevel analysis, yields about 96% accuracy, and runs in far less time and space than previous attempts. The most recent previous algorithm runs in NP (NonPolynomial) time, while VOLSUNGA runs in linear time. This is provably optimal; no improvements in the order of its execution time and space are possible. VOLSUNGA is also robust in cases of ungrammaticality. Improvements to this accuracy may be made, perhaps the most potentially significant being to include some higherlevel information. With such additions, the accuracy of statisticallybased algorithms will approach 100%; and the few remaining cases may be largely those with which humans also find difficulty. In subsequent sections we examine several disambiguation algorithms. Their techniques, accuracies, and efficiencies are analyzed. After presenting the research carried out to date, a discussion of VOLSUNGA's application to the Brown Corpus...
Universal prediction
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1998
"... This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both the probabili ..."
Abstract

Cited by 161 (14 self)
 Add to MetaCart
This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings.