Bootstrapping Parsers via Syntactic Projection across Parallel Texts
- Natural Language Engineering, 2005
Cited by 61 (2 self)
Broad-coverage, high-quality parsers are available for only a handful of languages. A prerequisite for developing broad-coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor-intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
Discriminative language modeling with conditional random fields and the perceptron algorithm
- In Proc. ACL, 2004
Cited by 41 (4 self)
This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
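As a rough illustration of the perceptron side of this abstract, the sketch below trains n-gram feature weights to rerank recognizer hypotheses. It is a simplification under stated assumptions, not the paper's implementation: it works on small n-best lists rather than word lattices and weighted automata, it omits weight averaging, and every name in it (`ngram_features`, `train_perceptron`, `rerank`, the toy data) is hypothetical. Features only receive a weight when they take part in an update, which is one way to see why the learned feature set stays small.

```python
from collections import Counter

def ngram_features(words, n=2):
    """Count all 1..n-gram features of a hypothesis (a list of words)."""
    feats = Counter()
    padded = ["<s>"] + list(words) + ["</s>"]
    for order in range(1, n + 1):
        for i in range(len(padded) - order + 1):
            feats[tuple(padded[i:i + order])] += 1
    return feats

def score(weights, feats, baseline_score):
    """Baseline recognizer score plus the learned n-gram corrections."""
    return baseline_score + sum(weights.get(f, 0.0) * c for f, c in feats.items())

def rerank(weights, candidates):
    """Return the index of the best candidate; candidates are (words, baseline_score)."""
    return max(
        range(len(candidates)),
        key=lambda i: score(weights, ngram_features(candidates[i][0]), candidates[i][1]),
    )

def train_perceptron(nbest_lists, epochs=2):
    """nbest_lists: list of (candidates, oracle_index) pairs.

    On each mistake, add the oracle's features and subtract the
    features of the wrongly preferred candidate.
    """
    weights = {}
    for _ in range(epochs):
        for candidates, oracle in nbest_lists:
            best = rerank(weights, candidates)
            if best != oracle:
                for f, c in ngram_features(candidates[oracle][0]).items():
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(candidates[best][0]).items():
                    weights[f] = weights.get(f, 0.0) - c
    return weights

# Toy data (invented): the baseline prefers the wrong hypothesis.
toy = [([("the cat sat".split(), 0.0), ("the cat sad".split(), 0.5)], 0)]
learned = train_perceptron(toy)
```

After training on the toy list, `rerank(learned, toy[0][0])` selects the oracle hypothesis even though its baseline score is lower.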
Whole-Sentence Exponential Language Models: A Vehicle for Linguistic-Statistical Integration
- Computers, Speech and Language, 2001
Cited by 12 (0 self)
We introduce an exponential language model which models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a "bag of features", where features are arbitrary computable properties of the sentence. The new model is computationally more efficient, and more naturally suited to modeling global sentential phenomena, than the conditional exponential (e.g. Maximum Entropy) models proposed to date. Using the model is straightforward. Training the model requires sampling from an exponential distribution. We describe the challenge of applying Monte Carlo Markov Chain (MCMC) and other sampling techniques to natural language, and discuss smoothing and step-size selection. We then present a novel procedure for feature selection, which exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analyzing competitive models in the Switchboard domain, incorporating lexical and syntact...
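For orientation, a whole-sentence exponential model of the kind this abstract describes is usually written in the form below. The notation is an assumption, since the listing itself gives no formula: $p_0$ is a baseline sentence model, $f_i$ are the sentence-level features, $\lambda_i$ their weights, and $Z$ a single global normalizer.

```latex
P(s) \;=\; \frac{1}{Z}\, p_0(s)\, \exp\!\Big(\sum_i \lambda_i f_i(s)\Big),
\qquad
Z \;=\; \sum_{s'} p_0(s')\, \exp\!\Big(\sum_i \lambda_i f_i(s')\Big)
```

Because $Z$ sums over all sentences, it cannot be computed exactly. It is the same for every sentence, so it is not needed to compare or rank sentences, which is consistent with the abstract's remark that using the model is straightforward while training requires sampling.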
Statistical Modelling in Continuous Speech Recognition (CSR)
- In Conference on Uncertainty in Artificial Intelligence, 2001
Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
Semantic N-Gram Language Modeling With The Latent Maximum Entropy Principle
- In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP03), 2003
Cited by 5 (0 self)
In this paper, we describe a unified probabilistic framework for statistical language modeling -- the latent maximum entropy principle -- which can effectively incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Unlike previous work on maximum entropy methods for language modeling, which only allow explicit features to be modeled, our framework also allows relationships over hidden features to be captured, resulting in a more expressive language model. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then present experimental results for our approach on the Wall Street Journal corpus.
Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields
- In Proceedings of ICML 2005, 2005
Cited by 4 (2 self)
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified EM method, the generalized inside-outside algorithm, which extends the inside-outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in cases where there are hidden variables. We also derive an analogous algorithm to calculate the probability of an initial subsequence of a sentence generated by the composite language model. Our experimental results on the Wall Street Journal corpus show that we obtain significant reductions in perplexity compared to the state-of-the-art baseline trigram model with Good-Turing and Kneser-Ney smoothing.
Semantic structured language models
- In: ICSLP, 2002
Cited by 3 (1 self)
In this study, we propose two novel semantic language modeling techniques for spoken dialog systems. These methods are called semantic concept-based language modeling and semantic structured language modeling. In concept-based language modeling, we propose to use long-span semantic units to model meaning sequences in spoken utterances. In the latter technique, we use statistical semantic parsers to extract information from a sentence. This information is then utilized in a maximum-entropy-based language model. The language models are trained and evaluated in the air travel reservation domain. We obtain improvements over a sophisticated class-based N-gram language model in terms of both recognition accuracy and perplexity. Interpolation of the proposed techniques with the class-based N-gram LM provides additional improvement.
Maximum Entropy Language Modeling with Non-Local Dependencies -- Dissertation Proposal
- 2000
A two-level schema for detecting recognition errors
- In Proc. ICSLP, 2004
Cited by 2 (1 self)
This paper proposes a two-level schema for the automatic detection of possible errors in speech recognition hypotheses. Given the recognition hypothesis of an utterance, the first level in our schema applies an utterance classifier (UC) to decide if the hypothesis is error-free or erroneous. In the latter case, the utterance is passed on to the second level in our schema for further processing. A word classifier (WC) is applied to each of the word hypotheses in the utterance to decide whether or not it is a misrecognition. Hence the two-level schema can locate error-containing regions in the recognition hypotheses. These are the target regions to which we can apply more sophisticated and expensive language models for error correction as a next step. We have developed UC and WC based on Support Vector Machines (SVM). Experiments on Mandarin Chinese speech recognition using the Speech-Lab-In-A-Box corpora showed that the UC has a detection error rate of 16.5% for misrecognized utterances; the WC has a detection error rate of 19.8% for erroneous word hypotheses; and the overall two-level schema can catch 44.5% of the erroneous word hypotheses.
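The control flow of the two-level schema described in this abstract can be sketched as follows. This is only an illustration of the gating logic: the paper's UC and WC are SVMs, whereas the two classifiers here are hypothetical confidence-threshold stand-ins, and the thresholds, function names, and example hypotheses are all invented.

```python
def detect_errors(hypothesis, utterance_is_erroneous, word_is_misrecognized):
    """Return indices of word hypotheses flagged as misrecognitions.

    Level 1: the utterance classifier (UC) decides whether the whole
    hypothesis is erroneous. Level 2: only if it is, the word
    classifier (WC) is applied to each word hypothesis.
    """
    if not utterance_is_erroneous(hypothesis):
        return []
    return [i for i in range(len(hypothesis)) if word_is_misrecognized(hypothesis, i)]

# Hypothetical stand-ins for the SVM classifiers. A hypothesis is a
# list of (word, confidence) pairs; the thresholds are invented.
def toy_uc(hypothesis):
    mean_conf = sum(conf for _, conf in hypothesis) / len(hypothesis)
    return mean_conf < 0.8

def toy_wc(hypothesis, i):
    return hypothesis[i][1] < 0.5

noisy = [("book", 0.9), ("a", 0.95), ("fright", 0.3)]
clean = [("book", 0.9), ("a", 0.95), ("flight", 0.9)]
gated = [("a", 0.99), ("b", 0.99), ("c", 0.99), ("d", 0.99), ("e", 0.45)]

flagged_noisy = detect_errors(noisy, toy_uc, toy_wc)
flagged_clean = detect_errors(clean, toy_uc, toy_wc)
flagged_gated = detect_errors(gated, toy_uc, toy_wc)
```

The `gated` example shows the cost of the first level: a low-confidence word goes unflagged when the utterance classifier judges the utterance error-free, so an overall catch rate lower than the word classifier's own rate is to be expected.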
Joint Morphological-Lexical Language Modeling for Processing Morphologically Rich Languages with Application to Dialectal Arabic
- 2007
Cited by 2 (0 self)
Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regard to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal Arabic show improvements over word- and morpheme-based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performance improve.

