Results 1 - 8 of 8
Inducing Features of Random Fields
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 465 (14 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classifica...
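The weight-estimation step described above can be illustrated with a small sketch. It assumes a tiny finite configuration space, binary-valued features, and a fixed feature set; the paper's greedy feature-induction loop is not shown, and the names `improved_iterative_scaling`, `model_probs`, and `kl_divergence` are hypothetical. Each iteration holds the current model fixed and, for every feature, solves a one-dimensional equation for that weight's update by Newton's method, which is the general shape of improved iterative scaling. Driving the model's feature expectations to the empirical ones is what minimizes the Kullback-Leibler divergence D(p_empirical || p_model) within the model family.

```python
import math

def model_probs(space, features, weights):
    """Gibbs distribution: p(x) proportional to exp(sum_i w_i * f_i(x))."""
    scores = [math.exp(sum(w * f(x) for w, f in zip(weights, features))) for x in space]
    z = sum(scores)
    return [s / z for s in scores]

def expectations(probs, space, features):
    """E_p[f_i] for every feature f_i."""
    return [sum(p * f(x) for p, x in zip(probs, space)) for f in features]

def kl_divergence(p, q):
    """D(p || q) in nats; terms with p(x) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def improved_iterative_scaling(space, features, empirical, iters=1000, newton_steps=25):
    weights = [0.0] * len(features)
    target = expectations(empirical, space, features)
    # f#(x): number of active features in configuration x
    fsharp = [sum(f(x) for f in features) for x in space]
    for _ in range(iters):
        probs = model_probs(space, features, weights)  # current model, held fixed
        for i, f in enumerate(features):
            # Solve sum_x p(x) f_i(x) exp(delta * f#(x)) = E_empirical[f_i] for delta
            delta = 0.0
            for _ in range(newton_steps):
                terms = [(p * f(x) * math.exp(delta * fs), fs)
                         for p, x, fs in zip(probs, space, fsharp)]
                g = sum(t for t, _ in terms) - target[i]
                dg = sum(t * fs for t, fs in terms)
                if dg == 0:
                    break
                delta -= g / dg
            weights[i] += delta
    return weights

# Toy data: configurations of two binary variables, with an interaction feature.
space = [(0, 0), (0, 1), (1, 0), (1, 1)]
features = [lambda x: x[0], lambda x: x[1], lambda x: x[0] * x[1]]
empirical = [0.1, 0.2, 0.3, 0.4]
w = improved_iterative_scaling(space, features, empirical)
print(w, kl_divergence(empirical, model_probs(space, features, w)))
```

With three features on a four-point space this family can represent any strictly positive distribution, so the fitted model reproduces the toy empirical distribution and the divergence approaches zero; with fewer features it would instead converge to the closest member of the family.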
Hidden-Variable Models for Discriminative Reranking
- In Proceedings of HLT/EMNLP, 2005
Cited by 26 (2 self)
We describe a new method for the representation of NLP structures within reranking approaches. We make use of a conditional log-linear model, with hidden variables representing the assignment of lexical items to word clusters or word senses. The model learns to make these assignments automatically, based on a discriminative training criterion. Training and decoding with the model require summing over an exponential number of hidden-variable assignments; the required summations can be computed efficiently and exactly using dynamic programming. As a case study, we apply the model to parse reranking. The model gives an F-measure improvement of ≈1.25% beyond the base parser, and an ≈0.25% improvement beyond the Collins (2000) reranker. Although our experiments are focused on parsing, the techniques described generalize naturally to NLP structures other than parse trees.
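The point that an exponential sum over hidden assignments can be computed exactly by dynamic programming can be checked on a toy case. The sketch assumes a simplified structure that is not the paper's: hidden values (for example, word-cluster labels) sit on a chain of words, and an assignment's score is a sum of per-word scores `unary[i][h]` and adjacent-pair scores `pair[a][b]`, standing in for a weighted feature sum. All names and numbers are hypothetical. The forward recursion returns the same log-sum as brute-force enumeration in O(n * k^2) time instead of O(k^n), and `candidate_probs` shows how a candidate's probability marginalizes over its hidden assignments.

```python
import itertools
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def brute_force_log_sum(unary, pair):
    """Enumerate all k**n hidden assignments h and log-sum their scores."""
    n, k = len(unary), len(unary[0])
    scores = []
    for h in itertools.product(range(k), repeat=n):
        s = sum(unary[i][h[i]] for i in range(n))
        s += sum(pair[h[i - 1]][h[i]] for i in range(1, n))
        scores.append(s)
    return logsumexp(scores)

def forward_log_sum(unary, pair):
    """Same quantity via the forward recursion in O(n * k**2)."""
    k = len(unary[0])
    alpha = list(unary[0])  # alpha[b]: log-sum over prefixes ending in hidden value b
    for row in unary[1:]:
        alpha = [row[b] + logsumexp([alpha[a] + pair[a][b] for a in range(k)])
                 for b in range(k)]
    return logsumexp(alpha)

def candidate_probs(candidates):
    """p(y | x) proportional to sum_h exp(score(y, h)), normalized over candidates y."""
    logs = [forward_log_sum(u, pr) for u, pr in candidates]
    z = logsumexp(logs)
    return [math.exp(v - z) for v in logs]

unary = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.4], [0.0, 0.7, 1.1], [-0.2, 0.4, 0.9]]
pair = [[0.3, -0.5, 0.1], [0.0, 0.6, -0.2], [-0.3, 0.2, 0.4]]
print(forward_log_sum(unary, pair), brute_force_log_sum(unary, pair))
```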
The SRI March 2000 Hub-5 conversational speech transcription system
- In Proceedings of the NIST Speech Transcription Workshop, 2000
Cited by 26 (6 self)
We describe SRI’s large-vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation. The system performs four recognition passes: (1) bigram recognition with phone-loop-adapted, within-word triphone acoustic models, (2) lattice generation with transcription-mode-adapted models, (3) trigram lattice recognition with adapted cross-word triphone models, and (4) N-best rescoring and reranking with various additional knowledge sources. The system incorporates two new kinds of acoustic model: triphone models conditioned on speaking rate, and an explicit joint model of within-word phone durations. We also obtained an unusually large improvement from modeling cross-word pronunciation variants in “multiword” vocabulary items. The language model (LM) was enhanced with an “anti-LM” representing acoustically confusable word sequences. Finally, we applied a generalized ROVER algorithm to combine the N-best hypotheses from several systems based on different acoustic models.
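The final combination step can be sketched as weighted voting over aligned word slots. This is a simplification: ROVER proper first aligns the system outputs into a word transition network, and the generalized variant mentioned here operates on N-best hypotheses. The sketch assumes the alignment has already been done, marks deletions with an empty string, and uses hypothetical per-hypothesis weights (for instance, posterior probabilities); `rover_vote` is a hypothetical name.

```python
from collections import defaultdict

def rover_vote(aligned_hyps, weights):
    """Weighted vote per aligned slot; "" marks a deletion and is dropped from the output.
    Assumes all hypotheses have the same length (zip truncates otherwise)."""
    output = []
    for slot in zip(*aligned_hyps):
        votes = defaultdict(float)
        for word, w in zip(slot, weights):
            votes[word] += w
        best = max(votes, key=votes.get)  # on ties, the word seen first wins
        if best:
            output.append(best)
    return output

hyps = [
    ["the", "cat", "sat", ""],
    ["the", "hat", "sat", "down"],
    ["a", "cat", "sat", "down"],
]
print(rover_vote(hyps, [0.4, 0.35, 0.25]))  # prints ['the', 'cat', 'sat', 'down']
```

In the last slot the single system that deleted the word (weight 0.4) is outvoted by the two that output "down" (combined weight 0.6), which is the kind of error correction the combination is meant to provide.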
Statistical Learning of Harmonic Movement
- Journal of New Music Research, 1999
Cited by 18 (3 self)
We explore the application of statistical techniques, borrowed from natural language processing, to music. A probabilistic method is used to capture and generalise from the local harmonic movement of a corpus of seventeenth-century dance music. The probabilistic grammars so generated are then used for experiments in generation (composition). The corpus ...
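One simple way to capture, and then generate from, local harmonic movement is a first-order Markov (bigram) model over chord symbols. The abstract does not specify the form of its probabilistic grammars, so this is an assumed stand-in, and the Roman-numeral sequences below are invented for illustration rather than taken from the dance-music corpus. `train_bigrams` (a hypothetical name) estimates transition probabilities by relative frequency, with `<s>` and `</s>` marking the start and end of a piece, and `generate` samples a new progression.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(sequences):
    """Relative-frequency estimates of p(next chord | previous chord)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {cur: n / total for cur, n in ctr.items()}
    return model

def generate(model, rng, max_len=16):
    """Sample chords until </s> is drawn or max_len chords have been produced."""
    out, prev = [], "<s>"
    while len(out) < max_len:
        options = list(model[prev])
        nxt = rng.choices(options, weights=[model[prev][c] for c in options])[0]
        if nxt == "</s>":
            break
        out.append(nxt)
        prev = nxt
    return out

corpus = [["I", "IV", "V", "I"], ["I", "V", "I"], ["I", "ii", "V", "I"]]
model = train_bigrams(corpus)
print(generate(model, random.Random(0)))
```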
Learning bias and phonological rule induction
- Computational Linguistics, 1996
Cited by 3 (0 self)
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-Based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite state transducers which accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases which are specific to natural language phonology, and are assumed explicitly or implicitly by every theory of phonology: Faithfulness (underlying segments tend ...
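To make the transducer representation concrete, the sketch hand-codes a small deterministic (subsequential) transducer for a simplified flapping rule: /t/ surfaces as [dx] after a stressed vowel and before an unstressed vowel. It is not OSTIA's learned output and ignores other conditioning factors on flapping. Segments use ARPAbet-style symbols with a trailing 1 for a stressed vowel and 0 for an unstressed one; that encoding and the name `flap` are assumptions. Because the transducer cannot decide how to rewrite a /t/ until it sees the next segment, it holds the /t/ in a pending state and emits it, as [t] or [dx], one step later.

```python
def is_stressed(seg):
    return seg.endswith("1")

def is_unstressed(seg):
    return seg.endswith("0")

def flap(segments):
    """Subsequential transducer: state 0 = default, 1 = after a stressed vowel,
    2 = stressed vowel followed by /t/, with the /t/'s output still pending."""
    out, state = [], 0
    for seg in segments:
        if state == 2:
            if is_unstressed(seg):
                out += ["dx", seg]  # right context satisfied: emit the flap
                state = 0
            else:
                out += ["t", seg]  # right context failed: emit the held /t/ unchanged
                state = 1 if is_stressed(seg) else 0
        elif state == 1 and seg == "t":
            state = 2  # hold the /t/ until the next segment is known
        else:
            out.append(seg)
            state = 1 if is_stressed(seg) else 0
    if state == 2:
        out.append("t")  # final output for a held word-final /t/
    return out

print(flap(["b", "ah1", "t", "er0"]))  # butter: ['b', 'ah1', 'dx', 'er0']
print(flap(["ah0", "t", "ae1", "k"]))  # attack: ['ah0', 't', 'ae1', 'k']
```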
A Maximum Entropy Model for Prepositional Phrase Attachment
- 1994
...this paper methods for constructing statistical models for computing the probability of attachment decisions. These models could then be integrated into scoring the probability of an overall parse. We present our methods in the context of prepositional phrase (PP) attachment. Earlier work [11] on PP attachment for verb phrases (whether the PP attaches to the preceding noun phrase or to the verb phrase) used statistics on co-occurrences of two bigrams: the main verb and preposition bigram, and the main noun in the object noun phrase and preposition bigram. In this paper, we explore the use of more features to help in modeling the distribution of the binary PP-attachment decision. We also describe a search procedure for selecting a "good" subset of features from a much larger pool of features for PP attachment. Obviously, the feature search cannot be ... [Footnote: Jeff Reynar, from the University of Pennsylvania, worked on this project as a summer student at I.B.M.]
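For a binary decision, a conditional maximum entropy model with indicator features reduces to a logistic model over those features. The sketch assumes a hypothetical feature template built from the verb/preposition and noun/preposition bigrams mentioned above plus a preposition-only feature, and it swaps in plain batch gradient ascent with a small L2 penalty in place of the paper's estimation procedure; it does not implement the feature-selection search. The four training triples are invented toy data, with label 1 meaning verb attachment and 0 meaning noun attachment.

```python
import math
from collections import defaultdict

def features(verb, noun, prep):
    """Hypothetical template: verb/prep and noun/prep bigrams plus the preposition."""
    return [f"v={verb},p={prep}", f"n={noun},p={prep}", f"p={prep}"]

def prob_verb_attach(weights, feats):
    """Binary conditional maxent: p(verb attachment | context) = sigmoid(sum of active weights)."""
    s = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-s))

def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-penalized conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for (verb, noun, prep), label in examples:
            feats = features(verb, noun, prep)
            error = label - prob_verb_attach(weights, feats)  # observed minus expected
            for f in feats:
                grad[f] += error
        for f in set(grad) | set(weights):
            weights[f] += lr * (grad[f] - l2 * weights[f])
    return weights

examples = [
    (("ate", "pizza", "with"), 1),
    (("saw", "man", "with"), 0),
    (("put", "book", "on"), 1),
    (("bought", "shares", "in"), 0),
]
weights = train(examples)
for (verb, noun, prep), label in examples:
    print(verb, noun, prep, label, round(prob_verb_attach(weights, features(verb, noun, prep)), 3))
```

The shared feature `p=with` receives opposing evidence from two examples, so the decision for those examples rests on the verb/preposition and noun/preposition bigrams, which mirrors the motivation for adding features beyond the preposition alone.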
Trained Trigger Language Model for Sentence Retrieval in QA: Bridging the Vocabulary Gap
We propose a novel language model for sentence retrieval in Question Answering (QA) systems, called the trained trigger language model. This model addresses the word mismatch problem in information retrieval. The proposed model captures pairs of trigger and target words while training on a large corpus. The word pairs are extracted using both unsupervised and supervised approaches, with different notions of triggering. In addition, we study the impact of corpus size and domain for a supervised model. All notions of the trained trigger model are finally used in a language-model-based sentence retrieval framework. Our experiments on the TREC QA collection verify that the proposed model significantly improves sentence retrieval performance compared to the state-of-the-art translation model and class model, which address the same problem.
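The sketch illustrates one possible reading of the idea: trigger-to-target probabilities are estimated by counting co-occurrences between words of an answer sentence and words of its question, and a candidate sentence is then scored by query likelihood, mixing a maximum-likelihood term, a trigger term, and a uniform background. It is an assumption-laden toy, not the paper's supervised or unsupervised extraction procedure; the function names, mixture weights `lam` and `mu`, and the question/answer pairs are hypothetical. It shows how a sentence containing "written" can earn credit for the query word "wrote" even though the two never match literally.

```python
import math
from collections import Counter, defaultdict

def train_triggers(pairs):
    """Estimate p(target | trigger) from (question, answer_sentence) token pairs:
    every word of the answer sentence is treated as a trigger for every question word."""
    counts = defaultdict(Counter)
    for question, sentence in pairs:
        for trigger in set(sentence):
            for target in set(question):
                counts[trigger][target] += 1
    return {trig: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for trig, ctr in counts.items()}

def score(query, sentence, triggers, vocab_size, lam=0.5, mu=0.1):
    """Log query likelihood under a mixture of ML, trigger, and uniform background models."""
    counts = Counter(sentence)
    n = len(sentence)
    total = 0.0
    for q in query:
        p_ml = counts[q] / n
        p_trig = sum(triggers.get(w, {}).get(q, 0.0) * c / n for w, c in counts.items())
        p = (1 - mu) * (lam * p_ml + (1 - lam) * p_trig) + mu / vocab_size
        total += math.log(p)
    return total

pairs = [
    (["who", "wrote", "macbeth"], ["macbeth", "was", "written", "by", "shakespeare"]),
    (["who", "wrote", "emma"], ["emma", "was", "written", "by", "austen"]),
]
triggers = train_triggers(pairs)
s1 = ["hamlet", "was", "written", "by", "shakespeare"]
s2 = ["hamlet", "is", "a", "long", "play"]
vocab = {tok for q, s in pairs for tok in q + s} | set(s1) | set(s2)
print(score(["wrote"], s1, triggers, len(vocab)), score(["wrote"], s2, triggers, len(vocab)))
```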

