An application of the submodular principal partition to training data subset selection
- in NIPS Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra, 2010
Cited by 2 (2 self)
We address the problem of finding a subset of a large training data set (corpus) that is useful for accurately and rapidly prototyping novel and computationally expensive machine learning architectures. To solve this problem, we express it as a minimization problem over a weighted sum of modular and submodular functions. Quantities such as the number of classes (or quality) in a set of samples, or the quality of a bundle of classes, are submodular functions, which makes finding optimal solutions possible. We apply the principal partition to our problem so that solutions for all possible trade-offs between a modular function and a submodular function can be found efficiently. We show results for speech recognition on the Switchboard-I corpus, demonstrating improved results over previous techniques for this purpose. We also demonstrate the variety of the resulting corpora that may be produced using our method.
Query Language Modeling for Voice Search
Cited by 2 (2 self)
The paper presents an empirical exploration of google.com query stream language modeling. We describe the normalization of the typed query stream, which results in out-of-vocabulary (OoV) rates below 1% for a one-million-word vocabulary. We present a comprehensive set of experiments that guided the design decisions for a voice search service. In the process we rediscovered a lesser-known interaction between Kneser-Ney smoothing and entropy pruning, and found empirical evidence that hints at non-stationarity of the query stream, as well as strong dependence on various English locales: USA, Britain and Australia. Index Terms: language modeling, voice search, query stream
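The out-of-vocabulary rate mentioned in this abstract is the fraction of running tokens that fall outside a fixed vocabulary. The following is a minimal sketch of that computation only; the function name `oov_rate`, the toy vocabulary, and the toy queries are invented for illustration and are not taken from the paper, which works with a normalized query stream and a one-million-word vocabulary.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    misses = sum(1 for token in tokens if token not in vocabulary)
    return misses / len(tokens)


# Toy data, invented for illustration only.
vocabulary = {"weather", "in", "new", "york", "pizza", "near", "me"}
queries = ["weather in new york", "pizza near me", "zyzzyva pizza"]

# Whitespace tokenization stands in for the (unspecified) normalization step.
tokens = [word for query in queries for word in query.split()]
rate = oov_rate(tokens, vocabulary)
print(f"OoV rate: {rate:.3f}")
```

Here one of the nine tokens ("zyzzyva") is missing from the toy vocabulary, so the rate is 1/9.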
Optimal Selection of Limited Vocabulary Speech Corpora
Cited by 1 (1 self)
We address the problem of finding a subset of a large speech data corpus that is useful for accurately and rapidly prototyping novel and computationally expensive speech recognition architectures. To solve this problem, we express it as an optimization problem over submodular functions. Quantities such as the vocabulary size (or quality) of a set of utterances, or the quality of a bundle of word types, are submodular functions, which makes finding optimal solutions possible. Moreover, we are able to express our approach using graph cuts, leading to a very fast implementation even on large initial corpora. We show results on the Switchboard-I corpus, demonstrating improved results over previous techniques for this purpose. We also demonstrate the variety of the resulting corpora that may be produced using our method. Index Terms: corpus subset selection, submodularity, LVCSR
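The claim that vocabulary size is submodular can be illustrated with the diminishing-returns property: adding an utterance to a smaller collection gains at least as many new word types as adding it to a larger collection that contains the smaller one. The sketch below only checks that property on toy data; the helper names and utterances are invented for illustration, and it does not implement the graph-cut optimization described in the abstract.

```python
def vocab_size(utterances):
    """Number of distinct word types covered by a list of utterances."""
    return len({word for utt in utterances for word in utt.split()})


def marginal_gain(base, extra):
    """Increase in vocabulary size when `extra` is added to `base`."""
    return vocab_size(base + [extra]) - vocab_size(base)


# Toy utterances, invented for illustration; `small` is a subset of `large`.
small = ["hello there"]
large = ["hello there", "how are you"]
new_utterance = "hello how are things"

gain_small = marginal_gain(small, new_utterance)
gain_large = marginal_gain(large, new_utterance)

# Diminishing returns: the gain on the subset is never below the gain on the superset.
print(gain_small, gain_large, gain_small >= gain_large)
```

With these toy utterances the new utterance adds three word types to the smaller collection but only one to the larger one.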
Hierarchical Phrase-Based Translation Representations
Cited by 1 (0 self)
This paper compares several translation representations for a synchronous context-free grammar parse, including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of the target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA- and PDA-based decoders, are presented using admissible (or exact) search, which is possible for HiFST with compact SCFG rulesets and for HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy, which we then analyze in terms of search errors and translation performance.
SRILM at Sixteen: Update and Outlook
Cited by 1 (0 self)
We review developments in the SRI Language Modeling Toolkit (SRILM) since 2002, when a previous paper on SRILM was published. These developments include measures to make training from large data sets more efficient, implementations of additional language modeling techniques (such as for adaptation and smoothing), and support for client/server operation. In addition, the functionality for lattice processing has been greatly expanded. We also highlight several external contributions and notable applications of the toolkit, and assess SRILM's impact on the research community.
The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources
Augmented and alternative communication (AAC) devices enable users with certain communication disabilities to participate in everyday conversations. Such devices often rely on statistical language models to improve text entry by offering word predictions. These predictions can be improved if the language model is trained on data that closely reflects the style of the users' intended communications. Unfortunately, there is no large dataset consisting of genuine AAC messages. In this paper we demonstrate how we can crowdsource the creation of a large set of fictional AAC messages. We show that these messages model conversational AAC better than the currently used datasets based on telephone conversations or newswire text. We leverage our crowdsourced messages to intelligently select sentences from much larger sets of Twitter, blog and Usenet data. Compared to a model trained only on telephone transcripts, our best performing model reduced perplexity on three test sets of AAC-like communications by 60–82% relative. This translated to a potential keystroke savings in a predictive keyboard interface of 5–11%.
Low Rank Language Models for Small Training Sets
Several language model smoothing techniques are available that are effective for a variety of tasks; however, training with small data sets is still difficult. This letter introduces the low rank language model, which uses a low rank tensor representation of joint probability distributions for parameter-tying and optimizes likelihood under a rank constraint. It obtains lower perplexity than standard smoothing techniques when the training set is small and also leads to perplexity reduction when used in domain adaptation via interpolation with a general, out-of-domain model. Index Terms: language model, low rank tensor.

