Results 1 - 9 of 9
Rational kernels: Theory and algorithms
Journal of Machine Learning Research, 2004
Cited by 61 (8 self)
Many classification algorithms were originally designed for fixed-size vectors. Recent applications in text and speech processing and computational biology require however the analysis of variable-length sequences and more generally weighted automata. An approach widely used in statistical learning techniques such as Support Vector Machines (SVMs) is that of kernel methods, due to their computational efficiency in high-dimensional feature spaces. We introduce a general family of kernels based on weighted transducers or rational relations, rational kernels, that extend kernel methods to the analysis of variable-length sequences or more generally weighted automata. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm. Not all rational kernels are positive definite and symmetric (PDS), or equivalently verify the Mercer condition, a condition that guarantees the convergence of training for discriminant classification algorithms such as SVMs. We present several theoretical results related to PDS rational kernels. We show that under some general conditions these kernels are
Tiburon: A Weighted Tree Automata Toolkit
2006
Cited by 32 (7 self)
The availability of weighted finite-state string automata toolkits made possible great advances in natural language processing. However, recent advances in syntax-based NLP model design are unsuitable for these toolkits. To combat this problem, we introduce a weighted finite-state tree automata toolkit, which incorporates recent developments in weighted tree automata theory and is useful for natural language applications such as machine translation, sentence compression, question answering, and many more.
The AT&T WATSON speech recognizer
in Proceedings of ICASSP, 2005
Cited by 19 (3 self)
This paper describes the AT&T WATSON real-time speech recognizer, the product of several decades of research at AT&T. The recognizer handles a wide range of vocabulary sizes and is based on continuous-density hidden Markov models for acoustic modeling and finite state networks for language modeling. The recognition network is optimized for efficient search. We identify the algorithms used for high-accuracy, real-time and low-latency recognition. We present results for small and large vocabulary tasks taken from the AT&T VoiceTone® service, showing word accuracy improvement of about 5% absolute and real-time processing speedup by a factor between 2 and 3.
Statistical Modeling for Unit Selection in Speech Synthesis
Proceedings of the Conference
Cited by 8 (3 self)
Traditional concatenative speech synthesis systems use a number of heuristics to define the target and concatenation costs, essential for the design of the unit selection component. In contrast to these approaches, we introduce a general statistical modeling framework for unit selection inspired by automatic speech recognition. Given appropriate data, techniques based on that framework can result in a more accurate unit selection, thereby improving the general quality of a speech synthesizer. They can also lead to a more modular and a substantially more efficient system. We present a new unit selection system based on statistical modeling. To overcome the original absence of data, we use an existing high-quality unit selection system to generate a corpus of unit sequences. We show that the concatenation cost can be accurately estimated from this corpus using a statistical n-gram language model over units. We used weighted automata and transducers for the representation of the components of the system and designed a new and more efficient composition algorithm making use of string potentials for their combination. The resulting statistical unit selection is shown to be about 2.6 times faster than the last release of the AT&T Natural Voices Product while preserving the same quality, and offers much flexibility for the use and integration of new and more complex components.
Statistical Lattice-Based Spoken Document Retrieval
Cited by 7 (0 self)
Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
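The scoring idea in this abstract (a smoothed per-document model built from expected word counts, with relevance measured as the query's probability under that model) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a unigram model rather than the n-gram model the abstract mentions, it assumes linear interpolation with a collection model as the smoothing scheme, and the expected counts, document names, and function names below are hypothetical.

```python
import math

def smoothed_unigram(expected_counts, collection_counts, lam=0.5):
    """Return P(word | document), interpolated with a collection model.

    expected_counts: word -> expected count in one document (hypothetical
    values standing in for counts derived from a lattice).
    lam: interpolation weight, an assumed smoothing parameter.
    """
    doc_total = sum(expected_counts.values())
    coll_total = sum(collection_counts.values())

    def prob(word):
        p_doc = expected_counts.get(word, 0.0) / doc_total if doc_total else 0.0
        p_coll = collection_counts.get(word, 0.0) / coll_total
        return lam * p_doc + (1.0 - lam) * p_coll

    return prob

def query_log_likelihood(query, prob):
    # Assumes every query word occurs somewhere in the collection;
    # otherwise prob(word) is 0 and math.log raises ValueError.
    return sum(math.log(prob(word)) for word in query)

# Hypothetical expected counts for two spoken documents.
lattice_counts = {
    "doc_a": {"flight": 0.9, "boston": 1.6, "austin": 0.4},
    "doc_b": {"weather": 1.0, "report": 0.8, "boston": 0.2},
}

# Collection model: expected counts summed over all documents.
collection = {}
for counts in lattice_counts.values():
    for word, count in counts.items():
        collection[word] = collection.get(word, 0.0) + count

query = ["flight", "boston"]
scores = {
    doc: query_log_likelihood(query, smoothed_unigram(counts, collection))
    for doc, counts in lattice_counts.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because the counts are fractional, a word that appears only in low-probability lattice paths still contributes a small amount of evidence, which a 1-best transcript would discard entirely.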
Estimating Document Frequencies in a Speech Corpus
Cited by 1 (1 self)
Inverse Document Frequency (IDF) is an important quantity in many applications, including Information Retrieval. IDF is defined in terms of document frequency, df(w), the number of documents that mention w at least once. This quantity is relatively easy to compute over textual documents, but spoken documents are more challenging. This paper considers two baselines: (1) an estimate based on the 1-best ASR output and (2) an estimate based on expected term frequencies computed from the lattice. We improve over these baselines by taking advantage of repetition. Whatever the document is about is likely to be repeated, unlike ASR errors, which tend to be more random (Poisson). In addition, we find it helpful to consider an ensemble of language models. There is an opportunity for the ensemble to reduce noise, assuming that the errors across language models are relatively uncorrelated. The opportunity for improvement is larger when WER is high. This paper considers a pairing task application that could benefit from improved estimates of df. The pairing task inputs conversational sides from the English Fisher corpus and outputs estimates of which sides were from the same conversation. Better estimates of df lead to better performance on this task.
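For textual documents, the definition of df(w) given in this abstract is straightforward to compute, which is roughly what the 1-best baseline does on ASR transcripts. The sketch below is illustrative only: the abstract does not state an IDF formula, so the common form log(N / df(w)) is assumed, and the toy transcripts and function names are hypothetical.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """df(w): the number of documents that mention w at least once."""
    df = Counter()
    for words in docs:
        df.update(set(words))  # set() counts each word once per document
    return df

def inverse_document_frequencies(df, n_docs):
    # Assumed textbook form: idf(w) = log(N / df(w)).
    return {word: math.log(n_docs / count) for word, count in df.items()}

# Hypothetical 1-best transcripts, one token list per document.
docs = [
    "the flight to boston".split(),
    "the weather in boston".split(),
    "the game last night".split(),
]

df = document_frequencies(docs)
idf = inverse_document_frequencies(df, len(docs))
```

A word that occurs in every document ("the" here) gets an IDF of zero, while words confined to a single document get the highest values. With spoken documents, recognition errors perturb df directly, which is the estimation problem the abstract addresses.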
SEMANTIC DATA SELECTION FOR VERTICAL BUSINESS VOICE SEARCH
Local business voice search is a popular application for mobile phones, where hands-free interaction and speed are critical to users. However, speech recognition accuracy is still not satisfactory when the number of businesses and locations is extended nationwide. For mobile users, searching a local business directory is often related to the fulfillment of specific tasks “on-the-move”, such as finding a restaurant, a movie theater, or a retailer chain. Restricting the local search to specific domains improves the quality of search results. In this paper, we present a new approach to data selection for bootstrapping and optimizing language models for vertical business sectors by exploiting semantic knowledge encoded in the business database and in the business category taxonomy. We demonstrate that, in the case of queries in the restaurant domain and without collecting new data, speech recognition word accuracy improves by 9.5% relative when compared with a generic local business language model. Index Terms: Local business search, voice search, language modeling.