Scaling to Very Very Large Corpora for Natural Language Disambiguation
, 2001
Cited by 82 (3 self)
Abstract
The amount of readily available online text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more labeled data than has previously been used. We are fortunate that for this particular application, correctly labeled training data is free. Since this will often not be the case, we examine methods for effectively exploiting very large corpora when labeled data comes at a cost.
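Labeled data is free here because, in correctly edited text, every occurrence of a confusion-set word is already labeled by the word the author actually chose. The sketch below shows that idea under several assumptions: a hypothetical {then, than} confusion set, a context window of two words on each side, and an add-one-smoothed naive Bayes classifier. These are illustrative choices and do not reproduce the learners compared in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical confusion set used only for illustration.
CONFUSION_SET = ("then", "than")


def extract_examples(sentences, confusion_set=CONFUSION_SET, window=2):
    """Turn raw, correctly written text into labeled examples at no cost."""
    examples = []
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok in confusion_set:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                # The label is simply the word the author wrote.
                examples.append((left + right, tok))
    return examples


class NaiveBayes:
    def __init__(self, labels):
        self.labels = labels
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for features, label in examples:
            self.label_counts[label] += 1
            for w in features:
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label in self.labels:
            # Add-one smoothing on the prior.
            score = math.log((self.label_counts[label] + 1) / (total + len(self.labels)))
            # Add-one smoothing on word likelihoods; the extra +1 reserves mass for unseen words.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for w in features:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Because training examples come straight from `extract_examples`, adding more raw text requires no annotation step, which is what makes scaling to very large corpora cheap for this particular task.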
LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop
, 2005
Dynaspeak: SRI’s scalable speech recognizer for embedded and mobile systems
- In Proceedings of HLT
, 2002
Cited by 6 (3 self)
Abstract
We introduce SRI’s new speech recognition engine, DynaSpeak™, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
, 2000
Cited by 5 (0 self)
Abstract
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have pushed speech recognition technology for the very interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on individual techniques we developed, rather than on descriptions of our evaluation systems. We provide comparative experimental results showing the improvements obtained with the novel approaches we developed.
1 Introduction
In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech found in real sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years, and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...
Parakeet: A continuous speech recognition system for mobile touchscreen devices
- In Proc. IUI ’09, 237–246. ACM
Cited by 5 (5 self)
Abstract
We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words-per-minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.
Author Keywords: Continuous speech recognition, mobile text entry, text input, touch-screen interface, error correction, speech input, word confusion network, predictive keyboard
ACM Classification Keywords: H5.2. User Interfaces: Voice I/O
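Text entry rates such as the 18 WPM and 13 WPM figures are conventionally computed by treating five characters as one word. The following is a minimal sketch of that common convention; whether the Parakeet study used exactly this formula, including not counting the first character, is an assumption.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """Text entry rate under the common 5-characters-per-word convention.

    Assumes timing starts when the first character is entered, so that
    character is not counted (the len - 1 term).
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    chars = max(len(transcribed) - 1, 0)
    return (chars / seconds) * 60.0 / 5.0
```

For example, a 91-character phrase entered in 60 seconds gives 18.0 WPM under this convention.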
Near Minimal Weighted Word Graphs For Post-Processing Speech
- In 1999 Int. Workshop on Automatic Speech Recognition and Understanding
, 1999
Cited by 4 (2 self)
Abstract
Large vocabulary speech recognition applications can benefit from an efficient data structure for representing large numbers of acoustic hypotheses compactly. Word graphs or lattices generated by acoustic recognition engines are generally not compact and must be post-processed to keep lattice sizes small; however, algorithms designed for this task need to reduce the size of the lattice without either eliminating hypotheses or distorting their relative acoustic probabilities. In this paper, we will discuss the relevant criteria for measuring graph size, compare the advantages of two different structures for graphs, and introduce a new data structure and compression algorithm which give additional graph compression and maintain exact hypothesis path scores by storing probability information on both nodes and arcs within the graph.
1. INTRODUCTION
Many recognition systems use word lattices or graphs as mechanisms for representing sentence hypotheses and interfacing with additional knowle...
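Storing probability information on both nodes and arcs is what allows nodes to be shared without changing any path score. The hypothetical `WordGraph` class below illustrates that principle: each node carries a word and its log score, and when two nodes for the same word with identical outgoing arcs are merged, the score difference is pushed onto the dropped node's incoming arcs. No hypothesis is removed and every path total stays exact. This is a sketch of the node-plus-arc scoring idea only, not the paper's compression algorithm, and it assumes scores are additive log probabilities.

```python
from dataclasses import dataclass, field


@dataclass
class WordGraph:
    """Illustrative word graph with log scores on both nodes and arcs."""
    words: dict = field(default_factory=dict)       # node id -> word label
    node_score: dict = field(default_factory=dict)  # node id -> log score stored on the node
    arcs: dict = field(default_factory=dict)        # (src, dst) -> log score stored on the arc

    def add_node(self, nid, word, score):
        self.words[nid] = word
        self.node_score[nid] = score

    def add_arc(self, src, dst, score=0.0):
        self.arcs[(src, dst)] = score

    def successors(self, nid):
        return {dst: s for (src, dst), s in self.arcs.items() if src == nid}

    def path_scores(self):
        """Sorted (word sequence, total log score) for every complete path."""
        has_incoming = {dst for (_, dst) in self.arcs}
        paths = []

        def walk(nid, seq, score):
            seq = seq + (self.words[nid],)
            score += self.node_score[nid]
            succ = self.successors(nid)
            if not succ:  # no outgoing arcs: end of a hypothesis
                paths.append((seq, score))
            for dst, arc_score in succ.items():
                walk(dst, seq, score + arc_score)

        for nid in self.words:
            if nid not in has_incoming:  # no incoming arcs: start of a hypothesis
                walk(nid, (), 0.0)
        return sorted(paths)

    def merge(self, keep, drop):
        """Fold `drop` into `keep`, moving the node-score difference onto arcs."""
        if self.words[keep] != self.words[drop]:
            raise ValueError("only nodes with the same word can be merged")
        if self.successors(keep) != self.successors(drop):
            raise ValueError("nodes must have identical outgoing arcs")
        delta = self.node_score[drop] - self.node_score[keep]
        incoming = [(src, s) for (src, dst), s in self.arcs.items() if dst == drop]
        if any((src, keep) in self.arcs for src, _ in incoming):
            raise ValueError("merge would create a parallel arc")
        # Remove every arc touching the dropped node (its outgoing arcs duplicate keep's).
        for arc in [a for a in self.arcs if drop in a]:
            del self.arcs[arc]
        # Re-attach predecessors to `keep`, compensating for the node-score difference.
        for src, s in incoming:
            self.arcs[(src, keep)] = s + delta
        del self.words[drop]
        del self.node_score[drop]
```

For example, with nodes `a` (-1.0) and `the` (-2.0) leading to two separate `cat` nodes (-3.0 and -4.0), merging the two `cat` nodes leaves three nodes, and both hypotheses keep their totals of -4.0 and -6.0 because the arc from `the` now carries -1.0.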
Lattice Compression in the Consensual Post-Processing Framework
- In Proceedings of the Third World Multiconference on Systemics, Cybernetics and Informatics joint with the Fifth International Conference on Information Systems Analysis and Synthesis
, 2000
Cited by 3 (2 self)
Abstract
Word lattices are used by most speech recognizers as a compact representation of a set of alternative hypotheses. In large-vocabulary, multi-pass recognition systems it is important to generate word lattices incorporating a large number of hypotheses while keeping the size of the representation as small as possible. Previously we presented a method for identifying mutually supporting and competing word hypotheses in a recognition lattice. In this paper we show how the outcome of this method can be used for compressing lattices. The success of the new technique comes from the ability to discard links with low a posteriori probability and recombine the remaining ones to create a new set of hypotheses. Experiments on the Switchboard corpus show that this method achieves better compression than the conventionally used technique.
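Discarding links with low a posteriori probability presupposes link posteriors. The sketch below assumes a lattice given as a list of `(src, dst, word, log_score)` links over topologically numbered nodes (node 0 is the start, the last node is the end, and `src < dst`), computes posteriors with the standard forward-backward recursion, and drops links below a threshold. It covers only the discarding step; the recombination of surviving links into new hypotheses described in the abstract is not shown, and in a general lattice pruning can leave nodes without a complete path, which a real implementation would need to clean up.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable."""
    values = list(values)
    if not values:
        return -math.inf
    m = max(values)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def link_posteriors(num_nodes, links):
    """Posterior probability of each (src, dst, word, log_score) link."""
    # Forward pass: alpha[n] = log total score of all partial paths from start to n.
    alpha = [-math.inf] * num_nodes
    alpha[0] = 0.0
    for n in range(1, num_nodes):
        alpha[n] = logsumexp(alpha[s] + sc for s, d, _, sc in links if d == n)
    # Backward pass: beta[n] = log total score of all partial paths from n to end.
    beta = [-math.inf] * num_nodes
    beta[-1] = 0.0
    for n in range(num_nodes - 2, -1, -1):
        beta[n] = logsumexp(sc + beta[d] for s, d, _, sc in links if s == n)
    total = alpha[-1]
    if total == -math.inf:
        raise ValueError("lattice has no complete path")
    return [math.exp(alpha[s] + sc + beta[d] - total) for s, d, _, sc in links]


def prune(num_nodes, links, threshold):
    """Keep only links whose posterior is at least `threshold`."""
    posteriors = link_posteriors(num_nodes, links)
    return [link for link, p in zip(links, posteriors) if p >= threshold]
```

On a two-slot lattice with competing words `a`/`uh` (0.7/0.3) and `cat`/`cap` (0.9/0.1), a threshold of 0.2 keeps `a`, `uh`, and `cat` and discards `cap`.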
Code Breaking for Automatic Speech Recognition
Cited by 2 (1 self)
Abstract
Code Breaking is a divide-and-conquer approach for sequential pattern recognition tasks where we identify weaknesses of an existing system and then use specialized decoders to strengthen the overall system. We study the technique in the context of Automatic Speech Recognition. Using the lattice cutting algorithm, we first analyze lattices generated by a state-of-the-art speech recognizer to spot possible errors in its first-pass hypothesis. We then train specialized decoders for each of these problems and apply them to refine the first-pass hypothesis. We study the use of Support Vector Machines (SVMs) as discriminative models over each of these problems. The estimation of a posterior distribution over hypotheses in these regions of acoustic confusion is posed as a logistic regression problem. GiniSVMs, a variant of SVMs, can be used as an approximation technique to estimate the parameters of the logistic regression problem. We first validate our approach on a small-vocabulary recognition task, namely, alphadigits. We show that the use of GiniSVMs can substantially improve the performance of a well-trained MMI-HMM system. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We then analyze lattice cutting in terms of its ability to reliably identify, and provide good alternatives for, incorrectly hypothesized words in the Czech MALACH domain, a large-vocabulary task. We describe a procedure to train and apply SVMs to strengthen the first-pass system, resulting in small but statistically significant recognition improvements. We conclude with a discussion of methods, including clustering, for obtaining further improvements on large-vocabulary tasks.
The Effect of Pruning and Compression on Graphical Representations of the Output of a Speech Recognizer
- Origins and Dtrectioto, CH
, 2003
Cited by 1 (0 self)
Abstract
Large vocabulary continuous speech recognition can benefit from an efficient data structure for representing a large number of acoustic hypotheses compactly. Word graphs or lattices have been chosen as such an efficient interface between acoustic recognition engines and subsequent language processing modules. This paper first investigates the effect of pruning during acoustic decoding on the quality of word lattices and shows that by combining different pruning options (at the model level and word level), we can obtain word lattices with comparable accuracy to the original lattices and a manageable size. In order to use the word lattices as the input for a post-processing language module, they should preserve the target hypotheses and their scores while being as small as possible. In this paper we introduce a word graph compression algorithm that significantly reduces the number of words in the graphical representation without eliminating utterance hypotheses or distorting their acoustic scores. We compare this word graph compression algorithm with several other lattice size-reducing approaches and demonstrate the relative strength of the new word graph compression algorithm for decreasing the number of words in the representation. Experiments are conducted across corpora and vocabulary sizes to determine the consistency of the pruning and compression results. © 2003 Elsevier Science Ltd. All rights reserved.
1. Introduction
Word lattices are often chosen as the interface between an acoustic recognizer and a subsequent process using a more complex language model (LM) or more specific acoustic model because of ...
Published in Computer Speech and Language (www.elsevier.com/locate/csl). Authors include M.P. Harper and M.T. Johnson; corresponding author tel.: +1-765-494-3652, fax: +1-765-494-3371.
Experiments with Lattice-based PPRLM Language Identification
Cited by 1 (0 self)
Abstract
In this paper we describe experiments conducted during the development of a lattice-based PPRLM language identification system as part of the NIST 2005 language recognition evaluation campaign. In experiments following LRE05 the PPRLM-lattice sub-system presented here achieved a 30s/primary condition EER of 4.87%, making it the single best performing recognizer developed by the MIT-LL team. Details of implementation issues and experimental results are presented and interactions with backend score normalization are explored.
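The equal error rate (EER) reported for the 30 s primary condition is the operating point at which the miss rate equals the false-alarm rate. The sketch below assumes that higher scores mean "target language" and simply picks the candidate threshold where the two rates are closest, averaging them there; a production scorer may interpolate between operating points instead, so results on small trial sets can differ slightly.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping thresholds over the observed scores."""
    if not target_scores or not nontarget_scores:
        raise ValueError("need both target and non-target trials")
    best_gap, best_eer = None, None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a true target scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + fa) / 2
    return best_eer
```

With target scores `[0.9, 0.8, 0.7, 0.3]` and non-target scores `[0.6, 0.4, 0.2, 0.1]`, the rates cross at a threshold of 0.6, giving an EER of 0.25; perfectly separated scores give 0.0.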

