Results 1 - 4 of 4
PocketSphinx: A free, real-time continuous speech recognition system for hand-held devices
- in Proceedings of ICASSP, 2006
Abstract - Cited by 12 (0 self)
The availability of real-time continuous speech recognition on mobile and embedded devices has opened up a wide range of research opportunities in human-computer interactive applications. Unfortunately, most of the work in this area to date has been confined to proprietary software, or has focused on limited domains with constrained grammars. In this paper, we present a preliminary case study on the porting and optimization of CMU SPHINX-II, a popular open source large vocabulary continuous speech recognition (LVCSR) system, to hand-held devices. The resulting system operates in an average 0.87 times real-time on a 206MHz device, 8.03 times faster than the baseline system. To our knowledge, this is the first hand-held LVCSR system available under an open-source license.
SRILM at Sixteen: Update and Outlook
Abstract - Cited by 1 (0 self)
We review developments in the SRI Language Modeling Toolkit (SRILM) since 2002, when a previous paper on SRILM was published. These developments include measures to make training from large data sets more efficient, to implement additional language modeling techniques (such as for adaptation and smoothing), and for client/server operation. In addition, the functionality for lattice processing has been greatly expanded. We also highlight several external contributions and notable applications of the toolkit, and assess SRILM’s impact on the research community.
IraqComm and FlexTrans: A Speech Translation System and Flexible Framework
Abstract
bidirectional speech-to-speech machine translation between English and Iraqi Arabic in the domains of force protection, municipal and medical services, and training. The system was developed primarily under DARPA's TRANSTAC Program and includes: speech recognition components using SRI's Dynaspeak® engine; MT components using SRI's Gemini™ and SRInterp; and speech synthesis from Cepstral, LLC. The communication between these components is coordinated by SRI's Flexible Translation (FlexTrans) Framework, which has an intuitive, easy-to-use graphical user interface and an eyes-free, hands-free mode, and is highly configurable and adaptable to user needs. It runs on a variety of standard portable hardware platforms and was designed to make it as easy as possible to build systems for other languages, as shown by the rapid development of an analogous system in English/Malay.
Name-Aware Speech Recognition for Interactive Question Answering
Abstract
In this work we show how interactivity in a voice-enabled question answering application may improve speech recognition. We allow the user to provide a target named entity before asking the question. Then we build a named entity specific language model using the documents containing the named entity. The question-specific model is obtained by merging the named entity specific model with the model built on a set of questions. We present a set of experiments using the TREC question set on the AQUAINT corpus. The question-specific language model is compared with the baseline model built by merging a model of the AQUAINT corpus and past TREC questions. The question-specific model achieves a 32.2% reduction in word error rate from the baseline using the questions where pronominal references are resolved.
Index Terms: Spoken question answering, speech recognition, spoken dialog systems

