Results 11 - 13 of 13
A Grapheme Based Speech Recognition System for Russian, Specom 2004
, 2004
Abstract - Cited by 2 (2 self)
With the increasing availability and deployment of speech recognition technology in real-world environments, fast and affordable adaptation of speech recognition systems to new languages and/or domains becomes more and more important. One of the most expensive components of a recognition system is the pronunciation dictionary that maps the orthography of the words in the search vocabulary onto a sequence of sub-units. Often phonemes act as such sub-units. Human expert knowledge is usually required for crafting the pronunciation dictionary, making it an expensive and time-consuming task. Even automatic tools for creating such dictionaries often require hand-labeled training material and rely on manual revision. To address the problem of creating a dictionary in a time- and cost-efficient way, we have examined recognition systems at our lab that rely solely on graphemes rather than phonemes as sub-units. The mapping in the dictionary thus becomes trivial, since every word is simply segmented into its letters. Therefore, no expert knowledge is needed anymore. Our experiments on different languages have shown that the quality of the resulting recognizer depends significantly on the grapheme-to-phoneme relation of the underlying language. Since Russian is a language with an alphabetic script and a fairly close grapheme-to-phoneme relation, it is well suited to this approach. In this paper we present our results on creating a grapheme-based Russian recognizer trained on the GlobalPhone corpus, which covers fifteen different languages. We compare the performance of the resulting system to a phoneme-based recognition system that was trained in the course of the GlobalPhone project, and compare the performance of two grapheme-based systems whose context-dependent models were clustered with two different procedures.
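The trivial dictionary mapping described in this abstract, where each word is segmented into its letters, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grapheme_dictionary`, the lowercasing step, and the decision to drop non-letter characters such as hyphens are all assumptions made here for illustration.

```python
def grapheme_dictionary(vocabulary):
    """Build a pronunciation dictionary whose sub-units are graphemes.

    Hypothetical helper: each word maps to the list of its letters.
    Lowercasing and skipping non-letters are assumptions of this sketch,
    not details taken from the abstract.
    """
    lexicon = {}
    for word in vocabulary:
        # One sub-unit per alphabetic character, in written order.
        units = [ch for ch in word.lower() if ch.isalpha()]
        if units:  # skip entries that contain no letters at all
            lexicon[word] = units
    return lexicon


if __name__ == "__main__":
    # Print one dictionary entry per line: the word, then its sub-units.
    for word, units in grapheme_dictionary(["да", "нет", "Москва"]).items():
        print(word, " ".join(units))
```

A phoneme-based dictionary would need expert rules or hand-labeled data at the step where this sketch simply iterates over characters, which is the cost the abstract says the grapheme approach avoids.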
Towards a Unified Framework for Sub-lexical and Supra-lexical Linguistic Modeling
, 2002
Abstract
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface, speech recognition as the front-end of speech understanding remains one of the fundamental challenges for establishing robust and effective human/computer communication. On the one hand, the speech recognition component in a conversational interface lives in a rich system environment. Diverse sources of knowledge are available and can potentially benefit its robustness and accuracy. For example, the natural language understanding component can provide linguistic knowledge of syntax and semantics that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the knowledge sources available. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech.
Natural Language Engineering 6 (3-4): 305-322. Printed in the United Kingdom
Abstract
This article provides a global overview of the main aspects of current practice in the design, implementation and evaluation of speech recognition components for Spoken Language Dialog Systems (SLDSs) and presents the results of the DISC European project related to speech recognition. DISC and its successor DISC-2 are efforts towards the definition of best practice guidelines for SLDS development and evaluation. SLDSs aim at using natural spoken input for performing an information processing task such as automated standards, call routing or travel planning and reservations. The main functionalities of an SLDS are speech recognition, natural language understanding, dialog management, database access and interpretation, response generation and speech synthesis. Speech recognition, which transforms the acoustic signal into a string of words, is a key technology in any SLDS.

