Quantization of cepstral parameters for speech recognition over the World Wide Web
- IEEE J. Select. Areas Commun., 1999
Cited by 24 (2 self)
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.
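To make the 2000 bits per second figure concrete, the sketch below works out a per-frame bit budget and applies plain uniform scalar quantization to one feature vector. The frame rate (100 frames per second), the number of coefficients (5), the 4 bits per coefficient, the value range, and the sample values are all assumptions chosen for illustration. The abstract does not state them, and this is not the quantization scheme used in the paper.

```python
def bits_per_frame(bit_rate_bps, frames_per_second):
    """Bit budget available for each transmitted feature vector."""
    return bit_rate_bps // frames_per_second


def quantize(value, lo, hi, bits):
    """Index of the nearest of 2**bits uniformly spaced levels in [lo, hi]."""
    levels = 2 ** bits
    clipped = min(max(value, lo), hi)
    step = (hi - lo) / (levels - 1)
    return round((clipped - lo) / step)


def dequantize(index, lo, hi, bits):
    """Reconstruct the level that a quantization index stands for."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + index * step


# Hypothetical setup (not from the paper): 100 frames/s, 5 coefficients,
# 4 bits each, coefficient values assumed to lie in [-1, 1].
FRAME_RATE = 100
BITS_PER_COEFF = 4
LO, HI = -1.0, 1.0
frame = [0.82, -0.31, 0.05, -0.67, 0.40]  # made-up cepstral values

budget = bits_per_frame(2000, FRAME_RATE)
indices = [quantize(c, LO, HI, BITS_PER_COEFF) for c in frame]
decoded = [dequantize(i, LO, HI, BITS_PER_COEFF) for i in indices]
bits_used = len(indices) * BITS_PER_COEFF
```

Under these assumptions the budget is 20 bits per frame, which the five 4-bit indices fill exactly. A real coder would allocate bits unevenly across coefficients or use vector quantization.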
Driving Synthetic Mouth Gestures: Phonetic Recognition for FaceMe!
- In Proceedings of EUROSPEECH’97, 1997
Cited by 4 (0 self)
The goal of this work is to use phonetic recognition to drive a synthetic image with speech. Phonetic units are identified by the phonetic recognition engine and mapped to mouth gestures, known as visemes, the visual counterpart of phonemes. The acoustic waveform and visemes are then sent to a synthetic image player, called FaceMe!, where they are rendered synchronously. This paper provides background for the core technologies involved in this process and describes asynchronous and synchronous prototypes of a combined phonetic recognition/FaceMe! system, which we use to render mouth gestures on an animated face.
1. Introduction
This paper addresses the problem of driving an animated face using audio data. We present a phonetic recognition system as a front-end process to generate visemes, the visual analog of phonemes. The paper provides background for the core phonetic recognition technology, based on Statistical Trajectory Modeling [1], and the core FaceMe! animation technology. It th...
Server-assisted speech recognition over the internet
- In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000
Cited by 1 (1 self)
We propose a new architecture for deploying speech recognition over the Internet. The client performs the recognition, but is assisted by the server, which computes the speech parameters. To demonstrate the architecture, we developed a Java-based Web-navigation system where the precomputed HMM models of the hyperlinked words are stored on the Web page and downloaded by the client. We tested the system on a digit-recognition example. The results show that, with quantization and compression of the speech parameters, good recognition can be achieved in acceptable download and calculation time even on clients with modest connection speeds and computational power.
Speech Recognition Over The Internet Using Java
- IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999
Cited by 1 (0 self)
A speech recognition system based on an Internet client-server model is presented in this paper. A Java applet records the voice at the client computer and sends the recorded speech file over the Internet; the server computer then recognizes the speech and displays the recognized text back to the user. Using this structure, an isolated digit recognition application was realized.
1. INTRODUCTION
With the explosive growth of Internet technology, both speech researchers and computer software engineers have been putting a great deal of effort into integrating speech functions into Internet applications [1-3]. For simple applications, voice playback and voice recording functions may be sufficient. For complex applications, however, speech recognition and synthesis functions are needed. This paper presents a client-server-based speech recognition system. In this system, a user can use a World Wide Web (WWW) browser such as the Netscape Communicator to do speech recognition by visiting a...
TREN - TURKISH SPEECH RECOGNITION PLATFORM
TREN (Turkish Recognition ENgine) is a modular, HMM-based (Hidden Markov Model), speaker-independent speech recognition system whose software architecture is based on the Distributed Component Object Model (DCOM). TREN contains specialized modules that form a fully interoperable platform, including a Turkish speech recognizer, a feature extractor, an end-point detector, and a performance monitoring module. TREN has two layers. The first layer is the central server, which performs some speech signal preprocessing and then distributes recognition calls to the appropriate remote servers according to the current CPU load of their recognition processes. The second layer consists of the remote servers, which perform the critical recognition task. This component-based architecture makes TREN applicable to distributed environments. TREN is trained on a wide variety of very common words that best represent the Turkish language. To obtain such a database, a very large word corpus was collected and the widest span of triphones representing Turkish was examined statistically. TREN has been used to assist speech technologies that require a modular, multithreaded recognizer with dynamic load-sharing facilities.
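The dispatch step described above, in which the central server routes each recognition call according to CPU load, can be sketched as a simple least-loaded selection. This is an illustrative assumption about one way such a policy could look, not TREN's actual DCOM implementation. The server names and load values are hypothetical.

```python
def pick_server(cpu_loads):
    """Return the name of the remote server reporting the lowest CPU load."""
    if not cpu_loads:
        raise ValueError("no remote servers available")
    return min(cpu_loads, key=cpu_loads.get)


# Hypothetical load reports (fraction of CPU in use) from three remote servers.
loads = {"remote-a": 0.72, "remote-b": 0.15, "remote-c": 0.40}

# The central server would forward the next recognition call to this server.
chosen = pick_server(loads)
```

With the sample loads above, the call goes to `remote-b`. A production dispatcher would also need to refresh load reports and handle servers that stop responding.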

