Results 1 - 3 of 3
Quantization of cepstral parameters for speech recognition over the World Wide Web
- IEEE J. Select. Areas Commun., 1999
Cited by 24 (2 self)
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Internet. We follow a novel encoding paradigm, trying to maximize recognition performance instead of perceptual reproduction, and we find that by transmitting the cepstral coefficients we can achieve significantly higher recognition performance at a fraction of the bit rate required when encoding the speech signal directly. We find that the required bit rate to achieve the recognition performance of high-quality unquantized speech is just 2000 bits per second.
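The bit rate of a quantized cepstral stream is the frame rate multiplied by the bits spent per frame. The sketch below illustrates that arithmetic with a simple uniform scalar quantizer. The 100 frames/s rate, the 2-bits-per-coefficient allocation, the value range, and the function names are all assumptions chosen for illustration; the abstract does not state the paper's actual quantizer or bit allocation.

```python
def bit_rate(frames_per_second, bits_per_coefficient):
    """Bits per second when every frame spends the listed bits on each coefficient."""
    return frames_per_second * sum(bits_per_coefficient)


def uniform_quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer index in [0, 2**bits - 1]."""
    levels = 2 ** bits
    clipped = min(max(value, lo), hi)           # clamp out-of-range inputs
    index = int((clipped - lo) / (hi - lo) * levels)
    return min(index, levels - 1)               # the top edge maps to the last level


# Hypothetical allocation: 10 coefficients at 2 bits each = 20 bits per frame.
ALLOCATION = [2] * 10
FRAME_RATE = 100  # frames per second (assumed, not from the abstract)

rate = bit_rate(FRAME_RATE, ALLOCATION)
frame = [0.0, -0.4, 0.9, 0.1, -1.0, 1.0, 0.3, -0.2, 0.6, -0.7]
indices = [uniform_quantize(c, -1.0, 1.0, b) for c, b in zip(frame, ALLOCATION)]
print(rate, indices)
```

Under these assumed numbers the stream costs 2000 bits per second, which matches the order of magnitude quoted in the abstract without claiming to reproduce its method.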
Automatic Person Recognition by Using Acoustic and Geometric Features
- 1993
Cited by 19 (1 self)
The paper describes a multisensorial person identification system: visual and acoustic cues are used jointly for person identification. A simple approach, based on the fusion of the lists of scores produced independently by a speaker recognition system and a face recognition system, is presented. Experiments are reported which show that integration of visual and acoustic information enhances both performance and reliability of the separate systems. Finally, two network architectures, based on radial basis function theory, are proposed to describe integration at different levels of abstraction.
Keywords: face recognition, speaker identification, classification
1. Introduction
This paper describes an automatic person recognition system which uses both acoustic features, derived from the analysis of a given speech signal, and visual ones, related to distinctive parameters of the face of the person who uttered that speech signal. Visual and acoustic cues are used jointly for person id...
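Fusing two independently produced score lists can be sketched as normalizing each list to a common range and combining them per candidate. The example below uses min-max normalization and a weighted sum; both choices, along with the function names, the weight, and the sample scores, are assumptions for illustration, since the abstract does not say which normalization or combination rule the authors used.

```python
def min_max_normalize(scores):
    """Rescale a {person: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {person: 0.0 for person in scores}
    return {person: (s - lo) / (hi - lo) for person, s in scores.items()}


def fuse_scores(speaker_scores, face_scores, speaker_weight=0.5):
    """Weighted sum of the two normalized score lists (assumed fusion rule)."""
    s = min_max_normalize(speaker_scores)
    f = min_max_normalize(face_scores)
    return {p: speaker_weight * s[p] + (1.0 - speaker_weight) * f[p] for p in s}


# Made-up scores for three enrolled people; higher means a better match.
speaker = {"alice": 0.2, "bob": 0.9, "carol": 0.5}
face = {"alice": 0.8, "bob": 0.6, "carol": 0.1}

fused = fuse_scores(speaker, face)
best = max(fused, key=fused.get)
print(best, fused)
```

Here neither modality alone is decisive for every candidate, but the combined score ranks "bob" first because he scores well in both lists.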
Optimization of a Vector Quantization Codebook for Objective Evaluation of Surgical Skill
Cited by 4 (1 self)
Surgical robotic systems and virtual reality simulators have introduced an unprecedented precision of measurement for both tool-tissue and tool-surgeon interaction, thus holding promise for more objective analyses of surgical skill. Integrative or averaged metrics such as path length, time-to-task, success/failure percentages, etc., have often been employed toward this end, but these fail to address the processes associated with a surgical task as a dynamic phenomenon. Stochastic tools such as Markov modeling using a 'white-box' approach have proven amenable to this type of analysis. While such an approach reveals the internal structure of the surgical task as a process, it requires a task decomposition based on expert knowledge, which may result in a relatively large and complex model. In this work, a 'black-box' approach is developed with generalized cross-procedural applications; the model is characterized by a compact topology, abstract state definitions, and optimized codebook size. Data sets of isolated tasks were extracted from the Blue DRAGON database consisting of 30 surgical subjects stratified into six training levels. Vector quantization (VQ) was employed on the entire database, thus synthesizing a lexicon of discrete, task-independent surgical tool/tissue interactions. VQ successfully established a dictionary of 63 surgical code words and displayed non-temporal skill discrimination. VQ allows for a more cross-procedural analysis without relying on a thorough study of the procedure, links the results of the black-box approach to observable phenomena, and reduces the computational cost of the analysis by discretizing a complex, continuous data space.
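Vector quantization replaces each continuous measurement vector with the index of its nearest code word. A minimal sketch follows, assuming the codebook is trained with plain k-means (Lloyd's algorithm); the abstract does not name the training algorithm, and the function names, the two-dimensional toy samples, and the codebook size of 2 are illustrative assumptions rather than the paper's 63-word dictionary.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the code word closest to the given vector."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], vector))


def train_codebook(vectors, size, iterations=20):
    """Lloyd's algorithm, initialized deterministically from the first `size` vectors."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old code word if no vector was assigned to it
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Toy data: two well-separated groups standing in for continuous sensor readings.
samples = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
codebook = train_codebook(samples, size=2)
symbols = [nearest(codebook, v) for v in samples]
print(symbols)  # [0, 1, 0, 1, 0, 1]
```

The resulting symbol sequence is the discrete representation that a downstream model, such as a Markov model, could consume in place of the raw continuous data.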

