Olga - A Dialogue System With An Animated Talking Agent
In Proceedings of Eurospeech '97, 1997
Cited by 13 (4 self)
Abstract
The object of the Olga project is to develop an interactive 3D animated talking agent. A futuristic application scenario is interactive digital TV, where the Olga agent would guide naive users through the various services available on the network. The current application is a consumer information service for microwave ovens. Olga required the development of a system with components from many different fields: multimodal interfaces, dialogue management, speech recognition, speech synthesis, graphics, animation, facilities for direct manipulation and database handling. To integrate all knowledge sources, Olga is implemented with separate modules communicating with a central dialogue interaction manager. In this paper we mainly describe the talking animated agent and the dialogue manager. There is also a short description of the preliminary speech recogniser used in the project.
Automatic Detection of Mispronunciation in non-native Swedish Speech
1998
Cited by 2 (1 self)
Abstract
This contribution presents part of the work initiated at CTT on the development of speech technology to assist non-native speakers in learning Swedish. This study mainly focuses on the automatic evaluation of mispronunciations at a phonetic level. We describe a new database we have collected for this work. Then we report the reliability of several phonetic scores for automatically locating segmental problems in student utterances.
1 Introduction
One of the greatest problems in the integration of immigrants is the language barrier. Even with a fairly good knowledge of a foreign language, most adults will not lose their foreign accent, which represents a hurdle to social relations and employment. In the last decade, improvements in speech technology have opened new possibilities in interactive language teaching systems [9, 5, 8]. Very recently a growing number of studies have addressed the problem of automatically rating non-native speakers, providing measurements that correlate with human judg...
Creating Unseen Triphones By Phone Concatenation In The Spectral, Cepstral And Formant Domains
Abstract
A technique for predicting triphones by concatenation of diphone or monophone models is studied. The models are connected using linear interpolation between endpoints of piecewise linear parameter trajectories. Three types of spectral representation are compared: formants, filter amplitudes and cepstrum coefficients. The proposed technique lowers the spectral distortion of the phones for all three representations when different speakers are used for training and evaluation. The average error of the created triphones is lower in the filter and cepstrum domains than for formants. This is attributed to limitations in the Analysis-by-Synthesis formant tracking algorithm. A small improvement with the proposed technique is achieved for all representations in the task of reordering N-best sentence recognition candidate lists.
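The joining step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the breakpoint representation (time, value), the function names `concatenate` and `value_at`, the `gap_ms` parameter, and the example numbers are all assumptions made for illustration. Two piecewise linear trajectories are placed end to end, and values inside the junction are obtained by linear interpolation between the last breakpoint of the first model and the first breakpoint of the second.

```python
from typing import List, Tuple

# Hypothetical representation: one (time_ms, parameter_value) pair per breakpoint.
Point = Tuple[float, float]


def concatenate(left: List[Point], right: List[Point], gap_ms: float) -> List[Point]:
    """Join two piecewise linear trajectories.

    The right trajectory is shifted in time so that it starts gap_ms after
    the left one ends. The junction itself is left as a single linear
    segment between the two facing endpoints.
    """
    if not left or not right:
        raise ValueError("both trajectories need at least one breakpoint")
    offset = left[-1][0] + gap_ms - right[0][0]
    return left + [(t + offset, v) for t, v in right]


def value_at(trajectory: List[Point], t: float) -> float:
    """Evaluate a piecewise linear trajectory at time t by linear interpolation."""
    if t <= trajectory[0][0]:
        return trajectory[0][1]
    for (t0, v0), (t1, v1) in zip(trajectory, trajectory[1:]):
        if t <= t1:
            if t1 == t0:  # zero-length segment: take the later value
                return v1
            weight = (t - t0) / (t1 - t0)
            return v0 + weight * (v1 - v0)
    return trajectory[-1][1]


# Made-up example values standing in for one parameter track of two models.
left_model = [(0.0, 500.0), (40.0, 700.0)]
right_model = [(0.0, 1100.0), (30.0, 1200.0)]
joined = concatenate(left_model, right_model, gap_ms=20.0)
midpoint_value = value_at(joined, 50.0)  # halfway through the junction
```

The same two functions apply unchanged to any parameter track, whether it holds a formant frequency, a filter amplitude or a cepstrum coefficient, since only the values differ.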
The Free Speech Journal, Issue 5 (1997)
Abstract
This paper presents new methods for training large neural networks for phoneme probability estimation. An architecture combining time-delay windows and recurrent connections is used to capture the important dynamic information of the speech signal. Because the number of connections in a fully connected recurrent network grows super-linearly with the number of hidden units, schemes for sparse connection and connection pruning are explored. It is found that sparsely connected networks outperform their fully connected counterparts with an equal number of connections. The implementation of the combined architecture and training scheme is described in detail. The networks are evaluated in a hybrid HMM/ANN system for phoneme recognition on the TIMIT database, and for word recognition on the WAXHOLM database. The achieved phone error rate, 27.8%, for the standard 39-phoneme set on the core test set of the TIMIT database is in the range of the lowest reported. All training and simulation software used is made freely available by the author, and detailed information about the software and the training process is given in an Appendix.
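The connection-count argument in this abstract can be made concrete with a small sketch. It is only an illustration under stated assumptions: the abstract does not say how the sparse connections are chosen, so the fixed random fan-in used here, and the names `recurrent_connection_count` and `sparse_mask`, are hypothetical. When every hidden unit feeds every hidden unit, the hidden-to-hidden connection count is the square of the number of units; fixing the number of incoming connections per unit makes it grow linearly instead.

```python
import random
from typing import List


def recurrent_connection_count(hidden_units: int) -> int:
    """Hidden-to-hidden connections when every unit feeds every unit."""
    return hidden_units * hidden_units


def sparse_mask(hidden_units: int, fan_in: int, seed: int = 0) -> List[List[bool]]:
    """Boolean connection mask with exactly fan_in incoming links per unit.

    mask[target][source] is True when source feeds target. Sources are
    drawn at random, which is an assumption of this sketch and not a
    description of the scheme used in the paper.
    """
    if not 0 <= fan_in <= hidden_units:
        raise ValueError("fan_in must lie between 0 and hidden_units")
    rng = random.Random(seed)
    mask = [[False] * hidden_units for _ in range(hidden_units)]
    for target in range(hidden_units):
        for source in rng.sample(range(hidden_units), fan_in):
            mask[target][source] = True
    return mask


full_100 = recurrent_connection_count(100)
full_200 = recurrent_connection_count(200)
mask = sparse_mask(hidden_units=200, fan_in=50)
sparse_200 = sum(sum(row) for row in mask)
```

With these made-up sizes, a sparse network of 200 units and fan-in 50 uses the same number of recurrent connections as a fully connected network of 100 units, which is the kind of equal-budget comparison the abstract refers to.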
Automatic Content-Based Filtering of Television News
2005
Abstract
With the ever-increasing flow of information, the need for computer-automated tools for handling information becomes greater and greater. News shows and other television broadcasts carry vast amounts of information, but generally in forms that are not reachable through conventional information retrieval techniques. During the last couple of years, researchers around the world have given considerable attention to the problem of extracting semantic information from television and re-representing it in a form suitable for automatic indexing and searching. Detailed content information in a television broadcast is typically found in teletext subtitles (if such are available), text embedded in the video image, and in the spoken dialogue. Extracting it involves using techniques associated with signal processing, image analysis, artificial intelligence, speech recognition, etc. A reliable filter system for use in e.g. Scandinavia, where only a few of the television broadcasts are teletext-subtitled, must take advantage of information in all three forms, or modalities. The presented report contains a survey of current research projects in this area and a theoretical design of a modular, multi-modal, content-based television filter, based on findings in the

