Polynomial Splines and Their Tensor Products in Extended Linear Modeling
Ann. Statist., 1997
Cited by 121 (14 self)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publicly available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
Hazard Regression
Journal of the American Statistical Association, 1995
Cited by 70 (15 self)
An automatic procedure that uses linear splines and their tensor products is proposed for fitting a regression model to data involving a polychotomous response variable and one or more predictors. The fitted model can be used for multiple classification. The automatic fitting procedure involves maximum likelihood estimation, stepwise addition, stepwise deletion, and model selection by AIC, cross-validation or an independent test set. A modified version of the algorithm has been constructed that is applicable to large data sets, and it is illustrated using a phoneme recognition data set with 250,000 cases, 45 classes and 63 predictors.
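As a rough illustration of the "linear splines and their tensor products" mentioned in this abstract, the sketch below builds a truncated-linear basis for one predictor and forms pairwise products for a two-predictor interaction. The truncated power form `(x - knot)_+`, the function names, and the knot values are assumptions made for illustration; the abstract does not say which basis representation the authors' software uses.

```python
def hinge(x, knot):
    """Truncated linear function (x - knot)_+ ; zero to the left of the knot."""
    return max(0.0, x - knot)


def linear_spline_basis(x, knots):
    """Basis values [x, (x - k1)_+, ..., (x - kK)_+] for a single predictor.

    Assumption: truncated power basis without an intercept column.
    """
    return [x] + [hinge(x, k) for k in knots]


def tensor_product_basis(x1, knots1, x2, knots2):
    """All pairwise products of two univariate bases (interaction terms)."""
    b1 = linear_spline_basis(x1, knots1)
    b2 = linear_spline_basis(x2, knots2)
    return [u * v for u in b1 for v in b2]


if __name__ == "__main__":
    # Hypothetical knots and inputs, chosen only to show the shapes involved.
    print(linear_spline_basis(2.0, [1.0, 3.0]))
    print(tensor_product_basis(2.0, [1.0], 0.5, [0.0]))
```

A stepwise procedure of the kind described would add or delete individual columns of such a basis; that selection logic is not shown here.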
Transcription Of Broadcast News
LIMSI Nov96 Hub4 System," Proc. ARPA Speech Recognition Workshop, 1997
Cited by 21 (16 self)
In this paper we report on our recent work in transcribing broadcast news shows. Radio and television broadcasts contain signal segments of various linguistic and acoustic natures. The shows contain both prepared and spontaneous speech. The signal may be studio quality or have been transmitted over a telephone or other noisy channel (i.e., corrupted by additive noise and nonlinear distortions), or may contain speech over music. Transcription of this type of data poses challenges in dealing with the continuous stream of data under varying conditions. Our approach to this problem is to segment the data into a set of categories, which are then processed with category-specific acoustic models. We describe our 65k speech recognizer and experiments using different sets of acoustic models for transcription of broadcast news data. The use of prior knowledge of the segment boundaries and types is shown to not crucially affect the performance. 1. INTRODUCTION The goal of this research is to au...
The LIMSI ARISE System
1998
Cited by 14 (0 self)
The LIMSI ARISE system provides vocal access by telephone to rail travel information for main French intercity connections, including timetables, simulated fares and reservations, reductions and services. Our goal is to obtain high dialog success rates with a very open interaction, where the user is free to ask any question or to provide any information at any point in time. In order to improve performance with such an open dialog strategy, we make use of implicit confirmation using the caller's wording (when possible), and change to a more constrained dialog level when the dialog is not going well.
Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
2001
Cited by 14 (3 self)
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. The system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes the pronunciation, spelling and meaning of each. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
The LIMSI 1997 Hub-4E Transcription System
Cited by 7 (6 self)
In this paper we report on the LIMSI system used in the Nov'97 Hub-4E benchmark test on transcription of American English broadcast news shows. There are two main differences from the LIMSI system developed for the Nov'96 evaluation. The first concerns the preprocessing stages for partitioning the data, and the second concerns a reduction in the number of acoustic model sets used to deal with the various acoustic signal characteristics. The LIMSI system for the November 1997 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995 and 1996 official Hub-4E training data containing about 80 hours of transcribed speech material. The 65K word trigram language models are trained on 155 million words of newspaper texts and 132 million words of broadcast news transcriptions. The test data is segmented and labeled using Gaussian mixture models, and non-speech segments are rejected. The speech segments ar...
Speech-to-text conversion in French
1994
Cited by 6 (6 self)
Speech-to-text conversion of French necessitates that both the acoustic level recognition and language modeling be tailored to the French language. Work in this area was initiated at LIMSI over 10 years ago. In this paper a summary of the ongoing research in this direction is presented. Included are studies on distributional properties of French text materials; problems of speech-to-text conversion particular to French; studies in phoneme-to-grapheme conversion, for continuous, error-free phonemic strings; past work on isolated-word speech-to-text conversion; and more recent work on continuous-speech speech-to-text conversion. Also demonstrated is the use of phone recognition for both language and speaker identification. The
Large Vocabulary Continuous Speech Recognition: from Laboratory Systems towards Real-World Applications
1996
Cited by 6 (4 self)
This paper provides an overview of the state-of-the-art in laboratory speaker-independent, large vocabulary continuous speech recognition (LVCSR) systems with a view towards adapting such technology to the requirements of real-world applications. While in speech recognition the principal concern is to transcribe the speech signal as a sequence of words, the same core technology can be applied to domains other than dictation. The main topics addressed are acoustic-phonetic modeling, lexical representation, language modeling, decoding and model adaptation. After a brief summary of experimental results, some directions towards usable systems are given. In moving from laboratory systems towards real-world applications, different constraints arise which influence the system design. The application imposes limitations on computational resources, constraints on signal capture, requirements for noise and channel compensation, and rejection capability. The difficulties and costs of adapting existing technology to new languages and applications need to be assessed. Near-term applications for LVCSR technology are likely to grow in somewhat limited domains such as spoken language systems for information retrieval, and limited domain dictation. Perspectives on some unresolved problems are given, indicating areas for future research.
Text Normalization And Speech Recognition In French
Proc. ESCA Eurospeech'97, 1997
Cited by 6 (5 self)
In this paper we present a quantitative investigation into the impact of text normalization on lexica and language models for speech recognition in French. The text normalization process defines what is considered to be a word by the recognition system. Depending on this definition we can measure different lexical coverages and language model perplexities, both of which are closely related to the speech recognition accuracies obtained on read newspaper texts. Different text normalizations of up to 185M words of newspaper texts are presented along with corresponding lexical coverage and perplexity measures. Some normalizations were found to be necessary to achieve good lexical coverage, while others were more or less equivalent in this regard. The choice of normalization to create language models for use in the recognition experiments with read newspaper texts was based on these findings. Our best system configuration obtained an 11.2% word error rate in the AUPELF 'French-speaking' spee...
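To make the dependence of lexical coverage on the word definition concrete, here is a minimal sketch that measures coverage as the fraction of running tokens found in a lexicon, before and after one normalization step. The elision-splitting rule, the function names, and the toy lexicon are hypothetical and are not taken from the paper; they only show how changing what counts as a word changes the measured coverage.

```python
def lexical_coverage(tokens, lexicon):
    """Fraction of running tokens that appear in the lexicon (0.0 if no tokens)."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in lexicon)
    return covered / len(tokens)


def split_elisions(tokens):
    """Hypothetical normalization: split "l'ami" into "l'" and "ami".

    Tokens without an internal apostrophe are left unchanged.
    """
    out = []
    for t in tokens:
        head, sep, tail = t.partition("'")
        if sep and head and tail:
            out.extend([head + sep, tail])
        else:
            out.append(t)
    return out


if __name__ == "__main__":
    # Toy lexicon and text, invented for illustration only.
    lexicon = {"l'", "ami", "arrive"}
    raw = ["l'ami", "arrive"]
    print(lexical_coverage(raw, lexicon))
    print(lexical_coverage(split_elisions(raw), lexicon))
```

The out-of-vocabulary rate is one minus this coverage figure, so a normalization that raises coverage lowers the share of words the recognizer cannot output at all.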
Transcribing Broadcast News Shows
In Proc. ICASSP'97, 1997
Cited by 4 (1 self)
While significant improvements have been made over the last 5 years in large vocabulary continuous speech recognition of large read-speech corpora such as the ARPA Wall Street Journal-based CSR corpus (WSJ) for American English and the BREF corpus for French, these tasks remain relatively artificial. In this paper we report on our development work in moving from laboratory read speech data to real-world speech data in order to build a system for the new ARPA broadcast news transcription task. The LIMSI Nov96 speech recognizer makes use of continuous density HMMs with Gaussian mixture for acoustic modeling and n-gram statistics estimated on newspaper texts. The acoustic models are trained on the WSJ0/WSJ1, and adapted using MAP estimation with task-specific training data. The overall word error on the Nov96 partitioned evaluation test was 27.1%. INTRODUCTION Over the last 5 years significant advances have been made in large vocabulary, continuous speech recognition, which has been a...
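For readers unfamiliar with the word error figure quoted in this abstract, the sketch below computes a word error rate in the usual way: the minimum number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The function name and the example sentences are illustrative assumptions; official evaluations also apply scoring-specific text normalization that is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented sentences; one missing word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.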

