Results 1 - 10 of 109,936

Table 3. Comparison of recognition results for monophone and monograph acoustic sub-word models (VERBMOBIL only).

in Context-Dependent Acoustic Modeling Using Graphemes For Large Vocabulary Speech Recognition
by S. Kanthak, H. Ney 2002
"... In PAGE 2: ... 3.1. Context-Independent Sub-Word Models. Table 3 compares recognition results on the German VERBMOBIL corpus for phonetic and graphemic sub-word units. (Table 2. Baseline recognition results using manually designed phonetic pronunciation lexica with variants and state-tying based on context-dependent triphones.)... In PAGE 2: ... the duration of the graphemic HMM state sequences for German sounds like sch or ch is much longer than a single phone and no longer matches the true acoustic lengths. This observation is emphasized by the large number of deletions in Table 3 and the larger number of running graphemes, as can be seen in Table 1. Table 3.... ..."
Cited by 8

Table 1: Sub-word recognition units in text-prompted SV experiments on YOHO

in CAVE - Speaker Verification in Banking and Telecommunications
by David James, Hans-Peter Hutter, Frédéric Bimbot
"... In PAGE 6: ... Varying amounts of enrolment speech were used to train speaker-specific models, ranging from just the data from a single enrolment session of that speaker, to all the data from all four sessions. The particular form of model depended on the SV approach taken; in text-independent mode, a single model was trained for each speaker using many differing utterances; in text-prompted mode, a set of 17 speaker-specific sub-word models, as illustrated in Table 1, was required, to allow... ..."

Table 4: Distributional properties of word and subword units.

in BREF, a Large Vocabulary Spoken Corpus for French
by Lori F. Lamel, Jean-Luc Gauvain, Maxine Eskenazi (LIMSI-CNRS) 1991
"... In PAGE 2: ... This unit has been successfully used for speech recognition and speech synthesis in French [7, 8], in part because French vowels are acoustically relatively stable over time. Counts for the different units are given in Table 4. Of the almost 4.... ..."
Cited by 53

Table 1: Speech recognition performance for name dialing

in A User-Configurable System for Voice Label Recognition
by R. C. Rose, E. Lleida, G. W. Erhart, R. V. Grubbe
"... In PAGE 2: ... The phonemes r_{j,n} ∈ P that are produced as part of the phonetic baseform for class j are taken from the set of 43 vocabulary-independent phones described above. The optimum phonetic baseform is obtained by R_j = argmax_{R ∈ P} P(Y_j | R). (1) Speech recognition performance was measured for several different configurations of a name dialing system and displayed in Table 1. Forty-three subword acoustic models were trained using three-state left-to-right HMMs from the corpus described above.... In PAGE 2: ... Forty-three subword acoustic models were trained using three-state left-to-right HMMs from the corpus described above. Table 1 describes each system in terms of the number of mixtures per state, the size of the speaker-dependent speech recognition vocabulary, and the procedure used for obtaining phonetic baseforms for the vocabulary words. Since each of the 56 speakers in the trial created a separate recognition vocabulary, speech recognition performance was measured individually using a separate lexicon for each speaker and then averaged.... In PAGE 2: ... Since each of the 56 speakers in the trial created a separate recognition vocabulary, speech recognition performance was measured individually using a separate lexicon for each speaker and then averaged. The first row of Table 1 gives recognition performance when the full vocabulary for each speaker is active during recognition and phonetic baseforms were obtained "automatically" from an average of three enrollment utterances per word. A level of 96.... In PAGE 2: ... In Section 4, several techniques are investigated for verifying the presence of a keyword within an utterance by defining a reduced vocabulary of five words per speaker, and considering the remaining words as "out-of-vocabulary." The third row of Table 1 shows that error rate decreased over 60% when using the smaller vocabulary.
A discussion of the word verification results will be given in Section 4.... In PAGE 2: ... A discussion of the word verification results will be given in Section 4. Finally, the last row of Table 1 describes the performance of a system which obtains phonetic baseforms for vocabulary words using the pronunciation engine from the Bell Labs text-to-speech system [10]. A single phonetic expansion was obtained for each word.... ..."
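The baseform selection rule quoted in this snippet, R_j = argmax_{R} P(Y_j | R), picks the candidate phone sequence that best explains a speaker's enrollment audio. A minimal sketch of that argmax, with an invented placeholder scoring function standing in for a real acoustic-model log-likelihood:

```python
# Illustrative sketch (not the paper's code) of baseform selection as an
# argmax over candidate phone sequences. The candidate set, the reference
# pronunciation, and the scoring function are made-up placeholder data.

def select_baseform(candidates, log_likelihood):
    """Return the candidate baseform with the highest score log P(Y | R)."""
    return max(candidates, key=log_likelihood)

# Toy stand-in for an acoustic score: reward position-wise matches against a
# hypothetical phone string for the word "name", penalize length mismatch.
reference = ["n", "ey", "m"]

def toy_log_likelihood(candidate):
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return matches - abs(len(candidate) - len(reference))

candidates = [["n", "ey", "m"], ["n", "ah", "m"], ["m", "ey", "n"]]
best = select_baseform(candidates, toy_log_likelihood)
print(best)  # the candidate scoring highest under the toy model
```

In the system described above, the score would come from running each candidate phone string through the trained subword HMMs against the enrollment utterances, not from a string-matching heuristic.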

Table 1: Recognition results obtained with the DRM-constrained log-area ratios with the two different acoustic estimators.

in Lpc Modeling With Speech Production Constraints
by Sacha Krstulovic
"... In PAGE 3: ... with the same order as the DRM but a different partition of lengths produces a higher modeling error. Speech recognition accuracy: Table 1 shows the word error rates obtained on a medium-vocabulary, speaker-independent speech recognition task [Krs99]. Results indicate that Log Area Ratios (LAR) inheriting the DRM constraints perform better than LAR corresponding to an 8th-order LPC model for both reflection... ..."

Table 2: Examples of indexing terms for different subword units. (Columns: Subword Unit, Indexing Terms)

in Subword-based Approaches for Spoken Document Retrieval
by Kenney Ng, Victor W. Zue
"... In PAGE 13: ... The subwords are derived from clean phonetic transcriptions of the spoken documents. Phone sequence subword units for the phrase "weather forecast" are given in Table 2. For large enough n, we see that cross-word constraints can be captured by these units (e.... In PAGE 15: ... In particular, the threshold had to be increased in order to include the ay and oy phones in the c=20 class set. Examples of some broad class subword units (class c=20, length n=4) are given in Table 2. For the NPR spoken document set, the number of unique broad class subword units (c=20, n=4) derived from clean phonetic transcriptions of the speech is 35265 out of a total of c^n = 20^4 = 160000 possibilities.... In PAGE 17: ...nd words, i.e., syllables [5]. Syllabic units were generated for the speech messages and queries using these rules, treating the message/query as one long phone sequence with no word boundary information. Examples of some syllabic subword units are given in Table 2. For the NPR spoken document set, the number of unique syllable units derived from clean phonetic transcriptions of the speech is 5475.... ..."
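The overlapping phone n-gram units this snippet describes are straightforward to generate. A small sketch, where the phone transcription of "weather forecast" is written from memory and is only illustrative:

```python
# Minimal sketch of overlapping phone n-gram indexing terms for spoken
# document retrieval. The ARPAbet-style transcription below is an
# illustrative guess, not taken from the paper.

def phone_ngrams(phones, n):
    """All overlapping n-gram subword units of a phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

phones = ["w", "eh", "dh", "axr", "f", "ao", "r", "k", "ae", "s", "t"]
terms = phone_ngrams(phones, 3)
print(terms[:2])  # first two trigram indexing terms
```

Note that because the sequence is treated as one long phone string with no word boundary, n-grams such as ("r", "k", "ae") straddle the word boundary, which is how these units capture cross-word constraints.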

Table 7: Hungarian crosslingual speech recognition results with the semi-multilingual source acoustic models.

in Crosslingual
by Zdravko Kačič, Klara Vicsi, Gyorgy Szaszak, Frank Diehl, Jozef Juhar, Slavomir Lihan
"... In PAGE 4: ... Table 7: Hungarian crosslingual speech recognition results with the semi-multilingual source acoustic models. The semi-multilingual source acoustic models performed worse (Table 7) than the monolingual source acoustic models. The probable cause for this decrease in performance was that only context-independent acoustic models were used.... ..."

Table 4. Word accuracy for academic presentation speech acoustic model

in Benchmark Test For Speech Recognition Using The Corpus . . .
by Tatsuya Kawahara, Hiroaki Nanjo, Takahiro Shinozaki, Sadaoki Furui 2003
"... In PAGE 2: ... Thus, we set up three test-sets. The ID list of the test-sets is given in Table 4 and Table 5. 4.... In PAGE 3: ... 5. RECOGNITION RESULTS. Word accuracies for academic presentation speech (test-sets 1 and 2) are given in Table 4 and those for extemporaneous public speech (test-set 3) are in Table 5. For academic presentations, the gender-independent model trained only with speech samples of the same style achieves the best performance on average.... ..."
Cited by 18

Table 1: The corpus partitioning into training, check (held-out), and test sets, used in the large-vocabulary continuous speech (LVCSR) and connected digit (DIGIT) recognition experiments of Section 5 (number of utterances, duration (in hours), and number of subjects are shown).

in Joint Audio-Visual Speech Processing for Recognition and Enhancement
by Gerasimos Potamianos, Chalapathy Neti, Sabine Deligne 2003
"... In PAGE 6: ...i.e., (3), (4), and (5), with the stream exponents set to global values, estimated on the held-out sets of Table 1. Joint stream HMM training is also considered.... In PAGE 6: ... LVCSR results are speaker-independent, whereas DIGIT recognition is multi-speaker. For the bimodal enhancement of audio features, stereo pair data consisting of noisy audio-visual and clean audio observations are available on the training sets of Table 1. For the linear approach of Section 4.... ..."
Cited by 2
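The stream exponents mentioned in this snippet weight the audio and visual streams in a log-linear combination before decoding. A hedged sketch of that fusion rule, with made-up exponent and score values (in the paper the exponents are estimated on held-out data):

```python
# Illustrative sketch of multi-stream decision fusion with a global stream
# exponent: log p(o) = lam_A * log p_A(o_A) + (1 - lam_A) * log p_V(o_V).
# The exponent value and per-stream likelihoods below are invented examples.

import math

def fused_log_score(log_p_audio, log_p_visual, lam_audio):
    """Exponent-weighted combination of audio and visual stream log-scores."""
    return lam_audio * log_p_audio + (1.0 - lam_audio) * log_p_visual

# Example: audio weighted more heavily than video, as is typical in clean audio.
score = fused_log_score(math.log(0.8), math.log(0.4), lam_audio=0.7)
print(score)
```

Raising each stream likelihood to an exponent and multiplying is equivalent to this weighted sum in the log domain, which is why the combination is applied to log-likelihoods during HMM decoding.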

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University