## Segmental minimum Bayes-risk decoding for automatic speech recognition (2003)

Venue: IEEE Transactions on Speech and Audio Processing

Citations: 32 (9 self-citations)

### BibTeX

@ARTICLE{Goel03segmentalminimum,

author = {Vaibhava Goel and Shankar Kumar and William Byrne},

title = {Segmental minimum Bayes-risk decoding for automatic speech recognition},

journal = {IEEE Transactions on Speech and Audio Processing},

year = {2003}

}

### Abstract

Minimum Bayes-risk (MBR) speech recognizers have been shown to yield improvements over conventional maximum a posteriori probability (MAP) decoders through N-best list rescoring and search over word lattices. We present a segmental minimum Bayes-risk (SMBR) decoding framework that simplifies the implementation of MBR recognizers by segmenting the N-best lists or lattices over which recognition is performed. This paper presents the lattice cutting procedures that underlie SMBR decoding. Two of these procedures are based on a risk minimization criterion, while the third is guided by word-level confidence scores. In conjunction with SMBR decoding, these lattice segmentation procedures give consistent improvements in recognition word error rate (WER) on the Switchboard corpus. We also discuss an application of risk-based lattice cutting to multiple-system SMBR decoding and show that it is related to other system combination techniques such as ROVER. This strategy combines lattices produced by multiple ASR systems and is found to give WER improvements in a Switchboard evaluation system.

Index Terms: ASR system combination, extended-ROVER, lattice cutting, minimum Bayes-risk decoding, segmental minimum

### Citations

2755 | Applied Dynamic Programming
- Bellman, Dreyfus
- 1962
Citation Context: ...We first note that if … contained the alignment of only one word string … against …, we could find the desired optimal alignment through a standard dynamic programming procedure [14], [15], [16] that traverses the nodes of … in topologically sorted order and retains backpointers to the optimal partial paths to all nodes. However, since...
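The dynamic programming step quoted above, a traversal of the nodes in topologically sorted order that keeps backpointers to the best partial paths, can be illustrated with a minimal sketch. It finds the lowest-cost path through a small acyclic graph; the arc format `(src, dst, label, cost)` and the function names are hypothetical, and this generic best-path search is not the paper's lattice-to-string alignment procedure.

```python
from collections import defaultdict, deque


def topological_order(num_nodes, arcs):
    """Kahn's algorithm over arcs given as (src, dst, label, cost) tuples."""
    indegree = [0] * num_nodes
    successors = defaultdict(list)
    for src, dst, label, cost in arcs:
        indegree[dst] += 1
        successors[src].append((dst, label, cost))
    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dst, _, _ in successors[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order, successors


def best_path(num_nodes, arcs, start, final):
    """Minimum-cost start-to-final path, recovered through backpointers."""
    order, successors = topological_order(num_nodes, arcs)
    best = [float("inf")] * num_nodes
    back = [None] * num_nodes  # back[n] = (predecessor, label on the arc)
    best[start] = 0.0
    for node in order:
        if best[node] == float("inf"):
            continue  # node is not reachable from start
        for dst, label, cost in successors[node]:
            if best[node] + cost < best[dst]:
                best[dst] = best[node] + cost
                back[dst] = (node, label)
    if best[final] == float("inf"):
        return float("inf"), []
    labels, node = [], final
    while node != start:  # follow backpointers from the final node
        node, label = back[node]
        labels.append(label)
    return best[final], labels[::-1]
```

Because each node is processed only after all of its predecessors, a single pass over the topological order suffices; for example, with arcs `(0, 1, "hi", 0.5)`, `(1, 2, "world", 2.0)` and a direct arc `(0, 2, "hello_world", 4.0)`, the two-arc path wins with cost 2.5.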

396 | Time Warps, String Edits and Macromolecules: the Theory and Practice of Sequence Comparisons
- Sankoff, Kruskal
- 1983
Citation Context: ...We first note that if … contained the alignment of only one word string … against …, we could find the desired optimal alignment through a standard dynamic programming procedure [14], [15], [16] that traverses the nodes of … in topologically sorted order and retains backpointers to the optimal partial paths to all nodes. However, since...

344 | A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
- Fiscus
- 1997
Citation Context: ...procedure segmental MBR voting. This simplification has been utilized in several recently developed N-best list and lattice-based hypothesis selection procedures to improve the recognition word error rate [5], [6], [7]. This summarizes the relationship between SMBR decoding, MAP decoding and segmental MBR voting. From Equation 8, if no lattice cutting had been done, MBR decoding under the 0/1 loss function would...
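A minimal sketch of segmental MBR voting follows, assuming the lattice or N-best lists have already been cut into segments and that each segment comes with posterior probabilities for its candidate words. The dictionary layout, the function name, and the use of the empty string for "no word" are all assumptions of this sketch. Under 0/1 loss within each segment, the minimum-risk choice is simply the highest-posterior candidate, the same kind of per-slot decision that ROVER-style voting makes.

```python
def segmental_vote(segments):
    """segments: list of dicts mapping a candidate word to its posterior
    within that segment; the empty string stands for "no word" (epsilon).
    With 0/1 loss per segment, the MBR choice is the argmax posterior."""
    words = []
    for posteriors in segments:
        best = max(posteriors, key=posteriors.get)
        if best:  # an epsilon winner contributes no word to the output
            words.append(best)
    return words
```

For example, `segmental_vote([{"how": 0.6, "now": 0.4}, {"are": 0.7, "": 0.3}])` returns `["how", "are"]`.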

191 | Finding consensus in speech recognition: word error minimization and other applications of confusion networks
- Mangu, Brill, et al.
- 2000

152 | Weighted finite-state transducers in speech recognition. Computer Speech and Language
- Mohri, Pereira, et al.
- 2002
Citation Context: ...a compact representation for very large N-best lists and their likelihoods. Formally, it is a directed acyclic graph, or an acyclic weighted finite-state acceptor (WFSA) [8] with a finite set of states (nodes), a set of transition labels, an initial state, the set of final states, and a finite set of transitions. The set...
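The WFSA definition in this context (states, transition labels, an initial state, final states, transitions) maps onto a small data structure. The `Lattice` class below is hypothetical: arcs carry negative log probabilities, and `paths()` enumerates every complete path by depth-first search. That enumeration is only practical for toy lattices, which is exactly why a lattice serves as a compact stand-in for a very large N-best list.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Lattice:
    """Acyclic weighted acceptor: one initial state, a set of final states,
    and arcs (next state, word, negative log probability) keyed by source."""
    initial: int
    finals: set
    arcs: dict = field(default_factory=dict)

    def add_arc(self, src, dst, word, prob):
        self.arcs.setdefault(src, []).append((dst, word, -math.log(prob)))

    def paths(self):
        """Yield (word list, probability) for every initial-to-final path."""
        stack = [(self.initial, [], 0.0)]
        while stack:
            state, words, cost = stack.pop()
            if state in self.finals:
                yield words, math.exp(-cost)
            for dst, word, arc_cost in self.arcs.get(state, []):
                stack.append((dst, words + [word], cost + arc_cost))
```

Usage: after `lat = Lattice(initial=0, finals={2})` and arcs `0 -how/0.9-> 1`, `0 -now/0.1-> 1`, `1 -are/1.0-> 2`, `lat.paths()` yields `["how", "are"]` with probability about 0.9 and `["now", "are"]` with about 0.1.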

102 | The design principles of a weighted finite-state transducer library
- Mohri, Pereira, et al.
- 2000
Citation Context: ...To compute the Levenshtein distance, the possible single-symbol edit operations (insertion, deletion, substitution) and their costs can be readily represented by a simple weighted transducer … [13]. … is constructed to respect the position of words in … (see Figure 3). Furthermore, we can reduce the size of this transducer by including only transductions that map words on the transitions...
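As a rough illustration of the edit transducer described above, the sketch lists the arcs of a one-state weighted transducer whose input side is a reference word (or epsilon) and whose output side is a hypothesis word (or epsilon). It is a simplification: the paper's transducer is additionally constructed to respect word positions, which this sketch does not do. The `<eps>` symbol and the function name are hypothetical; restricting the vocabularies to words that actually occur on lattice transitions mirrors the size reduction mentioned in the context.

```python
EPS = "<eps>"  # hypothetical epsilon symbol


def edit_transducer_arcs(ref_vocab, hyp_vocab):
    """Arcs (input, output, cost) of a single-state Levenshtein transducer.
    Matches cost 0; substitutions, deletions and insertions cost 1.
    Limiting the vocabularies to observed words keeps the arc set small."""
    arcs = []
    for a in sorted(ref_vocab):
        for b in sorted(hyp_vocab):
            arcs.append((a, b, 0 if a == b else 1))  # match or substitution
        arcs.append((a, EPS, 1))  # deletion of a reference word
    for b in sorted(hyp_vocab):
        arcs.append((EPS, b, 1))  # insertion of a hypothesis word
    return arcs
```

The arc count grows as |ref_vocab| x |hyp_vocab|, which is why pruning the vocabularies to words seen on the relevant transitions matters in practice.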

96 | Binary codes capable of correcting spurious insertions and deletions of ones. Probl
- Levenshtein
- 1965
Citation Context: ...describes the cost incurred when an utterance … belonging to language … is mistranscribed as …. An example loss function, the one that we focus on in this paper, is Levenshtein distance [1], which measures the minimum string edit distance (word error rate) between … and …. This loss function is defined as the minimum number of substitutions, insertions and deletions needed to transform...
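The Levenshtein loss defined in this context can be computed with the standard dynamic programming recurrence over word sequences. The sketch below is a generic implementation (the function name is hypothetical) that keeps only one row of the table at a time.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, insertions and deletions needed to
    turn the word sequence ref into the word sequence hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute, or match at cost 0
            )
        prev = curr
    return prev[-1]
```

For example, `levenshtein("how are you".split(), "how you".split())` is 1 (one deletion).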

70 | Explicit word error minimization in n-best list rescoring
- Stolcke, Koenig, et al.
- 1997
Citation Context: ...finite-state operations are performed using the AT&T Finite State Toolkit [9]. It is conceptually possible to enumerate all lattice paths and explicitly compute the MBR hypothesis according to Equation 4 [10]. However, for most large-vocabulary ASR systems it is computationally intractable to do so. Goel et al. [11] described an A* search algorithm that utilizes the lattice structure to search for the M...
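The explicit enumeration mentioned above is feasible for a short N-best list. The sketch assumes Equation 4 is the usual MBR decision rule, choosing the hypothesis W' that minimizes the sum over W of l(W, W') P(W | A) with l the Levenshtein distance, and that the N-best posteriors are renormalized over the list. The function names and the (word list, posterior) layout are hypothetical.

```python
def levenshtein(ref, hyp):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def mbr_nbest(nbest):
    """nbest: list of (word list, posterior). Returns the entry with the
    lowest expected Levenshtein loss and that expected loss."""
    total = sum(p for _, p in nbest)  # renormalize over the N-best list
    best, best_risk = None, float("inf")
    for cand, _ in nbest:
        risk = sum(p / total * levenshtein(ref, cand) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk
```

With `[(["a","b","c"], 0.4), (["a","x","c"], 0.35), (["a","x","d"], 0.25)]` the MAP choice is the first entry, but the MBR choice is `["a","x","c"]` with expected loss 0.65 against 0.85, because it sits closest to the bulk of the probability mass. The cost is quadratic in the list size, which is why lattice-based search is needed at scale.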

59 | Posterior probability decoding, confidence estimation and system combination
- Evermann, Woodland
Citation Context: ...segmental MBR voting. This simplification has been utilized in several recently developed N-best list and lattice-based hypothesis selection procedures to improve the recognition word error rate [5], [6], [7]. This summarizes the relationship between SMBR decoding, MAP decoding and segmental MBR voting. From Equation 8, if no lattice cutting had been done, MBR decoding under the 0/1 loss function would...

58 | Minimum Bayes-risk automatic speech recognition
- Goel, Byrne
- 2000
Citation Context: ...enumerate all lattice paths and explicitly compute the MBR hypothesis according to Equation 4 [10]. However, for most large-vocabulary ASR systems it is computationally intractable to do so. Goel et al. [11] described an A* search algorithm that utilizes the lattice structure to search for the MBR word string. Building on that approach, we present lattice node-based segmentation procedures in which each...

49 | Using Word Probabilities as Confidence Measures
- Wessel, Macherey, et al.
- 1998
Citation Context: ...In this case, the marginal probability of … paths that pass through … will be obtained by summing over all lattice …. This is the well-known lattice forward-backward probability of [19]. Having obtained …, the SMBR hypothesis can be computed using the A* search procedure described by Goel et al. [11]. Alternatively, an N-best list can be generated from each segment an...
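The lattice forward-backward probability referred to above can be sketched in the log domain. The sketch assumes a single start and a single final state, a trimmed lattice (every state lies on some start-to-final path), a supplied topological order, and arcs given as `(src, dst, word, log_weight)` with combined acoustic and language model scores; all names are hypothetical. The posterior of an arc is the forward score of its source, times its own weight, times the backward score of its destination, divided by the total weight of all paths.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def link_posteriors(arcs, order, start, final):
    """Posterior probability of each arc, in the same order as arcs."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, dst, _, lw in arcs:
        outgoing[src].append((dst, lw))
        incoming[dst].append((src, lw))
    alpha = {start: 0.0}  # log forward scores
    for s in order:
        if s != start:
            alpha[s] = logsumexp([alpha[p] + lw for p, lw in incoming[s]])
    beta = {final: 0.0}  # log backward scores
    for s in reversed(order):
        if s != final:
            beta[s] = logsumexp([lw + beta[n] for n, lw in outgoing[s]])
    total = alpha[final]  # log of the summed weight of all complete paths
    return [math.exp(alpha[src] + lw + beta[dst] - total)
            for src, dst, _, lw in arcs]
```

With unnormalized path weights "how are" = 3, "now are" = 1 and "wow" = 1, the arc posteriors come out as 0.6, 0.2, 0.2 for the first words and 0.8 for "are", since "are" lies on two of the three paths.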

30 | Towards language independent acoustic modeling
- Beyerlein
- 1999
Citation Context: ...1) N-best List Combination: The experiments involving combination of N-best lists from multiple systems were performed on a multilingual, language-independent acoustic modeling task [23]. This task consisted of combining Czech recognition outputs from three systems: a triphone system trained on one hour of the Czech Voice of America (Cz-VOA) database (Sys1); a triphone syste...

28 | General-purpose finite-state machine software tools
- Mohri, Pereira, et al.
- 1997

24 | Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
- Tsakalidis, Doumpiotis, et al.
- 2005
Citation Context: ...MMI acoustic models were used to generate an initial set of lattices under the SRI 33K trigram language model [20]. These lattices were then rescored with DLLT acoustic models and DSAT acoustic models [26] to yield two other sets of lattices. These three sets of lattices were then used for system combination as described in Section V. The performance of the lattice combination experiments is reported...

21 | Explicit word error minimization using word hypothesis posterior probabilities
- Wessel, Schluter, et al.
- 2001
Citation Context: ...the lattice from Figure 7 (Period = 2). … sentence hypotheses that avoids computing the alignment corresponding to the exact Levenshtein distance [18]. As before, we begin by identifying the MAP lattice path. We compute the confidence score of the lattice link … on that path as follows: 1) Compute the lattice forward-backward probability...
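The context breaks off after step 1, so the remaining steps are not recoverable here. As an assumption, the sketch below uses one common time-overlap approximation in the spirit of [18]: once link posteriors are available from the forward-backward pass, the confidence of a link on the MAP path is the summed posterior of all links that carry the same word and overlap it in time. The tuple layout, the function name and the clipping at 1.0 are choices of this sketch, not details taken from the paper.

```python
def overlap_confidence(target, links):
    """target and links are (word, start_time, end_time, posterior) tuples.
    Sums the posteriors of links with the same word whose time span overlaps
    the target's span (the target link itself included)."""
    word, start, end, _ = target
    conf = 0.0
    for w, s, e, p in links:
        if w == word and s < end and start < e:  # open-interval overlap test
            conf += p
    # Two same-word links can both overlap the target while lying on one
    # path, so the sum can exceed 1; clip it to remain a probability.
    return min(conf, 1.0)
```

For a target `("how", 0.0, 0.5, 0.6)`, another `"how"` link spanning 0.1 to 0.55 with posterior 0.2 adds to its confidence (0.8 in total), while a `"now"` link over the same span or a `"how"` link starting at 0.6 does not.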

12 | Segmental minimum Bayes-risk ASR voting strategies
- Goel, Kumar, et al.
- 2000
Citation Context: ...the original search problem into a series of search problems, which, due to their smaller sizes, can be more easily solved. These strategies are collectively referred to as segmental MBR recognition [3]. For rigor, we introduce a segmentation rule … into … segments of zero or more words each. We denote the … this way, we impose a segmentation of the space … into segment sets...
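The idea of splitting one large search into a series of smaller ones can be sketched as follows, assuming each segment already has its own candidate list with posteriors (for instance an N-best list generated from one lattice segment). Each segment's MBR problem under Levenshtein loss is solved independently and the winners are concatenated. The data layout and function names are hypothetical.

```python
def levenshtein(ref, hyp):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def segment_mbr(candidates):
    """candidates: list of (word tuple, posterior) for a single segment.
    Returns the candidate with the lowest expected edit distance."""
    def risk(cand):
        return sum(p * levenshtein(ref, cand) for ref, p in candidates)
    return min((cand for cand, _ in candidates), key=risk)


def segmental_mbr(segments):
    """Solve each segment's MBR problem on its own and concatenate."""
    words = []
    for candidates in segments:
        words.extend(segment_mbr(candidates))
    return words
```

Each segment only compares its own few candidates, so the cost grows with the sum of per-segment list sizes rather than with the number of whole-utterance hypotheses.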

9 | Confidence based lattice segmentation and minimum Bayes-risk decoding
- Goel, Kumar, et al.
- 2001
Citation Context: ...risk-based lattice cutting on the lattice from Figure 7. [Section C: Cut Set Selection Based on Word Confidence] Our next procedure to identify good lattice cutting node sets uses word-level confidence scores [17]. In this procedure word boundary times are used to derive alignment between sentence hypotheses...

8 | Risk-based lattice cutting for segmental minimum Bayes-risk decoding
- Kumar, Byrne
- 2002
Citation Context: ...on Total Risk. Our first lattice cutting procedure is motivated by the observation that under an ideal segmentation the conditional risk of each hypothesis word string is unchanged after segmentation [12]. The conditional risk after the segmentation is computed under the marginal distribution of Equation 6. Consequently, the total conditional risk of all lattice hypotheses...
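The observation behind risk-based cutting can be checked numerically on a toy hypothesis set. The sketch assumes every hypothesis is already split into the same number of segments, computes each hypothesis's conditional risk once with the exact Levenshtein distance on whole word strings, and once as a sum of per-segment edit distances under the segment marginals. Because the exact total-risk expression is garbled above, reading "unchanged risk" as equality of these two lists is an assumption of this sketch, and the names are hypothetical. A segmentation that forces misaligned words into different segments inflates the segmented risk.

```python
from collections import defaultdict


def levenshtein(ref, hyp):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def conditional_risks(hyps):
    """hyps: list of (segments, posterior), segments being a tuple of
    per-segment word tuples. Returns (risks before, risks after) cutting."""
    n_seg = len(hyps[0][0])
    marginals = [defaultdict(float) for _ in range(n_seg)]
    for segs, p in hyps:
        for i, seg in enumerate(segs):
            marginals[i][seg] += p  # marginal of segment i's word string
    before, after = [], []
    for cand, _ in hyps:
        flat_cand = sum(cand, ())
        # Exact risk: edit distance between whole concatenated word strings.
        before.append(sum(p * levenshtein(sum(ref, ()), flat_cand)
                          for ref, p in hyps))
        # Segmented risk: per-segment edit distances under the marginals.
        after.append(sum(q * levenshtein(seg, cand[i])
                         for i in range(n_seg)
                         for seg, q in marginals[i].items()))
    return before, after
```

For the strings "a b c" and "a x c" at 0.5 each, cutting after "a" leaves both risks at 0.5, while cutting one hypothesis after "a b" and the other after "a" raises both to 1.0.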

8 | Edit-Distance of Weighted Automata
- Mohri
- 2003
Citation Context: ...We first note that if … contained the alignment of only one word string … against …, we could find the desired optimal alignment through a standard dynamic programming procedure [14], [15], [16] that traverses the nodes of … in topologically sorted order and retains backpointers to the optimal partial paths to all nodes. However, since...

6 | The JHU March 2001 Hub-5 Conversational Speech Transcription System
- Byrne
- 2001
Citation Context: ...version of the SRI 33K trigram language model and then again using SAT acoustic models with unsupervised MLLR on the test set. Details of the system are given in the JHU 2001 LVCSR Hub5 system description [22]. Lattices were segmented using the three procedures described in this article: risk-based lattice cutting (Section III-B.5), periodic risk-based lattice cutting (Section III-B.6), and confidence-based...

2 | Recognizer output voting and DMC in minimum Bayes-risk framework
- Goel, Byrne
- 2000
Citation Context: ...distributes over the segmentation, i.e. … (7). Under this assumption, we can now state the following proposition [4]. Proposition. An utterance-level MBR recognizer given by … can be implemented as a concatenation where...
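The equations of the proposition quoted above were lost in extraction. Assuming the usual notation (hypotheses $W$, acoustics $A$, loss $l$, segment sets $\mathcal{W}_i$ with marginal posteriors $P_i$), the argument can be written as below; the symbols are this sketch's choice and not necessarily the paper's.

```latex
\begin{align}
l(W, W') &= \sum_{i=1}^{N} l(W_i, W'_i) \\
\sum_{W \in \mathcal{W}} l(W, W')\, P(W \mid A)
  &= \sum_{i=1}^{N} \sum_{W_i \in \mathcal{W}_i} l(W_i, W'_i)\, P_i(W_i \mid A) \\
\hat{W}_i &= \arg\min_{W' \in \mathcal{W}_i}
  \sum_{W \in \mathcal{W}_i} l(W, W')\, P_i(W \mid A),
  \qquad \hat{W} = \hat{W}_1 \cdots \hat{W}_N
\end{align}
```

The middle line is linearity of expectation: summing $P(W \mid A)$ over all hypotheses that share the same $i$-th segment string gives that string's marginal $P_i$. Once the expected loss splits into independent per-segment terms and $W'$ ranges over $\mathcal{W}_1 \times \cdots \times \mathcal{W}_N$, each term can be minimized separately, and the utterance-level MBR hypothesis is the concatenation of the segment winners.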

2 | The JHU 2002 large vocabulary speech recognition system
- Byrne, Doumpiotis, et al.
- 2002
Citation Context: ...and their SMBR decoding were carried out on the development set of the LVCSR RT-02 evaluation. A description of the acoustic and language models used is given in the JHU LVCSR RT-02 system description [25]. In this system, MMI acoustic models were used to generate an initial set of lattices under the SRI 33K trigram language model [20]. These lattices were then rescored with DLLT acoustic models and DSAT acoustic models...

2 | Lattice segmentation and minimum Bayes-risk discriminative training
- Doumpiotis, Tsakalidis, et al.
- 2003
Citation Context: ...consistent improvements as the final stage of an LVCSR evaluation system. In addition, the risk-based cutting procedure has been shown to form the basis for novel discriminative training procedures [27]. We note that the two cutting procedures give similar WER performance, although the risk-based cutting procedure is more suited to system combination since it does not rely on word boundary times, which...