## Discounted likelihood linear regression for rapid speaker adaptation (2001)

### Download Links

- [www.clsp.jhu.edu]
- [svr-www.eng.cam.ac.uk]
- [mi.eng.cam.ac.uk]
- DBLP

### Other Repositories/Bibliography

Venue: Computer Speech & Language

Citations: 17 (2 self)

### BibTeX

```bibtex
@ARTICLE{Byrne01discountedlikelihood,
  author  = {William Byrne and Asela Gunawardana},
  title   = {Discounted likelihood linear regression for rapid speaker adaptation},
  journal = {Computer Speech \& Language},
  year    = {2001},
  pages   = {15--38}
}
```


### Abstract

Rapid adaptation schemes that employ the EM algorithm may suffer from overtraining problems when used with small amounts of adaptation data. An algorithm to alleviate this problem is derived within the information geometric framework of Csiszár and Tusnády, and is used to improve MLLR adaptation on NAB and Switchboard adaptation tasks. It is shown how this algorithm approximately optimizes a discounted likelihood criterion.
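The abstract's discounting idea can be sketched numerically. The snippet below is a minimal toy, not the paper's MLLR formulation: it runs EM on a tiny two-component Gaussian mixture, but blends each E-step's sufficient statistics with those of the previous iterate so that a handful of adaptation samples cannot pull the parameters arbitrarily far from the speaker-independent starting point. All names (`e_step`, `adapt`, `alpha`) and the 1-D fixed-variance mixture are illustrative assumptions.

```python
import numpy as np

def e_step(y, means, var=1.0, weights=(0.5, 0.5)):
    """Posterior responsibility of each component for each sample."""
    lik = np.stack([w * np.exp(-0.5 * (y - m) ** 2 / var)
                    for m, w in zip(means, weights)])
    return lik / lik.sum(axis=0)

def adapt(y, si_means, alpha=0.3, iters=20):
    """EM adaptation of component means with interpolated (discounted) stats.

    alpha in (0, 1] controls how much the new E-step statistics replace
    the previous ones; alpha = 1 recovers plain EM.
    """
    means = np.array(si_means, float)
    # Sufficient statistics: per-component soft counts and weighted sums.
    g = e_step(y, means)
    counts, sums = g.sum(axis=1), g @ y
    for _ in range(iters):
        g = e_step(y, means)
        counts = alpha * g.sum(axis=1) + (1 - alpha) * counts
        sums = alpha * (g @ y) + (1 - alpha) * sums
        means = sums / counts  # M-step on the interpolated statistics
    return means

# A deliberately tiny "adaptation set": six samples for two components.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-1.5, 1, 3), rng.normal(1.5, 1, 3)])
print(adapt(y, si_means=(-1.0, 1.0)))
```

Setting `alpha = 1` recovers plain EM; smaller values discount the contribution of the new statistics, which is the kind of overtraining safeguard for small data sets that the abstract describes.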

### Citations

8167 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...s. Maximum a posteriori (MAP) adaptation schemes [5, 6] can also be viewed as re-estimating adapted model parameters from speaker independent parameters. These adaptation schemes use the EM algorithm [7]. This well-known maximum likelihood iterative parameter estimation procedure may provide unreliable estimates when the amount of adaptation data is small. As an instance of this we present rapid adap...

950 | The EM Algorithm and Extensions
- McLachlan, Krishnan
- 1997
Citation Context: ... solutions for training sets of all sizes. Moment interpolation can be incorporated into a MAP adaptation procedure to make it more robust in the small data case. An EM based MAP estimation procedure [13, 14] can be formulated as alternating minimization under the penalized divergence $D_{\mathrm{MAP}}(P_X; \theta) = D(P_X; \theta) - \log \pi(\theta)$, where $\pi(\theta)$ is the prior. Since the penalty term depends only on the parameter $\theta$, it does not aff...

496 | Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains
- Gauvain, Lee
- 1994
Citation Context: ...its iterates improve discounted likelihood criteria. 5.2. Moment Interpolation and MAP Although there is a superficial similarity between the reestimation formulas of the moment interpolation and MAP [5, 13, 6], the procedures differ in that the moment interpolation algorithm presented here optimizes a maximum likelihood criterion rather than a maximum a posteriori criterion. In fact, the moment interpolati...

136 | Information geometry and alternating minimization procedures. Statistics and Decisions
- Csiszár, Tusnády
- 1984
Citation Context: ...ection 3 the moment interpolation variant of the EM algorithm. Our analysis of this variant makes use of the information geometric interpretation of the EM algorithm, presented by Csiszár and Tusnády [8] and reviewed in Section 2. In Section 5, it is shown that it approximately optimizes a discounted likelihood criterion [9] intended for use in estimation from small data sets. This is a general modif...

89 | Fast Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures
- Digalakis, Rtischev, et al.
- 1995
Citation Context: ...these adaptation techniques can be thought of as estimating constrained acoustic model parameters. For example, maximum likelihood linear regression (MLLR) [1] re-estimates constrained Gaussian means [2]. In fact, many adaptation schemes such as those of Padmanabhan et al. [3] and of McDonough [4] which nominally transform the acoustic data to better match the SI models can be reformulated as transfor...

58 | Speaker adaptation of continuous density HMMs using multivariate linear regression
- Leggetter, Woodland
- 1994
Citation Context: ...rameterizations of the acoustic models, and these adaptation techniques can be thought of as estimating constrained acoustic model parameters. For example, maximum likelihood linear regression (MLLR) [1] re-estimates constrained Gaussian means [2]. In fact, many adaptation schemes such as those of Padmanabhan et al. [3] and of McDonough [4] which nominally transform the acoustic data to better match t...

37 | Bayesian Learning for Hidden Markov Model with Gaussian Mixture State Observation Densities
- Gauvain
- 1992
Citation Context: ...better match the SI models can be reformulated as transformations of the acoustic models, and as such can also be treated as model re-estimation schemes. Maximum a posteriori (MAP) adaptation schemes [5, 6] can also be viewed as re-estimating adapted model parameters from speaker independent parameters. These adaptation schemes use the EM algorithm [7]. This well-known maximum likelihood iterative param...

36 | Speaker clustering and transformation for speaker adaptation in speech recognition systems
- Padmanabhan, Bahl, et al.
- 1998
Citation Context: ...coustic model parameters. For example, maximum likelihood linear regression (MLLR) [1] re-estimates constrained Gaussian means [2]. In fact, many adaptation schemes such as those of Padmanabhan et al. [3] and of McDonough [4] which nominally transform the acoustic data to better match the SI models can be reformulated as transformations of the acoustic models, and as such can also be treated as model ...

29 | Structural MAP speaker adaptation using hierarchical priors
- Shinoda, Lee
- 1997
Citation Context: ...better match the SI models can be reformulated as transformations of the acoustic models, and as such can also be treated as model re-estimation schemes. Maximum a posteriori (MAP) adaptation schemes [5, 6] can also be viewed as re-estimating adapted model parameters from speaker independent parameters. These adaptation schemes use the EM algorithm [7]. This well-known maximum likelihood iterative param...

15 | Speaker normalization with all-pass transforms
- McDonough, Byrne, et al.
- 1998
Citation Context: ...ers. For example, maximum likelihood linear regression (MLLR) [1] re-estimates constrained Gaussian means [2]. In fact, many adaptation schemes such as those of Padmanabhan et al. [3] and of McDonough [4] which nominally transform the acoustic data to better match the SI models can be reformulated as transformations of the acoustic models, and as such can also be treated as model re-estimation schemes...

8 | Generalization and maximum likelihood from small data sets
- Byrne
- 1993
Citation Context: ...ometric interpretation of the EM algorithm, presented by Csiszár and Tusnády [8] and reviewed in Section 2. In Section 5, it is shown that it approximately optimizes a discounted likelihood criterion [9] intended for use in estimation from small data sets. This is a general modification to EM that can be used in many applications. In particular, it can be used to alleviate overtraining in the adaptat...

6 | Rapid Speech Recognizer Adaptation to New Speakers
- Digalakis, Berkowitz, et al.
- 1999
Citation Context: ...hion, and the adapted models were tested on data that did not overlap with the adaptation data according to the protocol defined in the Rapid Speaker Adaptation project at the 1999 JHU LVCSR workshop [12]. This protocol determines how well transformations learned on the small adaptation data generalize to unseen data, and also provides a test set large enough to obtain reliable measurements of perform...

4 | Convergence of EM variants
- Gunawardana, Byrne
- 1999
Citation Context: ... Figure 2. By using the convexity of the divergence, it can be shown that this choice of $P_X^{(p+1)}$ guarantees that $D(P_X^{(p+1)}; \theta^{(p)}) \le D(P_X^{(p)}; \theta^{(p)})$, with equality only if $P_X^{(p)} = Q_{X|Y=\hat{y}; \theta^{(p)}}$ [10]. This can be seen to be a GAM procedure as described by Gunawardana and Byrne [10, 11]; the algorithm converges to the same p...