@MISC{Donovan96trainablespeech, author = {R. Donovan}, title = {Trainable Speech Synthesis}, year = {1996} }
Abstract
dressed through improvements to its transcription, clustering, and segmentation capabilities. The LP synthesis scheme was replaced by a TD-PSOLA scheme which synthesised speech by concatenating waveform segments selected to represent each clustered state. The final system produced speech which, though in a monotone, was natural sounding, remarkably fluent, and highly intelligible. The segmental intelligibility was measured using the Modified Rhyme Test, and a 5.0% error rate obtained. The speech produced by the system mimicked the voice of the speaker used to record the training database. The system could be retrained on a new voice in less than 48 hours, and has been successfully trained on four voices.
Acknowledgements
There are a very large number of people to thank in connection with this work. I shall begin at the beginning, by thanking my original supervisor, the late Professor Frank Fallside. To him I am deeply grateful, both for letting me join the CUED Speec