Results 1 - 3 of 3
Automatic Discovery of Non-Compositional Compounds in Parallel Data
1997
Cited by 58 (1 self)
Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine translation (MT), where many word sequences are not translated word-for-word. This paper presents an efficient automatic method for discovering sequences of words that are translated as a unit. The method proceeds by comparing pairs of statistical translation models induced from parallel texts in two languages. It can discover hundreds of noncompositional compounds on each iteration, and constructs longer compounds out of shorter ones. Objective evaluation on a simple machine translation task has shown the method's potential to improve the quality of MT output. The method makes few assumptions about the data, so it can be applied to parallel data other than parallel texts, such as word spellings and pronunciations.
Automatic phonetic baseform determination
In Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 1991
Cited by 12 (0 self)
Phonetic baseforms are the basic recognition units in most large vocabulary speech recognition systems. These baseforms are usually determined by hand once a vocabulary is chosen and not modified thereafter. However, many applications of speech recognition, such as dictation transcription, are hampered by a fixed vocabulary and require the user be able to add new words to the vocabulary. At least one phonetic baseform must be assigned to each new word to properly integrate the word into the recognition system. Dictionary lookup is often unsuccessful in determining a phonetic baseform because new words are often names or task-specific jargon; also, talkers tend to have idiosyncratic pronunciations for a substantial fraction of words. This paper describes a series of experiments in which the phonetic baseform is deduced automatically for new words by utilizing actual utterances of the new word in conjunction with a set of automatically derived spelling-to-sound rules. We evaluated recognition performance on new words spoken by two different talkers when the phonetic baseforms were extracted via the above approach. The error rates on these new words were found to be comparable to or better than when the phonetic baseforms were derived by hand, thus validating the basic approach.
Pronunciation Modeling in Speech Synthesis
1998
Cited by 4 (0 self)
ACKNOWLEDGMENTS: I am very pleased to have had the encouragement and support of a committee of three linguists for whom I have the greatest respect and admiration: Mark Liberman, William Labov and Eugene Buckley. Each of them made my transition back to Penn pleasant after what seemed like a long absence. It was a great pleasure to have Mark Randolph both as an external reader and as a colleague at Motorola. Mark’s work at MIT a decade ago has served as an inspiration to me. Orhan Karaali made this dissertation possible in this millennium. As my manager for over two years at Motorola, Orhan insisted on making my dissertation a priority at work. Harry Bliss provided his voice to this project and our whole group is very grateful for his patience and cooperation. My colleagues at Motorola listened to my ideas and provided technical and theoretical assistance at every turn: Noel

