Predictive Translation Memory: A mixed-initiative system for human language translation
Venue: | In UIST |
Citations: | 1 - 1 self |
BibTeX
@INPROCEEDINGS{Green_2014c.predictive,
author = {Spence Green and Jason Chuang and Jeffrey Heer and Christopher D Manning},
title = {Predictive Translation Memory: A mixed-initiative system for human language translation},
booktitle = {In UIST},
year = {}
}
Abstract
The standard approach to computer-aided language translation is post-editing: a machine generates a single translation that a human translator corrects. Recent studies have shown this simple technique to be surprisingly effective, yet it underutilizes the complementary strengths of precision-oriented humans and recall-oriented machines. We present Predictive Translation Memory, an interactive, mixed-initiative system for human language translation. Translators build translations incrementally by considering machine suggestions that update according to the user's current partial translation. In a large-scale study, we find that professional translators are slightly slower in the interactive mode yet produce slightly higher quality translations despite significant prior experience with the baseline post-editing condition. Our analysis identifies significant predictors of time and quality, and also characterizes interactive aid usage. Subjects entered over 99% of characters via interactive aids, a significantly higher fraction than that shown in previous work.

Author Keywords
Language translation; interface design; mixed-initiative; empirical study.

ACM Classification Keywords
H.5.2 Information Interfaces: User Interfaces; I.2.7 Natural Language Processing: Machine Translation

Language translation has all the makings of a mixed-initiative task

We present Predictive Translation Memory (PTM), an interactive, mixed-initiative system for language translation. Translation memory is a standard term that refers to a set of bilingual string-string mappings usually consulted via text queries. Our system can be seen as an intelligent translation memory that interactively suggests translations based on user activity.
The interface provides source (input language) term lookups, local target (output language) suggestions at the point of text entry

If a principal problem in the design of interactive knowledge-based systems is the transfer of expertise from human to machine

To test the system we conducted the largest published interactive MT user study to date. We hired 32 professional French-English and English-German translators, all of whom were regular users of existing computer-aided translation (CAT) tools. We compared our system to post-editing, which is a strong baseline

RELATED WORK
The idea of a "human-machine" partnership for language translation (a mixed-initiative design) was proposed as early as 1960

Theorized Interactive MT Systems
Bisbey and Kay

In a survey of qualitative studies, Church and Hovy [14] concluded that users regarded post-editing as "an extremely boring, tedious, and unrewarding chore." They proposed a "superfast typewriter" with an autocomplete key that could fill in the remainder of a word or phrase. Our system draws heavily on their idea of interactive MT as target-text completion.

Evaluated Interactive MT Systems
Early interactive MT systems focused on source pre-editing rather than target generation. Loh and Kong [35] presented a Chinese-to-English system in which human translators annotate the input extensively (phrase boundaries, word senses, etc.). Unpublished results showed greatly reduced post-editing effort to achieve human quality

To our knowledge, TransType was the first interactive system

TransType2 [16] added a playback mechanism for reviewing user sessions

Caitra

Casmacat [2] is the successor of Caitra. It shares the same backend MT engine, but has a new UI [1] that supports post-editing, text completion, and term lookup. However, the interface uses the standard two-column layout, and the full MT suggestion is not always available for gisting, a feature that users have found useful in previous studies

The system of Barrachina et al.
Collaborative Translation
Collaborative translation can be seen as an alternate mode of interactive assistance, albeit a slow one. Morita and Ishida

Hu et al.

Mixed-Initiative Interaction Principles
We believe that the failure of previous interactive MT systems (in user studies) may result from known pitfalls of mixed-initiative design. For example, consider Horvitz's [24] principle #2: considering uncertainty about a user's goals. Most previous systems violate this principle by assuming that users need either source or target aids, but not both, or neither. Early interactive systems assumed that pre-editing (source) was most useful

Also relevant is Horvitz's principle #8: minimizing the cost of poor guesses about action and timing. Later systems like Caitra expose portions of the MT system, such as translation rules and associated scores, directly on the interface. Confidence is usually coded with color. However, MT systems almost certainly contain a very different internal representation of the translation process than humans do. Human translators may not understand why, for example, MT systems can propose non-grammatical and incorrect translations like "avec ⇒ them with" with high confidence. The translation model is full of these noisy rules, which can be very useful to the machine but are uninterpretable to the human. Our interface applies rules to aggregated k-best predictions to select human-interpretable, high-confidence suggestions.

The design of PTM draws on additional principles of mixed-initiative design. As a baseline, generating automatic machine translations follows Horvitz's principle #1: developing significant value-added automation. PTM users can also select alternate translations from a drop-down menu or simply type the desired target text, both in keeping with principle #5: employing dialog to resolve key uncertainties.
Following principle #6: allowing efficient direct invocation and termination, interactive translation aids are easily toggled on and off with the Escape key, and source word lookups are invoked only upon mouse hover of source text. Real-time updates of machine translations in response to user input enact principle #9: providing mechanisms for efficient agent-user collaboration to refine results. Finally, visualizing source coverage of translated words supports principle #11: maintaining working memory of recent interactions.

PREDICTIVE TRANSLATION MEMORY
The Predictive Translation Memory system is designed for expert, bilingual translators. Previous studies have shown that professional translators work quickly (they are paid by source words translated) and are usually touch typists

The system has three components. The client UI is written in JavaScript and runs entirely in a web browser. The UI communicates via a RESTful API with the web service, which is written in Python and backed by a SQL database. The web service manages translation sessions, serving source documents and recording user actions. The web service also forwards translation requests to the MT service, which is a Java servlet running in a J2EE web server. The MT service runs the open source Phrasal MT system, which we heavily modified to support PTM

In this section, we focus on the UI design decisions. We applied an iterative design process using paper prototyping, rapid prototyping of the client UI connected to the live MT service, a small-scale pilot study, and finally the large-scale user study described in this paper. Many UI design decisions required significant backend engineering which, in turn, enabled novel interactions. For example, real-time suggestion updating requires the MT service to generate translations at nearly human typing speed.

UI Overview and Walkthrough
We categorized interactions into three groups: source comprehension, target gisting, and target generation.
The following outline summarizes the interactions, which are detailed in the following sections. Although the specific design of each feature is novel, those in bold have, to our knowledge, never appeared in a translation workbench:

Human and machine translations appear together in the target text box. During prototyping we found that users were very sensitive to updates in the text box. They wanted to edit the machine suggestions using conventional text manipulation (cut/paste, etc.) rather than the autocomplete interactions. To clarify ownership of regions of the textbox, we adopted the following target text convention: Black text belongs to the human translator and is never modified by the machine. Gray text belongs to the machine and is never modified by the human translator. Interactions allow the user to accept portions of the gray text, which then become black. Subsequent tests showed that users learned to trust that black text is inviolate, and that gray text is only accessible through certain interactions.

Source Comprehension
Word Lookup
Users often trace the source with the mouse cursor while reading

Source Coverage
The interface predicts which source words have already been translated and shades them in blue

In pilot experiments we found that the raw alignments were too noisy to show to users. We thus developed MT rule-level heuristics that filter the alignments returned to the interface.

Target Gisting
The most common use of MT output is gisting [31, p.21]. A rough translation is often sufficient to convey meaning. Translators find MT useful as an initial draft

Full Best Translation
The gray text below each black source input shows the best MT system output

Real-time Updating
When Joe starts working on a source sentence, the gray text will update to the most probable completion

Target Generation
The target textbox shows both the user and machine state simultaneously. This allows Joe to accept parts of the machine suggestion without touching the mouse.
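The black/gray ownership convention for the target textbox can be pictured as a small data model. The following is only a minimal sketch under stated assumptions, not the system's client code (which the text says is written in JavaScript): the class name `TargetBox` and all of its method names are hypothetical, and word-level acceptance is just one plausible interaction.

```python
from dataclasses import dataclass


@dataclass
class TargetBox:
    """Hypothetical model of the target textbox (names are illustrative)."""
    black: str = ""  # owned by the translator; the machine never changes it
    gray: str = ""   # owned by the machine; the translator never edits it directly

    def machine_update(self, suggestion: str) -> None:
        # The machine may only replace the gray region.
        self.gray = suggestion.strip()

    def human_type(self, text: str) -> None:
        # Typing only ever changes the black region.
        self.black += text

    def accept_next_word(self) -> None:
        # An interaction moves the first gray word into the black region.
        if not self.gray:
            return
        word, _, rest = self.gray.partition(" ")
        sep = " " if self.black and not self.black.endswith(" ") else ""
        self.black += sep + word
        self.gray = rest.lstrip()

    def accept_all(self) -> None:
        # Accepting the full completion turns all remaining gray text black.
        while self.gray:
            self.accept_next_word()
```

In this sketch the only path from gray to black is an explicit accept call, which mirrors the stated rule that gray text is reachable only through certain interactions.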
The black portion is a text editor: Joe can cut, copy, paste, or otherwise manipulate the black text. However, the gray text is immutable. It cannot be highlighted with the cursor or changed. Joe accesses it through three interactions.

Autocomplete Dropdown
The autocomplete dropdown at the point of text entry is the main translation aid

The suggestion length is based on the syntax of the source language. As an offline, pre-processing step, we create syntactic parses of the source input with Stanford CoreNLP

Target Reordering
So far we have assumed a left-to-right generation scheme, but that design fails for long-distance reordering. For example, in English-to-German translation, some verbs will need to be moved to the very end of a sentence. To that end, the UI supports keyboard-based reordering. Suppose that Joe sees the (partially correct) suggestion Wirtschaftliche Offences 'economic offences' in the gray text

Insert Complete Translation
At any time, Joe can accept the full completion by pressing the Control+Enter hot key. Notice that if the user presses this hot key immediately, the full suggestion is inserted, and the interface is effectively a post-editor. This feature greatly accelerates translation when the MT is mostly correct, and the user only wants to make a few changes.

Layout and Typographical Design
Carl [12, p.11] showed that translators spend up to 20% of any translation session reading source text and revising target text, and that harder translations can significantly increase this fraction. However, we noticed that most translator workbenches are optimized for typing, and conform to a tabular, two-column spreadsheet layout in which source and target are aligned by row. A spreadsheet design may not be optimal for reading text passages. Our UI is based on a single-column layout so that the text appears as it would in a document. Sentences are offset from one another primarily because current MT systems process input at the sentence level.
We interleave target-text typing boxes with the source input to minimize gaze shift between source and target. Contrast this with a two-column layout in which the source and target focus positions are nearly always separated by the width of a column.

The compact, single-column layout can obscure the boundaries between source and target, especially for languages with similar writing systems. We found that rendering source and target in different typefaces restored legibility. In our UI, source is rendered in a serifed font, which is commonly used for body text

The target text appears in a monospaced, sans-serif font. Monospaced fonts are conventional for text entry forms. We chose the Paratype font family, which features a large x-height for more readable type

Summary of MT Service
Statistical MT systems come in two general flavors: phrase-based and hierarchical/syntactic. Phrase-based systems decode input (i.e., search for translations) left-to-right and can run in O(n) time. Hierarchical/syntactic systems are not restricted to left-to-right processing, but decode with the slower O(n^3) CKY parsing algorithm. Although the left-to-right constraint may not necessarily correspond to the human translation process, we found in pilot studies that users tended to value speed and responsiveness, hence we chose a phrase-based system.
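To give a rough sense of why the O(n) versus O(n^3) gap matters for responsiveness, the asymptotic costs can be compared for a few sentence lengths. This is only an illustrative sketch: the helper `relative_cost` is hypothetical, it counts abstract operations, and it ignores the constant factors that dominate in real decoders, so it says nothing about measured latency in either kind of system.

```python
def relative_cost(n: int, exponent: int) -> int:
    """Abstract operation count n**exponent, ignoring constant factors."""
    return n ** exponent


# Compare linear and cubic growth for a few sentence lengths (in tokens).
for n in (10, 20, 40):
    linear = relative_cost(n, 1)
    cubic = relative_cost(n, 3)
    print(f"n={n:>2}  linear={linear:>3}  cubic={cubic:>6}  ratio={cubic // linear}")
```

Under these assumptions, doubling the sentence length doubles the linear cost but multiplies the cubic cost by eight, which is the kind of growth that makes real-time suggestion updating harder for long sentences.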