Finding Terminology Translations From Non-Parallel Corpora
, 1997
Cited by 34 (3 self)
In this paper, we present an initial algorithm for translating technical terms using a pair of non-parallel corpora. Evaluation results show translation precision of around 30% when only the top candidate is considered. While this precision is lower than that achieved with parallel corpora, we show that the top-20 candidate output from our algorithm allows translators to increase their accuracy by 50.9%. In the following sections, we first describe a pair of non-parallel corpora we use for experiments, and then we introduce the Word Relation Matrix (WoRM), a statistical word feature representation for technical term translation from non-parallel corpora. We evaluate the effectiveness of this feature with two sets of experiments, using English/English and English/Japanese non-parallel corpora.
Methods of Automatic Term Recognition - A Review
, 1996
Cited by 24 (1 self)
Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e. studies in automatic recognition of significant elements for indexing mainly carried out in information retrieval circles, and current research in automatic term recognition in the field of computational linguistics. Keywords Automatic term recognition, au...
Empirical Observation of Term Variations and Principles for their Description
, 2000
Cited by 23 (0 self)
Contents: 1 Introduction; 1.1 Do terms vary?; 1.2 A Symbolic Framework for the Study of Terminological Variation; 2 The Most Common Types of English Two-word Terms; 2.1 Adjective Noun (A N); 2.2 Noun Noun (N2 N1); 2.3 Noun Preposition Noun (N1 P N2); 3 Observing and Representing Term Variants; 3.1 An Observation of Term Variants; 3.2 A Two-level Lexico-syntactic Description of Terms; 3.3 Two Families of Grammatical Rules ...
Term Extraction and Automatic Indexing
, 2003
Cited by 22 (0 self)
This chapter presents a new domain of research and development in Natural Language Processing (NLP) that is concerned with the representation, acquisition, and recognition of terms. Terms are pervasive in scientific and technical documents; their identification is a crucial issue for any application dealing with the analysis, understanding, generation, or translation of such documents. In particular, the ever-growing mass of specialized documentation available on-line, in industrial and governmental archives or in digital libraries, calls for advances in terminology processing for such purposes as information retrieval, cross-language querying, indexing of multimedia documents, translation aids, document routing and summarization, etc. This chapter introduces the basic linguistic characteristics of terms. It presents the main methods in NLP for recognizing or discovering terms and their interrelationships in large corpora. It is divided into three sections: an introduction to the bas...
Multilingual Document Production From Support for Translating to Support for Authoring
- Machine Translation
, 1996
Cited by 20 (1 self)
In this paper, we look at the current scenario in multilingual documentation generation and the types of tools currently being used in support of the translation task, and discuss their shortcomings. We examine emergent trends in the document industry, observing a reorganisation of the workflow which mirrors a shift of attention from translating to authoring and from the ergonomics of post-editing the target text to the ergonomics of producing the source text. We argue that these trends invite the design and development of new tools for the task of producing multilingual texts, and that multilingual generation provides the appropriate technology, shifting attention to an even earlier stage in the authoring process, that of specifying the semantics of the text to be produced. We describe a prototype system which exploits this technology to meet the expressed needs of authors and translators by supporting them in the drafting of multilingual instructions. We suggest that, in the futur...
Giving a virtual voice to the silent language of culture: The Cultura Project
- Technology
, 2001
Cited by 13 (0 self)
This paper presents a Web-based, cross-cultural, curricular initiative entitled Cultura, designed to develop foreign language students' understanding of foreign cultural attitudes, concepts, beliefs, and ways of interacting and looking at the world. Our focus will be on the pedagogy of electronic media, with particular emphasis on the ways in which the Web can be used to reveal those invisible aspects of a foreign culture, thereby giving a voice to the elusive "silent language" and empowering students to construct their own approach to cross-cultural literacy. We examine these new areas of cultural knowledge which the Web now renders accessible and attempt to redefine the meaning of foreign language "teaching" in the new world of networked communication. This article is written by four of the instructors who have been using Cultura in their classes, two of them teaching at the Massachusetts Institute of Technology in Cambridge, and two at the Institut National des Télécommunications in Evry, France (one has since changed universities). This "four-voiced" approach serves to illustrate the multi-faceted aspects of the project and the different types of readings to which Cultura lends itself, and explains the shifts in perspective the reader will encounter. Cultura was first developed in the summer of 1997. Since then we have continued to experiment with and develop it, using it in university-level courses. Last year, it was used experimentally at the secondary school level as well. This particular paper focuses mostly on the work done during the spring and fall semesters of 1999 between MIT and INT.
What Is The Tree That We See Through The Window: A Linguistic Approach To Windowing And Term Variation
Cited by 9 (4 self)
Windowing techniques play a key role in information retrieval. Previous works have suggested that the quality of access to information relies heavily on the characteristics of the windows. This study provides a linguistic approach to text windowing through an extraction of term variants with the help of a partial parser. The syntactic grounding of the method ensures that words observed within restricted spans are lexically related and that spurious word co-occurrences are ruled out with a good level of confidence. The system is computationally tractable on large corpora and large lists of terms. Illustrative examples of term variations from a large medical corpus are given. An experimental evaluation of the method shows that only a small proportion of co-occurring words are lexically related and motivates the call for natural language parsing techniques in text windowing. 1. INTRODUCTION The notion of text window -- a span of contiguous words within a document -- is crucial for severa...
Multipurpose Design and Creation of GSL Dictionaries
- In Proceedings of the Workshop on the Representation and Processing of Sign Languages “From SignWriting to Image Processing. Information
, 2004
Cited by 6 (5 self)
In this paper we present the methodology of data collection and implementation of databases with the aim of creating extensive lexical and terminological resources for the Greek Sign Language (GSL). The focus is on issues of linguistic content validation, multipurpose design and reusability of resources, exemplified by the multimedia dictionary products of the projects NOEMA (1999-2001) and PROKLISI (2002-2004). As far as data collection methodology, DB design and resources development are concerned, a clear distinction is made between general language lexical items and terms, since the creation of resources for the two types of data follows different methodological principles, lexeme formation and usage conditions. There is also reference to content and interface evaluation mechanisms, as well as to basic linguistic research carried out in support of lexicographical work.
TRUCKS: a model for automatic multi-word term recognition
, 2000
Cited by 6 (0 self)
This paper examines the use of linguistic techniques in the area of automatic term recognition. It describes the TRUCKS model, which makes use of different types of contextual information (syntactic, semantic, terminological and statistical), seeking particularly to identify those parts of the context which are most relevant to terms. From an initial corpus of sublanguage texts, the model identifies, disambiguates and ranks candidate terms. The system is evaluated with respect to the statistical approach on which it is built, and with respect to its expected theoretical performance.
Technical Terminology as a Critical Resource
, 1903
Cited by 4 (2 self)
Technical documentation is riddled with domain-specific terminology which needs to be detected and properly organized in order to be meaningfully used. In this paper we describe how we coped with the problem of terminology detection for a specific type of document and how the extracted terminology was used within the context of our Answer Extraction System.

