The German Text-to-Speech synthesis system MARY: A tool for research, development and teaching
International Journal of Speech Technology, 2001
Cited by 42 (13 self)
Abstract. This paper introduces the German text-to-speech synthesis system MARY. The system’s main features, namely a modular design and an XML-based system-internal data representation, are pointed out, and the properties of the individual modules are briefly presented. An interface allowing the user to access and modify intermediate processing steps without the need for a technical understanding of the system is described, along with examples of how this interface can be put to use in research, development and teaching. The usefulness of the modular and transparent design approach is further illustrated with an early prototype of an interface for emotional speech synthesis.
Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus using Inductive Logic Programming
Journal of Machine Learning Research, 2003
Cited by 14 (3 self)
This paper describes an inductive logic programming learning method designed to acquire from a corpus specific Noun-Verb (N-V) pairs (relevant in information retrieval applications to perform index expansion) in order to build up semantic lexicons based on Pustejovsky's generative lexicon (GL) principles (Pustejovsky, 1995). In one of the components of this lexical model, called the qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), the agentive role its creation mode (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations, encoding relational information. The learning method enables us to automatically extract, from a morphosyntactically and semantically tagged corpus, N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure in GL. It also infers rules explaining what in the surrounding context distinguishes such pairs from others also found in sentences of the corpus but which are not relevant. Stress is put here on the learning efficiency that is required to be able to deal with all the available contextual information, and to produce linguistically meaningful rules.
Indexing By Statistical Tagging
2000
Cited by 4 (3 self)
Lexical ambiguity is a fundamental problem in Information Retrieval (IR), especially in the medical domain. Many systems use a subset of the words contained in the document to represent the content, but they are faced with the problem of ambiguity. In this paper, we propose a method for disambiguation based on existing medical terminological resources on the one hand, and statistical tools for linguistic annotation on the other, in order to develop more satisfactory indexing techniques for patient reports. The main hypotheses guiding this method are that: (i) syntax can help to distinguish the meanings of words that are polyfunctional; (ii) syntactic analysis can be done by a probabilistic tagger (HMM, Hidden Markov Model); and, more daringly, (iii) remaining semantic ambiguity can also be resolved (mutatis mutandis) by an HMM tagger.
Keywords: semantic disambiguation, statistical tagging, information retrieval, medical patient records
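As a rough illustration of how an HMM tagger resolves an ambiguous word from its context, here is a minimal sketch of standard Viterbi decoding. The tag set, the example sentence and every probability below are invented for illustration; none of them comes from the paper, and the paper's actual tagger and training data are not described in this listing.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a toy HMM.

    Words missing from a tag's emission table get the small `floor`
    probability (an assumed smoothing choice for this sketch).
    """
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               tag chosen for word i-1 on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], floor), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, floor), p)
                for p in tags
            )
        best.append(column)

    # Trace back from the best final tag to the start of the sentence.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


# Hypothetical toy model: "reports" may be a noun or a verb.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.01, "NOUN": 0.9, "VERB": 0.09},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.4, "NOUN": 0.5, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"patient": 0.4, "reports": 0.3, "pain": 0.3},
    "VERB": {"reports": 0.6, "pain": 0.1},
}

tagged = viterbi(["the", "patient", "reports", "pain"], TAGS, START, TRANS, EMIT)
```

The decoder picks a tag for the ambiguous word from the transition probabilities of its neighbours as well as its own emission probabilities, which is the sense in which syntax helps to separate the readings of a polyfunctional word.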
Inductive Logic Programming for Corpus-Based Acquisition of Semantic Lexicons
2000
Cited by 4 (2 self)
In this paper, we propose an Inductive Logic Programming learning method which aims at automatically extracting special Noun-Verb (N-V) pairs from a corpus in order to build up semantic lexicons based on Pustejovsky's Generative Lexicon (GL) principles (Pustejovsky, 1995). In one of the components of this lexical model, called the qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), the agentive role its creation mode (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations, encoding relational information. The Inductive Logic Programming learning method that we have developed enables us to automatically extract from a corpus N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure in GL, and to distinguish them, in terms of surrounding categorial context, from N-V pairs also present in sentences of the corpus but not relevant. This method has been theoretically and empirically validated on a technical corpus. The N-V pairs that have been extracted will further be used in information retrieval applications for index expansion.
Using Part-of-Speech and Semantic Tagging for the Corpus-Based Learning of Qualia Structure Elements
First International Workshop on Generative Approaches to the Lexicon, GL'2001, 2001
Cited by 4 (3 self)
This paper describes the implementation and results of a machine learning method, developed within the inductive logic programming (ILP) framework (Muggleton and De Raedt, 1994), to automatically extract, from a corpus tagged with parts of speech (POS) and semantic classes, noun-verb pairs whose components are bound by one of the relations defined in the qualia structure in the Generative Lexicon (Pustejovsky, 1995).
Acquisition of Qualia Elements from Corpora - Evaluation of a Symbolic Learning Method
2002
Cited by 4 (2 self)
This paper presents and evaluates a system extracting from a corpus noun-verb pairs whose components are related by a special kind of link: the qualia roles as defined in the Generative Lexicon. This system is based on a symbolic learning method that automatically learns, from noun-verb pairs that are or are not related by a qualia link, rules distinguishing positive examples from negative ones in terms of their surrounding part-of-speech or semantic contexts. The qualia noun-verb pair extraction is thus performed by applying the learnt rules to a part-of-speech or semantically tagged text. Stress is put on the quality of the learning when compared with traditional statistical or syntax-based approaches. The linguistic relevance of the rules is also evaluated through a comparison with manually acquired qualia patterns.
Treatment of Unknown Words
1999
Cited by 4 (1 self)
Words not present in the dictionary are almost always found in unrestricted texts. However, there is a need to obtain their likely base forms (in lemmatization), or morphological categories (in tagging), or both. Some of them find their way into dictionaries, and it would be nice to predict what their entries should look like. Humans can perform those tasks using endings of words (sometimes prefixes and infixes as well), and so can computers. Previous approaches used manually constructed lists of endings and associated information. Brill proposed transformation-based learning from corpora, and Mikheev used Brill's approach on data for a morphological lexicon. However, both Brill's algorithm and Mikheev's algorithm, which is derived from Brill's, lack speed, both in the rule acquisition phase and in the rule application phase. Their algorithms handle only the case of tagging, although an extension to other tasks seems possible. We propose a very fast finite-state met...
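To make the idea of guessing from word endings concrete, here is a minimal sketch of the simplest variant the abstract mentions, a manually constructed list of endings. It is not the finite-state method the paper proposes, which the truncated abstract does not describe. The ending table, the tag names and the function guess_tag are all hypothetical.

```python
# Hypothetical hand-written table mapping word endings to a likely category.
ENDINGS = {
    "ness": "NOUN",
    "ing": "VERB",
    "ous": "ADJ",
    "ly": "ADV",
    "s": "NOUN",
}


def guess_tag(word, endings=ENDINGS, default="NOUN"):
    """Guess a category for an unknown word from its longest listed ending."""
    word = word.lower()
    # Try suffixes from longest to shortest so that "ous" wins over "s".
    for start in range(len(word)):
        suffix = word[start:]
        if suffix in endings:
            return endings[suffix]
    return default  # assumed fallback when no ending matches


guesses = {w: guess_tag(w) for w in ["blorfing", "curious", "Quickly", "zork"]}
```

Preferring the longest matching ending is one plausible design choice; a table learnt from a corpus could replace the hand-written one without changing the lookup.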
Integrating Textual Knowledge and Formal Knowledge for Improving Traceability
Proceedings of the ECAI Workshop on Knowledge Management and Organisational Memories, 2000
Cited by 2 (0 self)
This article deals with traceability in knowledge repositories. More precisely, we concentrate on the role of terminological knowledge in the mapping between (informal) textual requirements and (formal) object models. We show that terminological knowledge facilitates the production of traceability links and model generation, provided that language processing technologies make it possible to build the required terminological resources semiautomatically. The presented system is one step towards incremental formalization from textual knowledge. As such, it is a valuable tool for building knowledge repositories.
A short history of two-level morphology
2001
Cited by 2 (0 self)
Twenty years ago morphological analysis of natural language was a challenge to computational linguists. Simple cut-and-paste programs could be and were written to analyze strings in particular languages, but there was no general language-independent method available. Furthermore, cut-and-paste programs for analysis were not reversible; they could not be used to generate words. Generative phonologists of that time described morphological alternations by means of ordered rewrite rules, but it was not understood how such rules could be used for analysis. This was the situation in the spring of 1981 when Kimmo Koskenniemi came to a conference on parsing that Lauri Karttunen had organized at the University of Texas at Austin. Also at the same conference were two Xerox researchers from Palo Alto, Ronald M. Kaplan and Martin Kay. The four Ks discovered that all of them were interested in and had been working on the problem of morphological analysis. Koskenniemi went on to Palo Alto to visit Kay and Kaplan at PARC. This was the beginning of Two-Level Morphology, the first general model in the history of computational linguistics for the analysis and generation of morphologically complex languages. The language-specific components, the lexicon and the rules, were combined with a runtime engine applicable to all languages. In this article we trace the development of the finite-state technology that Two-Level Morphology is based on.
1 The Origins
Traditional phonological grammars, formalized in the 1960s by Noam Chomsky and Morris Halle (Chomsky and Halle, 1968), consisted of an ordered sequence of rewrite rules that converted abstract phonological representations into surface forms through a series of intermediate representations. Such rules have the general form x -> y / z _ w, where x, y, z, and w can be arbitrarily complex strings or feature-matrices. In mathematical linguistics (Partee et al., 1993), such rules are called CONTEXT-SENSITIVE REWRITE RULES, and they are more powerful than regular expressions or context-free rewrite rules.
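The way an ordered sequence of such rules turns an abstract form into a surface form through intermediate representations can be sketched as follows. This is a restricted illustration only: x, y, z and w are treated as literal strings rather than feature-matrices, and the helper names apply_rule and derive, the form "kaNpat" and the two toy rules are assumptions made for this sketch.

```python
import re


def apply_rule(form, x, y, left="", right=""):
    """Rewrite x as y wherever x occurs between `left` and `right`.

    Corresponds to x -> y / left _ right with literal string contexts.
    The contexts are matched with lookarounds, so they are not consumed.
    """
    pattern = ""
    if left:
        pattern += "(?<=" + re.escape(left) + ")"
    pattern += re.escape(x)
    if right:
        pattern += "(?=" + re.escape(right) + ")"
    # A function replacement keeps y literal even if it contains backslashes.
    return re.sub(pattern, lambda match: y, form)


def derive(form, rules):
    """Apply the rules in order, recording every intermediate representation."""
    steps = [form]
    for x, y, left, right in rules:
        form = apply_rule(form, x, y, left, right)
        steps.append(form)
    return steps


# Hypothetical toy rules, applied in this order:
RULES = [
    ("N", "m", "", "p"),  # N -> m / _ p
    ("p", "m", "m", ""),  # p -> m / m _
]

steps = derive("kaNpat", RULES)
```

Because the rules are ordered, the second rule sees the output of the first; reversing the order would give a different derivation. Running the rules in this direction only generates surface forms, which is the limitation for analysis that the passage above describes.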
Exogenous and endogenous approaches to semantic categorization of unknown technical terms
Proceedings of the 18th International Conference on Computational Linguistics (COLING), 2000
Cited by 1 (0 self)
Acquiring and updating terminological resources are difficult and tedious tasks, especially when semantic information should be provided. This paper deals with Term Semantic Categorization. The goal of this process is to assign semantic categories to unknown technical terms. We propose two approaches to the problem that rely on different knowledge sources. The exogenous approach exploits contextual information extracted from corpora. The endogenous approach relies on a lexical analysis of the technical terms. After describing the two implemented methods, we present the experiments that we conducted on significant test sets. The results demonstrate that term categorization can provide reliable help in the terminology acquisition process.

