Optimality Theory
, 2000
Cited by 113 (0 self)
Introduction René Kager's textbook is one of the first to cover Optimality Theory (OT), a declarative grammar framework that swiftly took over phonology after it was introduced by Prince, Smolensky, and McCarthy in 1993. OT reclaims traditional grammar's ability to express surface generalizations ("syllables have onsets," "no nasal+voiceless obstruent clusters"). Empirically, some surface generalizations are robust within a language, or (perhaps for functionalist reasons) widespread across languages. Derivational theories were forced to posit diverse rules that rescued these robust generalizations from other phonological processes. An OT grammar avoids such "conspiracies" by stating the generalizations directly, as in Two-Level Morphology (Koskenniemi, 1983) or Declarative Phonology (Bird, 1995). In OT, the processes that try but fail to disrupt a robust generalization are described not as rules (cf. Paradis (1988)), but as lower-ranked generalizations. Suc
A Formal Framework for Linguistic Annotation
- Speech Communication
, 2000
Cited by 97 (18 self)
'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focused on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, mai...
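The annotation graph named in this abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the abstract does not give the definition, so the sketch assumes that nodes may optionally carry a time offset and that each arc carries an annotation layer and a label. All names (`AnnotationGraph`, `Arc`, `add_arc`, `arcs_in_layer`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Arc:
    """One annotation: a labelled span between two nodes (hypothetical layout)."""
    start: int   # id of the node where the span begins
    end: int     # id of the node where the span ends
    layer: str   # kind of annotation, e.g. "word" or "phone"
    label: str   # the annotation content itself


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None when the node is not time-anchored
    nodes: Dict[int, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: int, time: Optional[float] = None) -> None:
        self.nodes[node_id] = time

    def add_arc(self, start: int, end: int, layer: str, label: str) -> None:
        # Arcs may only connect nodes that already exist in the graph.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both end points must be existing nodes")
        self.arcs.append(Arc(start, end, layer, label))

    def arcs_in_layer(self, layer: str) -> List[Arc]:
        return [a for a in self.arcs if a.layer == layer]


# A word-level arc and two phone-level arcs over the same stretch of signal.
g = AnnotationGraph()
g.add_node(0, 0.00)
g.add_node(1)            # boundary with no known time offset
g.add_node(2, 0.42)
g.add_arc(0, 2, "word", "she")
g.add_arc(0, 1, "phone", "sh")
g.add_arc(1, 2, "phone", "iy")
phones = [a.label for a in g.arcs_in_layer("phone")]
```

Keeping the layers in one graph, rather than one file per layer, is what lets the word arc and the phone arcs share boundary nodes in this sketch.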
Phonological derivation in optimality theory
- In COLING’94 Vol II
, 1994
Cited by 55 (0 self)
Summary: Optimality Theory is a constraint-based theory of phonology which allows constraints to conflict and to be violated. Consequently, implementing the theory presents problems for declarative constraint-based processing frameworks. On the basis of two regularity assumptions, that sets are regular and that constraints can be modelled by transducers, this paper presents and proves correct algorithms for computing the action of constraints, and hence deriving surface forms.
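The idea of conflicting, violable constraints can be illustrated with a toy evaluation. This is not the paper's algorithm, which operates on regular sets and transducers. The sketch swaps in an explicit finite candidate list and plain Python functions that count violations, and it assumes the usual strict-domination reading of a ranking, implemented as lexicographic comparison of violation tuples. The constraint definitions (`onset`, `max_io`, `dep`) are simplified, hypothetical stand-ins.

```python
VOWELS = set("aeiou")


def onset(candidate, underlying):
    """ONSET: one violation per syllable that begins with a vowel."""
    return sum(1 for syl in candidate.split(".") if syl and syl[0] in VOWELS)


def max_io(candidate, underlying):
    """MAX: one violation per deleted segment (approximated by length difference)."""
    return max(0, len(underlying) - len(candidate.replace(".", "")))


def dep(candidate, underlying):
    """DEP: one violation per inserted segment (approximated by length difference)."""
    return max(0, len(candidate.replace(".", "")) - len(underlying))


def evaluate(underlying, candidates, ranking):
    """Return the candidate whose violation profile is best under the ranking.

    Tuples compare lexicographically, so a single violation of a higher-ranked
    constraint outweighs any number of violations of lower-ranked ones.
    """
    def profile(candidate):
        return tuple(constraint(candidate, underlying) for constraint in ranking)
    return min(candidates, key=profile)


# Syllable boundaries are written as "." in the candidates.
candidates = ["a.pa", "ta.pa", "pa"]
winner_a = evaluate("apa", candidates, [onset, max_io, dep])   # epenthesis wins
winner_b = evaluate("apa", candidates, [dep, max_io, onset])   # faithful form wins
```

Reordering the same three constraints changes the winner, which is the sense in which constraints conflict and lower-ranked ones are violated.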
Learning bias and phonological-rule induction
- Computational Linguistics
, 1996
Cited by 25 (0 self)
A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments
Information Packaging in HPSG
, 1996
Cited by 24 (0 self)
This paper is concerned with how information structure should be optimally integrated into grammar. It proposes an analysis with the following characteristics: (1) information structure is an integral part of grammar since it interacts in principled ways with both syntax and phonology, (2) the representation of information structure in the grammar is independent of its particular structural realisation in different languages, and (3) there is a direct analogous implementation of the relationship between information structure and prosody in English-type languages and between information structure and the word-order dimension in Catalan-type languages. The framework utilised is HPSG. HPSG's multidimensional constraint-based architecture lends itself very well to expressing the mutual constraints on interpretation, syntax, and phonology that so diversely characterise focus-ground in different languages. The study of information structure, we argue, is essential in addressing fundamental questions regarding grammar architecture. Our point of departure is the assumption, expressed in e.g. Chafe 1976, Prince 1986, that what underlies the focus-ground distinction is a need to 'package' the information conveyed by a sentence so that hearers can easily identify which part of the sentence represents an actual contribution to their information state at the time of utterance, and which part represents material that is already subsumed by this information state. In particular, we adopt the proposal in Vallduví 1992, 1994 that these 'ways of packaging' can be viewed as updating instructions or, equivalently, as types of transitions between information states. The paper is structured as follows. Section 2 provides a brief overview of information packaging. Section 3 discusses the st...
The Iterative Learning of Phonological Constraints
- Computational Linguistics
, 1991
Cited by 14 (0 self)
This paper presents a simplicity measure for violable phonological constraints based on the minimum message length method. This measure captures the intuitive desiderata of conciseness, accuracy and precision. A family of constraints can be specified by parameterising a specific constraint, and so forming a template. The combination of this measure with a search algorithm is a powerful learning method for finding the best constraint matching a template and fitting a corpus. This method may be applied iteratively, using the same template, to learn a number of different constraints. Five applications of an implementation show some of the successes of this learning method: from learning consonant cluster constraints to vowel harmony.
Querying Databases of Annotated Speech
, 2000
Cited by 11 (8 self)
Annotated speech corpora are databases consisting of signal data along with time-aligned symbolic 'transcriptions'. Such databases are typically multidimensional, heterogeneous and dynamic. These properties present a number of tough challenges for representation and query. The temporal nature of the data adds a further layer of complexity. This paper presents and harmonises two independent efforts to model annotated speech databases, one at Macquarie University and one at the University of Pennsylvania. Various query languages are described, along with illustrative applications to a variety of analytical problems. The research reported here forms a part of several ongoing projects to develop platform-independent open-source tools for creating, browsing, searching, querying and transforming linguistic databases, and to disseminate large linguistic databases over the internet. 1. Databases of Annotated Speech Recordings Annotated corpora have been an essential component of research ...
Towards A Formal Framework For Linguistic Annotations
- In Proceedings of the ICLSP
, 1999
Cited by 9 (0 self)
'Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of annotation formats and demonstrate a common conceptual core. This provides the foundation for an algebraic framework which encompasses the representation, archiving and query of linguistic annotations, while remaining consistent with many alternative file formats.

