The TIGER Treebank, 2002
Cited by 173 (3 self)
This paper reports on the TIGER Treebank, a corpus of currently 35,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool Annotate, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER treebank. This scheme is an extended and improved version of the NEGRA annotation scheme and we illustrate in detail the linguistic extensions that were made concerning the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper also presents the query tool TIGERSearch that was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
The LinGO Redwoods Treebank -- Motivation and Preliminary Applications
Cited by 54 (15 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is mono-stratal, either encoding topological (phrase structure) or tectogrammatical (dependency) information, (ii) the depth of linguistic information recorded is comparatively shallow, (iii) the design and format of linguistic representation in the treebank hard-wires a small, predefined range of ways in which information can be extracted from the treebank, and (iv) representations in existing treebanks are static and over the (often year- or decade-long) evolution of a large-scale treebank tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.
LinGO Redwoods - A Rich and Dynamic Treebank for HPSG
- In Beyond PARSEVAL. Workshop of the Third LREC Conference, 2002
Cited by 24 (6 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. A treebank is a (typically hand-built) collection of natural language utterances and associated linguistic analyses; typical treebanks---as for example the widely recognized Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1993), the Prague Dependency Treebank (Hajic, 1998), or the German TiGer Corpus (Skut, Krenn, Brants, & Uszkoreit, 1997)---assign syntactic phrase structure or tectogrammatical dependency trees over sentences taken from a naturally occurring source, often newspaper text. Applications of existing treebanks fall into two broad categories: (i) use of an annotated corpus in empirical linguistics as a source of structured language data and distributional patterns and (ii) use of the treebank for the acquisition (e.g. using stochastic or machine learning approaches) and evaluation of parsing systems.
Automatic F-Structure Annotation Of Treebank Trees
- The Fifth International Conference on Lexical-Functional Grammar, The University of California at Berkeley, 19 July - 20 July 2000, CSLI, 2000
Cited by 22 (6 self)
We describe a method that automatically induces LFG f-structures from treebank tree representations, given a set of f-structure annotation principles that define partial, modular c- to f-structure correspondences in a linguistically informed, principle-based way.
Building a Linguistically Interpreted Corpus of Bulgarian: the BulTreeBank
- In: Proceedings of LREC 2002, Canary Islands, 2002
Cited by 14 (2 self)
In the field of Human Language Technology (HLT), the existence of linguistically interpreted real-world texts provides the license necessary for a given language to enter the area of high-tech applications. The significance of BulTreeBank is the granting of an HLT license to a "less processed" language like Bulgarian which, until recently, has been formally modelled and processed mainly on the morphology level. The BulTreeBank project aims at the creation of syntactically annotated data for Bulgarian and the tools for their production, management and automatic processing. It not only provides language resources but also develops an infrastructure of research solutions, production scenarios and services.
Ambiguity Management in Grammar Writing, 2000
Cited by 6 (4 self)
When linguistically motivated grammars are implemented on a larger scale, and applied to real-life corpora, keeping track of ambiguity sources becomes a difficult task. Yet it is of great importance, since unintended ambiguities arising from underrestricted rules or interactions have to be distinguished from linguistically warranted ambiguities. In this paper we report on various tools in the XLE grammar development platform which can be used for ambiguity management in grammar writing. In particular, we look at packed representations of ambiguities that allow the grammar writer to view sorted descriptions of ambiguity sources. Also discussed are tools for specifying desired tree structures and for cutting down the solution space prior to parsing.
The LinGO Redwoods Treebank
Cited by 1 (0 self)
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is mono-stratal, either encoding topological (phrase structure) or tectogrammatical (dependency) information, (ii) the depth of linguistic information recorded is comparatively shallow, (iii) the design and format of linguistic representation in the treebank hard-wires a small, predefined range of ways in which information can be extracted from the treebank, and (iv) representations in existing treebanks are static and over the (often year- or decade-long) evolution of a large-scale treebank tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.
Treebank Conversion Creating a German f-structure bank from the TIGER Corpus
This paper reports on the conversion of the TIGER treebank, a syntactically interpreted corpus of German newspaper texts, into a testsuite for a broad-coverage Lexical-Functional Grammar (LFG) for German. It presents the two major steps of the conversion, which consist of an XSLT transformation of the TIGER XML representation into a relational Prolog-like representation and the subsequent application of term-rewriting rules, as they are used in certain MT transfer components, to that representation. Then some problems due to considerable differences in analysis or to information not encoded in the TIGER representation are discussed. The output consists of (partly ambiguous) f-structure charts, which can then be mapped against the grammar’s output for evaluation purposes.

