ATLAS: A flexible and extensible architecture for linguistic annotation
- In Proceedings of the Second International Conference on Language Resources and Evaluation. Paris: European Language Resources Association
, 2000
Cited by 38 (5 self)
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on “Annotation Graphs,” a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic “signals,” including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture.
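As a rough illustration of the idea of a graph model over a linear signal indexed by intervals, the following Python sketch stores labeled arcs between nodes that carry optional time offsets and finds the arcs overlapping a given interval. This is not the ATLAS API; the class names, the `kind` field, and the `overlapping` method are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    id: str
    offset: Optional[float] = None  # position in the signal (seconds); may be unknown


@dataclass(frozen=True)
class Arc:
    src: str    # id of the start node
    dst: str    # id of the end node
    kind: str   # annotation type, e.g. "word" or "phrase" (hypothetical field)
    label: str  # annotation content


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, src: str, dst: str, kind: str, label: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the arc")
        self.arcs.append(Arc(src, dst, kind, label))

    def overlapping(self, start: float, end: float,
                    kind: Optional[str] = None) -> List[Arc]:
        """Return arcs whose time span intersects the open interval (start, end)."""
        hits = []
        for arc in self.arcs:
            s = self.nodes[arc.src].offset
            e = self.nodes[arc.dst].offset
            if s is None or e is None:
                continue  # arcs without anchored endpoints cannot be compared
            if kind is not None and arc.kind != kind:
                continue
            if s < end and e > start:
                hits.append(arc)
        return hits


# Two words and one phrase annotated over the same stretch of signal.
g = AnnotationGraph()
g.add_node("n0", 0.0)
g.add_node("n1", 0.4)
g.add_node("n2", 0.9)
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
g.add_arc("n0", "n2", "phrase", "NP")

print([a.label for a in g.overlapping(0.5, 1.0, kind="word")])
```

The linear scan is only for clarity; the abstract's point about efficient database storage suggests that a real system would index the intervals instead.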
A Description Language for Syntactically Annotated Corpora
, 2000
Cited by 17 (1 self)
This paper introduces a description language for syntactically annotated corpora which allows for encoding both the syntactic annotation to a corpus and the queries to a syntactically annotated corpus.
Extending XPath to support linguistic queries
- In: Workshop on Programming Language Technologies for XML (PLAN-X)
, 2005
Cited by 14 (3 self)
Linguistic research and language technology development employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for storing and querying linguistic data. However, several important expressive features required for linguistic queries are missing in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we define extensions to XPath which support linguistic tree queries. We provide a relational representation for trees, and define an SQL translation for queries. Experiments demonstrate that the query system is significantly faster than other linguistic tree query systems for a wide range of queries.
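The abstract does not say which relational representation the paper uses. As one common way to make tree queries expressible in SQL, the sketch below assigns each node a (left, right) interval during a depth-first traversal and answers a descendant query by interval containment. The table layout, the column names, and the `flatten` and `descendants` helpers are assumptions made for illustration, not the authors' scheme.

```python
import sqlite3

# A small parse tree as (label, children) pairs; the structure is hypothetical.
TREE = ("S", [
    ("NP", [("Det", []), ("N", [])]),
    ("VP", [("V", [])]),
])


def flatten(tree):
    """Assign each node a (left, right) interval and depth via depth-first traversal."""
    rows = []
    counter = 0

    def visit(node, depth):
        nonlocal counter
        name, children = node
        left = counter
        counter += 1
        for child in children:
            visit(child, depth + 1)
        right = counter
        counter += 1
        rows.append((name, left, right, depth))

    visit(tree, 0)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (name TEXT, lft INTEGER, rgt INTEGER, depth INTEGER)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", flatten(TREE))


def descendants(conn, name):
    """Labels of all nodes strictly inside the interval of the named node, in document order."""
    sql = """
        SELECT d.name
        FROM node AS a
        JOIN node AS d ON d.lft > a.lft AND d.rgt < a.rgt
        WHERE a.name = ?
        ORDER BY d.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


print(descendants(conn, "NP"))
```

With this encoding, a descendant step in a path query becomes a single self-join, and "follows" relations reduce to comparisons such as `d.lft > a.rgt`.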
Designing and evaluating an XPath dialect for linguistic queries
- In 22nd International Conference on Data Engineering
, 2006
Cited by 13 (6 self)
Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for linguistic data and queries. However, several important expressive features required for linguistic queries are missing or hard to express in XPath. In this paper, we motivate and illustrate these features with a variety of linguistic queries. Then we propose extensions to XPath to support linguistic queries, and design an efficient query engine based on a novel labeling scheme. Experiments demonstrate that our language is not only sufficiently expressive for linguistic trees but also efficient for practical usage.
Evolving GATE to Meet New Challenges in . . .
, 1998
Cited by 11 (2 self)
In this paper we present recent work on GATE, a widely-used framework and graphical development environment for creating and deploying Language Engineering components and resources in a robust fashion. The GATE architecture has facilitated the development of a number of successful applications for various language processing tasks (such as Information Extraction, dialogue and summarisation), the building and annotation of corpora and the quantitative evaluations of LE applications. The focus of this paper is on recent developments in response to new challenges in Language Engineering: Semantic Web, integration with Information Retrieval and data mining, and the need for machine learning support.
Indexing and Querying Linguistic Metadata and Document Content
Cited by 6 (2 self)
The need for efficient corpus indexing and querying arises frequently both in machine learning-based and human-engineered natural language processing systems. This paper presents the ANNIC system, which can index documents not only by content, but also by their linguistic annotations and features. It also enables users to formulate versatile queries mixing keywords and linguistic information. The result consists of the matching texts in the corpus, displayed within the context of linguistic annotations (not just text, as is customary for KWIC systems). The data is displayed in a graphical user interface, which facilitates its exploration and the discovery of new patterns, which can in turn be tested by launching new ANNIC queries.
Requirements, Tools, and Architectures for Annotated Corpora
- In Proceedings of Data Architectures and Software Support for Large Corpora
, 2000
Cited by 3 (0 self)
This paper provides an overview of the needs for corpus annotation and exploitation, and some suggested strategies for development of a widely usable and reusable corpus-handling environment. The central plank of our argument is that cross-disciplinary acceptability is no longer an optional extra. The overall goal is to provide a framework which can be adapted to meet the needs of a research community which is intellectually, geographically, and linguistically diverse.

The ecology of corpora

Annotated text and speech corpora are a staple of language processing research, as well as other applications such as lexicography and corpus linguistics. The cost of creating an annotated corpus can be very high, both in direct financial terms and in terms of the opportunity cost of allocating skilled labor. So funders, whether public or commercial, have come to expect that the cost of corpus creation will be amortized over multiple research and development efforts. The more costly the corpus, t...
Generalizing XPath for directed graphs
, 2003
Cited by 3 (1 self)
XPath is a very natural and powerful way to specify locations in XML documents. This paper examines possible generalizations of XPath to allow both locations and paths through generalized labeled directed graphs to be specified. The need for such a path language is driven by work in querying Linguistic Annotations which are in general more complex in structure than XML documents. The result of this exercise is a powerful path language which reduces to XPath as a special case and which could potentially be useful in a range of query applications.
A Formal Framework For Interlinear Text
- Proceedings of the Workshop on Web-Based Language Documentation and Description
, 2000
Cited by 2 (2 self)
Interlinear texts come in many forms and can be represented digitally in many ways, e.g. plain text with hard spacing, tables, special markup, and special-purpose data structures. There are various methods for linking to audio data and lexical entries, and for including footnotes and other marginalia. This diversity of form presents problems for general purpose software for searching, exchanging, displaying and enriching interlinear texts.
Text Augmentation: Inserting XML tags into natural language text with PPM Models and Viterbi-like search
, 2003
Cited by 2 (0 self)
This thesis develops work on using Hidden Markov Models to insert tags into natural language text. A taxonomy of tags is developed unifying the fields of text segmentation tagging, part-of-speech tagging, proper noun extraction and hierarchical entity extraction. The search spaces for inserting tags are examined from both a theoretical and experimental point of view across the taxonomy and on four corpora. An analysis of different correctness measures for different types of tag insertion problem is undertaken and a technique to determine whether tag-insertion errors are the result of a modelling failure or a searching failure is discovered.

