A multimodal discourse ontology for meeting understanding
- Machine Learning for Multimodal Interaction: 2nd International Workshop, MLMI 2005, 2006
- Cited by 11 (3 self)
In this paper, we present a multimodal discourse ontology that serves as a knowledge representation and annotation framework for the discourse understanding component of an artificial personal office assistant. The ontology models components of natural language, multimodal communication, multi-party dialogue structure, meeting structure, and the physical and temporal aspects of human communication. We compare our models to those from the research literature and from similar applications. We also highlight some algorithms that are used to perform automatic processing and understanding using these models and suggest elements of the ontology that may be of immediate interest to meeting annotation by human or automated means.
Sustainability of Linguistic Resources
- Proceedings of the LREC 2006 Satellite Workshop on "Merging and Layering Linguistic Information", 2006
- Cited by 4 (3 self)
This paper describes a new research initiative addressing the issue of sustainability of linguistic resources. This initiative is a cooperation between three linguistic collaborative research centres in Germany, which comprise more than 40 individual research projects altogether. These projects are involved in creating manifold language resources, especially corpora, tailored to their particular needs. The aim of the project described here is to ensure effective and sustainable access to these data by third-party researchers beyond the termination of these projects. This goal involves a number of measures, such as the definition of a common data format to completely capture the heterogeneous information encoded in the individual corpora, the development of user-friendly and sustainably usable tools for processing (e.g. querying) the data, and the specification of common inventories of metadata and terminology. Moreover, the project aims at formulating general rules of best practice for creating, accessing, and archiving linguistic resources.
ODIN: A Model for Adapting and Enriching Legacy Infrastructure
- In Proceedings of the E-Humanities Workshop, 2006
- Cited by 4 (3 self)
The Online Database of Interlinear Text (ODIN) is a database of interlinear text "snippets", harvested mostly from scholarly documents posted to the Web. Although large amounts of language data are posted to the Web as part of scholarly discourse, making the existing "e-Linguistic infrastructure" surprisingly rich, most linguistic data available on the Web exists in legacy formats, is highly display-centric, and is often difficult to locate or interoperate over. ODIN seeks to leverage this existing infrastructure into a rich, searchable, and interoperable resource by converting readily available semi-structured data to content-centric, searchable formats. To do this, ODIN mines scholarly papers and webpages for instances of linguistic data, focusing mostly on interlinear texts, extracts them, identifies source languages, and makes the instances available to search. Through ODIN's standard search feature, users can locate data by language name or Ethnologue code, and display lists of data by document for languages of interest. The newer Advanced Search feature allows users to locate instances by the grammatical markup that is used (e.g., NOM, ACC, ERG, PST, 3SG), and by linguistic constructions (e.g., passives, conditionals, possessives, raising constructions). The latter are made possible through additional enrichment of discovered data using automated statistical taggers and parsers.
An ontology of linguistic annotations
- 2008
- Cited by 4 (1 self)
This paper describes the development and design of an ontology of linguistic annotations, primarily word classes and morphosyntactic features, based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. for German, STTS and morphological annotations), and existing terminological resources (e.g. GOLD). The ontology is intended to be a platform for terminological integration, integrated representation, and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. Further, it can be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.
Towards a General Model for Linguistic Paradigms
- In Proceedings of the E-MELD Workshop 2004: Linguistic Databases and Best Practice, July 15–18 2004, 2004
- Cited by 3 (1 self)
Linguistic forms are inherently multi-dimensional. They exhibit a variety of phonological, orthographic, morphosyntactic, semantic and pragmatic properties. Accordingly, linguistic analysis involves multi-dimensional exploration, a process in which the same collection of forms is laid out in many ways until clear patterns emerge. Equally, language documentation usually contains tabulations of linguistic forms to illustrate systematic patterns and variations. In all such cases, multi-dimensional data is projected onto a two-dimensional table known as a linguistic paradigm, the most widespread format for linguistic data presentation. In this paper we survey a representative sample of paradigms and develop a simple relational data model. We show how XML technologies can be used to store and render paradigms. The result is a flexible and extensible model for the storage, interchange and delivery of linguistic paradigms.
Repurposing theoretical linguistic data for tool development and search
- In Proceedings of IJCNLP-2008, 2008
- Cited by 3 (2 self)
For the majority of the world’s languages, the number of linguistic resources (e.g., annotated corpora and parallel data) is very limited. Consequently, supervised methods, as well as many unsupervised methods, cannot be applied directly, leaving these languages largely untouched and unnoticed. In this paper, we describe the construction of a resource that taps the large body of linguistically analyzed language data that has made its way to the Web, and propose using this resource to bootstrap NLP tool development.
Flexible Ontology Population from Text: The OwlExporter
- In: Int. Conf. on Language Resources and Evaluation (LREC), 2010
- Cited by 2 (2 self)
Ontology population from text is becoming increasingly important for NLP applications. Ontologies in OWL format provide a standardized means of modeling, querying, and reasoning over large knowledge bases. Populated from natural language texts, they offer significant advantages over traditional export formats, such as plain XML. The development of text analysis systems has been greatly facilitated by modern NLP frameworks, such as the General Architecture for Text Engineering (GATE). However, ontology population is not currently supported by a standard component. We developed a GATE resource called the OwlExporter that makes it easy to map existing NLP analysis pipelines to OWL ontologies, thereby allowing language engineers to create ontology population systems without requiring extensive knowledge of ontology APIs. A particular feature of our approach is the concurrent population and linking of a domain ontology and an NLP ontology, including NLP-specific features such as safe reasoning over coreference chains.
Avoiding Data Graveyards: Deriving an Ontology for Accessing Heterogeneous Data Collections
- Proceedings of the International Workshop “Ontologies in Text Technology”, 2006
- Cited by 1 (0 self)
In this paper, I describe the derivation and practical application of an ontology of word classes manually derived from four different sources: the EAGLES recommendations for the morphosyntactic annotation of corpora; several language-specific or task-specific tag sets for part-of-speech tagging; the typologically oriented SFB632 guidelines for part-of-speech tagging; and the General Ontology for Linguistic Description (GOLD). The resulting ontology is intended to provide integrated representation of and access to terminologically heterogeneous resources. It will be applied as part of a sustainable archive of linguistic resources to be developed by the project “Sustainability of Linguistic Data”, a recently started joint initiative by three German special research centers. While the first phase of ontology development has focused on terminology for part-of-speech (POS) tagging, which requires hand-crafted methods, a possible extension towards the semi-automatic integration of syntactic annotation is sketched as an outlook.

