Learning to Extract Symbolic Knowledge from the World Wide Web
, 1998
Cited by 290 (24 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a
Learning to Construct Knowledge Bases from the World Wide Web
, 2000
Cited by 187 (3 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general a...
RQL: A Declarative Query Language for RDF
Cited by 174 (19 self)
Real-scale Semantic Web applications, such as Web Portals and E-Marketplaces, require the management of voluminous metadata repositories containing descriptive information (i.e., metadata) about the available Web resources and services. Better knowledge about the meaning, usage, accessibility or quality of these resources and services will considerably facilitate the automated processing of both Web content and services. In this context, the Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data. Although large volumes of RDF descriptions are already appearing (e.g., as exported Portal catalogs or service descriptions), sufficiently expressive declarative languages for querying both RDF descriptions and schemas are still missing. In this paper, we propose RQL, a new RDF query language, relying on a formal graph model that permits the interpretation of superimposed resource descriptions. RQL is an OQL-inspired adaptation of XML query languages to the peculiarities of RDF but, foremost, is an extension of this functionality for uniformly querying both descriptions and schemas. We illustrate the syntax, semantics and core functionality of RQL by means of a set of benchmark queries and report on the performance of RSSDB, our persistent RDF Store, for storing and querying voluminous RDF descriptions.
Query Optimization for XML
- In Proceedings of VLDB
, 1999
Cited by 173 (2 self)
XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured: the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore, a DBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported.
From Manual to Semi-automatic Semantic Annotation: About Ontology-Based Text Annotation Tools
- IN P. BUITELAAR & K. HASIDA (EDS). PROCEEDINGS OF THE COLING 2000 WORKSHOP ON SEMANTIC ANNOTATION AND INTELLIGENT CONTENT
, 2000
Cited by 63 (16 self)
Semantic Annotation is a basic technology for intelligent content and is beneficial in a wide range of content-oriented intelligent applications. In this paper we present our work in ontology-based semantic annotation, which is embedded in a scenario of a knowledge portal application. Starting with seemingly good and bad manual semantic annotation, we describe our experiences within the KA²-initiative. These experiences gave us the starting point for developing an ergonomic and knowledge base-supported annotation tool. Furthermore, the annotation tool described is currently being extended with mechanisms for semi-automatic information-extraction based annotation. Supporting the evolving nature of semantic content, we additionally describe our idea of evolving ontologies supporting semantic annotation.
Semi-automatic Grammar Recovery
- SOFTWARE—PRACTICE & EXPERIENCE
, 2001
Cited by 39 (9 self)
We proposed a new approach for the construction of grammars and parsers for existing languages. The approach is both very powerful and simple. We provided a structured process and explained our methods in detail so that others can apply our ideas for their own grammar construction activities. We illustrated the proposed approach with a nontrivial case study. Using our process, we constructed in a few weeks a complete and correct VS COBOL II grammar specification for IBM mainframes. We not only constructed a parser for it, but also published a web-enabled grammar specification so that others can use this result to conveniently construct their own grammar-based tools for VS COBOL II, or derivatives.
XIRQL: An XML Query Language Based on Information Retrieval Concepts
, 2001
Cited by 31 (2 self)
Most proposals for XML query languages are based on the data-centric view of XML and do not support uncertainty and vagueness, thus being unsuitable for information retrieval (IR) of XML documents. Based on the document-centric view, we present the query language XIRQL which implements IR-related features such as weighting and ranking, relevance-oriented search, datatypes with vague predicates, and structural relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented which also serves as a starting point for query optimization.
A Semantic Approach to Integrating XML and Structured Data Sources
, 2000
Cited by 31 (6 self)
XML is fast becoming the standard for information exchange on the Internet. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object/relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can also be used for integrating XML data sources with each other and/or with other structured data sources. In our approach, the constructs and transformations of modelling languages such as ER, XML etc. are defined in terms of the constructs and transformations of a lower-level graph-based data model. This allows constructs from multiple modelling languages to co-exist within the same intermediate schema, thus avoiding the need for a high-level common data model and the semantic mismatches that this can bring about. Transformations between schemas are expressed as sequences of primitive transformations and a key feature of them is that they are automatically reversible. This allows automatic translation of data, queries and updates between semantically equivalent or overlapping heterogeneous schemas.

