Results 1 - 10 of 29
Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity
1998
Cited by 193 (13 self)
Most databases contain "name constants" like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of heterogeneous databases has assumed that local name constants can be mapped into an appropriate global domain by normalization. However, in many cases, this assumption does not hold; determining if two name constants should be considered identical can require detailed knowledge of the world, the purpose of the user's query, or both. In this paper, we reject the assumption that global domains can be easily constructed, and assume instead that the names are given in natural language text. We then propose a logic called WHIRL which reasons explicitly about the similarity of local names, as measured using the vector-space model commonly adopted in statistical information retrieval. We describe an efficient implementation of WHIRL and evaluate it experimentally on data extracted from the World Wide Web. We show that WHIR...
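The vector-space similarity between name constants described above can be sketched as TF-IDF weighting plus cosine similarity. This is a minimal sketch, assuming log-scaled term frequency and a plain `log(N/df)` inverse document frequency. The abstract says only that WHIRL uses the vector-space model, so the exact weighting and the helper names `tokenize`, `tfidf_vectors` and `cosine` are our assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a name constant and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(names):
    """Return one unit-length TF-IDF vector (a dict term -> weight) per name."""
    docs = [tokenize(n) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: (1 + log tf) * log(N / df).
        vec = {t: (1 + math.log(c)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

names = ["AT&T Labs Research", "AT&T Research", "University of Bremen", "Bremen University"]
vecs = tfidf_vectors(names)
print(round(cosine(vecs[2], vecs[3]), 3))  # → 0.577
print(cosine(vecs[2], vecs[0]))            # → 0.0 (no shared terms)
```

Two names that no normalization rule would map to the same key ("University of Bremen" and "Bremen University") still receive a high score, which is the kind of reasoning about local names that the abstract describes.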
Ontology-Based Integration of Information - A Survey of Existing Approaches
2001
Cited by 171 (1 self)
We review the use of ontologies for the integration of heterogeneous information sources. Based on an in-depth evaluation of existing approaches to this problem, we discuss how ontologies are used to support the integration task. We evaluate and compare the languages used to represent the ontologies, and the use of mappings both between ontologies and between ontologies and information sources. We also examine the ontology engineering methods and tools used to develop ontologies for information integration. Based on the results of our analysis, we summarize the state of the art in ontology-based information integration and identify areas for further research.
Generating Finite-State Transducers For Semi-Structured Data Extraction From The Web
1998
Cited by 126 (3 self)
Integrating a large number of Web information sources may significantly increase the utility of the World-Wide Web. A promising approach to such integration is a Web information mediator that provides seamless, transparent access for clients. Information mediators need wrappers to access a Web source as a structured database, but building wrappers by hand is impractical. Previous work on wrapper induction is too restrictive to handle the many Web pages that contain tuples with missing attributes, multiple values, variant attribute permutations, exceptions and typos. This paper presents SoftMealy, a novel wrapper representation formalism based on a finite-state transducer (FST) and contextual rules. This approach can wrap a wide range of semistructured Web pages because an FST can encode each different attribute permutation as a path. A SoftMealy wrapper can be induced from a handful of labeled examples using our generalization algori...
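The idea that an FST can encode each attribute permutation as a path can be illustrated with a toy transducer whose states stand for the attribute currently being extracted. This is a hand-written sketch under stated assumptions: the `TRANSITIONS` table, the separator tokens `Name:`/`Phone:` and the `transduce` helper are hypothetical. It is not SoftMealy's representation (which also relies on contextual rules) and it does no induction from labeled examples.

```python
from collections import defaultdict

# Hypothetical transition table: (current_state, separator_token) -> next_state.
# States name the attribute being extracted; "b" is the begin state.
TRANSITIONS = {
    ("b", "Name:"): "name",
    ("b", "Phone:"): "phone",
    ("name", "Phone:"): "phone",   # path for the order Name, Phone
    ("phone", "Name:"): "name",    # path for the order Phone, Name
}

def transduce(tokens, transitions=TRANSITIONS, start="b"):
    """Run the toy FST: separator tokens switch state and emit nothing;
    other tokens are emitted under the current state's attribute."""
    state = start
    record = defaultdict(list)
    for tok in tokens:
        nxt = transitions.get((state, tok))
        if nxt is not None:
            state = nxt                  # consume the separator
        elif state != start:
            record[state].append(tok)    # emit token as part of the attribute
        # tokens seen before the first separator are skipped
    return {attr: " ".join(toks) for attr, toks in record.items()}

print(transduce("Name: Alice Smith Phone: 555-1234".split()))
# → {'name': 'Alice Smith', 'phone': '555-1234'}
print(transduce("Phone: 555-9876 Name: Bob".split()))
# → {'phone': '555-9876', 'name': 'Bob'}
print(transduce("Name: Carol".split()))
# → {'name': 'Carol'}
```

Both attribute orders, as well as a tuple with a missing attribute, are handled by following different paths through the same transition table.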
LARKS: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace
Autonomous Agents and Multi-Agent Systems, 2002
Cited by 114 (10 self)
Introduction. The number of services and deployed software agents in the most famous offspring of the Internet, the World Wide Web, is exponentially increasing. In addition, the Internet is an open environment, where information sources, communication links and agents themselves may appear and disappear unpredictably. Thus, an effective, automated search and selection of relevant services or agents is essential for human users and agents as well. We distinguish three general agent categories in the Cyberspace: service providers, service requesters, and middle agents. Service providers provide some type of service, such as finding information or performing some particular domain-specific problem solving. Requesters need providers to perform some service for them. Agents that help locate others are called middle agents [6]. Matchmaking is the process of finding an appropriate provider for a requester through a...
(This research has been sponsored in part by Office of Naval Research grant N-00014-96-16-1-1222, and by DARPA grant F-30602-98-2-0138.)
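The matchmaking role of a middle agent described above can be sketched as a registry of provider advertisements that is queried on behalf of a requester. This is a toy sketch, not LARKS: the `Advertisement` fields (`inputs`, `outputs`), the `Matchmaker` class and the subset-based matching rule are all our assumptions, chosen only to show a middle agent connecting requesters with providers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Advertisement:
    """Hypothetical capability description a provider registers with a middle agent."""
    provider: str
    inputs: frozenset    # what the provider needs from the requester
    outputs: frozenset   # what the provider can deliver

class Matchmaker:
    """Toy middle agent: stores advertisements and finds providers for a request."""

    def __init__(self):
        self.ads = []

    def advertise(self, ad):
        self.ads.append(ad)

    def match(self, available_inputs, wanted_outputs):
        # Assumed rule: a provider matches if the requester can supply every
        # input it needs and it produces every output the requester wants.
        have, want = set(available_inputs), set(wanted_outputs)
        return [ad.provider for ad in self.ads
                if ad.inputs <= have and want <= ad.outputs]

mm = Matchmaker()
mm.advertise(Advertisement("weather-agent", frozenset({"city"}), frozenset({"forecast"})))
mm.advertise(Advertisement("geo-agent", frozenset({"address"}), frozenset({"city", "lat", "lon"})))
print(mm.match({"city", "date"}, {"forecast"}))  # → ['weather-agent']
print(mm.match({"address"}, {"lat"}))            # → ['geo-agent']
```

Because providers register and requesters query only through the middle agent, either side can appear or disappear without the other needing to know about it, which matches the open-environment motivation in the abstract.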
Semantic Web Support for the Business-to-Business E-Commerce Lifecycle
2002
Cited by 74 (4 self)
...widespread, standardisation of ontologies, message content and message protocols will be necessary. In this paper, we present a lifecycle of a business-to-business e-commerce interaction, and show how the Semantic Web can support a service description language that can be used throughout this lifecycle. By using DAML+OIL, we develop a service description language sufficiently expressive and flexible to be used not only in advertisements, but also in matchmaking queries, negotiation proposals and agreements. We also identify which operations must be carried out on this description language if the B2B lifecycle is to be fully supported. We do not propose specific standard protocols, but instead argue that our operators are able to support a wide variety of interaction protocols, and so will be fundamental irrespective of which protocols are finally adopted.
Data Integration Using Similarity Joins and a Word-Based Information Representation Language
ACM Transactions on Information Systems, 2000
A Web-based Information System that Reasons with Structured Collections of Text
In Agents '98, 1998
Cited by 53 (7 self)
The degree to which information sources are pre-processed by Web-based information systems varies greatly. In search engines like Altavista, little pre-processing is done, while in "knowledge integration" systems, complex site-specific "wrappers" are used to integrate different information sources into a common database representation. In this paper we describe an intermediate between these two models. In our system, information sources are converted into a highly structured collection of small fragments of text. Database-like queries to this structured collection of text fragments are approximated using a novel logic called WHIRL, which combines inference in the style of deductive databases with ranked retrieval methods from information retrieval. WHIRL allows queries that integrate information from multiple Web sites, without requiring the extraction and normalization of object identifiers that can be used as keys; instead, operations that in conventional databases require equality tests...
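Replacing the equality test of a conventional join with a ranked similarity test can be sketched as a "similarity join" that returns the top-scoring pairs instead of exact key matches. This is a minimal sketch with a swapped-in measure: it uses token-set Jaccard similarity for brevity rather than the TF-IDF vector-space similarity WHIRL uses, and it scores every pair in a brute-force nested loop, which is not the efficient implementation the papers describe. The helper names are hypothetical.

```python
import heapq
import re

def tokens(text):
    """Set of lowercase alphanumeric terms in a text fragment."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-set Jaccard similarity (a stand-in for TF-IDF cosine)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def similarity_join(left, right, k=3):
    """Return the k highest-scoring (score, left, right) pairs, ranked by
    similarity, instead of only the pairs whose keys are exactly equal."""
    scored = ((jaccard(l, r), l, r) for l in left for r in right)
    return heapq.nlargest(k, scored)

movies_a = ["The Hunt for Red October", "Star Wars"]
movies_b = ["Star Wars (1977)", "Hunt for Red October, The", "Alien"]
for score, l, r in similarity_join(movies_a, movies_b, k=2):
    print(f"{score:.2f}  {l!r} ~ {r!r}")
```

An equality join on these titles would return nothing, while the ranked join pairs each title with its differently formatted counterpart on another site.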
WHIRL: A Word-based Information Representation Language
Artificial Intelligence, 1999
Cited by 24 (1 self)
We describe WHIRL, an "information representation language" that synergistically combines properties of logic-based and text-based representation systems. WHIRL is a subset of Datalog that has been extended by introducing an atomic type for textual entities, an atomic operation for computing textual similarity, and a "soft" semantics; that is, inferences in WHIRL are associated with numeric scores, and presented to the user in decreasing order by score. This paper briefly describes WHIRL, and then surveys a number of applications. We show that WHIRL strictly generalizes both ranked retrieval of documents and logical deduction; that non-trivial queries about large databases can be answered efficiently; that WHIRL can be used to accurately integrate data from heterogeneous information sources, such as those found on the Web; that WHIRL can be used effectively for inductive classification of text; and finally, that WHIRL can be used to semi-automatically generate extraction programs for structured documents.
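The "soft" semantics described above, where inferences carry numeric scores and answers are presented in decreasing order by score, can be sketched as follows. The combination rules are our assumption: the score of one derivation is taken as the product of its similarity scores, and several derivations of the same answer are combined with noisy-or, 1 - ∏(1 - s). The abstract itself states only that inferences are scored and ranked; the function names are hypothetical.

```python
from collections import defaultdict
from math import prod

def conjunction_score(similarities):
    """Assumed score of one ground substitution: product of its similarity literals."""
    return prod(similarities)

def rank_answers(derivations):
    """derivations: iterable of (answer_tuple, [similarity scores]).
    Combine multiple derivations of the same answer with noisy-or
    (assumed) and return (answer, score) pairs in decreasing score order."""
    miss = defaultdict(lambda: 1.0)   # running product of (1 - score) per answer
    for answer, sims in derivations:
        miss[answer] *= 1.0 - conjunction_score(sims)
    scored = {answer: 1.0 - m for answer, m in miss.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

derivs = [
    (("Star Wars",), [0.9, 0.5]),   # one derivation scoring 0.45
    (("Star Wars",), [0.6]),        # a second derivation scoring 0.6
    (("Alien",), [0.8, 0.8]),       # a single derivation scoring 0.64
]
for answer, score in rank_answers(derivs):
    print(answer, round(score, 2))
```

Under these assumed rules, an answer supported by two moderate derivations (0.45 and 0.6 combine to 0.78) can outrank an answer with a single stronger one (0.64), which is one way a ranked, deduction-style query can reward corroborating evidence.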
Enabling Technologies for Interoperability
TZI, University of Bremen, 2000
Cited by 14 (4 self)
We present a new approach that aims to reduce the many problems that currently prevent fully interoperable GIS. We discuss these heterogeneity problems and argue that they must be solved to achieve interoperability. These problems are addressed at three levels: syntactic, structural and semantic. In addition, we identify the need for an approach that performs semantic translation for interoperability, and introduce a uniform description of contexts. Furthermore, we discuss a conceptual architecture, Buster (Bremen University Semantic Translation for Enhanced Retrieval), which can provide intelligent information integration based on a reclassification of information entities in a new context. Lastly, we illustrate our theories by sketching a real-life scenario.
Knowledge Integration for Structured Information Sources Containing Text (Extended Abstract)
In the SIGIR-97 Workshop on Networked Information Retrieval, 1997
Cited by 10 (6 self)
William W. Cohen, AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932 (wcohen@research.att.com), August 1, 1997.
Knowledge integration is the integration of distributed, heterogeneous databases, such as those available on the World Wide Web. In this paper we consider a new type of knowledge integration problem, namely, the problem of combining information from relations that lack common object identifiers. A general technique for this problem is proposed, based on well-studied similarity measures for text and on the observation that Web-based databases often present their information to the end user through a veneer of text. We describe an extension of Datalog called WHIRL which allows passages of ordinary text to be used as keys. WHIRL supports documents as a built-in type, similarity reasoning with a built-in predicate, and answers every query with a list of answer substitutions that are ranked according to an overall similarity score. Experiments with a prototype...

