Results 1 - 10 of 128
Learning to Match Ontologies on the Semantic Web, 2003
"... On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, th ..."
Abstract
-
Cited by 126 (2 self)
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology GLUE finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures, and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits well a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies, and describe experiments that show the promise of the approach.
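GLUE's similarity measures are defined over a joint probability distribution for a concept pair, estimated by classifying a pool of instances with respect to both concepts. A minimal sketch of that idea, with toy keyword rules standing in for the paper's trained classifiers, might look like this:

```python
# Hedged sketch of a joint-probability similarity in the GLUE style.
# The classifiers below are stand-ins: anything callable as in_X(instance) -> bool
# (e.g. a Naive Bayes text classifier trained on concept instances) would do.

def joint_distribution(instances, in_A, in_B):
    """Estimate P(A,B), P(A,!B), P(!A,B), P(!A,!B) by counting how a pool of
    instances is classified with respect to both concepts."""
    counts = {("A", "B"): 0, ("A", "!B"): 0, ("!A", "B"): 0, ("!A", "!B"): 0}
    for x in instances:
        a = "A" if in_A(x) else "!A"
        b = "B" if in_B(x) else "!B"
        counts[(a, b)] += 1
    n = max(len(instances), 1)
    return {k: v / n for k, v in counts.items()}

def jaccard_similarity(dist):
    """One similarity defined over the joint distribution: P(A and B) / P(A or B)."""
    p_and = dist[("A", "B")]
    p_or = p_and + dist[("A", "!B")] + dist[("!A", "B")]
    return p_and / p_or if p_or > 0 else 0.0

# Toy usage: two keyword 'classifiers' playing the role of learned models.
instances = ["intro to databases", "faculty directory", "course on databases",
             "research staff", "database systems seminar"]
in_A = lambda x: "course" in x or "seminar" in x   # concept from ontology 1
in_B = lambda x: "database" in x                   # concept from ontology 2
print(jaccard_similarity(joint_distribution(instances, in_A, in_B)))
```

The other similarity notions the abstract mentions would be computed from the same four joint probabilities; only the final formula changes.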
An Integrative Proximity Measure for Ontology Alignment, 2003
"... Integrating heterogeneous resources of the web will require finding agreement between the underlying ontologies. A variety of methods from the literature may be used for this task, basically they perform pair-wise comparison of entities from each of the ontologies and select the most similar pairs. ..."
Abstract
-
Cited by 51 (4 self)
Integrating heterogeneous resources of the web will require finding agreement between the underlying ontologies. A variety of methods from the literature may be used for this task; essentially, they perform pairwise comparison of entities from each of the ontologies and select the most similar pairs. We introduce a similarity measure that takes advantage of most of the features of OWL-Lite ontologies and integrates many ontology comparison techniques in a common framework. Moreover, we put forth a computation technique to deal with one-to-many relations and circularities in the similarity definitions.
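The note about circularities is the interesting implementation detail: entity similarity depends on the similarity of related entities, which in turn depends back on it, so the values have to be computed iteratively rather than in one pass. A rough illustration of that fixpoint-style computation (the 0.5/0.5 weighting and the trigram label similarity are placeholder assumptions, not the paper's measure):

```python
# Hedged sketch of resolving circular similarity definitions by iteration.

def label_sim(a, b):
    """Crude string similarity: shared character trigrams (Dice coefficient)."""
    grams = lambda s: {s[i:i+3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a.lower()), grams(b.lower())
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def align(entities1, entities2, neighbours1, neighbours2, iters=20):
    """sim(a, b) blends label similarity with the best similarity achievable by
    their neighbours; since neighbour similarities depend on sim itself, iterate."""
    sim = {(a, b): label_sim(a, b) for a in entities1 for b in entities2}
    for _ in range(iters):
        new = {}
        for a in entities1:
            for b in entities2:
                na, nb = neighbours1.get(a, []), neighbours2.get(b, [])
                if na and nb:
                    neigh = sum(max(sim[(x, y)] for y in nb) for x in na) / len(na)
                    new[(a, b)] = 0.5 * label_sim(a, b) + 0.5 * neigh
                else:
                    new[(a, b)] = label_sim(a, b)
        sim = new
    return sim

# Toy ontologies: classes linked to a related class.
entities1, entities2 = ["Publication", "Author"], ["Publication", "Writer"]
neighbours1 = {"Author": ["Publication"]}   # e.g. Author --writes--> Publication
neighbours2 = {"Writer": ["Publication"]}
sim = align(entities1, entities2, neighbours1, neighbours2)
best = max(((b, s) for (a, b), s in sim.items() if a == "Author"), key=lambda t: t[1])
print(best)   # ('Writer', 0.5): zero label overlap, but the shared neighbour pulls them together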
Scaling Semantic Parsers with On-the-fly Ontology Matching
"... We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For ..."
Abstract
-
Cited by 50 (6 self)
We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logical-form meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine-point accuracy improvement on a recent Freebase QA corpus.
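The ontology matching model described here is learned jointly with the parser; purely as an illustration of the mismatch it has to resolve, the toy scorer below compares an underspecified phrase against hand-written glosses of real Freebase predicates. The glosses and the word-overlap score are assumptions for the example, not the paper's model:

```python
# Hedged illustration of the phrase-to-ontology mismatch, not the paper's matcher.

CANDIDATES = {  # real Freebase predicate names, illustrative glosses
    "people.person.children": "children child parent",
    "people.person.gender": "gender male female",
    "location.statistical_region.population": "population number of people living",
}

def match(phrase, candidates=CANDIDATES):
    """Score each candidate by the fraction of the phrase's words in its gloss."""
    words = set(phrase.lower().split())
    scored = {name: len(words & set(gloss.split())) / max(len(words), 1)
              for name, gloss in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

print(match("number of people living in"))  # high overlap with ...population
print(match("daughter"))                    # zero overlap everywhere: exactly the
                                            # mismatch a learned matching model
                                            # is meant to resolve
```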
SAMBO - a system for aligning and merging biomedical ontologies. Journal of Web Semantics, 2006
"... Due to the recent explosion of the amount of on-line accessible biomedical data and tools, finding and retrieving the relevant information is not an easy task. The vision of a Semantic Web for life sciences alleviates these difficulties. A key technology for the Semantic Web are ontologies. In recen ..."
Abstract
-
Cited by 43 (11 self)
Due to the recent explosion of the amount of on-line accessible biomedical data and tools, finding and retrieving the relevant information is not an easy task. The vision of a Semantic Web for life sciences alleviates these difficulties. Ontologies are a key technology for the Semantic Web. In recent years many biomedical ontologies have been developed, and many of these ontologies contain overlapping information. To use multiple ontologies together, they have to be aligned or merged. In this paper we propose a framework for aligning and merging ontologies. Further, we have developed a system for aligning and merging biomedical ontologies (SAMBO) based on this framework. The framework is also a first step towards a general framework that can be used for comparative evaluations of alignment strategies and their combinations. We evaluate different strategies and their combinations in terms of quality and processing time, and compare SAMBO with two other systems.
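One combination strategy such a framework can evaluate is a weighted sum of the similarities produced by several matchers, followed by a threshold filter. The sketch below shows that pattern; the matcher names, weights, and threshold are illustrative, not SAMBO's actual configuration:

```python
# Hedged sketch of combining several matchers' scores into alignment suggestions.

def combine(pair_scores, weights, threshold=0.6):
    """pair_scores: {matcher_name: {(c1, c2): score}}; returns pairs above threshold."""
    pairs = {p for scores in pair_scores.values() for p in scores}
    suggestions = []
    for p in pairs:
        total = sum(weights[m] * pair_scores[m].get(p, 0.0) for m in weights)
        if total >= threshold:
            suggestions.append((p, round(total, 3)))
    return sorted(suggestions, key=lambda x: -x[1])

pair_scores = {
    "string":    {("Myocardium", "Heart_Muscle"): 0.30, ("Aorta", "Aorta"): 1.00},
    "synonym":   {("Myocardium", "Heart_Muscle"): 0.95, ("Aorta", "Aorta"): 1.00},
    "structure": {("Myocardium", "Heart_Muscle"): 0.70, ("Aorta", "Aorta"): 0.80},
}
weights = {"string": 0.3, "synonym": 0.4, "structure": 0.3}
print(combine(pair_scores, weights))
```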
Leveraging data and structure in ontology integration. In SIGMOD Conference, 2007
"... There is a great deal of research on ontology integration which makes use of rich logical constraints to reason about the structural and logical alignment of ontologies. There is also considerable work on matching data instances from heterogeneous schema or ontologies. However, little work exploits ..."
Abstract
-
Cited by 38 (3 self)
There is a great deal of research on ontology integration which makes use of rich logical constraints to reason about the structural and logical alignment of ontologies. There is also considerable work on matching data instances from heterogeneous schemas or ontologies. However, little work exploits the fact that ontologies include both data and structure. We aim to close this gap by presenting a new algorithm (ILIADS) that tightly integrates both data matching and logical reasoning to achieve better matching of ontologies. We evaluate our algorithm on a set of 30 pairs of OWL Lite ontologies with the schema and data matchings found by human reviewers. We compare against two systems: the ontology matching tool FCA-merge [28] and the schema matching tool COMA++ [1]. ILIADS shows an average improvement of 25% in quality over FCA-merge and an 11% improvement in recall over COMA++.
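The abstract's point is combining data-level evidence with logic-level constraints when scoring a candidate correspondence. The fragment below illustrates that combination in the simplest possible way (a name/instance-overlap blend plus one disjointness veto); it is a stand-in, not the ILIADS algorithm:

```python
# Hedged sketch: blend label evidence and instance evidence, let a logical
# constraint veto the match. Weights and the single check are illustrative.

def name_sim(a, b):
    ta = set(a.lower().replace("_", " ").split())
    tb = set(b.lower().replace("_", " ").split())
    return len(ta & tb) / len(ta | tb)

def instance_sim(inst_a, inst_b):
    if not inst_a and not inst_b:
        return 0.0
    return len(inst_a & inst_b) / len(inst_a | inst_b)

def match_score(a, b, inst_a, inst_b, disjoint_pairs):
    if (a, b) in disjoint_pairs or (b, a) in disjoint_pairs:
        return 0.0                         # logical constraint vetoes the match
    return 0.5 * name_sim(a, b) + 0.5 * instance_sim(inst_a, inst_b)

inst = {"Graduate_Student": {"ann", "bob"}, "PhD_Student": {"ann", "bob", "eve"}}
print(match_score("Graduate_Student", "PhD_Student",
                  inst["Graduate_Student"], inst["PhD_Student"],
                  disjoint_pairs={("PhD_Student", "Professor")}))
```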
A Bayesian model for supervised clustering with the Dirichlet process prior. Journal of Machine Learning Research, 2005
"... We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to defi ..."
Abstract
-
Cited by 30 (0 self)
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple—but general—parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks and performance metrics.
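For reference, the Dirichlet process prior over partitions that this model builds on can be stated through the Chinese restaurant process: item i joins an existing cluster in proportion to the cluster's size, or opens a new cluster in proportion to the concentration parameter α. In standard notation (background only; the paper's "reference types" and Gaussian parameterization are not shown):

```latex
P(z_i = k \mid z_1,\dots,z_{i-1}) =
  \begin{cases}
    \dfrac{n_k}{i-1+\alpha} & \text{for an existing cluster } k \text{ with } n_k \text{ members},\\[4pt]
    \dfrac{\alpha}{i-1+\alpha} & \text{for a new cluster.}
  \end{cases}
```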
Large-scale Semantic Parsing via Schema Matching and Lexicon Extension. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2013
"... Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for la ..."
Abstract
-
Cited by 28 (0 self)
Supervised training procedures for semantic parsers produce high-quality semantic parsers, but they have difficulty scaling to large databases because of the sheer number of logical constants for which they must see labeled training data. We present a technique for developing semantic parsers for large databases based on a reduction to standard supervised training algorithms, schema matching, and pattern learning. Leveraging techniques from each of these areas, we develop a semantic parser for Freebase that is capable of parsing questions with an F1 that improves by 0.42 over a purely supervised learning algorithm.
Association rule ontology matching approach. International Journal on Semantic Web and Information Systems, 2007
"... This paper presents a hybrid, extensional and asymmetric matching approach designed to find out semantic relations (equivalence and subsumption) between entities issued from two textual taxonomies (web directories or OWL ontologies). By using the association rule paradigm and a statistical measure d ..."
Abstract
-
Cited by 22 (3 self)
This paper presents a hybrid, extensional and asymmetric matching approach designed to find semantic relations (equivalence and subsumption) between entities drawn from two textual taxonomies (web directories or OWL ontologies). Using the association rule paradigm and a statistical measure developed in this context, the method relies on the following idea: “An entity A will be more specific than or equivalent to an entity B if the vocabulary (i.e. terms and data) used to describe A and its instances tends to be included in that of B and its instances”. The approach is divided into two parts: (1) the representation of each entity by a set of relevant terms and data; (2) the discovery of binary association rules between entities. The selection of rules uses two criteria. The first assesses the quality of the implication using an implication intensity measure. The second verifies the generativity of the rule, which reduces redundancy. Finally, the proposed method is evaluated on two benchmarks: the first contains two conceptual hierarchies of textual documents, and the second is composed of OWL ontologies. The experiments show that the method obtains good precision and also discovers meaningful subsumptions that are not captured by similarity-based approaches.
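The quoted heuristic translates directly into an asymmetric test on the two entities' vocabularies. In the sketch below, plain set inclusion stands in for the paper's implication-intensity measure, and the 0.8 threshold is an arbitrary illustration:

```python
# Hedged sketch of the vocabulary-inclusion heuristic behind the rules.

def inclusion(terms_a, terms_b):
    """Fraction of A's vocabulary that also describes B (asymmetric)."""
    return len(terms_a & terms_b) / len(terms_a) if terms_a else 0.0

def relation(terms_a, terms_b, threshold=0.8):
    a_in_b, b_in_a = inclusion(terms_a, terms_b), inclusion(terms_b, terms_a)
    if a_in_b >= threshold and b_in_a >= threshold:
        return "A equivalent to B"
    if a_in_b >= threshold:
        return "A more specific than B"    # rule A -> B accepted
    if b_in_a >= threshold:
        return "B more specific than A"
    return "no relation"

jazz = {"music", "jazz", "saxophone", "improvisation", "album"}
music = {"music", "album", "concert", "artist", "jazz", "rock", "saxophone",
         "improvisation", "classical"}
print(relation(jazz, music))   # -> "A more specific than B"
```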
Classifying search engine queries using the web as background knowledge. SIGKDD Explorations, volume 7, ACM, 2005
"... The performance of search engines crucially depends on their ability to capture the meaning of a query most likely in-tended by the user. We study the problem of mapping a search engine query to those nodes of a given subject tax-onomy that characterize its most likely meanings. We de-scribe the arc ..."
Abstract
-
Cited by 21 (0 self)
The performance of search engines crucially depends on their ability to capture the meaning of a query most likely intended by the user. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web directory to identify the subject context that the query terms are frequently used in. Based on its performance on the classification of 800,000 example queries recorded from MSN search, the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.
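Reduced to its simplest form, the described architecture looks the query terms up in a web directory and votes over the categories of the matching documents. The tiny in-memory directory below is a placeholder for a real web directory, and the overlap-weighted voting and top-2 cutoff are assumptions for the example, not the system's actual ranker:

```python
# Hedged sketch of taxonomy-based query classification via a web directory.

from collections import Counter

DIRECTORY = [  # (document text, directory category)
    ("apple iphone reviews and prices", "Computers/Hardware"),
    ("apple pie and other fruit recipes", "Home/Cooking"),
    ("growing apple trees in your garden", "Home/Gardening"),
    ("used car prices and reviews", "Shopping/Vehicles"),
]

def classify(query, directory=DIRECTORY, top_k=2):
    q = set(query.lower().split())
    votes = Counter()
    for text, category in directory:
        overlap = len(q & set(text.split()))
        if overlap:
            votes[category] += overlap   # weight a document by shared query terms
    return votes.most_common(top_k)

print(classify("apple prices"))  # ambiguous query: hardware vs. cooking vs. shopping senses
```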
Context integration for mobile data tailoring. In Proceedings of the 7th Int. Conference on Mobile Data Management (MDM’06), 2006
"... Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically inte-grated and filtered, or, as we say, tailored, based on the user’s interests and context. Since both the user and the data sources can be mob ..."
Abstract
-
Cited by 19 (15 self)
Independent, heterogeneous, distributed, sometimes transient and mobile data sources produce an enormous amount of information that should be semantically integrated and filtered, or, as we say, tailored, based on the user’s interests and context. Since both the user and the data sources can be mobile, and the communication might be unreliable, caching the information on the user device can be very useful. Therefore, new challenges have to be faced, such as data filtering in a context-aware fashion, integration of data sources that are not known in advance, and automatic extraction of semantics. We propose a novel system named Context-ADDICT (Context-Aware Data Design, Integration, Customization and Tailoring) able to deal with the described scenario. The system we are designing aims at tailoring the available information to the needs of the current user in the current context, in order to offer a more manageable amount of information; such information is cached on the user’s device according to policies defined at design time, to cope with data-source transiency. This paper focuses on the information representation and tailoring problem and on the definition of the global architecture of the system.