Learning to Match Ontologies on the Semantic Web
2003
Cited by 65 (2 self)
On the Semantic Web, data will inevitably come from many different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe GLUE, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology GLUE finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures, and show that GLUE can work with all of them. Another key feature of GLUE is that it uses multiple learning strategies, each of which exploits well a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend GLUE to incorporate commonsense knowledge and domain constraints into the matching process. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains, and show that GLUE proposes highly accurate semantic mappings. Finally, we extend GLUE to find complex mappings between ontologies, and describe experiments that show the promise of the approach.
An Integrative Proximity Measure for Ontology Alignment
2003
Cited by 31 (3 self)
Integrating heterogeneous resources of the web will require finding agreement between the underlying ontologies. A variety of methods from the literature may be used for this task; basically, they perform pair-wise comparison of entities from each of the ontologies and select the most similar pairs. We introduce a similarity measure that takes advantage of most of the features of OWL-Lite ontologies and integrates many ontology comparison techniques in a common framework. Moreover, we put forth a computation technique to deal with one-to-many relations and circularities in the similarity definitions.
Leveraging data and structure in ontology integration
In SIGMOD Conference, 2007
Cited by 20 (3 self)
There is a great deal of research on ontology integration that makes use of rich logical constraints to reason about the structural and logical alignment of ontologies. There is also considerable work on matching data instances from heterogeneous schemas or ontologies. However, little work exploits the fact that ontologies include both data and structure. We aim to close this gap by presenting a new algorithm (ILIADS) that tightly integrates both data matching and logical reasoning to achieve better matching of ontologies. We evaluate our algorithm on a set of 30 pairs of OWL Lite ontologies with the schema and data matchings found by human reviewers. We compare against two systems: the ontology matching tool FCA-merge [28] and the schema matching tool COMA++ [1]. ILIADS shows an average improvement of 25% in quality over FCA-merge and an 11% improvement in recall over COMA++.
SAMBO - a system for aligning and merging biomedical ontologies
Journal of Web Semantics, 2006
Cited by 19 (8 self)
Due to the recent explosion in the amount of on-line accessible biomedical data and tools, finding and retrieving the relevant information is not an easy task. The vision of a Semantic Web for life sciences alleviates these difficulties. Ontologies are a key technology for the Semantic Web. In recent years many biomedical ontologies have been developed, and many of these ontologies contain overlapping information. To be able to use multiple ontologies, they have to be aligned or merged. In this paper we propose a framework for aligning and merging ontologies. Further, we developed a system for aligning and merging biomedical ontologies (SAMBO) based on this framework. The framework is also a first step towards a general framework that can be used for comparative evaluations of alignment strategies and their combinations. In this paper we evaluated different strategies and their combinations in terms of quality and processing time and compared SAMBO with two other systems.
Classifying search engine queries using the web as background knowledge
In SIGKDD Explorations. Vol. 7. ACM, 2005
Cited by 15 (0 self)
The performance of search engines crucially depends on their ability to capture the meaning of a query most likely intended by the user. We study the problem of mapping a search engine query to those nodes of a given subject taxonomy that characterize its most likely meanings. We describe the architecture of a classification system that uses a web directory to identify the subject context that the query terms are frequently used in. Based on its performance on the classification of 800,000 example queries recorded from MSN search, the system received the Runner-Up Award for Query Categorization Performance of the KDD Cup 2005.
A Bayesian model for supervised clustering with the Dirichlet process prior
Journal of Machine Learning Research, 2005
Cited by 14 (0 self)
We develop a Bayesian framework for tackling the supervised clustering problem, the generic problem encountered in tasks such as reference matching, coreference resolution, identity uncertainty and record linkage. Our clustering model is based on the Dirichlet process prior, which enables us to define distributions over the countably infinite sets that naturally arise in this problem. We add supervision to our model by positing the existence of a set of unobserved random variables (we call these “reference types”) that are generic across all clusters. Inference in our framework, which requires integrating over infinitely many parameters, is solved using Markov chain Monte Carlo techniques. We present algorithms for both conjugate and non-conjugate priors. We present a simple – but general – parameterization of our model based on a Gaussian assumption. We evaluate this model on one artificial task and three real-world tasks, comparing it against both unsupervised and state-of-the-art supervised algorithms. Our results show that our model is able to outperform other models across a variety of tasks, performance metrics, and problem settings.
Instance Matching with COMA++
Cited by 9 (1 self)
Schema matching is the process of identifying semantic correspondences between schemas. COMA++ is a matching prototype which uses several characteristics of schemas to determine similarities between them, for example the names and data types of the schema elements and structural information. In this paper we propose two instance-based matchers for COMA++ to gain a further quality improvement. The features of the matchers and first results are described.
A generic component for exchanging user models between web-based systems
2006
Cited by 5 (2 self)
Educational web-based systems exemplify the increasing need for personalisation. Applications that adapt to individual users need a model of the user that contains data that is as accurate as possible. On the web, learners use multiple educational systems and spread their time over many applications: these are individually limited in their user modelling but can gain from joining forces. This boils down to establishing semantic interoperability of user or learner models. While semantic interoperability is hard, the emerging Semantic Web (SW) might offer just the mechanisms we need. In this paper, we develop the Generic User model Component (GUC): a generic software component that utilises SW technology to support the exchange of user model data between applications. For a semantically effective user model exchange, GUC allows the configuration of a distributed management of mappings between user models. Thus, applications can choose different levels of uniting user models to maximise their personalisation.
Actively Learning Ontology Matching via User Interaction
Cited by 4 (2 self)
Ontology matching plays a key role in semantic interoperability. Many methods have been proposed for automatically finding the alignment between heterogeneous ontologies. However, in many real-world applications, finding the alignment in a completely automatic way is highly infeasible. Ideally, an ontology matching system would have an interactive interface that allows users to provide feedback to guide the automatic algorithm. Fundamentally, we need to answer the following questions: How can a system perform an efficient interactive process with the user? How many interactions are sufficient for finding a more accurate matching? To address these questions, we propose an active learning framework for ontology matching, which tries to find the most informative candidate matches to query the user. The user’s feedback is used to: 1) correct mistaken matches and 2) propagate the supervised information to help the entire matching process. Three measures are proposed to estimate the confidence of each matching candidate. A correct propagation algorithm is further proposed to maximize the spread of the user’s “guidance”. Experimental results on several public data sets show that the proposed approach can significantly improve the matching accuracy (+8.0% better than the baseline methods).

