Unsupervised Learning of Link Discovery Configuration (oro.open.ac.uk)
Discovering Linkage Points over Web Data
Cited by 10 (1 self)
A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources and that can be used to match records or entities across sources. This is usually performed using a match operator, which associates attributes of one database with those of another. However, the massive growth in the amount and variety of unstructured and semistructured data on the Web has created new challenges for this task. Such data sources often lack a fixed, predefined schema and contain large numbers of diverse attributes. Furthermore, the end goal is not schema alignment, as these schemas may be too heterogeneous (and dynamic) to align meaningfully. Rather, the goal is to align any overlapping data shared by these sources. We will show that even attributes with different meanings (that would not qualify as schema matches) can sometimes be useful in aligning data. The solution we propose in this paper replaces the basic schema-matching step with a more complex instance-based schema analysis and linkage discovery. We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data. We experimentally evaluate the effectiveness of our proposed algorithms in real-world integration scenarios in several domains.
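To make "instance-based" linkage discovery concrete, here is a minimal sketch that scores attribute pairs by how much their values overlap rather than by their names. It is not the paper's framework: the tokenizer stands in for a "lexical analyzer", Jaccard similarity stands in for its similarity functions, an exhaustive pairwise scan replaces its search algorithms, and all function names and example records are hypothetical.

```python
from __future__ import annotations

import re
from itertools import product


def tokens(value: str) -> set[str]:
    """Toy lexical analyzer: lowercase, then split on non-alphanumeric characters."""
    return {t for t in re.split(r"[^0-9a-z]+", value.lower()) if t}


def attribute_profile(records: list[dict[str, str]]) -> dict[str, set[str]]:
    """Collect, for each attribute, the tokens found in its values across all records."""
    profile: dict[str, set[str]] = {}
    for record in records:
        for attr, value in record.items():
            profile.setdefault(attr, set()).update(tokens(value))
    return profile


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_linkage_points(source, target, min_score=0.1):
    """Score every (source attribute, target attribute) pair by instance overlap."""
    sp, tp = attribute_profile(source), attribute_profile(target)
    scored = [(s, t, jaccard(sp[s], tp[t])) for s, t in product(sp, tp)]
    kept = [pair for pair in scored if pair[2] >= min_score]
    return sorted(kept, key=lambda pair: pair[2], reverse=True)


# Hypothetical example records; attribute names differ between the two sources.
source = [{"name": "Acme Corp", "hq": "Boston"}, {"name": "Globex Inc", "hq": "Springfield"}]
target = [{"label": "acme corp.", "city": "Boston MA"}, {"label": "Globex", "city": "Springfield"}]
links = candidate_linkage_points(source, target)
print(links)  # name/label (0.75) ranks above hq/city (about 0.67)
```

Even though no attribute names match, the value overlap surfaces `name`/`label` and `hq`/`city` as candidate linkage points. A real system would need indexing or sampling to avoid comparing every attribute pair on large, schema-less sources.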
SLINT: A Schema-Independent Linked Data Interlinking System. In: Proc. of the 7th International Workshop on Ontology Matching, 2012
Cited by 6 (0 self)
Abstract. Linked data interlinking is the discovery of all instances that represent the same real-world object and are located in different data sources. Since different data publishers frequently use different schemas for storing resources, we aim to develop a schema-independent interlinking system. Our system automatically selects important predicates and useful predicate alignments, which are used as the key for blocking and instance matching. The key distinction of our system is the use of weighted co-occurrence and adaptive filtering in blocking and instance matching. Experimental results show that the system substantially improves precision and recall over several recent systems. The performance of the system and the efficiency of its main steps are also discussed.
Unsupervised Learning of Link Specifications: Deterministic vs. Non-Deterministic
Cited by 3 (1 self)
Abstract. Link discovery has been shown to be of crucial importance for the Linked Data Web. In previous work, several supervised approaches have been developed for learning link specifications from labelled data. Most recently, genetic programming has also been used to learn link specifications in an unsupervised fashion by optimizing a parametrized pseudo-F-measure. The questions underlying this evaluation paper are twofold. First, how well do pseudo-F-measures predict the real accuracy of non-deterministic and deterministic approaches across different types of datasets? Second, how do deterministic approaches compare to non-deterministic approaches? To answer these questions, we evaluated linear and Boolean classifiers against classifiers computed using genetic programming on six different datasets. We also studied the correlation between two different pseudo-F-measures and the real F-measures achieved by the classifiers at hand. Our evaluation suggests that pseudo-F-measures behave differently on synthetic and real datasets.
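A pseudo-F-measure scores a set of links without a gold standard, which is what makes unsupervised learning of link specifications possible. The sketch below uses one formulation from the literature, assumed here because the abstract does not define its two measures: it presumes a one-to-one mapping, so pseudo-precision drops when a source is linked to several targets, and pseudo-recall counts how many sources and targets received any link at all. The function name and the example data are hypothetical.

```python
from __future__ import annotations


def pseudo_f_measure(links, sources, targets, beta=1.0):
    """Pseudo-F-measure of a link set, computed without any reference links.

    Assumed formulation (expects a roughly one-to-one mapping):
      pseudo-precision = |{s : (s, t) in links}| / |links|
      pseudo-recall    = (|{s : (s, t) in links}| + |{t : (s, t) in links}|) / (|S| + |T|)
    """
    links = set(links)
    if not links:
        return 0.0
    linked_sources = {s for s, _ in links}
    linked_targets = {t for _, t in links}
    precision = len(linked_sources) / len(links)
    recall = (len(linked_sources) + len(linked_targets)) / (len(sources) + len(targets))
    b2 = beta * beta
    # Standard F-beta combination of the two pseudo-measures.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: source "a" is linked to two targets, which lowers pseudo-precision.
sources, targets = {"a", "b", "c"}, {"x", "y", "z"}
links = {("a", "x"), ("a", "y"), ("b", "z")}
score = pseudo_f_measure(links, sources, targets)
print(round(score, 4))  # 0.7407
```

Here pseudo-precision is 2/3 and pseudo-recall is 5/6, so the F1 value is 20/27. The `beta` parameter is one way a pseudo-F-measure can be "parametrized" to trade precision against recall.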
Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures. In Proceedings of ISWC, 2012
SAIM – One Step Closer to Zero-Configuration Link Discovery
Cited by 2 (2 self)
Abstract. Link discovery plays a central role in the implementation of the Linked Data vision. In this demo paper, we present SAIM, a tool that aims to support users during the creation of high-quality link specifications. The tool implements a simple but effective workflow for creating initial link specifications. In addition, SAIM implements a variety of state-of-the-art machine-learning algorithms for unsupervised, semi-supervised and supervised instance matching on structured data. We demonstrate SAIM using benchmark data such as the OAEI datasets.
Raven: Towards Zero-Configuration Link Discovery. In Proceedings of OM@ISWC, 2011
Cited by 1 (1 self)
Abstract. With the growth of the Linked Data Web, time-efficient approaches for computing links between data sources have become indispensable. Yet, in many cases, determining the right specification for a link discovery problem is a tedious task that must still be carried out manually. In this article we present RAVEN, an approach for the semi-automatic determination of link specifications. Our approach combines stable solutions of matching problems with active learning, leveraging the time-efficient link discovery framework LIMES. RAVEN is designed to require only a small number of interactions with the user in order to generate classifiers of high accuracy. With RAVEN, we focus on the computation and configuration of Boolean and weighted classifiers, which we evaluate in three experiments against link specifications created manually. Our evaluation shows that we can compute linking configurations that achieve more than 90% F-score by asking the user to verify at most twelve potential links.
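The distinction between Boolean and weighted classifiers can be illustrated with a small sketch. This is not RAVEN's or LIMES's API: the two similarity measures, the `label` and `year` properties, the weights and the thresholds are all hypothetical. A weighted (linear) classifier accepts a pair when a weighted sum of similarity scores reaches a threshold, whereas the Boolean classifier here is a conjunction that requires every similarity to pass its own threshold.

```python
from __future__ import annotations

from typing import Callable, Dict

Resource = Dict[str, object]
Measure = Callable[[Resource, Resource], float]


def label_jaccard(s: Resource, t: Resource) -> float:
    """Token-set Jaccard similarity of the (hypothetical) 'label' property."""
    a, b = set(str(s["label"]).lower().split()), set(str(t["label"]).lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def year_sim(s: Resource, t: Resource) -> float:
    """Numeric similarity of the (hypothetical) 'year' property: 1 when equal, decaying with distance."""
    return 1.0 / (1.0 + abs(int(s["year"]) - int(t["year"])))


MEASURES: Dict[str, Measure] = {"label": label_jaccard, "year": year_sim}


def weighted_classifier(weights: Dict[str, float], theta: float) -> Callable[[Resource, Resource], bool]:
    """Linear classifier: accept when the weighted sum of similarities reaches theta."""
    def accept(s: Resource, t: Resource) -> bool:
        return sum(w * MEASURES[m](s, t) for m, w in weights.items()) >= theta
    return accept


def boolean_classifier(thresholds: Dict[str, float]) -> Callable[[Resource, Resource], bool]:
    """Boolean (conjunctive) classifier: every similarity must pass its own threshold."""
    def accept(s: Resource, t: Resource) -> bool:
        return all(MEASURES[m](s, t) >= th for m, th in thresholds.items())
    return accept


s = {"label": "Leipzig University", "year": 1409}
t = {"label": "University of Leipzig", "year": 1409}
weighted = weighted_classifier({"label": 0.5, "year": 0.5}, theta=0.8)
boolean = boolean_classifier({"label": 0.7, "year": 0.9})
print(weighted(s, t), boolean(s, t))  # True False
```

The label similarity is only 2/3. The weighted classifier compensates for this with the exact year match (0.5 · 2/3 + 0.5 · 1 ≈ 0.83), but the conjunctive classifier rejects the pair. Choosing these weights and thresholds is the configuration problem that active learning is meant to reduce.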
Can we Create Better Links by Playing Games?
Cited by 1 (1 self)
Abstract—Just as links are the backbone of the traditional World Wide Web, they are an equally important element in the Data Web. A variety of automated tools can create large numbers of links between RDF resources using heuristics. However, without manual verification of the created links, it is difficult to ensure high precision and recall. In this article, we investigate whether game-based approaches can be used to improve this manual verification stage. Based on the VeriLinks game platform, which we developed, we describe experiments using a survey and statistics collected within a specific interlinking game. Using three different link tasks as examples, we present an analysis of the strengths and limitations of game-based link verification.
Improving Link Specifications using Context-Aware Information
Abstract. There is increasing interest in publishing data following the Linked Open Data philosophy. To link RDF datasets, a link discovery task is performed to generate owl:sameAs links. There are two ways to perform this task: by means of a classifier or by means of a link specification; we focus on the latter. Current link specification techniques use only the data properties of the instances they are linking and do not take context information into account. In this paper, we present a proposal that generates context-aware link specifications to improve regular link specifications, increasing the effectiveness of the results in several real-world scenarios where context is crucial. Our context-aware link specifications are independent of similarity functions, transformations, or aggregations. We evaluated our proposal in two real-world scenarios, in which we improve precision and recall by 23% and 58%, respectively, with respect to regular link specifications.
Abstract, 2012
As many cities around the world provide access to raw public data as part of the Open Data movement, many questions arise concerning the accessibility of these data. Various data formats, duplicate identifiers, heterogeneous metadata schema descriptions, and diverse means of accessing or querying the data exist. These factors make it difficult for consumers to reuse and integrate data sources to develop innovative applications. The Semantic Web provides a global solution to these problems by providing languages and protocols for describing and accessing datasets. This paper presents Datalift, a framework and platform that helps lift raw data sources to semantic, interlinked data sources.