LIMES: A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data (2011)

by Axel-Cyrille Ngonga Ngomo, Sören Auer
Venue: IJCAI

Results 1-10 of 48 citing documents

LODStats – An Extensible Framework for High-performance Dataset Analytics

by Sören Auer, Jan Demter, Michael Martin, Jens Lehmann
"... Abstract. One of the major obstacles for a wider usage of web data is the difficulty to obtain a clear picture of the available datasets. In order to reuse, link, revise or query a dataset published on the Web it is important to know the structure, coverage and coherence of the data. In order to obt ..."
Cited by 43 (2 self)
Abstract. One of the major obstacles to a wider usage of web data is the difficulty of obtaining a clear picture of the available datasets. In order to reuse, link, revise or query a dataset published on the Web, it is important to know the structure, coverage and coherence of the data. To obtain such information we developed LODStats – a statement-stream-based approach for gathering comprehensive statistics about datasets adhering to the Resource Description Framework (RDF). LODStats is based on the declarative description of statistical dataset characteristics. Its main advantages over other approaches are a smaller memory footprint and significantly better performance and scalability. We integrated LODStats with the CKAN dataset metadata registry and obtained a comprehensive picture of the current state of a significant part of the Data Web.

Citation Context

... a fundamental requirement for many Linked Data applications (e.g. data integration and fusion). Meanwhile, there are a number of tools available which support the automatic generation of links (e.g. [11,10]). An obstacle for the broad use of these tools is, however, the difficulty to identify suitable link targets on the Data Web. By attaching proper statistics about the internal structure of a dataset ...
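
To make the statement-stream idea above concrete, here is a minimal Python sketch of a stream-based statistics collector: each RDF statement is visited exactly once and only aggregate counters are kept, which is what keeps the memory footprint small. The three criteria shown (triple count, property usage, class usage) are illustrative stand-ins for LODStats' declaratively described criteria, and all identifiers are invented.

```python
from collections import Counter

# Sketch of a statement-stream-based statistics collector in the spirit
# of LODStats: one pass over the statements, aggregate counters only,
# so memory stays flat regardless of dataset size. The criteria below
# are illustrative, not LODStats' actual criterion set.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def collect_stats(triples):
    """Consume an iterable of (subject, predicate, object) triples."""
    stats = {"triples": 0, "properties": Counter(), "classes": Counter()}
    for s, p, o in triples:
        stats["triples"] += 1
        stats["properties"][p] += 1
        if p == RDF_TYPE:                 # class usage via rdf:type
            stats["classes"][o] += 1
    return stats

# Usage with a tiny in-memory stream; a real run would stream a parsed dump.
stream = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]
print(collect_stats(stream)["properties"].most_common(1))
```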

Efficient Multidimensional Blocking for Link Discovery without losing Recall

by Robert Isele, Anja Jentzsch, Christian Bizer
"... Over the last three years, an increasing number of data providers have started to publish structured data according to the Linked Data principles on the Web. The resulting Web of Data currently consists of over 28 billion RDF triples. As the Web of Data grows, there is an increasing need for link di ..."
Cited by 22 (0 self)
Over the last three years, an increasing number of data providers have started to publish structured data according to the Linked Data principles on the Web. The resulting Web of Data currently consists of over 28 billion RDF triples. As the Web of Data grows, there is an increasing need for link discovery tools which scale to very large datasets. In record linkage, many partitioning methods have been proposed which substantially reduce the number of required entity comparisons. Unfortunately, most of these methods either lead to a decrease in recall or only work in metric spaces. We propose a novel blocking method called MultiBlock which uses a multidimensional index in which similar objects are located near each other. In each dimension, the entities are indexed by a different property, which significantly increases the efficiency of the index. In addition, the index guarantees that no false dismissals can occur. Our approach works on complex link specifications which aggregate several different similarity measures. MultiBlock has been implemented as part of the Silk Link Discovery Framework. The evaluation shows speed-up factors of several hundred for large datasets compared to the full evaluation, without losing recall.

Citation Context

...ason that its data sources are connected by RDF links [2]. While there are some fully automatic tools for link discovery [6], most tools generate links semi-automatically based on link specifications [20, 16, 11]. Link specifications specify the conditions which must hold true for a pair of entities for the link discovery tool to generate an RDF link between them. Based on a link specification, the link discov...
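
The abstract above compresses the key mechanism; the following hedged Python sketch illustrates one way a multidimensional blocking index of this kind can work: each entity is mapped into a grid cell by one index function per dimension, and candidate pairs are generated only within a cell and its direct neighbours, so any pair whose per-dimension index values differ by less than the cell size lands in the same or an adjacent cell and is never dismissed. The index functions, cell size and sample data are assumptions, not MultiBlock's actual indexing.

```python
from collections import defaultdict
from itertools import product

# Hypothetical multidimensional blocking: grid cells keyed by one numeric
# index function per dimension; candidates come from a cell plus its
# direct neighbours, so near pairs are never lost to cell boundaries.

def block(entities, index_fns, cell_size):
    grid = defaultdict(list)
    for e in entities:
        cell = tuple(int(fn(e) // cell_size) for fn in index_fns)
        grid[cell].append(e)
    return grid

def candidate_pairs(grid_a, grid_b, dims):
    # Compare each cell of A against the same and adjacent cells of B.
    offsets = list(product((-1, 0, 1), repeat=dims))
    for cell, bucket_a in grid_a.items():
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            for a in bucket_a:
                for b in grid_b.get(neighbour, ()):
                    yield a, b

# Example: index movies by (year, title length); both choices are made up.
index_fns = [lambda e: e["year"], lambda e: len(e["title"])]
a = block([{"title": "Alien", "year": 1979}], index_fns, cell_size=2)
b = block([{"title": "Alien", "year": 1979}], index_fns, cell_size=2)
print(list(candidate_pairs(a, b, dims=2)))
```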

RAVEN – Active Learning of Link Specifications

by Axel-Cyrille Ngonga Ngomo, Jens Lehmann, Sören Auer, Konrad Höffner
"... Abstract. With the growth of the Linked Data Web, time-efficient approaches for computing links between data sources have become indispensable. Yet, in many cases, determining the right specification for a link discovery problem is a tedious task that must still be carried out manually. We present R ..."
Cited by 16 (5 self)
Abstract. With the growth of the Linked Data Web, time-efficient approaches for computing links between data sources have become indispensable. Yet, in many cases, determining the right specification for a link discovery problem is a tedious task that must still be carried out manually. We present RAVEN, an approach for the semi-automatic determination of link specifications. Our approach is based on the combination of stable solutions of matching problems and active learning with the time-efficient link discovery framework LIMES. RAVEN aims at requiring a small number of interactions with the user to generate classifiers of high accuracy. We focus on using RAVEN to compute and configure boolean and weighted classifiers, which we evaluate in three experiments against link specifications created manually. Our evaluation shows that we can compute linking configurations that achieve more than 90% F-score by asking the user to verify at most twelve potential links.

Citation Context

...erlinked data sources [2]. One of the key challenges that arise when trying to discover links between two data sources lies in the specification of an appropriate configuration for the tool of choice [10]. Such a specification usually consists of a set of restrictions on the source and target knowledge base, a list of properties of the source and target knowledge base to use for similarity detection, ...
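
As a rough illustration of the active-learning loop described in the abstract, the sketch below repeatedly picks the candidate link the current classifier is least certain about, asks the user (the oracle) to verify it, and adjusts the weights of a weighted classifier. The perceptron-style update is an assumption, and the twelve-round default only loosely echoes the abstract's figure; none of this is RAVEN's actual algorithm.

```python
# Hedged active-learning sketch: query the most uncertain candidate pair,
# get a human verdict, nudge a weighted linear classifier toward it.

def learn_specification(candidates, similarities, oracle, rounds=12):
    """candidates: list of entity pairs; similarities: pair -> feature
    vector of similarity scores; oracle: pair -> bool (human verifier)."""
    weights = [0.0] * len(similarities(candidates[0]))
    threshold, rate = 0.5, 0.1
    for _ in range(rounds):
        def score(pair):
            return sum(w * f for w, f in zip(weights, similarities(pair)))
        # Pick the pair whose score sits closest to the decision boundary.
        # (A real system would also avoid re-asking verified pairs.)
        pair = min(candidates, key=lambda p: abs(score(p) - threshold))
        label = 1.0 if oracle(pair) else 0.0
        # Perceptron-style correction toward the user's answer.
        error = label - (1.0 if score(pair) >= threshold else 0.0)
        weights = [w + rate * error * f
                   for w, f in zip(weights, similarities(pair))]
    return weights, threshold
```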

Assessing Linked Data Mappings Using Network Measures

by Paul Groth, Claus Stadler, Jens Lehmann. In ESWC, 2012
"... Abstract. Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra informa-tion and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the cre-ation of links and suc ..."
Cited by 15 (4 self)
Abstract. Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the creation of links, and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to know whether the links introduced improve or diminish the quality of Linked Data. In this paper, we present LINK-QA, an extensible framework that allows for the assessment of Linked Data mappings using network metrics. We test five metrics using this framework on a set of known good and bad links generated by a common mapping system, and show the behaviour of those metrics.

Citation Context

... the schema level [6]. The Silk Link discovery framework [23] offers a more versatile approach allowing configurable decisions on semantic relationships between two entities. More recently, the LIMES [17] framework offers an efficient implementation of similar functionality. Driven by those approaches, there has been increasing interest in new ways to measure the quality of automated links. For exampl...
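
To give a flavour of what a network measure over generated links looks like, here is a small Python sketch that builds degree counts from a set of links and flags resources whose degree deviates strongly from the mean. Both the metric and the cutoff factor are placeholders; the paper evaluates five specific metrics, which this sketch does not reproduce.

```python
from collections import Counter

# Toy network measure over a set of generated links: flag resources
# whose degree is far above the mean degree of the link graph.

def degree_outliers(links, factor=2.0):
    degree = Counter()
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    mean = sum(degree.values()) / len(degree)
    return [node for node, d in degree.items() if d > factor * mean]

# A resource that suddenly collects many links is suspicious:
print(degree_outliers([("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "e")]))
```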

Publishing Statistical Data on the Web

by Percy E. Rivera Salas, Michael Martin, O Maia Da Mota, Sören Auer, Karin Breitman, Marco A. Casanova
"... Abstract—Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we overview how statistical data can be managed on the Web. With OLAP2DataCube and CSV2DataCube we pr ..."
Cited by 11 (1 self)
Abstract. Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we give an overview of how statistical data can be managed on the Web. With OLAP2DataCube and CSV2DataCube we present two complementary approaches for extracting and publishing statistical data. We also discuss the linking, repair and visualization of statistical data. As a comprehensive use case, we report on the extraction and publishing on the Web of statistical data describing 10 years of life in Brazil.

Citation Context

...geneous and distributed data sources is one of the fundamental features of the Web of Data. In this section we describe the application of existing general-purpose link discovery tools (such as LIMES [19] or SILK [22]) for linking of statistical data. Interlinking various statistical dimensions (such as municipalities and states with DBpedia and GeoNames) facilitates the unforeseen integration of inde...
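
As a hedged illustration of the CSV2DataCube idea mentioned above, the sketch below turns each row of a small statistical table into an RDF Data Cube observation. The qb: namespace is the real W3C Data Cube vocabulary URI, but the dimension properties, dataset URI and sample data are invented for the example.

```python
import csv, io

# Sketch of the CSV-to-Data-Cube idea: one qb:Observation per CSV row,
# with one invented dimension property per column.

QB = "http://purl.org/linked-data/cube#"

def csv_to_observations(text, dataset_uri):
    reader = csv.DictReader(io.StringIO(text))
    for i, row in enumerate(reader):
        obs = f"{dataset_uri}/obs{i}"
        yield (obs, "rdf:type", QB + "Observation")
        yield (obs, QB + "dataSet", dataset_uri)
        for column, value in row.items():
            yield (obs, f"{dataset_uri}/dim/{column}", value)

sample = "state,year,population\nRio de Janeiro,2010,15989929\n"
for triple in csv_to_observations(sample, "ex:br-census"):
    print(triple)
```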

When to Reach for the Cloud: Using Parallel Hardware for Link Discovery

by Axel-cyrille Ngonga Ngomo, Lars Kolb, Norman Heino, Michael Hartung, Erhard Rahm
"... Abstract. With the ever-growing amount of RDF data available across the Web, the discovery of links between datasets and deduplication of resources within knowledge bases have become tasks of crucial importance. Over the last years, several link discovery approaches have been developed to tackle the ..."
Cited by 3 (3 self)
Abstract. With the ever-growing amount of RDF data available across the Web, the discovery of links between datasets and the deduplication of resources within knowledge bases have become tasks of crucial importance. Over the last years, several link discovery approaches have been developed to tackle the runtime and complexity problems that are intrinsic to link discovery. Yet, so far, little attention has been paid to the management of hardware resources for the execution of link discovery tasks. This paper addresses this research gap by investigating the efficient use of hardware resources for link discovery. We implement the HR³ approach for three different parallel processing paradigms including the use of GPUs and MapReduce platforms. We also perform a thorough performance comparison of these implementations. Our results show that certain tasks that appear to require cloud computing techniques can actually be accomplished using standard parallel hardware. Moreover, our evaluation provides break-even points that can serve as guidelines for deciding when to use which hardware for link discovery.
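
The break-even question the paper studies presupposes a shared-memory parallel baseline; the sketch below shows what such a baseline can look like in Python: the source dataset is split into chunks and each worker process evaluates its chunk against the whole target set. The toy Jaccard similarity is a stand-in and does not implement the HR³ approach itself.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

# Naive shared-memory parallel link discovery: partition the sources,
# let each worker compare its partition against all targets. The
# similarity function below is a toy stand-in, not HR³.

def similar(a, b, threshold=0.8):
    common = len(set(a) & set(b))
    return common / max(len(set(a) | set(b)), 1) >= threshold

def match_chunk(args):
    chunk, targets = args
    return [(a, b) for a, b in product(chunk, targets) if similar(a, b)]

def parallel_links(sources, targets, workers=4):
    size = max(len(sources) // workers, 1)
    chunks = [sources[i:i + size] for i in range(0, len(sources), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_chunk, [(c, targets) for c in chunks])
    return [pair for part in results for pair in part]

if __name__ == "__main__":
    print(parallel_links(["limes", "silk"], ["limes", "raven"]))
```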

Composition Methods for Link Discovery

by Michael Hartung, Anika Groß, Erhard Rahm
"... Abstract: The Linked Open Data community publishes an increasing number of data sources on the so-called Data Web and interlinks them to support data integration applications. We investigate how the composition of existing links and mappings can help discovering new links and mappings between LOD so ..."
Cited by 3 (1 self)
Abstract: The Linked Open Data community publishes an increasing number of data sources on the so-called Data Web and interlinks them to support data integration applications. We investigate how the composition of existing links and mappings can help discover new links and mappings between LOD sources. Often there will be many alternatives for composition, so the problem arises of which paths can provide the best linking results with the least computational effort. We therefore investigate different methods to select and combine the most suitable mapping paths. We also propose an approach for selecting and composing individual links instead of entire mappings. We comparatively evaluate the methods on several real-world linking problems from the LOD cloud. The results show the high value of reusing and composing existing links as well as the high effectiveness of our methods.
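
To make the composition idea concrete: given a mapping from dataset A to B and one from B to C, joining them on the shared B resources yields candidate A-to-C links. The sketch below performs exactly this join; multiplying the confidence scores is one simple combination choice, whereas the paper investigates more refined selection and combination methods.

```python
from collections import defaultdict

# Hedged sketch of link composition: join an A->B mapping with a B->C
# mapping on the shared B resources to derive candidate A->C links.

def compose(mapping_ab, mapping_bc):
    """Mappings are lists of (source, target, score) correspondences."""
    by_source = defaultdict(list)
    for b, c, score in mapping_bc:
        by_source[b].append((c, score))
    for a, b, score_ab in mapping_ab:
        for c, score_bc in by_source.get(b, ()):
            yield a, c, score_ab * score_bc

ab = [("dbpedia:Leipzig", "geo:Leipzig", 0.9)]
bc = [("geo:Leipzig", "lgd:Leipzig", 0.8)]
print(list(compose(ab, bc)))  # one A->C candidate, score ~0.72
```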

A Machine Learning Approach for Instance Matching Based on Similarity Metrics

by Shu Rong, Xing Niu, Evan Wei Xiang, Haofen Wang, Qiang Yang, Yong Yu. In Proceedings of the 11th International Semantic Web Conference (ISWC), 2012
"... Abstract. The Linking Open Data (LOD) project is an ongoing effort to construct a global data space, i.e. the Web of Data. One important part of this project is to establish owl:sameAs links among structured data sources. Such links indicate equivalent instances that refer to the same real-world ob ..."
Cited by 3 (0 self)
Abstract. The Linking Open Data (LOD) project is an ongoing effort to construct a global data space, i.e. the Web of Data. One important part of this project is to establish owl:sameAs links among structured data sources. Such links indicate equivalent instances that refer to the same real-world object. The problem of discovering owl:sameAs links between pairwise data sources is called instance matching. Most of the existing approaches addressing this problem rely on the quality of prior schema matching, which is not always good enough in the LOD scenario. In this paper, we propose a schema-independent instance-pair similarity metric based on several general descriptive features. We transform the instance matching problem to the binary classification problem and solve it by machine learning algorithms. Furthermore, we employ some transfer learning methods to utilize the existing owl:sameAs links in LOD to reduce the demand for labeled data. We carry out experiments on some datasets of OAEI2010. The results show that our method performs well on real-world LOD data and outperforms the participants of OAEI2010.

Citation Context

...ctional properties (IFP). Such properties are not sufficient in LOD, so [15] tries to find more IFPs with a statistical method. ObjectCoref [17] employs a self-learning framework to iteratively find the discriminative property-value pairs for instance matching, which are lax IFPs. RAVEN [22] applies active learning techniques for instance matching. Both ObjectCoref and RAVEN match the properties from different data sources by measuring value similarities. Similar ideas are proposed in the domain of schema matching [26]. Finally, some papers focus on improving the efficiency of instance matching. [21] limits the number of candidate instance pairs to match based on the triangle inequality. [30], [23] and [18] generate candidates by indexing some key words of instances. This kind of method can be applied to optimize ours. 7 Conclusion and Future Work In this paper, we presented a property-matching-independent approach for instance matching. We transformed the problem of instance matching into a classification problem by designing a novel feature vector of high-level similarity metrics. Suitable learning models were selected according to the feature space. Our experimental results on the datase...
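
A minimal sketch of the paper's central move, under stated assumptions: an instance pair is represented as a small vector of schema-independent similarity features computed over all property values, and a binary classifier is trained on labelled pairs. The two features (token Jaccard and edit similarity) and the decision-stump learner are placeholders for the descriptive features and learning models actually used in the paper.

```python
from difflib import SequenceMatcher

# Hedged instance-matching sketch: schema-independent pair features,
# then a trivial binary classifier trained on labelled pairs.

def features(inst_a, inst_b):
    """Instances are dicts of property -> value; properties need not align."""
    values_a, values_b = " ".join(inst_a.values()), " ".join(inst_b.values())
    tokens_a, tokens_b = set(values_a.split()), set(values_b.split())
    jaccard = len(tokens_a & tokens_b) / max(len(tokens_a | tokens_b), 1)
    edit = SequenceMatcher(None, values_a, values_b).ratio()
    return [jaccard, edit]

def train_stump(vectors, labels):
    # Pick the single feature/threshold pair with the fewest training errors;
    # a stand-in for the real learning models the paper selects.
    best = None
    for f in range(len(vectors[0])):
        for t in sorted({v[f] for v in vectors}):
            errors = sum((v[f] >= t) != bool(l)
                         for v, l in zip(vectors, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda vec: vec[f] >= t
```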

Utilizing Domain-Specific Keywords for Discovering Public SPARQL Endpoints: A Life-Sciences Use-Case

by Muntazir Mehdi, Aftab Iqbal, Ali Hasnain, Yasar Khan, Stefan Decker, Ratnesh Sahay. In ACM SAC (SWA track), 2014
"... The LOD cloud comprises of billions of facts covering hun-dreds of datasets. In accordance with the Linked Data prin-ciples, these datasets are connected by a variety of typed links, forming an interlinked “Web of Data”. The growing diversity of the Web of Data makes it more and more chal-lenging fo ..."
Cited by 3 (1 self)
The LOD cloud comprises billions of facts covering hundreds of datasets. In accordance with the Linked Data principles, these datasets are connected by a variety of typed links, forming an interlinked “Web of Data”. The growing diversity of the Web of Data makes it more and more challenging for publishers to find relevant datasets that could be linked to, particularly in specialist domain-specific settings. This paper thus proposes a baseline method to automatically identify a list of public SPARQL endpoints whose content is deemed relevant to a local dataset, based on queries generated from a local set of domain-specific keywords.

Citation Context

... But creating links is a challenging task for publishers. Addressing this challenge, a number of linking frameworks, such as Silk [12] and LIMES [10], have been proposed to help publishers link their local datasets to a remote LOD dataset through a specified SPARQL endpoint. However, given that there are now hundreds of public SPARQL endpoints, an...
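
A hedged sketch of the baseline method the abstract outlines: each domain-specific keyword is turned into a simple SPARQL ASK query and sent to a list of public endpoints, keeping those that answer positively. The REGEX-based query shape is an assumption about how the generated queries might look, and any endpoint URLs a caller passes in are examples only.

```python
import json
import urllib.parse
import urllib.request

# Probe a SPARQL endpoint with a keyword-derived ASK query. Many public
# endpoints accept a "format" parameter; strictly, content negotiation
# via the Accept header is the standard route.

def probe(endpoint, keyword, timeout=10):
    query = ("ASK { ?s ?p ?o . "
             f'FILTER regex(str(?o), "{keyword}", "i") }}')
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp).get("boolean", False)
    except (OSError, ValueError):
        return False  # unreachable endpoint or non-JSON answer

def relevant_endpoints(endpoints, keywords):
    return [e for e in endpoints
            if any(probe(e, k) for k in keywords)]
```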

Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures

by A-C Ngonga Ngomo. In Proceedings of ISWC, 2012
"... ..."
Cited by 2 (1 self)
Abstract not found