Probabilistic Entity Linkage for Heterogeneous Information Spaces (2008)

by E Ioannou, C Niederée, W Nejdl

Results 1 - 8 of 8

idMesh: Graph-Based Disambiguation of Linked Data

by Parisa Haghani, Michael Jost, Karl Aberer, Hermann De Meer - In WWW, 2009
"... passau.de We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities – repre-sented by globally identifiable declarative artifacts – self-organize in a dynamic and probabilistic manner. Our solution has the fol-lowing two desirable propertie ..."
Abstract - Cited by 22 (2 self) - Add to MetaCart
We tackle the problem of disambiguating entities on the Web. We propose a user-driven scheme where graphs of entities – represented by globally identifiable declarative artifacts – self-organize in a dynamic and probabilistic manner. Our solution has the following two desirable properties: i) it lets end-users freely define associations between arbitrary entities and ii) it probabilistically infers entity relationships based on uncertain links using constraint-satisfaction mechanisms. We outline the interface between our scheme and the current data Web, and show how higher-layer applications can take advantage of our approach to enhance search and update of information relating to online entities. We describe a decentralized infrastructure supporting efficient and scalable entity disambiguation and demonstrate the practicability of our approach in a deployment over several hundred machines.
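The abstract describes entity graphs that self-organize from user-defined links and probabilistically infer entity relationships from uncertain links. As a rough illustration of the idea (not idMesh's actual constraint-satisfaction machinery), the following sketch propagates equivalence confidence along user-asserted same-as links by taking the most confident chain between two artifacts; the identifiers and confidence values are invented.

```python
import heapq
from collections import defaultdict

# User-asserted "same-as" links between entity artifacts, with confidences (invented data).
links = [
    ("dbpedia:Paris", "geonames:2988507", 0.9),
    ("geonames:2988507", "freebase:paris", 0.8),
    ("dbpedia:Paris", "dbpedia:Paris_Texas", 0.2),
]

graph = defaultdict(list)
for a, b, c in links:
    graph[a].append((b, c))
    graph[b].append((a, c))

def equivalence_confidence(source, target):
    """Best-first search for the most confident chain of links (max-product over a path).
    A much-simplified stand-in for idMesh's probabilistic constraint satisfaction."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_conf, node = heapq.heappop(heap)
        conf = -neg_conf
        if node == target:
            return conf
        for nbr, link_conf in graph[node]:
            cand = conf * link_conf
            if cand > best.get(nbr, 0.0):
                best[nbr] = cand
                heapq.heappush(heap, (-cand, nbr))
    return 0.0

print(equivalence_confidence("dbpedia:Paris", "freebase:paris"))   # 0.9 * 0.8 = 0.72
```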

Citation Context

...sic-related data sets on the Web. Their method takes into account both the similarity of the web resources, using literal matching, and the similarity of neighboring resources. Ioannou et al. [15] suggest the use of Bayesian networks to disambiguate entities based on related metadata. Jaffri et al. [16] recently investigated entity disambiguation in two popular portals (DBLP and DBpedia) and f...
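The citation context above summarizes the surveyed paper's approach ([15]): metadata evidence about two entities is combined in a Bayesian network to decide whether they refer to the same real-world object. The sketch below conveys the general flavour with a naive-Bayes-style combination of independent evidence; the evidence types, prior, and likelihood values are illustrative assumptions, not the network structure or parameters from [15].

```python
# Minimal sketch of combining metadata evidence into a match probability,
# assuming a naive-Bayes-style factorization. Evidence types, prior, and
# likelihoods below are illustrative assumptions, not the network from [15].

PRIOR_MATCH = 0.01  # assumed prior probability that two records co-refer

# (P(evidence | match), P(evidence | non-match)) -- assumed values
LIKELIHOODS = {
    "same_email":      (0.90, 0.0001),
    "similar_name":    (0.80, 0.05),
    "shared_coauthor": (0.60, 0.02),
}

def match_probability(observed_evidence):
    """Posterior P(match | evidence) under an independence assumption."""
    p_match, p_nonmatch = PRIOR_MATCH, 1.0 - PRIOR_MATCH
    for ev in observed_evidence:
        l_match, l_nonmatch = LIKELIHOODS[ev]
        p_match *= l_match
        p_nonmatch *= l_nonmatch
    return p_match / (p_match + p_nonmatch)

print(match_probability(["similar_name"]))                 # weak evidence, low posterior
print(match_probability(["similar_name", "same_email"]))   # strong evidence, high posterior
```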

Efficient Entity Resolution for Large Heterogeneous Information Spaces

by George Papadakis, Ekaterini Ioannou, Claudia Niederée, Peter Fankhauser
"... We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merge of information that describes the same real-world entities, a task known as Entity ..."
Abstract - Cited by 8 (2 self) - Add to MetaCart
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data pose new challenges for entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and on a priori known schemata. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient.
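The key idea in the abstract is attribute-agnostic blocking: candidate blocks are keyed on attribute values alone, ignoring attribute names, so records with loosely bound or unknown schemata can still end up in the same block. A minimal sketch of that mechanism follows; the tokenization and the toy records are assumptions, and the paper's block-processing heuristics (utility-based boosting, match propagation, early termination) are not shown.

```python
from collections import defaultdict
from itertools import combinations

# Records with heterogeneous schemata: attribute names differ, values overlap (toy data).
records = {
    "r1": {"name": "John Smith", "affiliation": "L3S"},
    "r2": {"fullName": "J. Smith", "org": "L3S Research Center"},
    "r3": {"title": "Entity Resolution Survey"},
}

def blocks(recs):
    """Attribute-agnostic blocking: key blocks on value tokens, ignore attribute names."""
    index = defaultdict(set)
    for rid, rec in recs.items():
        for value in rec.values():
            for token in value.lower().split():
                index[token].add(rid)
    return index

def candidate_pairs(recs):
    """Compare only records that share at least one block (i.e., one token)."""
    pairs = set()
    for block in blocks(recs).values():
        for a, b in combinations(sorted(block), 2):
            pairs.add((a, b))
    return pairs

print(candidate_pairs(records))   # {('r1', 'r2')} via the shared tokens 'smith' and 'l3s'
```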

Citation Context

...ifying data that describe the same real-world entities. Suggested methods span from string similarity metrics [3], to similarity methods using transformations [20, 21], and relationships between data [6, 12, 14]. A complete overview of the existing work in this domain can be found in [5, 7, 15]. As noted in [7], the prevalent method for enhancing the efficiency of the resolution process is data blocking. Rel...

Efficient Semantic-Aware Detection of Near Duplicate Resources

by Ekaterini Ioannou, Odysseas Papapetrou, Dimitrios Skoutas, Wolfgang Nejdl
"... Abstract. Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and redundancy, and to increase the dive ..."
Abstract - Cited by 4 (0 self) - Add to MetaCart
Efficiently detecting near duplicate resources is an important task when integrating information from various sources and applications. Once detected, near duplicate resources can be grouped together, merged, or removed, in order to avoid repetition and redundancy, and to increase the diversity in the information provided to the user. In this paper, we introduce an approach for efficient semantic-aware near duplicate detection, by combining an indexing scheme for similarity search with the RDF representations of the resources. We provide a probabilistic analysis for the correctness of the suggested approach, which allows applications to configure it for satisfying their specific quality requirements. Our experimental evaluation on the RDF descriptions of real-world news articles from various news agencies demonstrates the efficiency and effectiveness of our approach. Key words: near duplicate detection, data integration
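The abstract combines an indexing scheme for similarity search with the RDF representations of resources. The sketch below illustrates the general pattern with a token-level inverted index over RDF literal values followed by Jaccard verification of candidate pairs; the actual indexing scheme, similarity measure, and threshold in the paper may differ, and the toy resources are assumptions.

```python
from collections import defaultdict

# Toy "RDF" resources reduced to their literal values (assumed input format).
resources = {
    "news:1": ["EU agrees new climate deal", "Brussels"],
    "news:2": ["EU agrees climate deal", "Brussels"],
    "news:3": ["Football cup final tonight", "Madrid"],
}

def tokens(literals):
    return {t for lit in literals for t in lit.lower().split()}

def near_duplicates(res, threshold=0.6):
    """Index literal tokens, then verify candidate pairs with Jaccard similarity."""
    index = defaultdict(set)
    for rid, lits in res.items():
        for t in tokens(lits):
            index[t].add(rid)

    candidates = set()
    for ids in index.values():
        for a in ids:
            for b in ids:
                if a < b:
                    candidates.add((a, b))

    result = []
    for a, b in candidates:
        ta, tb = tokens(res[a]), tokens(res[b])
        jaccard = len(ta & tb) / len(ta | tb)
        if jaccard >= threshold:
            result.append((a, b, round(jaccard, 2)))
    return result

print(near_duplicates(resources))   # [('news:1', 'news:2', 0.83)]
```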

Citation Context

... information in a complex information space. A modified version of this algorithm [1] has also been used for detecting conflicts of interest in paper reviewing processes. Probabilistic Entity Linkage [11] constructs a Bayesian network from the possible duplicates, and then uses probabilistic inference for computing their similarity. Other approaches introduced clustering using relationships [3, 4],...

Finding Experts on the Semantic Desktop

by Gianluca Demartini, Claudia Niederée
"... Abstract. Expert retrieval has attracted deep attention because of the huge economical impact it can have on enterprises. The classical dataset on which to perform this task is company intranet (i.e., personal pages, e-mails, documents). We propose a new system for finding experts in the user’s desk ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
Expert retrieval has attracted deep attention because of the huge economic impact it can have on enterprises. The classical dataset on which to perform this task is the company intranet (i.e., personal pages, e-mails, documents). We propose a new system for finding experts in the user's desktop content. Looking at private documents and e-mails of the user, the system builds expert profiles for all the people named on the desktop. This allows the search system to focus on the user's topics of interest, thus generating satisfactory results on topics well represented on the desktop. We show, with an artificial test collection, how the desktop content is appropriate for finding experts on the topics the user is interested in.
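The abstract describes building expert profiles for the people named in the user's desktop documents and e-mails, and then matching query topics against those profiles. A minimal sketch of that workflow follows, assuming a simplified data model in which each desktop item is already annotated with the people it mentions; the term-frequency scoring is a stand-in, not the paper's ranking model.

```python
from collections import Counter, defaultdict

# Desktop items with the people they mention (assumed, simplified data model).
desktop_items = [
    {"people": ["alice"], "text": "draft paper on entity resolution and blocking"},
    {"people": ["alice", "bob"], "text": "meeting notes entity linkage evaluation"},
    {"people": ["bob"], "text": "slides on semantic desktop search"},
]

def build_profiles(items):
    """Aggregate a term-frequency profile per person from the items they appear in."""
    profiles = defaultdict(Counter)
    for item in items:
        terms = item["text"].lower().split()
        for person in item["people"]:
            profiles[person].update(terms)
    return profiles

def find_experts(query, profiles):
    """Rank people by how often the query terms occur in their profile."""
    q = query.lower().split()
    scores = {p: sum(c[t] for t in q) for p, c in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

profiles = build_profiles(desktop_items)
print(find_experts("entity resolution", profiles))   # alice ranked above bob
```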

Citation Context

... author field as well as e-mails in which her e-mail address appears as part of the sender or receiver fields. This is obtained by linking together the objects that refer to the same real-world entities [20]. The expert search system we propose can leverage the extracted relations between documents and people as well as the linkage information between different representations (e.g., surname and e-...

From Web Data to Entities and Back

by Zoltán Miklós, Nicolas Bonvin, Paolo Bouquet, Michele Catasta, Peter Fankhauser, Julien Gaugaz, Ekaterini Ioannou, Antonio Maña, Claudia Niederée, Themis Palpanas
"... Abstract. We present the Entity Name System (ENS), an enabling infrastructure, which can host descriptions of named entities and provide unique identifiers, on large-scale. In this way, it opens new perspectives to realize entity-oriented, rather than keyword-oriented, Web information systems. We de ..."
Abstract - Add to MetaCart
We present the Entity Name System (ENS), an enabling infrastructure which can host descriptions of named entities and provide unique identifiers at large scale. In this way, it opens new perspectives to realize entity-oriented, rather than keyword-oriented, Web information systems. We describe the architecture and the functionality of the ENS, along with tools that all contribute to realizing the Web of entities.
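The ENS is described as an infrastructure that hosts entity descriptions and hands out unique identifiers, so that applications can refer to entities rather than keywords. The sketch below shows a lookup-or-mint workflow against a toy in-memory repository; the matching rule, identifier scheme, and class names are assumptions and do not reflect the real ENS API.

```python
import uuid

class EntityRepository:
    """Toy in-memory stand-in for an ENS-like repository (not the real ENS API)."""

    def __init__(self):
        self.entities = {}   # identifier -> description dict

    def lookup(self, description):
        """Return the identifier of a matching entity, if any (naive matching rule)."""
        for eid, stored in self.entities.items():
            shared = set(description.items()) & set(stored.items())
            if len(shared) >= 2:   # assumed rule: two shared attribute/value pairs suffice
                return eid
        return None

    def lookup_or_mint(self, description):
        """Reuse an existing identifier or mint a new globally unique one."""
        eid = self.lookup(description)
        if eid is None:
            eid = "urn:entity:" + uuid.uuid4().hex
            self.entities[eid] = dict(description)
        return eid

repo = EntityRepository()
a = repo.lookup_or_mint({"name": "W. Nejdl", "affiliation": "L3S", "role": "professor"})
b = repo.lookup_or_mint({"name": "W. Nejdl", "affiliation": "L3S"})
assert a == b   # the second request resolves to the already-minted identifier
```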

Citation Context

...the matching of the requested entity with the entity descriptions available in the repository of the ENS. At first sight, the entity search task has a strong similarity with entity linkage techniques [14], also known as data matching [4, 9], deduplication [27], resolution [3], merge-purge [13], entity identification [21], and reference reconciliation [10]. Entity Linkage is the process that decides wh...

Entity-Aware Query Processing for Heterogeneous Data with Uncertainty and Correlations

by Ekaterini Ioannou
"... Many modern systems rely on rich heterogeneous data that has been integrated from a variety of different applications and sources. To successfully perform their tasks, these systems require to know which data refer to the same real-world entities, such as locations, people, or movies. My work focuse ..."
Abstract - Add to MetaCart
Many modern systems rely on rich heterogeneous data that has been integrated from a variety of different applications and sources. To successfully perform their tasks, these systems need to know which data refer to the same real-world entities, such as locations, people, or movies. My work focuses on addressing this requirement through a new approach for entity-aware query processing over heterogeneous data. Data provided for integration is processed to generate the possible entities and linkages between these entities. This information is never merged with the original data, but used during query processing to provide entity-aware results that reflect the real-world entities existing in the current data. Special emphasis is given to the effective management of uncertainty and correlations that either exist in the original data or are generated by data matching techniques. Advisor: Prof. Dr. Wolfgang Nejdl
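A central point of the abstract is that possible entities and linkages are kept separate from the original data and consulted only at query time. The sketch below illustrates that idea with a probabilistic linkage table over record identifiers that is used to expand query results without ever merging records; the schema, records, probabilities, and threshold are assumptions.

```python
# Original records stay untouched (assumed toy schema).
records = {
    "r1": {"title": "Casablanca", "year": "1942"},
    "r2": {"title": "Casablanca (film)", "director": "M. Curtiz"},
    "r3": {"title": "Notebook", "year": "2004"},
}

# Linkages kept separately, with the probability that two records co-refer (assumed values).
linkages = [("r1", "r2", 0.92), ("r1", "r3", 0.10)]

def entity_aware_query(keyword, min_prob=0.5):
    """Return matching records plus records linked to them above a probability threshold."""
    hits = {rid for rid, rec in records.items()
            if any(keyword.lower() in v.lower() for v in rec.values())}
    expanded = set(hits)
    for a, b, p in linkages:
        if p >= min_prob:
            if a in hits:
                expanded.add(b)
            if b in hits:
                expanded.add(a)
    return {rid: records[rid] for rid in sorted(expanded)}

# A query on the director also surfaces r1, which is linked to r2 with probability 0.92.
print(entity_aware_query("Curtiz"))
```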

Citation Context

...llowing paragraphs present in more detail the topics that I will concentrate on during this thesis. 4.1 Probabilistic Entity Linkage We have already worked on a new probabilistic entity linkage algorithm [25]. Our goal was to address the matching problem as it appears when integrating heterogeneous data from Personal Information Management (PIM). PIM systems, such as NEPOMUK and Haystack, integrate...

Leveraging Personal Metadata for Desktop Search: The Beagle++ System

by Enrico Minack, Raluca Paiu, Stefania Costache, Gianluca Demartini, Julien Gaugaz, Ekaterini Ioannou
"... Search on PCs has become less efficient than searching the Web due to the increasing amount of stored data. In this paper we present an innovative Desktop search solution, which relies on extracted metadata, context information as well as additional background information for improving Desktop searc ..."
Abstract - Add to MetaCart
Search on PCs has become less efficient than searching the Web due to the increasing amount of stored data. In this paper we present an innovative Desktop search solution, which relies on extracted metadata, context information, as well as additional background information for improving Desktop search results. We also present a practical application of this approach — the extensible Beagle++ toolbox. To prove the validity of our approach, we conducted a series of experiments. By comparing our results against those of a regular Desktop search solution — Beagle — we show improved search quality and overall performance.

Citation Context

...he information on the metadata level. The following paragraphs explain how this algorithm works within Beagle++. A more detailed description of the algorithm and experiments performed is available in [25]. Consider searching with Beagle++ for resources related to person “Steven Kean”. Searching using his surname retrieves publications in which his surname appears as part of an author field. Searching ...
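The context above explains that linkage information lets a search on one representation of a person (e.g., the surname "Kean") also retrieve resources that only carry another representation (e.g., an e-mail address). A minimal sketch of such linkage-driven query expansion follows; the e-mail address and the resource annotations are hypothetical, and only the person name comes from the example in the text.

```python
# Surface forms that the linkage step has decided refer to one person
# (the e-mail address is hypothetical, only the name follows the text above).
linked_forms = {
    "person:steven_kean": {"Steven Kean", "Kean", "s.kean@example.org"},
}

# Desktop resources annotated with the surface forms they contain (assumed data).
resources = {
    "paper.pdf": {"Kean"},
    "mail-42.eml": {"s.kean@example.org"},
    "notes.txt": {"John Smith"},
}

def search(query):
    """Expand the query with all linked forms of the matching entity, then retrieve."""
    forms = {query}
    for entity, aliases in linked_forms.items():
        if query in aliases:
            forms |= aliases
    return [rid for rid, found in resources.items() if found & forms]

print(search("Kean"))   # ['paper.pdf', 'mail-42.eml'] thanks to the e-mail linkage
```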

User Data Discovery and Aggregation: the CS-UDD Algorithm

by unknown authors
"... and other research outputs User data discovery and aggregation: the CS-UDD al-gorithm Journal Article ..."
Abstract - Add to MetaCart
User data discovery and aggregation: the CS-UDD algorithm. Journal Article.
(Show Context)

Citation Context

... In the cross-linking step of the algorithm, we check whether there are links between the Pr profiles retrieved on different OSNs. The concept of cross-linking is similar to that of entity linkage in [29]. The previous section explained how the algorithm estimates the degree of matching between each pair of profiles (Pi, Prj) by computing their MatchScore, as specified in formula (1). Up to now, we hav...
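The context refers to a MatchScore computed for each pair of profiles by formula (1), which is not reproduced on this page. The sketch below is a generic weighted attribute-similarity score between two profiles that conveys the idea; it is explicitly not formula (1) from the CS-UDD paper, and the attributes, weights, and similarity metric are assumptions.

```python
from difflib import SequenceMatcher

# Two user profiles retrieved from different social networks (assumed attributes).
profile_i = {"name": "Maria Rossi", "location": "Turin", "occupation": "researcher"}
profile_j = {"name": "M. Rossi", "location": "Torino", "occupation": "research fellow"}

# Assumed attribute weights; the paper's formula (1) and its weights are not shown here.
weights = {"name": 0.5, "location": 0.3, "occupation": 0.2}

def attr_similarity(a, b):
    """String similarity in [0, 1] (SequenceMatcher ratio as a stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(p, q):
    """Weighted average of attribute similarities over attributes both profiles share."""
    shared = [a for a in weights if a in p and a in q]
    total = sum(weights[a] for a in shared)
    if total == 0:
        return 0.0
    return sum(weights[a] * attr_similarity(p[a], q[a]) for a in shared) / total

print(round(match_score(profile_i, profile_j), 2))
```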
