Results 1 - 10
of
52
Linked Data -- The story so far
"... The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertion ..."
Abstract
-
Cited by 136 (7 self)
- Add to MetaCart
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions- the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Data integration with uncertainties
- In Proc. of VLDB
, 2007
"... This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approxim ..."
Abstract
-
Cited by 41 (2 self)
- Add to MetaCart
This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, queries to the system may be posed with keywords rather than in a structured form. Third, the data from the sources may be extracted using information extraction techniques and so may yield imprecise data. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we don’t know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of approximate schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. 1.
Managing Uncertainty and Vagueness in Description Logics for the Semantic Web
, 2007
"... Ontologies play a crucial role in the development of the Semantic Web as a means for defining shared terms in web resources. They are formulated in web ontology languages, which are based on expressive description logics. Significant research efforts in the semantic web community are recently direct ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
Ontologies play a crucial role in the development of the Semantic Web as a means for defining shared terms in web resources. They are formulated in web ontology languages, which are based on expressive description logics. Significant research efforts in the semantic web community are recently directed towards representing and reasoning with uncertainty and vagueness in ontologies for the Semantic Web. In this paper, we give an overview of approaches in this context to managing probabilistic uncertainty, possibilistic uncertainty, and vagueness in expressive description logics for the Semantic Web.
Web-scale Data Integration: You Can Only Afford to Pay As You Go
- In Proc. of CIDR-07
, 2007
"... The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data m ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity on the web-scale presents many new challenges. In this paper, we highlight these challenges in two scenarios – the Deep Web and Google Base. We contend that traditional data integration techniques are no longer valid in the face of such heterogeneity and scale. We propose a new data integration architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes pay-as-you-go data management as means for achieving web-scale data integration. 1.
Beauty and the beast: The theory and practice of information integration
- In ICDT
, 2007
"... Abstract. Information integration is becoming a critical problem for businesses and individuals alike. Data volumes are sky-rocketing, and new sources and types of information are proliferating. This paper briefly reviews some of the key research accomplishments in information integration (theory an ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
Abstract. Information integration is becoming a critical problem for businesses and individuals alike. Data volumes are sky-rocketing, and new sources and types of information are proliferating. This paper briefly reviews some of the key research accomplishments in information integration (theory and systems), then describes the current state-of-the-art in commercial practice, and the challenges (still) faced by CIOs and application developers. One critical challenge is choosing the right combination of tools and technologies to do the integration. Although each has been studied separately, we lack a unified (and certainly, a unifying) understanding of these various approaches to integration. Experience with a variety of integration projects suggests that we need a broader framework, perhaps even a theory, which explicitly takes into account requirements on the result of the integration, and considers the entire end-to-end integration process.
Structured Data Meets the Web: A Few Observations
- IEEE Data Eng. Bull
"... The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data m ..."
Abstract
-
Cited by 16 (4 self)
- Add to MetaCart
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity on the web-scale presents many new challenges. In this paper we articulate challenges based on our experience with addressing them at Google, and offer some principles for addressing them in a general fashion. 1
Indexing Dataspaces
, 2007
"... Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume that all the semantic relationships between sources are known and specified. Much of the user interactio ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume that all the semantic relationships between sources are known and specified. Much of the user interaction with dataspaces involves exploring the data, and users do not have a single schema to which they can pose queries. Consequently, it is important that queries are allowed to specify varying degrees of structure, spanning keyword queries to more structure-aware queries. This paper considers indexing support for queries that combine keywords and structure. We describe several extensions to inverted lists to capture structure when it is present. In particular, our extensions incorporate attribute labels, relationships between data items, hierarchies of schema elements, and synonyms among schema elements. We describe experiments showing that our indexing techniques improve query efficiency by an order of magnitude compared with alternative approaches, and scale well with the size of the data.
HomeViews: Peer-to-Peer Middleware for Personal Data Sharing Applications
- IN PROC. OF SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 2007
"... This paper presents HomeViews, a peer-to-peer middleware system for building personal data management applications. HomeViews provides abstractions and services for data organization and distributed data sharing. The key innovation in HomeViews is the integration of three concepts: views and queries ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
This paper presents HomeViews, a peer-to-peer middleware system for building personal data management applications. HomeViews provides abstractions and services for data organization and distributed data sharing. The key innovation in HomeViews is the integration of three concepts: views and queries from databases, a capability-based protection model from operating systems, and a peer-to-peer distributed architecture. Using HomeViews, applications can (1) create views to organize files into dynamic collections, (2) share these views in a protected way across the Internet through simple exchange of capabilities, and (3) transparently integrate remote views and data into a user’s local organizational structures. HomeViews operates in a purely peer-topeer fashion, without the need for account administration or centralized data and protection management inherent in typical data-sharing systems. We have prototyped HomeViews, deployed it on a small network of Linux machines, and used it to develop two distributed data-sharing applications: a peer-to-peer version of the Gallery photo-sharing application and a simple readonly shared file system. Using measurements, we demonstrate the practicality and performance of our approach.
Answering structured queries on unstructured data
- In WebDB
, 2006
"... There is growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source disc ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
There is growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source discovery and categorization, indexing and some forms of recovery. One of the key services of a DSSP is to provide seamless querying on the structured and unstructured data. Querying each kind of data in isolation has been the main subject of study for the fields of databases and information retrieval. Recently the database community has studied the problem of answering keyword queries on structured data such as relational data or XML data. The only combination that has not been fully explored is answering structured queries on unstructured data. This paper explores an approach in which we carefully construct a keyword query from a given structured query, and submit the query to the underlying engine (e.g., a web-search engine) for querying unstructured data. We take the first step towards extracting keywords from structured queries even without domain knowledge and propose several directions we can explore to improve keyword extraction when domain knowledge exists. The experimental results show that our algorithm works fairly well for a large number of datasets from various domains. 1.

