Results 1 - 10 of 351

Massive Semantic Web data compression with MapReduce

by Jacopo Urbani, Jason Maassen, Henri Bal
"... The Semantic Web consists of many billions of statements made of terms that are either URIs or literals. Since these terms usually consist of long sequences of characters, an effective compression technique must be used to reduce the data size and increase the application performance. One of the bes ..."
Abstract - Cited by 17 (1 self) - Add to MetaCart
of the best known techniques for data compression is dictionary encoding. In this paper we propose a MapReduce algorithm that efficiently compresses and decompresses a large amount of Semantic Web data. We have implemented a prototype using the Hadoop framework and we report an evaluation of the performance
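
The core technique named in the abstract, dictionary encoding, replaces each long URI or literal with a compact integer ID. Below is a minimal sketch of that idea in a MapReduce style, simulated locally in plain Python; the sample statements, the single-reducer ID assignment, and all names are illustrative and not the paper's actual Hadoop algorithm:

```python
# Minimal, illustrative sketch of MapReduce-style dictionary encoding of RDF
# statements, simulated locally in plain Python (no Hadoop). The paper's
# algorithm distributes ID assignment; here one "reduce" side assigns all IDs.
from collections import defaultdict

statements = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "<http://example.org/bob>"),
    ("<http://example.org/bob>", "<http://xmlns.com/foaf/0.1/name>", "\"Bob\""),
]

# Map phase: emit (term, (statement_id, position)) so every occurrence of a
# term can be rewritten once the term receives a numeric ID.
def map_phase(stmts):
    for sid, triple in enumerate(stmts):
        for pos, term in enumerate(triple):
            yield term, (sid, pos)

# Shuffle: group occurrences by term.
groups = defaultdict(list)
for term, occurrence in map_phase(statements):
    groups[term].append(occurrence)

# Reduce phase: assign a compact integer ID per distinct term and emit both
# the dictionary entry and the rewritten (compressed) statements.
dictionary = {}
encoded = [[None, None, None] for _ in statements]
for term_id, (term, occurrences) in enumerate(sorted(groups.items())):
    dictionary[term_id] = term
    for sid, pos in occurrences:
        encoded[sid][pos] = term_id

print(dictionary)   # term_id -> term, kept for later decompression
print(encoded)      # statements as triples of small integers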

COMA - A system for flexible combination of Schema Matching Approaches

by Hong-Hai Do, Erhard Rahm - In VLDB, 2002
"... Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible, automati ..."
Abstract - Cited by 443 (12 self) - Add to MetaCart
Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible
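
A toy illustration of the composite-matcher idea behind such systems: several matchers score element-name pairs, the scores are aggregated, and pairs above a threshold become match candidates. The schemas, the two matchers, the averaging, and the threshold below are all made up and far simpler than COMA's matcher library:

```python
# Toy sketch of combining multiple schema matchers: each matcher scores
# element-name pairs, scores are aggregated, and pairs above a threshold
# become match candidates. Schemas, matchers and threshold are illustrative.
import re
from difflib import SequenceMatcher

schema_a = ["CustomerName", "CustAddress", "PhoneNo"]
schema_b = ["Name", "Address", "Phone", "Email"]

def name_similarity(a, b):
    # String similarity on lower-cased element names.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_similarity(a, b):
    # Jaccard overlap of camel-case tokens (a very rough token matcher).
    ta = {t.lower() for t in re.findall(r"[A-Z][a-z]*|[a-z]+", a)}
    tb = {t.lower() for t in re.findall(r"[A-Z][a-z]*|[a-z]+", b)}
    return len(ta & tb) / max(len(ta | tb), 1)

def combined(a, b):
    # Aggregate the individual matcher results (here: a simple average).
    return (name_similarity(a, b) + token_similarity(a, b)) / 2

THRESHOLD = 0.5
for a in schema_a:
    for b in schema_b:
        score = combined(a, b)
        if score >= THRESHOLD:
            print(f"{a} <-> {b}  (score {score:.2f})")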

Data Cube Materialization and Mining over MapReduce

by Arnab Nandi, Cong Yu, Philip Bohannon, Raghu Ramakrishnan
"... Abstract—Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive datasets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such asSUM that are amenable to parallel computation and ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
and can easily benefit from the recent advancement of parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is non-trivial. In this paper we detail real-world challenges in cube materialization and mining tasks on Web-scale datasets. Specifically, we
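
The distinction the abstract draws can be made concrete with a small local map/reduce simulation: SUM can be merged from partial aggregates, while a holistic measure such as TOP-K needs all of a group's values at one reducer. The fact table, group-by attribute, and K below are illustrative only:

```python
# Minimal local sketch contrasting an algebraic measure (SUM) with a holistic
# one (TOP-K) for one cube group-by. Group attribute and K are illustrative.
import heapq
from collections import defaultdict

# (country, city, revenue) fact rows.
facts = [
    ("US", "NYC", 120), ("US", "SF", 300), ("US", "NYC", 80),
    ("DE", "Berlin", 200), ("DE", "Munich", 50), ("US", "SF", 40),
]

# Map: emit (cube group key, measure input). Here the group is just `country`.
def map_phase(rows):
    for country, city, revenue in rows:
        yield country, revenue

# Shuffle: group measure inputs by key.
groups = defaultdict(list)
for key, value in map_phase(facts):
    groups[key].append(value)

# Reduce: SUM could be merged from partial sums, but TOP-K requires the
# reducer to see every value of the group.
K = 2
for key, values in groups.items():
    print(key, "SUM =", sum(values), " TOP-%d =" % K, heapq.nlargest(K, values))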

SCOPE: parallel databases meet MapReduce

by Jingren Zhou, Nicolas Bruno, Ming-Chuan Wu, Per-Åke Larson, Ronnie Chaiken, Darren Shakib - The VLDB Journal, 2012
"... Companies providing cloud-scale data services have increasing needs to store and analyze massive data sets, such as search logs, click streams, and web graph data. For cost and performance reasons, processing is typically done on large clusters of tens of thousands of commodity machines. Such massi ..."
Abstract - Cited by 15 (4 self) - Add to MetaCart
computation system, Structured Computations Optimized for Parallel Execution (Scope), targeted for this type of massive data analysis. Scope combines benefits from both traditional parallel databases and MapReduce execution engines to allow easy programmability and deliver massive scalability and high

Towards Large Scale Semantic Annotation Built on MapReduce Architecture

by Michal Laclavík, Martin Šeleng, Ladislav Hluchý
"... Abstract. Automated annotation of the web documents is a key challenge of the Semantic Web effort. Web documents are structured but their structure is understandable only for a human that is the major problem of the Semantic Web. Semantic Web can be exploited only if metadata understood by a compute ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
. In this paper we present how a pattern based annotation tool can benefit from Google’s MapReduce architecture to process large amount of text data.
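
A minimal sketch of pattern-based annotation in a map-only style, simulated locally in plain Python; the regex patterns and documents below are made up and merely stand in for the tool's actual pattern base:

```python
# Sketch of pattern-based annotation as a map-only job, simulated locally:
# each "mapper" applies regex patterns to a document and emits annotations.
# Patterns and documents are illustrative, not the annotation tool's own set.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "url": re.compile(r"https?://\S+"),
}

documents = {
    "doc1": "Contact alice@example.org or see http://example.org/about.",
    "doc2": "No metadata here.",
}

# Map: (doc_id, text) -> stream of (doc_id, annotation_type, matched_text).
def map_annotate(doc_id, text):
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            yield doc_id, label, match.group()

for doc_id, text in documents.items():
    for annotation in map_annotate(doc_id, text):
        print(annotation)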

Relaxed Synchronization and Eager Scheduling in MapReduce

by Karthik Kambatla, Naresh Rapolu, Suresh Jagannathan, Ananth Grama
"... MapReduce has emerged as a commonly-used programming model for large-scale distributed environments. While the underlying programming model based on maps and reductions has been shown to be effective in specific domains, significant questions relating to performance and application scope remain unre ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
unresolved. This paper targets key questions of performance through relaxed semantics of underlying map and reduce constructs in iterative MapReduce applications. Specifically, it investigates the notion of partial synchronizations combined with eager scheduling to overcome global synchronization overheads
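
The partial-synchronization idea can be illustrated outside Hadoop: partitions of an iterative computation take several local steps against stale boundary data and only occasionally pay for a global barrier. The rough local sketch below uses 1-D Jacobi smoothing as a stand-in workload; the example, constants, and structure are assumptions for illustration and do not reproduce the paper's constructs:

```python
# Rough sketch of relaxed/partial synchronization: partitions run several
# local iterations against stale boundary values and synchronize globally
# only once in a while. The 1-D Jacobi smoothing workload and all constants
# are illustrative.

values = [float(i) for i in range(16)]       # global state, split into partitions
PARTITIONS = 4
LOCAL_ITERS = 3                              # relaxed: local work between global syncs
GLOBAL_SYNCS = 5

def partition_bounds(p, n, parts):
    size = n // parts
    return p * size, (p + 1) * size

for _ in range(GLOBAL_SYNCS):
    snapshot = list(values)                  # boundary values frozen at sync time
    for p in range(PARTITIONS):
        lo, hi = partition_bounds(p, len(values), PARTITIONS)
        local = snapshot[lo:hi]
        for _ in range(LOCAL_ITERS):         # local iterations, no global barrier
            nxt = []
            for i, v in enumerate(local):
                left = local[i - 1] if i > 0 else (snapshot[lo - 1] if lo > 0 else v)
                right = local[i + 1] if i + 1 < len(local) else (snapshot[hi] if hi < len(snapshot) else v)
                nxt.append((left + v + right) / 3.0)
            local = nxt
        values[lo:hi] = local                # publish only at the global sync point

print([round(v, 2) for v in values])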

A Case for MapReduce over the Internet

by Hrishikesh Gadre, Ivan Rodero, Javier Diaz-Montes, Manish Parashar
"... ABSTRACT In recent years, MapReduce programming model and specifically its open source implementation Hadoop has been widely used by organizations to perform large-scale data processing tasks such as web-indexing, data mining as well as scientific simulations. The key benefits of this programming m ..."
Abstract - Add to MetaCart
ABSTRACT In recent years, MapReduce programming model and specifically its open source implementation Hadoop has been widely used by organizations to perform large-scale data processing tasks such as web-indexing, data mining as well as scientific simulations. The key benefits of this programming

Toward Scalable Reasoning over Annotated RDF Data Using MapReduce

by Chang Liu, Guilin Qi
"... The Resource Description Framework (RDF) is one of the major representation standards for the Semantic Web. RDF Schema (RDFS) is used to describe vocabularies used in RDF descriptions. Recently, there is an increasing interest to express additional information on ..."
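
As a rough illustration of what reasoning over annotated RDF with MapReduce can look like, the sketch below applies a single RDFS rule (rdfs9: if x has type C and C is a subclass of D, then x has type D) as one map/reduce join, combining annotation values with min(). The triples, the annotation domain, and the single-pass structure are assumptions for illustration, not the authors' algorithm:

```python
# Hedged sketch: one application of RDFS rule rdfs9 over annotated triples,
# expressed as a map/reduce join and simulated locally. Annotations are
# combined with min(); the data and annotation domain are illustrative.
from collections import defaultdict

# (subject, predicate, object, annotation)
triples = [
    ("ex:alice", "rdf:type", "ex:Student", 0.9),
    ("ex:Student", "rdfs:subClassOf", "ex:Person", 1.0),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent", 0.8),
]

# Map: key both kinds of triples on the class they join over.
def map_phase(ts):
    for s, p, o, ann in ts:
        if p == "rdf:type":
            yield o, ("type", s, ann)          # join key = the class C
        elif p == "rdfs:subClassOf":
            yield s, ("sub", o, ann)           # join key = the subclass C

groups = defaultdict(list)
for key, value in map_phase(triples):
    groups[key].append(value)

# Reduce: for each class C, pair every "x type C" with every "C subClassOf D"
# and derive "x type D", combining the two annotations with min().
derived = []
for cls, items in groups.items():
    instances = [(x, a) for tag, x, a in items if tag == "type"]
    supers = [(d, a) for tag, d, a in items if tag == "sub"]
    for x, a1 in instances:
        for d, a2 in supers:
            derived.append((x, "rdf:type", d, min(a1, a2)))

print(derived)   # e.g. [('ex:alice', 'rdf:type', 'ex:Person', 0.9)]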

Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

by Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, Bhavani Thuraisingham
"... Abstract. Handling huge amount of data scalably is a matter of concern for a long time. Same is true for semantic web data. Current semantic web frameworks lack this ability. In this paper, we describe a framework that we built using Hadoop 1 to store and retrieve large number of RDF 2 triples. We ..."
Abstract - Add to MetaCart
describe our schema to store RDF data in Hadoop Distribute File System. We also present our algorithms to answer a SPARQL 3 query. We make use of Hadoop's MapReduce framework to actually answer the queries. Our results reveal that we can store huge amount of semantic web data in Hadoop clusters built
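
A toy sketch of the general approach described here, assuming (for illustration only) a storage layout with one bucket per predicate and a two-pattern SPARQL query answered by a reduce-side join on the shared variable; the paper's actual schema and query algorithms differ:

```python
# Illustrative sketch (not the paper's exact schema): triples are bucketed by
# predicate, mimicking one HDFS file per predicate, and the query
#   SELECT ?x WHERE { ?x rdf:type ex:Student . ?x ex:advisor ex:khan . }
# is answered with a reduce-side join on the shared variable ?x.
from collections import defaultdict

triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:advisor", "ex:khan"),
    ("ex:bob", "ex:advisor", "ex:doshi"),
]

# "Storage": one bucket per predicate.
buckets = defaultdict(list)
for s, p, o in triples:
    buckets[p].append((s, o))

# Map: read only the buckets the query needs; key matches by the join variable.
def map_phase():
    for s, o in buckets["rdf:type"]:
        if o == "ex:Student":
            yield s, "pattern1"
    for s, o in buckets["ex:advisor"]:
        if o == "ex:khan":
            yield s, "pattern2"

groups = defaultdict(set)
for subject, tag in map_phase():
    groups[subject].add(tag)

# Reduce: a subject is an answer only if it matched every triple pattern.
answers = [s for s, tags in groups.items() if tags == {"pattern1", "pattern2"}]
print(answers)   # ['ex:alice']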

An Iterative MapReduce Approach to Frequent Subgraph Mining in Biological Datasets

by Steven Hill, Bismita Srichandan
"... Mining frequent subgraphs has attracted a great deal of at-tention in many areas, such as bioinformatics, web data min-ing and social networks. There are many promising main memory-based techniques available in this area, but they lack scalability as the main memory is a bottleneck. Tak-ing the mass ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
-ing the massive data into consideration, traditional database systems like relational databases and object databases fail miserably with respect to efficiency as frequent subgraph mining is computationally intensive. With the advent of the MapReduce framework by Google, a few researchers have applied the MapReduce
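
To make the iterative MapReduce idea concrete, the sketch below simulates only a first iteration: counting the support of single labeled edges (size-1 subgraphs) across graphs and pruning by a minimum support. The graphs, labels, and threshold are made up, and later iterations (candidate growth, isomorphism checks) are omitted:

```python
# Simplified sketch of one iteration of frequent subgraph mining in a
# map/reduce style: count support of single labeled edges across graphs and
# keep those meeting a minimum support. Data and threshold are illustrative.
from collections import defaultdict

# Each graph: list of undirected edges as (label_u, label_v).
graphs = {
    "g1": [("C", "N"), ("C", "O"), ("N", "O")],
    "g2": [("C", "N"), ("C", "C")],
    "g3": [("C", "N"), ("C", "O")],
}

MIN_SUPPORT = 2

# Map: emit each distinct canonical edge once per graph (support = #graphs).
def map_phase(gid, edges):
    seen = set()
    for u, v in edges:
        edge = tuple(sorted((u, v)))     # canonical form for undirected edges
        if edge not in seen:
            seen.add(edge)
            yield edge, gid

groups = defaultdict(set)
for gid, edges in graphs.items():
    for edge, g in map_phase(gid, edges):
        groups[edge].add(g)

# Reduce: support counting and pruning by the minimum support threshold.
frequent = {edge: len(gids) for edge, gids in groups.items() if len(gids) >= MIN_SUPPORT}
print(frequent)   # {('C', 'N'): 3, ('C', 'O'): 2}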