Results 1 - 10 of 1,536

Pig Latin: A Not-So-Foreign Language for Data Processing

by Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Cited by 607 (13 self)
"... is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL, and the low ..."

The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

by J. Gregory Steffan, Todd C. Mowry - HPCA-4, 1998
"... As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve the full potential of these "single-chip multiprocessors," however, we must find a way to parallelize non-numeric applications. Unfortunately, compilers have had little success in parallelizing non-numeric codes due to their complex access patterns. This paper explores the potential for using thread-level data speculation (TLDS) to overcome this limitation by allowing the compiler to view parallelization solely as a cost ..."
Cited by 256 (9 self)

Behavioral Simulations in MapReduce

by Guozhang Wang, Marcos Vaz Salles, Benjamin Sowell, Xun Wang, Tuan Cao, Alan Demers, Johannes Gehrke
"... In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and automatically scale in parallel environments. In this paper we present BRACE (Big Red Agent-based Computation Engine), which extends the MapReduce framework to process these simulations efficiently across a cluster. We can leverage spatial locality to treat behavioral simulations as iterated spatial joins ..."
Cited by 5 (4 self)

A MapReduce Skeleton for Skandium

by Ioannis Assiouras
"... MapReduce is a popular programming model currently used for application development on large scale clusters. MapReduce realizes the concept of parallel programming skeletons: The model describes the overall structure of a computation, the programmer plugs in the low level problem-specific code that ..."

Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach

by Chanchal K. Roy, James R. Cordy, Rainer Koschke - SCIENCE OF COMPUTER PROGRAMMING, 2009
"... Over the last decade many techniques and tools for software clone detection have been proposed. In this paper, we provide a qualitative comparison and evaluation of the current state-of-the-art in clone detection techniques and tools, and organize the large amount of information into a coherent conc ..."
Cited by 144 (32 self)
"... might use the results of this study to choose the most appropriate clone detection tool or technique in the context of a particular set of goals and constraints. The primary contributions of this paper are: (1) a schema for classifying clone detection techniques and tools and a classification of current ..."

Accurate Spectral Clustering for Community Detection in MapReduce

by Serafeim Tsironis, Mauro Sozio, Telecom Paristech, Michalis Vazirgiannis
"... Spectral clustering has become one of the most popular clustering algorithms and it is currently being used in a wide range of applications. Unfortunately, the running time of spectral clustering algorithms might be cubic on the size of the input dataset, which makes it prohibitive to use this approach on very large datasets. In recent years, several efforts have been made to cope with these scalability issues, however, a satisfactory solution is still missing. In this work, we investigate a variant of the spectral clustering which can be efficiently parallelized in MapReduce and we study its ..."
Cited by 1 (0 self)

Using slicing to identify duplication in source code

by Raghavan Komondoor, Susan Horwitz - IN PROCEEDINGS OF THE 8TH INTERNATIONAL SYMPOSIUM ON STATIC ANALYSIS, 2001
"... Programs often have a lot of duplicated code, which makes both understanding and maintenance more difficult. This problem can be alleviated by detecting duplicated code, extracting it into a separate new procedure, and replacing all the clones (the instances of the duplicated code) by calls to the ..."
Cited by 150 (4 self)

Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA

by Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi, Sam S. Stone, et al. - PPOPP'08, 2008
"... GPUs have recently attracted the attention of many application developers as commodity data-parallel coprocessors. The newest generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of tradition ..."
Cited by 215 (11 self)
"... to the same or contiguous memory locations and apply classical optimizations to reduce the number of executed operations. We apply these strategies across a variety of applications and domains and achieve between a 10.5X to 457X speedup in kernel codes and between 1.16X to 431X total application speedup."

A Survey on Software Clone Detection Research

by Chanchal Kumar Roy, James R. Cordy - SCHOOL OF COMPUTING TR 2007-541, QUEEN’S UNIVERSITY, 2007
"... Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5% to 20% of a software system can contain duplicated code, which is basically the results of copying existin ..."
Cited by 131 (17 self)
"... In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing clone taxonomies, detection approaches ..."

Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

by Michael Driscoll, Evangelos Georganas, Penporn Koanantakool
"... Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via MapReduce. Our work enables the development of specializers for large, data-parallel applications whose workflows can be cast as MapReduce operations. We present an implementation that targets Hadoop and we ..."
Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University