Results 1 - 10 of 926,054
Pig Latin: A Not-So-Foreign Language for Data Processing
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Cited by 584 (12 self)
expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware
SkewTune: Mitigating Skew in MapReduce Applications
2012
"... We present an automatic skew mitigation approach for user-defined MapReduce programs and present SkewTune, a system that implements this approach as a drop-in replacement for an existing MapReduce implementation. There are three key challenges: (a) require no extra input from the user yet work for al ..."
Cited by 45 (6 self)
for all MapReduce applications, (b) be completely transparent, and (c) impose minimal overhead if there is no skew. The SkewTune approach addresses these challenges and works as follows: When a node in the cluster becomes idle, SkewTune identifies the task with the greatest expected remaining processing
Hive - A Warehousing Solution Over a Map-Reduce Framework
In VLDB '09: Proceedings of the VLDB Endowment, 2009
"... The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and pr ..."
Cited by 247 (1 self)
and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop
Efficient similarity search in sequence databases
1994
"... We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Anot ..."
Cited by 505 (21 self)
An Efficient Boosting Algorithm for Combining Preferences
1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Cited by 707 (18 self)
Implementing data cubes efficiently
In SIGMOD, 1996
"... Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total ..."
Cited by 545 (1 self)
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case of the hypercube lattice and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
Efficient semantic matching
2004
"... We think of Match as an operator which takes two graph-like structures and produces a mapping between semantically related nodes. We concentrate on classifications with tree structures. In semantic matching, correspondences are discovered by translating the natural language labels of nodes into prop ..."
Cited by 817 (67 self)
On the impossibility of informationally efficient markets
American Economic Review, 1980
Approximate Signal Processing
1997
"... It is increasingly important to structure signal processing algorithms and systems to allow for trading off between the accuracy of results and the utilization of resources in their implementation. In any particular context, there are typically a variety of heuristic approaches to managing these tra ..."
Cited by 516 (2 self)
Efficient implementation of a BDD package
In Proceedings of the 27th ACM/IEEE Conference on Design Automation, 1991
"... Efficient manipulation of Boolean functions is an important component of many computer-aided design tasks. This paper describes a package for manipulating Boolean functions based on the reduced, ordered, binary decision diagram (ROBDD) representation. The package is based on an efficient implementat ..."
Cited by 500 (9 self)