• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations

Tools

Sorted by:
Try your query at:
Semantic Scholar Scholar Academic
Google Bing DBLP
Results 1 - 10 of 201,264
Next 10 →

MapReduce: Simplified data processing on large clusters.

by Jeffrey Dean , Sanjay Ghemawat - In Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI-04), , 2004
"... Abstract MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of ..."
Abstract - Cited by 3439 (3 self) - Add to MetaCart
Abstract MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details

Pig Latin: A Not-So-Foreign Language for Data Processing

by Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Abstract - Cited by 607 (13 self) - Add to MetaCart
There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively

Models and issues in data stream systems

by Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom - IN PODS , 2002
"... In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work releva ..."
Abstract - Cited by 786 (19 self) - Add to MetaCart
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

by Edward Ashford Lee, David G. Messerschmitt - IEEE TRANSACTIONS ON COMPUTERS , 1987
"... Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of pro ..."
Abstract - Cited by 598 (37 self) - Add to MetaCart
Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake

Verbal reports as data

by K. Anders Ericsson, Herbert A. Simon - Psychological Review , 1980
"... The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mech-anisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc.). W ..."
Abstract - Cited by 513 (3 self) - Add to MetaCart
The central proposal of this article is that verbal reports are data. Accounting for verbal reports, as for other kinds of data, requires explication of the mech-anisms by which the reports are generated, and the ways in which they are sensitive to experimental factors (instructions, tasks, etc

Gaussian processes for machine learning

by Carl Edward Rasmussen , 2003
"... We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters us ..."
Abstract - Cited by 720 (2 self) - Add to MetaCart
We give a basic introduction to Gaussian Process regression models. We focus on understanding the role of the stochastic process and how it is used to define a distribution over functions. We present the simple equations for incorporating training data and examine how to learn the hyperparameters

Hierarchical Dirichlet processes.

by Yee Whye Teh , Michael I Jordan , Matthew J Beal , David M Blei - Journal of the American Statistical Association, , 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract - Cited by 942 (78 self) - Add to MetaCart
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data

Data Streams: Algorithms and Applications

by S. Muthukrishnan , 2005
"... In the data stream scenario, input arrives very rapidly and there is limited memory to store the input. Algorithms have to work with one or few passes over the data, space less than linear in the input size or time significantly less than the input size. In the past few years, a new theory has emerg ..."
Abstract - Cited by 533 (22 self) - Add to MetaCart
analysis, mining text message streams and processing massive data sets in general. Researchers in Theoretical Computer Science, Databases, IP Networking and Computer Systems are working on the data stream challenges. This article is an overview and survey of data stream algorithmics and is an updated

The Protein Data Bank

by Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne - Nucleic Acids Res , 2000
"... The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the futur ..."
Abstract - Cited by 1387 (24 self) - Add to MetaCart
deposited. In the 1980s the number of deposited structures began to increase dramatically. This was due to the improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance (NMR) methods, and changes in the community views about data

Synchronous data flow

by Edward A. Lee, et al. , 1987
"... Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case ..."
Abstract - Cited by 622 (45 self) - Add to MetaCart
Data flow is a natural paradigm for describing DSP applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. Synchronous data flow (SDF) is a special case
Next 10 →
Results 1 - 10 of 201,264
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University