CiteSeerX

Results 1 - 10 of 150,531

CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming

by Xinyan Zhang, Jiangchuan Liu, Bo Li, Tak-shing Peter Yum - in IEEE INFOCOM, 2005
"... This paper presents DONet, a Data-driven Overlay Network for live media streaming. The core operations in DONet are very simple: every node periodically exchanges data availability information with a set of partners, and retrieves unavailable data from one or more partners, or supplies available dat ..."
Abstract - Cited by 475 (42 self)
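The data-driven exchange described above, where partners advertise which data they hold and a node pulls what it lacks, can be sketched in Python. This is a minimal, single-round sketch under stated assumptions: segments are integer IDs inside a sliding window, each partner's availability is a boolean buffer map, and missing segments are assigned rarest-first with ties broken by the least-loaded supplier. `Node`, `buffer_map`, and `schedule` are hypothetical names, and the scheduling heuristic in the paper is more involved (it also considers factors such as partner bandwidth), which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A peer holding some segments of the live stream (hypothetical model)."""
    name: str
    have: set[int] = field(default_factory=set)  # IDs of buffered segments

    def buffer_map(self, window: range) -> list[bool]:
        """Availability bitmap for the segments in the current sliding window."""
        return [seg in self.have for seg in window]


def schedule(node: Node, partners: list[Node], window: range) -> dict[int, str]:
    """Map each segment the node lacks to one partner that advertises it."""
    # 1. Collect the partners' buffer maps (the periodic exchange step).
    maps = {p.name: p.buffer_map(window) for p in partners}
    # 2. For every missing segment, list the partners that can supply it.
    suppliers: dict[int, list[str]] = {}
    for i, seg in enumerate(window):
        if seg in node.have:
            continue
        holders = [name for name, bits in maps.items() if bits[i]]
        if holders:
            suppliers[seg] = holders
    # 3. Schedule the rarest segments first; pick the least-loaded supplier.
    load = {p.name: 0 for p in partners}
    plan: dict[int, str] = {}
    for seg in sorted(suppliers, key=lambda s: (len(suppliers[s]), s)):
        chosen = min(suppliers[seg], key=lambda name: (load[name], name))
        plan[seg] = chosen
        load[chosen] += 1
    return plan


me = Node("me", {0, 1})
a = Node("a", {0, 1, 2, 3})
b = Node("b", {2, 4})
plan = schedule(me, [a, b], range(5))
print(plan)  # {3: 'a', 4: 'b', 2: 'a'}
```

Segments 3 and 4 each have a single supplier, so they are placed first; segment 2 can come from either partner and goes to `a` on the tie-break.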

A block-sorting lossless data compression algorithm

by M. Burrows, D. J. Wheeler, 1994
"... We describe a block-sorting, lossless data compression algorithm, and our implementation of that algorithm. We compare the performance of our implementation with widely available data compressors running on the same hardware. The algorithm works by applying a reversible transformation to a block o ..."
Abstract - Cited by 809 (5 self)
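The reversible transformation mentioned in this abstract is what is now called the Burrows-Wheeler transform. Below is a naive Python sketch, assuming the block fits in memory and sorting full rotations (O(n² log n)) for clarity rather than the faster suffix sorting a practical compressor needs. The forward transform returns the last column of the sorted rotations plus the row index of the original block; the inverse rebuilds the block from those two values. In the full compressor this output is coded further (the paper uses move-to-front followed by an entropy coder), a stage this sketch omits.

```python
def bwt(block: str) -> tuple[str, int]:
    """Forward transform: last column of the sorted rotations plus the original's row."""
    if not block:
        return "", 0
    n = len(block)
    rotations = sorted(block[i:] + block[:i] for i in range(n))
    last = "".join(r[-1] for r in rotations)
    return last, rotations.index(block)


def inverse_bwt(last: str, index: int) -> str:
    """Rebuild the block from the last column L and the row index I."""
    # Stable-sorting the positions of L by character yields the first column F;
    # order[j] is the row whose last character is F[j].
    order = sorted(range(len(last)), key=lambda i: (last[i], i))
    out = []
    row = index
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return "".join(out)


encoded = bwt("banana")
print(encoded)                # ('nnbaaa', 3)
print(inverse_bwt(*encoded))  # banana
```

Note how the transform groups equal characters (`nn`, `aaa`), which is what makes the later coding stages effective.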

DBpedia: A Nucleus for a Web of Open Data

by Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Zachary Ives, et al. - Proc. 6th Int’l Semantic Web Conf., 2007
"... DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extractio ..."
Abstract - Cited by 651 (37 self)

Data Mining: An Overview from Database Perspective

by Ming-Syan Chen, Jiawei Han, Philip S. Yu - IEEE Transactions on Knowledge and Data Engineering, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract - Cited by 532 (26 self)
... the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques ...

Optimizing Search Engines using Clickthrough Data

by Thorsten Joachims, 2002
"... This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches ..."
Abstract - Cited by 1314 (23 self)
... log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a ...
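Clickthrough data is treated here as relative feedback: a click says a result looked better than the results the user skipped, not that it is absolutely relevant. A minimal sketch of one extraction heuristic of the kind the paper describes, assuming that a clicked result is preferred over every unclicked result ranked above it and that ranks are 0-based; `click_skip_above` is a hypothetical name. In the paper, pairwise preferences like these become training constraints for a ranking SVM, which is not shown.

```python
def click_skip_above(ranking: list[str], clicked: set[int]) -> list[tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs.

    ranking: documents in the order the engine presented them.
    clicked: 0-based ranks the user clicked.
    A clicked document is taken to be preferred over every unclicked
    document ranked above it, which the user presumably saw and skipped.
    """
    prefs = []
    for i in sorted(clicked):
        for j in range(i):
            if j not in clicked:
                prefs.append((ranking[i], ranking[j]))
    return prefs


ranking = ["d1", "d2", "d3", "d4", "d5"]
print(click_skip_above(ranking, {0, 2, 4}))
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]
```

No pair is produced for the top click `d1`, since nothing above it was skipped.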

Combining labeled and unlabeled data with co-training

by Avrim Blum, Tom Mitchell, 1998
"... We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the ta ..."
Abstract - Cited by 1633 (28 self)
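The two-view setting described above can be sketched as a co-training loop in which each view's classifier labels unlabeled examples that then also train the other view. The assumptions here are illustrative only: each view is a numeric vector, each view's classifier is a simple nearest-centroid model (the paper's experiments used naive Bayes), and in each round each view adds the single unlabeled example it is most confident about to a shared labeled set. `centroids`, `predict`, and `co_train` are hypothetical names.

```python
import math


def centroids(xs, ys):
    """Per-class mean vectors (a nearest-centroid classifier for one view)."""
    out = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        out[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return out


def predict(cents, x):
    """Return (label, margin); margin is the gap between the two closest centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def co_train(labeled, unlabeled, rounds):
    """labeled: list of ((view1, view2), y); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for v in (0, 1):  # each view takes a turn
            if not pool:
                return labeled
            cents = centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
            # This view labels the pool example it is most confident about;
            # the new label then also trains the other view's classifier.
            k = max(range(len(pool)), key=lambda i: predict(cents, pool[i][v])[1])
            x = pool.pop(k)
            labeled.append((x, predict(cents, x[v])[0]))
    return labeled


labeled = [(((0.0,), (0.0,)), 0), (((10.0,), (10.0,)), 1)]
unlabeled = [((1.0,), (2.0,)), ((9.0,), (8.0,)), ((3.0,), (4.0,))]
result = co_train(labeled, unlabeled, rounds=2)
print([y for _, y in result])  # [0, 1, 0, 1, 0]
```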

BIRCH: an efficient data clustering method for very large databases

by Tian Zhang, Raghu Ramakrishnan, Miron Livny - In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD), 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of ..."
Abstract - Cited by 576 (2 self)
... multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH ...
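BIRCH's single-scan behavior rests on summarizing each subcluster by a compact clustering feature CF = (N, LS, SS): the point count, the linear sum of the points, and the sum of their squared norms. Because CFs add component-wise, merging subclusters is cheap, and the centroid and radius follow from the CF alone. The sketch below assumes Euclidean data and replaces BIRCH's height-balanced CF tree with a flat list of entries (no branching factor, tree rebuilding, or refinement phases); each point is absorbed into the nearest entry when the merged radius stays within a threshold. `CF` and `single_scan` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature: point count, linear sum, and sum of squared norms."""
    n: int
    ls: list[float]
    ss: float

    @classmethod
    def of(cls, x):
        return cls(1, list(x), sum(v * v for v in x))

    def __add__(self, other):
        # CF additivity: merging two subclusters just adds their components.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # R^2 = SS/N - ||LS/N||^2, the mean squared distance to the centroid.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


def single_scan(points, threshold):
    """Absorb each point into the nearest entry if the merged radius stays within threshold."""
    entries = []
    for p in points:
        cf = CF.of(p)
        if entries:
            i = min(range(len(entries)), key=lambda k: math.dist(entries[k].centroid(), p))
            merged = entries[i] + cf
            if merged.radius() <= threshold:
                entries[i] = merged
                continue
        entries.append(cf)
    return entries


entries = single_scan([(0, 0), (0, 1), (10, 10), (10, 11)], threshold=1.0)
print([(e.n, e.centroid()) for e in entries])
# [(2, [0.0, 0.5]), (2, [10.0, 10.5])]
```

Only the CF triples are kept, so memory depends on the number of entries rather than the number of points.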

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

by Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly - In EuroSys, 2007
"... Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication “channels” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Abstract - Cited by 762 (27 self)
... simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources. Dryad is designed to scale from powerful multi-core sin ...
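The execution model described in this abstract, with vertices connected by channels into a dataflow graph, can be approximated in a single process. This is a minimal sketch under loud assumptions: vertices are plain Python functions, channels are in-memory values passed from upstream outputs to downstream arguments, and a thread pool stands in for the cluster. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to start each vertex as soon as all of its inputs are ready. `run_graph` is a hypothetical name; Dryad itself distributes vertices across machines and uses channels such as files, TCP pipes, and shared-memory FIFOs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(vertices, inputs_of):
    """Run a DAG of vertices; each vertex starts once all of its input channels are ready.

    vertices: name -> callable taking the outputs of its upstream vertices.
    inputs_of: name -> list of upstream vertex names (the channels), in argument order.
    """
    ts = TopologicalSorter({v: inputs_of.get(v, []) for v in vertices})
    ts.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    results, running = {}, {}
    with ThreadPoolExecutor() as pool:
        while ts.is_active():
            for v in ts.get_ready():
                args = [results[u] for u in inputs_of.get(v, [])]
                running[pool.submit(vertices[v], *args)] = v
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                v = running.pop(fut)
                results[v] = fut.result()  # re-raises if the vertex failed
                ts.done(v)
    return results


vertices = {
    "read_a": lambda: [1, 2, 3],
    "read_b": lambda: [4, 5],
    "sum_a": sum,
    "sum_b": sum,
    "total": lambda a, b: a + b,
}
inputs_of = {"sum_a": ["read_a"], "sum_b": ["read_b"], "total": ["sum_a", "sum_b"]}
print(run_graph(vertices, inputs_of)["total"])  # 15
```

The two `read`/`sum` branches have no dependency on each other, so the pool may run them concurrently; `total` waits for both.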

A comparison of Bayesian methods for haplotype reconstruction from population genotype data

by Matthew Stephens, Peter Donnelly - Am J Hum Genet, 2003
"... In this report, we compare and contrast three previously published Bayesian methods for inferring haplotypes from genotype data in a population sample. We review the methods, emphasizing the differences between them in terms of both the models ("priors") they use and the computational str ..."
Abstract - Cited by 557 (7 self)
... operates through the transmission of chromosomal segments. Experimental methods for haplotype determination exist, but they are currently time-consuming and expensive. Statistical methods for inferring haplotypes are therefore of considerable interest. In some studies, data may be available on related ...

Pig Latin: A Not-So-Foreign Language for Data Processing

by Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
"... There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively e ..."
Abstract - Cited by 607 (13 self)
... for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig, that can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and available for general use.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University