Results 1 - 10 of 7,461
Data-dependent Hashing Based on p-Stable Distribution
, 2014
"... The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this propert ..."
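The Euclidean distance preserving property rests on 2-stability: if a has i.i.d. standard Gaussian entries, then a·(x - y) is Gaussian with mean 0 and variance ||x - y||², so estimating that variance recovers the distance. The Python below is a minimal sketch of this textbook property only. It is not the paper's data-dependent hashing method, and the names `projected_distance_estimate`, `num_projections`, and `seed` are hypothetical.

```python
import math
import random

# Hypothetical helper name; illustrative sketch of 2-stability only.
def projected_distance_estimate(x, y, num_projections=2000, seed=0):
    """Estimate ||x - y||_2 from random Gaussian projections.

    For a with i.i.d. N(0, 1) entries, a.(x - y) ~ N(0, ||x - y||_2^2),
    so the mean of the squared projections estimates the squared distance.
    """
    rng = random.Random(seed)
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for _ in range(num_projections):
        a = [rng.gauss(0.0, 1.0) for _ in diff]
        proj = sum(ai * di for ai, di in zip(a, diff))
        total += proj * proj  # mean is known to be 0, so E[proj^2] is the variance
    return math.sqrt(total / num_projections)
```

For example, `projected_distance_estimate([0, 0, 0], [3, 4, 0])` should land close to 5.0; more projections shrink the spread of the estimate.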
Data-dependent Hashing Based on p-Stable Distribution
"... The p-stable distribution is traditionally used for data-independent hashing. In this paper, we describe how to perform data-dependent hashing based on p-stable distribution. We commence by formulating the Euclidean distance preserving property in terms of variance estimation. Based on this ..."
Locality-sensitive hashing scheme based on p-stable distributions
- In SCG ’04: Proceedings of the twentieth annual symposium on Computational geometry
, 2004
"... We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the l_p norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the l_2 norm. It also yields the first known provably efficient approximate ..."
Cited by 521 (8 self)
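The scheme in this entry hashes a vector v with h(v) = floor((a·v + b) / w), where a has entries drawn from a p-stable distribution (Gaussian for the l_2 norm), b is uniform on [0, w), and w is the bucket width. A minimal sketch, assuming the Gaussian (l_2) case; `make_pstable_hash`, `collision_rate`, and the default w = 4.0 are hypothetical illustrative choices, not values from the paper.

```python
import math
import random

# Hypothetical helper names; sketch of the l_2 (Gaussian) case only.
def make_pstable_hash(dim, w, rng):
    """Return one hash h(v) = floor((a.v + b) / w) with Gaussian a and b ~ U[0, w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)

    return h

def collision_rate(p, q, w=4.0, trials=2000, seed=1):
    """Fraction of independently drawn hash functions on which p and q collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_pstable_hash(len(p), w, rng)
        hits += h(p) == h(q)  # bool counts as 0 or 1
    return hits / trials
```

Nearby points such as (0, 0) and (0.5, 0) collide on most draws while distant ones collide rarely; a full index would concatenate several such functions per table and use several tables.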
Label Propagation Hashing Based on p-Stable Distribution and Coordinate Descent
"... Hashing is a useful tool for content-based image retrieval on large-scale databases. This paper presents an unsupervised data-dependent hashing method which learns similarity-preserving binary codes. It uses a p-stable distribution and the coordinate descent method to achieve a good approximate solution ..."
Cited by 2 (2 self)
Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web
- In Proc. 29th ACM Symposium on Theory of Computing (STOC)
, 1997
"... We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and ..."
Cited by 699 (10 self)
of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good
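A consistent hash function can be sketched as a ring: caches and keys are hashed to points on a circle, and each key is served by the first cache point encountered clockwise. The Python below is a minimal illustration, assuming SHA-256 as the point hash and 100 virtual points per cache (`replicas`); the class and method names are hypothetical, not taken from the paper.

```python
import bisect
import hashlib

# Hypothetical names; minimal ring-based sketch, not the paper's exact construction.
def _point(key):
    """Map a string to a point on the ring [0, 2**64) using SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Each cache owns several virtual points; a key goes to the first point clockwise."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> cache name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def remove(self, node):
        for i in range(self.replicas):
            p = _point(f"{node}#{i}")
            del self._owner[p]
            self._points.remove(p)

    def lookup(self, key):
        # First ring point strictly after the key's point, wrapping around.
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

Adding a cache moves only the keys whose nearest clockwise point now belongs to the new cache; every other assignment stays put, which is the "changes minimally" behavior described in the excerpt.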
Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
, 1996
"... Recently the notion of self-similarity has been shown to apply to wide-area and local-area network traffic. In this paper we examine the mechanisms that give rise to the self-similarity of network traffic. We present a hypothesized explanation for the possible self-similarity of traffic by using a p ..."
Cited by 1416 (26 self)
... we show evidence that WWW traffic exhibits behavior that is consistent with self-similar traffic models. Then we show that the self-similarity in such traffic can be explained based on the underlying distributions of WWW document sizes, the effects of caching and user preference in file transfer ...
Using Bayesian networks to analyze expression data
- Journal of Computational Biology
, 2000
"... DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biologica ..."
Cited by 1088 (17 self)
biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model
Probabilistic Counting Algorithms for Data Base Applications
, 1985
"... This paper introduces a class of probabilistic counting algorithms with which one can estimate the number of distinct elements in a large collection of data (typically a large file stored on disk) in a single pass using only a small additional storage (typically less than a hundred binary words) a ..."
Cited by 444 (6 self)
... and only a few operations per element scanned. The algorithms are based on statistical observations made on bits of hashed values of records. They are by construction totally insensitive to the replicative structure of elements in the file; they can be used in the context of distributed systems ...
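The counting idea can be illustrated with a Flajolet-Martin style sketch using stochastic averaging (PCSA): each hashed element sets, in one of m bitmaps, the bit at the position of the lowest 1-bit of its hash, and the positions of the lowest unset bits yield the estimate. This is a minimal sketch, assuming SHA-256 of `repr(item)` as the hash and m = 64 bitmaps; 0.77351 is the correction constant from Flajolet and Martin's analysis, and the function names are hypothetical.

```python
import hashlib

PHI = 0.77351  # correction constant from Flajolet and Martin's analysis

# Hypothetical helper names; illustrative PCSA sketch.
def _hash64(item):
    return int.from_bytes(hashlib.sha256(repr(item).encode()).digest()[:8], "big")

def pcsa_estimate(items, num_maps=64, width=32):
    """Estimate the number of distinct items using num_maps bitmaps (stochastic averaging)."""
    bitmaps = [0] * num_maps
    for item in items:
        h = _hash64(item)
        bucket, rest = h % num_maps, h // num_maps
        # rho = index of the least significant 1-bit of the remaining hash bits
        rho = (rest & -rest).bit_length() - 1 if rest else width - 1
        bitmaps[bucket] |= 1 << min(rho, width - 1)
    total_r = 0
    for bm in bitmaps:
        r = 0
        while bm & (1 << r):  # R = position of the lowest 0 bit in this bitmap
            r += 1
        total_r += r
    return num_maps / PHI * 2 ** (total_r / num_maps)
```

Because each element only ORs a bit into a bitmap, feeding the same elements again leaves the estimate unchanged, which is the insensitivity to replication noted in the excerpt. With m bitmaps the standard error is roughly 0.78/sqrt(m).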
Similarity estimation techniques from rounding algorithms
- In Proc. of 34th STOC
, 2002
"... A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Pr_{h∈F}[h(x) = h(y)] = sim(x, y), where sim(x, y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads ..."
Cited by 449 (6 self)
... Our hash functions map distributions to points in the metric space such that, for distributions P and Q, ...
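Random hyperplane rounding is one well-known hash family of this kind for vectors: h_r(u) = 1 if r·u ≥ 0 (else 0) for a random Gaussian vector r, and Pr[h_r(u) = h_r(v)] = 1 - θ(u, v)/π, where θ(u, v) is the angle between u and v. The sketch below estimates that similarity from bit agreements; the function names and defaults are hypothetical.

```python
import math
import random

# Hypothetical helper names; sketch of random-hyperplane hashing.
def hyperplane_signature(v, planes):
    """One bit per random hyperplane: 1 if r.v >= 0, else 0."""
    return [1 if sum(ri * vi for ri, vi in zip(r, v)) >= 0 else 0 for r in planes]

def estimate_similarity(u, v, num_bits=4000, seed=2):
    """Fraction of agreeing bits; its expectation is 1 - theta(u, v) / pi."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in u] for _ in range(num_bits)]
    su, sv = hyperplane_signature(u, planes), hyperplane_signature(v, planes)
    return sum(a == b for a, b in zip(su, sv)) / num_bits

def exact_similarity(u, v):
    """The target sim(u, v) = 1 - theta(u, v) / pi, computed directly."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - math.acos(max(-1.0, min(1.0, dot / norm))) / math.pi
```

For u = (1, 0) and v = (1, 1) the angle is π/4, so the exact similarity is 0.75 and the estimate should be close to it.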
Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments
- Statistica Sinica
, 2002
"... DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. The present paper describes statistical methods for the identification of differentially expressed genes in replicated cDNA microarray experiments. A ..."
Cited by 438 (12 self)
into account the dependence structure between the gene expression levels. No specific parametric form is assumed for the distribution of the test statistics and a permutation procedure is used to estimate adjusted p-values. Several data displays are suggested for the visual identification of differentially
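As a rough illustration of permutation-based adjustment that respects dependence between genes, the sketch below computes single-step maxT adjusted p-values: sample labels are shuffled, the largest absolute Welch t-statistic across all genes is recorded for each shuffle, and each gene's adjusted p-value is the fraction of shuffles whose maximum reaches that gene's observed statistic. The Welch t-statistic, the single-step (rather than step-down) form, and all names are assumptions for illustration; the paper's exact procedure may differ.

```python
import math
import random
import statistics

# Hypothetical helper names; single-step maxT sketch under stated assumptions.
def welch_t(a, b):
    """Two-sample Welch t-statistic (group a minus group b)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def maxt_adjusted_pvalues(data, labels, num_perm=1000, seed=3):
    """data: one list of expression values per gene; labels: 0/1 per sample."""
    def abs_stats(lab):
        out = []
        for row in data:
            g0 = [x for x, l in zip(row, lab) if l == 0]
            g1 = [x for x, l in zip(row, lab) if l == 1]
            out.append(abs(welch_t(g1, g0)))
        return out

    observed = abs_stats(labels)
    rng = random.Random(seed)
    perm = list(labels)
    counts = [0] * len(data)
    for _ in range(num_perm):
        rng.shuffle(perm)  # random relabelling of the samples
        max_stat = max(abs_stats(perm))  # maximum over genes keeps their joint structure
        for j, t in enumerate(observed):
            if max_stat >= t:
                counts[j] += 1
    return [c / num_perm for c in counts]
```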