Results 1  10
of
495
Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. Technical Report 2003/235, Cryptology ePrint archive, http://eprint.iacr.org, 2006. Previous version appeared at EUROCRYPT 2004
 34 [DRS07] [DS05] [EHMS00] [FJ01] Yevgeniy Dodis, Leonid Reyzin, and Adam
, 2004
"... We provide formal definitions and efficient secure techniques for • turning noisy information into keys usable for any cryptographic application, and, in particular, • reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying mater ..."
Abstract

Cited by 532 (38 self)
 Add to MetaCart
We provide formal definitions and efficient secure techniques for • turning noisy information into keys usable for any cryptographic application, and, in particular, • reliably and securely authenticating biometric data. Our techniques apply not just to biometric information, but to any keying material that, unlike traditional cryptographic keys, is (1) not reproducible precisely and (2) not distributed uniformly. We propose two primitives: a fuzzy extractor reliably extracts nearly uniform randomness R from its input; the extraction is errortolerant in the sense that R will be the same even if the input changes, as long as it remains reasonably close to the original. Thus, R can be used as a key in a cryptographic application. A secure sketch produces public information about its input w that does not reveal w, and yet allows exact recovery of w given another value that is close to w. Thus, it can be used to reliably reproduce errorprone biometric inputs without incurring the security risk inherent in storing them. We define the primitives to be both formally secure and versatile, generalizing much prior work. In addition, we provide nearly optimal constructions of both primitives for various measures of “closeness” of input data, such as Hamming distance, edit distance, and set difference.
Nearoptimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
"... In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The ..."
Abstract

Cited by 443 (7 self)
 Add to MetaCart
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Similarity estimation techniques from rounding algorithms
 In Proc. of 34th STOC
, 2002
"... A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Prh∈F[h(x) = h(y)] = sim(x,y), where sim(x,y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads ..."
Abstract

Cited by 436 (6 self)
 Add to MetaCart
(Show Context)
A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, Prh∈F[h(x) = h(y)] = sim(x,y), where sim(x,y) ∈ [0, 1] is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Minwise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure sim(A, B) = A∩B A∪B . We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for: 1. A collection of vectors with the distance between ⃗u and ⃗v measured by θ(⃗u,⃗v)/π, where θ(⃗u,⃗v) is the angle between ⃗u and ⃗v. This yields a sketching scheme for estimating the cosine similarity measure between two vectors, as well as a simple alternative to minwise independent permutations for estimating set similarity. 2. A collection of distributions on n points in a metric space, with distance between distributions measured by the Earth Mover Distance (EMD), (a popular distance measure in graphics and vision). Our hash functions map distributions to points in the metric space such that, for distributions P and Q,
Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh
, 2003
"... In recent years, overlay networks have become an effective alternative to IP multicast for efficient point to multipoint communication across the Internet. Typically, nodes selforganize with the goal of forming an efficient overlay tree, one that meets performance targets without placing undue burd ..."
Abstract

Cited by 424 (23 self)
 Add to MetaCart
In recent years, overlay networks have become an effective alternative to IP multicast for efficient point to multipoint communication across the Internet. Typically, nodes selforganize with the goal of forming an efficient overlay tree, one that meets performance targets without placing undue burden on the underlying network. In this paper, we target highbandwidth data distribution from a single source to a large number of receivers. Applications include largefile transfers and realtime multimedia streaming. For these applications, we argue that an overlay mesh, rather than a tree, can deliver fundamentally higher bandwidth and reliability relative to typical tree structures. This paper presents Bullet, a scalable and distributed algorithm that enables nodes spread across the Internet to selforganize into a high bandwidth overlay mesh. We construct Bullet around the insight that data should be distributed in a disjoint manner to strategic points in the network. Individual Bullet receivers are then responsible for locating and retrieving the data from multiple points in parallel. Key contributions of this work include: i) an algorithm that sends data to di#erent points in the overlay such that any data object is equally likely to appear at any node, ii) a scalable and decentralized algorithm that allows nodes to locate and recover missing data items, and iii) a complete implementation and evaluation of Bullet running across the Internet and in a largescale emulation environment reveals up to a factor two bandwidth improvements under a variety of circumstances. In addition, we find that, relative to treebased solutions, Bullet reduces the need to perform expensive bandwidth probing.
A Lowbandwidth Network File System
, 2001
"... This paper presents LBFS, a network file system designed for low bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client ..."
Abstract

Cited by 393 (3 self)
 Add to MetaCart
(Show Context)
This paper presents LBFS, a network file system designed for low bandwidth networks. LBFS exploits similarities between files or versions of the same file to save bandwidth. It avoids sending data over the network when the same data can already be found in the server's file system or the client's cache. Using this technique, LBFS achieves up to two orders of magnitude reduction in bandwidth utilization on common workloads, compared to traditional network file systems
From frequency to meaning : Vector space models of semantics
 Journal of Artificial Intelligence Research
, 2010
"... Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are begi ..."
Abstract

Cited by 322 (3 self)
 Add to MetaCart
(Show Context)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field. 1.
Minwise Independent Permutations
 Journal of Computer and System Sciences
, 1998
"... We define and study the notion of minwise independent families of permutations. We say that F ⊆ Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1 X . In other words we require that all the elements of any fixed set ..."
Abstract

Cited by 276 (14 self)
 Add to MetaCart
(Show Context)
We define and study the notion of minwise independent families of permutations. We say that F ⊆ Sn is minwise independent if for any set X ⊆ [n] and any x ∈ X, when π is chosen at random in F we have Pr(min{π(X)} = π(x)) = 1 X . In other words we require that all the elements of any fixed set X have an equal chance to become the minimum element of the image of X under π. Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter nearduplicate documents. However, in the course of our investigation we have discovered interesting and challenging theoretical questions related to this concept – we present the solutions to some of them and we list the rest as open problems.
Google news personalization: scalable online collaborative filtering
 in WWW, 2007
"... Several approaches to collaborative filtering have been studied but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating p ..."
Abstract

Cited by 266 (0 self)
 Add to MetaCart
(Show Context)
Several approaches to collaborative filtering have been studied but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
Winnowing: Local Algorithms for Document Fingerprinting
 Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data 2003
, 2003
"... Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algor ..."
Abstract

Cited by 257 (6 self)
 Add to MetaCart
(Show Context)
Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing’s performance is within 33 % of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a widelyused plagiarism detection service. 1.
Informed content delivery across adaptive overlay networks
 In Proceedings of ACM SIGCOMM
, 2002
"... Abstract—Overlay networks have emerged as a powerful and highly flexible method for delivering content. We study how to optimize throughput of large transfers across richly connected, adaptive overlay networks, focusing on the potential of collaborative transfers between peers to supplement ongoing ..."
Abstract

Cited by 251 (9 self)
 Add to MetaCart
(Show Context)
Abstract—Overlay networks have emerged as a powerful and highly flexible method for delivering content. We study how to optimize throughput of large transfers across richly connected, adaptive overlay networks, focusing on the potential of collaborative transfers between peers to supplement ongoing downloads. First, we make the case for an erasureresilient encoding of the content. Using the digital fountain encoding approach, end hosts can efficiently reconstruct the original content of size from a subset of any symbols drawn from a large universe of encoding symbols. Such an approach affords reliability and a substantial degree of applicationlevel flexibility, as it seamlessly accommodates connection migration and parallel transfers while providing resilience to packet loss. However, since the sets of encoding symbols acquired by peers during downloads may overlap substantially, care must be taken to enable them to collaborate effectively. Our main contribution is a collection of useful algorithmic tools for efficient summarization and approximate reconciliation of sets of symbols between pairs of collaborating peers, all of which keep message complexity and computation to a minimum. Through simulations and experiments on a prototype implementation, we demonstrate the performance benefits of our informed contentdelivery mechanisms and how they complement existing overlay network architectures. Index Terms—Bloom filter, content delivery, digital fountain, erasure code, minwise sketch, overlay, peertopeer, reconciliation. I.