Results 1 - 10
of
34
Approximation Algorithms for Projective Clustering
- Proceedings of the ACM SIGMOD International Conference on Management of data, Philadelphia
, 2000
"... We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyper-strips (resp. hyper-cylinders) so that the maximum width of a hyper-strip (resp., the maximum diameter of a hyper-cylinder) is minimized. Let w ..."
Abstract
-
Cited by 196 (14 self)
- Add to MetaCart
We consider the following two instances of the projective clustering problem: Given a set S of n points in R d and an integer k ? 0; cover S by k hyper-strips (resp. hyper-cylinders) so that the maximum width of a hyper-strip (resp., the maximum diameter of a hyper-cylinder) is minimized. Let w be the smallest value so that S can be covered by k hyper-strips (resp. hyper-cylinders), each of width (resp. diameter) at most w : In the plane, the two problems are equivalent. It is NP-Hard to compute k planar strips of width even at most Cw ; for any constant C ? 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w that cover S. Its expected running time is O(nk 2 log 4 n) if k 2 log k n; it also works for larger values of k, but then the expected running time is O(n 2=3 k 8=3 log 4 n). We also propose another algorithm that computes a c...
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
, 1998
"... We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to ..."
Abstract
-
Cited by 173 (9 self)
- Add to MetaCart
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the L 1 norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database). Computer Science Department, Technion --- IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il y Bell Communications Research, MCC-1C365B, 445 South Street, Morristown, NJ ...
Reductions Among High Dimensional Proximity Problems
, 2000
"... We present improved running times for a wide range of approximate high dimensional proximity problems. We obtain subquadratic running time for each of these problems. These improved running times are obtained by reduction to Nearest Neighbour queries. The problems we consider in this paper are Ap ..."
Abstract
-
Cited by 26 (4 self)
- Add to MetaCart
We present improved running times for a wide range of approximate high dimensional proximity problems. We obtain subquadratic running time for each of these problems. These improved running times are obtained by reduction to Nearest Neighbour queries. The problems we consider in this paper are Approximate Diameter, Approximate Furthest Neighbours, Approximate Discrete Center, Approximate Line Center, Approximate Metric Facility Location, Approximate Bottleneck Matching, and Approximate Minimum Weight Matching. University of Southern California. Email: agoel@cs.usc.edu . y Stanford University. Email: indyk@cs.stanford.edu . z University of Iowa. Email: kvaradar@cs.uiowa.edu . 0 Problem Ref Approx. Time Comments Diameter [10] p 3 O(dn) [12] 1 + ffl O(dn log n + n 2 ) [2] 1 + ffl ~ O(n 2\GammaO(ffl 2 ) + dn) [18] 1 + ffl ~ O(n 1+1=(1+ffl=6) + dn) here 1 + ffl ~ O(n 1+1=(1+ffl) + dn) ~ O(n) (1 + ffl)-NNS queries here p 2 ~ O(dn) see Section 3 for some e...
CloudAV: N-Version Antivirus in the Network Cloud
"... Antivirus software is one of the most widely used tools for detecting and stopping malicious and unwanted files. However, the long term effectiveness of traditional hostbased antivirus is questionable. Antivirus software fails to detect many modern threats and its increasing complexity has resulted ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
Antivirus software is one of the most widely used tools for detecting and stopping malicious and unwanted files. However, the long term effectiveness of traditional hostbased antivirus is questionable. Antivirus software fails to detect many modern threats and its increasing complexity has resulted in vulnerabilities that are being exploited by malware. This paper advocates a new model for malware detection on end hosts based on providing antivirus as an in-cloud network service. This model enables identification of malicious and unwanted software by multiple, heterogeneous detection engines in parallel, a technique we term ‘N-version protection’. This approach provides several important benefits including better detection of malicious software, enhanced forensics capabilities, retrospective detection, and improved deployability and management. To explore this idea we construct and deploy a production quality in-cloud antivirus system called CloudAV. CloudAV includes a lightweight, cross-platform host agent and a network service with ten antivirus engines and two behavioral detection engines. We evaluate the performance, scalability, and efficacy of the system using data from a real-world deployment lasting more than six months and a database of 7220 malware samples covering a one year period. Using this dataset we find that CloudAV provides 35% better detection coverage against recent threats compared to a single antivirus engine and a 98 % detection rate across the full dataset. We show that the average length of time to detect new threats by an antivirus engine is 48 days and that retrospective detection can greatly minimize the impact of this delay. Finally, we relate two case studies demonstrating how the forensics capabilities of CloudAV were used by operators during the deployment. 1
Entropy based Nearest Neighbor Search in High Dimensions
, 2005
"... In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash t ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different – we use one (or a few) hash table and hash several randomly chosen points in the neighborhood of the query point showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points in the neighborhood of the query point q required depends on the entropy of the hash value h(p) of a random point p at the same distance from q at its nearest neighbor, given q and the locality preserving hash function h chosen randomly from the hash family. Precisely, we show that if the entropy I(h(p)|q, h) = M and g is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in O(nρ) time and near linear Õ(n) space where ρ = M/log(1/g). Alternatively we can build a data structure of size Õ(n1/(1−ρ) ) to answer queries in Õ(d) time. By applying this analysis to the locality preserving hash functions in [17, 21, 6] and adjusting the parameters we show that the c nearest neighbor can be computed in time Õ(nρ) and near linear space where ρ ≈ 2.06/c as c becomes large.
Low Latency Photon Mapping Using Block Hashing
- IN PROCEEDINGS OF THE CONFERENCE ON GRAPHICS HARDWARE 2002
, 2002
"... Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardw ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing
On Multi-Dimensional Hilbert Indexings
- Theory of Computing Systems
, 1998
"... Indexing schemes for grids based on space-filling curves (e.g., Hilbert indexings) find applications in numerous fields, ranging from parallel processing over data structures to image processing. Because of an increasing interest in discrete multi-dimensional spaces, indexing schemes for them hav ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Indexing schemes for grids based on space-filling curves (e.g., Hilbert indexings) find applications in numerous fields, ranging from parallel processing over data structures to image processing. Because of an increasing interest in discrete multi-dimensional spaces, indexing schemes for them have won considerable interest. Hilbert curves are the most simple and popular space-filling indexing scheme. We extend the concept of curves with Hilbert property to arbitrary dimensions and present first results concerning their structural analysis that also simplify their applicability. We define and analyze in a precise mathematical way r-dimensional Hilbert indexings for arbitrary r 2. Moreover, we generalize and simplify previous work and clarify the concept of Hilbert curves for multi-dimensional grids. As we show, Hilbert indexings can be completely described and analyzed by "generating elements of order 1", thus, in comparison with previous work, reducing their structural comp...
SAXually Explicit Images: Finding Unusual Shapes
- In proceedings of the 2006 IEEE International Conference on Data Mining. Hong Kong. Dec
, 2006
"... Among the visual features of multimedia content, shape is of particular interest because humans can often recognize objects solely on the basis of shape. Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classif ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Among the visual features of multimedia content, shape is of particular interest because humans can often recognize objects solely on the basis of shape. Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, anthropology, and medicine. While the brute force search algorithm has quadratic time complexity, we avoid this by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently. An extensive experimental evaluation demonstrates that our approach can speed up computation by three to four orders of magnitude.
Copy Detection for Intellectual Property Protection of VLSI Designs
- of VLSI design.” 36th IEEE/ACM International Conference on Computer Aided Design
, 1999
"... We give the first study of copy detection techniques for VLSI CAD applications; these techniques are complementary to previous watermarking-based IP protection methods in finding and proving improper use of design IP. After reviewing related literature (notably in the text processing domain), we pro ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We give the first study of copy detection techniques for VLSI CAD applications; these techniques are complementary to previous watermarking-based IP protection methods in finding and proving improper use of design IP. After reviewing related literature (notably in the text processing domain), we propose a generic methodology for copy detection based on determining basic elements within structural representations of solutions (IPs), calculating (context-independent) signatures for such elements, and performing fast comparisons to identify potential violators of IP rights. We give example implementations of this methodology in the domains of scheduling, graph coloring and gate-level layout; experimental results show the effectiveness of our copy detection schemes as well as the low overhead of implementation. We remark on open research areas, notably the potentially deep and complementary interaction between watermarking and copy detection. 1 Introduction With more functionality integr...

