Robust and fast similarity search for moving object trajectories
In SIGMOD, 2005
Cited by 61 (10 self)
An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR), which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance
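The abstract above names EDR but does not give its recurrence. The following is a minimal sketch of how such a distance is commonly formulated, assuming 2-D trajectories given as lists of (x, y) points and a matching threshold `eps`; the function name `edr` and the per-coordinate matching rule are assumptions made for illustration, not text from the listing.

```python
def edr(r, s, eps):
    """Edit-distance-style dissimilarity between two 2-D trajectories.

    Assumption: two points "match" (substitution cost 0) when both
    coordinates differ by at most eps; every other edit costs 1.
    """
    n, m = len(r), len(s)
    # prev[j] holds the distance between the first i-1 points of r
    # and the first j points of s.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        rx, ry = r[i - 1]
        for j in range(1, m + 1):
            sx, sy = s[j - 1]
            match = abs(rx - sx) <= eps and abs(ry - sy) <= eps
            subcost = 0 if match else 1
            cur[j] = min(prev[j - 1] + subcost,  # match or substitute
                         prev[j] + 1,            # skip a point of r
                         cur[j - 1] + 1)         # skip a point of s
        prev = cur
    return prev[m]


clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# Same path with small jitter and one gross outlier (e.g. a sensor glitch).
noisy = [(0.0, 0.1), (50.0, 50.0), (1.0, 1.1), (2.0, 2.0)]
print(edr(clean, noisy, eps=0.25))
```

Because every mismatch costs exactly 1 regardless of its magnitude, a single large outlier contributes a bounded amount to the total, which is one way to read the robustness claim in the abstract.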
Low-Distortion Embeddings of Finite Metric Spaces
In Handbook of Discrete and Computational Geometry, 2004
Cited by 43 (0 self)
An n-point metric space (X, D) can be represented by an n × n table specifying the distances. Such tables arise in many diverse areas. For example, consider the following scenario in microbiology: X is a collection of bacterial strains, and for every two strains, one is given their dissimilarity (computed, say, by comparing their DNA). It is difficult to see any structure in a large table of numbers, and so we would like to represent a given metric space in a more comprehensible way. For example, it would be very nice if we could assign to each x ∈ X a point f(x) in the plane in such a way that D(x, y) equals the Euclidean distance of f(x) and f(y). Such a representation would allow us to see the structure of the metric space: tight clusters, isolated points, and so on. Another advantage would be that the metric would now be represented by only 2n real numbers, the coordinates of the n points in the plane, instead of n(n−1)/2 numbers as before. Moreover, many quantities concern
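An exact planar representation of this kind does not always exist, and the quality of an approximate one is usually measured by its distortion. The sketch below assumes the standard definition (largest expansion times largest contraction over all pairs); the helper name `distortion` and the 4-cycle example are illustrative choices, not taken from the abstract.

```python
import math
from itertools import combinations


def distortion(points, dist, embed):
    """Distortion of the map x -> embed[x] into the Euclidean plane.

    points: hashable labels of the metric space
    dist:   function giving the original distance D(x, y) > 0 for x != y
    embed:  dict mapping each label to a coordinate tuple f(x)
    """
    expansion = 0.0    # largest factor by which a distance grows
    contraction = 0.0  # largest factor by which a distance shrinks
    for x, y in combinations(points, 2):
        d_orig = dist(x, y)
        d_emb = math.dist(embed[x], embed[y])
        expansion = max(expansion, d_emb / d_orig)
        contraction = max(contraction, d_orig / d_emb)
    return expansion * contraction


# Example: shortest-path metric on a 4-cycle, drawn as a unit square.
cycle = [0, 1, 2, 3]


def cycle_dist(x, y):
    k = abs(x - y)
    return min(k, 4 - k)


square = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
print(distortion(cycle, cycle_dist, square))
```

In this example adjacent vertices keep distance 1, while opposite vertices shrink from 2 to the diagonal length √2, so this particular drawing has distortion √2; a distortion of exactly 1 would correspond to the exact representation the abstract describes.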
Nonembeddability theorems via Fourier analysis
Cited by 34 (8 self)
Various new nonembeddability results (mainly into L1) are proved via Fourier analysis. In particular, it is shown that the Edit Distance on {0, 1}^d has L1 distortion (log d)^{1/2−o(1)}. We also give new lower bounds on the L1 distortion of flat tori, quotients of the discrete hypercube under group actions, and the transportation cost (Earthmover) metric.
Approximating edit distance efficiently
In Proc. FOCS, 2004
Cited by 26 (5 self)
Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor. Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than ℓ, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)^{2/3} gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. ℓ gap problem, where ℓ can be as small as O(k^2), still with a constant sketch, but works only for strings that are mildly "non-repetitive". Finally, we develop an n^{3/7}-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n^{3/4} [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n^{1/3}.
Improved lower bounds for embeddings into L1
SIAM J. Comput., 2009
Cited by 25 (4 self)
We improve upon recent lower bounds on the minimum distortion of embedding certain finite metric spaces into L1. In particular, we show that for every n ≥ 1, there is an n-point metric space of negative type that requires a distortion of Ω(log log n) for such an embedding, implying the same lower bound on the integrality gap of a well-known semidefinite programming relaxation for sparsest cut. This result builds upon and improves the recent lower bound of (log log n)^{1/6−o(1)} due to Khot and Vishnoi [The unique games conjecture, integrality gap for cut problems and the embeddability of negative type metrics into l1, in Proceedings of the 46th Annual IEEE Symposium
Lower bounds for embedding edit distance into normed spaces
In Proc. SODA, 2003
Cited by 23 (1 self)
[S. Raskhodnikova, MIT] The edit distance (also called Levenshtein metric) between two strings is the minimum number of operations (insertions, deletions and character substitutions) needed to transform one string into another. This distance is of key importance in computational biology, as well as text processing and other areas. Algorithms for problems involving this metric have been extensively investigated. In particular, the quadratic-time dynamic programming algorithm for computing the edit distance between two strings is one of the most investigated and used algorithms in computational biology. Recently, a new approach to problems involving edit distance has been proposed. Its basic component is construction of a mapping f (called an embedding), which maps any string s into a vector f(s) ∈
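The quadratic-time dynamic program mentioned in this abstract can be sketched as follows. This is the textbook formulation with unit costs for insertion, deletion and substitution; the function name `edit_distance` and the two-row memory layout are illustrative choices rather than anything specified in the listing.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b in O(len(a) * len(b)) time."""
    # prev[j] = distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory is linear in the length of the second string while the running time remains quadratic.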
Low distortion embeddings for edit distance
In Proceedings of the Symposium on Theory of Computing, 2005
Cited by 17 (1 self)
We show that {0, 1}^d endowed with edit distance embeds into ℓ1 with distortion 2^{O(√(log d · log log d))}. We further show efficient implementations of the embedding that yield solutions to various computational problems involving edit distance. These include sketching, communication complexity, and nearest neighbor search. For all these problems, we improve upon previous bounds.
T.: Traffic aggregation for malware detection
2007
Cited by 17 (1 self)
Stealthy malware, such as botnets and spyware, are hard to detect because their activities are subtle and do not disrupt the network, in contrast to DoS attacks and aggressive worms. Stealthy malware, however, does communicate to exfiltrate data to the attacker, to receive the attacker’s commands, or to carry out those commands. Moreover, since malware rarely infiltrates only a single host in a large enterprise, these communications should emerge from multiple hosts within coarse temporal proximity to one another. In this paper, we describe a system called TĀMD (pronounced “tamed”) with which an enterprise can identify candidate groups of infected computers within its network. TĀMD accomplishes this by finding new communication “aggregates” involving multiple internal hosts, i.e., communication flows that share common characteristics. We describe characteristics for defining aggregates (including flows that communicate with the same external network, that share similar payload, and/or that involve internal hosts with similar software platforms) and justify their use in finding infected hosts. We also detail efficient algorithms employed by TĀMD for identifying such aggregates, and demonstrate a particular configuration of TĀMD that identifies new infections for multiple bot and spyware examples, within traces of traffic recorded at the edge of a university network. This is achieved even when the number of infected hosts comprises only about 0.0097% of all internal hosts in the network.
Minimum Common String Partition Problem: Hardness and Approximations
Journal of Combinatorics, 2004
Cited by 15 (5 self)
String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing or compression. In this paper we address the minimum common string partition problem, a string comparison problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement.

