Results 1–10 of 82
On the optimality of the dimensionality reduction method
 in Proc. 47th IEEE Symposium on Foundations of Computer Science (FOCS)
Abstract

Cited by 34 (5 self)
We investigate the optimality of (1+ε)-approximation algorithms obtained via the dimensionality reduction method. We show that:
• Any data structure for the (1+ε)-approximate nearest neighbor problem in Hamming space that uses a constant number of probes to answer each query must use n^Ω(1/ε²) space.
• Any algorithm for the (1+ε)-approximate closest substring problem must run in time exponential in 1/ε^(2−γ) for any γ > 0 (unless 3-SAT can be solved in subexponential time).
Both lower bounds are (essentially) tight.
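The dimensionality reduction method whose optimality is studied here can be illustrated, for Hamming space, by bit sampling: projecting binary vectors onto a random subset of coordinates, which never increases Hamming distance. A minimal Python sketch (the dimensions, sample size, and seed are illustrative choices, not taken from the paper):

```python
import random

def sample_coords(d, k, seed=0):
    """Pick k distinct coordinates out of d (the reduced dimension)."""
    rng = random.Random(seed)
    return rng.sample(range(d), k)

def project(point, coords):
    """Project a binary vector onto the sampled coordinates."""
    return tuple(point[i] for i in coords)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# Two points at Hamming distance 4 in dimension 16.
p = [0] * 16
q = [0] * 16
for i in (1, 5, 9, 13):
    q[i] = 1

coords = sample_coords(16, 8, seed=42)
pp, qq = project(p, coords), project(q, coords)
# Projection onto a coordinate subset can only drop differing positions,
# so the projected distance never exceeds the original distance.
assert hamming(pp, qq) <= hamming(p, q)
```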
Peeping Tom in the Neighborhood: Keystroke Eavesdropping on Multi-User Systems
Abstract

Cited by 26 (5 self)
A multi-user system usually involves a large amount of information shared among its users. The security implications of such information can never be underestimated. In this paper, we present a new attack that allows a malicious user to eavesdrop on other users’ keystrokes using such information. Our attack takes advantage of the stack information of a process disclosed by its virtual file within procfs, the process file system supported by Linux. We show that on a multi-core system, the ESP of a process when it is making system calls can be effectively sampled by a “shadow” program that continuously reads the public statistical information of the process. Such sampling is shown to be reliable even in the presence of multiple users, when the system is under a realistic workload. From the ESP content, keystroke events can be identified if they trigger system calls. As a result, we can accurately determine inter-keystroke timings and launch a timing attack to infer the characters the victim entered. We developed techniques for automatically analyzing an application’s binary executable to extract the ESP pattern that fingerprints a keystroke event. The occurrences of such a pattern are identified from an ESP trace the shadow program records from the application’s runtime to calculate timings. These timings are further analyzed using a Hidden Markov Model and other public information related to the victim on a multi-user system. Our experimental study demonstrates that our attack greatly facilitates password cracking and also works very well at recognizing English words.
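The timing-extraction step described above — locating occurrences of a keystroke-fingerprint pattern in a recorded ESP trace and differencing their timestamps — can be sketched as follows. The trace, fingerprint values, and timestamps are all hypothetical; the real attack samples the stack-pointer value a process exposes through its `/proc/<pid>/stat` file.

```python
def keystroke_timings(esp_trace, fingerprint):
    """Return inter-keystroke timings from a recorded ESP trace.

    esp_trace   -- list of (timestamp, esp_value) samples
    fingerprint -- sequence of ESP values marking a keystroke syscall
    """
    n = len(fingerprint)
    hits = []  # timestamps where the fingerprint pattern begins
    values = [v for _, v in esp_trace]
    for i in range(len(values) - n + 1):
        if values[i:i + n] == list(fingerprint):
            hits.append(esp_trace[i][0])
    # Consecutive differences give the inter-keystroke timings.
    return [b - a for a, b in zip(hits, hits[1:])]

# Hypothetical trace: the fingerprint (0xA1, 0xB2) occurs at t = 1.0, 1.3, 1.9.
trace = [(0.9, 10), (1.0, 0xA1), (1.1, 0xB2), (1.2, 7),
         (1.3, 0xA1), (1.4, 0xB2), (1.8, 3),
         (1.9, 0xA1), (2.0, 0xB2)]
print(keystroke_timings(trace, (0xA1, 0xB2)))  # two gaps, ~0.3s and ~0.6s
```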
Scene detection in videos using shot clustering and sequence alignment
 IEEE Transactions on Multimedia
, 2009
Abstract

Cited by 18 (0 self)
Abstract—Video indexing requires the efficient segmentation of video into scenes. The video is first segmented into shots and a set of keyframes is extracted for each shot. Typical scene detection algorithms incorporate time distance in a shot similarity metric. In the method we propose, to overcome the difficulty of having prior knowledge of the scene duration, the shots are clustered into groups based only on their visual similarity and a label is assigned to each shot according to the group that it belongs to. Then, a sequence alignment algorithm is applied to detect when the pattern of shot labels changes, providing the final scene segmentation result. In this way shot similarity is computed based only on visual features, while ordering of shots is taken into account during sequence alignment. To cluster the shots into groups we propose an improved spectral clustering method that both estimates the number of clusters and employs the fast global k-means algorithm in the clustering stage after the eigenvector computation of the similarity matrix. The same spectral clustering method is applied to extract the keyframes of each shot, and numerical experiments indicate that the content of each shot is efficiently summarized using the method we propose herein. Experiments on TV series and movies also indicate that the proposed scene detection method accurately detects most of the scene boundaries while preserving a good trade-off between recall and precision. Index Terms—Global k-means, keyframes, scene detection, sequence alignment.
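The final step — turning a sequence of shot labels into scene boundaries — can be approximated with a simple window-overlap heuristic, a crude stand-in for the paper's sequence alignment. The window size, threshold, and label sequence below are illustrative, not from the paper:

```python
from collections import Counter

def scene_boundaries(labels, w=3, threshold=0.5):
    """Detect scene changes in a shot-label sequence.

    Compares the label multisets of the w shots before and after each
    position; a boundary is flagged where overlap drops below `threshold`.
    Positions adjacent to a real change may all be flagged.
    """
    boundaries = []
    for i in range(w, len(labels) - w + 1):
        left = Counter(labels[i - w:i])
        right = Counter(labels[i:i + w])
        overlap = sum((left & right).values()) / w
        if overlap < threshold:
            boundaries.append(i)
    return boundaries

# Shots drawn from clusters {A, B} then {C, D}: the scene changes at shot 5.
labels = ['A', 'B', 'A', 'A', 'B', 'C', 'D', 'C', 'C', 'D']
print(scene_boundaries(labels))  # flags positions around the change at shot 5
```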
Efficient algorithms for substring near neighbor problem
 in Proc. 17th Annu. ACM-SIAM Sympos. on Discrete Algorithms
Abstract

Cited by 15 (3 self)
In this paper we consider the problem of finding the approximate nearest neighbor when the data set points are the substrings of a given text T. Specifically, for a string T of length n, we present a data structure which does the following: given a pattern P, if there is a substring of T within distance R from P, it reports a (possibly different) substring of T within distance cR from P. The length of the pattern P, denoted by m, is not known in advance. For the case where the distances are measured using the Hamming distance, we present a data structure which uses Õ(n^(1+1/c)) space and has Õ(n^(1/c) + m·n^(o(1))) query time. This essentially matches the earlier bounds of [Ind98], which assumed that the pattern length m is fixed in advance. In addition, our data structure can be constructed in time Õ(n^(1+1/c) + n^(1+o(1))·M^(1/3)), where M is an upper bound for m. This essentially matches the preprocessing bound of [Ind98] as long as the term Õ(n^(1+1/c)) dominates the running time, which is the case when, e.g., c < 3. We also extend our results to the case where the distances are measured according to the ℓ1 distance. The query time and the space bound are essentially the same, while the preprocessing time becomes Õ(n^(1+1/c) + n^(1+o(1))·M^(2/3)).
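The flavor of such a data structure can be sketched with a bit-sampling locality-sensitive hash over all length-m substrings of T. For simplicity the sketch fixes m (unlike the paper, which handles unknown pattern length), and the parameters k, tables, and seed are illustrative:

```python
import random

def build_index(T, m, k=3, tables=4, seed=0):
    """Hash every length-m substring of T into `tables` bit-sample tables."""
    rng = random.Random(seed)
    # Each table samples k (possibly repeated) positions in [0, m).
    samples = [tuple(rng.randrange(m) for _ in range(k)) for _ in range(tables)]
    index = [dict() for _ in range(tables)]
    for start in range(len(T) - m + 1):
        s = T[start:start + m]
        for t, coords in enumerate(samples):
            key = tuple(s[c] for c in coords)
            index[t].setdefault(key, []).append(start)
    return samples, index

def query(T, m, samples, index, P):
    """Return candidate offsets whose sampled characters match P's."""
    cands = set()
    for t, coords in enumerate(samples):
        key = tuple(P[c] for c in coords)
        cands.update(index[t].get(key, ()))
    return sorted(cands)

T = "abracadabra"
m = 4
samples, index = build_index(T, m)
# Exact matches always collide in every table, so offsets 0 and 7
# ("abra" occurs there) are guaranteed to appear among the candidates.
print(query(T, m, samples, index, "abra"))
```

Near matches collide with high probability in at least one table; candidates are then verified by direct distance computation.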
The $-Calculus Process Algebra for Problem Solving and Its Support for Bioinformatics
 Proc. 2nd Indian Intern. Conf. on Artificial Intelligence (IICAI'05)
, 2005
Abstract

Cited by 5 (3 self)
In this paper a new technique for the solution of hard computational problems in bioinformatics is investigated: the $-calculus process algebra for problem solving, which applies cost performance measures to converge to optimal solutions with minimal problem-solving costs. We demonstrate that the $-calculus generic search method, called kΩ-optimization, can be used to solve gene finding and sequence alignment problems. The solutions can be either precise or approximate, obtained by applying $-calculus optimization or total optimization.
On parameterized intractability: hardness and completeness
, 2007
Abstract

Cited by 5 (1 self)
We study the theory and techniques developed in research on parameterized intractability, emphasizing parameterized hardness and completeness results that imply (stronger) computational lower bounds for natural computational problems. Moreover, the fundamentals of structural properties in parameterized complexity theory, relationships to classical complexity theory, and more recent developments in the area are also introduced.
Parameterized Complexity and Biopolymer Sequence Comparison
, 2007
Abstract

Cited by 4 (0 self)
The paper surveys parameterized algorithms and complexity results for computational tasks on biopolymer sequences, including the problems of longest common subsequence, shortest common supersequence, pairwise sequence alignment, multiple sequence alignment, structure–sequence alignment and structure–structure alignment. Algorithmic techniques, built on the structural-unit level as well as on the residue level, are discussed.
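The first of the listed tasks, longest common subsequence, is computed by the textbook quadratic dynamic program below; parameterized algorithms of the kind surveyed improve on this when a relevant parameter (e.g. the number of sequences, or the alignment size) is small. The DNA strings are illustrative:

```python
def lcs_length(a, b):
    """Classic O(|a|*|b|) dynamic program for longest common subsequence."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ACCGGTC", "ACTGC"))  # -> 4 ("ACGC")
```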
Motif Discovery in Physiological Datasets: A Methodology for Inferring Predictive Elements
Abstract

Cited by 3 (2 self)
In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity that is unlikely to have occurred purely by chance in symbolic data is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. These extensions allow for a framework for information inference, where precursors are identified as approximately conserved activity of arbitrary complexity preceding multiple occurrences of an event.
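The first stage — symbolizing continuous signals and then looking for conserved words across patients — can be illustrated with a simple thresholding symbolizer and a k-gram count. The breakpoints, alphabet, and toy signals are invented for illustration; the paper's actual pipeline uses active regions and Gibbs sampling rather than exhaustive counting.

```python
from collections import Counter

def symbolize(signal, breakpoints=(-0.5, 0.5), alphabet="abc"):
    """Map each continuous sample to a symbol by thresholding (SAX-like)."""
    return "".join(alphabet[sum(x > b for b in breakpoints)] for x in signal)

def most_conserved_kgram(sequences, k):
    """Return the length-k word occurring in the most sequences."""
    counts = Counter()
    for seq in sequences:
        # Count each word once per sequence (conservation across patients).
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts.most_common(1)[0][0]

# Hypothetical pre-event windows from three patients: a shared high
# plateau (symbol 'c') precedes the event in every one.
sigs = [[0.1, 0.9, 0.8, -0.7], [-0.9, 0.9, 0.8, 0.2], [0.9, 0.9, 0.8, -0.1]]
seqs = [symbolize(s) for s in sigs]
print(most_conserved_kgram(seqs, 2))  # -> 'cc', conserved in all three
```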
An integrative approach to identifying biologically relevant genes
 In Proceedings of SIAM International Conference on Data Mining (SDM), 2010
Abstract

Cited by 3 (2 self)
Gene selection aims at detecting biologically relevant genes to assist biologists’ research. The cDNA microarray data used in gene selection is usually “wide”: with several thousand genes but fewer than a hundred samples, many biologically irrelevant genes can gain statistical relevance by sheer randomness. Addressing this problem goes beyond what cDNA microarrays can offer and necessitates the use of additional information. Recent developments in bioinformatics have made various knowledge sources available, such as the KEGG pathway repository and the Gene Ontology database. Integrating different types of knowledge could provide more information about genes and samples. In this work, we propose a novel approach that integrates different types of knowledge for identifying biologically relevant genes. The approach converts different types of external knowledge into internal knowledge, which can be used to rank genes. Upon obtaining the ranking lists, it aggregates them via a probabilistic model and generates a final list. Experimental results from our study on acute lymphoblastic leukemia demonstrate the efficacy of the proposed approach and show that using different types of knowledge together can help detect biologically relevant genes.
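The rank-aggregation step can be illustrated with simple Borda counting; the paper itself aggregates via a probabilistic model, and the gene names and rankings below are hypothetical.

```python
def borda_aggregate(rankings):
    """Combine several ranked gene lists by Borda counting.

    Each ranking lists gene names best-first; a gene earns
    (list length - position) points per list. Returns genes
    sorted by total score, best first (ties broken by name).
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, gene in enumerate(ranking):
            scores[gene] = scores.get(gene, 0) + (n - pos)
    return sorted(scores, key=lambda g: (-scores[g], g))

expr  = ["TP53", "BRCA1", "MYC"]   # ranking from expression data
kegg  = ["BRCA1", "TP53", "MYC"]   # ranking from a pathway source
ontol = ["TP53", "MYC", "BRCA1"]   # ranking from an ontology source
print(borda_aggregate([expr, kegg, ontol]))  # -> ['TP53', 'BRCA1', 'MYC']
```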
Level characteristics corresponding to peripheral eigenvalues of a nonnegative matrix
, 2008