Sharp Tractability Borderlines for Finding Connected Motifs in Vertex-Colored Graphs
 34TH INTERNATIONAL COLLOQUIUM ON AUTOMATA, LANGUAGES AND PROGRAMMING (ICALP 2007), WROCLAW: POLAND
, 2007
Cited by 18 (8 self)
We study the problem of finding occurrences of motifs in vertex-colored graphs, where a motif is a multiset of colors, and an occurrence of a motif is a subset of connected vertices with a bijection between its colors and the colors of the motif. This problem has applications in metabolic network analysis, an important area in bioinformatics. We give two positive results and three negative results that together draw sharp borderlines between tractable and intractable instances of the problem.
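To make the definition concrete, the following is a minimal sketch of a checker for a single candidate vertex set. It is not the paper's algorithm, only an illustration of what "occurrence" means: the colors of the set must equal the motif as a multiset, and the set must induce a connected subgraph. The adjacency-dictionary representation and the name `is_occurrence` are assumptions made for this example.

```python
from collections import Counter, deque

def is_occurrence(adj, color, motif, vertices):
    """Return True if `vertices` is an occurrence of `motif` (hypothetical helper).

    adj:      dict mapping each vertex to a set of neighbours (undirected graph)
    color:    dict mapping each vertex to its color
    motif:    iterable of colors, treated as a multiset
    vertices: candidate vertex set
    """
    vertices = set(vertices)
    if not vertices:
        return False
    # Multiset equality is equivalent to a color-preserving bijection.
    if Counter(color[v] for v in vertices) != Counter(motif):
        return False
    # BFS restricted to the candidate set tests connectivity of the induced subgraph.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices
```

Checking one candidate set is cheap; the difficulty the abstract refers to lies in searching over all candidate sets.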
Pattern Matching in Compressed Text and Images
, 2001
Cited by 13 (11 self)
Normally compressed data needs to be decompressed before it is processed, but if the compression has been done in the right way, it is often possible to search the data without having to decompress it, or at least only partially decompress it. The problem can be divided into lossless and lossy compression methods, and then in each of these cases the pattern matching can be either exact or inexact. Much work has been reported in the literature on techniques for all of these cases, including algorithms that are suitable for pattern matching for various compression methods, and compression methods designed specifically for pattern matching. This work is surveyed in this paper. The paper also exposes the important relationship between pattern matching and compression, and proposes some performance measures for compressed pattern matching algorithms. Ideas and directions for future work are also described.
Pattern discovery and the algorithmics of surprise
 Proceedings of the NATO ASI on Artificial Intelligence and Heuristic Methods for Bioinformatics
, 2003
String Pattern Matching For A Deluge Survival Kit
, 2000
Cited by 6 (1 self)
String Pattern Matching concerns itself with algorithmic and combinatorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, effective access to, and manipulation of such data depend crucially on the efficiency with which strings are structured, compressed, transmitted, stored, searched and retrieved. This paper samples from this perspective, and with the authors' own bias, a rich arsenal of ideas and techniques developed in more than three decades of history.
Discovering Flow Anomalies: A SWEET Approach
 In: University of Minnesota, MN, Technical Report
, 2009
Cited by 3 (2 self)
Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mismatched sensor readings exceeds the given percentage-threshold. Discovering flow anomalies (FA) is an important problem due to applications such as environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g., lakes) or slow-moving (e.g., wetland) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., t-test) are suited for detecting transient FAs (i.e., time instants of significant mismatches across consecutive sensors) and cannot detect persistent FAs (i.e., long variable time-windows with a high fraction of time instant transient FAs) due to a lack of a predefined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is brought down significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Analytical evaluation shows that the proposed method is correct and complete. Experimental evaluation using synthetic and real datasets shows our proposed approach outperforms naive alternatives.
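The window search described in this abstract can be illustrated with a naive baseline. The sketch below is not the SWEET method: it has no smart counter or pruning, and it does not filter the output down to dominant intervals. It only shows the endpoint restriction the abstract mentions, enumerating windows that start and end on a transient FA and keeping those whose mismatch fraction exceeds the threshold. The boolean input encoding and the name `persistent_windows` are assumptions made for this example.

```python
from itertools import accumulate

def persistent_windows(mismatch, threshold):
    """List (start, end, fraction) for windows whose mismatch fraction exceeds threshold.

    mismatch:  list of bools, True at time instants where the upstream and
               downstream readings mismatch significantly (a transient FA)
    threshold: fraction in [0, 1]
    Only windows that both start and end on a transient FA are enumerated.
    """
    # prefix[i] = number of transient FAs among the first i time instants
    prefix = [0] + list(accumulate(int(m) for m in mismatch))
    hits = [i for i, m in enumerate(mismatch) if m]
    result = []
    for a, start in enumerate(hits):
        for end in hits[a:]:
            length = end - start + 1
            count = prefix[end + 1] - prefix[start]
            fraction = count / length
            if fraction > threshold:
                result.append((start, end, fraction))
    return result
```

With h transient FAs this enumerates on the order of h^2 windows, which is the kind of cost a pruning strategy would aim to reduce.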
Discovering Teleconnected Flow Anomalies: A Relationship Analysis of Dynamic Neighborhoods (RAD) Approach
Cited by 2 (1 self)
Given a collection of sensors monitoring a flow network, the problem of discovering teleconnected flow anomalies aims to identify strongly connected pairs of events (e.g., introduction of a contaminant and its removal from a river). The ability to mine teleconnected flow anomalies is important for applications related to environmental science, video surveillance, and transportation systems. However, this problem is computationally hard because of the large number of time instants of measurement, sensors, and locations. This paper characterizes the computational structure in terms of three critical tasks: (1) detection of flow anomaly events, (2) identification of candidate pairs of events, and (3) evaluation of candidate pairs for possible teleconnection. The first task was addressed in our recent work. In this paper, we propose a RAD (Relationship Analysis of spatiotemporal Dynamic neighborhoods) approach for tasks 2 and 3 to discover teleconnected flow anomalies. Computational overhead is brought down significantly by utilizing our proposed spatiotemporal dynamic neighborhood model as an index and a pruning strategy. We prove correctness and completeness for the proposed approaches. We also experimentally show the efficacy of our proposed methods using both synthetic and real datasets.
Graduate Program in Water Resources Science,
Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mismatched sensor readings exceeds the given percentage-threshold. Discovering flow anomalies (FA) is an important problem in environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g., lakes) or slow-moving (e.g., wetland) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., t-test) are suited for detecting transient FAs (i.e., time instants of significant mismatches across consecutive sensors) and cannot detect persistent FAs (i.e., long variable time-windows with a high fraction of time instant transient FAs) due to a lack of a predefined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is brought down significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Experimental evaluation using a real dataset shows our proposed approach outperforms naive alternatives.
A Fast Algorithm for the Inexact Characteristic String Problem
, 2003
We present a new algorithm to solve the INEXACT CHARACTERISTIC STRING PROBLEM using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework which reveals an additional improvement to the Levenshtein distance algorithm. The INEXACT CHARACTERISTIC STRING PROBLEM can thus be solved in time $O(\|T\| + l \cdot \|S \setminus T\|)$ for Hamming distance and in time $O(\|T\| + k \cdot l \cdot \|S \setminus T\|)$ for Levenshtein distance, where $S \subseteq \Sigma$...
Synthesizing Arbitrary Genomes
Suppose a researcher wants a long arbitrary sequence of nucleotides and asks a lab to synthesize it. Oligonucleotides are generally of length less than 100, so it is necessary to resort to Watson-Crick pairing and ligation for longer strands. Some of the longest strands so far generated have had 15,000 bases, but those strands weren't arbitrary since they were specially designed for the purpose of DNA computing. Our algorithm produces a recipe for making a length n strand of arbitrary DNA. The algorithm requires an expected computation time of O(n^2), though its worst case time is O(n^4). Its expected total laboratory time is O(n), assuming that an arbitrary number of oligonucleotides can hybridize in constant time. We illustrate its application on long sequences from Human Chromosome 7 and random sequences of length 10 million. Our algorithm requires the invention of some laboratory techniques, since it requires the use of special enzymes, techniques to prevent shearin...
A Fast Algorithm for the Inexact Characteristic String Problem
, 2003
We present a new algorithm to solve the INEXACT CHARACTERISTIC STRING PROBLEM using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework which reveals an additional improvement to the Levenshtein distance algorithm. The INEXACT CHARACTERISTIC STRING PROBLEM can thus be solved in time $O(\|T\| + l \cdot \|S \setminus T\|)$ for Hamming distance and in time $O(\|T\| + k \cdot l \cdot \|S \setminus T\|)$ for Levenshtein distance, where $T$ ($T \subset S$) is the target set, and $l$ is the length of a shortest string in $T$. The INEXACT CHARACTERISTIC STRING PROBLEM has applications in probe and primer design. Both algorithms need to solve the COMMON SUBSTRING PROBLEM for more than two strings. We present an improved algorithm for this problem being simpler and faster in practice by a constant factor than the previous algorithm.
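The two distance measures named in this abstract differ in the edits they allow: Hamming distance counts substitutions between equal-length strings, while Levenshtein distance also allows insertions and deletions. The sketch below shows only the standard textbook definitions, not the paper's algorithm; the function names are chosen for this example.

```python
def hamming(a, b):
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match at no cost
        prev = curr
    return prev[-1]
```

For example, `hamming("abcde", "bcdea")` is 5 because every position differs, whereas `levenshtein("abcde", "bcdea")` is 2: delete the leading `a` and append it at the end.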