Sharp tractability borderlines for finding connected motifs in vertex-colored graphs.
 In Proc. ICALP’07, LNCS 4596, 2007
Abstract

Cited by 19 (8 self)
Abstract. We study the problem of finding occurrences of motifs in vertex-colored graphs, where a motif is a multiset of colors, and an occurrence of a motif is a subset of connected vertices whose multiset of colors equals the motif. This problem has applications in metabolic network analysis, an important area in bioinformatics. We give two positive results and three negative results that together draw sharp borderlines between tractable and intractable instances of the problem.
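To make the definition of an occurrence concrete, here is a minimal Python sketch (not taken from the paper) that checks whether a vertex subset is an occurrence of a motif and finds all occurrences by brute force. The adjacency-dict representation (every vertex, including isolated ones, must be a key) and the names `is_connected`, `is_occurrence`, and `find_occurrences` are hypothetical choices for illustration. The enumeration is exponential in the motif size and does not reflect the tractable cases the paper identifies.

```python
from collections import Counter, deque
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if `vertices` induces a connected subgraph (BFS inside the subset)."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices


def is_occurrence(vertices, adj, color, motif):
    """Occurrence test: the color multiset equals the motif and the subset is connected."""
    return (Counter(color[v] for v in vertices) == Counter(motif)
            and is_connected(vertices, adj))


def find_occurrences(adj, color, motif):
    """Brute force over all |motif|-vertex subsets (exponential; illustration only)."""
    k = len(motif)
    return [set(s) for s in combinations(sorted(adj), k)
            if is_occurrence(s, adj, color, motif)]
```

For the path a-b-c colored red, blue, red, the motif {red, blue} has two occurrences, {a, b} and {b, c}, while {red, red} has none because a and c are not adjacent.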
Pattern Matching in Compressed Text and Images, 2001
Abstract

Cited by 13 (11 self)
Normally, compressed data needs to be decompressed before it is processed, but if the compression has been done in the right way, it is often possible to search the data without having to decompress it, or by only partially decompressing it. The problem can be divided into lossless and lossy compression methods, and in each of these cases the pattern matching can be either exact or inexact. Much work has been reported in the literature on techniques for all of these cases, including algorithms that are suitable for pattern matching for various compression methods, and compression methods designed specifically for pattern matching. This work is surveyed in this paper. The paper also exposes the important relationship between pattern matching and compression, and proposes some performance measures for compressed pattern matching algorithms. Ideas and directions for future work are also described.
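As one small illustration of searching without decompressing, the Python sketch below finds exact occurrences of a pattern in run-length encoded (RLE) text by comparing runs directly. The choice of RLE and the names `rle` and `rle_search` are assumptions for illustration, not a method taken from the survey. Because both encodings use maximal runs, the pattern's interior runs must equal text runs exactly, its first run must end where a text run ends, and its last run must start where a text run starts.

```python
from itertools import groupby


def rle(s):
    """Run-length encode a string into a list of maximal (char, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_search(text_runs, pattern_runs):
    """Return start positions (in the uncompressed text) of all exact occurrences,
    working only on the run-length encodings of text and pattern."""
    k = len(pattern_runs)
    # Starting offset of each text run in the uncompressed text.
    offsets, pos = [], 0
    for _, length in text_runs:
        offsets.append(pos)
        pos += length
    hits = []
    if k == 0:
        return hits
    if k == 1:
        # Single-run pattern: every long-enough run of the same char contributes
        # (run_length - pattern_length + 1) occurrences.
        ch, plen = pattern_runs[0]
        for (c, tlen), off in zip(text_runs, offsets):
            if c == ch and tlen >= plen:
                hits.extend(range(off, off + tlen - plen + 1))
        return hits
    (first_ch, first_len), (last_ch, last_len) = pattern_runs[0], pattern_runs[-1]
    middle = pattern_runs[1:-1]
    for i in range(len(text_runs) - k + 1):
        c0, l0 = text_runs[i]
        cl, ll = text_runs[i + k - 1]
        if (c0 == first_ch and l0 >= first_len
                and cl == last_ch and ll >= last_len
                and text_runs[i + 1:i + k - 1] == middle):
            # The pattern's first run must end exactly where text run i ends.
            hits.append(offsets[i] + l0 - first_len)
    return hits
```

For example, `rle_search(rle("aaabbbaab"), rle("ab"))` reports positions 2 and 7 after examining only four text runs. The cost depends on the number of runs rather than the uncompressed length, which is the kind of gain compressed pattern matching aims for.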
Pattern discovery and the algorithmics of surprise
 Proceedings of the NATO ASI on Artificial Intelligence and Heuristic Methods for Bioinformatics, 2003
String Pattern Matching For A Deluge Survival Kit, 2000
Abstract

Cited by 7 (1 self)
String Pattern Matching concerns itself with algorithmic and combinatorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, effective access to, and manipulation of, such data depend crucially on the efficiency with which strings are structured, compressed, transmitted, stored, searched and retrieved. This paper samples from this perspective, and with the authors' own bias, a rich arsenal of ideas and techniques developed in more than three decades of history.
P.: Discovering Flow Anomalies: A SWEET Approach
 In: University of Minnesota, MN, Technical Report, 2009
Abstract

Cited by 3 (2 self)
Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mismatched sensor readings exceeds the given percentage-threshold. Discovering flow anomalies (FA) is an important problem due to applications such as environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g., lakes) or slow-moving (e.g., wetland) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., t-test) are suited for detecting transient FAs (i.e., time instants of significant mismatches across consecutive sensors) but cannot detect persistent FAs (i.e., long, variable time windows with a high fraction of time-instant transient FAs) due to a lack of a predefined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is brought down significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Analytical evaluation shows that the proposed method is correct and complete. Experimental evaluation using synthetic and real datasets shows that our proposed approach outperforms naïve alternatives.
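The window restriction described above can be illustrated with a naïve baseline. The Python sketch below is not SWEET (it has no smart counter or pruning). It assumes the upstream/downstream readings have already been aligned for travel delay and reduced to a boolean list `mismatch` of transient FAs, and it reads "dominant" as "not contained in another qualifying window". Both assumptions, and the name `persistent_windows`, are hypothetical. Windows are enumerated only between pairs of transient FAs, and prefix sums give each window's mismatch count in constant time.

```python
def persistent_windows(mismatch, percent):
    """Naive O(m^2) baseline over the m transient FAs.

    mismatch: list of booleans, True where a transient FA occurs (assumed input).
    percent:  integer percentage-threshold, 0..100.
    Returns maximal windows (start, end), inclusive, whose endpoints are transient
    FAs and whose fraction of mismatched instants is at least `percent`.
    """
    # prefix[t] = number of transient FAs in mismatch[:t]
    prefix = [0]
    for flag in mismatch:
        prefix.append(prefix[-1] + int(flag))
    fa_times = [t for t, flag in enumerate(mismatch) if flag]

    qualifying = []
    for i, s in enumerate(fa_times):
        for e in fa_times[i:]:
            count = prefix[e + 1] - prefix[s]
            # Integer comparison avoids floating-point error: count/len >= percent/100.
            if count * 100 >= percent * (e - s + 1):
                qualifying.append((s, e))

    # Keep only windows not contained in a different qualifying window.
    return [(s, e) for (s, e) in qualifying
            if not any(s2 <= s and e <= e2 and (s2, e2) != (s, e)
                       for (s2, e2) in qualifying)]
```

With `mismatch = [True, True, False, True, False, False, False, True]` and a 60% threshold, the sketch reports the windows (0, 3) and (7, 7). Lowering the threshold to 50% merges them into (0, 7).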
Discovering Teleconnected Flow Anomalies: A Relationship Analysis of Dynamic Neighborhoods (RAD) Approach
Abstract

Cited by 2 (1 self)
Abstract. Given a collection of sensors monitoring a flow network, the problem of discovering teleconnected flow anomalies aims to identify strongly connected pairs of events (e.g., introduction of a contaminant and its removal from a river). The ability to mine teleconnected flow anomalies is important for applications related to environmental science, video surveillance, and transportation systems. However, this problem is computationally hard because of the large number of time instants of measurement, sensors, and locations. This paper characterizes the computational structure in terms of three critical tasks: (1) detection of flow anomaly events, (2) identification of candidate pairs of events, and (3) evaluation of candidate pairs for possible teleconnection. The first task was addressed in our recent work. In this paper, we propose a RAD (Relationship Analysis of spatiotemporal Dynamic neighborhoods) approach for tasks 2 and 3 to discover teleconnected flow anomalies. Computational overhead is brought down significantly by using our proposed spatiotemporal dynamic neighborhood model as an index and as a pruning strategy. We prove correctness and completeness for the proposed approaches. We also experimentally show the efficacy of our proposed methods using both synthetic and real datasets.
Department of Mathematical Sciences, Umm Al-Qura University
Abstract
The string searching problem consists of finding all occurrences of a pattern of m characters in a text of length n. Solving this problem requires two-step processing: a checking step and a skipping step. The motivation of this paper is to design and implement an optimal string searching method that produces a fast solution. The method has two main sections: the preprocessing phase and the search phase. The search phase contains the checking step and the skipping step. To obtain the optimal comparison order, the method forms the comparison order according to the mismatch results of the checking step as the search proceeds. The method has been implemented as a C++ program and tested on many examples. It has been used to search texts written in two languages (Arabic and English). Quantitative and qualitative comparisons have been made between the method of this paper and other related methods. The comparisons show that the method of this paper is much faster than other related methods.
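For readers unfamiliar with the preprocessing/checking/skipping structure, the Python sketch below shows it using the classic Boyer-Moore-Horspool algorithm. This is a swapped-in, well-known technique, not the paper's adaptive comparison-order method: Horspool always checks right to left and skips based on the last character of the current window. The name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool: return the start indices of all occurrences of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Preprocessing phase: for each char in pattern[:-1], the distance from its
    # last occurrence there to the end of the pattern (later indices overwrite).
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Checking step: compare the window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Skipping step: shift by the character under the window's last position;
        # characters absent from pattern[:-1] allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

`horspool_search("abracadabra", "abra")` returns `[0, 7]`. Since Python strings are Unicode, the same code also works on Arabic text.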
A Fast Algorithm for the Inexact Characteristic String Problem, 2003
Abstract
We present a new algorithm to solve the INEXACT CHARACTERISTIC STRING PROBLEM using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework which reveals an additional improvement to the Levenshtein distance algorithm. The INEXACT CHARACTERISTIC STRING PROBLEM can thus be solved in time $O(\|T\| + l \cdot \|S \setminus T\|)$ for Hamming distance and in time $O(\|T\| + k \cdot l \cdot \|S \setminus T\|)$ for Levenshtein distance, where $T$ ($T \subseteq S$) is the target set and $l$ is the length of a shortest string in $T$. The INEXACT CHARACTERISTIC STRING PROBLEM has applications in probe and primer design. Both algorithms need to solve the COMMON SUBSTRING PROBLEM for more than two strings. We present an improved algorithm for this problem that is simpler and, in practice, faster by a constant factor than the previous algorithm.
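The abstract does not spell out the problem definition, so the Python sketch below assumes one for illustration only: a characteristic string is a length-$l$ string that occurs in every target string and lies at Hamming distance greater than $k$ from every length-$l$ substring of every non-target string. It is a naïve set-based illustration, not the paper's algorithm, and it does not reach the stated time bounds. The names `substrings`, `hamming`, and `characteristic_strings` are hypothetical. The first step is a simple form of the common substring computation the abstract mentions.

```python
def substrings(s, l):
    """All distinct length-l substrings of s."""
    return {s[i:i + l] for i in range(len(s) - l + 1)}


def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def characteristic_strings(targets, others, l, k):
    """Naive sketch under an ASSUMED definition (see lead-in).

    targets: non-empty list of target strings (T).
    others:  list of non-target strings (S minus T).
    Returns, sorted, every length-l string that occurs in all targets and is at
    Hamming distance > k from every length-l substring of every non-target.
    """
    # Common substring step: length-l substrings shared by all target strings.
    common = set.intersection(*(substrings(t, l) for t in targets))
    # Every length-l window of the non-targets is a potential near match.
    forbidden = [w for s in others for w in substrings(s, l)]
    return sorted(w for w in common
                  if all(hamming(w, v) > k for v in forbidden))
```

With targets `["ACGTAC", "TTACGT"]`, non-target `["ACCTTT"]`, and $l = 3$, all three shared substrings survive for $k = 0$. For $k = 1$, only `"TAC"` remains, because `"ACG"` and `"CGT"` are each one mismatch away from a window of the non-target.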
A Fast Algorithm for the Inexact Characteristic String Problem, 2003
Abstract
Abstract. We present a new algorithm to solve the INEXACT CHARACTERISTIC STRING PROBLEM using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework which reveals an additional improvement to the Levenshtein distance algorithm. The INEXACT CHARACTERISTIC STRING PROBLEM can thus be solved in time $O(\|T\| + l \cdot \|S \setminus T\|)$ for Hamming distance and in time $O(\|T\| + k \cdot l \cdot \|S \setminus T\|)$ for Levenshtein distance, where $S \subseteq \Sigma^*$ ...
Graduate Program in Water Resources Science
Abstract
Given a percentage-threshold and readings from a pair of consecutive upstream and downstream sensors, flow anomaly discovery identifies dominant time intervals where the fraction of time instants of significantly mismatched sensor readings exceeds the given percentage-threshold. Discovering flow anomalies (FA) is an important problem in environmental flow monitoring networks and early warning detection systems for water quality problems. However, mining FAs is computationally expensive because of the large (potentially infinite) number of time instants of measurement and potentially long delays due to stagnant (e.g., lakes) or slow-moving (e.g., wetland) water bodies between consecutive sensors. Traditional outlier detection methods (e.g., t-test) are suited for detecting transient FAs (i.e., time instants of significant mismatches across consecutive sensors) but cannot detect persistent FAs (i.e., long, variable time windows with a high fraction of time-instant transient FAs) due to a lack of a predefined window size. In contrast, we propose a Smart Window Enumeration and Evaluation of persistence-Thresholds (SWEET) method to efficiently explore the search space of all possible window lengths. Computation overhead is brought down significantly by restricting the start and end points of a window to coincide with transient FAs, using a smart counter and efficient pruning techniques. Experimental evaluation using a real dataset shows that our proposed approach outperforms naïve alternatives.