Algorithms for Identifying Boolean Networks and Related Biological Networks Based on Matrix Multiplication and Fingerprint Function
 J. Comp. Biol
Abstract

Cited by 50 (6 self)
Due to recent progress in DNA microarray technology, a large amount of gene expression profile data is being produced. How to analyze gene expression data is an important topic in computational molecular biology. Several studies have used the Boolean network as a model of a genetic network. This paper proposes efficient algorithms for identifying Boolean networks of bounded indegree and related biological networks, where identification of a Boolean network can be formalized as the problem of identifying many Boolean functions simultaneously. For the identification of a Boolean network, an $O(mn^{D+1})$ time naive algorithm and a simple $O(mn^{D})$ time algorithm are known, where $n$ denotes the number of nodes, $m$ denotes the number of examples, and $D$ denotes the maximum indegree. This paper presents an improved $O(m^{\omega-2}n^{D} + mn^{D+\omega-3})$ time Monte Carlo type randomized algorithm, where $\omega$ is the exponent of matrix multiplication (currently, $\omega < 2.376$). The algorithm is obtained by combining fast matrix multiplication with the randomized fingerprint function for string matching. Although the algorithm and its analysis are simple, the result is nontrivial and the technique can be applied to several related problems.
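The abstract does not spell out the baseline algorithms. As a rough illustration of what identifying one node's Boolean function involves, here is a minimal Python sketch of the straightforward enumeration: for a given node, try every set of D candidate input nodes and keep those for which no two examples agree on the chosen inputs but disagree on the node's next value. The data layout (examples as pairs of 0/1 state tuples) and the name `consistent_input_sets` are assumptions for illustration; this is not the paper's randomized algorithm based on matrix multiplication and fingerprints.

```python
from itertools import combinations, product

def consistent_input_sets(examples, node, D):
    """Return every D-subset of nodes that can explain `node`'s next value.

    `examples` is a list of (state, next_state) pairs of 0/1 tuples
    (a hypothetical layout chosen for this sketch).
    """
    n = len(examples[0][0])
    found = []
    for inputs in combinations(range(n), D):
        table = {}  # restricted input values -> observed output bit
        consistent = True
        for state, next_state in examples:
            key = tuple(state[j] for j in inputs)
            # setdefault stores the first output seen for this key;
            # a different later output means no Boolean function fits.
            if table.setdefault(key, next_state[node]) != next_state[node]:
                consistent = False
                break
        if consistent:
            found.append(inputs)
    return found

# Toy 3-node network: x0' = x1, x1' = x2, x2' = x0 AND x1.
EXAMPLES = [(s, (s[1], s[2], s[0] & s[1])) for s in product((0, 1), repeat=3)]
print(consistent_input_sets(EXAMPLES, 2, 2))  # [(0, 1)]
```

Run for all n nodes with fixed D, this enumeration costs on the order of m·n^{D+1} basic operations, assuming constant-time hashing.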
On the use of Regular Expressions for Searching Text
 ACM Transactions on Programming Languages and Systems
, 1995
Abstract

Cited by 38 (3 self)
The use of regular expressions to search text is well known and understood as a useful technique. It is therefore surprising that the standard techniques and tools prove to be of limited use for searching text formatted with SGML or other similar markup languages. Experience with structured text search has caused us to carefully reexamine current practice. The generally accepted rule of "leftmost longest match" is an unfortunate choice and is at the root of the difficulties. We instead propose a rule that is semantically cleaner and, incidentally, simpler and more efficient to implement. This rule is generally applicable to any text search application.
1 Introduction
Regular expressions are widely regarded as a precise, succinct notation for specifying a text search, with a straightforward, efficient implementation. Many people routinely use regular expressions to specify searches in text editors and with standalone search tools such as the Unix grep utility. A regular expression ...
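To make the criticized rule concrete: under "leftmost longest match", the matcher picks the earliest starting position and, among matches starting there, the longest one. The brute-force Python sketch below (the name `leftmost_longest` is hypothetical) implements that rule with `re.fullmatch` and contrasts it with Python's own `re.search`, a backtracking, leftmost-first engine. The markup example hints at why the rule is awkward for SGML-like text: a pattern meant to find one element runs to the last closing tag. The paper's alternative rule is not described in the abstract and is not reproduced here.

```python
import re

def leftmost_longest(pattern, text):
    """Return (start, end) of the leftmost-longest match, or None.

    Brute force (roughly cubic in len(text)); for illustration only.
    """
    for start in range(len(text) + 1):
        # Try longer candidates first so the first hit is the longest.
        for end in range(len(text), start - 1, -1):
            if re.fullmatch(pattern, text[start:end]):
                return (start, end)
    return None

print(leftmost_longest(r"a|ab", "ab"))      # (0, 2)
print(re.search(r"a|ab", "ab").span())      # (0, 1): leftmost-first picks "a"

doc = "<b>x</b> and <b>y</b>"
print(leftmost_longest(r"<b>.*</b>", doc))  # (0, 21): spans both elements
```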
Efficient detection of unusual words
 J. Comp. Biol
, 2000
Abstract

Cited by 37 (8 self)
Words that are, by some measure, over- or underrepresented in the context of larger sequences have been variously implicated in biological functions and mechanisms. In most approaches to such anomaly detection, the words (up to a certain length) are enumerated more or less exhaustively and are individually checked in terms of observed and expected frequencies, variances, and scores of discrepancy and their significance. Here we take the global approach of annotating the suffix tree of a sequence with such values and scores, with the aim of using it as a collective detector of all unexpected behaviors, or perhaps just as a preliminary filter for words suspicious enough to undergo more accurate scrutiny. We consider in depth the simple probabilistic model in which sequences are produced by a random source emitting symbols from a known alphabet independently and according to a given distribution. Our main result shows that, within this model, full tree annotations can be carried out in a time- and space-optimal fashion for the mean, the variance, and some of the adopted measures of significance. This result is achieved by an ad hoc embedding of the combinatorial structure of the periods of a string into the statistical expressions. Specifically,
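Under the independent-symbol model described above, the expected number of (possibly overlapping) occurrences of a word and its variance can be written down directly, and the variance is exactly where the periods of the word enter. The Python sketch below computes both for a single word; it is the textbook per-word calculation, not the paper's suffix-tree annotation, which obtains such values for all words at once. The name `word_moments` and the dictionary of symbol probabilities are assumptions for illustration.

```python
from math import prod

def word_moments(w, probs, n):
    """Mean and variance of the number of overlapping occurrences of w
    in a length-n sequence of i.i.d. symbols with probabilities `probs`."""
    k = len(w)
    positions = n - k + 1          # possible starting positions
    if positions <= 0:
        return 0.0, 0.0
    p = prod(probs[c] for c in w)  # P(w occurs at a fixed position)
    mean = positions * p
    var = positions * p * (1 - p)  # sum of the indicator variances
    # Occurrences at distance d >= k are independent; for d < k they
    # can co-occur only if d is a period of w (w[d:] == w[:-d]).
    for d in range(1, k):
        pairs = positions - d      # position pairs at distance d
        if pairs <= 0:
            break
        if w[d:] == w[:-d]:
            # Both occur iff w is followed by its last d symbols.
            joint = p * prod(probs[c] for c in w[k - d:])
        else:
            joint = 0.0
        var += 2 * pairs * (joint - p * p)
    return mean, var

uniform = {"a": 0.5, "b": 0.5}
print(word_moments("aa", uniform, 4))  # (0.75, 0.8125)
print(word_moments("ab", uniform, 4))  # (0.75, 0.3125)
```

The self-overlapping word "aa" has the larger variance because its occurrences cluster, which is the effect the periods of a string capture.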
Discovering frequent episodes in sequences (Extended Abstract)
 In 1st Conference on Knowledge Discovery and Data Mining
, 1995
Abstract

Cited by 7 (0 self)
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. In this paper we consider the problem of recognizing frequent episodes in such sequences of events. An episode is defined to be a collection of events that occur within time intervals of a given size in a given partial order. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We describe an efficient algorithm for the discovery of all frequent episodes from a given class of episodes, and present experimental results.
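As a concrete reading of "episode" and "frequent", here is a minimal Python sketch for serial episodes (the totally ordered special case of the partial orders mentioned above): an episode's frequency is taken to be the fraction of sliding windows of width `win` in which its event types occur in order. The window convention (integer time stamps, a sequence covering [ts, te), window start times from ts - win + 1 to te - 1) and all names are assumptions for illustration; the paper's efficient discovery algorithm, which avoids rescanning every window, is not shown.

```python
def occurs_in_order(events, episode, start, end):
    """True if the event types in `episode` occur in order, at strictly
    increasing times, among events with start <= time < end."""
    matched, last_time = 0, None
    for time, etype in events:            # events are sorted by time
        if not (start <= time < end) or matched == len(episode):
            continue
        # Greedily take the earliest usable event for the next type.
        if etype == episode[matched] and (last_time is None or time > last_time):
            matched += 1
            last_time = time
    return matched == len(episode)

def window_frequency(events, episode, win, ts, te):
    """Fraction of width-`win` windows containing the serial episode.

    Assumed convention: the sequence covers times [ts, te) and window
    start times run from ts - win + 1 to te - 1, so every event falls
    in exactly `win` windows.
    """
    events = sorted(events)
    starts = range(ts - win + 1, te)
    hits = sum(occurs_in_order(events, episode, t, t + win) for t in starts)
    return hits / len(starts)

EVENTS = [(1, "A"), (2, "B"), (4, "A"), (5, "C")]
print(window_frequency(EVENTS, ("A", "B"), 3, 1, 6))  # 2 of 7 windows
print(window_frequency(EVENTS, ("B", "A"), 3, 1, 6))  # 1 of 7 windows
```

An episode would then be called frequent when this fraction reaches a user-chosen threshold.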
Efficient Submatch Addressing for Regular Expressions
, 2001
Abstract

Cited by 6 (0 self)
String pattern matching in its different forms is an important topic in theoretical computer science. This thesis concentrates on the problem of regular expression matching with submatch addressing, where the position and extent of the substrings matched by given subexpressions must be provided. The algorithms in widespread use at the time either take exponential worst-case time to find a match, can handle only a subset of all regular expressions, or use space proportional to the length of the input string where constant space would suffice. This thesis proposes a new method for solving the submatch addressing problem using nondeterministic finite automata with transitions augmented by copy-on-write update operations. The resulting algorithm makes a single pass over the input string, always using time linear in the length of the input. Space consumption depends only on the regular expression used, not on the input string. To the author's knowledge, this is a new result. A prototype of a POSIX.2-compatible regular expression matcher using the algorithm was implemented. Benchmarking results indicate that the prototype compares favorably against some popular implementations. Furthermore, the absence of exponential or polynomial-time worst cases makes it possible to use any regular expression without performance problems, which is not the case with previous implementations or algorithms.
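The thesis's copy-on-write construction is not detailed in the abstract. As a related illustration of how an NFA can report submatch positions in a single pass, with space that depends only on the pattern, here is a minimal Python sketch of a Pike-style VM: each thread carries its own capture slots, and at most one thread per program counter survives each step. The instruction set, the hand-compiled program for `(a|ab)(c|bcd)`, and the function names are assumptions for this sketch. It implements leftmost-first (Perl-style) priority, not the POSIX.2 rules the thesis targets, and it is not the thesis's algorithm.

```python
import re

# Hand-compiled program for the anchored pattern (a|ab)(c|bcd).
# Slots 0/1 hold group 1's start/end, slots 2/3 hold group 2's.
PROG = [
    ("save", 0),        # 0
    ("split", 2, 4),    # 1: prefer "a" over "ab"
    ("char", "a"),      # 2
    ("jmp", 6),         # 3
    ("char", "a"),      # 4
    ("char", "b"),      # 5
    ("save", 1),        # 6
    ("save", 2),        # 7
    ("split", 9, 11),   # 8: prefer "c" over "bcd"
    ("char", "c"),      # 9
    ("jmp", 14),        # 10
    ("char", "b"),      # 11
    ("char", "c"),      # 12
    ("char", "d"),      # 13
    ("save", 3),        # 14
    ("match",),         # 15
]

def add_thread(threads, seen, pc, caps, pos):
    """Follow jmp/split/save from pc and queue threads that consume input."""
    if pc in seen:               # a higher-priority thread already owns pc
        return
    seen.add(pc)
    op = PROG[pc]
    if op[0] == "jmp":
        add_thread(threads, seen, op[1], caps, pos)
    elif op[0] == "split":
        add_thread(threads, seen, op[1], caps, pos)
        add_thread(threads, seen, op[2], caps, pos)
    elif op[0] == "save":
        slot = op[1]
        new_caps = caps[:slot] + (pos,) + caps[slot + 1:]
        add_thread(threads, seen, pc + 1, new_caps, pos)
    else:                        # "char" or "match"
        threads.append((pc, caps))

def pike_match(text, nslots=4):
    """Anchored match at position 0; return capture slots or None."""
    threads = []
    add_thread(threads, set(), 0, (None,) * nslots, 0)
    matched = None
    for i in range(len(text) + 1):  # one pass over the input
        next_threads, seen = [], set()
        for pc, caps in threads:
            op = PROG[pc]
            if op[0] == "char":
                if i < len(text) and text[i] == op[1]:
                    add_thread(next_threads, seen, pc + 1, caps, i + 1)
            else:                # "match": drop lower-priority threads
                matched = caps
                break
        threads = next_threads
        if not threads:
            break
    return matched

m = re.match(r"(a|ab)(c|bcd)", "abcd")
print(pike_match("abcd"))                            # (0, 1, 1, 4)
print((m.start(1), m.end(1), m.start(2), m.end(2)))  # (0, 1, 1, 4)
```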
The Complexity of Testing Ground Reducibility for Linear Word Rewriting Systems with Variables
 In Proceedings 4th International Workshop on Conditional and Typed Term Rewriting Systems
, 1995
Abstract

Cited by 3 (2 self)
In [9] we proved that for a word rewriting system with variables R and a word with variables w, it is undecidable whether w is ground reducible by R, that is, whether all the instances of w obtained by substituting its variables by nonempty words are reducible by R. On the other hand, if R is linear, the question is decidable for arbitrary (linear or nonlinear) w. In this paper we further study the complexity of the above problem and prove that it is coNP-complete if both R and w are restricted to be linear. The proof is based on the construction of a deterministic finite automaton for the language of words reducible by R. The construction generalizes the well-known Aho-Corasick automaton for string matching against a set of keywords.
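For reference, the classical Aho-Corasick automaton that the construction generalizes can be sketched in a few lines of Python: a trie of the keywords, failure links computed breadth-first, and output lists merged along the failure links. This is only the standard keyword-matching automaton, kept as a trie with failure links rather than a fully determinized transition table; the generalization to linear rewriting systems with variables is not attempted, and the function names are assumptions.

```python
from collections import deque

def build_automaton(keywords):
    """Return (goto, fail, out) tables for the Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:                 # 1. build the keyword trie
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    queue = deque(goto[0].values())       # depth-1 states fail to the root
    while queue:                          # 2. failure links, breadth-first
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s].extend(out[fail[s]])   # inherit shorter matching suffixes
    return goto, fail, out

def find_all(keywords, text):
    """Return (start, keyword) for every keyword occurrence in text."""
    goto, fail, out = build_automaton(keywords)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits.extend((i - len(w) + 1, w) for w in out[state])
    return hits

print(find_all(["he", "she", "his", "hers"], "ushers"))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```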