Transducers and repetitions
 Theoretical Computer Science
, 1986
Abstract

Cited by 94 (19 self)
Abstract. The factor transducer of a word associates with each of its factors (or subwords) its first occurrence. Optimal bounds on the size of minimal factor transducers, together with an algorithm for building them, are given. Analogous results and a simple algorithm are given for the case of subsequential suffix transducers. The algorithms are applied to repetition searching in words.
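The factor-to-first-occurrence mapping can be illustrated with a minimal sketch. It assumes the standard online suffix automaton (DAWG) construction, swapped in for illustration; it is not the paper's transducer algorithm. Each state records the end index of its first occurrence instead of emitting outputs on transitions. The names `build_suffix_automaton` and `first_occurrence` are hypothetical.

```python
# Sketch: online suffix automaton (DAWG) that stores, per state, the end
# index of the first occurrence of the words in that state.
# Assumption: standard suffix-automaton construction, not the paper's method.

def build_suffix_automaton(s):
    length = [0]   # length of the longest word in each state
    link = [-1]    # suffix link of each state
    nxt = [{}]     # transitions: letter -> state
    first = [-1]   # end index of the first occurrence of the state's words
    last = 0
    for i, c in enumerate(s):
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        first.append(i)
        p = last
        while p != -1 and c not in nxt[p]:
            nxt[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone inherits q's transitions and first occurrence.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                first.append(first[q])
                while p != -1 and nxt[p].get(c) == q:
                    nxt[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, first


def first_occurrence(nxt, first, u):
    """Start index of the first occurrence of u, or None if u is not a factor."""
    if not u:
        return 0
    state = 0
    for c in u:
        if c not in nxt[state]:
            return None
        state = nxt[state][c]
    return first[state] - len(u) + 1
```

For example, after `nxt, first = build_suffix_automaton("abcbc")`, the call `first_occurrence(nxt, first, "bc")` returns 1. A clone copies the first occurrence of the state it splits, because the clone's words first end at the same position.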
Complete inverted files for efficient text retrieval and analysis
 Journal of the ACM
, 1987
Abstract

Cited by 59 (1 self)
Abstract. Given a finite set of texts $S = \{w_1, \ldots, w_k\}$ over some fixed finite alphabet $\Sigma$, a complete inverted file for S is an abstract data type that provides the functions find(w), which returns the longest prefix of w that occurs (as a subword of a word) in S; freq(w), which returns the number of times w occurs in S; and locations(w), which returns the set of positions where w occurs in S. A data structure that implements a complete inverted file for S, occupies linear space, and can be built in linear time in the uniform-cost RAM model is given. Using this data structure, the time for each of the above query functions is optimal. To accomplish this, techniques from the theory of finite automata and the work on suffix trees are used to build a deterministic finite automaton that recognizes the set of all subwords of the set S. This automaton is then annotated with additional information and compacted to facilitate the desired query functions. The result is a data structure that is smaller and more flexible than the suffix tree.
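The three query functions can be made concrete with a small sketch. It assumes a naive sorted list of all suffixes (a simple suffix array) in place of the paper's compacted automaton. That costs quadratic space and is for illustration only. The class name `NaiveInvertedFile` is hypothetical. Positions are reported as (text index, start offset) pairs.

```python
from bisect import bisect_left


class NaiveInvertedFile:
    """Hypothetical helper answering find / freq / locations over a set of texts.

    Assumption: a sorted list of every suffix (a naive suffix array) stands in
    for the paper's compacted automaton; building it takes O(n^2 log n) time.
    """

    def __init__(self, texts):
        entries = sorted(
            (t[i:], k, i) for k, t in enumerate(texts) for i in range(len(t))
        )
        self.keys = [suffix for suffix, _, _ in entries]
        self.positions = [(k, i) for _, k, i in entries]

    def _range(self, w):
        # Suffixes that start with w form one contiguous run in sorted order.
        lo = bisect_left(self.keys, w)
        hi = lo
        while hi < len(self.keys) and self.keys[hi].startswith(w):
            hi += 1
        return lo, hi

    def freq(self, w):
        """Number of occurrences of w in the texts."""
        lo, hi = self._range(w)
        return hi - lo

    def locations(self, w):
        """Sorted (text index, start offset) pairs where w occurs."""
        lo, hi = self._range(w)
        return sorted(self.positions[lo:hi])

    def find(self, w):
        """Longest prefix of w that occurs somewhere in the texts."""
        for end in range(len(w), 0, -1):
            if self.freq(w[:end]) > 0:
                return w[:end]
        return ""
```

With `idx = NaiveInvertedFile(["abab", "bab"])`, `idx.freq("ab")` is 3 and `idx.find("abba")` is `"ab"`. The paper's point is to answer the same queries in optimal time from a linear-size structure. This sketch only fixes their semantics.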
Feedback shift registers, 2-adic span, and combiners with memory
 Journal of Cryptology
, 1997
Abstract

Cited by 50 (7 self)
Feedback shift registers with carry operation (FCSR’s) are described, implemented, and analyzed with respect to memory requirements, initial loading, period, and distributional properties of their output sequences. Many parallels with the theory of linear feedback shift registers (LFSR’s) are presented, including a synthesis algorithm (analogous to the Berlekamp-Massey algorithm for LFSR’s) which, for any pseudorandom sequence, constructs the smallest FCSR which will generate the sequence. These techniques are used to attack the summation cipher. This analysis gives a unified approach to the study of pseudorandom sequences, arithmetic codes, combiners with memory, and the Marsaglia-Zaman random number generator. Possible variations on the FCSR architecture are indicated at the end. Index Terms – Binary sequence, shift register, stream cipher, combiner with memory, cryptanalysis, 2-adic numbers, arithmetic code, 1/q sequence, linear span.
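The carry operation can be shown with a minimal sketch of a binary FCSR. It assumes the usual formulation. With tap bits $q_1, \ldots, q_r$, register bits $a_{n-1}, \ldots, a_{n-r}$ and carry $m$, each step computes $\sigma = \sum_{i=1}^{r} q_i a_{n-i} + m$. It then outputs $a_n = \sigma \bmod 2$ and keeps $m = \lfloor \sigma / 2 \rfloor$ as the new carry. The connection integer is $q = -1 + \sum_{i=1}^{r} q_i 2^i$. The function names are hypothetical.

```python
def connection_integer(taps):
    """q = -1 + sum of q_i * 2^i, where taps[i - 1] = q_i."""
    return -1 + sum(q << i for i, q in enumerate(taps, start=1))


def fcsr_output(taps, init_bits, memory, n):
    """First n output bits of a binary FCSR.

    taps[i - 1] is the tap bit q_i, init_bits = [a_0, ..., a_{r-1}] is the
    initial register contents, and memory is the initial carry m.
    """
    r = len(taps)
    if len(init_bits) != r:
        raise ValueError("need one initial bit per register cell")
    seq = list(init_bits)
    m = memory
    while len(seq) < n:
        # sigma = q_1 * a_{n-1} + ... + q_r * a_{n-r} + m
        sigma = sum(q * seq[-i] for i, q in enumerate(taps, start=1)) + m
        seq.append(sigma & 1)  # new output bit: sigma mod 2
        m = sigma >> 1         # new carry: sigma div 2
    return seq[:n]
```

With taps `[1, 1]`, the connection integer is 5. Starting from bits `[1, 0]` with carry 0, `fcsr_output([1, 1], [1, 0], 0, 10)` gives `[1, 0, 1, 1, 0, 0, 1, 1, 0, 0]`. This output is periodic with period 4, which is the multiplicative order of 2 modulo 5.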
The Context-Tree Weighting Method: Extensions
 IEEE Transactions on Information Theory
, 1994
Abstract

Cited by 35 (1 self)
First we modify the basic (binary) context-tree weighting method such that the past symbols $x_{1-D}, x_{2-D}, \ldots, x_0$ are not needed by the encoder and the decoder. Then we describe how to make the context tree depth D infinite, which results in optimal redundancy behavior for all tree sources, while the number of records in the context tree is not larger than $2T - 1$. Here T is the length of the source sequence. For this extended context-tree weighting algorithm we show that with probability one the compression ratio is not larger than the source entropy for source sequence length $T \to \infty$ for stationary and ergodic sources. Keywords: Sequential data compression, universal source coding, tree sources, modeling procedure, cumulative redundancy bounds, binary stationary and ergodic sources. 1. Introduction. The context-tree weighting method, first presented at the San Antonio ISIT [7], appears to be an efficient implementation for weighting (mixing) the cod...
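For reference, here is a minimal sketch of the basic binary method that these extensions start from. It assumes the Krichevsky-Trofimov estimator $P_e(a, b)$ at every node. Leaves at depth D use $P_e$. Each internal node s uses $P_w^s = \tfrac{1}{2} P_e(a_s, b_s) + \tfrac{1}{2} P_w^{0s} P_w^{1s}$. The sketch fills the past symbols $x_{1-D}, \ldots, x_0$ with zeros, and that fixed past is exactly the requirement the first extension removes. It computes exact probabilities with `Fraction` and does no arithmetic coding. The function names are hypothetical.

```python
from fractions import Fraction


def kt(a, b):
    """Krichevsky-Trofimov probability of a sequence with a zeros and b ones."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2)
    for j in range(b):
        p *= Fraction(2 * j + 1, 2)
    for k in range(a + b):
        p /= k + 1
    return p


def ctw_probability(x, depth, past=None):
    """Weighted probability P_w of the binary sequence x (basic CTW, depth D)."""
    if past is None:
        past = [0] * depth  # assumption: fixed all-zero past symbols
    full = list(past) + list(x)
    counts = {}  # context (most recent symbol first) -> [zeros, ones]
    for t in range(len(past), len(full)):
        ctx = tuple(reversed(full[t - depth:t]))
        for l in range(depth + 1):
            counts.setdefault(ctx[:l], [0, 0])[full[t]] += 1

    def pw(s):
        a, b = counts.get(s, (0, 0))
        if a + b == 0:
            return Fraction(1)  # an unvisited subtree contributes probability 1
        pe = kt(a, b)
        if len(s) == depth:
            return pe
        return (pe + pw(s + (0,)) * pw(s + (1,))) / 2

    return pw(())
```

For example, `ctw_probability([0, 1], 1)` equals `Fraction(1, 8)`. Summed over all $2^T$ sequences of a fixed length T, the weighted probabilities add up to 1.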
On Compact Directed Acyclic Word Graphs
 Structures in Logic and Computer Science
, 1997
Abstract

Cited by 18 (3 self)
The Directed Acyclic Word Graph (DAWG) is a space-efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by DAWGs.
Suffix Trees and String Complexity
 Advances in Cryptology: Proc. of EUROCRYPT, LNCS 658
, 1992
Abstract

Cited by 3 (0 self)
Let $s = (s_1, s_2, \ldots, s_n)$ be a sequence of characters where $s_i \in \mathbb{Z}_p$ for $1 \le i \le n$. One measure of the complexity of the sequence s is the length of the shortest feedback shift register that will generate s, which is known as the maximum order complexity of s [17, 18]. We provide a proof that the expected length of the shortest feedback register to generate a sequence of length n is less than $2 \log_p n + o(1)$, and also give several other statistics of interest for distinguishing random strings. The proof is based on relating the maximum order complexity to a data structure known as a suffix tree. 1 Introduction. A common form of stream cipher is the so-called running key cipher [4, 9], a deterministic approximation to the one-time pad. A running key cipher generates an ultimately periodic sequence $s = (s_1, s_2, \ldots, s_n)$, $s_i \in \mathbb{Z}_p$, $1 \le i \le n$, for a given seed or key K. Encryption is performed as with the one-time pad, using s as the key stream, but perfect secu...
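The quantity being bounded can be made concrete by a brute-force sketch that follows the definition directly. A feedback shift register of order k (with an arbitrary, possibly nonlinear, feedback function) can generate s exactly when every length-k window of s that has a successor always has the same successor. This is not the paper's suffix-tree method, and it runs in roughly cubic time. The function name is hypothetical.

```python
def max_order_complexity(s):
    """Smallest k such that every length-k window of s determines its successor.

    Brute-force check of the definition; the cited paper instead relates this
    quantity to the suffix tree of s.
    """
    n = len(s)
    for k in range(n):
        successor = {}
        consistent = True
        for i in range(n - k):
            window = tuple(s[i:i + k])
            nxt = s[i + k]
            # A repeated window followed by a different symbol rules out order k.
            if successor.setdefault(window, nxt) != nxt:
                consistent = False
                break
        if consistent:
            return k
    return 0  # only reached for the empty sequence
```

For example, `max_order_complexity([0, 1, 0, 0])` is 2. The window `(0,)` is followed by both 1 and 0, but each length-2 window has a single successor.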
Efficient Variants of the Backward-Oracle-Matching Algorithm
Abstract

Cited by 3 (3 self)
Abstract. In this article we present two variants of the BOM string matching algorithm that are more efficient and flexible than the original algorithm. We also present bit-parallel versions of them, obtaining an efficient variant of the BNDM algorithm. We then compare the newly presented algorithms with some of the most recent and effective string matching algorithms. The new variants turn out to be very flexible and to achieve very good results, especially for large alphabets.
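As background for the bit-parallel versions mentioned above, here is a minimal sketch of the classical BNDM algorithm. It is not one of the paper's new variants. The search reads each window from right to left. A bit mask D tracks which factors of the pattern match the text read so far, and each time a pattern prefix is recognized it records a safe shift. The function name is hypothetical.

```python
def bndm_search(text, pattern):
    """Start indices of all occurrences of pattern in text (classical BNDM)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    mask = (1 << m) - 1
    # B[c] has bit (m - 1 - j) set when pattern[j] == c.
    B = {}
    for j, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - j))
    matches = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        last = m
        D = mask  # initially every factor of the pattern is possible
        while D:
            D &= B.get(text[pos + j], 0)
            if D & (1 << (m - 1)):
                # text[pos + j : pos + m] is a prefix of the pattern.
                if j > 0:
                    last = j  # shift that aligns the pattern with this prefix
                else:
                    matches.append(pos)
            j -= 1
            D = (D << 1) & mask
        pos += last
    return matches
```

For example, `bndm_search("abracadabra", "abra")` returns `[0, 7]`. Because the mask is limited to m bits, this form suits short patterns. Python integers hide the machine-word limit that C implementations have to handle.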
Factor oracle, Suffix oracle
, 1999
Abstract

Cited by 2 (1 self)
We introduce a new automaton on a word p, a sequence of letters taken in an alphabet $\Sigma$, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m + 1 states (where m is the length of p) and a linear number of transitions. We give an online construction algorithm for the factor oracle. The tight links between this structure and the suffix automaton allow us to introduce a second structure: the suffix oracle. We use these two structures in string matching algorithms that, according to the experimental results, we conjecture to be optimal. These algorithms are as efficient as existing ones while using less memory and being easier to implement. 1 Introduction. A word p is a finite sequence $p = p_1 p_2 \ldots p_m$ of letters taken in an alphabet $\Sigma$. We will keep the notation p throughout this paper to designate the word on which we are working. We want to build an automaton (a) that is acyclic (b) that recognizes at least the factors of p (c) that has the fewer sta...
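The online construction can be sketched as follows. It assumes the usual formulation with a supply function S (S(0) = -1). Appending letter c to the oracle of $p_1 \ldots p_i$ creates state i + 1 with the transition from i. It then follows supply links from S(i), adding c-transitions to state i + 1 until some state already has a c-transition. Every state is terminal. The function names are hypothetical.

```python
def factor_oracle(p):
    """Online factor oracle: returns (transitions, supply) for word p."""
    trans = [{}]   # trans[state][letter] -> state
    supply = [-1]  # supply function S, with S(0) = -1
    for i, c in enumerate(p):
        new = i + 1
        trans.append({})
        trans[i][c] = new  # spine transition i -> i + 1
        k = supply[i]
        # Walk supply links, adding external c-transitions to the new state.
        while k > -1 and c not in trans[k]:
            trans[k][c] = new
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][c])
    return trans, supply


def oracle_accepts(trans, w):
    """All states are terminal, so w is accepted iff its path exists."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

For p = abbbaab, the oracle has 8 states and 11 transitions, and it accepts every factor of p. It also accepts aba, which is not a factor. This shows the "at least the factors" property in practice.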
India.
Abstract
In this study, a new algorithm for the traditional pattern matching problem is proposed. It is a modified version of the KMP algorithm that uses a bitwise XOR operation to process two characters (or bytes) in parallel, speeding up the pattern matching process. An additional loop has also been introduced to avoid undesirable comparisons, letting the algorithm initiate and continue only the essential comparisons from the required location. Because the new algorithm combines the finite-automaton principle used by KMP with bitwise XOR to speed up character matching, it shows a reasonable performance improvement. It is also easy to implement, as it requires no additional or complex data structures, and it is suitable for DNA sequence search.
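The abstract does not give enough detail to reproduce the XOR-based two-character comparison or the extra loop. The following is therefore only a sketch of the standard KMP algorithm that the study modifies, run here on a DNA string. The function names are hypothetical.

```python
def kmp_failure(p):
    """f[i] = length of the longest proper border of p[:i + 1]."""
    f = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = f[k - 1]
        if p[i] == p[k]:
            k += 1
        f[i] = k
    return f


def kmp_search(text, p):
    """Start indices of all occurrences of p in text (standard KMP)."""
    if not p:
        return []
    f = kmp_failure(p)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != p[k]:
            k = f[k - 1]  # fall back instead of re-reading text
        if c == p[k]:
            k += 1
        if k == len(p):
            matches.append(i - len(p) + 1)
            k = f[k - 1]  # continue so that overlapping matches are found
    return matches
```

For example, `kmp_search("ACGTACGT", "CGT")` returns `[1, 5]`. Each text character is read once, and the failure function determines how far the match state falls back after a mismatch.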