Results 1-10 of 10
Information Theory and Statistics
, 1968
Cited by 1146 (0 self)
Entropy and relative entropy are proposed as features extracted from symbol sequences. First, a suitable Iterated Function System is driven by the sequence, producing a fractal-like representation (CSR) at low computational cost. Then, two entropic measures are applied to the histogram of the CSR and theoretically justified. Examples are included.
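As a rough illustration of the pipeline this abstract describes, the sketch below drives a four-map IFS with a symbol sequence, bins the resulting points, and computes the entropy and relative entropy of the histogram. The DNA alphabet, the corner assignment `CORNERS`, the contraction ratio 1/2, the grid size, and the choice of reference distribution are all assumptions made for the example; the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical corner assignment for a 4-symbol alphabet (an assumption).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def ifs_points(seq, start=(0.5, 0.5)):
    """Drive the IFS: each symbol moves the point halfway to its corner."""
    x, y = start
    points = []
    for s in seq:
        cx, cy = CORNERS[s]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def histogram(points, bins):
    """Normalised 2-D histogram over a bins x bins grid of the unit square."""
    counts = Counter(
        (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))
        for x, y in points
    )
    total = len(points)
    return {cell: c / total for cell, c in counts.items()}


def entropy(p):
    """Shannon entropy in bits of a histogram given as {cell: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def relative_entropy(p, q):
    """D(p || q) in bits; q must be positive wherever p is positive."""
    return sum(v * math.log2(v / q[k]) for k, v in p.items() if v > 0)
```

With a 2 x 2 grid each cell corresponds to the most recent symbol, so a sequence that uses all four symbols equally often gives an entropy of 2 bits and zero relative entropy against a uniform reference. Finer grids resolve longer suffixes.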
Spatial Representation of Symbolic Sequences through Iterative Function Systems
 IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans
, 1998
Cited by 24 (11 self)
Jeffrey proposed a graphic representation of DNA sequences using Barnsley's iterative function systems. In spite of further developments in this direction, the proposed graphic representation of DNA sequences has lacked a rigorous connection between its spatial scaling characteristics and the statistical characteristics of the DNA sequences themselves. We 1) generalize Jeffrey's graphic representation to accommodate (possibly infinite) sequences over an arbitrary finite number of symbols, 2) establish a direct correspondence between the statistical characterization of symbolic sequences via Rényi entropy spectra and the multifractal characteristics (Rényi generalized dimensions) of the sequences' spatial representations, and 3) show that for general symbolic dynamical systems, the multifractal f_H spectra in the sequence space coincide with the f_H spectra on spatial sequence representations. Keywords: Multifractal theory, Iterative function systems, Chaos game representation...
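For readers unfamiliar with the statistical side of this correspondence, the following sketch computes the Rényi entropy of order q of the empirical n-block distribution of a sequence, using the standard definition H_q = log2(sum_i p_i^q) / (1 - q), with the Shannon entropy as the q = 1 limit. The function names and the overlapping-block estimator are assumptions for illustration; this is not the paper's construction of the entropy spectra.

```python
import math
from collections import Counter


def block_distribution(seq, n):
    """Empirical probabilities of the overlapping n-blocks of seq."""
    blocks = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(blocks)
    return [c / total for c in Counter(blocks).values()]


def renyi_entropy(probs, q):
    """Renyi entropy of order q in bits; q = 1 is the Shannon limit."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(sum(p ** q for p in probs if p > 0)) / (1.0 - q)
```

Evaluating `renyi_entropy(block_distribution(seq, n), q) / n` over a range of q and growing n gives a crude estimate of an entropy-rate spectrum, which is the kind of quantity the abstract relates to generalized dimensions.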
Constructing Finite-Context Sources From Fractal Representations of Symbolic Sequences
, 1998
Cited by 6 (4 self)
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffixes are likely to produce similar continuations) in that the longer the common suffix shared by any two n-blocks, the closer their point representations lie. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
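The suffix-closeness property stated in this abstract can be checked numerically with a small sketch. It assumes a four-symbol alphabet mapped to the corners of the unit square and a contraction ratio `k = 0.5`; both are illustrative choices, and the vector quantization step that selects the prediction contexts is not shown.

```python
import math

# Hypothetical symbol-to-corner assignment in a 2-D unit hypercube (an assumption).
DNA_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def block_to_point(block, corners=DNA_CORNERS, k=0.5):
    """Map an n-block to a point by iterating x -> k*x + (1-k)*corner(symbol)."""
    dim = len(next(iter(corners.values())))
    x = [0.5] * dim  # start at the centre of the hypercube
    for s in block:
        x = [k * xi + (1.0 - k) * ci for xi, ci in zip(x, corners[s])]
    return tuple(x)


def suffix_bound(shared_suffix_len, dim=2, k=0.5):
    """Upper bound on the distance between blocks sharing a suffix of this length."""
    return (k ** shared_suffix_len) * math.sqrt(dim)
```

Two blocks that share a suffix of length L start that suffix at most one hypercube diagonal apart, and every shared symbol shrinks the gap by the factor `k`, which is where the bound in `suffix_bound` comes from. For example, `math.dist(block_to_point("AAACGT"), block_to_point("TTTCGT"))` stays below `suffix_bound(3)`, while blocks that share only a prefix can end up far apart.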
Extracting Finite State Representations from Recurrent Neural Networks trained on Chaotic Symbolic Sequences
 IEEE Transactions on Neural Networks
, 1999
Cited by 5 (5 self)
While much work has been done in neural-based modeling of real-valued chaotic time series, little effort has been devoted to addressing similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaotic symbolic sequences. Even though training RNNs to predict the next symbol leaves the standard performance measures, such as the mean square error on the network output, virtually unchanged, the networks nevertheless do extract a lot of knowledge. We monitor the knowledge extraction process by treating the networks as stochastic sources and letting them generate sequences, which are then compared with the training sequence via information-theoretic entropy and cross-entropy measures. We also study the possibility of reformulating the knowledge gained by RNNs in the compact and easy-to-analyze form of finite state stochastic machines. The experiments are performed on two sequences with different...
Universal Sequence Map (USM) of Arbitrary Discrete Sequences
Cited by 5 (2 self)
For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but has not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space, conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis without the constraint of a fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.
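One way to make the last sentence concrete is to note that, with a contraction ratio of 1/2, the binary digits of an iterated-map coordinate spell out the most recent symbols, so two points that agree in their leading bits share a suffix. The sketch below uses fixed-point integers to avoid rounding; the 32-bit width, the two-bit corner code `CORNER_BITS`, and the function names are assumptions for illustration and are not the USM procedure itself.

```python
BITS = 32  # fixed-point precision; suffixes longer than this are not resolved

# Hypothetical two-bit code per symbol (one bit per coordinate), an assumption.
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_fixed_point(seq):
    """Iterate x -> (x + corner) / 2 on BITS-bit integers, one point per symbol."""
    x = y = 0
    coords = []
    for s in seq:
        bx, by = CORNER_BITS[s]
        # Halving is a right shift; adding the corner sets the top bit.
        x = (x >> 1) | (bx << (BITS - 1))
        y = (y >> 1) | (by << (BITS - 1))
        coords.append((x, y))
    return coords


def shared_suffix_length(p, q):
    """Number of leading bits on which both coordinates of p and q agree.

    Positions before the start of a sequence behave like the symbol coded
    (0, 0), and the result is capped at BITS.
    """
    diff = (p[0] ^ q[0]) | (p[1] ^ q[1])
    return BITS - diff.bit_length()
```

For instance, the final points of `"TTTTACGT"` and `"GGGCACGT"` agree on exactly four leading bits, matching their common suffix `ACGT`.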
Dynamics and topographic organization in recursive self-organizing map
 NEURAL COMPUTATION
, 2006
Cited by 4 (1 self)
Recently, there has been an outburst of interest in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, at present, there is no general consensus as to how best to process sequences using topographic maps, and this topic remains a very active focus of current neurocomputational research. The representational capabilities and internal representations of the models are not well understood. We rigorously analyze a generalization of the Self-Organizing Map (SOM) for processing sequential data, Recursive SOM (RecSOM) (Voegtlin, 2002), as a non-autonomous dynamical system consisting of a set of fixed input maps. We argue that contractive fixed input maps are likely to produce Markovian organizations of receptive fields on the RecSOM map. We derive bounds on parameter β (weighting the importance of importing past information when processing sequences) under which contractiveness of the fixed input maps is guaranteed. Some generalizations of SOM contain a dynamic module responsible for processing temporal contexts as an integral part of the model. We show that Markovian topographic maps of sequential data can be produced using a simple fixed (non-adaptable) dynamic module externally feeding a standard topographic model designed to process static vectorial data of fixed dimensionality (e.g. SOM). However, by allowing trainable feedback connections one can obtain Markovian maps with superior memory depth and topography preservation. We elaborate upon the importance of non-Markovian organizations in topographic maps of sequential data.
Local Renyi entropic profiles of DNA sequences
 BMC BIOINFORMATICS
, 2007
Cited by 1 (0 self)
Background
In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored therein was based on the Rényi entropy of a probability density function (pdf) estimated using Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs.
Results
The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at http://kdbio.inescid.pt/~svinga/ep/.
Conclusion
The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
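The continuous entropy mentioned under Background can be illustrated in its simplest special case. The sketch below assumes a Gaussian Parzen kernel of width `sigma` and Rényi order 2, for which the entropy of the kernel density estimate of 2-D points has a closed form as a double sum over pairs of points. Both choices are assumptions made for brevity; the fractal kernel and the entropic profiles of the report are not reproduced here.

```python
import math


def quadratic_renyi_entropy(points, sigma):
    """Order-2 Renyi entropy (in nats) of a Gaussian Parzen estimate in 2-D.

    Integrating the squared density estimate turns each pair of kernels of
    variance sigma^2 into one Gaussian of variance 2*sigma^2 evaluated at
    the difference of the two points.
    """
    n = len(points)
    norm = 1.0 / (4.0 * math.pi * sigma ** 2)  # 2-D Gaussian with variance 2*sigma^2
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += norm * math.exp(-d2 / (4.0 * sigma ** 2))
    return -math.log(total / n ** 2)
```

Points that pile up in one region of the map, as a repetitive sequence would produce, give a lower value than points spread across the unit square. The double loop is quadratic in the number of points, so this form is only practical for short sequences.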
Open Access
, 2006
Cited by 1 (0 self)
Background: Chaos game representation of genome sequences has been used for visual representation of genome sequence patterns as well as alignment-free comparisons of sequences based on oligonucleotide frequencies. However, the potential of this representation for making alignment-based comparisons of whole genome sequences has not been exploited. Results: We present here a fast algorithm for identifying all local alignments between two long DNA sequences using the sequence information contained in CGR points. The local alignments can be depicted graphically in a dot-matrix plot or in text form, and the significant similarities and differences between the two sequences can be identified. We demonstrate the method through comparison of whole genomes of several microbial species. Given two closely related genomes, we generate information on mismatches, insertions, deletions and shuffles that differentiate the two genomes. Conclusion: Adding large-scale sequence alignment to the repertoire of alignment-free sequence analysis applications of chaos game representation positions CGR as a powerful sequence analysis tool.
APPLICATION OF INFORMATION THEORY TO DNA SEQUENCE ANALYSIS: A REVIEW
The analysis of DNA sequences through information theory methods is reviewed from its beginnings in the 1970s. The subject is addressed within a broad context, describing in some detail the cornerstone contributions in the field. The emerging interest concerning long-range correlations and the mosaic structure of DNA sequences is considered from our own point of view. A recent procedure developed by the authors is also outlined. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.
unknown title
, 2005
www.elsevier.com/locate/gene The spectrum of genomic signatures: from dinucleotides to chaos game representation