Information Theory and Statistics
, 1968
Cited by 873 (0 self)
Entropy and relative entropy are proposed as features extracted from symbol sequences. First, a suitable Iterated Function System is driven by the sequence, producing a fractal-like representation (CSR) at low computational cost. Then, two entropic measures are applied to the histogram of the CSR and theoretically justified. Examples are included.
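As a rough illustration of the pipeline this abstract describes, the sketch below drives a four-map iterated function system with a DNA string, bins the resulting points into a grid histogram, and computes the Shannon entropy of that histogram. The corner assignment, the contraction ratio of 1/2, the grid size, and the function names are all assumptions made for illustration; none of them is taken from the paper.

```python
import math
from collections import Counter

# Assumed (hypothetical) assignment of nucleotides to corners of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def chaos_game_points(seq, ratio=0.5):
    """Return one 2-D point per symbol by moving part-way toward its corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for s in seq:
        cx, cy = CORNERS[s]
        x = x + ratio * (cx - x)
        y = y + ratio * (cy - y)
        points.append((x, y))
    return points

def histogram_entropy(points, bins=4):
    """Shannon entropy (bits) of the occupancy histogram on a bins x bins grid."""
    counts = Counter(
        (min(int(px * bins), bins - 1), min(int(py * bins), bins - 1))
        for px, py in points
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

pts = chaos_game_points("ACGTACGTTTGACA")
print(histogram_entropy(pts, bins=4))
```

With a ratio of 1/2 and a 2 x 2 grid, each point falls in the quadrant of its most recent symbol, so the histogram reduces to single-symbol frequencies. Finer grids correspond to longer suffixes.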
Spatial Representation of Symbolic Sequences through Iterative Function Systems
- IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans
, 1998
Cited by 21 (11 self)
Jeffrey proposed a graphic representation of DNA sequences using Barnsley's iterative function systems. In spite of further developments in this direction, the proposed graphic representation of DNA sequences has been lacking a rigorous connection between its spatial scaling characteristics and the statistical characteristics of the DNA sequences themselves. We 1) generalize Jeffrey's graphic representation to accommodate (possibly infinite) sequences over an arbitrary finite number of symbols, 2) establish a direct correspondence between the statistical characterization of symbolic sequences via Rényi entropy spectra and the multifractal characteristics (Rényi generalized dimensions) of the sequences' spatial representations, 3) show that for general symbolic dynamical systems, the multifractal f_H-spectra in the sequence space coincide with the f_H-spectra on spatial sequence representations. Keywords: Multifractal theory, Iterative function systems, Chaos game representation...
Constructing Finite-Context Sources From Fractal Representations of Symbolic Sequences
, 1998
Cited by 6 (4 self)
We propose a novel approach to constructing predictive models on long complex symbolic sequences. The models are constructed by first transforming the training sequence n-block structure into a spatial structure of points in a unit hypercube. The transformation between the symbolic and Euclidean spaces embodies a natural smoothness assumption (n-blocks with long common suffixes are likely to produce similar continuations) in that the longer the common suffix shared by any two n-blocks, the closer their point representations lie. Finding a set of prediction contexts is then formulated as a resource allocation problem solved by vector quantizing the spatial representation of the training sequence n-block structure. Our predictive models are similar in spirit to variable memory length Markov models (VLMMs). We compare the proposed models with both the classical and variable memory length Markov models on two chaotic symbolic sequences with different levels of subsequence distribution ...
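The suffix-closeness property stated in this abstract can be sketched with a simple contractive map. The version below is a minimal illustration, not the authors' construction: the vertex assignment, the contraction coefficient k = 1/2, the starting point, and the names `symbol_vertices` and `block_to_point` are assumptions.

```python
import itertools
import math

def symbol_vertices(alphabet):
    """Assign each symbol a distinct vertex of a unit hypercube (assumed scheme)."""
    dim = max(1, math.ceil(math.log2(len(alphabet))))
    corners = itertools.product((0.0, 1.0), repeat=dim)
    return {s: c for s, c in zip(alphabet, corners)}

def block_to_point(block, vertices, k=0.5):
    """Map an n-block to a point; later symbols dominate the final position."""
    dim = len(next(iter(vertices.values())))
    x = [0.5] * dim  # start at the hypercube centre
    for s in block:
        v = vertices[s]
        x = [k * xi + (1 - k) * vi for xi, vi in zip(x, v)]
    return tuple(x)

verts = symbol_vertices("abcd")
p_ref = block_to_point("dcab", verts)
p_suffix = block_to_point("abab", verts)  # shares the suffix "ab" with "dcab"
p_prefix = block_to_point("dcba", verts)  # shares only the prefix "dc"

print(math.dist(p_ref, p_suffix))  # smaller distance
print(math.dist(p_ref, p_prefix))  # larger distance
```

Under these assumptions, two blocks sharing a suffix of length L lie within k^L times the hypercube diagonal of each other. Clustering the points (for example with a standard vector quantizer) would then group blocks by shared recent history.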
Extracting Finite State Representations from Recurrent Neural Networks trained on Chaotic Symbolic Sequences
- IEEE Transactions on Neural Networks
, 1999
Cited by 5 (5 self)
While much work has been done in neural-based modeling of real-valued chaotic time series, little effort has been devoted to addressing similar problems in the symbolic domain. We investigate the knowledge induction process associated with training recurrent neural networks (RNNs) on single long chaotic symbolic sequences. Even though training RNNs to predict the next symbol leaves the standard performance measures such as the mean square error on the network output virtually unchanged, the networks nevertheless do extract a lot of knowledge. We monitor the knowledge extraction process by treating the networks as stochastic sources and letting them generate sequences, which are then compared with the training sequence via information-theoretic entropy and cross-entropy measures. We also study the possibility of reformulating the knowledge gained by RNNs in the compact and easy-to-analyze form of finite state stochastic machines. The experiments are performed on two sequences with different...
Dynamics and topographic organization in recursive self-organizing map
- NEURAL COMPUTATION
, 2006
Cited by 4 (1 self)
Recently, there has been an outburst of interest in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, at present, there is no general consensus as to how best to process sequences using topographic maps and this topic remains a very active focus of current neurocomputational research. The representational capabilities and internal representations of the models are not well understood. We rigorously analyze a generalization of the Self-Organizing Map (SOM) for processing sequential data, Recursive SOM (RecSOM) (Voegtlin, 2002), as a non-autonomous dynamical system consisting of a set of fixed input maps. We argue that contractive fixed input maps are likely to produce Markovian organizations of receptive fields on the RecSOM map. We derive bounds on parameter β (weighting the importance of importing past information when processing sequences) under which contractiveness of the fixed input maps is guaranteed. Some generalizations of SOM contain a dynamic module responsible for processing temporal contexts as an integral part of the model. We show that Markovian topographic maps of sequential data can be produced using a simple fixed (non-adaptable) dynamic module externally feeding a standard topographic model designed to process static vectorial data of fixed dimensionality (e.g. SOM). However, by allowing trainable feedback connections one can obtain Markovian maps with superior memory depth and topography preservation. We elaborate upon the importance of non-Markovian organizations in topographic maps of sequential data.
Universal sequence map (USM) of arbitrary discrete sequences
, 2002
Cited by 3 (1 self)
Background: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space that conserve all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis -- without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units.
Local Renyi entropic profiles of DNA sequences
- BMC BIOINFORMATICS
, 2007
Cited by 1 (0 self)
Background
In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored there was based on the Rényi entropy of a probability density function (pdf) estimated with the Parzen window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs.
Results
The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/.
Conclusion
The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
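For orientation, the continuous entropy mentioned in the Background can be sketched as a Rényi entropy of a Parzen window density estimate over map coordinates. The example below is a minimal sketch assuming order 2 and an isotropic Gaussian kernel, a combination chosen here because the integral of the squared density then has a closed form. It does not implement the fractal kernel or the entropic profiles of this report, and the function name and parameter `sigma` are hypothetical.

```python
import math

def renyi2_parzen(points, sigma):
    """Quadratic Renyi entropy (nats) of a Gaussian Parzen density estimate.

    Uses the identity that the integral of the product of two Gaussians with
    variance sigma^2 is a Gaussian with variance 2*sigma^2 evaluated at the
    difference of their centres.
    """
    n = len(points)
    d = len(points[0])
    norm = (4.0 * math.pi * sigma ** 2) ** (-d / 2.0)
    total = 0.0
    for p in points:
        for q in points:
            sq = sum((a - b) ** 2 for a, b in zip(p, q))
            total += norm * math.exp(-sq / (4.0 * sigma ** 2))
    return -math.log(total / n ** 2)

clustered = [(0.5, 0.5)] * 4
spread = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
print(renyi2_parzen(clustered, sigma=0.1))  # lower entropy
print(renyi2_parzen(spread, sigma=0.1))     # higher entropy
```

The double loop is quadratic in the number of points, so this form is only practical for short sequences. Varying `sigma` changes the scale at which clustering of coordinates is detected.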
APPLICATION OF INFORMATION THEORY TO DNA SEQUENCE ANALYSIS: A REVIEW
The analysis of DNA sequences through information theory methods is reviewed from its beginnings in the 1970s. The subject is addressed within a broad context, describing in some detail the cornerstone contributions in the field. The emerging interest concerning long-range correlations and the mosaic structure of DNA sequences is considered from our own point of view. A recent procedure developed by the authors is also outlined. Copyright © 1996 Pattern Recognition Society. Published by Elsevier Science Ltd.

