Generativity and Systematicity in Neural Network Combinatorial Learning
, 1993
Cited by 9 (0 self)
This thesis addresses a set of problems faced by connectionist learning that have originated from the observation that connectionist cognitive models lack two fundamental properties of the mind: Generativity, stemming from the boundless cognitive competence one can exhibit, and systematicity, due to the existence of symmetries within them. Such properties have seldom been seen in neural network models, which have typically suffered from problems of inadequate generalization, as exemplified both by the small number of generalizations relative to training set sizes and by heavy interference between newly learned items and previously learned information. Symbolic theories, arguing that mental representations have syntactic and semantic structure built from structured combinations of symbolic constituents, can in principle account for these properties (both arise from the sensitivity of structured semantic content with a generative and systematic syntax). This thesis studies the question of whe...
Entropy of English text: experiments with humans and a machine learning system based on rough sets
- Information Sciences
, 1998
Cited by 5 (0 self)
The goal of this paper is to show the dependency of the measured entropy of English text on the subject of the experiment, the type of English text, and the methodology used to estimate the entropy.
Lossless Compression for Text and Images
- International Journal of High Speed Electronics and Systems
, 1995
Cited by 5 (0 self)
Most data that is inherently discrete needs to be compressed in such a way that it can be recovered exactly, without any loss. Examples include text of all kinds, experimental results, and statistical databases. Other forms of data may need to be stored exactly, such as images, particularly bilevel ones, or ones arising in medical and remote-sensing applications, or ones that may be required to be certified true for legal reasons. Moreover, during the process of lossy compression, many occasions for lossless compression of coefficients or other information arise. This paper surveys techniques for lossless compression. The process of compression can be broken down into modeling and coding. We provide an extensive discussion of coding techniques, and then introduce methods of modeling that are appropriate for text and images. Standard methods used in popular utilities (in the case of text) and international standards (in the case of images) are described. Keywords: Text compression, ima...
The Complexity and Entropy of Literary Styles
, 1996
Cited by 5 (1 self)
Since Shannon's original experiment in 1951, several methods have been applied to the problem of determining the entropy of English text. These methods were based either on prediction by human subjects, or on computer-implemented parametric models for the data, of a certain Markov order. We ask why computer-based experiments almost always yield much higher entropy estimates than the ones produced by humans. We argue that there are two main reasons for this discrepancy. First, the long-range correlations of English text are not captured by Markovian models and, second, computer-based models only take advantage of the text statistics without being able to "understand" the contextual structure and the semantics of the given text. The second question we address is what the "entropy" of a text says about the author's literary style. In particular, is there an intuitive notion of "complexity of style" that is captured by the entropy? We present preliminary results based on a non-parametric entropy estimation algorithm that offer partial answers to these questions. These results indicate that taking long-range correlations into account significantly improves the entropy estimates. We get an estimate of 1.77 bits per character for a one-million-character sample taken from Jane Austen's works. Also, comparing the estimates obtained from several different texts provides some insight into the interpretation of the notion of "entropy" when applied to English text rather than to random processes, and the relationship between the entropy and the "literary complexity" of an author's style. Advantages of this entropy estimation method are that it does not require prior training, it is uniformly good over different styles and languages, and it seems to converge reasonably fast.
Mutual information functions of natural language texts
- Santa Fe Institute preprint
, 1989
Cited by 4 (3 self)
The mutual information function M(d), which is a quantity used to detect correlations in symbolic sequences, is applied to natural language texts. For some English and German texts being analyzed, M(d)’s for both the letter sequences and letter-type sequences exhibit an approximate inverse power law at shorter distances, with exponents close to 3. This decay of M(d) is too fast to lead to a 1/f power spectrum. Due to finite-size effects, it is not conclusive whether the same inverse power law extends beyond short distances. Also included are discussions on various topics concerning other scaling phenomena in formal and natural languages.
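The abstract does not spell out M(d). The sketch below assumes the usual definition, M(d) = sum over symbol pairs (a, b) of P_ab(d) log2[P_ab(d) / (P_a P_b)], where P_ab(d) is the probability of seeing a and then b exactly d positions later. The function name and the choice to take marginals from the same windows as the pairs are illustrative assumptions, not the paper's procedure.

```python
from collections import Counter
from math import log2


def mutual_information(seq, d):
    """Plug-in estimate of M(d) in bits for symbols d positions apart.

    Assumed definition (not quoted from the paper):
        M(d) = sum_{a,b} P_ab(d) * log2(P_ab(d) / (P_a * P_b))
    """
    n = len(seq) - d
    if d < 1 or n <= 0:
        raise ValueError("need 1 <= d < len(seq)")
    # Joint counts of (symbol at i, symbol at i + d).
    pairs = Counter((seq[i], seq[i + d]) for i in range(n))
    # Marginals taken over the same n windows, so the estimate is never negative.
    left = Counter(seq[i] for i in range(n))
    right = Counter(seq[i + d] for i in range(n))
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )


# A strictly alternating sequence is fully predictable at distance 1,
# so M(1) comes out close to 1 bit; a constant sequence carries none.
m_alternating = mutual_information("ab" * 50, 1)
m_constant = mutual_information("a" * 20, 3)
```

Plug-in estimates of this kind are biased upward on short samples, which is one form of the finite-size effect the abstract mentions.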
Estimating the potential of signal and interlocutor-track information for language modeling
- In: Interspeech
, 2009
Cited by 4 (4 self)
Although today most language models treat language purely as word sequences, there is recurring interest in tapping new sources of information, such as disfluencies, prosody, the interlocutor’s dialog act, and the interlocutor’s recent words. In order to estimate the potential value of such sources of information, we extend Shannon’s guessing-game method for estimating entropy to work for spoken dialog. Four teams of two subjects each predicted the next word in a dialog using various amounts of context: one word, two words, all the words spoken so far, or the full dialog audio so far. The entropy benefit in the full-audio condition over the full-text condition was substantial, 0.64 bits per word, greater than the 0.54-bit benefit of full-text context over trigrams. This suggests that language models may be improved by use of the prosody of the speaker and context from the interlocutor. Index Terms: entropy, perplexity, Shannon’s guessing game, prediction, context, prosody
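As background on the guessing-game method, the sketch below computes Shannon's 1951 bounds from guess ranks: if q_i is the fraction of items guessed correctly on the i-th try, the entropy is bounded above by -sum q_i log2 q_i and below by sum i (q_i - q_(i+1)) log2 i. This is the original letter-level formulation, not the paper's spoken-dialog extension. The function name and the sample ranks are hypothetical, and the lower bound assumes the q_i are non-increasing.

```python
from collections import Counter
from math import log2


def guessing_game_bounds(guess_ranks):
    """Return (lower, upper) entropy bounds in bits per item.

    guess_ranks[t] is the number of guesses needed for item t (1 = first try).
    """
    n = len(guess_ranks)
    if n == 0:
        raise ValueError("need at least one guess rank")
    counts = Counter(guess_ranks)
    max_rank = max(counts)
    # q[i - 1] holds q_i; one trailing zero stands in for q_(max_rank + 1).
    q = [counts.get(i, 0) / n for i in range(1, max_rank + 2)]
    upper = -sum(p * log2(p) for p in q if p > 0)
    lower = sum(i * (q[i - 1] - q[i]) * log2(i) for i in range(1, max_rank + 1))
    return lower, upper


# Hypothetical ranks for illustration only; these are not data from the paper.
toy_ranks = [1, 1, 1, 2, 1, 3, 1, 2]
toy_lower, toy_upper = guessing_game_bounds(toy_ranks)
```

Comparing bounds obtained under two context conditions gives an entropy difference of the kind the abstract reports, expressed in bits per predicted item.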
Symbol-driven compression of burrows wheeler transformed text
, 2000
Cited by 3 (0 self)
Despite the enormous growth in storage capacity in recent years, the search for fast and efficient text compression algorithms continues. As processor speed is increasing at a higher rate than disk access time is decreasing, there is now even more reason to store information in a compressed form than there was previously. Prediction by Partial Matching (PPM), first published in 1984, was a significant step forward in the quest for efficient text compression. The Burrows Wheeler transform (BWT), introduced ten years later, has been the next significant breakthrough; its best implementations rank alongside those of PPM. In most BWT implementations, transformed text is converted to a string of ranks with a move-to-front (MTF) or similar mechanism before being compressed. Ranks are then encoded with an Order- model or a hierarchy of such models, with some substrings of repeated ranks encoded as run lengths. Although these rank-based methods perform very well, the transformation to MTF numbers blurs the distinction between individual symbols and is a possible cause of ineffectiveness. Instead of relying on symbol ranking, we examine the problem of modelling the transformed text as a sequence of segments with iid symbols, using three different techniques.
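The BWT-then-MTF pipeline described above can be sketched as follows. This is a naive illustration, assuming a "$" sentinel that does not occur in the input and sorts below every other symbol; practical implementations use suffix sorting rather than materialised rotations, and the function names are illustrative.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(text, alphabet):
    """Replace each symbol by its current rank, then move it to the front."""
    table = list(alphabet)
    ranks = []
    for ch in text:
        r = table.index(ch)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode given the same starting alphabet order."""
    table = list(alphabet)
    out = []
    for r in ranks:
        ch = table.pop(r)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


transformed = bwt("banana")            # "annb$aa"
alphabet = sorted(set(transformed))    # ['$', 'a', 'b', 'n']
ranks = mtf_encode(transformed, alphabet)  # [1, 3, 0, 3, 3, 3, 0]
```

In this small example rank 3 stands for four different symbols ("n", "b", "$" and "a") at different positions, which illustrates how MTF numbers blur the distinction between individual symbols.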
Universal erasure entropy estimation
- In Proc. of the 2006 IEEE Intl. Symp. on Inform. Theory (ISIT’06)
, 2006
Cited by 3 (3 self)
Erasure entropy rate (introduced recently by Verdú and Weissman) differs from Shannon’s entropy rate in that the conditioning occurs with respect to both the past and the future, as opposed to only the past (or the future). In this paper, universal algorithms for estimating erasure entropy rate are proposed based on the basic and extended context-tree weighting (CTW) algorithms. Consistency results are shown for those CTW-based algorithms. Simulation results for those algorithms applied to Markov sources, tree sources and English texts are compared to those obtained by fixed-order plug-in estimators with different orders. An estimate of the erasure entropy of English texts based on the proposed algorithms is about 0.22 bits per letter, which can be compared to an estimate of about 1.3 bits per letter for the entropy rate of English texts by a similar CTW-based algorithm.
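The fixed-order plug-in baseline mentioned above can be sketched as follows; this is not the CTW-based estimator the paper proposes. The sketch assumes an order-k plug-in estimate built from empirical counts: the entropy-rate version conditions on the k preceding symbols, the erasure version on the k symbols on each side. Function names are illustrative.

```python
from collections import Counter
from math import log2


def _conditional_entropy(samples):
    """Empirical H(symbol | context) in bits from (context, symbol) pairs."""
    joint = Counter(samples)
    context = Counter(ctx for ctx, _ in samples)
    n = len(samples)
    return -sum(
        (c / n) * log2(c / context[ctx]) for (ctx, _), c in joint.items()
    )


def plugin_entropy_rate(text, k):
    """Order-k plug-in estimate: condition on the k preceding symbols."""
    samples = [(text[i - k:i], text[i]) for i in range(k, len(text))]
    return _conditional_entropy(samples)


def plugin_erasure_entropy(text, k):
    """Order-k plug-in estimate: condition on k symbols before and k after."""
    samples = [
        ((text[i - k:i], text[i + 1:i + 1 + k]), text[i])
        for i in range(k, len(text) - k)
    ]
    return _conditional_entropy(samples)


sample = "ab" * 50
h0 = plugin_entropy_rate(sample, 0)         # no context: 1 bit per symbol
h1 = plugin_entropy_rate(sample, 1)         # one symbol of past: 0 bits
h_erasure = plugin_erasure_entropy(sample, 1)
```

For real text, larger k quickly leaves most contexts with a single observation, so plug-in estimates collapse towards zero; that sparsity is one reason to compare them against universal methods such as CTW.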
Compression of Parallel Texts
- Information Processing and Management
, 1992
Cited by 3 (1 self)
The world-wide use of digital storage and communications devices is increasing the need to make texts available in multiple languages. To minimise the cost of storing and transmitting multiple translations of a text, one could store the text in just one language, from which other translations can be created. Unfortunately, the quality of machine translation techniques is not good enough for this to be feasible. An alternative is to store a compressed form of translated versions of a text, taking advantage of the availability of the original text. The original text provides some of the semantic content of the text that is to be compressed, and therefore makes it possible for compression to be more efficient than if that information were not available. This paper reports investigations into the use of a parallel text to represent its translated version compactly. We begin with an experiment to evaluate the information content of a text when a parallel translation is available. This is a...
Information theory and learning: a physical approach
, 2000
Cited by 3 (3 self)
We try to establish a unified information theoretic approach to learning and to explore some of its applications. First, we define predictive information as the mutual information between the past and the future of a time series, discuss its behavior as a function of the length of the series, and explain how other quantities of interest studied previously in learning theory, as well as in dynamical systems and statistical mechanics, emerge from this universally definable concept. We then prove that predictive information provides the unique measure for the complexity of dynamics underlying the time series and show that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before. Further, we investigate numerically the learning of a nonparametric probability density, which is an example of a problem with power-law complexity, and show that the proper Bayesian formulation of this problem provides for the ‘Occam’ factors that punish overly complex models and thus allow one to learn

