Penalized Discriminant Analysis. Annals of Statistics, 1995
Cited by 98 (8 self)
Fisher's linear discriminant analysis (LDA) is a popular data-analytic tool for studying the relationship between a set of predictors and a categorical response. In this paper we describe a penalized version of LDA. It is designed for situations in which there are many highly correlated predictors, such as those obtained by discretizing a function, or the greyscale values of the pixels in a series of images. In cases such as these, it is natural, efficient, and sometimes essential to impose a spatial smoothness constraint on the coefficients, both for improved prediction performance and for interpretability. We cast the classification problem into a regression framework via optimal scoring. This allows any penalized regression technique to be used in the classification setting. The technique is illustrated with examples in speech recognition and handwritten character recognition. AMS 1991 Classifications: Primary 62H30; Secondary 62G07.
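One way to see what a smoothness penalty does to LDA is the two-class case. The sketch below assumes the penalty enters as a quadratic roughness term added to the pooled within-class covariance, giving the discriminant direction $\beta = (\Sigma_W + \lambda\Omega)^{-1}(\mu_1 - \mu_0)$, with $\Omega = D^\top D$ built from second differences $D$ of neighbouring coefficients. It is a minimal illustration under those assumptions, not the paper's optimal-scoring algorithm; the function names, the simulated data and the value of `lam` are all hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def roughness_penalty(p):
    """Hypothetical Omega = D^T D, with D the (p-2) x p second-difference operator."""
    D = [[0.0] * p for _ in range(p - 2)]
    for i in range(p - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return [[sum(D[k][i] * D[k][j] for k in range(p - 2)) for j in range(p)]
            for i in range(p)]

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def within_class_cov(classes):
    """Pooled within-class covariance Sigma_W."""
    p, n = len(classes[0][0]), sum(len(c) for c in classes)
    S = [[0.0] * p for _ in range(p)]
    for rows in classes:
        mu = mean(rows)
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j] / (n - len(classes))
    return S

def penalized_direction(class0, class1, lam):
    """Two-class direction (Sigma_W + lam * Omega)^{-1} (mu1 - mu0)."""
    Sw = within_class_cov([class0, class1])
    Om = roughness_penalty(len(Sw))
    A = [[Sw[i][j] + lam * Om[i][j] for j in range(len(Sw))] for i in range(len(Sw))]
    delta = [m1 - m0 for m0, m1 in zip(mean(class0), mean(class1))]
    return solve(A, delta)

def roughness(beta):
    """beta^T Omega beta: sum of squared second differences of the coefficients."""
    return sum((beta[i] - 2 * beta[i + 1] + beta[i + 2]) ** 2 for i in range(len(beta) - 2))

rng = random.Random(0)
p = 8  # e.g. a function discretized at 8 points (simulated data)
class0 = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(20)]
class1 = [[rng.gauss(0.5, 1.0) for _ in range(p)] for _ in range(20)]
beta_lda = penalized_direction(class0, class1, lam=0.0)   # ordinary LDA direction
beta_pda = penalized_direction(class0, class1, lam=10.0)  # smoothed direction
```

With `lam=0` the direction reduces to Fisher's $\Sigma_W^{-1}(\mu_1 - \mu_0)$; increasing `lam` can never raise the roughness $\beta^\top\Omega\beta$ of the coefficients, at the cost of moving away from that unpenalized direction.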
Canonical Correlation Analysis: An Overview with Application to Learning Methods, 2007
Cited by 98 (11 self)
We present a general method using kernel Canonical Correlation Analysis to learn a semantic representation for web images and their associated text. The semantic space provides a common representation and enables a comparison between the text and images. In the experiments we look at two approaches to retrieving images, based only on their content, from a text query. We compare the approaches against a standard cross-representation retrieval technique known as the Generalised Vector Space Model.
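The paper works with kernel CCA. As a simpler sketch of what CCA itself computes, the block below finds the canonical correlations between two paired 2-D views with ordinary linear CCA, as square roots of the eigenvalues of $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. The kernel variant used for the image and text experiments is not shown, and the helper names and simulated data are illustrative assumptions.

```python
import math
import random

def cov(a, b):
    """Sample cross-covariance matrix between paired 2-D samples a and b."""
    n = len(a)
    ma = [sum(r[i] for r in a) / n for i in range(2)]
    mb = [sum(r[i] for r in b) / n for i in range(2)]
    return [[sum((ra[i] - ma[i]) * (rb[j] - mb[j]) for ra, rb in zip(a, b)) / (n - 1)
             for j in range(2)] for i in range(2)]

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def canonical_correlations(x, y):
    """Square roots of the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, largest first."""
    sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)
    m = matmul(matmul(inv2(sxx), sxy), matmul(inv2(syy), transpose(sxy)))
    # Eigenvalues of a 2x2 matrix: half trace +/- sqrt(((a - d)/2)^2 + b*c).
    half_tr = (m[0][0] + m[1][1]) / 2
    disc = math.sqrt(max(0.0, ((m[0][0] - m[1][1]) / 2) ** 2 + m[0][1] * m[1][0]))
    return [math.sqrt(min(1.0, max(0.0, e))) for e in (half_tr + disc, half_tr - disc)]

rng = random.Random(1)
x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y_linear = [[2 * a + b, a - b] for a, b in x]                     # invertible map of x
y_noisy = [[a + rng.gauss(0, 1), rng.gauss(0, 1)] for a, _ in x]  # only partly related
```

When `y` is an invertible linear transform of `x`, both canonical correlations come out at (numerically) 1; for the partly related `y_noisy` the leading one falls below 1.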
Neural Networks and Statistical Models, 1994
Cited by 82 (1 self)
There has been much publicity about the ability of artificial neural networks to learn and generalize. In fact, the most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software. This paper explains what neural networks are, translates neural network jargon into statistical jargon, and shows the relationships between neural networks and statistical models such as generalized linear models, maximum redundancy analysis, projection pursuit, and cluster analysis.
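The abstract's point that a multilayer perceptron is a nonlinear regression model can be made concrete. The sketch below writes a one-hidden-layer network as the regression function $f(x) = b + \sum_j v_j \tanh(a_j + w_j x)$ and fits it by ordinary least squares with full-batch gradient descent (the gradients are what backpropagation computes). The data, learning rate, number of hidden units and function names are arbitrary, hypothetical choices, not taken from the paper.

```python
import math
import random

def predict(x, params):
    """One-hidden-layer perceptron, i.e. a parametric nonlinear regression function of x."""
    a, w, v, b = params
    return b + sum(vj * math.tanh(aj + wj * x) for aj, wj, vj in zip(a, w, v))

def mse(data, params):
    return sum((y - predict(x, params)) ** 2 for x, y in data) / len(data)

def fit(data, hidden=3, lr=0.05, steps=500, seed=0):
    """Nonlinear least squares by full-batch gradient descent."""
    rng = random.Random(seed)
    a = [rng.uniform(-1, 1) for _ in range(hidden)]
    w = [rng.uniform(-1, 1) for _ in range(hidden)]
    v = [rng.uniform(-1, 1) for _ in range(hidden)]
    b, n = 0.0, len(data)
    for _ in range(steps):
        ga, gw, gv, gb = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in data:
            h = [math.tanh(aj + wj * x) for aj, wj in zip(a, w)]
            r = b + sum(vj * hj for vj, hj in zip(v, h)) - y  # residual
            gb += 2 * r / n
            for j in range(hidden):
                gv[j] += 2 * r * h[j] / n
                dz = 2 * r * v[j] * (1 - h[j] ** 2) / n  # chain rule through tanh
                ga[j] += dz
                gw[j] += dz * x
        a = [p - lr * g for p, g in zip(a, ga)]
        w = [p - lr * g for p, g in zip(w, gw)]
        v = [p - lr * g for p, g in zip(v, gv)]
        b -= lr * gb
    return a, w, v, b

data = [(k / 5, math.sin(k / 5)) for k in range(-10, 11)]  # 21 points on [-2, 2]
initial = fit(data, steps=0)  # the random starting values
trained = fit(data)
```

Nothing in `fit` is specific to neural networks: swapping `predict` (and its hand-coded gradient) for another differentiable curve gives the same least-squares procedure a statistician would use for any nonlinear regression.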
Talking Probabilities: Communicating Probabilistic Information with Words and Numbers. International Journal of Approximate Reasoning, 1999
Cited by 22 (4 self)
The number of knowledge-based systems that build on Bayesian belief networks is increasing. The construction of such a network, however, requires a large number of probabilities in numerical form. This is often considered a major obstacle, one reason being that experts are reluctant to provide numerical probabilities. The use of verbal probability expressions as an additional method of eliciting probabilistic information may to some extent remove this obstacle. In this paper, we review studies that address the communication of probabilities in words and/or numbers. We then describe our own experiments concerning the development of a probability scale that contains words as well as numbers. This scale appears to aid researchers and domain experts during the elicitation phase of building a belief network, and might help users understand the output of the network.
Block-relaxation Algorithms in Statistics, 1994
Cited by 20 (1 self)
In this paper we discuss four such classes of algorithms. Or, more precisely, we discuss a single class of algorithms, and we show how some well-known classes of statistical algorithms fit in this common class. The subclasses are, in logical order: block-relaxation methods, augmentation methods, majorization methods, Expectation-Maximization, Alternating Least Squares, and Alternating Conditional Expectations.
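A small instance of block relaxation, which is also an Alternating Least Squares algorithm, is the rank-one approximation $M \approx uv^\top$. Splitting the parameters into the blocks $u$ and $v$ and minimizing $\lVert M - uv^\top\rVert_F^2$ exactly over one block while the other is held fixed means no step can increase the loss. The example problem, starting value and function name are illustrative assumptions, not taken from the paper.

```python
def als_rank1(M, sweeps=10):
    """Block relaxation for min_{u,v} ||M - u v^T||_F^2.

    Block 1: u = M v / (v.v)     (exact minimizer over u with v fixed)
    Block 2: v = M^T u / (u.u)   (exact minimizer over v with u fixed)
    Assumes M v and M^T u never vanish (otherwise a division by zero occurs).
    """
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols  # arbitrary starting value
    losses = []
    for _ in range(sweeps):
        vv = sum(t * t for t in v)
        u = [sum(M[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(t * t for t in u)
        v = [sum(M[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
        losses.append(sum((M[i][j] - u[i] * v[j]) ** 2
                          for i in range(rows) for j in range(cols)))
    return u, v, losses

M = [[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]]
u, v, losses = als_rank1(M)  # losses is non-increasing sweep by sweep
```

Majorization and EM keep the same monotonicity argument but replace the exact block minimization with minimization of a surrogate; this sketch only covers the plain block-relaxation case.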
Statistics and Data Mining: Intersecting Disciplines. SIGKDD Explorations, 1999
Cited by 17 (1 self)
... is generally meant by data mining nowadays. Statistics and data mining have much in common, but they also have differences. The nature of the two disciplines is examined, with emphasis on their similarities and differences.
Towards Comprehensive Foundations of Computational Intelligence. In: Duch W, Mandziuk J (Eds.), Challenges for Computational Intelligence, 2007
Cited by 14 (11 self)
Although computational intelligence (CI) covers a vast variety of methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity-based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate the creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction, and to the generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to the intuitive solving of such problems is presented. Throughout the paper, neurocognitive inspirations are frequently used; they are especially important in modeling the higher cognitive functions. Promising directions such as liquid and laminar computing are identified, and many open problems are presented.
Another Look at Principal Curves and Surfaces, 2001
Cited by 13 (2 self)
Introduction. Consider a multivariate random variable $X$ in $\mathbb{R}^p$ with density function $f$, and a random sample from $X$, namely $X_1, \ldots, X_n$. The first principal component can be viewed as the straight line which best fits the cloud of data (see, e.g., [17, pp. 386-387]). When the distribution of $X$ is ellipsoidal, the population first principal component is the main axis of the ellipsoids of equal concentration. In the past 40 years many works have appeared proposing extensions of principal components to distributions with nonlinear structure. We cite Shepard and Carroll [24], Gnanadesikan and Wilk [13], Srivastava [27], Etezadi-Amoli and McDonald [10], Yohai, Ackermann and Haigh [33], Koyak [19] and Gifi [12], among others. Some of them look for nonlinear transformations of the observable variables into spaces admitting a ...
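The linear starting point that principal curves generalize can be sketched in two dimensions: the line minimizing the sum of squared perpendicular distances to the data passes through the sample mean in the direction of the leading eigenvector of the sample covariance matrix. The closed-form $2 \times 2$ eigenvector and the function names are illustrative choices, not from the paper.

```python
import math

def first_principal_line(points):
    """Return (center, unit direction) of the first principal component line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    if sxy != 0:
        dx, dy = lam - syy, sxy  # solves (S - lam * I) d = 0
    else:
        dx, dy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)  # axis-aligned case
    norm = math.hypot(dx, dy)
    return (mx, my), (dx / norm, dy / norm)

def orthogonal_ss(points, center, direction):
    """Sum of squared perpendicular distances to the line center + t * direction."""
    (cx, cy), (dx, dy) = center, direction
    return sum(((x - cx) * dy - (y - cy) * dx) ** 2 for x, y in points)

line = [(x, 2 * x + 1) for x in range(5)]
center, direction = first_principal_line(line)  # center is (2.0, 5.0)
```

For points lying exactly on $y = 2x + 1$ the direction returned is $(1, 2)/\sqrt{5}$ and every perpendicular distance is zero; for noisy points, rotating the returned direction about the mean can only increase `orthogonal_ss`.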
Contextually Guided Unsupervised Learning Using Local Multivariate Binary Processors, 1996
Cited by 12 (1 self)
We consider the role of contextual guidance in learning and processing within multi-stream neural networks. Earlier work (Kay & Phillips, 1994, 1996; Phillips et al., 1995) showed how the goals of feature discovery and associative learning could be fused within a single objective, and made precise using information theory, in such a way that local binary processors could extract a single feature that is coherent across streams. In this paper we consider multi-unit local processors with multivariate binary outputs that enable a greater number of coherent features to be extracted. Using the Ising model, we define a class of information-theoretic objective functions and also local approximations, and derive the learning rules in both cases. These rules have similarities to, and differences from, the celebrated BCM rule. Local and global versions of Infomax appear as by-products of the general approach, as well as multivariate versions of Coherent Infomax. Focussing on the more biologically...
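Information-theoretic objectives of this kind are typically built from (conditional) mutual-information and entropy terms. As a sketch of that basic ingredient only (not the paper's Coherent Infomax objective, its Ising-model form, or the derived learning rules), the block computes the mutual information, in bits, between two binary variables from their joint distribution; the function name is hypothetical.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, for a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal of X
        py[y] = py.get(y, 0.0) + p  # marginal of Y
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
coupled = {(0, 0): 0.5, (1, 1): 0.5}
```

Two independent fair bits give 0 bits; two perfectly coupled fair bits give exactly 1 bit.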

