Results 1  10
of
28
Sparse Bayesian Learning and the Relevance Vector Machine
, 2001
"... This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vec ..."
Abstract

Cited by 552 (5 self)
 Add to MetaCart
This paper introduces a general Bayesian framework for obtaining sparse solutions to regression and classication tasks utilising models linear in the parameters. Although this framework is fully general, we illustrate our approach with a particular specialisation that we denote the `relevance vector machine' (RVM), a model of identical functional form to the popular and stateoftheart `support vector machine' (SVM). We demonstrate that by exploiting a probabilistic Bayesian learning framework, we can derive accurate prediction models which typically utilise dramatically fewer basis functions than a comparable SVM while oering a number of additional advantages. These include the benets of probabilistic predictions, automatic estimation of `nuisance' parameters, and the facility to utilise arbitrary basis functions (e.g. non`Mercer' kernels).
A Survey of Dimension Reduction Techniques
, 2002
"... this paper, we assume that we have n observations, each being a realization of the p dimensional random variable x = (x 1 , . . . , x p ) with mean E(x) = = ( 1 , . . . , p ) and covariance matrix E{(x )(x = # pp . We denote such an observation matrix by X = i,j : 1 p, 1 ..."
Abstract

Cited by 87 (0 self)
 Add to MetaCart
this paper, we assume that we have n observations, each being a realization of the p dimensional random variable x = (x 1 , . . . , x p ) with mean E(x) = = ( 1 , . . . , p ) and covariance matrix E{(x )(x = # pp . We denote such an observation matrix by X = i,j : 1 p, 1 n}. If i and # i = # (i,i) denote the mean and the standard deviation of the ith random variable, respectively, then we will often standardize the observations x i,j by (x i,j i )/ # i , where i = x i = 1/n j=1 x i,j , and # i = 1/n j=1 (x i,j x i )
Fast Marginal Likelihood Maximisation for Sparse Bayesian Models
 Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics
, 2003
"... The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine ', enables sparse classification and regression functions to be obtained by linearlyweighting a small nmnber of fixed basis functions from a large dictionary of potential candidates. Such a model conveys ..."
Abstract

Cited by 65 (0 self)
 Add to MetaCart
The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine ', enables sparse classification and regression functions to be obtained by linearlyweighting a small nmnber of fixed basis functions from a large dictionary of potential candidates. Such a model conveys a nmnber of advantages over the related and very popular 'support vector machine', but the necessary 'training' procedure optimisation of the marginal likelihood function is typically much slower. We describe a new and highly accelerated algorithm which exploits recentlyelucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions.
FeedForward Neural Networks and Topographic Mappings for Exploratory Data Analysis
 Neural Computing and Applications
, 1996
"... A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feedforward neural network is utilised to effect a topographic, structurepreserving, dimensionreducing ..."
Abstract

Cited by 42 (2 self)
 Add to MetaCart
A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feedforward neural network is utilised to effect a topographic, structurepreserving, dimensionreducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted to established techniques for feature extraction, and related to topographic mappings, the Sammon projection and the statistical field of multidimensional scaling. 1 INTRODUCTION The visualisation and analysis of highdimensional data is a difficult problem and one that may be helpfully viewed in the context of feature extraction, which provides a useful commo...
Predicting Performance via Automated FeatureInteraction Detection
"... Abstract—Customizable programs and program families provide userselectable features to allow users to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is ..."
Abstract

Cited by 15 (10 self)
 Add to MetaCart
Abstract—Customizable programs and program families provide userselectable features to allow users to tailor a program to an application scenario. Knowing in advance which feature selection yields the best performance is difficult because a direct measurement of all possible feature combinations is infeasible. Our work aims at predicting program performance based on selected features. However, when features interact, accurate predictions are challenging. An interaction occurs when a particular feature combination has an unexpected influence on performance. We present a method that automatically detects performancerelevant feature interactions to improve prediction accuracy. To this end, we propose three heuristics to reduce the number of measurements required to detect interactions. Our evaluation consists of six realworld case studies from varying domains (e.g., databases, encoding libraries, and web servers) using different configuration techniques (e.g., configuration files and preprocessor flags). Results show an average prediction accuracy of 95 %. I.
Array signal processing for radio astronomy,” in The Square Kilometre Array: An Engineering Perspective
, 2005
"... Abstract. Radio astronomy forms an interesting application area for array signal processing techniques. Current synthesis imaging telescopes consist of a small number of identical dishes, which track a fixed patch in the sky and produce estimates of the timevarying spatial covariance matrix. The ob ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
Abstract. Radio astronomy forms an interesting application area for array signal processing techniques. Current synthesis imaging telescopes consist of a small number of identical dishes, which track a fixed patch in the sky and produce estimates of the timevarying spatial covariance matrix. The observations sometimes are distorted by interference, e.g., from radio, TV, radar or satellite tranmissions. We describe some of the tools that array signal processing offers to filter out the interference, based on eigenvalue decompositions and factor analysis, a more general technique applicable to partially calibrated arrays. We consider spatial filtering techniques using projections and interference subtraction, and discuss how a reference antenna pointed at the interferer can improve the performance. We also consider image formation and its relation to beamforming. Finally, we briefly discuss some future large scale radio telescopes.
Design and Choice of Projection Indices
, 1992
"... D graphics package to display 3D data as "real" objects. We introduce two new projection indices. One is based on divergence from the Student's tdistribution and leads naturally on to discussion of robust projection indices (which we investigate). We also investigate the topological properties, if ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
D graphics package to display 3D data as "real" objects. We introduce two new projection indices. One is based on divergence from the Student's tdistribution and leads naturally on to discussion of robust projection indices (which we investigate). We also investigate the topological properties, if any, of various indices. The other index arises from the idea of nonparametric projection indices and provides a measure of multimodality useful not only for projection pursuit, but other statistical methods, such as kernel density estimation. We develop and implement a variant of projection pursuit useful for discrimination purposes. We call the method discriminatory projection pursuit (DPP) and examine the application of DPP to a statistical problem in chemometrics. In addition, we also examine the role of sphering in projection pursuit, and its effect on normally distributed data. Contents 1 Introduction 7 1.1 What is projection pursuit? . . . .
Developments and Applications of Nonlinear Principal Component Analysis  a Review
"... Although linear principal component analysis (PCA) originates from the work of Sylvester [67] and Pearson [51], the development of nonlinear counterparts has only received attention from the 1980s. Work on nonlinear PCA, or NLPCA, can be divided into the utilization of autoassociative neural networ ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Although linear principal component analysis (PCA) originates from the work of Sylvester [67] and Pearson [51], the development of nonlinear counterparts has only received attention from the 1980s. Work on nonlinear PCA, or NLPCA, can be divided into the utilization of autoassociative neural networks, principal curves and manifolds, kernel approaches or the combination of these approaches. This article reviews existing algorithmic work, shows how a given data set can be examined to determine whether a conceptually more demanding NLPCA model is required and lists developments of NLPCA algorithms. Finally, the paper outlines problem areas and challenges that require future work to mature the NLPCA research field.
Comparison of Two DimensionReduction Methods for Network Simulation Models
 NIST Publication #906588, presented at the Winter Simulation Conference
"... Experimenters characterize the behavior of simulation models for data communications networks by measuring multiple responses under selected parameter combinations. The resulting multivariate data may include redundant responses reflecting aspects of a smaller number of underlying behaviors. Reducin ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Experimenters characterize the behavior of simulation models for data communications networks by measuring multiple responses under selected parameter combinations. The resulting multivariate data may include redundant responses reflecting aspects of a smaller number of underlying behaviors. Reducing the dimension of multivariate responses can reveal the most significant model behaviors, allowing subsequent analyses to focus on one response per behavior. This paper investigates two methods for reducing dimension in multivariate data generated from simulation models. One method combines correlation analysis and clustering. The second method uses principal components analysis. We apply both methods to reduce a 22dimensional dataset generated by a network simulator. We identify issues that an analyst must decide, and we compare the reductions suggested by the methods. We have used these methods to identify significant behaviors in simulated networks, and we suspect they may be applied to reduce the dimension of empirical data measured from real networks. Key words: correlation analysis; dimension reduction; network simulation; principal components analysis.
Bayesian inference and the parametric bootstrap
"... The parametric bootstrap can be used for the efficient computation of Bayes posterior distributions. Importance sampling formulas take on an easy form relating to the deviance in exponential families, and are particularly simple starting from Jeffreys invariant prior. Because of the i.i.d. nature of ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
The parametric bootstrap can be used for the efficient computation of Bayes posterior distributions. Importance sampling formulas take on an easy form relating to the deviance in exponential families, and are particularly simple starting from Jeffreys invariant prior. Because of the i.i.d. nature of bootstrap sampling, familiar formulas describe the computational accuracy of the Bayes estimates. Besides computational methods, the theory provides a connection between Bayesian and frequentist analysis. Efficient algorithms for the frequentist accuracy of Bayesian inferences are developed and demonstrated in a model selection example. Keywords: Jeffreys prior, exponential families, deviance, generalized linear models