Results 1–10 of 12
Robust 1-Bit Compressive Sensing via Binary Stable Embeddings of Sparse Vectors
, 2011
Abstract

Cited by 85 (26 self)
The Compressive Sensing (CS) framework aims to ease the burden on analog-to-digital converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse signals. Practical ADCs not only sample but also quantize each measurement to a finite number of bits; moreover, there is an inverse relationship between the achievable sampling rate and the bit depth. In this paper, we investigate an alternative CS approach that shifts the emphasis from the sampling rate to the number of bits per measurement. In particular, we explore the extreme case of 1-bit CS measurements, which capture just their sign. Our results come in two flavors. First, we consider ideal reconstruction from noiseless 1-bit measurements and provide a lower bound on the best achievable reconstruction error. We also demonstrate that a large class of measurement mappings achieve this optimal bound. Second, we consider reconstruction robustness to measurement errors and noise and introduce the Binary ɛ-Stable Embedding (BɛSE) property, which characterizes the robustness of the measurement process to sign changes. We show that the same class of matrices that provide optimal noiseless performance also enable such a robust mapping. On the practical side, we introduce the Binary Iterative Hard Thresholding (BIHT) algorithm for signal reconstruction from 1-bit measurements, which offers state-of-the-art performance.
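The BIHT iteration described in the abstract is simple enough to sketch. The following is a minimal NumPy sketch, not the authors' reference implementation; the step size `tau` and iteration count are illustrative choices.

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    out[keep] = x[keep]
    return out

def biht(A, y, k, tau=1.0, iters=100):
    """Binary Iterative Hard Thresholding (sketch).

    A : (m, n) measurement matrix
    y : signs (+/-1) of the m one-bit measurements
    k : assumed sparsity of the signal
    """
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        # gradient step enforcing sign consistency, then projection
        # onto the set of k-sparse vectors
        x = hard_threshold(x + (tau / m) * A.T @ (y - np.sign(A @ x)), k)
    # one-bit measurements carry no amplitude information, so the
    # estimate is only meaningful up to scale: return it normalized
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x
```

With a Gaussian measurement matrix and enough measurements relative to the sparsity, the normalized estimate typically aligns closely with the true unit-norm signal.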
Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
Abstract

Cited by 3 (2 self)
Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually overweighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions. We address these issues by: 1) introducing a regularization constant to better balance the two parts, based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out the Dirichlet variables. Our augment-and-collapse sampling algorithm has analytical forms for each conditional distribution without making any restrictive assumptions and can be easily parallelized. Empirical results demonstrate significant improvements in prediction performance and time efficiency.
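The augmentation in this paper rests on the Polya-Gamma identity of Polson, Scott, and Windle: for b > 0,

```latex
\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}}
  = 2^{-b}\, e^{\kappa \psi}
    \int_{0}^{\infty} e^{-\omega \psi^{2}/2}\, p(\omega)\, d\omega,
\qquad \kappa = a - \tfrac{b}{2},
```

where p(ω) is the density of a PG(b, 0) Polya-Gamma random variable. Conditioned on the auxiliary ω, the logistic likelihood becomes a Gaussian kernel in ψ, which is what makes each Gibbs conditional available in closed form.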
Regularization Approaches in Learning Theory
, 2006
Abstract

Cited by 3 (1 self)
Learning from examples can be seen as a very general framework for modeling a variety of different statistical inference problems. Such statistical problems are at the basis of the design of programs that are trained, rather than programmed, to perform a task. In particular, supervised learning aims at finding an unknown input-output relation given a (possibly small) number of input-output instances (the examples). The main goal in this setting is not to describe the available data but to predict the output when a new input is given, that is, to be able to generalize. A learning algorithm should be able to avoid overfitting the data, that is, to avoid overestimating the importance of the available information and losing generalization properties. Regularization Theory was originally developed and formalized as a way to find stable solutions to ill-posed problems. Eventually, some regularization techniques became popular in the context of machine learning as an effective way to avoid
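As a concrete instance of the regularization idea described above, here is a minimal Tikhonov-regularized least-squares (ridge regression) sketch; the data and the penalty value are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Tikhonov-regularized least squares (ridge regression):
    minimizes ||Xw - y||^2 + lam * ||w||^2, which has the closed-form
    solution w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

Increasing `lam` shrinks the solution, trading fit on the training data for stability; this is exactly the control of overfitting that the abstract attributes to regularization.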
Noise-adaptive margin-based active learning for multidimensional data, arXiv Preprint
Abstract

Cited by 3 (0 self)
We present and analyze an adaptive margin-based algorithm that actively learns the optimal linear separator for multidimensional data. The algorithm has the capacity to adapt to an unknown level of label noise in the underlying distribution, making it suitable for model selection in the active learning setting. Compared to alternative agnostic active learning algorithms, our proposed method is much simpler and achieves the optimal convergence rate in the query budget T and data dimension d, up to logarithmic factors. Furthermore, our algorithm can handle classification loss functions other than the 0-1 loss, such as the hinge and logistic losses, and hence is computationally feasible.
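The core query strategy of margin-based active learning can be sketched in a few lines: repeatedly query the unlabeled point closest to the current separator and update on mistakes. This is a generic sketch, not the paper's algorithm (which in particular adapts its margin threshold to the noise level); the perceptron-style update and the learning rate `eta` are illustrative.

```python
import numpy as np

def margin_active_learn(X, oracle, budget, eta=0.1):
    """Margin-based active learning sketch: at each round, query the
    label of the unlabeled point with the smallest margin |x . w| under
    the current hyperplane w, and take a perceptron-style step on
    mistakes. `oracle(i)` returns the +/-1 label of X[i]."""
    n, d = X.shape
    w = np.zeros(d)
    unqueried = set(range(n))
    for _ in range(budget):
        if not w.any():
            i = next(iter(unqueried))                        # arbitrary first query
        else:
            i = min(unqueried, key=lambda j: abs(X[j] @ w))  # smallest margin
        y = oracle(i)
        unqueried.discard(i)
        if y * (X[i] @ w) <= 0:                              # update only on mistakes
            w = w + eta * y * X[i]
    return w
```

On separable data this concentrates label queries near the decision boundary, which is where labels are most informative.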
B.: Leveraging over prior knowledge for online learning of visual categories
, 2012
Timely Event Detection by Networked Learners
Abstract
We consider a set of distributed learners that are interconnected via an exogenously determined network. The learners observe different data streams that are related to common events of interest, which need to be detected in a timely manner. Each learner is equipped with a set of local classifiers, which generate local predictions about the common event based on the locally observed data streams. In this work, we address the following key questions: (1) Can the learners improve their detection accuracy by exchanging and aggregating information? (2) Can the learners improve the timeliness of their detections by forming clusters, i.e., by collecting information only from surrounding learners? (3) Given a specific trade-off between detection accuracy and detection delay, is it desirable to aggregate a large amount of information, or is it better to focus on the most recent and relevant information? To address these questions, we propose a cooperative online learning scheme in which each learner maintains a set of weight vectors (one for each possible cluster), selects a cluster and the corresponding weight vector, generates a local prediction, disseminates it through the network, and combines all the local predictions received from the learners belonging to the selected cluster by using a weighted majority rule. The optimal cluster and weight vector that a learner should adopt depend on the specific network topology, on the location of the learner in the network, and on the characteristics of the data streams. To learn such optimal values, we propose a general online learning rule that exploits only the feedback that the learners receive. We determine an upper bound on the worst-case misdetection probability and on the worst-case prediction delay of our scheme in the realizable case.
Numerical simulations show that the proposed scheme is able to successfully adapt to the unknown characteristics of the data streams and can achieve substantial performance gains with respect to a scheme in which the learners act individually or a scheme in which the learners always aggregate all available local predictions. We numerically evaluate the impact that different network topologies have on the final performance. Finally, we discuss several surprising trade-offs.
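The weighted-majority aggregation at the heart of such schemes can be sketched as follows. This is a generic multiplicative-weights update, not the paper's exact learning rule; the discount factor `beta` is an illustrative choice.

```python
import numpy as np

def weighted_majority(weights, predictions):
    """Aggregate +/-1 local predictions by a weighted majority vote."""
    return 1 if np.dot(weights, predictions) >= 0 else -1

def update_weights(weights, predictions, truth, beta=0.5):
    """Multiplicatively discount every learner whose local prediction
    was wrong, then renormalize (a classic weighted-majority update)."""
    w = np.asarray(weights, dtype=float).copy()
    for i, p in enumerate(predictions):
        if p != truth:
            w[i] *= beta
    return w / w.sum()
```

Under this update, the weight mass concentrates exponentially fast on the learners whose local predictions are reliable, so the aggregate vote soon tracks them.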
Experimental Evaluation of Multilayer Perceptrons with Entropic Risk Functionals on
Abstract
We investigate the performance of MLPs with four risk functionals: the classical mean square error (MSE), the cross-entropy (CE), a generalized exponential risk (EXP), and the Shannon entropy of the classifier’s output error (HS). The performance is compared with an SVM with RBF kernel in terms of average balanced and unbalanced error rates, and their generalization, on practical classification tasks. For this purpose we carried out experiments on 35 public real-world datasets. A battery of statistical tests applied to the experimental results showed no significant difference among the classifiers in terms of unbalanced error rates. However, in terms of balanced error rates, SVM-RBF performed significantly worse than MLP-CE and MLP-EXP. Regarding generalization, SVM-RBF and MLP-EXP scored as the classification methods with significantly better generalization, in terms of both balanced and unbalanced error rates.
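For reference, the two classical risks compared above can be written in a few lines for binary targets; the entropic risks EXP and HS have more involved forms and are omitted here.

```python
import numpy as np

def mse_risk(t, y):
    """Mean square error between targets t and network outputs y."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    return np.mean((t - y) ** 2)

def ce_risk(t, y, eps=1e-12):
    """Cross-entropy risk for targets t in {0, 1} and outputs y in (0, 1).
    Outputs are clipped away from 0 and 1 for numerical safety."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    y = np.clip(y, eps, 1 - eps)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))
```

Unlike MSE, cross-entropy penalizes confident mistakes very sharply, which is one reason the two functionals can rank the same classifiers differently.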
T. Tommasi et al.: Leveraging over prior knowledge for online learning of visual categories
Abstract
Open-ended learning is a dynamic process based on the continuous analysis of new data, guided by past experience. On one side, it is helpful to take advantage of prior knowledge when only little information on a new task is available (transfer learning). On the other, it is important to continuously update an existing model so as to exploit the new incoming data, especially if their informative content is very different from what is already known (online learning). Until now, these two aspects of the learning process have been tackled separately. In this paper we propose an algorithm that takes the best of both worlds: we consider a sequential learning setting, and we exploit the potential of knowledge transfer with a computationally cheap solution. At the same time, by relying on past experience we boost online learning to predict reliably on future problems. A theoretical analysis, coupled with extensive experiments, shows that our approach performs well in terms of the number of online training mistakes, as well as in terms of performance on separate test sets.
Team SequeL
Abstract
A commonly used approach to multiclass classification is to replace the 0-1 loss with a convex surrogate so as to make empirical risk minimization computationally tractable. Previous work has uncovered sufficient and necessary conditions for the consistency of the resulting procedures. In this paper, we strengthen these results by showing how the 0-1 excess loss of a predictor can be upper bounded as a function of the excess loss of the predictor measured using the convex surrogate. The bound is developed for the case of cost-sensitive multiclass classification and a convex surrogate loss that goes back to the work of Lee, Lin, and Wahba. The bounds are as easy to calculate as in binary classification. Furthermore, we also show that our analysis extends to the recently introduced “Simplex Coding” scheme.
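The Lee-Lin-Wahba surrogate mentioned above has a simple form: with K class scores f constrained to sum to zero, it charges max(0, f_j + 1/(K-1)) for every wrong class j. A small sketch, alongside the 0-1 loss of the induced argmax predictor:

```python
import numpy as np

def llw_loss(f, y):
    """Lee-Lin-Wahba multiclass hinge loss: sum over wrong classes
    j != y of max(0, f_j + 1/(K-1)); f is assumed to sum to zero."""
    K = len(f)
    loss = 0.0
    for j in range(K):
        if j != y:
            loss += max(0.0, f[j] + 1.0 / (K - 1))
    return loss

def zero_one_loss(f, y):
    """0-1 loss of the argmax predictor induced by the scores f."""
    return 0.0 if int(np.argmax(f)) == y else 1.0
```

The paper's contribution is a quantitative comparison theorem: a bound on the 0-1 excess loss in terms of the excess of this convex surrogate.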
Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data (Research Article, doi:10.1155/2012/478467)
Abstract
Copyright © 2012 Allal Houssaïni et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them into a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an “add-on” trial comparing the efficacy of adding didanosine to an ongoing failing regimen. Our aim was also to investigate the impact of using different cross-validation strategies and different loss functions. Four different splits between training and validation sets were tested with two loss functions. Six statistical methods were compared. We assess performance by evaluating R^2 values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided similar results to those of this new predictor. A slight discrepancy arises between the two loss functions investigated, and a slight difference arises also between
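The "discrete Super Learner" variant described above simply selects, by cross-validated risk, the single best learner from a library. A minimal sketch, where the squared-error loss and the toy learners are illustrative choices, not the trial's actual library:

```python
import numpy as np

def discrete_super_learner(learners, X, y, n_folds=4):
    """Discrete Super Learner sketch: return the index of the learner
    with the lowest cross-validated squared-error risk. Each learner is
    a function fit(X, y) -> predict(X)."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    risks = []
    for learner in learners:
        sq_err = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            predict = learner(X[train], y[train])          # fit on the rest
            sq_err += np.sum((predict(X[fold]) - y[fold]) ** 2)
        risks.append(sq_err / n)                           # CV risk estimate
    return int(np.argmin(risks))
```

The full Super Learner goes one step further and takes a weighted combination of all library members, with weights chosen to minimize the same cross-validated risk.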