A CLUE for CLUster Ensembles
Journal of Statistical Software, 2005
Cited by 11 (2 self)
Cluster ensembles are collections of individual solutions to a given clustering problem which are useful or necessary to consider in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles, with basic data structures for representing partitions and hierarchies, and facilities for computing on these, including methods for measuring proximity and obtaining consensus and “secondary” clusterings.
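Among the proximity measures one can compute between two partitions in a cluster ensemble, the Rand index is one of the simplest: the fraction of object pairs on which the two partitions agree. The sketch below is a plain Python illustration of that measure, not the clue package's R interface; the function name `rand_index` and the toy ensemble are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair agrees if both partitions put the two objects in the same
    cluster, or both put them in different clusters. Cluster labels are
    arbitrary, so relabeling a partition does not change the result.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")
    pairs = list(combinations(range(n), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# A hypothetical ensemble of three partitions of the same five objects.
ensemble = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
for a, b in combinations(range(len(ensemble)), 2):
    print(f"partitions {a} and {b}: Rand index {rand_index(ensemble[a], ensemble[b]):.2f}")
```

Pairwise values like these form a dissimilarity matrix over the ensemble members, which is the kind of input a consensus or "secondary" clustering step can work from.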
Early detection of changes in the North Atlantic meridional overturning circulation: Implications for the design of ocean observation systems
Journal of Climate, 2007
Cited by 9 (6 self)
Many climate models predict that anthropogenic greenhouse gas emissions may cause a threshold response of the North Atlantic meridional overturning circulation (MOC). These model predictions are, ...
Learning Bayes net structure from sparse data sets
2001
Cited by 8 (2 self)
There are essentially two kinds of approaches for learning the structure of Bayesian Networks (BNs) from data. The first approach tries to find a graph which satisfies all the constraints implied by the empirical conditional independencies measured in the data [PV91, SGS00a, Shi00]. The second approach searches through the space of models (either DAGs or PDAGs), and uses some scoring metric (typically Bayesian or some approximation, such as BIC/MDL) to evaluate the models [CH92, Hec95, Hec98, Kra98], typically returning the highest scoring model found. Our main interest is in learning BN structure from gene expression data [FLNP00, HGJY01, MM99, SGS00b]. In domains such as this, where the ratio of the number of observations to the number of variables is low (i.e., when we have sparse data), selecting a threshold for the conditional independence (CI) tests can be tricky, and repeated use of such tests can lead to inconsistencies [DD99]. Bayesian s...
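For the score-based approach, the BIC score decomposes into one local term per node and its parent set, so a search procedure can compare candidate parent sets one family at a time. The following is a minimal sketch of that local BIC term for fully discrete data, assuming maximum-likelihood parameter estimates; `bic_family_score` and the dictionary-based data layout are illustrative assumptions, not code from the paper.

```python
import math
from collections import Counter


def bic_family_score(data, child, parents, arity):
    """Local BIC score of `child` given a candidate parent set.

    data:   list of dicts mapping variable name -> discrete state.
    arity:  dict mapping variable name -> number of states.
    Score = maximised log-likelihood - (log N / 2) * number of free parameters.
    """
    n = len(data)

    def config(row):
        # Joint state of the parents in this row (empty tuple if no parents).
        return tuple(row[p] for p in parents)

    joint = Counter((config(row), row[child]) for row in data)
    parent_counts = Counter(config(row) for row in data)
    # Log-likelihood at the ML estimates P(child = x | pa) = N(pa, x) / N(pa).
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    free_params = q * (arity[child] - 1)
    return loglik - 0.5 * math.log(n) * free_params


# Hypothetical data in which Y simply copies X.
data = [{"X": 0, "Y": 0}, {"X": 1, "Y": 1}] * 10
arity = {"X": 2, "Y": 2}
print(bic_family_score(data, "Y", [], arity), bic_family_score(data, "Y", ["X"], arity))
```

Because the penalty grows with the number of parent configurations while the likelihood gain grows with the number of observations, on sparse data this score tends to favor small parent sets.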
Graph-based consensus maximization among multiple supervised and unsupervised models
Advances in Neural Information Processing Systems (NIPS), 2009
Cited by 8 (4 self)
Ensemble classifiers such as bagging, boosting and model averaging are known to improve accuracy and robustness over a single model. Their potential, however, is limited in applications that have access only to the meta-level model output rather than to the raw data. In this paper, we study ensemble learning with output from multiple supervised and unsupervised models, a topic on which little work has been done. Although unsupervised models, such as clustering, do not directly generate a label prediction for each individual object, they provide useful constraints for the joint prediction of a set of related objects. We propose to consolidate a classification solution by maximizing the consensus among both supervised predictions and unsupervised constraints. We cast this ensemble task as an optimization problem on a bipartite graph, where the objective function favors the smoothness of the prediction over the graph while penalizing deviations from the initial labeling provided by supervised models. We solve this problem through iterative propagation of probability estimates among neighboring nodes. Our method can also be interpreted as conducting a constrained embedding in a transformed space, or a ranking on the graph. Experimental results on three real applications demonstrate the benefits of the proposed method over existing alternatives.
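A simplified sketch of the propagation idea: objects form one side of the bipartite graph, and the other side holds "groups", each being one predicted class of a supervised model or one cluster of an unsupervised model. Objects average the estimates of their groups; groups average their member objects, and supervised groups are additionally pulled toward their initial label with a weight `alpha`. These update rules are an assumption modeled on the description above, not necessarily the paper's exact equations, and all names are hypothetical.

```python
def consensus_propagation(n_objects, groups, group_labels, n_classes, alpha=2.0, iters=200):
    """Propagate class probabilities on an object-group bipartite graph.

    groups:       list of non-empty lists of object indices.
    group_labels: class index for supervised groups, None for clusters.
    alpha:        strength of the pull toward a supervised group's label.
    Returns one probability vector per object.
    """
    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in range(n_classes)]

    # Initial group estimates: the supervised label, or uniform for clusters.
    Q = [one_hot(lab) if lab is not None else [1.0 / n_classes] * n_classes
         for lab in group_labels]
    U = []
    for _ in range(iters):
        # Object step: average the estimates of all groups containing the object.
        U = [[0.0] * n_classes for _ in range(n_objects)]
        degree = [0] * n_objects
        for g, members in enumerate(groups):
            for i in members:
                degree[i] += 1
                for c in range(n_classes):
                    U[i][c] += Q[g][c]
        U = [[u / degree[i] for u in row] if degree[i] else row
             for i, row in enumerate(U)]
        # Group step: average member objects, plus an alpha-weighted pull
        # toward the initial label for supervised groups only.
        for g, members in enumerate(groups):
            lab = group_labels[g]
            weight = alpha if lab is not None else 0.0
            prior = one_hot(lab) if lab is not None else [0.0] * n_classes
            denom = len(members) + weight
            Q[g] = [(sum(U[i][c] for i in members) + weight * prior[c]) / denom
                    for c in range(n_classes)]
    return U


# Hypothetical toy: model A predicts [0, 0, 1, 1], model B predicts [0, 1, 1, 1],
# and a clustering groups objects {0, 1} and {2, 3}.
groups = [[0, 1], [2, 3], [0], [1, 2, 3], [0, 1], [2, 3]]
group_labels = [0, 1, 0, 1, None, None]
for i, probs in enumerate(consensus_propagation(4, groups, group_labels, n_classes=2)):
    print(i, [round(p, 3) for p in probs])
```

In this toy case the two supervised models disagree on object 1, and the cluster it shares with object 0 tips its estimate toward class 0.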
Effective estimation of posterior probabilities: Explaining the accuracy of randomized decision tree approaches
In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 2005
Cited by 7 (3 self)
An increasing number of randomization methods have been proposed independently for different stages of decision tree construction in order to build multiple trees. Randomized decision tree methods have been reported to be significantly more accurate than widely accepted single decision trees, even though the training procedure of some methods incorporates a surprisingly random factor and therefore runs counter to the generally accepted idea of employing gain functions to choose the optimal feature at each node and compute a single tree that fits the data. One important question that is not yet well understood is the reason behind this high accuracy. We provide an insight based on posterior probability estimation. We first establish the relationship between effective posterior probability estimation and effective loss reduction. We argue that randomized decision tree methods effectively approximate the true probability distribution using the decision tree hypothesis space. We conduct experiments using both synthetic and real-world datasets under both 0-1 and cost-sensitive loss functions.
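The link between posterior estimation and loss reduction can be made concrete: if per-tree posteriors are averaged into one estimate, the loss-minimizing prediction is the class with the lowest expected cost under that estimate, so a better posterior directly yields better decisions under both 0-1 and cost-sensitive losses. A minimal sketch, assuming simple averaging across trees and a cost matrix indexed as `cost[predicted][true]`; the function names and numbers are hypothetical.

```python
def average_posterior(tree_posteriors):
    """Average class-probability vectors produced by several trees."""
    n_trees = len(tree_posteriors)
    n_classes = len(tree_posteriors[0])
    return [sum(p[c] for p in tree_posteriors) / n_trees for c in range(n_classes)]


def min_expected_cost(posterior, cost):
    """Return the class whose expected cost under `posterior` is lowest.

    cost[i][j] is the cost of predicting class i when the true class is j.
    """
    risks = [sum(cost[i][j] * posterior[j] for j in range(len(posterior)))
             for i in range(len(cost))]
    return min(range(len(risks)), key=risks.__getitem__)


# Three hypothetical trees, each estimating P(class 0), P(class 1) for one example.
posterior = average_posterior([[0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])
zero_one = [[0, 1], [1, 0]]
false_positive_is_costly = [[0, 1], [5, 0]]  # predicting 1 when truth is 0 costs 5
print(min_expected_cost(posterior, zero_one), min_expected_cost(posterior, false_positive_is_costly))
```

With the same averaged posterior, the 0-1 loss favors class 1 while the cost-sensitive matrix favors class 0, which is why the accuracy of the probability estimate, not just of the argmax, matters.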
Model-based assignment and inference of protein backbone nuclear magnetic resonances
Statistical Applications in Genetics and Molecular Biology, 2004
To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators
Cited by 7 (0 self)
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether, 16 model selection and weighing schemes, 58 benchmark data sets, and various statistical tests are employed. This paper’s main contributions are threefold. First, it formally presents each scheme’s definition, rationale and time complexity, and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers a bias-variance analysis of each scheme’s classification error. Third, it identifies effective schemes that meet various practical needs. This leads to accurate and fast classification algorithms with immediate and significant impact on real-world applications. Another important feature of our study is its use of a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
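To make "linearly combining SPODEs" concrete: each SPODE with superparent attribute $p$ estimates $P(y, \mathbf{x}) = P(y, x_p)\prod_{i \ne p} P(x_i \mid y, x_p)$, and a combination scheme predicts the class maximizing $\sum_p w_p P_p(y, \mathbf{x})$. Uniform weights give AODE-style averaging, while a one-hot weight vector amounts to selecting a single SPODE. The sketch below assumes small-integer attribute values and Laplace smoothing, omits refinements such as a minimum-frequency limit on superparents, and does not reproduce any of the paper's 16 schemes; all names are hypothetical.

```python
from collections import Counter


def train_spode_counts(X, y):
    """Count N(y, x_p) and N(y, x_p, x_i) for every superparent p and child i."""
    pair, triple = Counter(), Counter()
    for row, label in zip(X, y):
        for p, v in enumerate(row):
            pair[(label, p, v)] += 1
            for i, u in enumerate(row):
                if i != p:
                    triple[(label, p, v, i, u)] += 1
    return pair, triple


def spode_joint(counts, n, arity, n_classes, x, label, p):
    """Laplace-smoothed P(y, x) = P(y, x_p) * prod_{i != p} P(x_i | y, x_p)."""
    pair, triple = counts
    n_yp = pair[(label, p, x[p])]
    prob = (n_yp + 1) / (n + n_classes * arity[p])
    for i, u in enumerate(x):
        if i != p:
            prob *= (triple[(label, p, x[p], i, u)] + 1) / (n_yp + arity[i])
    return prob


def predict(counts, n, arity, n_classes, x, weights):
    """Class maximising sum_p weights[p] * P_p(y, x) over all SPODEs."""
    scores = [
        sum(w * spode_joint(counts, n, arity, n_classes, x, c, p)
            for p, w in enumerate(weights))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical toy data: two binary attributes, two classes.
X = [[0, 0], [0, 1], [1, 1], [1, 1]]
y = [0, 0, 1, 1]
counts = train_spode_counts(X, y)
print(predict(counts, len(X), [2, 2], 2, [0, 0], weights=[0.5, 0.5]))  # weighing
print(predict(counts, len(X), [2, 2], 2, [1, 1], weights=[1.0, 0.0]))  # selecting SPODE 0
```

Selection and weighing schemes differ only in how the weight vector is chosen, which is what makes a side-by-side comparison of many schemes on the same counts straightforward.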
Accounting for input-model and input-parameter uncertainties in simulation
2004
Cited by 5 (0 self)
To account for the input-model and input-parameter uncertainties inherent in many simulations as well as the usual stochastic uncertainty, we present a Bayesian input-modeling technique that yields improved point and confidence-interval estimators for a selected posterior mean response. Exploiting prior information to specify the prior probabilities of the postulated input models and the associated prior input-parameter distributions, we use sample data to compute the posterior input-model and input-parameter distributions. Our Bayesian simulation replication algorithm involves: (i) estimating parameter uncertainty by randomly sampling the posterior input-parameter distributions; (ii) estimating stochastic uncertainty by running independent replications of the simulation using each set of input-model parameters sampled in (i); and (iii) estimating input-model uncertainty by weighting the responses generated in (ii) using the corresponding posterior input-model probabilities. Sampling effort is allocated among input models to minimize final point-estimator variance subject to a computing-budget constraint. A queueing simulation demonstrates the advantages of this approach.
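The three-step replication scheme can be sketched as follows, assuming each candidate input model comes with its posterior model probability and a function that draws parameters from its posterior input-parameter distribution. The names are hypothetical, sampling effort is split equally across models rather than allocated to minimize point-estimator variance under a computing budget, and `simulate` stands in for any stochastic simulation, such as a queueing model.

```python
import random
from statistics import mean


def bayesian_replication_estimate(models, simulate, n_params=20, n_reps=5, seed=0):
    """Point estimate of the posterior mean response.

    models:   dict name -> (posterior model probability,
                            sampler(rng) -> one posterior parameter draw)
    simulate: function (theta, rng) -> one simulation response
    """
    rng = random.Random(seed)
    estimate = 0.0
    for prob, sample_posterior in models.values():
        responses = []
        for _ in range(n_params):
            theta = sample_posterior(rng)                 # (i) parameter uncertainty
            for _ in range(n_reps):
                responses.append(simulate(theta, rng))    # (ii) stochastic uncertainty
        estimate += prob * mean(responses)                # (iii) weight by model probability
    return estimate
```

For example, with two hypothetical models whose posteriors put all mass on parameters 2.0 and 4.0, posterior model probabilities 0.25 and 0.75, and a simulation that simply returns its parameter, the estimate is the weighted average 3.5.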
Model Averaging in Risk Management with an Application to Futures Markets
2008
Cited by 5 (2 self)
This paper considers the problem of model uncertainty in the case of multi-asset volatility models and discusses the use of model averaging techniques as a way of dealing with the risk of inadvertently using false models in portfolio management. Evaluation of volatility models is then considered, and a simple Value-at-Risk (VaR) diagnostic test is proposed for individual as well as ‘average’ models. Both the asymptotic and the exact finite-sample distributions of the test statistic, allowing for the possibility of parameter uncertainty, are established. The model averaging idea and the VaR diagnostic tests are illustrated by an application to portfolios of daily returns on six currencies, four equity indices, four ten-year government bonds and four commodities over the period 1991-2007. The empirical evidence supports the use of ‘thick’ model averaging strategies over single models or Bayesian-type model averaging procedures.
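As a rough illustration of a violation-based VaR diagnostic: if a model's one-day VaR at level $\alpha$ is correct and violations are independent, the number of violations $V$ over $n$ days is Binomial$(n, \alpha)$. Its standardized value is approximately N(0, 1) for large $n$, and the exact finite-sample tail follows from the binomial distribution. This is a generic unconditional-coverage check assumed for illustration; it is not necessarily the statistic proposed in the paper and it ignores parameter uncertainty. Function and variable names are hypothetical.

```python
import math


def var_violation_test(returns, var_forecasts, alpha):
    """Violation count, standardized statistic and exact upper-tail p-value.

    VaR is reported as a positive loss, so a violation is a return below -VaR.
    Under a correct model, V ~ Binomial(n, alpha).
    """
    n = len(returns)
    v = sum(r < -var for r, var in zip(returns, var_forecasts))
    z = (v - n * alpha) / math.sqrt(n * alpha * (1 - alpha))  # approx. N(0, 1)
    # Exact P(V' >= v) under Binomial(n, alpha): evidence of too many violations.
    p_exact = sum(math.comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
                  for k in range(v, n + 1))
    return v, z, p_exact


# Hypothetical example: 250 days, 1% VaR of 2.0, and a loss of 3.0 every 50th day.
returns = [-3.0 if day % 50 == 0 else 0.5 for day in range(250)]
print(var_violation_test(returns, [2.0] * 250, alpha=0.01))
```

The same function can be applied to the VaR forecasts of an individual model or of an averaged model, since it only needs the realized returns and the forecast sequence.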

