Results 1-10 of 13
True Path Rule hierarchical ensembles for genome-wide gene function prediction
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011
Boosting for high-multivariate responses in high-dimensional linear regression
Statist. Sin., 2006
Cited by 11 (1 self)
Abstract: We propose a boosting method, multivariate L2Boosting, for multivariate linear regression based on a squared error loss for multivariate data. It can be applied to multivariate linear regression with continuous responses and to vector autoregressive time series. We prove, for i.i.d. as well as time series data, that multivariate L2Boosting can consistently recover sparse high-dimensional multivariate linear functions, even when the number of predictor variables p_n and the dimension of the response q_n grow almost exponentially with the sample size n, p_n = q_n = O(exp(C n^(1-ξ))) (with 0 < ξ < 1, 0 < C < ∞), provided the ℓ1-norm of the true underlying function is finite. Our theory seems to be among the first to address the issue of large dimension of the response variable; the relevance of such settings is briefly outlined. We also identify empirically some cases where our multivariate L2Boosting is better than multiple applications of univariate methods to single response components, thus demonstrating that the multivariate approach can be very useful. Key words and phrases: High-multivariate high-dimensional linear regression, L2Boosting, vector AR time series.
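The componentwise mechanics of such a method can be sketched in a few lines of numpy. This is a rough illustration of greedy multivariate L2-style boosting under a squared error loss, not the authors' implementation; all names (`multivariate_l2boost`, `nu`) are made up:

```python
import numpy as np

def multivariate_l2boost(X, Y, n_iter=200, nu=0.1):
    """Greedy componentwise boosting sketch for a multivariate response.

    At each step, a simple least-squares coefficient is fit for every
    (predictor j, response k) pair on the current residuals, and the
    single pair that most reduces the squared error is updated with
    shrinkage factor nu.
    """
    n, p = X.shape
    q = Y.shape[1]
    B = np.zeros((p, q))            # running coefficient estimate
    R = Y.astype(float).copy()      # current residual matrix
    col_ss = (X ** 2).sum(axis=0)   # precomputed ||x_j||^2
    for _ in range(n_iter):
        C = (X.T @ R) / col_ss[:, None]   # all simple LS coefficients, shape (p, q)
        gain = col_ss[:, None] * C ** 2   # RSS reduction for each (j, k) pair
        j, k = np.unravel_index(np.argmax(gain), gain.shape)
        B[j, k] += nu * C[j, k]
        R[:, k] -= nu * C[j, k] * X[:, j]
    return B
```

Each iteration updates only the single (predictor, response) coefficient that most reduces the residual sum of squares, which is what lets the procedure stay sparse when both p and q are large.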
Correlation-Based Pruning of Stacked Binary Relevance Models for Multi-Label Learning
Cited by 11 (0 self)
Abstract. Binary relevance (BR) learns a single binary model for each label of multi-label data. It has linear complexity with respect to the number of labels, but does not take label correlations into account and may fail to accurately predict label combinations or rank labels by relevance for a new instance. Stacking the models of BR, in order to learn a model that associates their outputs with the true value of each label, is a way to alleviate this problem. In this paper we propose pruning the models participating in the stacking process by explicitly measuring the degree of label correlation using the phi coefficient. Exploratory analysis of phi shows that the correlations detected are meaningful and useful. Empirical evaluation of the pruning approach shows that it leads to a substantial reduction of the computational cost of stacking and occasional improvements in predictive performance.
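The phi coefficient used for pruning is Pearson's correlation specialized to two binary variables, computable from a 2x2 contingency table. A small sketch (illustrative names, not the authors' code):

```python
import numpy as np

def phi_coefficient(a, b):
    """Phi coefficient between two binary label vectors.

    Built from the 2x2 contingency table; for binary variables this
    equals Pearson's correlation coefficient.
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    n11 = np.sum(a & b)    # both labels present
    n10 = np.sum(a & ~b)   # only the first present
    n01 = np.sum(~a & b)   # only the second present
    n00 = np.sum(~a & ~b)  # both absent
    denom = np.sqrt(float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)))
    if denom == 0.0:
        return 0.0  # a degenerate (constant) label carries no correlation signal
    return (n11 * n00 - n10 * n01) / denom
```

A stacked model for a target label would then keep only those BR outputs whose absolute phi with that label exceeds a chosen threshold.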
Hierarchical genre classification for large music collections
2006 IEEE International Conference on Multimedia and Expo (ICME 2006), 2006
Cited by 8 (2 self)
The rapid progress in digital music distribution has led to the creation of large collections of music. There is a need for content-based music classification methods to organize these collections automatically using a given genre taxonomy. To provide a versatile description of the music content, several kinds of features, such as rhythm, pitch, or timbre characteristics, are commonly used. Taking the highly dynamic nature of music into account, each of these features should be calculated up to several hundred times per second. Thus, a piece of music is represented by a complex object given by several large sets of feature vectors. In this paper, we propose a novel approach for the hierarchical classification of music pieces into a genre taxonomy. Our approach is able to handle multiple characteristics of music content and achieves high classification accuracy efficiently, as shown in our experiments on a real-world data set.
Learning Classifiers Using Hierarchically Structured Class Taxonomies
Proceedings of the Symposium on Abstraction, Reformulation, and Approximation (SARA), 2005
Cited by 6 (1 self)
Abstract. We consider classification problems in which the class labels are organized into an abstraction hierarchy in the form of a class taxonomy. We define a structured label classification problem. We explore two approaches for learning classifiers in such a setting. We also develop a class of performance measures for evaluating the resulting classifiers. We present preliminary results that demonstrate the promise of the proposed approaches.
Multi-represented kNN-Classification for Large Class Sets
10th Intl. Proc., 2005
Cited by 2 (1 self)
Abstract. The amount of stored information in modern database applications has increased tremendously in recent years. Besides their sheer amount, the stored data objects are also more and more complex. Therefore, classification of these complex objects is an important data mining task that yields several new challenges. In many applications, the data objects provide multiple representations; e.g., proteins can be described by text, amino acid sequences, or 3D structures. Additionally, many real-world applications need to distinguish thousands of classes. Last but not least, many complex objects are not directly expressible as feature vectors. To cope with all these requirements, we introduce a novel approach to the classification of multi-represented objects that is capable of distinguishing large numbers of classes. Our method is based on k-nearest-neighbor classification and employs density-based clustering as a new approach to reduce the training instances for instance-based classification. To predict the most likely class, our classifier employs a new method to use several object representations for making accurate class predictions. The introduced method is evaluated by classifying proteins according to the classes of the Gene Ontology, one of the most established class systems for biomolecules, comprising several thousand classes. Keywords: Multi-represented objects, classification, instance-based learning, k-nearest-neighbor classifier.
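The reduce-then-classify idea can be sketched as follows, using plain k-means centroids as a simplified stand-in for the paper's density-based clustering (all function names are illustrative):

```python
import numpy as np

def reduce_by_centroids(X, y, k=3, n_iter=20, rng=None):
    """Replace each class's training instances by up to k centroids.

    A simple per-class k-means; the paper instead uses density-based
    clustering, so this is only a structural stand-in.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    cX, cy = [], []
    for label in np.unique(y):
        P = X[y == label]
        k_eff = min(k, len(P))
        centers = P[rng.choice(len(P), size=k_eff, replace=False)].astype(float)
        for _ in range(n_iter):
            d = np.linalg.norm(P[:, None] - centers[None], axis=2)  # (n, k_eff)
            assign = d.argmin(axis=1)
            for i in range(k_eff):
                pts = P[assign == i]
                if len(pts):
                    centers[i] = pts.mean(axis=0)
        cX.append(centers)
        cy.extend([label] * k_eff)
    return np.vstack(cX), np.array(cy)

def knn_predict(x, cX, cy, k=1):
    """Majority vote among the k nearest reduced instances."""
    d = np.linalg.norm(cX - x, axis=1)
    nearest = cy[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]
```

Classification then runs against the small centroid set instead of the full training data, which is what makes instance-based prediction feasible with thousands of classes.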
Muscle: Music classification engine with user feedback
In Proc. EDBT, 2006
Cited by 1 (1 self)
Abstract. Nowadays, powerful music compression tools and cheap mass storage devices have become widely available. This allows average consumers to transfer entire music collections from the distribution medium, such as CDs and DVDs, to their computer hard drive. To locate specific pieces of music, these are usually labeled with artist and title. Yet the user would benefit from a more intuitive organization based on music style to get an overview of the collection. We have developed a novel tool called MUSCLE which fills this gap. While approaches exist in the field of musical genre classification, none of them features hierarchical classification in combination with interactive user feedback and flexible multiple assignment of songs to classes. In this paper, we present MUSCLE, a tool which allows the user to organize large music collections in a genre taxonomy and to modify class assignments on the fly.
Learning to Predict One or More Ranks in Ordinal Regression Tasks
Cited by 1 (1 self)
Abstract. We present nondeterministic hypotheses learned from an ordinal regression task. They try to predict the true rank for an entry, but when the classification is uncertain the hypotheses predict a set of consecutive ranks (an interval). The aim is to keep the set of ranks as small as possible while still containing the true rank. The justification for learning such a hypothesis is based on a real-world problem arising in breeding beef cattle. After defining a family of loss functions inspired by Information Retrieval, we derive an algorithm for minimizing them. The algorithm is based on posterior probabilities of ranks given an entry. Two implementations are compared: one based on a multiclass SVM and the other on Gaussian processes designed to minimize the linear loss in ordinal regression tasks.
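As a hypothetical example of such a loss family, an F_beta-style trade-off between covering the true rank and keeping the interval short leads to a simple search over consecutive intervals. This is a sketch under an assumed loss, not the authors' exact formulation: predicting interval Z costs 1 - (1 + beta^2) / (beta^2 + |Z|) when the true rank lies in Z and 1 otherwise, so the expected loss of Z is 1 - (1 + beta^2) / (beta^2 + |Z|) * P(Z).

```python
import numpy as np

def best_rank_interval(probs, beta=1.0):
    """Pick the consecutive rank interval with lowest expected loss.

    probs[i] is the posterior probability of rank i for this entry.
    Returns (lo, hi), inclusive indices of the chosen interval.
    """
    probs = np.asarray(probs, dtype=float)
    n = len(probs)
    best, best_loss = (0, 0), float("inf")
    for lo in range(n):
        for hi in range(lo, n):
            size = hi - lo + 1
            p = probs[lo:hi + 1].sum()
            # expected F_beta-style loss of predicting this interval
            loss = 1.0 - (1.0 + beta**2) / (beta**2 + size) * p
            if loss < best_loss:
                best_loss, best = loss, (lo, hi)
    return best
```

When one rank dominates the posterior, the search collapses to a single rank; when the posterior mass is spread over neighbors, the interval widens just enough to cover it.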
Hierarchical multi-label classification for protein function prediction: going beyond traditional approaches
Noor Alaydie, 2012
Learning Nondeterministic Classifiers
Juan José
"... Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, these kind of classifiers can be considered a ..."
Abstract
 Add to MetaCart
Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and that the number of classes predicted should be as small as possible, this kind of classifier can be considered an Information Retrieval (IR) procedure. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers improve the proportion of predictions that include the true class compared to their deterministic counterparts; the price paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), Logistic Regression, and Naïve Bayes. The data sets considered comprise both UCI multiclass learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we show how the quality of the posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions.
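As an illustration of the subset-selection step, assume an F_beta-style member of such a loss family: predicting a set Z costs 1 - (1 + beta^2) / (beta^2 + |Z|) when the true class is in Z and 1 otherwise. Under this loss, for a fixed set size the best choice is the highest-probability classes, so the optimal subset is a prefix of the classes sorted by posterior probability and only n candidates need checking (a sketch under this assumed loss, not necessarily the paper's exact family):

```python
import numpy as np

def nondeterministic_predict(probs, beta=1.0):
    """Return the class subset with lowest expected F_beta-style loss.

    probs[i] is the estimated posterior probability of class i.
    """
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs)         # classes by descending posterior
    best, best_loss = order[:1], float("inf")
    cum = 0.0
    for k, idx in enumerate(order, start=1):
        cum += probs[idx]              # P(true class in the top-k prefix)
        loss = 1.0 - (1.0 + beta**2) / (beta**2 + k) * cum
        if loss < best_loss:
            best_loss, best = loss, order[:k]
    return sorted(best.tolist())
```

With a confident posterior the prediction stays a single class; when the top posteriors are close, the expected loss favors paying the small size penalty to include the runners-up.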