Results 1–10 of 87
Just relax: Convex programming methods for subset selection and sparse approximation
, 2004
Cited by 92 (4 self)
Abstract. Subset selection and sparse approximation problems request a good approximation of an input signal using a linear combination of elementary signals, yet they stipulate that the approximation may only involve a few of the elementary signals. This class of problems arises throughout electrical engineering, applied mathematics, and statistics, but little theoretical progress has been made over the last fifty years. Subset selection and sparse approximation both admit natural convex relaxations, but the literature contains few results on the behavior of these relaxations for general input signals. This report demonstrates that the solution of the convex program frequently coincides with the solution of the original approximation problem. The proofs depend essentially on geometric properties of the ensemble of elementary signals. The results are powerful because sparse approximation problems are combinatorial, while convex programs can be solved in polynomial time with standard software. Comparable new results for a greedy algorithm, Orthogonal Matching Pursuit, are also stated. This report should have a major practical impact because the theory applies immediately to many real-world signal processing problems.
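The abstract contrasts convex programming with the greedy alternative, Orthogonal Matching Pursuit; the greedy side can be sketched in a few lines of NumPy (a minimal illustration on synthetic data, not the report's implementation):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: greedily pick k atoms of D to approximate y.
    D: (m, n) dictionary with unit-norm columns; y: (m,) signal; k: sparsity."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # least-squares fit on the selected atoms, then update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# synthetic example: approximate the image of a 2-sparse vector
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x_true = np.zeros(50); x_true[[3, 17]] = [1.5, -2.0]
x_hat = omp(D, D @ x_true, k=2)
```

Because the residual stays orthogonal to the atoms already selected, no atom is picked twice; for incoherent dictionaries this greedy loop typically recovers the true sparse support.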
A tutorial introduction to the minimum description length principle
 in Advances in Minimum Description Length: Theory and Applications, 2005
Towards an effective cooperation of the user and the computer for classification
 Proc. 6th Intl. Conf. on Knowledge Discovery and Data Mining (KDD ’00)
, 2000
Cited by 45 (2 self)
Decision trees have been successfully used for the task of classification. However, state-of-the-art algorithms do not incorporate the user in the tree construction process. This paper presents a new user-centered approach to this process where the user and the computer can both contribute their strengths: the user provides domain knowledge and evaluates intermediate results of the algorithm, the computer automatically creates patterns satisfying user constraints and generates appropriate visualizations of these patterns. In this cooperative approach, domain knowledge of the user can direct the search of the algorithm. Additionally, by providing adequate data and knowledge visualizations, the pattern recognition capabilities of the human can be used to increase the effectiveness of decision tree construction. Furthermore, the user gains a deeper understanding of the decision tree than just obtaining it as the result of an algorithm. To achieve the intended level of cooperation, we introduce a new visualization of data with categorical and numerical attributes. A novel technique for visualizing decision trees is presented which provides deep insights into the process of decision tree construction. As a key contribution, we integrate a state-of-the-art algorithm for decision tree construction such that many different styles of cooperation, ranging from completely manual through combined to completely automatic classification, are supported. An experimental performance evaluation demonstrates that our cooperative approach yields an efficient construction of decision trees that have a small size but a high accuracy.
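One concrete way user domain knowledge can direct the search, as described above, is by restricting which attributes the algorithm may split on. A minimal sketch of constraint-directed split selection (the toy data and the `best_split` helper are illustrative inventions, not the paper's system):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain of splitting on a categorical attribute index."""
    n = len(rows)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(len(ys) / n * entropy(ys)
                                 for ys in by_value.values())

def best_split(rows, labels, allowed):
    """Pick the best split attribute, but only among user-allowed ones."""
    return max(allowed, key=lambda a: information_gain(rows, labels, a))

# toy data: attribute 0 determines the class, attribute 1 is noise
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["+", "+", "-", "-"]
print(best_split(rows, labels, allowed=[0, 1]))   # -> 0 (full gain)
```

Constraining `allowed` to a subset (e.g. `allowed=[1]`) forces the tree to respect the user's choice even when an excluded attribute has higher gain.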
PNrule: A new framework for learning classifier models in data mining (a case study in network intrusion detection)
 IBM Research Report, Computer Science/Mathematics
, 2000
Cited by 30 (1 self)
Learning classifier models is an important problem in data mining. Observations from the real world are often recorded as a set of records, each characterized by multiple attributes. Associated with each record is a categorical attribute called the class. Given a training set of records with known class labels, the problem is to learn a model that predicts the class labels of previously unseen records.
An MDL Method for Finding Haplotype Blocks and for Estimating the Strength of Haplotype Block Boundaries
, 2003
An Empirical Comparison of Supervised Machine Learning Techniques in Bioinformatics
, 2003
Cited by 20 (2 self)
Research in bioinformatics is driven by experimental data, and current biological databases are populated by vast amounts of it. Machine learning has been widely applied to bioinformatics and has achieved considerable success in this research area. With the variety of learning algorithms now available in the literature, however, researchers face difficulty choosing the method best suited to their data. We performed an empirical study of 7 individual learning systems and 9 different combined methods on 4 different biological data sets, and suggest issues to consider when answering the following questions: (i) How does one choose the algorithm best suited to a given data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm to the others? Keywords: supervised machine learning, bioinformatics, ensemble methods, performance evaluation.
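Question (iii), comparing the effectiveness of algorithms, is commonly approached by pairing per-fold cross-validation scores and testing whether the mean difference is significant. A minimal sketch using only the standard library (the fold accuracies below are made up for illustration):

```python
import math

def paired_t(scores_a, scores_b):
    """Paired t statistic over per-fold accuracy scores of two learners.
    A large |t| suggests the difference is not due to fold-to-fold noise."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# per-fold accuracies from a hypothetical 5-fold cross-validation
svm_scores = [0.91, 0.88, 0.90, 0.93, 0.89]
nb_scores  = [0.85, 0.86, 0.84, 0.88, 0.85]
t = paired_t(svm_scores, nb_scores)
```

The statistic would then be compared against a t distribution with n − 1 degrees of freedom; pairing by fold matters because both learners see the same train/test partitions.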
Learning of boolean functions using support vector machines
 In Proc. of the 12th International Conference on Algorithmic Learning Theory
, 2001
Cited by 17 (0 self)
Abstract. This paper concerns the design of a Support Vector Machine (SVM) appropriate for learning Boolean functions, motivated by the need for a more sophisticated algorithm for classification in discrete attribute spaces. Classification in a discrete attribute space is reduced to the problem of learning a Boolean function from examples of its input/output behavior. Since any Boolean function can be written in Disjunctive Normal Form (DNF), it can be represented as a weighted linear sum of all possible conjunctions of Boolean literals. This paper presents a particular kernel function, the DNF kernel, which enables SVMs to efficiently learn such linear functions in the high-dimensional space whose coordinates correspond to all possible conjunctions. For a limited form of DNF consisting of positive Boolean literals, the monotone DNF kernel is also presented. SVMs employing these kernel functions can perform the learning in a high-dimensional feature space whose features are derived from the given basic attributes. In addition, the well-founded capacity control of SVMs is expected to alleviate overfitting. Indeed, an empirical study on learning randomly generated Boolean functions shows that the resulting algorithm outperforms C4.5. Furthermore, in comparison with SVMs employing the Gaussian kernel, the DNF kernel is shown to produce accuracy comparable to the best-adjusted Gaussian kernels.
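The counting argument behind such a kernel can be made concrete: a conjunction of literals satisfied by both x and z can only mention positions where the two vectors agree, and every non-empty agreeing subset yields one conjunction, giving 2^agreements − 1 shared features. A sketch under that reading of the abstract (an illustration, not necessarily the paper's exact formulation):

```python
def dnf_kernel(x, z):
    """Counts conjunctions of literals (positive or negated) satisfied by
    both 0/1 vectors x and z.  A conjunction consistent with both can only
    use positions where x and z agree, so the count is 2**agreements - 1."""
    agree = sum(1 for a, b in zip(x, z) if a == b)
    return 2 ** agree - 1

def monotone_dnf_kernel(x, z):
    """Monotone variant: only positive literals, so only positions where
    both bits are 1 contribute."""
    both = sum(1 for a, b in zip(x, z) if a == 1 and b == 1)
    return 2 ** both - 1

x = [1, 0, 1, 1]
z = [1, 1, 1, 0]
print(dnf_kernel(x, z))           # agrees on 2 positions -> 3
print(monotone_dnf_kernel(x, z))  # both 1 on 2 positions -> 3
```

The point of the closed form is that the SVM never enumerates the exponentially many conjunction features; the inner product in that feature space costs only O(n).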
Feature selection for Descriptor-based Classification Models
 Part II: Human Intestinal Absorption (HIA). J. Chem. Inf. Comput. Sci.
, 2003
Cited by 15 (2 self)
The paper describes different aspects of classification models based on molecular data sets, with a focus on feature selection methods. In particular, model quality and the avoidance of high variance on unseen data (overfitting) are discussed with respect to the feature selection problem. We present several standard approaches and modifications of our Genetic Algorithm based on Shannon Entropy Cliques (GASEC), together with its extension to classification problems using boosting.
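The genetic-algorithm side of such feature selection can be sketched generically: evolve a population of feature-subset bitmasks under a fitness function that rewards predictive features and penalizes large subsets. This is a generic GA sketch with an invented toy fitness, not the GASEC algorithm itself:

```python
import random

def ga_feature_select(fitness, n_features, pop_size=20, generations=30, seed=0):
    """Generic genetic algorithm over feature-subset bitmasks.
    `fitness` scores a tuple of 0/1 flags; higher is better."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = list(a[:cut] + b[cut:])
            child[rng.randrange(n_features)] ^= 1  # point mutation
            children.append(tuple(child))
        pop = survivors + children
    return max(pop, key=fitness)

# toy fitness: features 0 and 2 are informative; the size penalty embodies
# the paper's concern about overfitting with large feature sets
def fitness(mask):
    return 2 * mask[0] + 2 * mask[2] - sum(mask)

best = ga_feature_select(fitness, n_features=6)
```

In a real setting the fitness would be a cross-validated model score rather than this hand-written function.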
Learning from ambiguously labeled examples
 Intell. Data Anal
, 2006
Cited by 13 (1 self)
Inducing a classification function from a set of examples in the form of labeled instances is a standard problem in supervised machine learning. In this paper, we are concerned with ambiguous label classification (ALC), an extension of this setting in which several candidate labels may be assigned to a single example. By extending three concrete classification methods to the ALC setting (nearest neighbor classification, decision tree learning, and rule induction) and evaluating their performance on benchmark data sets, we show that appropriately designed learning algorithms can successfully exploit the information contained in ambiguously labeled examples. Our results indicate that the fundamental idea of the extended methods, namely to disambiguate the label information by means of the inductive bias underlying (heuristic) machine learning methods, works well in practice.
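For the nearest-neighbor extension mentioned above, the disambiguation idea can be sketched as follows (a toy illustration with invented data, not the paper's exact procedure): each training example carries a set of candidate labels, and the query receives the label most supported across its neighbors' candidate sets.

```python
from collections import Counter

def alc_knn(train, query, k=3):
    """Ambiguous-label k-NN: each training example is (vector, candidate_labels).
    The query gets the label appearing in the most candidate sets among
    its k nearest neighbors (squared Euclidean distance)."""
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, candidates in neighbors
                    for label in candidates)
    return votes.most_common(1)[0][0]

train = [
    ((0.0, 0.0), {"A"}),          # unambiguous example
    ((0.1, 0.2), {"A", "B"}),     # ambiguous: annotator was unsure
    ((0.2, 0.1), {"A", "C"}),
    ((5.0, 5.0), {"B"}),
]
print(alc_knn(train, query=(0.05, 0.1), k=3))   # -> 'A'
```

The ambiguous examples still contribute: their candidate sets overlap on "A", so the inductive bias of the neighborhood resolves the ambiguity, as the abstract describes.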
A Bias-Variance Analysis of a Real-World Learning Problem: The CoIL Challenge 2000
 Machine Learning
, 2004
Cited by 13 (0 self)
Abstract. The CoIL Challenge 2000 data mining competition attracted a wide variety of solutions, both in terms of approaches and performance. The goal of the competition was to predict who would be interested in buying a specific insurance product and to explain why people would buy. Unlike in most other competitions, the majority of participants provided a report describing the path to their solution. In this article we use the framework of bias-variance decomposition of error to analyze what caused the wide range of prediction performance. We characterize the challenge problem to make it comparable to other problems and evaluate why certain methods work or not. We also include an evaluation of the submitted explanations by a marketing expert. We find that variance is the key component of error for this problem. Participants use various strategies in data preparation and model development that reduce variance error, such as feature selection and the use of simple, robust and low-variance learners like Naive Bayes. Adding constructed features, modeling with complex, weak-bias learners and extensive fine-tuning by the participants often increase the variance error.
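The bias-variance decomposition used in such an analysis can be estimated by Monte Carlo: train the learner on many independently drawn training sets and decompose its squared error at a test point. A minimal sketch with an invented constant-predictor learner (not the article's setup):

```python
import random

def bias_variance(train_fn, datasets, x, true_y):
    """Monte-Carlo bias/variance at point x under squared loss:
    bias^2 = (mean prediction - truth)^2, variance = spread of predictions."""
    preds = [train_fn(ds)(x) for ds in datasets]
    mean_pred = sum(preds) / len(preds)
    bias_sq = (mean_pred - true_y) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return bias_sq, variance

# toy learner: always predicts the mean of the noisy training targets
def fit_constant(dataset):
    c = sum(y for _, y in dataset) / len(dataset)
    return lambda x: c

rng = random.Random(0)
true = lambda x: 2 * x                        # underlying target function
datasets = [[(x, true(x) + rng.gauss(0, 1)) for x in (0, 1, 2)]
            for _ in range(200)]
b2, var = bias_variance(fit_constant, datasets, x=2, true_y=true(2))
# the constant learner is too simple: large bias at x=2, modest variance
```

This is the pattern behind the article's finding: a rigid learner (like the constant predictor, or Naive Bayes in the challenge) trades variance for bias, while flexible, heavily tuned models shift error into the variance term.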