Results 1 - 10 of 79
Hierarchical discriminant regression
- IEEE Trans. Pattern Anal. Mach. Intell., 2000
Cited by 38 (21 self)
Abstract: The main motivation of this paper is to propose a new classification and regression method for challenging high-dimensional data. The proposed new technique casts classification problems (class labels as output) and regression problems (numeric values as output) into a unified regression problem. This unified view enables classification problems to use numeric information in the output space that is available for regression problems but is traditionally not readily available for classification problems: a distance metric among clustered class labels for coarse and fine classifications. A doubly clustered subspace-based hierarchical discriminating regression (HDR) method is proposed in this work. The major characteristics include: 1) Clustering is performed in both output space and input space at each internal node, termed "doubly clustered." Clustering in the output space provides virtual labels for computing clusters in the input space. 2) Discriminants in the input space are automatically derived from the clusters in the input space. These discriminants span the discriminating subspace at each internal node of the tree. 3) A hierarchical probability distribution model is applied to the resulting discriminating subspace at each internal node. This realizes a coarse-to-fine approximation of the probability distribution of the input samples in the hierarchical discriminating subspaces. No global distribution models are assumed. 4) To relax the per-class sample requirement of traditional discriminant analysis techniques, a sample-size dependent negative-log-likelihood (NLL) is introduced. This new technique is designed for automatically dealing with small-sample applications, large-sample applications, and unbalanced-sample applications. 5) The execution of the HDR method is fast, due to the empirical logarithmic time complexity of the HDR algorithm.
Although the method is applicable to any data, we report the experimental results for three types of data: synthetic data for examining the near-optimal performance, large raw face-image databases, and traditional databases with manually selected features along with a comparison with some major existing methods, such as CART,
Hierarchical Discriminant Analysis for Image Retrieval
- IEEE Trans. PAMI, 1999
Cited by 33 (3 self)
Abstract—A self-organizing framework for object recognition is described. We describe a hierarchical database structure for image retrieval. The Self-Organizing Hierarchical Optimal Subspace Learning and Inference Framework (SHOSLIF) system uses the theories of optimal linear projection for automatic optimal feature derivation and a hierarchical structure to achieve a logarithmic retrieval complexity. A Space-Tessellation Tree is automatically generated using the Most Expressive Features (MEFs) and the Most Discriminating Features (MDFs) at each level of the tree. The major characteristics of the proposed hierarchical discriminant analysis include: 1) avoiding the limitation of global linear features (hyperplanes as separators) by deriving a recursively better-fitted set of features for each of the recursively subdivided sets of training samples; 2) generating a smaller tree whose cell boundaries separate the samples along the class boundaries better than the principal component analysis, thereby giving a better generalization capability (i.e., better recognition rate in a disjoint test); 3) accelerating the retrieval using a tree structure for data pruning, utilizing a different set of discriminant features at each level of the tree. We allow for perturbations in the size and position of objects in the images through learning. We demonstrate the technique on a large image database of widely varying real-world objects taken in natural settings, and show the applicability of the approach for variability in position, size, and 3D orientation. This paper concentrates on the hierarchical partitioning of the feature spaces. Index Terms—Principal component analysis, discriminant analysis, hierarchical image database, image retrieval, tessellation, partitioning, object recognition, face recognition, complexity with large image databases.
Discovering Interesting Patterns for Investment Decision Making with GLOWER - A Genetic Learner Overlaid With Entropy Reduction
- 2000
Cited by 27 (0 self)
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search space. Second, the weak relationships among variables tend to be nonlinear, and may hold only in limited areas of the search space. Third, in financial practice, where analysts conduct extensive manual analysis of historically well-performing indicators, a key is to find the hidden interactions among variables that perform well in combination. Unfortunately, these are exactly the patterns that the greedy search biases incorporated by many standard rule learning algorithms will miss. In this paper, we describe and evaluate several variations of a new genetic learning algorithm (GLOWER) on a variety of data sets. The design of GLOWER has been motivated by financial prediction problems, but incorpo...
Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance
- In Proceedings of the second SIAM conference on Data Mining, 2002
Cited by 22 (7 self)
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms.
Minimax-optimal classification with dyadic decision trees
- IEEE Transactions on Information Theory, 2006
Cited by 20 (4 self)
Decision trees are among the most popular types of classifiers, with interpretability and ease of implementation being among their chief attributes. Despite the widespread use of decision trees, theoretical analysis of their performance has only begun to emerge in recent years. In this paper it is shown that a new family of decision trees, dyadic decision trees (DDTs), attain nearly optimal (in a minimax sense) rates of convergence for a broad range of classification problems. Furthermore, DDTs are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; and (3) reject irrelevant features. DDTs are constructed by penalized empirical risk minimization using a new data-dependent penalty and may be computed exactly with computational complexity that is nearly linear in the training sample size. DDTs are the first classifier known to achieve nearly optimal rates for the diverse class of distributions studied here while also being practical and implementable. This is also the first study (of which we are aware) to consider rates for adaptation to intrinsic data dimension and relevant features.
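To make "penalized empirical risk minimization" over a tree family concrete, here is a minimal sketch, not the paper's algorithm. It assumes that a dyadic split halves a cell at its midpoint along one axis, that labels are binary (0 or 1), and it uses a constant per-leaf `penalty` as a stand-in for the data-dependent penalty mentioned in the abstract. The names `midpoint_split` and `best_cost` are hypothetical.

```python
from typing import Sequence, Tuple

Cell = Tuple[Tuple[float, float], ...]  # one (low, high) interval per feature


def midpoint_split(cell: Cell, axis: int) -> Tuple[Cell, Cell]:
    """Halve a cell along one axis (the dyadic split assumed in this sketch)."""
    low, high = cell[axis]
    mid = (low + high) / 2.0
    left = cell[:axis] + ((low, mid),) + cell[axis + 1:]
    right = cell[:axis] + ((mid, high),) + cell[axis + 1:]
    return left, right


def best_cost(points: Sequence[Sequence[float]], labels: Sequence[int],
              cell: Cell, depth_left: int, penalty: float) -> float:
    """Smallest (training errors + penalty * leaves) over dyadic subtrees of cell."""
    inside = [y for x, y in zip(points, labels)
              if all(lo <= xi < hi for xi, (lo, hi) in zip(x, cell))]
    # A leaf predicts the majority label, so it misclassifies the minority.
    leaf_errors = min(inside.count(0), inside.count(1))
    best = leaf_errors + penalty
    # Splitting a pure cell can only add penalty, so only impure cells are split.
    if depth_left > 0 and leaf_errors > 0:
        for axis in range(len(cell)):
            left, right = midpoint_split(cell, axis)
            split_cost = (best_cost(points, labels, left, depth_left - 1, penalty)
                          + best_cost(points, labels, right, depth_left - 1, penalty))
            best = min(best, split_cost)
    return best


# Hypothetical 1-D data: label 0 below 0.5, label 1 above.
points = [(0.1,), (0.2,), (0.6,), (0.9,)]
labels = [0, 0, 1, 1]
unit_interval = ((0.0, 1.0),)
cheap_leaves = best_cost(points, labels, unit_interval, 2, 0.5)
costly_leaves = best_cost(points, labels, unit_interval, 2, 3.0)
```

With a small penalty the single split at 0.5 wins (two pure leaves), while a large penalty makes the unsplit root cheaper. The trade-off is the same in kind as the one the abstract describes, although the actual penalty there depends on the data.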
Application of Genetic Programming to Induction of Linear Classification Trees
- In Proceedings of the Third European Conference on Genetic Programming, 2000
Cited by 19 (1 self)
A common problem in datamining is to find accurate classifiers for a dataset. For this purpose, genetic programming (GP) is applied to a set of benchmark classification problems. Using GP we are able to induce decision trees with a linear combination of variables in each function node. A new representation of decision trees using strong typing in GP is introduced. With this representation it is possible to let the GP classify into any number of classes. Results indicate that GP can be applied successfully to classification problems. Comparisons with current state-of-the-art algorithms in machine learning are presented and areas of future research are identified. 1 Introduction Classification problems form an important area in datamining. For example, a bank may want to classify its clients into good and bad credit risks or a doctor may want to classify his patients as having diabetes or not. Classifiers may take the form of decision trees [11] (see Figure 1). In each node, a...
An Incremental Learning Algorithm with Automatically Derived Discriminating Features
- 2000
Cited by 13 (7 self)
We propose a new technique which incrementally derives discriminating features in the input space. This technique casts both classification problems (class labels as outputs) and regression problems (numerical values as outputs) into a unified regression problem. The virtual labels are formed by clustering in the output space. We use these virtual labels to extract discriminating features in the input space. This procedure is performed recursively. We organize the resulting discriminating subspace in a coarse-to-fine fashion and store the information in a decision tree. Such an incrementally hierarchical discriminating regression (IHDR) decision tree can be realized as a hierarchical probability distribution model. We also introduce a sample-size dependent negative-log-likelihood (NLL) metric to deal with large-sample size cases, small-sample size cases, and unbalanced-sample size cases. This is essential since the number of training samples per class differs at each internal node of the IHDR tree. We report experimental results for two types of data: face image data along with a comparison with some major appearance-based methods and decision trees, and hallway images with driving directions as outputs for the automatic navigation problem, a regression application.
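The virtual-label idea in this abstract can be illustrated with a small sketch. It is not the IHDR algorithm. It assumes scalar outputs, a plain 1-D k-means with hand-picked starting centers, and made-up data; `kmeans_1d`, `virtual_labels`, and `input_means` are hypothetical names. Numeric outputs are clustered, each sample receives the index of its output cluster as a virtual label, and those labels then group the samples in the input space.

```python
from typing import List, Optional, Sequence


def kmeans_1d(values: Sequence[float], centers: Sequence[float],
              iterations: int = 10) -> List[float]:
    """Plain Lloyd iterations on scalar outputs (a stand-in clustering step)."""
    centers = list(centers)
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def virtual_labels(outputs: Sequence[float], centers: Sequence[float]) -> List[int]:
    """Label each sample with the index of its nearest output-space cluster."""
    return [min(range(len(centers)), key=lambda i: abs(y - centers[i]))
            for y in outputs]


def input_means(inputs: Sequence[Sequence[float]], labels: Sequence[int],
                n_clusters: int) -> List[Optional[List[float]]]:
    """Mean input vector per virtual label (None for an unused label)."""
    dims = len(inputs[0])
    sums = [[0.0] * dims for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for x, k in zip(inputs, labels):
        counts[k] += 1
        for d, xd in enumerate(x):
            sums[k][d] += xd
    return [[s / c for s in row] if c else None for row, c in zip(sums, counts)]


# Made-up regression data: 2-D inputs with a steering-angle-like output.
inputs = [(0.0, 1.0), (0.2, 1.2), (1.0, 1.0), (1.2, 0.8), (2.0, 0.0), (2.2, 0.2)]
outputs = [-30.0, -25.0, 0.0, 2.0, 28.0, 31.0]

centers = kmeans_1d(outputs, centers=[-10.0, 0.0, 10.0])
labels = virtual_labels(outputs, centers)
means = input_means(inputs, labels, n_clusters=len(centers))
```

Differences between the per-label input means are one simple source of discriminating directions. How the paper actually derives its discriminating subspace is not specified in the abstract and is not reproduced here.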
Developmental Robots: Theory, Method and Experimental Results
- In Proc. of the International Symposium on Humanoid Robots
Cited by 13 (5 self)
It is very challenging for humans to program a humanoid robot to act properly in a human environment. Humans have a fundamental limitation in constructing an adequate model for the world or an adequate behavior model for the robot, because of the complexity of such models and the unpredictable unknown environments to which the models must apply. This article introduces a new approach to intelligent robots, the developmental approach, which is different from other existing major approaches: knowledge-based, behavior-based, learning-based, and evolutionary approaches. The developmental approach is motivated by human mental development from infancy to adulthood, during which each human individual develops his cognitive and behavioral capabilities through interactions with the environment. This approach results in a new kind of robot, the developmental robot: a robot that can develop automatically. These robots require a new kind of algorithm, the developmental algorithm, which enables the robo...
Learning sorting and decision trees with POMDPs
- In Proceedings of the Fifteenth International Conference on Machine Learning, 1998
Cited by 11 (5 self)
POMDPs are general models of sequential decisions in which both actions and observations can be probabilistic. Many problems of interest, including extracting decision trees from data, can be formulated as POMDPs yet the use of POMDPs has been limited by the lack of effective algorithms. Recently this has started to change and a number of problems such as robot navigation and planning are beginning to be formulated and solved as POMDPs. The advantage of the POMDP approach is its clean semantics and its ability to produce principled solutions that integrate physical and information gathering actions. In this paper we pursue this approach in the context of two learning tasks: learning to sort a vector of numbers and learning decision trees from data. Both problems are formulated as POMDPs and solved by a general POMDP algorithm. The main lessons and results are the following: 1. the use of suitable heuristics and representations allows us to solve sorting and classification POMDPs of n...
Omnivariate Decision Trees
Cited by 11 (6 self)
Univariate decision trees at each decision node consider the value of only one feature, leading to axis-aligned splits. In a linear multivariate decision tree, each decision node divides the input space into two with a hyperplane. In a nonlinear multivariate tree, a multilayer perceptron at each node divides the input space arbitrarily, at the expense of increased complexity and higher risk of overfitting. We propose omnivariate trees where the decision node may be univariate, linear, or nonlinear depending on the outcome of comparative statistical tests on accuracy, thus matching automatically the complexity of the node with the subproblem defined by the data reaching that node. Such an architecture frees the designer from choosing the appropriate node type, doing model selection automatically at each node. Our simulation results indicate that such a decision tree induction method generalizes better than trees with the same types of nodes everywhere and induces small trees.
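The first two node types contrasted in this abstract can be shown side by side. This is only an illustrative sketch with hypothetical function names and hand-picked parameters. It does not cover the nonlinear (multilayer perceptron) node or the statistical test that an omnivariate tree would use to choose among node types.

```python
from typing import Sequence


def univariate_node(x: Sequence[float], feature: int, threshold: float) -> bool:
    """Axis-aligned split: the decision looks at a single feature."""
    return x[feature] > threshold


def linear_node(x: Sequence[float], weights: Sequence[float], bias: float) -> bool:
    """Hyperplane split: the decision looks at a weighted sum of all features."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0.0


# The same point can be routed differently by the two node types.
point = (0.2, 0.9)
goes_right_univariate = univariate_node(point, feature=0, threshold=0.5)
goes_right_linear = linear_node(point, weights=(1.0, 1.0), bias=-1.0)
```

Here the univariate node sends the point left because its first feature is below 0.5, while the linear node sends it right because the two features sum to more than 1.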

