The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments
- IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2000
Cited by 150 (2 self)
This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter’s development. The PicHunter project makes four primary contributions to research on content-based image retrieval. First, PicHunter represents a simple instance of a general Bayesian framework we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given what target image they want, PicHunter uses Bayes’s rule to predict what is the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
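The Bayesian step described in this abstract can be illustrated with a minimal sketch. It assumes a tiny database of four candidate targets and a made-up user model; the likelihood values and the names `bayes_update` and `entropy` are hypothetical and are not taken from PicHunter itself.

```python
import math

def bayes_update(prior, likelihoods):
    """Posterior over targets: posterior[t] is proportional to prior[t] * P(action | t)."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observed action has zero probability under every target")
    return [u / total for u in unnormalized]

def entropy(dist):
    """Shannon entropy in bits; lower means the system is more certain of the target."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Uniform prior over four hypothetical images in the database.
prior = [0.25, 0.25, 0.25, 0.25]
# Assumed user model: probability of the observed user action given each target.
likelihoods = [0.7, 0.1, 0.1, 0.1]

posterior = bayes_update(prior, likelihoods)
print(posterior, entropy(prior), entropy(posterior))
```

The search state here is the whole distribution over targets rather than a refined query, and the entropy of that distribution is the quantity an entropy-minimizing display strategy would try to reduce with each round of feedback.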
BOAT -- Optimistic Decision Tree Construction
, 1999
Cited by 97 (1 self)
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision trees. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
RainForest - a Framework for Fast Decision Tree Construction of Large Datasets
- In VLDB
, 1998
Cited by 85 (8 self)
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART,
An Optimized Interaction Strategy for Bayesian Relevance Feedback
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’98)
, 1998
Cited by 52 (1 self)
A new algorithm and systematic evaluation are presented for searching a database via relevance feedback. It represents a new image display strategy for the PicHunter system [2, 1]. The algorithm takes feedback in the form of relative judgments ("item A is more relevant than item B") as opposed to the stronger assumption of categorical relevance judgments ("item A is relevant but item B is not"). It also exploits a learned probabilistic model of human behavior to make better use of the feedback it obtains. The algorithm can be viewed as an extension of indexing schemes like the k-d tree to a stochastic setting, hence the name "stochastic-comparison search." In simulations, the amount of feedback required for the new algorithm scales like log2 |D|, where |D| is the size of the database, while a simple query-by-example approach scales like |D|^a, where a < 1 depends on the structure of the database. This theoretical advantage is reflected by experiments with real users on a database of 1500 stock photographs.
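To make the two scaling laws concrete, the following sketch evaluates both growth rates for the 1500-image database mentioned in the abstract. The exponent a = 0.5 is purely an assumed value for illustration (the abstract only states a < 1), and constant factors are ignored, so the numbers indicate growth behavior rather than measured feedback counts.

```python
import math

def logarithmic_feedback(db_size):
    """Growth rate reported for stochastic-comparison search: log2 of the database size."""
    return math.log2(db_size)

def power_law_feedback(db_size, a):
    """Growth rate reported for simple query-by-example: db_size ** a with a < 1."""
    return db_size ** a

db_size = 1500   # database size from the abstract
a = 0.5          # hypothetical exponent, chosen only for illustration

print(round(logarithmic_feedback(db_size), 2))
print(round(power_law_feedback(db_size, a), 2))
```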
Scalable Mining for Classification Rules in Relational Databases
- in Proceedings of the International Database Engineering & Application Symposium
, 1998
Cited by 8 (1 self)
Classification is a key function of many "business intelligence" toolkits and a fundamental building block in data mining. Immense data may be needed to train a classifier for good accuracy. The state-of-the-art classifiers [21, 25] need an in-memory data structure of size O(N), where N is the size of the training data, to achieve efficiency. For large data sets, such a data structure will not fit in the internal memory. The best previously known classifier does a quadratic number of I/Os for large N. In this paper, we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the extended relational calculus SQL, and this in turn allows the classifier to be built into a relational database system directly. MIND is truly scalable with respect to I/O efficiency, which is important since scalability is a key requirement for any data mining algorithm. We built a prototype of MIND in the...
MML Inference of Oblique Decision Trees
- In Lecture Notes in Artificial Intelligence (LNAI) 3339 (Springer), Proc. 17th Australian Joint Conf. on AI
, 2004
Cited by 8 (5 self)
We propose a multivariate decision tree inference scheme by using the minimum message length (MML) principle (Wallace and Boulton, 1968; Wallace and Dowe, 1999). The scheme uses MML coding as an objective (goodness-of-fit) function on model selection and searches with a simple evolution strategy. We test our multivariate tree inference scheme on UCI machine learning repository data sets and compare with the decision tree programs C4.5 and C5. The preliminary results show that on average and on most data sets, MML oblique trees clearly perform better than both C4.5 and C5 on both “right”/“wrong” accuracy and probabilistic prediction, and with smaller trees, i.e., fewer leaf nodes.
Decision trees: an overview and their use in medicine
- Journal of Medical Systems
, 2002
Cited by 5 (2 self)
In medical decision making (classification, diagnosing, etc.) there are many situations where a decision must be made effectively and reliably. Conceptually simple decision making models with the possibility of automatic learning are the most appropriate for performing such tasks. Decision trees are a reliable and effective decision making technique that provides high classification accuracy with a simple representation of gathered knowledge, and they have been used in different areas of medical decision making. In the paper we present the basic characteristics of decision trees and the successful alternatives to the traditional induction approach, with the emphasis on existing and possible future applications in medicine. Key words: decision trees, classification, decision making, machine learning
Approximating Optimal Binary Decision Trees
Cited by 5 (0 self)
We give a (ln n + 1)-approximation for the decision tree (DT) problem. We also show that DT does not have a PTAS unless P=NP. An instance of DT is a set of m binary tests T = (T1,..., Tm) and a set of n items X = (X1,..., Xn). The goal is to output a binary tree where each internal node is a test, each leaf is an item and the total external path length of the tree is minimized. DT has a rich history in computer science with applications ranging from medical diagnosis to experiment design. Our work, while providing the first non-trivial upper and lower bounds on approximating DT, also demonstrates that DT and a subtly different problem which also bears the name decision tree (but which we call ConDT) have fundamentally different approximation complexity. We conclude with a stronger lower bound for a third decision tree problem called MinDT.
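The DT objective defined in this abstract can be made concrete with a small sketch. It builds a tree with a simple greedy rule (pick the test that splits the remaining items most evenly) and computes the total external path length. The abstract does not say which algorithm achieves the (ln n + 1) bound, so the greedy rule is an assumption for illustration, and the test names and outcomes below are invented.

```python
def build_tree(items, tests):
    """Greedy heuristic: at each node choose the test with the most balanced split.

    items: list of item names.
    tests: dict mapping test name -> dict mapping item name -> 0 or 1.
    Returns an item name (leaf) or a tuple (test, subtree_for_0, subtree_for_1).
    """
    if len(items) == 1:
        return items[0]
    best = None
    for name, outcome in tests.items():
        zeros = [x for x in items if outcome[x] == 0]
        ones = [x for x in items if outcome[x] == 1]
        if not zeros or not ones:
            continue  # this test does not separate the remaining items
        imbalance = abs(len(zeros) - len(ones))
        if best is None or imbalance < best[0]:
            best = (imbalance, name, zeros, ones)
    if best is None:
        raise ValueError("remaining items cannot be distinguished by any test")
    _, name, zeros, ones = best
    return (name, build_tree(zeros, tests), build_tree(ones, tests))

def external_path_length(tree, depth=0):
    """Sum of leaf depths, the quantity the DT problem asks to minimize."""
    if not isinstance(tree, tuple):
        return depth
    _, left, right = tree
    return (external_path_length(left, depth + 1)
            + external_path_length(right, depth + 1))

# Hypothetical instance: n = 4 items, m = 3 binary tests.
items = ["a", "b", "c", "d"]
tests = {
    "T1": {"a": 0, "b": 0, "c": 1, "d": 1},
    "T2": {"a": 0, "b": 1, "c": 0, "d": 1},
    "T3": {"a": 1, "b": 0, "c": 0, "d": 0},
}

tree = build_tree(items, tests)
print(tree, external_path_length(tree))
```

In this instance the greedy rule places every item at depth 2, giving a total external path length of 8, which is the best possible for four leaves in a binary tree.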
Knowledge discovery with classification rules in a cardiovascular dataset
- Computer Methods and Programs in Biomedicine, 80: S39-S49
, 2005
Cited by 3 (0 self)
In the paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, which uses evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert’s assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology. Index terms: machine learning, knowledge discovery, classification rules, pediatric cardiology, medical data mining
Search-based Algorithms for Multilayer Perceptrons
, 2005
Cited by 3 (1 self)
Algorithms based on systematic search techniques can be successfully applied for multilayer perceptron (MLP) training and for logical rule extraction from data using MLP networks. The proposed solutions are easier to implement and frequently outperform gradient-based optimization algorithms. Search-based techniques, popular in artificial intelligence and almost completely neglected in neural networks, can be the basis for MLP network training algorithms. There are plenty of well-known search algorithms; however, since they are not suitable for MLP training, new algorithms dedicated to this task must be developed. Search algorithms applied to MLP networks change network parameters (weights and biases) and check the influence of the changes on the error function. MLP networks considered in this thesis are used for data classification and logical rule-based understanding of the data. The proposed solutions in many cases outperform gradient-based backpropagation algorithms. The thesis is organized in three parts. The first part of the thesis concentrates on better understanding of MLP properties.
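The idea of changing network parameters and checking the effect on the error function can be sketched as a gradient-free coordinate search. This is a generic illustration, not the algorithm proposed in the thesis: the 2-2-1 network, the XOR data, the step schedule, and the function names are all assumptions made for the example.

```python
import math
import random

def mlp_forward(w, x):
    """Tiny 2-2-1 MLP with tanh units; w holds 9 parameters (weights and biases)."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return math.tanh(w[6] * h1 + w[7] * h2 + w[8])

def error(w, data):
    """Sum of squared errors over the training set."""
    return sum((mlp_forward(w, x) - y) ** 2 for x, y in data)

def search_train(w, data, step=0.5, min_step=1e-3, max_sweeps=200):
    """Perturb one parameter at a time and keep a change only if the error drops."""
    w = list(w)
    best = error(w, data)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(w)):
            for delta in (step, -step):
                old = w[i]
                w[i] = old + delta
                e = error(w, data)
                if e < best:
                    best = e          # accept the change
                    improved = True
                    break
                w[i] = old            # reject the change
        if not improved:
            step /= 2                 # refine the search around the current point
            if step < min_step:
                break
    return w, best

# XOR with 0/1 targets as a toy classification problem.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
random.seed(0)
initial_w = [random.uniform(-1, 1) for _ in range(9)]
initial_error = error(initial_w, data)
trained_w, final_error = search_train(initial_w, data)
print(initial_error, final_error)
```

Because a change is accepted only when it lowers the error, the final error can never exceed the initial one, and no derivative of the error function is needed.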

