Results 1–10 of 24
The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments
IEEE Transactions on Image Processing, 2000
Cited by 181 (2 self)
Abstract:
This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter’s development. The PicHunter project makes four primary contributions to research on content-based image retrieval. First, PicHunter represents a simple instance of a general Bayesian framework we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given what target image they want, PicHunter uses Bayes’s rule to predict the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
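The Bayesian update described above can be sketched in a few lines: maintain a probability distribution over possible targets and multiply in the likelihood of each observed user action. The distance-based softmax `user_model` and the simulated user below are illustrative assumptions, not PicHunter's learned model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy database of feature vectors; the user has one target image in mind.
images = rng.normal(size=(50, 4))
target = 17                          # known only to the simulated user

# Belief: a probability distribution over possible targets (uniform prior).
p = np.ones(len(images)) / len(images)

def user_model(shown, choice, candidate):
    # Assumed action model: P(user clicks shown[choice] | candidate is target),
    # a softmax over negative feature distances (illustrative, not learned).
    d = np.linalg.norm(images[shown] - images[candidate], axis=1)
    w = np.exp(-d)
    return w[choice] / w.sum()

for _ in range(20):
    shown = rng.choice(len(images), size=4, replace=False)
    # The simulated user clicks the displayed image closest to the true target.
    choice = int(np.argmin(np.linalg.norm(images[shown] - images[target], axis=1)))
    # Bayes' rule: posterior over targets given the observed action.
    likelihood = np.array([user_model(shown, choice, c) for c in range(len(images))])
    p *= likelihood
    p /= p.sum()
```

After a few rounds of feedback the posterior mass concentrates on images consistent with the user's clicks, with no explicit query ever being formed.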
BOAT: Optimistic Decision Tree Construction
1999
Cited by 102 (1 self)
Abstract:
Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers is decision trees. All current algorithms to construct decision trees, including all main-memory algorithms, make one scan over the training database per level of the tree. We introduce a new algorithm (BOAT) for decision tree construction that improves upon earlier algorithms in both performance and functionality. BOAT constructs several levels of the tree in only two scans over the training database, resulting in an average performance gain of 300% over previous work. The key to this performance improvement is a novel optimistic approach to tree construction in which we construct an initial tree using a small subset of the data and refine it to arrive at the final tree. We guarantee that any differen...
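The optimistic idea (commit to split decisions computed from a small in-memory sample, then confirm them with a scan of the full data) can be sketched as follows. The toy attributes and the single-node check are illustrative assumptions; BOAT itself uses bootstrapping and builds several tree levels with statistical guarantees:

```python
import random
from collections import Counter

random.seed(0)

# Toy training data: attribute a0 determines the label, a1 is noise.
data = [({"a0": i % 2, "a1": random.randint(0, 1)}, i % 2) for i in range(1000)]

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, attrs):
    # Attribute whose value groups have the lowest weighted Gini impurity.
    n = len(rows)
    def impurity(a):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[a], Counter())[y] += 1
        return sum(sum(g.values()) / n * gini(g) for g in groups.values())
    return min(attrs, key=impurity)

# Optimistic step: pick the split from a small in-memory sample ...
sample = random.sample(data, 50)
guess = best_split(sample, ["a0", "a1"])
# ... then one scan over the full data either confirms the choice or
# flags this node of the optimistic tree for repair.
confirmed = best_split(data, ["a0", "a1"])
```

When the sample-based guess agrees with the full-data winner, as it usually does for strong splits, the node needs no second look; only disputed nodes cost extra work.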
RainForest: A Framework for Fast Decision Tree Construction of Large Datasets
In VLDB, 1998
Cited by 95 (9 self)
Abstract:
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART,
An Optimized Interaction Strategy for Bayesian Relevance Feedback
In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’98), 1998
Cited by 59 (1 self)
Abstract:
A new algorithm and a systematic evaluation are presented for searching a database via relevance feedback. It represents a new image display strategy for the PicHunter system [2, 1]. The algorithm takes feedback in the form of relative judgments ("item A is more relevant than item B") as opposed to the stronger assumption of categorical relevance judgments ("item A is relevant but item B is not"). It also exploits a learned probabilistic model of human behavior to make better use of the feedback it obtains. The algorithm can be viewed as an extension of indexing schemes like the k-d tree to a stochastic setting, hence the name "stochastic-comparison search." In simulations, the amount of feedback required for the new algorithm scales like log₂ D, where D is the size of the database, while a simple query-by-example approach scales like D^a, where a < 1 depends on the structure of the database. This theoretical advantage is reflected by experiments with real users on a database of 1500 stock photographs.
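The comparison-search idea can be illustrated with a 1-D toy simulation: each relative judgment ("a is closer than b") lets the system discard every candidate lying on the loser's side of the a/b bisector, so the candidate set shrinks geometrically. The 1-D feature space and the error-free simulated user are assumptions for illustration; the actual system works with learned, noisy user models:

```python
import random

random.seed(2)

# 1-D toy "database": one feature value per item; the user seeks `target`.
db = sorted(random.uniform(0, 100) for _ in range(1024))
target = db[700]

candidates = list(db)
queries = 0
while len(candidates) > 1:
    a, b = random.sample(candidates, 2)
    queries += 1
    # Relative judgment: the simulated (noise-free) user reports which of
    # a, b is closer to the target.
    winner, loser = (a, b) if abs(a - target) <= abs(b - target) else (b, a)
    # Discard every candidate on the loser's side of the a/b bisector;
    # the true target always survives this filter, and the loser never does.
    candidates = [c for c in candidates if abs(c - winner) <= abs(c - loser)]
```

Because each judgment removes a whole region of candidates rather than a single item, the number of queries grows far more slowly than the database size.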
Scalable Mining for Classification Rules in Relational Databases
In Proceedings of the International Database Engineering & Application Symposium, 1998
Cited by 10 (1 self)
Abstract:
Classification is a key function of many "business intelligence" toolkits and a fundamental building block in data mining. Immense data may be needed to train a classifier for good accuracy. The state-of-the-art classifiers [21, 25] need an in-memory data structure of size O(N), where N is the size of the training data, to achieve efficiency. For large data sets, such a data structure will not fit in the internal memory. The best previously known classifier does a quadratic number of I/Os for large N. In this paper, we propose a novel classification algorithm (classifier) called MIND (MINing in Databases). MIND can be phrased in such a way that its implementation is very easy using the extended relational calculus SQL, and this in turn allows the classifier to be built into a relational database system directly. MIND is truly scalable with respect to I/O efficiency, which is important since scalability is a key requirement for any data mining algorithm. We built a prototype of MIND in the...
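The key point, that split statistics can be computed inside the database as small aggregates rather than as O(N) client-side state, can be sketched with a single GROUP BY query. The table, attribute, and Gini scoring below are illustrative assumptions, not MIND's actual schema or queries:

```python
import sqlite3

# The classifier only ever pulls back tiny (value, label, count) aggregates;
# the N training rows stay inside the database engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (age_group TEXT, label TEXT)")
con.executemany(
    "INSERT INTO train VALUES (?, ?)",
    [("young", "yes"), ("young", "no"), ("old", "yes"),
     ("old", "yes"), ("young", "no"), ("old", "no")],
)
# One GROUP BY per candidate attribute yields the counts needed to score a split.
counts = {
    (g, l): n
    for g, l, n in con.execute(
        "SELECT age_group, label, COUNT(*) FROM train GROUP BY age_group, label"
    )
}

def gini(yes, no):
    n = yes + no
    return 1.0 - (yes / n) ** 2 - (no / n) ** 2

# Weighted impurity of splitting on age_group, computed from aggregates alone.
total = sum(counts.values())
split_gini = sum(
    (counts.get((g, "yes"), 0) + counts.get((g, "no"), 0)) / total
    * gini(counts.get((g, "yes"), 0), counts.get((g, "no"), 0))
    for g in ("young", "old")
)
```

Evaluating every candidate split this way needs only scans the database engine performs itself, which is what makes the approach scale with I/O rather than with client memory.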
Approximating Optimal Binary Decision Trees
Cited by 10 (0 self)
Abstract:
We give a (ln n + 1)-approximation for the decision tree (DT) problem. We also show that DT does not have a PTAS unless P=NP. An instance of DT is a set of m binary tests T = (T1,..., Tm) and a set of n items X = (X1,..., Xn). The goal is to output a binary tree where each internal node is a test, each leaf is an item, and the total external path length of the tree is minimized. DT has a rich history in computer science with applications ranging from medical diagnosis to experiment design. Our work, while providing the first nontrivial upper and lower bounds on approximating DT, also demonstrates that DT and a subtly different problem which also bears the name decision tree (but which we call ConDT) have fundamentally different approximation complexity. We conclude with a stronger lower bound for a third decision tree problem called MinDT.
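A generic greedy heuristic for DT instances of this shape (not the paper's algorithm, which carries the stated (ln n + 1) guarantee) repeatedly applies the test that splits the surviving items most evenly; every item in a subtree pays one more internal node on its root-to-leaf path:

```python
# Items to distinguish, and binary tests; the tests here read one bit each,
# so every pair of items is separable (an assumption of this toy instance).
items = list(range(8))
tests = [lambda x, b=b: (x >> b) & 1 for b in range(3)]

def path_length(items, tests):
    # Greedy: apply the most balanced test; each recursion level adds
    # len(items) to the total external path length.
    if len(items) <= 1:
        return 0
    best = min(tests, key=lambda t: abs(2 * sum(t(x) for x in items) - len(items)))
    yes = [x for x in items if best(x)]
    no = [x for x in items if not best(x)]
    rest = [t for t in tests if t is not best]
    return len(items) + path_length(yes, rest) + path_length(no, rest)

total = path_length(items, tests)
```

On this instance every bit-test splits 4/4, so the greedy tree is a perfect depth-3 tree and the total external path length is 8 × 3 = 24.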
MML Inference of Oblique Decision Trees
In Lecture Notes in Artificial Intelligence (LNAI) 3339 (Springer), Proc. 17th Australian Joint Conf. on AI, 2004
Cited by 9 (5 self)
Abstract:
We propose a multivariate decision tree inference scheme using the minimum message length (MML) principle (Wallace and Boulton, 1968; Wallace and Dowe, 1999). The scheme uses MML coding as an objective (goodness-of-fit) function for model selection and searches with a simple evolution strategy. We test our multivariate tree inference scheme on UCI machine learning repository data sets and compare with the decision tree programs C4.5 and C5. The preliminary results show that on average, and on most datasets, MML oblique trees clearly perform better than both C4.5 and C5 on both “right”/“wrong” accuracy and probabilistic prediction, and with smaller trees, i.e., fewer leaf nodes.
Finding good itemsets by packing data
In ICDM, 2008
Cited by 9 (5 self)
Abstract:
The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first one is a simple greedy approach that builds a family of itemsets directly from data. The second one, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high quality descriptions of the data.
Decision trees: an overview and their use in medicine
Journal of Medical Systems, 2002
Cited by 6 (2 self)
Abstract:
In medical decision making (classification, diagnosing, etc.) there are many situations where a decision must be made effectively and reliably. Conceptually simple decision-making models with the possibility of automatic learning are the most appropriate for performing such tasks. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of gathered knowledge, and they have been used in different areas of medical decision making. In the paper we present the basic characteristics of decision trees and successful alternatives to the traditional induction approach, with an emphasis on existing and possible future applications in medicine. Key words: decision trees, classification, decision making, machine learning
Search-based Algorithms for Multilayer Perceptrons
2005
Cited by 3 (1 self)
Abstract:
Algorithms based on systematic search techniques can be successfully applied to multilayer perceptron (MLP) training and to logical rule extraction from data using MLP networks. The proposed solutions are easier to implement and frequently outperform gradient-based optimization algorithms. Search-based techniques, popular in artificial intelligence and almost completely neglected in neural networks, can be the basis for MLP network training algorithms. There are plenty of well-known search algorithms; however, since they are not suitable for MLP training, new algorithms dedicated to this task must be developed. Search algorithms applied to MLP networks change network parameters (weights and biases) and check the influence of the changes on the error function. MLP networks considered in this thesis are used for data classification and logical rule-based understanding of the data. The proposed solutions in many cases outperform gradient-based backpropagation algorithms. The thesis is organized in three parts. The first part of the thesis concentrates on better understanding of MLP properties.
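The basic search loop (perturb a parameter, check its influence on the error function, keep only improvements) can be sketched as a tiny hill-climbing trainer. The 2-2-1 network, XOR task, and perturbation scale are illustrative assumptions, not the thesis's actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR task and a tiny 2-2-1 MLP with its 9 parameters in one flat vector.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def error(w):
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return float(np.mean((out - y) ** 2))

w = rng.normal(0.0, 1.0, size=9)
start = best = error(w)
for _ in range(5000):
    # Search step: perturb one randomly chosen parameter and evaluate
    # its effect on the error; no gradients are ever computed.
    cand = w.copy()
    cand[rng.integers(9)] += rng.normal(0.0, 0.5)
    e = error(cand)
    if e < best:
        w, best = cand, e
```

Each accepted step is guaranteed not to increase the error, which is the property that makes such search loops robust where gradient steps can overshoot.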