Results 1  10
of
46
Mining DistanceBased Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
, 2003
"... Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic ..."
Abstract

Cited by 104 (4 self)
 Add to MetaCart
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real highdimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the e#ciency is because the time to process nonoutliers, which are the majority of examples, does not depend on the size of the data set.
Topdown induction of clustering trees
 In 15th Int’l Conf. on Machine Learning
, 1998
"... An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for firs ..."
Abstract

Cited by 99 (22 self)
 Add to MetaCart
An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both propositional and relational domains. 1
A Simple Relational Classifier
 Proceedings of the Second Workshop on MultiRelational Data Mining (MRDM2003) at KDD2003
, 2003
"... We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilist ..."
Abstract

Cited by 82 (14 self)
 Add to MetaCart
We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts only based on class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work.
Kernels and Distances for Structured Data
 Machine Learning
, 2004
"... This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higherorder logic. Our main theo ..."
Abstract

Cited by 49 (3 self)
 Add to MetaCart
This paper brings together two strands of machine learning of increasing importance: kernel methods and highly structured data. We propose a general method for constructing a kernel following the syntactic structure of the data, as defined by its type signature in a higherorder logic. Our main theoretical result is the positive definiteness of any kernel thus defined. We report encouraging experimental results on a range of realworld datasets. By converting our kernel to a distance pseudometric for 1nearest neighbour, we were able to improve the best accuracy from the literature on the Diterpene dataset by more than 10%.
A Polynomial Time Computable Metric Between Point Sets
, 2000
"... Measuring the similarity or distance between two sets of points in a metric space is an important problem in machine learning and has also applications in other disciplines e.g. in computational geometry, philosophy of science, methods for updating or changing theories, . . . . Recently Eiter and Ma ..."
Abstract

Cited by 40 (3 self)
 Add to MetaCart
Measuring the similarity or distance between two sets of points in a metric space is an important problem in machine learning and has also applications in other disciplines e.g. in computational geometry, philosophy of science, methods for updating or changing theories, . . . . Recently Eiter and Mannila have proposed a new measure which is computable in polynomial time. However, it is not a distance function in the mathematical sense because it does not satisfy the triangle inequality.
Distance Between Herbrand Interpretations: a measure for approximations to a target concept
, 1997
"... . We can use a metric to measure the di#erences between elements in a domain or subsets of that domain #i.e. concepts#. Which particular metric should be chosen, depends on the kind of di#erence wewant to measure. The well known Euclidean metric on # n and its generalizations are often used f ..."
Abstract

Cited by 38 (0 self)
 Add to MetaCart
. We can use a metric to measure the di#erences between elements in a domain or subsets of that domain #i.e. concepts#. Which particular metric should be chosen, depends on the kind of di#erence wewant to measure. The well known Euclidean metric on # n and its generalizations are often used for this purpose, but such metrics are not always suitable for concepts where elements have some structure di#erent from real numbers. For example, in #Inductive# Logic Programming a concept is often expressed as an Herbrand interpretation of some #rstorder language. Every element in an Herbrand interpretation is a ground atom which has a tree structure. We start by de#ning a metric d on the set of expressions #ground atoms and ground terms#, motivated by the structure and complexity of the expressions and the symbols used therein. This metric induces the Hausdor # metric h on the set of all sets of ground atoms, which allows us to measure the distance between Herbrand interpretatio...
Topdown induction of logical decision trees
 Artificial Intelligence
, 1998
"... Topdown induction of decision trees (TDIDT) is a very popular machine learning technique. Up till now, it has mainly been used for propositional learning, but seldomly for relational learning or inductive logic programming. The main contribution of this paper is the introduction of logical decision ..."
Abstract

Cited by 31 (1 self)
 Add to MetaCart
Topdown induction of decision trees (TDIDT) is a very popular machine learning technique. Up till now, it has mainly been used for propositional learning, but seldomly for relational learning or inductive logic programming. The main contribution of this paper is the introduction of logical decision trees, which make it possible to use TDIDT in inductive logic programming. An implementation of this topdown induction of logical decision trees, the Tilde system, is presented and experimentally evaluated. 1
The omnipresence of casebased reasoning in science and application
 KNOWLEDGEBASED SYSTEMS
, 1998
"... A surprisingly large number of research disciplines have contributed towards the development of knowledge on lazy problem solving, which is characterized by its storage of ground cases and its demand driven response to queries. Casebased reasoning (CBR) is an alternative, increasingly popular appro ..."
Abstract

Cited by 29 (0 self)
 Add to MetaCart
A surprisingly large number of research disciplines have contributed towards the development of knowledge on lazy problem solving, which is characterized by its storage of ground cases and its demand driven response to queries. Casebased reasoning (CBR) is an alternative, increasingly popular approach for designing expert systems that implements this approach. This paper lists pointers to some contributions in some related disciplines that offer insights for CBR research. We then outline a small number of Navy applications based on this approach that demonstrate its breadth of applicability. Finally, we list a few successful and failed attempts to apply CBR, and list some predictions on the future roles of CBR in applications.
Using Logical Decision Trees for Clustering
 In Proceedings of the 7th International Workshop on Inductive Logic Programming
, 1997
"... A novel first order clustering system, called C 0.5, is presented. It inherits its logical decision tree formalism from the TILDE system, but instead of using class information to guide the search, it employs the principles of instance based learning in order to perform clustering. Various experimen ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
A novel first order clustering system, called C 0.5, is presented. It inherits its logical decision tree formalism from the TILDE system, but instead of using class information to guide the search, it employs the principles of instance based learning in order to perform clustering. Various experiments are discussed, which show the promise of the approach. 1 Introduction A decision tree is usually seen as representing a theory for classification of examples. If the examples are positive and negative examples for one specific concept, then the tree defines these two concepts. One could also say, if there are k classes, that the tree defines k concepts. Another viewpoint is taken in Langley's Elements of Machine Learning [ Langley, 1996 ] . Langley sees decision tree induction as a special case of the induction of concept hierarchies. A concept is associated with each node of the tree, and as such the tree represents a kind of taxonomy, a hierarchy of many concepts. This is very similar...