Results 1  10
of
2,743
Normalized Cuts and Image Segmentation
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... ..."
(Show Context)
Mean shift: A robust approach toward feature space analysis
 In PAMI
, 2002
"... A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence ..."
Abstract

Cited by 2375 (40 self)
 Add to MetaCart
(Show Context)
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust Mestimators of location is also established. Algorithms for two lowlevel vision tasks, discontinuity preserving smoothing and image segmentation are described as applications. In these algorithms the only user set parameter is the resolution of the analysis, and either gray level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
Statistical pattern recognition: A review
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract

Cited by 998 (30 self)
 Add to MetaCart
(Show Context)
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the wellknown methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Efficient GraphBased Image Segmentation
"... This paper addresses the problem of segmenting an image into regions. We define a predicate for measuring the evidence for a boundary between two regions using a graphbased representation of the image. We then develop an efficient segmentation algorithm based on this predicate, and show that althou ..."
Abstract

Cited by 931 (1 self)
 Add to MetaCart
(Show Context)
This paper addresses the problem of segmenting an image into regions. We define a predicate for measuring the evidence for a boundary between two regions using a graphbased representation of the image. We then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm makes greedy decisions it produces segmentations that satisfy global properties. We apply the algorithm to image segmentation using two different kinds of local neighborhoods in constructing the graph, and illustrate the results with both real and synthetic images. The algorithm runs in time nearly linear in the number of graph edges and is also fast in practice. An important characteristic of the method is its ability to preserve detail in lowvariability image regions while ignoring detail in highvariability regions.
Automatic Subspace Clustering of High Dimensional Data
 Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, enduser comprehensibility of the results, nonpresumption of any canonical data distribution, and insensitivity to the or ..."
Abstract

Cited by 724 (12 self)
 Add to MetaCart
(Show Context)
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, enduser comprehensibility of the results, nonpresumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
Mtree: An Efficient Access Method for Similarity Search in Metric Spaces
, 1997
"... A new access meth d, called Mtree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion o ..."
Abstract

Cited by 652 (38 self)
 Add to MetaCart
A new access meth d, called Mtree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion of objects and split management, whF h keep th Mtree always balanced  severalheralvFV split alternatives are considered and experimentally evaluated. Algorithd for similarity (range and knearest neigh bors) queries are also described. Results from extensive experimentationwith a prototype system are reported, considering as th performance criteria th number of page I/O's and th number of distance computations. Th results demonstratethm th Mtree indeed extendsth domain of applicability beyond th traditional vector spaces, performs reasonably well inhE[94Kv#E44V[vh data spaces, and scales well in case of growing files. 1
A comparison of document clustering techniques
 In KDD Workshop on Text Mining
, 2000
"... This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and Kmeans. (We used both a “standard” Kmeans algorithm and a “bisecting ” Kmeans algorithm.) Our results indicate that the bisecting Kmeans technique is ..."
Abstract

Cited by 600 (29 self)
 Add to MetaCart
(Show Context)
This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and Kmeans. (We used both a “standard” Kmeans algorithm and a “bisecting ” Kmeans algorithm.) Our results indicate that the bisecting Kmeans technique is better than the standard Kmeans approach and (somewhat surprisingly) as good or better than the hierarchical approaches that we tested.
Automatic Word Sense Discrimination
 Journal of Computational Linguistics
, 1998
"... This paper presents contextgroup discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a highdimensional, realvalued space in which closen ..."
Abstract

Cited by 530 (1 self)
 Add to MetaCart
This paper presents contextgroup discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a highdimensional, realvalued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on secondorder cooccurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they cooccur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training insta,nces or other external knowledge sources. The paper demonstrates good performance of contextgroup discrimination for a sample of natural and artificial ambiguous words
Data Mining: An Overview from Database Perspective
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract

Cited by 514 (26 self)
 Add to MetaCart
(Show Context)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.