Survey of clustering data mining techniques
, 2002
Cited by 177 (0 self)
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Wavecluster: A multi-resolution clustering approach for very large spatial databases
, 1998
Cited by 147 (5 self)
Many applications require the management of spatial data. Clustering large spatial databases is an important problem which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to the outliers (noise) and the order of input data. We propose WaveCluster, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements. Using multi-resolution property of wavelet transforms, we can effectively identify arbitrary shape clusters at different degrees of accuracy. We also demonstrate that WaveCluster is highly efficient in terms of time complexity. Experimental results on very large data sets are presented which show the efficiency and effectiveness of the proposed approach compared to the other recent clustering methods.
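The grid-and-wavelet idea in this abstract can be illustrated with a much simplified sketch. It is not the authors' implementation: it assumes 2-D points inside a known square range, uses a single level of the Haar low-pass band (a 2x2 block average) in place of the wavelet transform, and uses a hand-picked density threshold. All function names and parameters are hypothetical.

```python
from collections import deque

def cell_index(value, size, lo, hi):
    """Map a coordinate in [lo, hi) to a grid index in 0..size-1."""
    return min(size - 1, int((value - lo) * size / (hi - lo)))

def grid_counts(points, size, lo, hi):
    """Quantize 2-D points into a size x size grid of point counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in points:
        grid[cell_index(x, size, lo, hi)][cell_index(y, size, lo, hi)] += 1
    return grid

def haar_approx(grid):
    """One level of the 2-D Haar low-pass band: average each 2x2 block."""
    half = len(grid) // 2
    return [[(grid[2 * i][2 * j] + grid[2 * i + 1][2 * j]
              + grid[2 * i][2 * j + 1] + grid[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(half)] for i in range(half)]

def connected_components(cells):
    """Label 4-connected groups among a set of (i, j) cells."""
    labels = {}
    next_label = 0
    for start in sorted(cells):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in cells and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels

def wave_cluster_sketch(points, size=8, lo=0.0, hi=8.0, threshold=0.5):
    """Return one cluster label per point; -1 marks points in sparse cells."""
    coarse = haar_approx(grid_counts(points, size, lo, hi))
    dense = {(i, j) for i, row in enumerate(coarse)
             for j, v in enumerate(row) if v >= threshold}
    cell_labels = connected_components(dense)
    return [cell_labels.get((cell_index(x, size, lo, hi) // 2,
                             cell_index(y, size, lo, hi) // 2), -1)
            for x, y in points]
```

Because the work is done on grid cells rather than on individual points, the result does not depend on the order of the input, and isolated points fall in low-density cells and are labeled as noise.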
Substructure Discovery Using Minimum Description Length and Background Knowledge
- Journal of Artificial Intelligence Research
, 1994
Cited by 127 (34 self)
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdu...
Face Detection With Information-Based Maximum Discrimination
- In Computer Vision and Pattern Recognition
, 1997
Cited by 65 (7 self)
In this paper we present a visual learning technique that maximizes the discrimination between positive and negative examples in a training set. We demonstrate our technique in the context of face detection with complex background without color or motion information, which has proven to be a challenging problem. We use a family of discrete Markov processes to model the face and background patterns and estimate the probability models using the data statistics. Then, we convert the learning process into an optimization, selecting the Markov process that optimizes the information-based discrimination between the two classes. The detection process is carried out by computing the likelihood ratio using the probability model obtained from the learning procedure. We show that because of the discrete nature of these models, the detection process is, by almost two orders of magnitude, less computationally expensive than neural network approaches. However, no improvement in terms of correct-answ...
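The detection step described in this abstract, a likelihood ratio between two discrete Markov models, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses first-order chains over a fixed symbol ordering, add-alpha smoothing, and toy data, and it omits the step that selects the Markov process maximizing discrimination. All names are hypothetical.

```python
import math
from collections import Counter

def fit_markov(sequences, n_states, alpha=1.0):
    """Estimate initial and transition probabilities with add-alpha smoothing."""
    init = Counter()
    trans = Counter()
    for seq in sequences:
        init[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            trans[(a, b)] += 1
    total_init = sum(init.values()) + alpha * n_states
    p_init = [(init[s] + alpha) / total_init for s in range(n_states)]
    p_trans = []
    for a in range(n_states):
        row_total = sum(trans[(a, b)] for b in range(n_states)) + alpha * n_states
        p_trans.append([(trans[(a, b)] + alpha) / row_total
                        for b in range(n_states)])
    return p_init, p_trans

def log_likelihood(seq, model):
    """Log-probability of a symbol sequence under a first-order Markov chain."""
    p_init, p_trans = model
    ll = math.log(p_init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(p_trans[a][b])
    return ll

def log_likelihood_ratio(seq, pos_model, neg_model):
    """Positive values favor the positive class, negative values the background."""
    return log_likelihood(seq, pos_model) - log_likelihood(seq, neg_model)
```

Once the tables are estimated, scoring a candidate needs only table lookups and additions, which is consistent with the abstract's point that discrete models make detection cheap.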
Action Recognition using Probabilistic Parsing
- IEEE CVPR’98
, 1998
Cited by 65 (5 self)
A new approach to the recognition of temporal behaviors and activities is presented. The fundamental idea, inspired by work in speech recognition, is to divide the inference problem into two levels. The lower level is performed using standard independent probabilistic temporal event detectors such as hidden Markov models (HMMs) to propose candidate detections of low level temporal features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. To achieve such a system we provide techniques for generating a discrete symbol stream from continuous low level detectors, for enforcing temporal exclusion constraints during parsing, and for generating a control method for low level feature application based upon the current parsing state. We demonstrate the approach in several experiments using both visual and other sensing data.
Image segmentation based on oscillatory correlation
- Neural Computation
, 1997
Cited by 63 (18 self)
We study image segmentation on the basis of locally excitatory globally inhibitory oscillator networks (LEGION), whereby the phases of oscillators encode the binding of pixels. We introduce a potential for each oscillator so that only those oscillators with strong connections from their neighborhood can develop high potentials. Based on the concept of potential, a solution to remove noisy regions in an image is proposed for LEGION, so that it suppresses the oscillators corresponding to noisy regions, without affecting those corresponding to major regions. We show analytically that the resulting oscillator network separates an image into several major regions, plus a background consisting of all noisy regions, and illustrate network properties by computer simulation. The network exhibits a natural capacity in segmenting images. The oscillatory dynamics leads to a computer algorithm, which is applied successfully to segmenting real gray-level images. A number of issues regarding biological plausibility and perceptual organization are discussed. We argue that LEGION provides a novel and effective framework for image segmentation and figure-ground segregation. (DeLiang Wang and David Terman)
A new methodology of extraction, optimization and application of crisp and fuzzy logical rules
- IEEE TRANSACTIONS ON NEURAL NETWORKS
, 2001
Cited by 46 (23 self)
A new methodology of extraction, optimization, and application of sets of logical rules is described. Neural networks are used for initial rule extraction, local or global minimization procedures for optimization, and Gaussian uncertainties of measurements are assumed during application of logical rules. Algorithms for extraction of logical rules from data with real-valued features require determination of linguistic variables or membership functions. Context-dependent membership functions for crisp and fuzzy linguistic variables are introduced, and methods for determining them are described. Several neural and machine learning methods of logical rule extraction that generate initial rules are described, based on constrained multilayer perceptrons, networks with localized transfer functions, or separability criteria for determination of linguistic variables. A tradeoff between accuracy and simplicity is explored at the rule extraction stage, and between rejection and error level at the optimization stage. Gaussian uncertainties of measurements are assumed during application of crisp logical rules, leading to “soft trapezoidal” membership functions and allowing the linguistic variables to be optimized using gradient procedures. Numerous applications of this methodology to benchmark and real-life problems are reported, and very simple crisp logical rules for many datasets are provided.
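The link between Gaussian measurement uncertainty and a smooth, trapezoid-like membership function can be illustrated with a short sketch. It assumes a crisp rule of the form "x lies in [a, b]" and a measurement distributed normally around the observed value with standard deviation sigma; the function names are hypothetical and the exact functional form used in the paper may differ.

```python
import math

def gaussian_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def soft_interval_membership(x, a, b, sigma):
    """Probability that a value measured as x, with Gaussian uncertainty
    sigma, actually lies in the crisp interval [a, b]."""
    return gaussian_cdf((b - x) / sigma) - gaussian_cdf((a - x) / sigma)
```

For an interval much wider than sigma, the value is close to 1 in the middle, about 0.5 at each boundary, and falls smoothly to 0 outside. Because the function is differentiable in a and b, the interval boundaries can be tuned with gradient procedures, as the abstract notes.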
Nonlinear Operator for Oriented Texture
, 1999
Cited by 32 (3 self)
Texture is an important part of the visual world of animals and humans, and their visual systems successfully detect, discriminate, and segment texture. Progress has recently been made concerning structures in the brain that are presumably responsible for texture processing. Neurophysiologists reported on the discovery of a new type of orientation selective neuron in areas V1 and V2 of the visual cortex of monkeys which they called grating cells. Such cells respond vigorously to a grating of bars of appropriate orientation, position and periodicity. In contrast to other orientation selective cells, grating cells respond very weakly or not at all to single bars which do not form part of a grating. Elsewhere we proposed a nonlinear model of this type of cell and demonstrated the advantages of grating cells with respect to the separation of texture and form information. In this paper, we use grating cell operators to obtain features and compare these operators in texture analysis tas...
Why Recognition in a Statistics-based Face Recognition System Should be based on the Pure Face Portion: a Probabilistic Decision-based Proof
, 2000
Cited by 25 (0 self)
It is evident that the process of face recognition, by definition, should be based on the content of a face. The problem is: what is a "face"? Recently, a state-of-the-art statistics-based face recognition system, the PCA plus LDA approach, has been proposed [1]. However, the authors used "face" images that included hair, shoulders, face and background. Our intuition tells us that only a recognition process based on a "pure" face portion can be called face recognition. The mixture of irrelevant data may result in an incorrect set of decision boundaries. In this paper, we propose a statistics-based technique to quantitatively prove our assertion. For the purpose of evaluating how the different portions of a face image will influence the recognition results, a hypothesis testing model is proposed. We then implement the above mentioned face ...

