Results 11 - 20
of
2,553
OBBTree: A Hierarchical Structure for Rapid Interference Detection
, 1996
"... We present a data structure and an algorithm for efficient and exact interference detection amongst complex models undergoing rigid motion. The algorithm is applicable to all general polygonal models. It pre-computes a hierarchical representation of models using tight-fitting oriented bounding box ..."
Abstract
-
Cited by 596 (38 self)
- Add to MetaCart
We present a data structure and an algorithm for efficient and exact interference detection amongst complex models undergoing rigid motion. The algorithm is applicable to all general polygonal models. It pre-computes a hierarchical representation of models using tight-fitting oriented bounding box trees (OBBTrees). At runtime, the algorithm traverses two such trees and tests for overlaps between oriented bounding boxes based on a separating axis theorem, which takes less than 200 operations in practice. It has been implemented and we compare its performance with other hierarchical data structures. In particular, it can robustly and accurately detect all the contacts between large complex geometries composed of hundreds of thousands of polygons at interactive rates.
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
"... The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the d-dimens ..."
Abstract
-
Cited by 533 (28 self)
- Add to MetaCart
The nearest neighbor problem is the following: Given a set of n points P = fp 1 ; : : : ; png in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q 2 X. We focus on the particularly interesting case of the d-dimensional Euclidean space where X = ! d under some l p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p 2 P that is an ffl-approximate nearest neighbor of the query q in that for all p 0 2 P , d(p; q) (1 + ffl)d(p 0 ; q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
Statistical pattern recognition: A review
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract
-
Cited by 487 (20 self)
- Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
, 1997
"... The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containin ..."
Abstract
-
Cited by 486 (20 self)
- Add to MetaCart
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be opti...
Automatic Subspace Clustering of High Dimensional Data
- Data Mining and Knowledge Discovery
, 2005
"... Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the or ..."
Abstract
-
Cited by 461 (11 self)
- Add to MetaCart
Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
Bayesian Network Classifiers
, 1997
"... Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restr ..."
Abstract
-
Cited by 451 (20 self)
- Add to MetaCart
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
- MACHINE LEARNING
, 1999
"... Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in co ..."
Abstract
-
Cited by 449 (2 self)
- Add to MetaCart
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer.
The purpose of the study is to improve our understanding of why and
when these algorithms, which use perturbation, reweighting, and
combination techniques, affect classification error. We provide a
bias and variance decomposition of the error to show how different
methods and variants influence these two terms. This allowed us to
determine that Bagging reduced variance of unstable methods, while
boosting methods (AdaBoost and Arc-x4) reduced both the bias and
variance of unstable methods but increased the variance for Naive-Bayes,
which was very stable. We observed that Arc-x4 behaves differently
than AdaBoost if reweighting is used instead of resampling,
indicating a fundamental difference. Voting variants, some of which
are introduced in this paper, include: pruning versus no pruning,
use of probabilistic estimates, weight perturbations (Wagging), and
backfitting of data. We found that Bagging improves when
probabilistic estimates in conjunction with no-pruning are used, as
well as when the data was backfit. We measure tree sizes and show
an interesting positive correlation between the increase in the
average tree size in AdaBoost trials and its success in reducing the
error. We compare the mean-squared error of voting methods to
non-voting methods and show that the voting methods lead to large
and significant reductions in the mean-squared errors. Practical
problems that arise in implementing boosting algorithms are
explored, including numerical instabilities and underflows. We use
scatterplots that graphically show how AdaBoost reweights instances,
emphasizing not only "hard" areas but also outliers and noise.
Detecting faces in images: A survey
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2002
"... Images containing faces are essential to intelligent vision-based human computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image se ..."
Abstract
-
Cited by 437 (4 self)
- Add to MetaCart
Images containing faces are essential to intelligent vision-based human computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation, and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face regardless of its three-dimensional position, orientation, and the lighting conditions. Such a problem is challenging because faces are nonrigid and have a high degree of variability in size, shape, color, and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics, and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.
Hidden Markov models in computational biology: applications to protein modeling
- JOURNAL OF MOLECULAR BIOLOGY
, 1994
"... Hidden.Markov Models (HMMs) are applied t.0 the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated the on globin family, the protein kinase catalytic domain, and the EF-hand calcium binding moti ..."
Abstract
-
Cited by 436 (29 self)
- Add to MetaCart
Hidden.Markov Models (HMMs) are applied t.0 the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated the on globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the. SWISS-PROT 22 database for other sequences. that are members of the given protein family, or contain the given domain. The Hi " produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate threedimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the '\ HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appecvs to have a slight advantage over PROFILESEARCH in terms of lower rates of false

