Results 1-10 of 467
A Bayesian method for the induction of probabilistic networks from data
Machine Learning, 1992
Cited by 1079 (26 self)
Abstract. This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
Path Planning Using Lazy PRM
In IEEE Int. Conf. Robot. & Autom, 2000
Cited by 192 (14 self)
This paper describes a new approach to probabilistic roadmap planners (PRMs). The overall theme of the algorithm, called Lazy PRM, is to minimize the number of collision checks performed during planning and hence minimize the running time of the planner. Our algorithm builds a roadmap in the configuration space, whose nodes are the user-defined initial and goal configurations and a number of randomly generated nodes. Neighboring nodes are connected by edges representing paths between the nodes. In contrast with PRMs, our planner initially assumes that all nodes and edges in the roadmap are collision-free, and searches the roadmap at hand for a shortest path between the initial and the goal node. The nodes and edges along the path are then checked for collision. If a collision with the obstacles occurs, the corresponding nodes and edges are removed from the roadmap. Our planner either finds a new shortest path, or first updates the roadmap with new nodes and edges, and then searches for a shortest path. The above process is repeated until a collision-free path is returned.
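The loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D unit-square configuration space, the disc obstacle, the names `lazy_prm`, `segment_free`, `shortest_path` and `in_disc`, and all parameter values (`n_nodes`, `radius`, `step`, `max_rounds`) are assumptions made for illustration, and Dijkstra's algorithm stands in for whatever graph search the paper uses.

```python
import heapq
import math
import random


def segment_free(p, q, in_collision, step=0.01):
    """Check a straight edge by sampling points along it (the resolution is an assumption)."""
    n = max(1, int(math.dist(p, q) / step))
    return not any(
        in_collision((p[0] + (q[0] - p[0]) * t / n, p[1] + (q[1] - p[1]) * t / n))
        for t in range(n + 1)
    )


def shortest_path(nodes, edges, src, dst):
    """Dijkstra over the current roadmap; returns a list of node ids or None."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v in edges[u]:
            nd = d + math.dist(nodes[u], nodes[v])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None


def lazy_prm(start, goal, in_collision, n_nodes=200, radius=0.2, seed=0, max_rounds=500):
    """Return a verified collision-free list of waypoints, or None if none is found."""
    if in_collision(start) or in_collision(goal):
        return None
    rng = random.Random(seed)
    nodes, edges, verified = [], {}, set()

    def add_node(p):
        # Connect to every surviving node within `radius`; no collision check yet.
        i = len(nodes)
        nodes.append(p)
        edges[i] = {j for j in edges if math.dist(p, nodes[j]) <= radius}
        for j in edges[i]:
            edges[j].add(i)

    add_node(start)  # node 0
    add_node(goal)   # node 1
    for _ in range(n_nodes):
        add_node((rng.random(), rng.random()))

    for _ in range(max_rounds):
        path = shortest_path(nodes, edges, 0, 1)
        if path is None:
            # Roadmap is disconnected: add more random nodes, then search again.
            for _ in range(n_nodes // 4):
                add_node((rng.random(), rng.random()))
            continue
        # Lazily check the nodes on the candidate path and remove colliding ones.
        bad_nodes = [u for u in path if in_collision(nodes[u])]
        for u in bad_nodes:
            for v in edges.pop(u):
                edges[v].discard(u)
        if bad_nodes:
            continue
        # Then check the edges on the path, remembering those already verified.
        ok = True
        for u, v in zip(path, path[1:]):
            key = (min(u, v), max(u, v))
            if key in verified:
                continue
            if segment_free(nodes[u], nodes[v], in_collision):
                verified.add(key)
            else:
                edges[u].discard(v)
                edges[v].discard(u)
                ok = False
        if ok:
            return [nodes[u] for u in path]
    return None


def in_disc(p):
    """Hypothetical obstacle: a disc of radius 0.15 centred in the unit square."""
    return math.dist(p, (0.5, 0.5)) < 0.15


path = lazy_prm((0.1, 0.1), (0.9, 0.9), in_disc)
```

Collision checks are made only on nodes and edges that appear on a candidate shortest path, which is the saving the abstract emphasizes; the rest of the roadmap is never tested.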
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms
2000
Cited by 168 (7 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based, algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
Characterization of complex networks: A survey of measurements
Advances in Physics
Cited by 89 (7 self)
Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements organized into classes. Special attention is given to relating complex network analysis with the areas of pattern recognition and feature selection, as well as on surveying some concepts and measurements from traditional graph theory which are potentially useful for complex network research. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the
A Linear Method for Deviation Detection in Large Databases
1996
Cited by 85 (1 self)
We describe the problem of finding deviations in large data bases. Normally, explicit information outside the data, like integrity constraints or predefined patterns, is used for deviation detection. In contrast, we approach the problem from the inside of the data, using the implicit redundancy of the data. We give a formal description of the problem and present a linear algorithm for detecting deviations. Our solution simulates a mechanism familiar to human beings: after seeing a series of similar data, an element disturbing the series is considered an exception. We also present experimental results from the application of this algorithm on real-life datasets showing its effectiveness.
Asymptotically Optimal Importance Sampling and Stratification for Pricing Path-Dependent Options
Mathematical Finance, 1999
Cited by 61 (13 self)
This paper develops a variance reduction technique for Monte Carlo simulations of path-dependent options driven by high-dimensional Gaussian vectors. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to be optimal in an asymptotic sense. The drift selected has an interpretation as the path of the underlying state variables which maximizes the product of probability and payoff (the most important path). The directions used for stratified sampling are optimal for a quadratic approximation to the integrand or payoff function. Indeed, under differentiability assumptions our importance sampling method eliminates variability due to the linear part of the payoff function, and stratification eliminates much of the variability due to the quadratic part of the payoff. The two parts of the method are linked because the asymptotically optimal drift vector frequently provides a particularly effective direction for stratification. We illustrate the use of the method with path-dependent options, a stochastic volatility model, and interest rate derivatives. The method reveals novel features of the structure of their payoffs.
KEY WORDS: Monte Carlo methods, variance reduction, large deviations, Laplace principle
1. INTRODUCTION This paper develops a variance reduction technique for Monte Carlo simulations driven by high-dimensional Gaussian vectors, with particular emphasis on the pricing of path-dependent options. The method combines importance sampling based on a change of drift with stratified sampling along a small number of key dimensions. The change of drift is selected through a large deviations analysis and is shown to...
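The change-of-drift idea in this abstract rests on a standard identity for a standard Gaussian vector Z: E[f(Z)] = E[f(Z + mu) exp(-mu.Z - |mu|^2/2)]. The sketch below illustrates only that identity on a toy rare event. It does not implement the paper's large-deviations choice of drift or the stratification step, and the function name `is_estimate`, the payoff `tail`, the drift value 3.0 and the sample size are all assumptions made for illustration.

```python
import math
import random


def is_estimate(f, mu, n=20000, seed=0):
    """Estimate E[f(Z)], Z ~ N(0, I), by sampling Z + mu and reweighting."""
    rng = random.Random(seed)
    half_norm2 = 0.5 * sum(m * m for m in mu)
    total = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in mu]
        shifted = [zi + mi for zi, mi in zip(z, mu)]
        # Likelihood ratio of N(0, I) to N(mu, I), evaluated at the shifted point.
        weight = math.exp(-sum(mi * zi for mi, zi in zip(mu, z)) - half_norm2)
        total += f(shifted) * weight
    return total / n


def tail(x):
    """Toy 'payoff': indicator of the rare event Z > 3 for a scalar normal."""
    return 1.0 if x[0] > 3.0 else 0.0


exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form P(Z > 3)
plain = is_estimate(tail, [0.0])   # mu = 0 reduces to ordinary Monte Carlo
tilted = is_estimate(tail, [3.0])  # drift moved toward the region that matters
```

With the drift at zero, very few samples land in the tail, so the plain estimate is noisy; shifting the drift into the tail puts about half the samples there, and the weights correct for the shift.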
Postural hand synergies for tool use
The Journal of Neuroscience, 1998
Cited by 61 (1 self)
Subjects were asked to shape the right hand as if to grasp and use a large number of familiar objects. The chosen objects typically are held with a variety of grips, including “precision” and “power” grips. Static hand posture was measured by recording the angular position of 15 joint angles of the fingers and of the thumb. Although subjects adopted distinct hand shapes for the various objects, the joint angles of the digits did not vary independently. Principal components analysis showed that the first two components could account for >80% of the variance, implying a substantial reduction from the 15 degrees of freedom that were recorded. However, even though they were small, higher-order (more than three) principal components did not represent random variability but instead provided additional information about the object. These results suggest that the control of hand posture involves a few postural synergies,
Performance Evaluation of Some Clustering Algorithms and Validity Indices
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002
Cited by 58 (1 self)
Abstract. In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity indices, namely Davies-Bouldin index, Dunn’s index, Calinski-Harabasz index, and a recently developed index I. Based on a relation between the index I and the Dunn’s index, a lower bound of the value of the former is theoretically estimated in order to get unique hard K-partition when the data set has distinct substructures. The effectiveness of the different validity indices and clustering methods in automatically evolving the appropriate number of clusters is demonstrated experimentally for both artificial and real-life data sets with the number of clusters varying from two to ten. Once the appropriate number of clusters is determined, the SA-based clustering technique is used for proper partitioning of the data into the said number of clusters.
Feature Selection in Unsupervised Learning via Evolutionary Search
In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 58 (3 self)
Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
Bioinformatics, 2003
Cited by 57 (5 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.