Results 1-5 of 5
Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning
 In Proceedings of the 20th Intl. Conf. on Machine Learning
, 2003
Abstract

Cited by 40 (6 self)
We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian network structure from data. On each step we pick a node called the target. We delete all arcs entering or exiting the target. We then find, subject to some constraints, the globally optimal combination of in-arcs and out-arcs with which to reinsert it. The heart of the paper is a new algorithm called ORSearch, which allows each optimal reinsertion step to be computed efficiently on large datasets. Our empirical results compare Optimal Reinsertion against a highly tuned implementation of multi-restart hill climbing. The results typically show one to two orders of magnitude speedup on a variety of datasets. They usually show better final results, both in terms of BDeu score and in modeling of future data drawn from the same distribution.

1. Bayesian Network Structure Search

Given a dataset of R records and m categorical attributes, how can we find a Bayesian network structure that provides a good model of the data? Happily, the formulation of this question as a well-defined optimization problem is now fairly well understood (Heckerman et al., 1995; Cooper & Herskovits, 1992). However, finding the optimal solution is an NP-complete problem (Chickering, 1996a). The computational issues in performing heuristic search in this space are also severe, even taking into account the numerous ingenious and effective innovations in recent years (e.g.
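The reinsertion step described in this abstract can be illustrated with a brute-force sketch. This is not ORSearch, whose whole purpose is to make each step efficient on large datasets; it simply enumerates every parent set and every child set for the target. It assumes a hypothetical decomposable family score `score(node, parents)` (higher is better, in the spirit of BDeu) and a hypothetical `max_parents` cap standing in for the unspecified "constraints". It is exponential and only suitable for tiny graphs.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set

# A DAG is a mapping node -> set of parent nodes.
Dag = Dict[str, Set[str]]
# Hypothetical decomposable family score (higher is better), e.g. a BDeu term.
FamilyScore = Callable[[str, FrozenSet[str]], float]


def is_acyclic(dag: Dag) -> bool:
    """Kahn's algorithm: acyclic iff every node can be peeled off in topological order."""
    indeg = {v: len(ps) for v, ps in dag.items()}
    children: Dict[str, list] = {v: [] for v in dag}
    for v, ps in dag.items():
        for p in ps:
            children[p].append(v)
    ready = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        v = ready.pop()
        seen += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return seen == len(dag)


def total_score(dag: Dag, score: FamilyScore) -> float:
    # A decomposable score is a sum of per-family terms.
    return sum(score(v, frozenset(ps)) for v, ps in dag.items())


def optimal_reinsertion(dag: Dag, target: str, score: FamilyScore,
                        max_parents: int = 2) -> Dag:
    """Brute-force reinsertion of `target`: exponential, for tiny graphs only."""
    # Delete every arc entering or leaving the target.
    base = {v: set(ps) - {target} for v, ps in dag.items()}
    base[target] = set()
    others = [v for v in dag if v != target]
    best, best_score = base, total_score(base, score)
    # Try every parent set (up to max_parents) with every disjoint child set.
    for k in range(max_parents + 1):
        for parents in combinations(others, k):
            rest = [v for v in others if v not in parents]
            for j in range(len(rest) + 1):
                for kids in combinations(rest, j):
                    cand = {v: set(ps) for v, ps in base.items()}
                    cand[target] = set(parents)
                    for c in kids:
                        cand[c].add(target)
                    # Constraints: parent-count cap on every node, and no cycles.
                    if any(len(ps) > max_parents for ps in cand.values()):
                        continue
                    if not is_acyclic(cand):
                        continue
                    s = total_score(cand, score)
                    if s > best_score:
                        best, best_score = cand, s
    return best


# Toy usage with a made-up score table (hypothetical numbers).
preferred = {("C", frozenset({"A"})): 2.0, ("B", frozenset({"C"})): 1.0}


def toy_score(node: str, parents: FrozenSet[str]) -> float:
    # Unlisted families pay a small penalty per parent.
    return preferred.get((node, parents), -0.5 * len(parents))


result = optimal_reinsertion({"A": set(), "B": set(), "C": set()}, "C", toy_score)
print(result)  # C gets parent A and child B
```

With this toy score, the best reinsertion gives C the in-arc A→C and the out-arc C→B; arcs not touching C are never altered, which is what keeps each step local.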
Active learning for anomaly and rare-category detection
 In Advances in Neural Information Processing Systems 18
, 2004
Abstract

Cited by 33 (0 self)
We introduce a novel active-learning scenario in which a user wants to work with a learning algorithm to identify useful anomalies. These are distinguished from the traditional statistical definition of anomalies as outliers or merely ill-modeled points. Our distinction is that the usefulness of anomalies is categorized subjectively by the user. We make two additional assumptions. First, there exist extremely few useful anomalies to be hunted down within a massive dataset. Second, both useful and useless anomalies may sometimes exist within tiny classes of similar anomalies. The challenge is thus to identify “rare-category” records in an unlabeled, noisy set with help (in the form of class labels) from a human expert who has a small budget of data points that they are prepared to categorize. We propose a technique to meet this challenge, which assumes a mixture model fit to the data but otherwise makes no assumptions about the particular form of the mixture components. This property promises wide applicability in real-life scenarios and for various statistical models. We give an overview of several alternative methods, highlighting their strengths and weaknesses, and conclude with a detailed empirical analysis. We show that our method can quickly zoom in on an anomaly set containing a few tens of points in a dataset of hundreds of thousands.
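For contrast, here is a sketch of the traditional criterion this abstract sets aside: rank points by their density under a fitted mixture and show the least likely ones to the expert, up to the labeling budget. This is not the paper's method; the 1-D Gaussian components, the hypothetical `budget` parameter, and the data are illustrative assumptions. Such a ranking surfaces outliers, but it has no notion of which anomalies the user finds useful, nor of tiny classes of similar anomalies, which is the gap the abstract describes.

```python
import math
from typing import List, Sequence, Tuple

# (weight, mean, std) of each 1-D Gaussian component; assumed already fitted.
Component = Tuple[float, float, float]


def mixture_density(x: float, comps: Sequence[Component]) -> float:
    # Weighted sum of Gaussian densities evaluated at x.
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        for w, mu, sd in comps
    )


def lowest_likelihood(points: Sequence[float], comps: Sequence[Component],
                      budget: int) -> List[int]:
    """Indices of the `budget` points with the lowest mixture density."""
    order = sorted(range(len(points)), key=lambda i: mixture_density(points[i], comps))
    return order[:budget]


# Made-up mixture and data: two well-populated components plus two stray values.
comps = [(0.7, 0.0, 1.0), (0.3, 10.0, 1.0)]
points = [0.1, -0.5, 9.8, 10.2, 25.0, 5.0]
print(lowest_likelihood(points, comps, budget=2))  # prints [4, 5]
```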
Nonparametric Density Estimation and Clustering with Application to Cosmology. Unpublished
, 2003
Abstract

Cited by 6 (2 self)
We present a nonparametric method for galaxy clustering in astronomical sky surveys. We show that the cosmological definition of clusters of galaxies is equivalent to density contour clusters (Hartigan, 1975), $S_c = \{f > c\}$, where $f$ is a probability density function. The plug-in estimator $\hat{S}_c = \{\hat{f} > c\}$ is used to estimate $S_c$, where $\hat{f}$ is the multivariate kernel density estimator. To choose the optimal smoothing parameter, we use cross-validation and the plug-in method, and show that cross-validation outperforms the plug-in method in our case. A new cluster catalogue (a database of cluster locations) based on the plug-in estimator is compared with existing cluster catalogues, the Abell catalogue and the Edinburgh/Durham Cluster Catalogue I (EDCCI). Our result is more consistent with the EDCCI than with the Abell catalogue, which is the most widely used. We use the smoothed bootstrap to assess the validity of the clustering results.
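The plug-in idea in this abstract, estimating $S_c = \{f > c\}$ by $\{\hat{f} > c\}$ with a cross-validated bandwidth, can be sketched as follows. Assumptions: an isotropic Gaussian kernel with a single scalar bandwidth `h`, leave-one-out likelihood cross-validation (the abstract does not say which cross-validation criterion is used, so this is a swapped-in choice), a made-up two-clump dataset, and a hypothetical threshold `c = 0.05`. The sketch stops at level-set membership; splitting the set into connected components (the individual clusters) is left out.

```python
import numpy as np


def kde(points: np.ndarray, h: float):
    """Isotropic Gaussian kernel density estimate f_hat in d dimensions."""
    n, d = points.shape
    norm = n * (2.0 * np.pi * h * h) ** (d / 2.0)

    def f_hat(x: np.ndarray) -> np.ndarray:
        # Squared distances between each query row and each data point: shape (m, n).
        sq = ((x[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * h * h)).sum(axis=1) / norm

    return f_hat


def loo_log_likelihood(points: np.ndarray, h: float) -> float:
    """Leave-one-out log-likelihood of the sample under the KDE with bandwidth h."""
    n, d = points.shape
    sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-sq / (2.0 * h * h))
    np.fill_diagonal(k, 0.0)  # a point may not contribute to its own density
    dens = k.sum(axis=1) / ((n - 1) * (2.0 * np.pi * h * h) ** (d / 2.0))
    return float(np.log(dens + 1e-300).sum())


def choose_bandwidth(points: np.ndarray, grid: np.ndarray) -> float:
    # Pick the grid bandwidth with the highest cross-validated likelihood.
    return float(max(grid, key=lambda h: loo_log_likelihood(points, h)))


def in_level_set(points: np.ndarray, h: float, c: float, query: np.ndarray) -> np.ndarray:
    """Plug-in membership test for S_c = {f > c}: True where f_hat(query) > c."""
    return kde(points, h)(query) > c


# Toy data (made up): two tight 2-D clumps of 200 points each.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(200, 2)),
    rng.normal([5.0, 5.0], 0.3, size=(200, 2)),
])
h = choose_bandwidth(data, np.linspace(0.05, 1.0, 20))
query = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
mask = in_level_set(data, h, 0.05, query)
print(h, mask)  # both clump centres inside S_c, the midpoint outside
```

A scalar bandwidth keeps the cross-validation a one-dimensional grid search; a full bandwidth matrix would be closer to a general multivariate estimator but needs a multidimensional search.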
Linear redshift distortions: A review
 In Ringberg Workshop on Large-Scale Structure
, 1998
Abstract

Cited by 2 (0 self)
Abstract. Redshift maps of galaxies in the Universe are distorted by the peculiar velocities of galaxies along the line of sight. The amplitude of the distortions on large, linear scales yields a measurement of the linear redshift distortion parameter, which is $\beta \approx \Omega_0^{0.6}/b$ in standard cosmology with cosmological density $\Omega_0$ and light-to-mass bias $b$. All measurements of $\beta$ from linear redshift distortions published up to mid-1997 are reviewed. The average and standard deviation of the reported values are $\beta_{\mathrm{optical}} = 0.52 \pm 0.26$ for optically selected galaxies, and $\beta_{\mathrm{IRAS}} = 0.77 \pm 0.22$ for IRAS-selected galaxies. The implied relative bias is $b_{\mathrm{optical}}/b_{\mathrm{IRAS}} \approx 1.5$. If optical galaxies are unbiased, then $\Omega_0 = 0.33^{+0.32}_{-0.22}$; if IRAS galaxies are unbiased, then $\Omega_0 = 0.63^{+0.35}$ …
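A quick numerical check of the relations quoted in this abstract. It assumes, as the relative-bias statement implies, that both galaxy samples trace the same $\Omega_0$, so that $b_{\mathrm{optical}}/b_{\mathrm{IRAS}} = \beta_{\mathrm{IRAS}}/\beta_{\mathrm{optical}}$, and it propagates only central values, not the error bars. Inverting the central optical value gives about 0.336, close to the quoted 0.33; the review's exact averaging is not reproduced here.

```python
# Linear theory (as quoted): beta ≈ Omega0**0.6 / b.
beta_optical = 0.52  # mean for optically selected galaxies
beta_iras = 0.77     # mean for IRAS-selected galaxies

# Both samples trace the same Omega0, so b_optical / b_IRAS = beta_IRAS / beta_optical.
relative_bias = beta_iras / beta_optical


def omega0_from_beta(beta: float, bias: float = 1.0) -> float:
    # Invert beta = Omega0**0.6 / b  =>  Omega0 = (beta * b)**(1 / 0.6).
    return (beta * bias) ** (1.0 / 0.6)


# Central value if optical galaxies are unbiased (b = 1).
omega_if_optical_unbiased = omega0_from_beta(beta_optical)
print(f"{relative_bias:.2f} {omega_if_optical_unbiased:.3f}")  # prints 1.48 0.336
```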
Nonparametric density estimation and clustering in astronomical sky surveys
, 2004
Abstract
www.elsevier.com/locate/csda Nonparametric density estimation and clustering in astronomical sky surveys