Results 1 - 10 of 40,922
CURE: An Efficient Clustering Algorithm for Large Data sets
Published in the Proceedings of the ACM SIGMOD Conference, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 722 (5 self)
of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters. Our experimental results confirm that the quality of clusters produced by CURE
Random forests
Machine Learning, 2001
"... Abstract. Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the fo ..."
Cited by 3613 (2 self)
Inducing Features of Random Fields
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Cited by 670 (10 self)
PROBABILITY INEQUALITIES FOR SUMS OF BOUNDED RANDOM VARIABLES
1962
"... Upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt. It is assumed that the range of each summand of S is bounded or bounded above. The bounds for Pr(S - ES ≥ nt) depend only on the endpoints of the ranges of the s ..."
Cited by 2215 (2 self)
of the summands and the mean, or the mean and the variance of S. These results are then used to obtain analogous inequalities for certain sums of dependent random variables such as U-statistics and the sum of a random sample without replacement from a finite population.
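As an illustration of the kind of bound this abstract describes: the snippet is truncated, so the statement below is the standard textbook form of Hoeffding's inequality, not a quotation from the listing. The summand names X_i and the interval endpoints a_i, b_i are notation introduced here. Assuming S = X_1 + ... + X_n with independent summands satisfying a_i ≤ X_i ≤ b_i, and t > 0:

```latex
\Pr\left( S - \mathrm{E}S \ge nt \right)
  \;\le\;
  \exp\!\left( - \frac{2 n^{2} t^{2}}{\sum_{i=1}^{n} (b_i - a_i)^{2}} \right)
```

When every summand lies in [0, 1], the right-hand side reduces to exp(-2nt^2), which depends only on n and t.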
Empirical exchange rate models of the Seventies: do they fit out of sample?
JOURNAL OF INTERNATIONAL ECONOMICS, 1983
"... This study compares the out-of-sample forecasting accuracy of various structural and time series exchange rate models. We find that a random walk model performs as well as any estimated model at one to twelve month horizons for the dollar/pound, dollar/mark, dollar/yen and trade-weighted dollar exch ..."
Cited by 854 (12 self)
Stock Market Prices Do Not Follow Random Walks: Evidence from a Simple Specification Test
REVIEW OF FINANCIAL STUDIES, 1988
"... In this article we test the random walk hypothesis for weekly stock market returns by comparing variance estimators derived from data sampled at different frequencies. The random walk model is strongly rejected for the entire sample period (1962-1985) and for all subperiods for a variety of aggrega ..."
Cited by 517 (17 self)
Sampling Large Databases for Association Rules
1996
"... Discovery of association rules is an important database mining problem. Current algorithms for finding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very significant for very large databases. We present new algorithms that reduce the data ..."
Cited by 470 (3 self)
the database activity considerably. The idea is to pick a random sample, to find using this sample all association rules that probably hold in the whole database, and then to verify the results with the rest of the database. The algorithms thus produce exact association rules, not approximations based on a sample
Applications of Random Sampling in Computational Geometry, II
Discrete Comput. Geom., 1995
"... We use random sampling for several new geometric algorithms. The algorithms are "Las Vegas," and their expected bounds are with respect to the random behavior of the algorithms. These algorithms follow from new general results giving sharp bounds for the use of random subsets in geometric ..."
Cited by 432 (12 self)
On the Resemblance and Containment of Documents
In Compression and Complexity of Sequences (SEQUENCES’97), 1997
"... Given two documents A and B we define two mathematical notions: their resemblance r(A, B) and their containment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection probl ..."
Cited by 506 (6 self)
problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document.
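The fixed-size sampling idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the word-shingle width `w`, the SHA-1-based hash, the sample size `s`, and all function names are hypothetical choices made here. Each document is reduced independently to the `s` smallest hash values of its shingle set, and resemblance is then estimated from the two samples alone.

```python
import hashlib


def shingles(text, w=3):
    """Return the set of contiguous w-word shingles of a text (assumed tokenization)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}


def hash64(shingle):
    """Map a shingle to a 64-bit integer (an illustrative hash choice)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(shingle_set, s=64):
    """Fixed-size sample: the s smallest hash values of one document's shingles."""
    return set(sorted(hash64(x) for x in shingle_set)[:s])


def exact_resemblance(a, b):
    """Set-intersection form of resemblance: |A ∩ B| / |A ∪ B| on shingle sets."""
    return len(a & b) / len(a | b)


def estimated_resemblance(sketch_a, sketch_b, s=64):
    """Estimate resemblance from two sketches that were computed independently."""
    smallest_union = set(sorted(sketch_a | sketch_b)[:s])
    return len(smallest_union & sketch_a & sketch_b) / len(smallest_union)


doc_a = shingles("the quick brown fox jumps over the lazy dog")
doc_b = shingles("the quick brown fox jumps over the lazy cat")
exact = exact_resemblance(doc_a, doc_b)
estimate = estimated_resemblance(sketch(doc_a), sketch(doc_b))
```

For documents with fewer than `s` distinct shingles, as in this toy example, each sketch holds every hash value, so the estimate coincides with the exact value. For larger documents the sketch size stays fixed at `s` and the estimate becomes approximate.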
CONDENSATION - conditional density propagation for visual tracking
1998
"... The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses “factored sampling”, previously applied to th ..."
Cited by 1503 (12 self)