Results 1 - 10 of 99,011
Estimating the number of clusters in a dataset via the Gap statistic
, 2000
"... We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical), comparing the change in within-cluster dispersion to that expected under an appropriate reference ..."
Cited by 502 (1 self)
A comparative analysis of selection schemes used in genetic algorithms
- Foundations of Genetic Algorithms
, 1991
"... This paper considers a number of selection schemes commonly used in modern genetic algorithms. Specifically, proportionate reproduction, ranking selection, tournament selection, and Genitor (or "steady state") selection are compared on the basis of solutions to deterministic difference or d ..."
Cited by 531 (31 self)
Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach
- IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
, 1999
"... Evolutionary algorithms (EA’s) are often well-suited for optimization problems involving several, often conflicting objectives. Since 1985, various evolutionary approaches to multiobjective optimization have been developed that are capable of searching for multiple solutions concurrently in a single run. However, the few comparative studies of different methods presented up to now remain mostly qualitative and are often restricted to a few approaches. In this paper, four multiobjective EA’s are compared quantitatively where an extended 0/1 knapsack problem is taken as a basis. Furthermore, we ..."
Cited by 813 (22 self)
The anatomy of a large-scale hypertextual web search engine.
- Comput. Netw. ISDN Syst.
, 1998
"... In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions ..."
Cited by 4673 (5 self)
The use of the area under the ROC curve in the evaluation of machine learning algorithms
- PATTERN RECOGNITION
, 1997
"... In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics data sets. We compare and discuss the use of AUC to the more conventional overall accuracy and find that AUC exhibits a number of desirable properties when compared to overall accuracy: increased ..."
Cited by 685 (3 self)
Mining Sequential Patterns
, 1995
"... We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce the problem of mining sequential patterns over such databases. We present three algorithms to solve this problem, and empirically evaluate their performance using synthetic data. Two of the proposed algorithms, AprioriSome and AprioriAll, have comparable performance, albeit AprioriSome performs a little better when the minimum number of customers that must support a sequential pattern is low. Scale-up experiments show ..."
Cited by 1568 (6 self)
Indivisible labor and the business cycle
- Journal of Monetary Economics
, 1985
"... A growth model with shocks to technology is studied. Labor is indivisible, so all variability in hours worked is due to fluctuations in the number employed. We find that, unlike previous equilibrium models of the business cycle, this economy displays large fluctuations in hours worked and relatively ..."
Cited by 805 (10 self)
Random forests
- Machine Learning
, 2001
"... Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the fo ..."
Cited by 3613 (2 self)
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
- INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 1995
"... We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), te ..."
Cited by 1283 (11 self)
SCRIBE: A large-scale and decentralized application-level multicast infrastructure
- IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (JSAC)
, 2002
"... This paper presents Scribe, a scalable application-level multicast infrastructure. Scribe supports large numbers of groups, with a potentially large number of members per group. Scribe is built on top of Pastry, a generic peer-to-peer object location and routing substrate overlayed on the Internet, ..."
Cited by 658 (29 self)