Results 1 - 10
of
29,786
Very simple classification rules perform well on most commonly used datasets
- Machine Learning
, 1993
"... The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machin ..."
Abstract
-
Cited by 547 (5 self)
- Add to MetaCart
to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one) " and so are &
Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets
- Journal of Artificial Intelligence Research
, 1997
"... This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to c ..."
Abstract
-
Cited by 146 (20 self)
- Add to MetaCart
-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
- JOURNAL OF NETWORK AND COMPUTER APPLICATIONS
, 1999
"... In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the Data Grid. We describe two basic services that we believe are fundamental to the des ..."
Abstract
-
Cited by 471 (41 self)
- Add to MetaCart
In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the Data Grid. We describe two basic services that we believe are fundamental
Imagenet: A large-scale hierarchical image database
- In CVPR
, 2009
"... The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce her ..."
Abstract
-
Cited by 840 (28 self)
- Add to MetaCart
datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We
BIRCH: an efficient data clustering method for very large databases
- In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD
, 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely st,udied problems in this area is the identification of clusters, or deusel y populated regions, in a multi-dir nensional clataset. Prior work does not adequately address the problem of ..."
Abstract
-
Cited by 576 (2 self)
- Add to MetaCart
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely st,udied problems in this area is the identification of clusters, or deusel y populated regions, in a multi-dir nensional clataset. Prior work does not adequately address the problem
Learning probabilistic relational models
- In IJCAI
, 1999
"... A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat " data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much ..."
Abstract
-
Cited by 613 (30 self)
- Add to MetaCart
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat " data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much
Group formation in large social networks: membership, growth, and evolution
- IN KDD ’06: PROCEEDINGS OF THE 12TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2006
"... The processes by which communities come together, attract new members, and develop over time is a central research issue in the social sciences — political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, ..."
Abstract
-
Cited by 496 (19 self)
- Add to MetaCart
, on-line groups are becoming increasingly prominent due to the growth of community and social networking sites such as MySpace and LiveJournal. However, the challenge of collecting and analyzing large-scale timeresolved data on social groups and communities has left most basic questions about
An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.
- Machine Learning,
, 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Abstract
-
Cited by 707 (2 self)
- Add to MetaCart
Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several
A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection
- INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 1995
"... We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), te ..."
Abstract
-
Cited by 1283 (11 self)
- Add to MetaCart
), ten-fold cross-validation may be better than the more expensive leaveone-out cross-validation. We report on a largescale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross
Using Discriminant Eigenfeatures for Image Retrieval
, 1996
"... This paper describes the automatic selection of features from an image training set using the theories of multi-dimensional linear discriminant analysis and the associated optimal linear projection. We demonstrate the effectiveness of these Most Discriminating Features for view-based class retrieval ..."
Abstract
-
Cited by 508 (15 self)
- Add to MetaCart
retrieval from a large database of widely varying real-world objects presented as "well-framed" views, and compare it with that of the principal component analysis.
Results 1 - 10
of
29,786