• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations

Tools

Sorted by:
Try your query at:
Semantic Scholar Scholar Academic
Google Bing DBLP
Results 1 - 10 of 29,786
Next 10 →

Very simple classification rules perform well on most commonly used datasets

by Robert C. Holte - Machine Learning , 1993
"... The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machin ..."
Abstract - Cited by 547 (5 self) - Add to MetaCart
to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one) " and so are &

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

by Andrew Moore, Mary Soon Lee - Journal of Artificial Intelligence Research , 1997
"... This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to c ..."
Abstract - Cited by 146 (20 self) - Add to MetaCart
-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts

The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets

by Ann Chervenak , Ian Foster, Carl Kesselman, Charles Salisbury, Steven Tuecke - JOURNAL OF NETWORK AND COMPUTER APPLICATIONS , 1999
"... In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the Data Grid. We describe two basic services that we believe are fundamental to the des ..."
Abstract - Cited by 471 (41 self) - Add to MetaCart
In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the Data Grid. We describe two basic services that we believe are fundamental

Imagenet: A large-scale hierarchical image database

by Jia Deng, Wei Dong, Richard Socher, Li-jia Li, Kai Li, Li Fei-fei - In CVPR , 2009
"... The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce her ..."
Abstract - Cited by 840 (28 self) - Add to MetaCart
datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We

BIRCH: an efficient data clustering method for very large databases

by Tian Zhang, Raghu Ramakrishnan, Miron Livny - In Proc. of the ACM SIGMOD Intl. Conference on Management of Data (SIGMOD , 1996
"... Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely st,udied problems in this area is the identification of clusters, or deusel y populated regions, in a multi-dir nensional clataset. Prior work does not adequately address the problem of ..."
Abstract - Cited by 576 (2 self) - Add to MetaCart
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely st,udied problems in this area is the identification of clusters, or deusel y populated regions, in a multi-dir nensional clataset. Prior work does not adequately address the problem

Learning probabilistic relational models

by Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer - In IJCAI , 1999
"... A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat " data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much ..."
Abstract - Cited by 613 (30 self) - Add to MetaCart
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat " data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much

Group formation in large social networks: membership, growth, and evolution

by Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, Xiangyang Lan - IN KDD ’06: PROCEEDINGS OF THE 12TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING , 2006
"... The processes by which communities come together, attract new members, and develop over time is a central research issue in the social sciences — political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, ..."
Abstract - Cited by 496 (19 self) - Add to MetaCart
, on-line groups are becoming increasingly prominent due to the growth of community and social networking sites such as MySpace and LiveJournal. However, the challenge of collecting and analyzing large-scale timeresolved data on social groups and communities has left most basic questions about

An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.

by Eric Bauer , Philip Chan , Salvatore Stolfo , David Wolpert - Machine Learning, , 1999
"... Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several vari ..."
Abstract - Cited by 707 (2 self) - Add to MetaCart
Abstract. Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

by Ron Kohavi - INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE , 1995
"... We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), te ..."
Abstract - Cited by 1283 (11 self) - Add to MetaCart
), ten-fold cross-validation may be better than the more expensive leaveone-out cross-validation. We report on a largescale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross

Using Discriminant Eigenfeatures for Image Retrieval

by Daniel L. Swets, John Weng , 1996
"... This paper describes the automatic selection of features from an image training set using the theories of multi-dimensional linear discriminant analysis and the associated optimal linear projection. We demonstrate the effectiveness of these Most Discriminating Features for view-based class retrieval ..."
Abstract - Cited by 508 (15 self) - Add to MetaCart
retrieval from a large database of widely varying real-world objects presented as "well-framed" views, and compare it with that of the principal component analysis.
Next 10 →
Results 1 - 10 of 29,786
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University