Results 1 - 10 of 4,053

Large Landscape Conservation — Synthetic and Real-World Datasets

by Bistra Dilkina, Katherine Lai, Ronan Le Bras, Yexiang Xue, Carla P. Gomes, Ashish Sabharwal, Jordan Suter, Kevin S. McKelvey, Michael K. Schwartz, Claire Montgomery
Biodiversity underpins ecosystem goods and services and hence protecting it is key to achieving sustainability. However, the persistence of many species is threatened by habitat loss and fragmentation due to human land use and climate change. Conservation efforts are implemented under very limited economic resources, and therefore designing scalable, cost-efficient and systematic approaches for conservation planning is an important and challenging computational task. In particular, preserving landscape connectivity between good habitat has become a key conservation priority in recent years. We give an overview of landscape connectivity conservation and some of the underlying graph-theoretic optimization problems. We present a synthetic generator capable of creating families of randomized structured problems,

Working with Real-World Datasets: Preprocessing and prediction with large incomplete and heterogeneous datasets

by Holger Schöner - Doctoral thesis (Doktor der Naturwissenschaften), Technische Universität Berlin; chair: Prof. Dr. F. Wysotzki; reviewers: Prof. Dr. K. Obermayer, Prof. Dr. T. Scheffer, 2004
and prediction with large incomplete and heterogeneous datasets

Learning to Detect Traffic Signs: Comparative Evaluation of Synthetic and Real-world Datasets

by Andreas Møgelmose, Mohan M. Trivedi, Thomas B. Moeslund
Cited by 6 (4 self)
This study compares the performance of sign detection based on synthetic training data to the performance of detection based on real-world training images. Viola-Jones detectors are created for 4 different traffic signs with both synthetic and real data, and varying numbers of training samples.

Reducing noise in labels and features for a real world dataset: application of NLP corpus annotation methods

by Rebecca J. Passonneau, Cynthia Rudin, Axinia Radeva, Zhi An Liu - In Proceedings of the 10th international , 2009
Cited by 4 (4 self)
This paper illustrates how a combination of information extraction, machine learning, and NLP corpus annotation practice was applied to a problem of ranking vulnerability of structures (service boxes, manholes) in the Manhattan electrical grid. By adapting NLP corpus annotation methods to the task of knowledge transfer from domain experts, we compensated for the lack of operational definitions of components of the model, such as serious event. The machine learning depended on the ticket classes, but it was not the end goal. Rather, our rule-based document classification determines both the labels of examples and their feature representations. Changes in our classification of events led to improvements in our model, as reflected in the AUC scores for the full ranked list of over 51K structures. The improvements for the very top of the ranked list, which is of most importance for prioritizing work on the electrical grid, affected one in every four or five structures.

unknown title

by unknown authors
tionally on both simulated and real world datasets.

Very simple classification rules perform well on most commonly used datasets

by Robert C. Holte - Machine Learning, 1993
Cited by 547 (5 self)
The classification rules induced by machine learning systems are judged by two criteria: their classification accuracy on an independent test set (henceforth "accuracy"), and their complexity. The relationship between these two criteria is, of course, of keen interest to the machine learning community. There are in the literature some indications that very simple rules may achieve surprisingly high accuracy on many datasets. For example, Rendell occasionally remarks that many real world datasets have "few peaks (often just one)" and so are ...

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

by Ron Kohavi - International Joint Conference on Artificial Intelligence, 1995
Cited by 1283 (11 self)
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment -- over half a million runs of C4.5 and a Naive-Bayes algorithm -- to estimate the effects of different parameters on these algorithms on real-world datasets. For cross

An empirical comparison of voting classification algorithms: Bagging, boosting, and variants.

by Eric Bauer, Philip Chan, Salvatore Stolfo, David Wolpert - Machine Learning, 1999
Cited by 707 (2 self)
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

by Eamonn Keogh, Shruti Kasetty - SIGKDD'02, 2002
Cited by 325 (59 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offer an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed

Learning probabilistic relational models

by Nir Friedman, Lise Getoor, Daphne Koller, Avi Pfeffer - In IJCAI, 1999
Cited by 613 (30 self)
A large portion of real-world data is stored in commercial relational database systems. In contrast, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert our data into a flat form, thereby losing much