Active Feature-Value Acquisition for Classifier Induction
2004
Cited by 18 (10 self)
Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate estimations of the value of acquisition. Active feature acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model's ability to predict accurately and the model's confidence in its prediction. Experimental results on several real-world data sets demonstrate that these approaches can induce accurate models using substantially fewer feature acquisitions, and suggest promising directions for improvements.
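As a rough illustration of the selection idea in this abstract, the sketch below ranks incomplete instances so that those the current model misclassifies come first, followed by those it predicts with the least confidence. The scoring rule, the function name `rank_for_acquisition`, and the dictionary keys are assumptions made for this example; the abstract does not specify how the two criteria are combined.

```python
def rank_for_acquisition(instances):
    """Order incomplete instances for feature acquisition.

    Each instance is a dict with hypothetical keys:
      'id'    - identifier of the instance
      'label' - known class label
      'probs' - dict mapping class -> probability from the current model
    """
    def sort_key(inst):
        probs = inst["probs"]
        predicted = max(probs, key=probs.get)
        misclassified = predicted != inst["label"]
        ranked = sorted(probs.values(), reverse=True)
        # Margin between the two most probable classes; a small margin
        # means the model is not confident in its prediction.
        margin = ranked[0] - ranked[1] if len(ranked) > 1 else 1.0
        # Assumed ordering: misclassified instances first, then by margin.
        return (not misclassified, margin)

    return [inst["id"] for inst in sorted(instances, key=sort_key)]


instances = [
    {"id": "a", "label": "y", "probs": {"y": 0.9, "n": 0.1}},
    {"id": "b", "label": "y", "probs": {"y": 0.4, "n": 0.6}},
    {"id": "c", "label": "n", "probs": {"y": 0.45, "n": 0.55}},
]
order = rank_for_acquisition(instances)
```

Here instance `b` is misclassified and so ranks first, while `c` precedes `a` because its prediction is correct but far less confident.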
Economical Active Feature-value Acquisition through Expected Utility Estimation
In Proceedings of the KDD05 Workshop on Utility-Based Data Mining, 2005
Cited by 14 (2 self)
In many classification tasks, training data have missing feature values that can be acquired at a cost. For building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be the most effective choice. The goal of active feature-value acquisition is to incrementally select feature values that are most cost-effective for improving the model’s accuracy. We present two policies, Sampled Expected Utility and Expected Utility-ES, that acquire feature values for inducing a classification model based on an estimation of the expected improvement in model accuracy per unit cost. A comparison of the two policies to each other and to alternative policies demonstrates that Sampled Expected Utility is preferable, as it effectively reduces the cost of producing a model of a desired accuracy and exhibits consistent performance across domains.
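The criterion named in this abstract, expected improvement in model accuracy per unit cost, can be sketched as follows. This is a minimal illustration only: the value distribution of each missing feature and the accuracy the model would reach after each possible acquisition are assumed to be given, and the names `expected_utility`, `pick_best`, and the candidate dictionary keys are hypothetical. How the two policies in the abstract estimate these quantities is not stated in this listing.

```python
def expected_utility(value_probs, accuracy_if_value, current_accuracy, cost):
    """Expected accuracy gain per unit cost for one missing feature value.

    value_probs       - dict: possible value -> estimated probability
    accuracy_if_value - dict: possible value -> estimated model accuracy
                        if that value were acquired (assumed given)
    current_accuracy  - estimated accuracy of the current model
    cost              - cost of acquiring this feature value (> 0)
    """
    expected_gain = sum(
        p * (accuracy_if_value[v] - current_accuracy)
        for v, p in value_probs.items()
    )
    return expected_gain / cost


def pick_best(candidates, current_accuracy):
    """Return the name of the candidate with the highest expected utility."""
    best = max(
        candidates,
        key=lambda c: expected_utility(
            c["value_probs"], c["accuracy_if_value"], current_accuracy, c["cost"]
        ),
    )
    return best["name"]


candidates = [
    {"name": "A", "cost": 2.0,
     "value_probs": {0: 0.5, 1: 0.5},
     "accuracy_if_value": {0: 0.80, 1: 0.90}},
    {"name": "B", "cost": 1.0,
     "value_probs": {0: 0.8, 1: 0.2},
     "accuracy_if_value": {0: 0.80, 1: 0.95}},
]
best = pick_best(candidates, current_accuracy=0.80)
```

Candidate `A` has the larger expected gain (0.05 versus 0.03) but costs twice as much, so `B` wins on gain per unit cost.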
Curious Machines: Active Learning with Structured Instances
2008
Cited by 5 (1 self)
Supervised machine learning is a branch of artificial intelligence concerned with automatically inducing predictive models from labeled data. Such learning approaches are useful for many interesting real-world applications, but particularly shine for tasks involving the automatic organization, extraction, and retrieval of information from large collections of data (e.g., text, images, and other digital media). In traditional supervised learning, one uses “labeled” training data to induce a model. However, labeled instances for real-world applications are often difficult, expensive, or time-consuming to obtain. Consider a complex task such as extracting key person and organization names from text documents. While gathering large amounts of unlabeled documents for these tasks is often relatively easy (e.g., from the World Wide Web), labeling these texts usually requires experienced human annotators with specific domain knowledge and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources.
Active sampling for knowledge discovery from biomedical data
PKDD, 2005
Cited by 1 (1 self)
We describe work aimed at cost-constrained knowledge discovery in the biomedical domain. To improve diagnostic and prognostic models of cancer, researchers study new biomarkers that might provide predictive information. Biological samples from monitored patients are selected and analyzed to determine the predictive power of the biomarker. During biomarker evaluation, portions of the samples are consumed, limiting the number of measurements that can be performed. The biological samples obtained from carefully monitored patients, which are well annotated with pathological information, are a valuable resource that must be conserved. We present an active sampling algorithm derived from statistical first principles to incrementally choose the samples that are most informative in estimating the efficacy of the candidate biomarker. We provide empirical evidence on real biomedical data that our active sampling algorithm requires significantly fewer samples than random sampling to ascertain the efficacy of the new biomarker.
Active feature-value acquisition for classifier induction
In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM’04), 2004
Cited by 1 (0 self)
Many induction problems, such as on-line customer profiling, include missing data that can be acquired at a cost, such as incomplete customer information that can be filled in by an intermediary. For building accurate predictive models, acquiring complete information for all instances is often prohibitively expensive or unnecessary. Randomly selecting instances for feature acquisition allows a representative sampling, but does not incorporate other estimates of the value of acquisition. Active feature-value acquisition aims at reducing the cost of achieving a desired model accuracy by identifying instances for which complete information is most informative to obtain. We present approaches in which instances are selected for feature acquisition based on the current model’s ability to predict accurately and the model’s confidence in its prediction. Experimental results on several real-world data sets demonstrate that our approach can induce accurate models using substantially fewer feature-value acquisitions than a baseline policy and a previously published approach.
Customer Targeting Models Using Actively-Selected Web Content
We consider the problem of predicting the likelihood that a company will purchase a new product from a seller. The statistical models we have developed at IBM for this purpose rely on historical transaction data coupled with structured firmographic information like the company revenue, number of employees and so on. In this paper, we extend this methodology to include additional text-based features based on analysis of the content on each company’s website. Empirical results demonstrate that incorporating such web content can significantly improve customer targeting. Furthermore, we present methods to actively select only the web content that is likely to improve our models, while reducing the costs of acquisition and processing.
Active Learning of Features and Labels
Co-training improves multi-view classifier learning by enforcing internal consistency between the predicted classes of unlabeled objects based on different views (different sets of features for characterizing the same object). In some applications, due to the cost involved in data acquisition, only a subset of features may be obtained for many unlabeled objects. Observing additional features of objects that were earlier incompletely characterized increases the data available for co-training, hence improving the classification accuracy. This paper addresses the problem of active learning of features: which additional features should be acquired for incompletely characterized objects in order to maximize the accuracy of the learned classifier? Our method, which extends previous techniques for the active learning of labels, is experimentally shown to be effective in a real-life multi-sensor mine detection problem.

