COST-SENSITIVE INFORMATION ACQUISITION IN STRUCTURED DOMAINS (2010)
BibTeX
@MISC{Bilgic10cost-sensitiveinformation,
author = {Mustafa Bilgic},
title = {COST-SENSITIVE INFORMATION ACQUISITION IN STRUCTURED DOMAINS},
year = {2010}
}
Abstract
Many real-world prediction tasks require collecting information about the domain entities to achieve better predictive performance. Collecting the additional information is often a costly process (money, time, risk, etc.) that involves acquiring the features describing the entities and annotating the entities with target concepts and labels. For example, document collections need to be manually annotated for document classification and lab tests need to be ordered for medical diagnosis. Annotating the whole document collection and ordering all possible lab tests might be infeasible due to limited resources or may prove unnecessary. Thus, we need to be selective about which entity we annotate and which features we acquire. In this thesis, I explore effective and efficient ways of choosing the right information to acquire under limited resources. Specifically, I develop and empirically evaluate algorithms for feature and label acquisition in structured domains. For the problem of feature acquisition, we are given entities with missing features and the task is to classify them with minimum misclassification cost. The likelihood of misclassification can be reduced by acquiring features but acquiring







