Results 1-10 of 456
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1997
"... In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worstcase online framework. The model we study can be interpreted as a broad, abstract extension of the wellstudied online prediction model to a general decisiontheoretic set ..."
Abstract

Cited by 2307 (59 self)
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case online framework. The model we study can be interpreted as a broad, abstract extension of the well-studied online prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth [20] can be adapted to this model yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of...
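The multiplicative weight-update rule is simple enough to state compactly. Below is a minimal sketch of a Hedge-style allocator, assuming N options, per-round losses in [0, 1], and a free parameter beta in (0, 1); the function and variable names are illustrative, not the paper's notation.

    # Minimal sketch of a multiplicative weight-update (Hedge-style) allocator.
    # Assumptions: losses lie in [0, 1]; beta in (0, 1) controls the update.
    import numpy as np

    def hedge(losses, beta=0.9):
        """losses: (T, N) array; loss of each of N options in each of T rounds."""
        T, N = losses.shape
        w = np.ones(N)                  # one weight per option
        total_loss = 0.0
        for t in range(T):
            p = w / w.sum()             # normalized allocation over options
            total_loss += p @ losses[t]
            w *= beta ** losses[t]      # shrink weights of options that did badly
        return total_loss

    rng = np.random.default_rng(0)
    print(hedge(rng.random((100, 3))))  # cumulative loss over 100 random rounds

In the paper's boosting application the same update runs over training examples rather than options, so examples the weak learner gets wrong accumulate weight.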
Additive Logistic Regression: a Statistical View of Boosting
Annals of Statistics, 1998
"... Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input data, and t ..."
Abstract

Cited by 1217 (21 self)
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input data, and taking a weighted majority vote of the sequence of classifiers thereby produced. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most...
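The two-class identification the abstract alludes to can be written out; reconstructed here on the half-log-odds scale the paper uses (our transcription):

    F(x) = \sum_{m=1}^{M} f_m(x),
    \qquad
    p(y = 1 \mid x) = \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}}

Fitting each f_m by stage-wise maximization of the Bernoulli likelihood then approximates what boosting's reweighting does implicitly.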
Wrapper Induction for Information Extraction, 1997
"... The Internet presents numerous sources of useful informationtelephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually form ..."
Abstract

Cited by 519 (30 self)
The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Wrappers are often used for this purpose. A wrapper is a procedure for extracting a particular resource's content. Unfortunately, hand-coding wrappers is tedious. We introduce wrapper induction, a technique for automatically constructing wrappers. Our techniques can be described in terms of three main contributions. First, we pose the problem of wrapper construction as one of inductive learn...
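To make "wrapper" concrete, here is a minimal sketch of the simplest wrapper class, a left-right (LR) delimiter wrapper; the page snippet and delimiter strings are hypothetical, and this shows only what a learned wrapper executes, not the paper's induction algorithm.

    # Sketch of an LR (left-right delimiter) wrapper: each attribute's value
    # is the text between a learned left delimiter and right delimiter.
    def lr_extract(page, delimiters):
        """delimiters: list of (left, right) string pairs, one per attribute."""
        tuples, pos = [], 0
        while True:
            row = []
            for left, right in delimiters:
                start = page.find(left, pos)
                if start < 0:
                    return tuples           # no more rows on the page
                start += len(left)
                end = page.find(right, start)
                row.append(page[start:end])
                pos = end
            tuples.append(tuple(row))

    # Hypothetical page: a country-code listing formatted in HTML.
    page = "<b>Congo</b> <i>242</i> <b>Egypt</b> <i>20</i>"
    print(lr_extract(page, [("<b>", "</b>"), ("<i>", "</i>")]))
    # [('Congo', '242'), ('Egypt', '20')]

Wrapper induction, then, is the task of choosing the delimiter pairs automatically from labeled example pages.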
An Efficient Boosting Algorithm for Combining Preferences, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract

Cited by 515 (18 self)
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborative-filtering task for making movie recommendations. Here, we present results comparing RankBoost to nearest-neighbor and regression algorithms.
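Below is a minimal sketch of a RankBoost-style round, assuming threshold stumps as weak rankers and a pair list of (i, j) meaning instance j should rank above instance i; the exhaustive stump search and the clipping of the margin are illustrative simplifications, not the paper's exact formulation.

    # Sketch: maintain a distribution D over crucial pairs and reweight the
    # pairs the chosen weak ranker mis-orders. Weak rankers are threshold
    # stumps h(x) = 1[x[f] > theta].
    import numpy as np

    def rankboost(X, pairs, rounds=10):
        """X: (n, d) feature matrix; pairs: rows (i, j) with j ranked above i."""
        pairs = np.asarray(pairs)
        i_idx, j_idx = pairs[:, 0], pairs[:, 1]
        D = np.ones(len(pairs)) / len(pairs)    # distribution over pairs
        ensemble = []                           # (alpha, feature, threshold)
        for _ in range(rounds):
            best, best_r = (0, X[0, 0]), 0.0
            for f in range(X.shape[1]):
                for theta in np.unique(X[:, f]):
                    h = (X[:, f] > theta).astype(float)
                    r = np.sum(D * (h[j_idx] - h[i_idx]))  # weighted pair margin
                    if abs(r) > abs(best_r):
                        best, best_r = (f, theta), r
            f, theta = best
            r = np.clip(best_r, -0.99, 0.99)    # keep alpha finite in the sketch
            alpha = 0.5 * np.log((1 + r) / (1 - r))
            ensemble.append((alpha, f, theta))
            h = (X[:, f] > theta).astype(float)
            D *= np.exp(alpha * (h[i_idx] - h[j_idx]))  # upweight mis-ordered pairs
            D /= D.sum()
        return ensemble

    # Toy usage: 4 items, one feature; pairs say item 1 > item 0 and 3 > 2.
    X = np.array([[0.1], [0.9], [0.2], [0.8]])
    print(rankboost(X, [(0, 1), (2, 3)], rounds=3))

The final ranking score of an instance x is the weighted vote sum of alpha * h(x) over the ensemble.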
Selection of relevant features and examples in machine learning
Artificial Intelligence, 1997
"... In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been mad ..."
Abstract

Cited by 423 (1 self)
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
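As a concrete instance of the simplest family of methods such a survey covers, here is a minimal filter-style feature selector that scores each binary feature by its mutual information with the label; this generic example is ours, not taken from the paper.

    # Filter-method sketch: score each feature independently of the learner
    # and keep the top k. The score here is mutual information between a
    # binary feature column and a binary label.
    import numpy as np

    def mutual_information(x, y):
        """x, y: equal-length binary 0/1 arrays."""
        x, y = np.asarray(x), np.asarray(y)
        mi = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p_ab = np.mean((x == a) & (y == b))
                p_a, p_b = np.mean(x == a), np.mean(y == b)
                if p_ab > 0:
                    mi += p_ab * np.log2(p_ab / (p_a * p_b))
        return mi

    def select_features(X, y, k):
        scores = [mutual_information(X[:, f], y) for f in range(X.shape[1])]
        return np.argsort(scores)[::-1][:k]   # indices of the k best features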
Learnability in Optimality Theory, 1995
"... In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given gr ..."
Abstract

Cited by 373 (31 self)
In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given grammatical module. We decompose the learning problem and present formal results for a central subproblem, deducing the constraint ranking particular to a target language, given structural descriptions of positive examples. The structure imposed on the space of possible grammars by Optimality Theory allows efficient convergence to a correct grammar. We discuss implications for learning from overt data only, as well as other learning issues. We argue that Optimality Theory promotes confluence of the demands of more effective learnability and deeper linguistic explanation.
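A minimal sketch of one Constraint Demotion step, assuming a stratified hierarchy stored as a dict from constraint name to stratum number (0 = highest); the constraint names and winner/loser data below are hypothetical.

    # Constraint Demotion sketch: after cancelling marks shared by winner and
    # loser, every constraint that penalizes the winner is demoted to the
    # stratum just below the highest-ranked constraint penalizing the loser.
    def demote(rank, winner_marks, loser_marks):
        """winner_marks/loser_marks: dict constraint -> violation count."""
        constraints = set(winner_marks) | set(loser_marks)
        w_uncancelled = [c for c in constraints
                         if winner_marks.get(c, 0) > loser_marks.get(c, 0)]
        l_uncancelled = [c for c in constraints
                         if loser_marks.get(c, 0) > winner_marks.get(c, 0)]
        if not l_uncancelled:
            return rank                  # no loser mark can explain the winner
        top_loser = min(rank[c] for c in l_uncancelled)  # highest loser stratum
        for c in w_uncancelled:
            if rank[c] <= top_loser:     # demote only if not already dominated
                rank[c] = top_loser + 1
        return rank

    # Hypothetical pair: the winner violates ONSET, the loser violates NOCODA.
    rank = {"ONSET": 0, "NOCODA": 0}
    print(demote(rank, {"ONSET": 1}, {"NOCODA": 1}))  # {'ONSET': 1, 'NOCODA': 0}

Iterating this step over winner/loser pairs drives the hierarchy toward one consistent with the target language, which is the convergence result the abstract refers to.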
Efficient noise-tolerant learning from statistical queries
Journal of the ACM, 1998
"... In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from stat ..."
Abstract

Cited by 288 (6 self)
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
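A minimal sketch of the statistical query interface the abstract describes: the learner asks for the probability that a predicate chi holds of a random labeled example and receives an answer only to within an additive tolerance tau. The simulation from a finite sample below, and the names, are illustrative assumptions.

    # Sketch of an SQ oracle. Because the learner only consumes tolerant
    # probability estimates, each estimate can be corrected for a known
    # classification-noise rate, which is the source of noise tolerance.
    import random

    def make_sq_oracle(examples, tau):
        """examples: list of (x, label) pairs. Returns STAT(chi) -> estimate
        of Pr[chi(x, label) = 1], accurate only to within tolerance tau."""
        def stat(chi):
            p = sum(chi(x, y) for x, y in examples) / len(examples)
            return p + random.uniform(-tau, tau)   # simulated tolerance
        return stat

    # Example query: how often does feature 0 agree with the label?
    examples = [((1, 0), 1), ((0, 1), 0), ((1, 1), 1)]
    oracle = make_sq_oracle(examples, tau=0.05)
    print(oracle(lambda x, y: x[0] == y))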
On the (im)possibility of obfuscating programs
Lecture Notes in Computer Science, 2001
"... Informally, an obfuscator O is an (efficient, probabilistic) “compiler ” that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is “unintelligible ” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic an ..."
Abstract

Cited by 189 (10 self)
Informally, an obfuscator O is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is “unintelligible” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice’s theorem. Most of these applications are based on an interpretation of the “unintelligibility” condition in obfuscation as meaning that O(P) is a “virtual black box,” in the sense that anything one can efficiently compute given O(P), one could also efficiently compute given oracle access to P. In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family of efficient programs P that are unobfuscatable in the sense that (a) given any efficient program P′ that computes the same function as a program P ∈ P, the “source code” P can be efficiently reconstructed, yet (b) given oracle access to a (randomly selected) program P ∈ P, no efficient algorithm can reconstruct P (or even distinguish a certain bit in the code from random) except with negligible probability. We extend our impossibility result in a number of ways, including even obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation (TC^0). We also rule out several potential applications of obfuscators, by constructing “unobfuscatable” signature schemes, encryption schemes, and pseudorandom function families.
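The “virtual black box” condition has a standard formalization, reconstructed here in the usual predicate form (our transcription, not a quotation from the paper): for every efficient adversary A there must be an efficient simulator S with oracle access to P such that, for all programs P,

    \left| \Pr\bigl[A(O(P)) = 1\bigr] - \Pr\bigl[S^{P}(1^{|P|}) = 1\bigr] \right|
    \le \mathrm{negl}(|P|)

Roughly, the paper's unobfuscatable family makes the left-hand side large for some adversary that no oracle-only simulator can match.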
Learning Trees and Rules with Set-valued Features, 1996
"... In most learning systems examples are represented as fixedlength "feature vectors", the components of which are either real numbers or nominal values. We propose an extension of the featurevector representation that allows the value of a feature to be a set of strings; for instance, to represent a ..."
Abstract

Cited by 182 (2 self)
In most learning systems examples are represented as fixed-length "feature vectors", the components of which are either real numbers or nominal values. We propose an extension of the feature-vector representation that allows the value of a feature to be a set of strings; for instance, to represent a small white and black dog with the nominal features size and species and the set-valued feature color, one might use a feature vector with size=small, species=canis familiaris and color={white, black}. Since we make no assumptions about the number of possible set elements, this extension of the traditional feature-vector representation is closely connected to Blum's "infinite attribute" representation. We argue that many decision tree and rule learning algorithms can be easily extended to set-valued features. We also show by example that many real-world learning problems can be efficiently and naturally represented with set-valued features; in particular, text categorization problems and probl...
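A minimal sketch of the representation, assuming Python sets stand in for set-valued features; it mirrors the dog example in the abstract and shows how a tree-node test generalizes from equality to membership.

    # Set-valued feature sketch: a feature's value may be a set of strings,
    # and a decision-tree test becomes a membership test ("black" in color?)
    # rather than an equality test.
    dog = {"size": "small",
           "species": "canis familiaris",
           "color": {"white", "black"}}    # a set-valued feature

    def test(example, feature, value):
        """A node test that works for both nominal and set-valued features."""
        v = example[feature]
        return value in v if isinstance(v, (set, frozenset)) else v == value

    print(test(dog, "color", "black"))   # True: membership, not equality
    print(test(dog, "size", "small"))    # True: ordinary nominal test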