## Learning Default Concepts (1994)

Venue: | Proceedings of the Tenth Canadian Conference on Artificial Intelligence (CSCSI-94) |

Citations: | 20 (7 self) |

### BibTeX

@INPROCEEDINGS{Schuurmans94learningdefault,
  author = {Dale Schuurmans and Russell Greiner},
  title = {Learning Default Concepts},
  booktitle = {Proceedings of the Tenth Canadian Conference on Artificial Intelligence (CSCSI-94)},
  year = {1994},
  pages = {519--523}
}

### Abstract

Classical concepts, based on necessary and sufficient defining conditions, cannot classify logically insufficient object descriptions. Many reasoning systems avoid this limitation by using "default concepts" to classify incompletely described objects. This paper addresses the task of learning such default concepts from observational data. We first model the underlying performance task --- classifying incomplete examples --- as a probabilistic process that passes random test examples through a "blocker" that can hide object attributes from the classifier. We then address the task of learning accurate default concepts from random training examples. After surveying the learning techniques that have been proposed for this task in the machine learning and knowledge representation literatures, and investigating their relative merits, we present a more data-efficient learning technique, developed from well-known statistical principles. Finally, we extend Valiant's pac-learning framework to ...
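The performance model in the abstract (random test examples passed through a "blocker" that can hide attributes from the classifier) can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's formalism: the toy two-attribute domain, the hiding probability, and the use of None as the blocked-value marker are all assumptions of this sketch, corresponding to independent per-attribute blocking.

```python
import random

def blocker(x, p_hide=0.3, rng=random):
    """Independently hide each attribute with probability p_hide;
    None plays the role of the 'blocked' (unobserved) value."""
    return tuple(None if rng.random() < p_hide else v for v in x)

def draw_example(rng=random):
    # Toy domain: two binary attributes; the class is their conjunction.
    x = (rng.randint(0, 1), rng.randint(0, 1))
    return x, x[0] & x[1]

# The learner and classifier only ever see the blocked description.
rng = random.Random(0)
training_data = []
for _ in range(5):
    x, c = draw_example(rng)
    training_data.append((blocker(x, rng=rng), c))
```

Note that the label c is computed from the complete description before blocking, so the training labels are always correct; only the attribute values can go missing.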

### Citations

7122 | Probabilistic reasoning in intelligent systems : networks of plausible inference. The Morgan Kaufmann series in representation and reasoning - Pearl - 1988 |

3967 | Classification and Regression Trees - BREIMAN, FRIEDMAN, et al. - 1984 |

3949 | Pattern Classification and Scene Analysis - Duda, Hart - 1973 |

Citation Context: ...Under β_I, for any domain distribution P_XC, the optimally accurate dcd d makes maximum conditional likelihood (mcl) classifications under P_XC, given the observed attributes of an object (cf., [DH73]). Thus, the structure of an optimal dcd d is determined solely by the domain distribution, and we can interpret d as a collection of assertions about the domain distribution P_XC directly: x̃ → c ∈ ... |

1707 | A theory of the learnable - Valiant - 1984 |

Citation Context: ...In any successful application, the learning system must be constrained to search a restricted space of appropriate classifiers, which here are dcds. Following the methodology pioneered by Valiant [Val84], we consider how learning performance scales as a function of prior knowledge. Here we quantify bias by ... thv and mli are particularly well suited to incorporating background knowledge; as demonstrat... |

1156 | Statistical analysis with missing data - Little, Rubin - 1987 |

Citation Context: ...known idea from theoretical statistics is applicable: namely, first determine the maximum likelihood distribution that accounts for all the data, then perform inferences according to this distribution [LR87]. This approach yields an effective method for determining the most likely classifications given incomplete training examples. mli (Maximum Likelihood (Incomplete)) [LR87]: First, determine the domain ... |
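The mli idea quoted above (fit the maximum-likelihood distribution accounting for all the data, then classify by conditional likelihood) can be illustrated with a small EM loop over a discrete joint table. This is a hedged sketch, not the paper's algorithm: the two-attribute binary domain, the uniform initialization, and the fixed iteration count are choices of this example, and None again marks a blocked attribute.

```python
from collections import Counter
from itertools import product

N_ATTRS, CLASSES = 2, (0, 1)
CELLS = [(x, c) for x in product((0, 1), repeat=N_ATTRS) for c in CLASSES]

def matches(x_obs, x):
    """A complete description x is compatible with the (possibly
    blocked) observation x_obs if they agree on every visible attribute."""
    return all(o is None or o == v for o, v in zip(x_obs, x))

def mli_fit(data, iters=25):
    """EM: estimate the maximum-likelihood joint p(x, c) accounting
    for all the data, including examples with blocked attributes."""
    p = {cell: 1.0 / len(CELLS) for cell in CELLS}
    for _ in range(iters):
        counts = Counter()
        for x_obs, c in data:
            compat = [(x, c) for x in product((0, 1), repeat=N_ATTRS)
                      if matches(x_obs, x)]
            z = sum(p[cell] for cell in compat)
            for cell in compat:                 # E-step: fractional counts
                counts[cell] += p[cell] / z
        total = sum(counts.values())
        p = {cell: counts[cell] / total for cell in CELLS}  # M-step
    return p

def mli_classify(p, x_obs):
    """Predict the class with maximum conditional likelihood,
    given only the observed attributes of x_obs."""
    def mass(c):
        return sum(p[(x, c)] for x in product((0, 1), repeat=N_ATTRS)
                   if matches(x_obs, x))
    return max(CLASSES, key=mass)
```

Unlike thv below, this strategy lets a fully observed example such as ((1, 1), 1) sharpen the prediction for a partially observed query such as (1, None), which is the data-efficiency gain the passage attributes to the statistical approach.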

788 | A framework for representing knowledge - Minsky - 1975 |

Citation Context: ...consider when learning dcds; see Footnote 7 below. [Figure 2: Model of Blocking Process] frames [Min75]: frame selectors can be viewed as "default" sufficient conditions for the frame concept, and frame instantiations can be viewed as "default" necessary conditions. These notions of non-classical conc... |

627 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989 |

Citation Context: ...can attain perfect accuracy in general. Notice also that we are only addressing the sample complexity of learning, not computational complexity. This is the same measure used when learning ccds. See [BEHW89] for a precise definition of VCdim and its application to determining the difficulty of learning sets of ccds. Combining these lemmas yields the intuitive result that learning from complete training e... |

375 | Learning decision lists - Rivest - 1987 |

282 | Heuristic Classification - Clancey - 1985 |

Citation Context: ...learnability results. Appears in the Proceedings of the Tenth Canadian Conference on Artificial Intelligence (CSCSI-94), Banff, May 1994. 1 Introduction: Many reasoning tasks involve "classification" [Cla85] --- i.e., determining whether a particular object belongs to a specified class, given a description of that object. For example, a diagnosis process must determine whether a patient, with a specified... |

250 | Representing and reasoning with probabilistic knowledge - Bacchus - 1990 |

Citation Context: ...[Pea89]) all satisfy the consistent inheritance axiom and so tacitly assume independent blocking β_I. Here the meaning of a rule x̃ → c can be given a "majority" semantics under β_I akin to that of [Bac90]. 3.2 Arbitrary blocking: While β_I is a simple and convenient model, it does not capture every practical situation; in particular, it cannot deal with circumstances where our knowledge of an attribut... |

198 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994 |

Citation Context: ...ity: A system that learns with attribute noise [SV88] does not know which attribute values have been corrupted; by contrast, we know explicitly which values are missing. Also, a probabilistic concept [KS90] is a mapping c_i : X_n → [0, 1] from the space of complete object descriptions X_n to probability values; such mappings do not directly handle missing attribute values. 2 Default Concepts: Following st... |

106 | Concept Learning and Heuristic Classification in Weak-Theory Domains - Porter, Bareiss, et al. - 1990 |

Citation Context: ...to explicitly extract the knowledge of domain experts, it makes sense to use machine learning techniques to automatically acquire the appropriate default concept based on existing "solved" cases; cf., [PBH90]. Unfortunately, the task of learning default concept definitions has received relatively little attention, especially when compared to the vast literature on the subject of learning to classify compl... |

105 | Unknown attribute values in induction - Quinlan - 1989 |

Citation Context: ...attribute value. Given the benign assumption that L's guesses for a description x̃ are conditionally independent of the training labels of domain objects x̃ that do not match x̃. thv (Three-valued) [Qui89]: For description x̃, predict the most frequent classification among training examples of the form ⟨x̃, c⟩. thv clearly does not make the most effective use of the available training data, given t... |
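The thv rule quoted above admits a direct sketch: predict by majority vote over training examples whose (possibly blocked) description exactly matches the query. The implementation below is an assumed illustration, not Quinlan's code; None marks a blocked attribute, and None is returned when no training example matches.

```python
from collections import Counter

def thv_predict(train, x_obs):
    """thv (three-valued): most frequent label among training examples
    whose blocked description exactly equals x_obs; None if no match."""
    votes = Counter(c for x, c in train if x == x_obs)
    return votes.most_common(1)[0][0] if votes else None
```

Because thv only pools examples with an identical blocking pattern, it ignores, for instance, fully observed examples when classifying a partially observed query; that is the data-inefficiency the surrounding passage points out.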

91 | Probabilistic semantics for nonmonotonic reasoning: A survey - Pearl - 1989 |

Citation Context: ...∈ d ⇒ ⟨x_1 ... * ... x_n⟩ → c ∈ d. Theorem 1: Under β_I, d is inheritance consistent ⟺ d is satisfiable by some domain distribution P_XC. Existing default logics based on ε-semantics (e.g., [Pea89]) all satisfy the consistent inheritance axiom and so tacitly assume independent blocking β_I. Here the meaning of a rule x̃ → c can be given a "majority" semantics under β_I akin to that of [Bac9... |

85 | Nonmonotonic reasoning - Reiter - 1987 |

Citation Context: ...formalisms designed to classify partial object descriptions. Default concept definitions (dcds) are a natural generalization of ccds, which avoid this limitation by using default classification rules [Rei87]. These classifiers play an important role in many expert systems [Cla85, PBH90]. Of course these dcds must somehow be acquired for such applications. As it is often quite difficult to explicitly extr... |

43 | From statistics to beliefs - Bacchus, Grove, et al. - 1992 |

43 | The reference class - Kyburg - 1983 |

33 | Learning complicated concepts reliably and usefully - Rivest, Sloan - 1988 |

Citation Context: ...that default definitions categorically classify every description, no matter how incomplete. An interesting direction is to consider partial default definitions that sometimes say "I don't know" à la [RS88]. Such classifiers could prove useful in domains where the consequences of an incorrect classification sometimes outweigh those of remaining silent. Another interesting extension is to consider active... |

33 | Learning k-DNF with noise in the attributes - Shackelford, Volper - 1988 |

Citation Context: ...possible confusions, it is worth explicitly distinguishing our "missing attribute" framework from two other models of learning from the learnability community: A system that learns with attribute noise [SV88] does not know which attribute values have been corrupted; by contrast, we know explicitly which values are missing. Also, a probabilistic concept [KS90] is a mapping c_i : X_n → [0, 1] from the space... |

29 | Hierarchical knowledge bases and efficient disjunctive reasoning - Borgida, Etherington - 1989 |

Citation Context: ...introduces only a restricted form of ambiguity: β may produce descriptions corresponding to disjunctions like 0∗ (i.e., 00 ∨ 01), but cannot produce a description corresponding to 01 ∨ 10 (this is reminiscent of [BE89]) --- i.e., it cannot express the claim that an object is "either a non-green plant or a green non-plant". This will restrict the type of "reference classes" we must consider when learning dcds; see F... |

20 | Objective probabilities - Kyburg - 1987 |

Citation Context: ...employ a conflict resolution strategy (which trades off interval bias and width) to decide whether to adopt, for this x̃, the classification associated with successively more general reference classes [Kyb91]. Although the ref strategy can override the predictions from specific descriptions with those from more general descriptions, it is not clear that it does so in the best conceivable way. The strate... |

2 | Knowing what doesn't matter - Greiner, Hancock, et al. - 1994 |

Citation Context: ...missing attributes are irrelevant to the classification, given the known attribute values [PBH90]. Notice that β_I is overly restrictive and β_A is too underconstrained to adequately model such tasks; [GHR94] provides an initial analysis of this situation. We are currently investigating other intermediate blocking models that can more accurately model such domains and (we hope) lead to better empirical le... |

1 | Efficient Reliable Machine Learning - Schuurmans |

Citation Context: ...true classification c ∈ {0, 1}. The space of possible examples is denoted X_n × {0, 1}. Unfortunately, space constraints preclude presenting proofs of the results stated in this abstract; see [Sch94]. A classical concept definition (ccd) is a subset of X_n, which we represent by its indicator function c : X_n → {0, 1}; thus c(x̃) = 1 iff x̃ belongs to the concept. A default concept definition (... |