Results 1  10
of
325
Learnability in Optimality Theory
, 1995
"... In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given gr ..."
Abstract

Cited by 447 (33 self)
 Add to MetaCart
(Show Context)
In this article we show how Optimality Theory yields a highly general Constraint Demotion principle for grammar learning. The resulting learning procedure specifically exploits the grammatical structure of Optimality Theory, independent of the content of substantive constraints defining any given grammatical module. We decompose the learning problem and present formal results for a central subproblem, deducing the constraint ranking particular to a target language, given structural descriptions of positive examples. The structure imposed on the space of possible grammars by Optimality Theory allows efficient convergence to a correct grammar. We discuss implications for learning from overt data only, as well as other learning issues. We argue that Optimality Theory promotes confluence of the demands of more effective learnability and deeper linguistic explanation.
Discovering Models of Software Processes from EventBased Data
 ACM Transactions on Software Engineering and Methodology
, 1998
"... this article we describe a Markov method that we developed specifically for process discovery, as well as describe two additional methods that we adopted from other domains and augmented for our purposes. The three methods range from the purely algorithmic to the purely statistical. We compare the m ..."
Abstract

Cited by 254 (7 self)
 Add to MetaCart
this article we describe a Markov method that we developed specifically for process discovery, as well as describe two additional methods that we adopted from other domains and augmented for our purposes. The three methods range from the purely algorithmic to the purely statistical. We compare the methods and discuss their application in an industrial case study.
Computational Limitations on Learning from Examples
 Journal of the ACM
, 1988
"... Abstract. The computational complexity of learning Boolean concepts from examples is investigated. It is shown for various classes of concept representations that these cannot be learned feasibly in a distributionfree sense unless R = NP. These classes include (a) disjunctions of two monomials, (b) ..."
Abstract

Cited by 192 (10 self)
 Add to MetaCart
(Show Context)
Abstract. The computational complexity of learning Boolean concepts from examples is investigated. It is shown for various classes of concept representations that these cannot be learned feasibly in a distributionfree sense unless R = NP. These classes include (a) disjunctions of two monomials, (b) Boolean threshold functions, and (c) Boolean formulas in which each variable occurs at most once. Relationships between learning of heuristics and finding approximate solutions to NPhard optimization problems are given. Categories and Subject Descriptors: F. 1.1 [Computation by Abstract Devices]: Models of Computationrelations among models; F. 1.2 [Computation by Abstract Devices]: Modes of Computationprobabilistic computation; F. 1.3 [Computation by Abstract Devices]: Complexity Classesreducibility and completeness; 1.2.6 [Artificial Intelligence]: Learningconcept learning; induction
PartofSpeech Tagging and Partial Parsing
 CorpusBased Methods in Language and Speech
, 1996
"... m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of t ..."
Abstract

Cited by 102 (0 self)
 Add to MetaCart
m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but nonzero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of handconstructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then handedited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
Learning phonotactic distributions
 Constraints in Phonological Acquisition
, 2004
"... All languages have distributional regularities: patterns which restrict what sounds can appear where, including nowhere, as determined by local syntagmatic factors independent of any particular morphemic alternations. Early generative phonology tended to slight the study of distributional relations ..."
Abstract

Cited by 75 (17 self)
 Add to MetaCart
All languages have distributional regularities: patterns which restrict what sounds can appear where, including nowhere, as determined by local syntagmatic factors independent of any particular morphemic alternations. Early generative phonology tended to slight the study of distributional relations in favor of morphophonemics, perhaps because wordrelatedness phonology was thought to be more productive of theoretical depth, reliably leading the analyst beyond the merely observable. But over the last few decades it has become clear that much morphophonemics can be understood as accommodation to phonotactic requirements, e.g.
A Guided Tour Across the Boundaries of Learning Recursive Languages
 Lecture Notes in Artificial Intelligence
, 1994
"... The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data as well as from both, positive and negative data. We consider the influence of various monotonicity constraints to the learning process, and provide a thorough study concerning the i ..."
Abstract

Cited by 57 (29 self)
 Add to MetaCart
(Show Context)
The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data as well as from both, positive and negative data. We consider the influence of various monotonicity constraints to the learning process, and provide a thorough study concerning the influence of several parameters. In particular, we present examples pointing to typical problems and solutions in the field. Then we provide a unifying framework for learning. Furthermore, we survey results concerning learnability in dependence on the hypothesis space, and concerning order independence. Moreover, new results dealing with the efficiency of learning are provided. First, we investigate the power of iterative learning algorithms. The second measure of efficiency studied is the number of mind changes a learning algorithm is allowed to perform. In this setting we consider the problem whether or not the monotonicity constraints introduced do influence the efficiency of learning algo...
The Power of Vacillation in Language Learning
, 1992
"... Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there ..."
Abstract

Cited by 48 (11 self)
 Add to MetaCart
(Show Context)
Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle, provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
PAC Learning from Positive Statistical Queries
 Proc. 9th International Conference on Algorithmic Learning Theory  ALT ’98
, 1998
"... . Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe such kind of learning. We show that in order to make the learning from positive data possible, ..."
Abstract

Cited by 43 (3 self)
 Add to MetaCart
(Show Context)
. Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe such kind of learning. We show that in order to make the learning from positive data possible, extrainformation about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with PAC model ([Val84]), statistical query model ([Kea93]) and constantpartition classification noise model ([Dec97]) are studied. We show that kDNF and kdecision lists are learnable in both models, i.e. with far less information than it is assumed in previously used algorithms. 1 Introduction The PAC learning model of Valiant ([Val84]) has become the reference model in computational learning theory. However, in spite of the importance of lea...
Incremental concept learning for bounded data mining
 Information and Computation
, 1999
"... Important re nements of concept learning in the limit from positive data considerably restricting the accessibility of input data are studied. Let c be any concept; every in nite sequence of elements exhausting c is called positive presentation of c. In all learning models considered the learning ma ..."
Abstract

Cited by 40 (30 self)
 Add to MetaCart
(Show Context)
Important re nements of concept learning in the limit from positive data considerably restricting the accessibility of input data are studied. Let c be any concept; every in nite sequence of elements exhausting c is called positive presentation of c. In all learning models considered the learning machine computes a sequence of hypotheses about the target concept from a positive presentation of it. With iterative learning, the learning machine, in making a conjecture, has access to its previous conjecture and the latest data item coming in. In kbounded examplememory inference (k is a priori xed) the learner is allowed to access, in making a conjecture, its previous hypothesis, its memory of up to k data items it has already seen, and the next element coming in. In the case of kfeedback identi cation, the learning machine, in making a conjecture, has access to its previous conjecture, the latest data item coming in, and, on the basis of this information, it can compute k items and query the database of previous data to nd out, for each of the k items, whether or not it is in the database (k is again a priori xed). In all cases, the sequence of conjectures has to converge to a hypothesis