Results 1-10 of 21
Approaches to the Automatic Discovery of Patterns in Biosequences
1995
Cited by 173 (21 self)
This paper is a survey of approaches and algorithms used for the automatic discovery of patterns in biosequences. Patterns with the expressive power in the class of regular languages are considered, and a classification of pattern languages in this class is developed, covering those patterns which are the most frequently used in molecular bioinformatics. A formulation is given of the problem of the automatic discovery of such patterns from a set of sequences, and an analysis presented of the ways in which an assessment can be made of the significance and usefulness of the discovered patterns. It is shown that this problem is related to problems studied in the field of machine learning. The largest part of this paper comprises a review of a number of existing methods developed to solve this problem and how these relate to each other, focusing on the algorithms underlying the approaches. A comparison is given of the algorithms, and examples are given of patterns that have been discovered...
Structure Comparison and Structure Patterns
Journal of Computational Biology, 1999
Cited by 102 (2 self)
This article investigates different aspects of pairwise and multiple structure comparison, and the problem of automatically discovering common patterns in a set of structures. Descriptions and representations of structures and patterns are investigated, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature are developed, and many methods are reviewed and placed into this framework.
Incremental concept learning for bounded data mining
Information and Computation, 1999
Cited by 42 (32 self)
Important refinements of concept learning in the limit from positive data considerably restricting the accessibility of input data are studied. Let c be any concept; every infinite sequence of elements exhausting c is called a positive presentation of c. In all learning models considered, the learning machine computes a sequence of hypotheses about the target concept from a positive presentation of it. With iterative learning, the learning machine, in making a conjecture, has access to its previous conjecture and the latest data item coming in. In k-bounded example-memory inference (k is a priori fixed) the learner is allowed to access, in making a conjecture, its previous hypothesis, its memory of up to k data items it has already seen, and the next element coming in. In the case of k-feedback identification, the learning machine, in making a conjecture, has access to its previous conjecture, the latest data item coming in, and, on the basis of this information, it can compute k items and query the database of previous data to find out, for each of the k items, whether or not it is in the database (k is again a priori fixed). In all cases, the sequence of conjectures has to converge to a hypothesis
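The iterative learning model described in this abstract can be illustrated with a toy sketch. The concept class used here (the sets of positive multiples of a fixed integer d) and the names `iterative_learner` and `conjectures` are assumptions chosen for illustration, not taken from the paper. The point is only that each update sees the previous conjecture and the latest data item, and nothing else.

```python
from math import gcd
from typing import Iterable, Iterator, Optional


def iterative_learner(previous: Optional[int], item: int) -> int:
    """One update step of a toy iterative learner.

    The new conjecture depends only on the previous conjecture and the
    latest data item. A conjecture d stands for "all positive multiples of d"
    (a hypothetical concept class picked for this sketch).
    """
    return item if previous is None else gcd(previous, item)


def conjectures(presentation: Iterable[int]) -> Iterator[int]:
    """Yield the learner's conjecture after each item of a positive presentation."""
    hypothesis: Optional[int] = None
    for item in presentation:
        hypothesis = iterative_learner(hypothesis, item)
        yield hypothesis


# A finite prefix of a positive presentation of the multiples of 6.
print(list(conjectures([12, 18, 6, 30])))
```

On this prefix the conjectures stabilize once the greatest common divisor of the items seen so far equals the target d, which is the sense of convergence the abstract refers to.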
Learning One-Variable Pattern Languages Very Efficiently on Average, in Parallel, and by Asking Queries
1997
Cited by 20 (9 self)
A pattern is a finite string of constant and variable symbols. The language generated by a pattern is the set of all strings of constant symbols which can be obtained from the pattern by substituting nonempty strings for variables. We study the learnability of one-variable pattern languages in the limit with respect to the update time needed for computing a new single hypothesis and the expected total learning time taken until convergence to a correct hypothesis. Our results are as follows. First, we design a consistent and set-driven learner that, using the concept of descriptive patterns, achieves update time O(n^2 log n), where n is the size of the input sample. The best previously known algorithm for computing descriptive one-variable patterns requires time O(n^4 log n) (cf. Angluin [2]). Second, we give a parallel version of this algorithm that requires time O(log n) and O(n^3 / log n) processors on an EREW-PRAM. Third, using a modified version of the sequential algorithm a...
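The definition of a one-variable pattern language given in this abstract can be made concrete with a small membership test. This is a minimal sketch and not the learning algorithm from the paper. The representation (a list whose entries are single-character constants or the marker `VAR`) and the function name `substitution_for` are assumptions made for illustration.

```python
from typing import Optional, Sequence

VAR = None  # hypothetical marker for the single variable symbol


def substitution_for(pattern: Sequence[Optional[str]], word: str) -> Optional[str]:
    """Return the nonempty string that, substituted for every occurrence of
    the variable, turns `pattern` into `word`; return None if no such string
    exists. Constants are assumed to be single characters."""
    num_vars = sum(1 for sym in pattern if sym is VAR)
    if num_vars == 0:
        raise ValueError("pattern must contain the variable at least once")
    num_consts = len(pattern) - num_vars

    # Every occurrence receives the same string, so its length is forced.
    extra = len(word) - num_consts
    if extra <= 0 or extra % num_vars != 0:
        return None
    sub_len = extra // num_vars

    # Only constants precede the first variable, so its offset in `word`
    # equals its index in `pattern`.
    first_var = list(pattern).index(VAR)
    candidate = word[first_var:first_var + sub_len]

    # Verify the candidate by rebuilding the word from the pattern.
    rebuilt = "".join(candidate if sym is VAR else sym for sym in pattern)
    return candidate if rebuilt == word else None


pattern = ["a", VAR, "b", VAR]
print(substitution_for(pattern, "a01b01"))
print(substitution_for(pattern, "a01b10"))
```

Because the substitution length is determined by the word length, the check needs a single pass over the word once the candidate is read off at the first variable position.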
A discontinuity in pattern inference
In Proceedings of the 21st Symposium on Theoretical Aspects of Computer Science, STACS 2004, 2004
Cited by 19 (11 self)
This item was submitted to Loughborough University’s Institutional Repository by the/an author.
Learning twig and path queries
In ICDT, 2012
Cited by 15 (5 self)
We investigate the problem of learning XML queries, path queries and twig queries, from examples given by the user. A learning algorithm takes as input a set of XML documents with nodes annotated by the user and returns a query that selects the nodes in a manner consistent with the annotation. We study two learning settings that differ in the types of annotations. In the first setting, the user may only indicate required nodes that the query must select (i.e., positive examples). In the second, more general, setting, the user may also indicate forbidden nodes that the query must not select (i.e., negative examples). The query may or may not select any node with no annotation. We formalize what it means for a class of queries to be learnable. One requirement is the existence of a learning algorithm that is sound, i.e., always returns a query consistent with the examples given by the user. Furthermore, the learning algorithm should be complete, i.e., able to produce every query given sufficiently rich examples. Other requirements involve tractability of the learning algorithm and its robustness to nonessential examples. We identify practical classes of Boolean and unary, path and twig queries that are learnable from positive examples. We also show that adding negative examples to the picture renders learning infeasible.
Error Estimation and Model Selection
1999
Cited by 9 (1 self)
Machine learning algorithms search a space of possible hypotheses and estimate the error of each hypothesis using a sample. Most often, the goal of classification tasks is to find a hypothesis with a low true (or generalization) misclassification probability (or error rate); however, only the sample (or empirical) error rate can actually be measured and minimized. The true error rate of the returned hypothesis is unknown but can, for instance, be estimated using cross-validation, and very general worst-case bounds can be given. This doctoral dissertation addresses a compound of questions on error assessment and the intimately related selection of a "good" hypothesis language, or learning algorithm, for a given problem. In the first
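The cross-validation estimate mentioned in this abstract can be sketched as follows. This is a generic k-fold procedure under stated assumptions, not the dissertation's own method. The names `cross_validated_error` and `fit_majority`, the round-robin fold assignment, and the majority-label learner are all hypothetical choices made to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Predictor = Callable[[object], Hashable]
Learner = Callable[[Sequence[object], Sequence[Hashable]], Predictor]


def fit_majority(xs: Sequence[object], ys: Sequence[Hashable]) -> Predictor:
    """Toy learner: always predict the most frequent training label."""
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority


def cross_validated_error(xs: Sequence[object], ys: Sequence[Hashable],
                          fit: Learner, k: int) -> float:
    """Estimate the true error rate by k-fold cross-validation.

    Example i is assigned to fold i % k. Each fold is held out once, a
    hypothesis is fitted on the remaining folds, and mistakes on the
    held-out fold are counted.
    """
    n = len(xs)
    if len(ys) != n or not 2 <= k <= n:
        raise ValueError("need matching xs and ys, and 2 <= k <= len(xs)")
    mistakes = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        mistakes += sum(predict(xs[i]) != ys[i] for i in test)
    return mistakes / n


xs = list(range(6))
ys = [0, 0, 0, 0, 0, 1]
print(cross_validated_error(xs, ys, fit_majority, k=3))
```

The estimate counts only errors on held-out examples, so unlike the empirical error on the training sample it is not directly minimized by the learner.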
Synthesizing Learners Tolerating Computable Noisy Data
In Proc. 9th International Workshop on Algorithmic Learning Theory, Lecture ..., 1998
Cited by 6 (0 self)
An index for an r.e. class of languages (by definition) generates a sequence of grammars defining the class. An index for an indexed family of languages (by definition) generates a sequence of decision procedures defining the family. F. Stephan's model of noisy data is employed, in which, roughly, correct data crops up infinitely often, and incorrect data only finitely often. In a completely computable universe, all data sequences, even noisy ones, are computable. New to the present paper is the restriction that noisy data sequences be, nonetheless, computable! Studied, then, is the synthesis from indices for r.e. classes and for indexed families of languages of various kinds of noise-tolerant language learners for the corresponding classes or families indexed, where the noisy input data sequences are restricted to being computable. Many positive results, as well as some negative results, are presented regarding the existence of such synthesizers. The main positive result is surpris...
Methods for Finding Motifs in Sets of Related Biosequences.
1996
Cited by 4 (0 self)
... this paper is as follows. This introduction is followed in Section 2 by a brief introduction to some problems in machine learning, and especially to some approaches to learning from strings. In Section 3 we discuss the problem of discovering biopatterns, on the background provided in Section 2. The major part of the thesis is a portfolio of research papers of the author, with colleagues. Section 4 gives an overview of the work presented in these papers. In Section 5 we give suggestions for further work, and in Section 6 a conclusion is given. The Appendix contains the collection of research papers.
Efficient Learning of One-Variable Pattern Languages from Positive Data
1996
Cited by 3 (3 self)
A pattern is a finite string of constant and variable symbols. The language generated by a pattern is the set of all strings of constant symbols which can be obtained from the pattern by substituting nonempty strings for variables. Descriptive patterns are a key concept for inductive inference of pattern languages. A pattern π is descriptive for a given sample if the sample is contained in the language L(π) generated by π and no other pattern having this property generates a proper subset of the language L(π). The best previously known algorithm for computing descriptive one-variable patterns requires time O(n^4 log n), where n is the size of the sample. We present a simpler and more efficient algorithm solving the same problem in time O(n^2 log n). In addition, we give a parallel version of this algorithm that requires time O(log n) and O(n^3 / log n) processors on an EREW-PRAM. Previously, no parallel algorithm was known for this problem. Using a