A polyhedral approach to sequence alignment problems
Discrete Appl. Math., 2000
Cited by 20 (1 self)
We study two new problems in sequence alignment, both from a practical and a theoretical view, using tools from combinatorial optimization to develop branch-and-cut algorithms. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The RNA Sequence Alignment Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. Both problems have a characterization in terms of graphs which we reformulate in terms of integer linear programming. We then study the polytopes (or convex hulls of all feasible solutions) associated with the integer linear program for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.
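For context, the dynamic-programming baseline that this abstract contrasts against can be sketched as follows. This is a minimal Needleman-Wunsch-style scorer for pairwise global alignment, not the polyhedral method of the paper; the function name `alignment_score` and the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Classic dynamic programming with a linear gap penalty.
    The scoring values are assumptions chosen for illustration.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            ))
        prev = curr
    return prev[-1]
```

The table has one cell per pair of prefixes, so time grows with the product of the sequence lengths; extending the same recurrence to k sequences gives a k-dimensional table, which is why exact dynamic programming becomes impractical for multiple alignment.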
Memory-bounded A* graph search
In Proc. 15th International FLAIRS Conference, 2002
Cited by 19 (4 self)
We describe a framework for reducing the space complexity of graph search algorithms such as A* that use Open and Closed lists to keep track of the frontier and interior nodes of the search space. We propose a sparse representation of the Closed list in which only a fraction of already expanded nodes need to be stored to perform the two functions of the Closed list: preventing duplicate search effort and allowing solution extraction. Our proposal is related to earlier work on search algorithms that do not use a Closed list at all [Korf and Zhang, 2000]. However, the approach we describe has several advantages that make it effective for a wider variety of problems.
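The two roles of the Closed list named in this abstract can be seen in a plain A* sketch. The code below is the standard full-memory baseline, not the sparse representation the paper proposes: `closed` prevents re-expanding nodes, and `parent` allows solution extraction. The names `a_star` and `grid_neighbors`, and the 3x3 grid, are hypothetical choices for illustration; the heuristic is assumed to be consistent.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Return (cost, path) from start to goal, or None if unreachable.

    neighbors(n) yields (next_node, edge_cost) pairs;
    h(n) is assumed to be a consistent heuristic.
    """
    open_heap = [(h(start), 0, start)]  # Open list: frontier ordered by f = g + h
    best_g = {start: 0}
    parent = {start: None}              # back-pointers for solution extraction
    closed = set()                      # Closed list: already expanded nodes
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node in closed:
            continue                    # stale heap entry, node already expanded
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g, path[::-1]
        closed.add(node)
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if nxt not in closed and new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = node
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

def grid_neighbors(node, size=3):
    """4-connected moves with unit cost on a size x size grid (illustrative)."""
    x, y = node
    for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny), 1
```

A possible call is `a_star((0, 0), (2, 2), grid_neighbors, lambda n: abs(2 - n[0]) + abs(2 - n[1]))`. Here both `closed` and `parent` grow with every expanded node, which is the memory cost that a sparse Closed list is meant to cut.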
Sweep A*: Space-efficient heuristic search in partially ordered graphs
In Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, 2003
Cited by 17 (4 self)
We describe a novel heuristic search algorithm, called Sweep A*, that exploits the regular structure of partially ordered graphs to substantially reduce the memory requirements of search. We show that it outperforms previous search algorithms in optimally aligning multiple protein or DNA sequences, an important problem in bioinformatics. Sweep A* also promises to be effective for other search problems with similar structure.
Memory-Efficient A* Heuristics for Multiple Sequence Alignment
In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI-02), 2002
Cited by 15 (1 self)
The time and space needs of an A* search are strongly influenced by the quality of the heuristic evaluation function. Usually there is a
Multiple sequence alignment with arbitrary gap costs: Computing an optimal solution using polyhedral combinatorics
2002
Protein Multiple Sequence Alignment
Cited by 3 (1 self)
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
K-group A* for multiple sequence alignment with quasi-natural gap costs
In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-04), 2004
Cited by 1 (1 self)
Alignment of multiple protein or DNA sequences is an important problem in bioinformatics. Previous work has shown that the A* search algorithm can find optimal alignments for up to several sequences, and that a K-group generalization of A* can find approximate alignments for much larger numbers of sequences [6]. In this paper, we describe the first implementation of K-group A* that uses quasi-natural gap costs, the cost model used in practice by biologists. We also introduce a new method for computing gap-opening costs in profile alignment. Our results show that K-group A* can efficiently find optimal or close-to-optimal alignments for small groups of sequences, and, for large numbers of sequences, it can find higher-quality alignments than the widely used CLUSTAL family of approximate alignment tools. This demonstrates the benefits of A* in aligning large numbers of sequences, as typically compared by biologists, and suggests that K-group A* could become a practical tool for multiple sequence alignment.
A Branch-and-Cut Algorithm for Multiple Sequence Alignment
Cited by 1 (0 self)
We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Both the proofs that these classes of valid inequalities are facet-defining and the description of the separation algorithms are far from trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the results is the effectiveness of using branch-and-cut with only a carefully selected subset of the variables as a heuristic.
Bioinformatics, 2003
Selection of significant genes via expression patterns is an important problem in microarray experiments. Owing to small sample size and the large number of variables (genes), the selection process can be unstable. This paper proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables to specialize the model to a regression setting and use a Bayesian mixture prior to perform the variable selection. We control the size of the model by assigning a prior distribution over the dimension (number of significant genes) of the model. The posterior distributions of the parameters are not in explicit form, so we use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the parameters from the posteriors. The Bayesian model is flexible enough to identify significant genes as well as to perform future predictions. The method is applied to cancer classification via cDNA microarrays where the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify a set of significant genes. The method is also applied successfully to the leukemia data.