Introduction to Algorithms, second edition
 BOOK
, 2001
Cited by 707 (3 self)
This part will get you started in thinking about designing and analyzing algorithms.
It is intended to be a gentle introduction to how we specify algorithms, some of the
design strategies we will use throughout this book, and many of the fundamental
ideas used in algorithm analysis. Later parts of this book will build upon this base.
Chapter 1 is an overview of algorithms and their place in modern computing
systems. This chapter defines what an algorithm is and lists some examples. It also
makes a case that algorithms are a technology, just as are fast hardware, graphical
user interfaces, object-oriented systems, and networks.
In Chapter 2, we see our first algorithms, which solve the problem of sorting
a sequence of n numbers. They are written in a pseudocode which, although not
directly translatable to any conventional programming language, conveys the structure
of the algorithm clearly enough that a competent programmer can implement
it in the language of his choice. The sorting algorithms we examine are insertion
sort, which uses an incremental approach, and merge sort, which uses a recursive
technique known as “divide and conquer.” Although the time each requires increases
with the value of n, the rate of increase differs between the two algorithms.
We determine these running times in Chapter 2, and we develop a useful notation
to express them.
Chapter 3 precisely defines this notation, which we call asymptotic notation. It
starts by defining several asymptotic notations, which we use for bounding algorithm
running times from above and/or below. The rest of Chapter 3 is primarily a
presentation of mathematical notation. Its purpose is more to ensure that your use
of notation matches that in this book than to teach you new mathematical concepts.
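The two sorting strategies named in the abstract above can be sketched in Python. This is an illustrative sketch only, not the book's pseudocode: the function names `insertion_sort` and `merge_sort` and the choice to return a new list are assumptions made here.

```python
def insertion_sort(values):
    """Incremental approach: grow a sorted prefix one element at a time."""
    result = list(values)  # work on a copy so the input is left untouched
    for j in range(1, len(result)):
        key = result[j]
        i = j - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while i >= 0 and result[i] > key:
            result[i + 1] = result[i]
            i -= 1
        result[i + 1] = key
    return result


def merge_sort(values):
    """Divide and conquer: sort each half recursively, then merge."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

In the worst case the shifting loop of insertion sort performs on the order of n² steps, while merge sort performs on the order of n log n, which is the difference in growth rate the abstract refers to.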
Invitation to Fixed-Parameter Algorithms
, 2002
Cited by 293 (73 self)
Contents
1. Foundations
1.1 Keep the Parameter Fixed
1.2 Preliminaries and Agreements
1.3 Parameterized Complexity - a Brief Overview
1.3.1 Basic Theory
1.3.2 Interpreting Fixed-Parameter Tractability
1.4 Vertex Cover - an Illustrative Example
1.4.1 Parameterize
1.4.2 Specialize
1.4.3 Generalize
1.4.4 Count or Enumerate
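Section 1.4 of the contents above uses Vertex Cover as its illustrative example. The listing does not show the book's own algorithm, so the following is only a minimal sketch of the standard bounded search tree idea for that problem. It assumes a simple graph (no self-loops) given as a list of vertex pairs; the function name `vertex_cover` is hypothetical.

```python
def vertex_cover(edges, k):
    """Return a vertex cover of size <= k as a set, or None if none exists.

    Assumes a simple graph given as a list of (u, v) pairs with u != v.
    """
    if not edges:
        return set()   # nothing left to cover
    if k == 0:
        return None    # edges remain but the budget is used up
    u, v = edges[0]
    # Any cover must contain u or v, so branch on both choices.
    for chosen in (u, v):
        remaining = [e for e in edges if chosen not in e]
        cover = vertex_cover(remaining, k - 1)
        if cover is not None:
            return cover | {chosen}
    return None
```

The recursion depth is at most k and each call branches twice, so the search tree has at most 2^k leaves. The exponential part of the running time depends only on the parameter k, not on the size of the graph.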
Alignment of whole genomes
 Nucleic Acids Res
, 1999
Cited by 147 (5 self)
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.
Dynamic Programming Algorithms for Haplotype Block Partitioning: Applications to Human Chromosome 21 Haplotype Data
 Proc. Natl. Acad. Sci. USA
, 2003
Cited by 91 (6 self)
Recent studies have shown that the human genome has a haplotype block structure such that it can be divided into discrete blocks of limited haplotype diversity. Patil et al. [6] and Zhang et al. [12] developed algorithms to partition haplotypes into blocks with minimum number of tag SNPs for the entire chromosome. However, it is not clear how to partition haplotypes into blocks with restricted number of SNPs when only limited resources are available. In this paper, we first formulated this problem as finding a block partition with a fixed number of tag SNPs that can cover the maximal percentage of a genome. Then we solved it by two dynamic programming algorithms, which are fairly flexible to take into account the knowledge of functional polymorphism. We applied our algorithms to the published SNP data of human chromosome 21 combining with the functional information of these SNPs and demonstrated the effectiveness of them. Statistical investigation of the relationship between the starting points of a block partition and the coding and noncoding regions illuminated that the SNPs at these starting points are not significantly enriched in coding regions. We also developed an efficient algorithm to find all possible long local maximal haplotypes across a subset of samples. After applying this algorithm to the human chromosome 21 haplotype data, we found that samples with long local haplotypes are not necessarily globally similar.
Motif Statistics
, 1999
Cited by 48 (4 self)
We present a complete analysis of the statistics of number of occurrences of a regular expression pattern in a random text. This covers "motifs" widely used in computational biology. Our approach is based on: (i) a constructive approach to classical results in theoretical computer science (automata and formal language theory), in particular, the rationality of generating functions of regular languages; (ii) analytic combinatorics that is used for deriving asymptotic properties from generating functions; (iii) computer algebra for determining generating functions explicitly, analysing generating functions and extracting coefficients efficiently. We provide constructions for overlapping or nonoverlapping matches of a regular expression. A companion implementation produces multivariate generating functions for the statistics under study. A fast computation of Taylor coefficients of the generating functions then yields exact values of the moments with typical application to random t...
Indexing and Retrieval for Genomic Databases
 IEEE Transactions on Knowledge and Data Engineering
, 2002
Cited by 45 (6 self)
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences only and to reduce the costs of the alignments that are attempted. We present an index-based approach for both selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach results in significant savings in computationally intensive local alignments, and that index-based searching is as accurate as existing exhaustive search schemes.
Accurate formula for p-values of gapped local sequence and profile alignments
 J. Mol. Biol
, 2000
Cited by 37 (1 self)
A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing significance of comparisons between two protein sequences or a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced in (Mott & Tribe, 1999). Extensive simulation studies show that score thresholds produced by the method are accurate to within ±5% 95% of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood.
An optimal decomposition algorithm for tree edit distance
 In Proceedings of the 34th International Colloquium on Automata, Languages and Programming (ICALP)
, 2007
Cited by 35 (2 self)
The edit distance between two ordered rooted trees with vertex labels is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. In this paper, we present a worst-case $O(n^3)$-time algorithm for this problem, improving the previous best $O(n^3 \log n)$-time algorithm [9]. Our result requires a novel adaptive strategy for deciding how a dynamic program divides into subproblems, together with a deeper understanding of the previous algorithms for the problem. We prove the optimality of our algorithm among the family of decomposition strategy algorithms (which also includes the previous fastest algorithms) by tightening the known lower bound of $\Omega(n^2 \log^2 n)$ [6] to $\Omega(n^3)$, matching our algorithm's running time. Furthermore, we obtain matching upper and lower bounds of $\Theta(nm^2(1 + \log \frac{n}{m}))$ when the two trees have sizes $m$ and $n$ where $m < n$.
A novel string-to-string distance measure with applications to machine translation evaluation
 MT Summit IX
, 2003
Cited by 33 (4 self)
We introduce a string-to-string distance measure which extends the edit distance by block transpositions as constant cost edit operation. An algorithm for the calculation of this distance measure in polynomial time is presented. We then demonstrate how this distance measure can be used as an evaluation criterion in machine translation. The correlation between this evaluation criterion and human judgment is systematically compared with that of other automatic evaluation measures on two translation tasks. In general, like other automatic evaluation measures, the criterion shows low correlation at sentence level, but good correlation at system level.
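For orientation, the plain edit distance that the abstract above takes as its starting point can be sketched as follows. This is the classic Levenshtein dynamic program with unit costs, not the paper's measure: block transpositions are deliberately left out, and the function name `edit_distance` is an assumption made here.

```python
def edit_distance(a, b):
    """Classic unit-cost edit distance (insert, delete, substitute).

    Works on any two sequences, e.g. strings or lists of words.
    Block transpositions are NOT modelled here.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if item_a == item_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

Because the function only compares items for equality, it can be applied to tokenized sentences, for example `edit_distance("the cat sat".split(), "the cat sat down".split())`, which is the granularity a translation evaluation setting would likely use.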
Rotation of Periodic Strings and Short Superstrings
, 1996
Cited by 26 (0 self)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios $2\frac{2}{3}$ ($\approx 2.67$) and $2\frac{25}{42}$ ($\approx 2.596$), improving the best previously published $2\frac{3}{4}$ approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semi-infinite string $\alpha = a_1 a_2 \cdots$ of period $q$, there exists an integer $k$, such that for any (finite) string $s$ of period $p$ which is inequivalent to $\alpha$, the overlap between $s$ and the rotation $\alpha[k] = a_k a_{k+1} \cdots$ is at most $p + \frac{1}{2}q$. Moreover, if $p \le q$, then the overlap between $s$ and $\alpha[k]$ is not larger than $\frac{2}{3}(p+q)$. In the previous shortes...
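The overlap between two finite strings, the quantity bounded in the abstract above, can be illustrated with a small sketch. This is a naive quadratic check written for clarity, not the paper's method; the function names `overlap` and `merge` are hypothetical.

```python
def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    # Try the longest candidate first and stop at the first match.
    for length in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:length]):
            return length
    return 0


def merge(s, t):
    """Shortest string that starts with s and ends with t."""
    return s + t[overlap(s, t):]
```

Merging two strings along their overlap saves exactly `overlap(s, t)` characters compared with plain concatenation, which is why bounds on the overlap translate into bounds on superstring length.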