Strong computational lower bounds via parameterized complexity
2006
Cited by 17 (2 self)
We develop new techniques for deriving strong computational lower bounds for a class of well-known NP-hard problems. This class includes weighted satisfiability, dominating set, hitting set, set cover, clique, and independent set. For example, although a trivial enumeration can easily test in time O(n^k) whether a given graph of n vertices has a clique of size k, we prove that unless an unlikely collapse occurs in parameterized complexity theory, the problem is not solvable in time f(k)·n^{o(k)} for any function f, even if we restrict the parameter values to be bounded by an arbitrarily small function of n. Under the same assumption, we prove that even if we restrict the parameter values k to be of the order Θ(μ(n)) for any reasonable function μ, no algorithm of running time n^{o(k)} can test whether a graph of n vertices has a clique of size k. Similar strong lower bounds on the computational complexity are also derived for other NP-hard problems in the above class. Our techniques can be further extended to derive computational lower bounds on polynomial-time approximation schemes for NP-hard optimization problems. For example, we prove that the NP-hard distinguishing substring selection problem, for which a polynomial-time approximation scheme has recently been developed, has no polynomial-time approximation scheme of running time f(1/ε)·n^{o(1/ε)} for any function f unless an unlikely collapse occurs in parameterized complexity theory.
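For reference, the "trivial enumeration" that the O(n^k) bound above refers to can be sketched as follows. This is a minimal illustration, not code from the paper; the function name `has_clique` and the edge-list input format are our own assumptions. It tries all C(n, k) = O(n^k) vertex subsets and checks every pair inside each one.

```python
from itertools import combinations


def has_clique(n, edges, k):
    """Return True if the graph on vertices 0..n-1 has a clique of size k.

    Hypothetical helper: brute-force enumeration of all C(n, k) = O(n^k)
    vertex subsets, each checked with k(k-1)/2 adjacency lookups.
    """
    # Build adjacency sets for O(1) edge lookups.
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Try every k-subset of vertices; accept if all pairs are adjacent.
    for subset in combinations(range(n), k):
        if all(b in adj[a] for a, b in combinations(subset, 2)):
            return True
    return False
```

For example, on a triangle {0, 1, 2} with a pendant vertex 3 attached to 2, a clique of size 3 exists but none of size 4. The lower bound says, roughly, that the exponent k in this enumeration cannot be pushed down to o(k) unless the stated collapse occurs.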
More Efficient Algorithms for Closest String and Substring Problems
Cited by 12 (2 self)
The closest string and substring problems find applications in PCR primer design, genetic probe design, motif finding, and antisense drug design. Because of their importance, the two problems have recently been studied extensively in computational biology. Unfortunately, both problems are NP-complete. Researchers have developed both fixed-parameter algorithms and approximation algorithms for the two problems. In terms of fixed-parameter algorithms, when the radius d is the parameter, the best-known fixed-parameter algorithm for closest string has time complexity O(n·d^{d+1}), which is still superpolynomial even if d = O(log n). In this paper we provide an O(n·|Σ|^{O(d)}) algorithm, where Σ is the alphabet. This gives a polynomial-time algorithm when d = O(log n) and |Σ| is constant. Using the same technique, we additionally provide a more efficient subexponential-time algorithm for the closest substring problem. In terms of approximation, both closest string and closest substring admit polynomial-time approximation schemes (PTAS). The best known time complexity of the PTAS is O(n^{O(ε^{-2} log(1/ε))}). In this paper we present a PTAS with time complexity O(n^{O(ε^{-2})}). Finally, we prove that a restricted version of closest substring has the same parameterized complexity as closest substring, answering an open question in the literature.
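To make the radius-d parameterization concrete, here is a sketch of the classic bounded-search-tree algorithm for Closest String due to Gramm, Niedermeier and Rossmanith, which we assume is the kind of d^{O(d)} baseline the abstract compares against. It is not the paper's new O(n·|Σ|^{O(d)}) algorithm. Starting from the first input string, if some string t is still farther than d away, the center must agree with t on at least one of any d+1 positions where they differ, so we branch on d+1 such positions with a total budget of d changes. The names `hamming` and `closest_string` are illustrative.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings, d):
    """Return a center s with hamming(s, t) <= d for all t, or None.

    Sketch of a bounded search tree: depth at most d, branching factor d + 1,
    so roughly (d + 1)^d nodes, each costing O(n * L) work here.
    """
    def search(s, budget):
        if budget < 0:
            return None
        # Prune: some string cannot be brought within d using `budget` more edits.
        if any(hamming(s, t) > d + budget for t in strings):
            return None
        far = next((t for t in strings if hamming(s, t) > d), None)
        if far is None:
            return s  # every string is within distance d
        # Any valid center agrees with `far` on at least one of these d+1 positions.
        diff = [p for p in range(len(s)) if s[p] != far[p]]
        for p in diff[: d + 1]:
            result = search(s[:p] + far[p] + s[p + 1:], budget - 1)
            if result is not None:
                return result
        return None

    return search(strings[0], d)
```

For instance, `closest_string(["AAAA", "AATT"], 1)` returns a center at distance 1 from both strings, while `["AAAA", "TTTT"]` with d = 1 has no solution, because the two strings are 4 > 2d apart.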
On the parameterized intractability of motif search problems
Combinatorica, 2006
Cited by 11 (4 self)
We show that Closest Substring, one of the most important problems in the field of biological sequence analysis, is W[1]-hard when parameterized by the number k of input strings (and remains so even over a binary alphabet). This problem is therefore unlikely to be solvable in time O(f(k)·n^c) for any function f of k and constant c independent of k. The problem can therefore be expected to be intractable, in any practical sense, for k ≥ 3. Our result supports the intuition that Closest Substring is computationally much harder than its special case Closest String, although both problems are NP-complete. We also prove W[1]-hardness for other parameterizations in the case of unbounded alphabet size. Our W[1]-hardness result for Closest Substring generalizes to Consensus Patterns, a problem of similar significance in computational biology.
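The abstract does not restate the problem, so, assuming the standard definition (given k strings and integers L and d, find a length-L string s such that each input string has some length-L substring within Hamming distance d of s), a solution checker and a naive exhaustive solver might look like this. The function names are hypothetical. The solver tries all |Σ|^L centers, which is exponential in L and only usable for tiny instances.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def is_closest_substring_solution(center, strings, d):
    """True if every input string has a length-len(center) window within distance d."""
    L = len(center)
    return all(
        any(hamming(center, t[i:i + L]) <= d for i in range(len(t) - L + 1))
        for t in strings
    )


def closest_substring_bruteforce(strings, L, d, alphabet):
    """Try all |alphabet|^L candidate centers; return the first valid one or None."""
    for letters in product(alphabet, repeat=L):
        center = "".join(letters)
        if is_closest_substring_solution(center, strings, d):
            return center
    return None
```

Over the binary alphabet, for example, `closest_substring_bruteforce(["0110", "1011"], 2, 0, "01")` finds "01", which occurs exactly in both strings.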
Parameterized intractability of distinguishing substring selection
Theory of Computing Systems, 2006
Cited by 5 (4 self)
A central question in computational biology is the design of genetic markers to distinguish between two given sets of (DNA) sequences. This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short), which asks, given a set of “good” strings and a set of “bad” strings, for a solution string which is, with respect to the Hamming metric, “away” from the good strings and “close” to the bad strings. More precisely, given integers d_g, d_b, and L, we ask for a length-L string s such that s has Hamming distance at least d_g to every length-L substring of the good strings and such that every bad string has a length-L substring with Hamming distance at most d_b to s. Studying the parameterized complexity of DSSS, we show that, already for a binary alphabet, DSSS is W[1]-hard with respect to its natural parameters. This, in particular, implies that a recently given polynomial-time approximation scheme (PTAS) by Deng et al. [6, 7] cannot be replaced by a ...
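The two DSSS conditions stated above translate directly into a feasibility check for a candidate string s. The sketch below is illustrative only (the names `windows` and `is_dsss_solution` are ours). Note the asymmetry: the "good" condition quantifies over *every* window of *every* good string, while the "bad" condition needs only *some* window of each bad string.

```python
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def windows(t, L):
    """All length-L substrings of t (empty if len(t) < L)."""
    return [t[i:i + L] for i in range(len(t) - L + 1)]


def is_dsss_solution(s, good, bad, d_g, d_b):
    """Check both DSSS conditions for a candidate s of length L = len(s)."""
    L = len(s)
    # Away from good: distance >= d_g to EVERY length-L substring of every good string.
    far_from_good = all(hamming(s, w) >= d_g for g in good for w in windows(g, L))
    # Close to bad: EVERY bad string has SOME length-L substring within d_b of s.
    close_to_bad = all(any(hamming(s, w) <= d_b for w in windows(b, L)) for b in bad)
    return far_from_good and close_to_bad
```

With good = ["0000"], bad = ["0110", "1111"], d_g = 2 and d_b = 0, the candidate "11" qualifies, whereas "01" fails because it is only at distance 1 from the good window "00".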
Fast exact algorithms for the closest string and substring problems with application to the planted (l, d)-motif model
IEEE/ACM Transactions on Computational Biology and Bioinformatics
Cited by 2 (1 self)
We present two parameterized algorithms for the closest string problem. The first runs in O(nL + nd·17.97^d) time for DNA strings and in O(nL + nd·61.86^d) time for protein strings, where n is the number of input strings, L is the length of each input string, and d is the given upper bound on the number of mismatches between the center string and each input string. The second runs in O(nL + nd·13.92^d) time for DNA strings and in O(nL + nd·47.21^d) time for protein strings. We then extend the first algorithm to a new parameterized algorithm for the closest substring problem that runs in O((n − 1)m^2(L + d·17.97^d·m^{⌊log_2(d+1)⌋})) time for DNA strings and in O((n − 1)m^2(L + d·61.86^d·m^{⌊log_2(d+1)⌋})) time for protein strings, where n is the number of input strings, L is the length of the center substring, L − 1 + m is the maximum length of a single input string, and d is the given upper bound on the number of mismatches between the center substring and at least one substring of each input string. All the algorithms significantly improve on the previous bests. To verify the theoretical improvements in time complexity experimentally, we implement our algorithm in C and apply the resulting program to the planted (L, d)-motif problem proposed by Pevzner and Sze in 2000. We compare our program with the previously best exact program for the problem, namely PMSPrune (designed by Davila et al. in 2007). Our experimental data show that our program runs faster for practical cases and also for several challenging cases. Our algorithm also uses less memory.
Index Terms: Parameterized algorithm, closest string, closest substring, DNA motif discovery.
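To illustrate the planted (L, d)-motif setting used in the experiments, the sketch below generates a small synthetic instance and recovers the motif by naive exhaustive search over all 4^L DNA strings. It is neither the algorithm of this paper nor PMSPrune. The generator only roughly follows the Pevzner–Sze model (random background sequences, each containing one copy of a random motif with exactly d positions mutated), and `plant_instance` and `exhaustive_motifs` are hypothetical names.

```python
import random
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def plant_instance(n, m, L, d, seed=0):
    """Hypothetical generator: n random length-m sequences, each with one
    occurrence of a random length-L motif mutated at exactly d positions."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(L))
    seqs = []
    for _ in range(n):
        occ = list(motif)
        for p in rng.sample(range(L), d):  # d distinct positions to mutate
            occ[p] = rng.choice([c for c in ALPHABET if c != occ[p]])
        seq = [rng.choice(ALPHABET) for _ in range(m)]
        start = rng.randrange(m - L + 1)
        seq[start:start + L] = occ  # overwrite background with the planted copy
        seqs.append("".join(seq))
    return motif, seqs


def exhaustive_motifs(seqs, L, d):
    """Naive exact search: every candidate within d of some window in every sequence."""
    hits = []
    for cand in map("".join, product(ALPHABET, repeat=L)):
        if all(any(hamming(cand, s[i:i + L]) <= d for i in range(len(s) - L + 1))
               for s in seqs):
            hits.append(cand)
    return hits
```

Because every sequence contains a copy of the motif at distance exactly d, the planted motif is always among the hits; spurious hits may also appear when L is small relative to n and m. The 4^L enumeration is what makes this baseline impractical for challenging instances such as larger (L, d).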
Closest String with Outliers
Background: Given n strings s_1, …, s_n, each of length ℓ, and a nonnegative integer d, the CLOSEST STRING problem asks for a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many, but not necessarily all, input strings is an important task that plays a role in many applications in bioinformatics.
Results: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the CLOSEST STRING WITH OUTLIERS (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d of at least n − k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed-parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded, the problem is W[1]-hard with respect to n − k, ℓ, and d.
Conclusions: Our refined model abstractly models finding common patterns in several but not all input strings.
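The CSWO condition, a center within distance d of at least n − k inputs, can be checked by counting outliers. The sketch below also includes a naive solver over all |Σ|^ℓ centers for tiny instances. It is not the paper's fixed-parameter algorithm, and all function names are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def cswo_outliers(center, strings, d):
    """Indices of input strings farther than d from the center (the outliers)."""
    return [i for i, t in enumerate(strings) if hamming(center, t) > d]


def is_cswo_solution(center, strings, d, k):
    """Center is within distance d of at least n - k strings, i.e. at most k outliers."""
    return len(cswo_outliers(center, strings, d)) <= k


def cswo_bruteforce(strings, d, k, alphabet):
    """Naive exhaustive search over all |alphabet|^ell centers (tiny inputs only)."""
    ell = len(strings[0])
    for letters in product(alphabet, repeat=ell):
        center = "".join(letters)
        if is_cswo_solution(center, strings, d, k):
            return center
    return None
```

For the strings AAAA, AAAT, AATT and TTTT with d = 1 and k = 1, the center AAAT works and reports TTTT (index 3) as the single outlier. With k = 0, no center exists, because AAAA and TTTT are 4 > 2d apart.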