Pythia: A regression test selection tool based on textual differencing
, 1997
Cited by 32 (0 self)
Regression testing is a commonly used activity whose purpose is to determine whether the modifications made to a software system have introduced new faults. For many large, complex software systems the retest-all strategy is not practical: the resources required to reexecute and verify all available test cases (i.e., time and human effort) are prohibitive. Ad hoc methods are not desirable, as they can compromise the reliability of the regression test activity and consequently the reliability of the software system being tested. In this paper we present a new technique for selecting regression test cases based on the modifications that have been made to the program. The technique, which is based on the idea of directly comparing source files from the old and the new version of the program, has been implemented in a tool called Pythia. A novel characteristic of Pythia, which is capable of analyzing large software systems written in C, is that it has been implemented primarily through th...
An Empirical Study of Delta Algorithms
, 1996
Cited by 29 (9 self)
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, distributing updates, storing backups, transmitting video sequences, and others. This paper studies the performance parameters of several delta algorithms, using a benchmark of over 1300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files; one of them is even faster than diff in both compression and decompression speed.
1 Introduction
Delta algorithms, i.e., algorithms that compute differences between two files or strings, have a number of uses when multiple versions of data objects must be stored, transmitted, or proce...
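The idea of encoding one file in terms of another can be illustrated with a small sketch. This is not one of the algorithms benchmarked in the paper; it is a minimal example that assumes Python's standard `difflib.SequenceMatcher` as the matching engine, and the names `make_delta` and `apply_delta` are hypothetical helpers introduced only for illustration.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy/add operations relative to `old` (illustrative only)."""
    ops = []
    # autojunk=False disables difflib's popularity heuristic so matching is exhaustive.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The text is already present in `old`: store only the range.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New text that `old` does not contain: store it literally.
            ops.append(("add", new[j1:j2]))
        # A 'delete' opcode needs no operation: the old range is simply skipped.
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the new version from `old` plus the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

When the two versions share most of their content, the delta consists mainly of short `copy` ranges, which is where the space saving comes from.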
Word-Pair Extraction for Lexicography
, 1996
Cited by 29 (3 self)
We describe an application of sentence alignment techniques and approximate string matching to the problem of extracting lexicographically interesting word-word pairs from multilingual corpora. Since our interest is in support systems for lexicographers rather than in fully automatic construction of lexicons, we would like to provide access to parameters allowing a tunable trade-off between precision and recall. We evaluate two techniques for doing this. Since sentence alignment tends to associate semantically similar words, while approximate string matching draws attention to orthographic similarities, they can be used to serve different lexicographic purposes, as can the combination of the two techniques, which amounts, inter alia, to a tool for uncovering faux amis. We conclude by sketching a simple and flexible means for allowing lexicographers to provide information which has the potential to improve system performance.
1 Introduction
One of the central challenges of comput...
Longest Common Subsequences
- In Proc. of 19th MFCS, number 841 in LNCS
, 1994
Cited by 25 (1 self)
The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations/errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the sub- and super-sequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings.
1 Introduction
The set $\Sigma^*$ of finite strings over an unordered finite alphabet $\Sigma$ admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to string-matching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the `subsequence' partial order. We say that $u = u_1 \cdots u_m$ is a subsequence of ...
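The relation between the LLCS and edit distance mentioned in the abstract can be made concrete with a short sketch. It uses the textbook dynamic-programming recurrence, which is not necessarily among the algorithms the talk surveys. The function names `llcs` and `indel_distance` are hypothetical, and the distance computed here assumes that only insertions and deletions are allowed as editing steps.

```python
def llcs(a: str, b: str) -> int:
    """Length of a longest common subsequence, O(len(a) * len(b)) time."""
    # prev[j] holds the LLCS of the processed prefix of `a` and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best common subsequence of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance when only insertions and deletions are permitted."""
    # Every symbol outside a longest common subsequence is deleted or inserted.
    return len(a) + len(b) - 2 * llcs(a, b)
```

Keeping only two rows of the table reduces memory to O(len(b)), at the cost of not being able to recover the subsequence itself.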
Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation
, 2000
Cited by 24 (2 self)
of block placement called sequence pair. All block placement algorithms which are based on sequence pairs use simulated annealing where the generation and evaluation of a large number of sequence pairs is required. Therefore, a fast algorithm is needed to evaluate each generated sequence pair, i.e. to translate the sequence pair to its corresponding block placement. This paper presents a new approach to evaluate a sequence pair based on computing longest common subsequence in a pair of weighted sequences. We present a very simple and problem. We also show that using a more sophisticated in [1]. For example, we achieve 60X speedup over the previous algorithm when input size n # ###.
Enumerating Longest Increasing Subsequences and Patience Sorting
, 2000
Cited by 22 (0 self)
In this paper we present three algorithms that solve three combinatorial optimization problems related to each other. One of them is the patience sorting game, invented as a practical method of sorting real decks of cards. The second problem is computing the longest monotone increasing subsequence of the given sequence of n positive integers in the range 1, ..., n. The third problem is to enumerate all the longest monotone increasing subsequences of the given permutation.
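The link between patience sorting and the longest increasing subsequence can be sketched as follows. This is the well-known pile-based method with binary search, given here only as an illustration; it is not claimed to be any of the paper's three algorithms, and the name `lis_length` is hypothetical. It computes the length of a longest strictly increasing subsequence.

```python
from bisect import bisect_left


def lis_length(seq) -> int:
    """Length of a longest strictly increasing subsequence via patience piles."""
    # pile_tops[i] is the top card of pile i; the list stays sorted throughout.
    pile_tops = []
    for x in seq:
        # Leftmost pile whose top card is >= x.
        i = bisect_left(pile_tops, x)
        if i == len(pile_tops):
            # No such pile exists, so x starts a new pile on the right.
            pile_tops.append(x)
        else:
            # Place x on that pile, replacing its top card.
            pile_tops[i] = x
    # The number of piles equals the length of a longest increasing subsequence.
    return len(pile_tops)
```

Each element costs one binary search, so the whole run takes O(n log n) time for a sequence of length n.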
Expected Length of Longest Common Subsequences
Cited by 17 (2 self)
Contents
1 Introduction
2 Notation and preliminaries
2.1 Notation and basic definitions
2.2 Longest common subsequences
2.3 Computing longest common subsequences
2.4 Expected length of longest common subsequences
3 Lower bounds
3.1 Css machines
3.2 Analysis of css machines
3.3 Design of css machines
3.4 Labeled css machines
4 Upper bounds
4.1 Collations
4.2 Previous upper bounds
4.3 Simple upper bound (binary alphabet)
4.4 Simple upper bound (alphabet size 3)
4.5 Upper bounds for binary alphabet
Practical Language-Independent Detection of Near-Miss Clones
- In Proceedings of the 14th IBM Centre for Advanced Studies Conference (CASCON'04)
, 2004
Cited by 14 (6 self)
Previous research shows that most software systems contain significant amounts of duplicated, or cloned, code. Some clones are exact duplicates of each other, while others differ in small details only. We designate these almost-perfect clones as "near-miss" clones. While technically difficult, detection of near-miss clones has many benefits, both academic and practical. Finding these clones can give us better insight into the way developers maintain and reuse code, and we can also parameterize and remove near-miss clones to reduce overall source code size and decrease system complexity. This paper presents a simple, general and practical way to detect near-miss clones, and summarizes the results of its application to two production websites. We use standard lexical comparison tools coupled with language-specific extractors to locate potential clones. Our approach separates code comparisons from code understanding, and makes the comparisons language independent. This makes it easy to adapt to different programming languages.
Fixed-Parameter Tractability Results for Feedback Set Problems in Tournaments
- Journal of Discrete Algorithms
, 2009
Cited by 14 (5 self)
Complementing recent progress on classical complexity and polynomial-time approximability of feedback set problems in (bipartite) tournaments, we extend and improve fixed-parameter tractability results for these problems. We show that Feedback Vertex Set in tournaments (FVST) is amenable to the novel iterative compression technique, and we provide a depth-bounded search tree for Feedback Arc Set in bipartite tournaments based on a new forbidden subgraph characterization. Moreover, we apply the iterative compression technique to d-Hitting Set, which generalizes Feedback Vertex Set in tournaments, and obtain improved upper bounds for the time needed to solve 4-Hitting Set and 5-Hitting Set. Using our parameterized algorithm for Feedback Vertex Set in tournaments, we also give an exact (not parameterized) algorithm for it running in O(1.709^n) time, where n is the number of input graph vertices, answering a question of Woeginger [Discrete Appl. Math. 156(3):397–405, 2008].
A Scaleable Technique For Best-Match Retrieval Of Sequential Information Using Metrics-Guided Search
- Journal of Information Science
, 1994
Cited by 13 (12 self)
A new technique is described for retrieving information by finding the best match or matches between a textual `query' and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the `quality' of the matching achieved. For any one sequence of hits between a query and a database, the probability measure is an estimate of the probability that the observed configuration, or better, could have occurred by chance. This probability is an inverse measure of the redundancy between the query and the database. The new technique is embodied in a software simulation called SP21 which runs on a conventional computer. Examples are presented showing best-match retrieval of information from a textual database. Analytic and empirical evidence is presented showing that, in a serial processing environment, the search tech...

