Results 1 - 10
of
97
An O(ND) Difference Algorithm and Its Variations
- Algorithmica
, 1986
"... The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a s ..."
Abstract
-
Cited by 133 (4 self)
- Add to MetaCart
The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N +D expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(NlgN +D ) time variation.
Fault Localization with Nearest Neighbor Queries
, 2003
"... We present a method for performing fault localization using similar program spectra. Our method assumes the existence of a faulty run and a larger number of correct runs. It then selects according to a distance criterion the correct run that most resembles the faulty run, compares the spectra corres ..."
Abstract
-
Cited by 101 (1 self)
- Add to MetaCart
We present a method for performing fault localization using similar program spectra. Our method assumes the existence of a faulty run and a larger number of correct runs. It then selects according to a distance criterion the correct run that most resembles the faulty run, compares the spectra corresponding to these two runs, and produces a report of "suspicious" parts of the program. Our method is widely applicable because it does not require any knowledge of the program input and no more information from the user than a classification of the runs as either "correct" or "faulty". To experimentally validate the viability of the method, we implemented it in a tool, WHITHER using basic block profiling spectra. We experimented with two different similarity measures and the Siemens suite of 132 programs with injected bugs. To measure the success of the tool, we developed a generic method for establishing the quality of a report. The method is based on the way an "ideal user" would navigate the program using the report to save effort during debugging. The best results we obtained were, on average, above 50%, meaning that our ideal user would avoid looking at half of the program.
The Cartesian Product Algorithm - Simple and Precise Type Inference of Parametric Polymorphism
, 1995
"... Concrete types and abstract types are different and serve different purposes. Concrete types, the focus of this paper, are essential to support compilation, application delivery, and debugging in object-oriented environments. Concrete types should not be obtained from explicit type declarations beca ..."
Abstract
-
Cited by 96 (3 self)
- Add to MetaCart
Concrete types and abstract types are different and serve different purposes. Concrete types, the focus of this paper, are essential to support compilation, application delivery, and debugging in object-oriented environments. Concrete types should not be obtained from explicit type declarations because their presence limits polymorphism unacceptably. This leaves us with type inference. Unfortunately, while polymorphism demands the use of type inference, it has also been the hardest challenge for type inference. We review previous type inference algorithms that analyze code with parametric polymorphism and then present a new one: the cartesian product algorithm. It improves precision and efficiency over previous algorithms and deals directly with inheritance, rather than relying on a preprocessor to expand it away. Last, but not least, it is conceptually simple. The cartesian product algorithm has been used in the Self system since late 1993. We present measurements to document its pe...
Bitext Maps and Alignment via Pattern Recognition
- Computational Linguistics
, 1999
"... This article advances the state of the art ofbitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective ..."
Abstract
-
Cited by 68 (0 self)
- Add to MetaCart
This article advances the state of the art ofbitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective Map Recognizer (SIMR) algorithm presented here integrates innovative approaches to each of these tasks. Objective evaluation has shown that SIMR's accuracy is consistently high for language pairs as diverse as French/English and Korean/English. If necessary, S IMR's bitext maps can be efficiently converted into segment alignments using the Geometric Segment Alignment (GSA) algorithm, which is also presented here. SIMR has produced bitext maps for over 200 megabytes of French-English bitexts. GSA has converted these maps into alignments. Both the maps and the alignments are available from the Linguistic Data Consortium) 1.
Parameterized Pattern Matching: Algorithms and Applications
, 1994
"... The problem of finding sections of code that either are identical or are related by the systematic renaming of variables or constants can be modeled in terms of parameterized strings (p-strings) and parameterized matches (p- matches) [Baker93a]. P-strings are strings over two alphabets, one of whic ..."
Abstract
-
Cited by 57 (5 self)
- Add to MetaCart
The problem of finding sections of code that either are identical or are related by the systematic renaming of variables or constants can be modeled in terms of parameterized strings (p-strings) and parameterized matches (p- matches) [Baker93a]. P-strings are strings over two alphabets, one of which represents parameters. Two p-strings are a parameterized match (p-match) if one pstring is obtained by renaming the parameters of the other by a one-to-one function. In this paper, we investigate parameterized pattern matching via parameterized suffix trees (p-suffix trees), defined in [Baker93a]. We give two algorithms for constructing p-suffix trees: one (eager) that runs in linear time for fixed alphabets, and another that uses auxiliary data structures and runs in O(nlog (n)) time for variable alphabets, where n is input length. We show that using a psuffix tree for a pattern p-string P, it is possible to search for all p-matches of P within a text p-string T in space linear in ï P ï...
Concrete Type Inference: Delivering Object-Oriented Applications
, 1995
"... Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mecha ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner. TRADEMARKS Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company, Ltd. All SPARC trademarks, including the SCD Compliant Logo, are trademarks or registered trademarks of SPARC International, Inc. SPARCstation, SPARCserver, SPARCengine, SPARCworks, and SPARCompiler are licensed exclusively to Sun Microsystems, Inc. All other product names mentioned herein are the trademarks of their respective owners.
Delta Algorithms: An Empirical Analysis
, 1998
"... Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, displaying differences, merging changes, distributing updates, storing backups, transmitting video sequences, and others. This pap ..."
Abstract
-
Cited by 45 (3 self)
- Add to MetaCart
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, displaying differences, merging changes, distributing updates, storing backups, transmitting video sequences, and others. This paper studies the performance parameters of several delta algorithms, using a benchmark of over 1300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files without sacrificing performance.
Efficient Snapshot Differential Algorithms for Data Warehousing
- In Proceedings of the International Conference on Very Large Data Bases
, 1996
"... Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this sapshot di/rem tial problem is ..."
Abstract
-
Cited by 39 (7 self)
- Add to MetaCart
Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this sapshot di/rem tial problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
Recognizing Acronyms and their Definitions
- ISRI (Information Science Research Institute) UNLV
, 1999
"... Abstract This paper introduces an automatic method for finding acronyms and their definitions in free text. The method is based on an inexact pattern matching algorithm applied to text surrounding the possible acronym. Evaluation shows both high recall and precision for a set of documents randomly s ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
Abstract This paper introduces an automatic method for finding acronyms and their definitions in free text. The method is based on an inexact pattern matching algorithm applied to text surrounding the possible acronym. Evaluation shows both high recall and precision for a set of documents randomly selected from a larger set of full text documents. \Lambda
Representing Concerns in Source Code
, 2003
"... Many program evolution tasks involve source code that is not modularized as a single unit. Furthermore, the source code relevant to a change task often implements different concerns, or high-level concepts that a developer must consider. Finding and understanding concerns scattered in source code is ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Many program evolution tasks involve source code that is not modularized as a single unit. Furthermore, the source code relevant to a change task often implements different concerns, or high-level concepts that a developer must consider. Finding and understanding concerns scattered in source code is a difficult task that accounts for a large proportion of the effort of performing program evolution. One possibility to mitigate this problem is to produce textual documentation that describes scattered concerns. However, this approach is impractical because it is costly, and because, as a program evolves, the documentation becomes inconsistent with the source code. The thesis of this dissertation is that a description of concerns, representing program structures and linked to source code, that can be produced cost-effectively during program investigation activities, can help developers perform software evolution tasks more systematically, and on different versions of a system. To validate the claims of this thesis, we have developed a model for a structure, called concern graph, that describes concerns in source code in terms of relations between program elements. The model also defines precisely the notion of inconsistency between a concern graph and the

