An algorithm for differential file comparison. Computer Science (1975)

by J W Hunt, M D McIlroy

Results 11 - 20 of 62

An Empirical Study of Delta Algorithms

by James Hunt, Kiem-phong Vo, Walter F. Tichy, 1996
Cited by 29 (9 self)
Delta algorithms compress data by encoding one file in terms of another. This type of compression is useful in a number of situations: storing multiple versions of data, distributing updates, storing backups, transmitting video sequences, and others. This paper studies the performance parameters of several delta algorithms, using a benchmark of over 1300 pairs of files taken from two successive releases of GNU software. Results indicate that modern delta compression algorithms based on Ziv-Lempel techniques significantly outperform diff, a popular but older delta compressor, in terms of compression ratio. The modern compressors also correlate better with the actual difference between files; one of them is even faster than diff in both compression and decompression speed. 1 Introduction Delta algorithms, i.e., algorithms that compute differences between two files or strings, have a number of uses when multiple versions of data objects must be stored, transmitted, or proce...
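The idea of encoding one file in terms of another can be illustrated with a minimal sketch. It is not any of the algorithms benchmarked in the paper: it assumes Python's standard `difflib.SequenceMatcher` as the matching engine, and the function names `make_delta` and `apply_delta` and the "copy"/"add" instruction format are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal additions.

    Assumption: difflib's matcher stands in for the real delta
    algorithms; the instruction tuples are a made-up format.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse bytes that already exist in the old file.
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that appear only in the new file are stored literally.
            delta.append(("add", new[j1:j2]))
        # "delete" needs no instruction: the bytes are simply not copied.
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

When the two versions share most of their content, the delta consists mainly of short copy instructions, which is where the space saving comes from.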

Modeling history to analyze software evolution

by Tudor Gîrba, Stéphane Ducasse - INTERNATIONAL JOURNAL ON SOFTWARE MAINTENANCE: RESEARCH AND PRACTICE (JSME), 2006
Cited by 29 (14 self)
The histories of software systems hold useful information when reasoning about the systems at hand or when reasoning about general laws of software evolution. Over the past 30 years more and more research has been spent on understanding software evolution. However, the approaches developed so far do not rely on an explicit meta-model, and thus, they make it difficult to reuse or compare their results. We argue that there is a need for an explicit meta-model for software evolution analysis. We present a survey of the evolution analyses and deduce a set of requirements that an evolution meta-model should have. We define Hismo, a meta-model in which history is modeled as an explicit entity. Hismo adds a time layer on top of structural information, and provides a common infrastructure for expressing and combining evolution analyses and structural analyses. We validate the usefulness of our meta-model by presenting how different analyses are expressed on it. Key words: software evolution, meta-modeling, history, reverse engineering, evolution analysis.

Tracking and Viewing Changes on the Web

by Fred Douglis, Thomas Ball, 1996
Cited by 22 (5 self)
We describe a set of tools that detect when World-Wide Web pages have been modified and present the modifications visually to the user through marked-up HTML. The tools consist of three components: w3newer, which detects changes to pages; snapshot, which permits a user to store a copy of an arbitrary Web page and to compare any subsequent version of a page with the saved version; and HtmlDiff, which marks up HTML text to indicate how it has changed from a previous version. We refer to the tools collectively as the AT&T Internet Difference Engine (AIDE). This paper discusses several aspects of AIDE, with an emphasis on systems issues such as scalability, security, and error conditions.

Processing Software Source Text in Automated Design Recovery and Transformation

by Andrew Malton, Kevin A. Schneider, James R. Cordy, Thomas R. Dean, Darren Cousineau, Jason Reynolds - In Proc. International Workshop on Program Comprehension (IWPC’01), 2001
Cited by 18 (0 self)
Software source text is the raw material of program understanding and transformation systems. In order to share the results of source analyses, both between phases of a design recovery process, and between tools and systems in different processes, a source text interchange format is needed. This paper describes a simple technique, ‘source factoring’, by which a common structural decomposition of source text can address the many issues of preprocessing, macro processing, lexical analysis, design recovery, and automated transformation. Above all, source factorization allows the results of design analysis to be attached to source, and the results of source transformation to be reinstalled cleanly into the code base. This view of source text underlies the architecture of a successful software maintenance system which has processed billions of lines of legacy code in all major programming languages.

A framework for asynchronous change awareness in collaborative documents and workspaces

by James Tam, Saul Greenberg - Int. J. Hum. Comput. Stud.
Cited by 16 (0 self)
Abstract. Change awareness is the ability of individuals to track the asynchronous changes made to a collaborative document or surface by other participants over time. We develop a framework that articulates what change awareness information is critical if people are to track and maintain change awareness. Information elements include: knowing who changed the artifact, what those changes involve, where changes occur, when changes were made, how things have changed, and why people made the changes. The framework also accounts for people’s need to view these changes from different perspectives: an artifact-based view, a person-based view, and a workspace-based view.

Practical Language-Independent Detection of Near-Miss Clones

by James R. Cordy, Thomas R. Dean, Nikita Synytskyy - IN PROCEEDINGS OF THE 14TH IBM CENTRE FOR ADVANCED STUDIES CONFERENCE (CASCON’04), 2004
Cited by 14 (6 self)
Previous research shows that most software systems contain significant amounts of duplicated, or cloned, code. Some clones are exact duplicates of each other, while others differ in small details only. We designate these almost-perfect clones as "near-miss" clones. While technically difficult, detection of near-miss clones has many benefits, both academic and practical. Finding these clones can give us better insight into the way developers maintain and reuse code, and we can also parameterize and remove near-miss clones to reduce overall source code size and decrease system complexity. This paper presents a simple, general and practical way to detect near-miss clones, and summarizes the results of its application to two production websites. We use standard lexical comparison tools coupled with language-specific extractors to locate potential clones. Our approach separates code comparisons from code understanding, and makes the comparisons language independent. This makes it easy to adapt to different programming languages.
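A rough sketch of the lexical-comparison step may help make the idea concrete. It is not the authors' tool chain: it assumes candidate fragments have already been extracted, uses Python's standard `difflib` in place of their comparison tools, and the names `normalize`, `line_similarity`, `is_near_miss` and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(fragment: str) -> list:
    """Strip surrounding whitespace and drop blank lines (assumed normalization)."""
    return [line.strip() for line in fragment.splitlines() if line.strip()]


def line_similarity(frag_a: str, frag_b: str) -> float:
    """Fraction of lines shared by the two fragments, between 0.0 and 1.0."""
    lines_a, lines_b = normalize(frag_a), normalize(frag_b)
    return SequenceMatcher(None, lines_a, lines_b, autojunk=False).ratio()


def is_near_miss(frag_a: str, frag_b: str, threshold: float = 0.7) -> bool:
    """Similar enough to be related, but not an exact duplicate.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return threshold <= line_similarity(frag_a, frag_b) < 1.0
```

Because the comparison works on lines of text only, nothing in it depends on the programming language of the fragments; any language knowledge would live in the extraction step that produces them.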

A Parallel Wavefront Algorithm for Efficient Biological Sequence Comparison

by C. E. R. Alves, E.N. Cáceres, F. Dehne, S.W. Song, São Paulo - In The 2003 International Conference on Computational Science and its Applications, 2003
Cited by 13 (4 self)
In this paper we present a parallel wavefront algorithm for computing an alignment between two strings A and C, with |A| = m, and |C| = n. On a distributed memory parallel computer of p processors each with O((m + n)/p) memory, the proposed algorithm requires O(p) communication rounds and O(mn/p) local computing time. The novelty of this algorithm is based on a compromise between the workload of each processor and the number of communication rounds required, expressed by a parameter called α. The proposed algorithm is expressed in terms of this parameter that can be tuned to obtain the best overall parallel time in a given implementation. We show very promising experimental results obtained on a 64-node Beowulf machine. A characteristic of the wavefront communication requirement is that each processor communicates with few other processors. This makes it very suitable as a potential application for grid computing.
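The wavefront order itself can be shown with a small sequential sketch. It is a simplification under stated assumptions: it computes the length of a longest common subsequence rather than the paper's alignment score, runs on one processor, and does not model the α parameter or the communication rounds. The function name `lcs_length_wavefront` is hypothetical.

```python
def lcs_length_wavefront(a: str, c: str) -> int:
    """LCS length of strings a and c, filling the table by anti-diagonals."""
    m, n = len(a), len(c)
    # table[i][j] holds the LCS length of a[:i] and c[:j]; row 0 and column 0 stay 0.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    # Each anti-diagonal d = i + j is one wavefront step.
    for d in range(2, m + n + 1):
        # Cells on the same anti-diagonal depend only on diagonals d-1 and d-2,
        # so a parallel version could compute them independently.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            if a[i - 1] == c[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]
```

In a distributed setting the inner loop is what gets divided among processors; how large a block each processor handles before communicating is the kind of trade-off the abstract attributes to α.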

NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization

by Chanchal K. Roy, James R. Cordy, 2008
Cited by 13 (6 self)
This paper examines the effectiveness of a new language-specific parser-based but lightweight clone detection approach. Exploiting a novel application of a source transformation system, the method accurately finds near-miss clones using an efficient text line comparison technique. The transformation system assists the method in three ways. First, using agile parsing it provides user-specified flexible pretty-printing to remove noise, standardize formatting and break program statements into parts such that potential changes can be detected as simple linewise text differences. Second, it provides efficient flexible extraction of potential clones to be compared using island grammars and agile parsing to select granularities and enumerate potential clones. Third, using transformation rules it provides flexible code normalization to allow for local editing differences between similar code segments and filtering out of uninteresting parts of potential clones. In this paper we introduce the theory and practice of the framework and demonstrate its use in finding function clones in C code. Early experiments indicate that the method is capable of finding near-miss clones with high precision and recall, and with reasonable performance.

SAIL: A System for Generating, Archiving, and Retrieving Specialized Assignments Using LaTeX

by Stina Bridgeman, Michael T. Goodrich, Stephen G. Kobourov, Roberto Tamassia - In Proceedings of SIGCSE ’00, 2000
Cited by 10 (0 self)
In this paper we present a package for the creation of Specialized Assignments In LaTeX, SAIL. We describe several features which allow an instructor to create sufficiently different instances of the "same" problem so as to encourage student cooperation without fear of plagiarism. The SAIL package also provides support for grading aids and grading automation. In addition, we describe an on-line system for archiving homework problems in a database that can be easily searched and to which new parametrized problems can be easily added. Together, the SAIL package and the searchable database of problems offer a powerful tool for generating, archiving, and retrieving homework assignments (as well as tests and quizzes).

Privacy Oracle: A System for Finding Application Leaks with Black Box Differential Testing

by Jaeyeon Jung, Anmol Sheth, Ben Greenstein, David Wetherall, Gabriel Maganis, Tadayoshi Kohno - In Proceedings of ACM CCS, 2008
Cited by 9 (5 self)
We describe the design and implementation of Privacy Oracle, a system that reports on application leaks of user information via the network traffic that they send. Privacy Oracle treats each application as a black box, without access to either its internal structure or communication protocols. This means that it can be used over a broad range of applications and information leaks (i.e., not only Web traffic content or credit card numbers). To accomplish this, we develop a differential testing technique in which perturbations in the application inputs are mapped to perturbations in the application outputs to discover likely leaks; we leverage alignment algorithms from computational biology to find high quality mappings between different byte-sequences efficiently. Privacy Oracle includes this technique and a virtual machine-based testing system. To evaluate it, we tested 26 popular applications, including system and file utilities, media players, and IM clients. We found that Privacy Oracle discovered many small and previously undisclosed information leaks. In several cases, these are leaks of directly identifying information that are regularly sent in the clear (without end-to-end encryption) and which could make users vulnerable to tracking by third parties or providers.
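The mapping from input perturbations to output perturbations can be sketched as follows. This is only an illustration: it assumes two captured byte sequences are already available, substitutes Python's standard `difflib.SequenceMatcher` for the computational-biology alignment algorithms the abstract mentions, and the function name `changed_regions` and the sample traffic are hypothetical.

```python
from difflib import SequenceMatcher


def changed_regions(baseline: bytes, perturbed: bytes) -> list:
    """Return (before, after) byte pairs where the two outputs differ.

    Assumption: `baseline` and `perturbed` are outputs captured from two
    runs of the same application that differ in a single input value.
    """
    matcher = SequenceMatcher(None, baseline, perturbed, autojunk=False)
    regions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Any output bytes that moved with the input are a candidate leak.
            regions.append((baseline[i1:i2], perturbed[j1:j2]))
    return regions


# Hypothetical traffic from two runs where only the user name was changed.
run_a = b"GET /track?user=alice&v=1"
run_b = b"GET /track?user=bob&v=1"
leak_candidates = changed_regions(run_a, run_b)
```

Regions that change when the input changes, and only then, are the ones worth inspecting; differences that also show up between two runs with identical inputs would be noise such as timestamps or session identifiers.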