Results 1-10 of 16
Intrusion Detection: A Bioinformatics Approach
In: 19th Annual Computer Security Applications Conference, Las Vegas, Nevada, 2003
Cited by 26 (7 self)
This paper addresses the problem of detecting masquerading, a security attack in which an intruder assumes the identity of a legitimate user. Many approaches based on Hidden Markov Models and various forms of Finite State Automata have been proposed to solve this problem. The novelty of our approach results from the application of techniques used in bioinformatics for pairwise sequence alignment to compare the monitored session with past user behavior. Our algorithm uses a semi-global alignment and a unique scoring system to measure similarity between a sequence of commands produced by a potential intruder and the user signature, which is a sequence of commands collected from a legitimate user. We tested this algorithm on the standard intrusion data collection set. As discussed in the paper, the results of the test showed that the described algorithm yields a promising combination of intrusion detection rate and false positive rate when compared to published intrusion detection algorithms.
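As a rough illustration of the semi-global alignment idea mentioned in this abstract, the sketch below scores a short monitored session against a longer user signature, letting the session start and end anywhere inside the signature at no cost. The function name, the command lists, and the match/mismatch/gap values are all assumptions made for illustration; they are not the paper's own scoring system, which the abstract does not specify.

```python
def semiglobal_score(session, signature, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of `session` to any stretch of `signature`.

    The scoring values are placeholders, not the scoring system of the paper.
    """
    # Row 0 is all zeros: skipping a prefix of the signature is free.
    prev = [0] * (len(signature) + 1)
    for i, cmd in enumerate(session, start=1):
        # Session commands that stay unaligned are penalised.
        curr = [i * gap]
        for j, sig_cmd in enumerate(signature, start=1):
            diag = prev[j - 1] + (match if cmd == sig_cmd else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    # Taking the maximum over the last row makes a skipped suffix free too.
    return max(prev)


signature = ["ls", "cd", "vi", "make", "gcc", "ls", "mail"]
print(semiglobal_score(["vi", "make", "gcc"], signature))  # familiar behaviour
print(semiglobal_score(["nmap", "nc", "rm"], signature))   # unfamiliar behaviour
```

A detector built on such a score would flag sessions that fall below some threshold; how that threshold is chosen is outside what the abstract describes.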
Near-Optimal Sequence Alignment
Curr. Opin. Struct. Biol., 1996
Cited by 16 (0 self)
Introduction. Aligning two short, similar protein sequences by eye is an easy task. Teaching a machine the same skill turns out to be astonishingly difficult. Starting with the famous paper by Needleman and Wunsch [1], people have used dynamic programming algorithms to maximize the score associated with an alignment. While this algorithm is sometimes perceived as complicated, the definition of an optimal sequence alignment is not. It is, in fact, simple. For proteins, pairs of residues are attributed a similarity score. With the aid of gaps (modeling insertions and deletions), the sum of these scores has to be maximized while the number and length of the gaps have to remain within biologically reasonable limits. This is achieved by subtracting penalties for gaps from the score for the matches. In a sense, it is astonishing how much biological information can be obtained using such a simple approach. A high alignment score between two proteins usually implies that sequences are homo
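The definition given above (sum of pairwise similarity scores minus gap penalties, maximized by dynamic programming) can be sketched as follows. This is a minimal global-alignment score in the style of Needleman and Wunsch, assuming a linear gap penalty and a toy similarity function; real protein work would normally use a substitution matrix, and the names here are hypothetical.

```python
def global_alignment_score(a, b, similarity, gap_penalty=2):
    """Maximum of (sum of similarity scores) minus (gap_penalty per gap symbol)."""
    # Aligning an empty prefix of `a` to b[:j] costs j gap symbols.
    prev = [-j * gap_penalty for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [-i * gap_penalty]
        for j, y in enumerate(b, start=1):
            curr.append(max(
                prev[j - 1] + similarity(x, y),  # pair x with y
                prev[j] - gap_penalty,           # x opposite a gap
                curr[j - 1] - gap_penalty,       # y opposite a gap
            ))
        prev = curr
    return prev[-1]


def toy_similarity(x, y):
    """Placeholder scoring: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


print(global_alignment_score("GATTACA", "GATACA", toy_similarity))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would need a traceback over the full table.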
Linear-Space Algorithms that Build Local Alignments from Fragments
Algorithmica, 1995
Cited by 12 (7 self)
This paper presents practical algorithms for building an alignment of two long sequences from a collection of "alignment fragments," such as all occurrences of identical 5-tuples in each of two DNA sequences. We first combine a time-efficient algorithm developed by Galil and coworkers with a space-saving approach of Hirschberg to obtain a local alignment algorithm that uses O((M + N + F log N) log M) time and O(M + N) space to align sequences of lengths M and N from a pool of F alignment fragments. Ideas of Huang and Miller are then employed to develop a time- and space-efficient algorithm that computes n best nonintersecting alignments for any n > 1. An example illustrates the utility of these methods.
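To make the notion of an "alignment fragment" concrete, the sketch below collects all pairs of positions at which two sequences share an identical k-tuple (k = 5 in the abstract's example). It covers only this input-gathering step; the fragment-chaining and linear-space techniques of the paper are not reproduced here, and the function name is an assumption.

```python
from collections import defaultdict


def ktuple_fragments(a, b, k=5):
    """Return sorted (i, j) pairs with a[i:i+k] == b[j:j+k]."""
    # Index every k-tuple of `a` by its start position.
    positions = defaultdict(list)
    for i in range(len(a) - k + 1):
        positions[a[i:i + k]].append(i)
    # Look up every k-tuple of `b` in that index.
    fragments = []
    for j in range(len(b) - k + 1):
        for i in positions.get(b[j:j + k], ()):
            fragments.append((i, j))
    return sorted(fragments)


print(ktuple_fragments("ACGTACGTTT", "TTACGTAC"))
```

The number of pairs returned corresponds to F, the size of the fragment pool in the stated time bound.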
Efficient Algorithms for Sequence Analysis with Concave and Convex Gap Costs
1989
Cited by 4 (0 self)
David A. Eppstein. We describe algorithms for two problems in sequence analysis: sequence alignment with gaps (multiple consecutive insertions and deletions treated as a unit) and RNA secondary structure with single loops only. We make the assumption that the gap cost or loop cost is a convex or concave function of the length of the gap or loop, and show how this assumption may be used to develop efficient algorithms for these problems. We show how the restriction to convex or concave functions may be relaxed, and give algorithms for solving the problems when the cost functions are neither convex nor concave, but can be split into a small number of convex or concave functions. Finally we point out some sparsity in the structure of our sequence analysis problems, and describe how we may take advantage of that sparsity to further speed up our algorithms.
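For orientation, the sketch below shows the straightforward dynamic program for alignment in which a whole gap is charged as one unit through a cost function w(length). It is the slow baseline (roughly cubic time) that work of this kind aims to accelerate, not the algorithm of the thesis. The function names, the mismatch cost, and the sample cost w(k) = 2 + sqrt(k) are assumptions chosen only for illustration.

```python
import math


def align_cost_general_gaps(a, b, w, mismatch=1.0):
    """Minimum alignment cost when a gap of length k costs w(k) as a unit."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = w(i)  # a[:i] aligned against one gap
    for j in range(1, m + 1):
        D[0][j] = w(j)  # b[:j] aligned against one gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = D[i - 1][j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            for k in range(j):  # gap covering b[k:j]
                best = min(best, D[i][k] + w(j - k))
            for k in range(i):  # gap covering a[k:i]
                best = min(best, D[k][j] + w(i - k))
            D[i][j] = best
    return D[n][m]


def sample_gap_cost(k):
    """Illustrative cost: one long gap is cheaper than several short ones."""
    return 2.0 + math.sqrt(k)


print(align_cost_general_gaps("ACGGGT", "ACT", sample_gap_cost))
```

The two inner loops over k are where structural assumptions on w can be exploited to avoid scanning every earlier cell.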
Simian hepatitis A virus (HAV) strain AGM27: comparison of genome structure and growth in cell culture with other HAV strains
Cited by 4 (0 self)
Fragments of cDNA representing greater than 99% of the entire genome of wild-type hepatitis A virus (HAV) strain AGM27, isolated from an African green monkey, were obtained by the polymerase chain reaction and sequenced. Comparison with other HAV isolates revealed differences in the predicted amino acid sequence in functionally critical parts of the genome. Comparison of the biological properties of AGM27 with those of human wild-type and cell culture-adapted HM175 strains revealed that AGM27 grew in cell culture significantly better than did wild-type HM175, but not as well as cell culture-adapted HM175. AGM27 and cell culture-adapted HM175 were distinguishable by their differential growth in CV-1, FRhK-4 and primary AGMK cells.
Learning Significant Alignments: An Alternative to Normalized Local Alignment
Cited by 1 (0 self)
We describe a supervised learning approach to resolve difficulties in finding biologically significant local alignments. It was noticed that the O(n²) algorithm by Smith-Waterman, the prevalent tool for computing local sequence alignment, often outputs long, meaningless alignments while ignoring shorter, biologically significant ones. Arslan et al. proposed an O(n² log n) algorithm which outputs a normalized local alignment that maximizes the degree of similarity rather than the total similarity score. Given a properly selected normalization parameter, the algorithm can discover significant alignments that would be missed by the Smith-Waterman algorithm. Unfortunately, determining a proper normalization parameter requires repeated executions with different parameter values and expert feedback to determine the usefulness of the alignments. We propose a learning approach that uses existing biologically significant alignments to learn parameters for intelligently processing suboptimal Smith-Waterman alignments. Our algorithm runs in O(n²) time and can discover biologically significant alignments without requiring expert feedback to produce meaningful results.
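For readers unfamiliar with the baseline discussed here, the following is a minimal sketch of the quadratic-time Smith-Waterman local alignment score. It maximizes the total similarity score, which is exactly the behaviour the abstract says can favour long, weak alignments. The scoring values are assumptions, and neither the normalized variant of Arslan et al. nor the proposed learning approach is implemented.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Highest total score of any local alignment between `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            cell = max(
                0,  # start a fresh local alignment here
                prev[j - 1] + (match if x == y else mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("XXXACGTYYY", "ZZACGTZZ"))
```

Because the objective is a raw sum, a long alignment with many weak matches can outscore a short, dense one; dividing by alignment length is the intuition behind normalization.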
Efficient Algorithms for Sequence Analysis
Proc. Second Workshop on Sequences: Combinatorics, Compression, Security, 1991
Cited by 1 (0 self)
We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis. 1. Introduction. In this paper we consider algorithms for two problems in sequence analysis. The first problem is sequence alignment, and the second is the prediction of RNA structure. Although the two problems seem quite different from each other, their solutions share a common structure, which can be expressed as a system of dynamic programming recurrence equations. These equations also can be applied to other problems, including text formatting and data storage optimization. We use a number of well motivated assumptions about the problems in order to provide efficient algorithms. The primary assumption is that of concavity or convexity. The recurrence relations for bo...
Discrete Pattern Matching Over Sequences And Interval Sets
1993
Cited by 1 (0 self)
Finding matches, both exact and approximate, between a sequence of symbols A and a pattern P has long been an active area of research in algorithm design. Some of the more well-known byproducts of that research are the diff program and the grep family of programs. These problems form a subdomain of a larger area of problems called discrete pattern matching, which has been developed recently to characterise the wide range of pattern matching problems. This dissertation presents new algorithms for discrete pattern matching over sequences and develops a new subdomain of problems called discrete pattern matching over interval sets. The problems and algorithms presented here are characterised by pattern matching over interval sets. The problems and al
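As a small example of approximate matching between a sequence A and a pattern P, the sketch below reports every position in the text at which some substring ending there is within a given edit distance of the pattern. It uses the classic column-by-column dynamic program and is a generic illustration, not one of the dissertation's algorithms; the function name and the unit edit costs are assumptions.

```python
def approximate_matches(text, pattern, max_errors):
    """End positions (0-based) where `pattern` matches with <= max_errors edits."""
    m = len(pattern)
    # Before any text is read, matching pattern[:i] costs i insertions.
    col = list(range(m + 1))
    hits = []
    for pos, ch in enumerate(text):
        new = [0]  # a match may begin at any text position for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new.append(min(col[i - 1] + cost,  # match or substitution
                           col[i] + 1,         # extra text character
                           new[i - 1] + 1))    # missing pattern character
        col = new
        if col[m] <= max_errors:
            hits.append(pos)
    return hits


print(approximate_matches("the diff program", "diff", 0))
print(approximate_matches("abcdefg", "cxe", 1))
```

Setting max_errors to 0 reduces this to exact substring search.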
Biological Sequence Comparison: An Overview of Techniques
1994
Cited by 1 (0 self)
Introduction Molecular biologists often want to compare two or more DNA or amino acid sequences to measure their similarity. Sequences which are similar may either have descended from a common evolutionary ancestor, or may have evolved to have a similar function (divergent and convergent evolution, respectively). The first kind of similarity is called "homology", although the term is (loosely) used to cover the second kind of similarity, too. One may want to determine homology in order to reconstruct evolutionary trees. Another important application of similarity testing is to determine the function of an unknown protein or piece of DNA: one can compare the query sequence with a database of sequences whose function is known, and predict the function of the query sequence based on the most similar match. (This highlights the relevance of various string searching techniques which have been developed in computer science; see for example [27].) The question of how to define simila
Fast Protein Fold Recognition via Sequence to Structure Alignment and Contact Capacity Potentials
Pac. Symp. Biocomput., 1996
We propose new empirical scoring potentials and associated alignment procedures for optimally aligning protein sequences to protein structures. The method has two main applications: first, the recognition of a plausible fold for a protein sequence of unknown structure out of a database of representative protein structures and, second, the improvement of sequence alignments by using structural information in order to find a better starting point for homology-based modelling. The empirical scoring function is derived from an analysis of a non-redundant database of known structures by converting relative frequencies into pseudo-energies using a normalization according to the inverse Boltzmann law. These so-called contact capacity potentials turn out to be discriminative enough to detect structural folds in the absence of significant sequence similarity and at the same time simple enough to allow for a very fast optimization in an alignment procedure. 1 Introduction and Problem Defi...
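The conversion of relative frequencies into pseudo-energies by the inverse Boltzmann law can be sketched in its generic textbook form, E = -kT ln(f_observed / f_reference). The exact normalization and reference state used for the contact capacity potentials are not given in this abstract, so the function name, the example categories, and kT = 1 are assumptions.

```python
import math


def pseudo_energies(observed_counts, reference_freqs, kT=1.0):
    """Map observed counts to pseudo-energies relative to reference frequencies."""
    total = sum(observed_counts.values())
    return {
        key: -kT * math.log((count / total) / reference_freqs[key])
        for key, count in observed_counts.items()
        if count > 0  # the logarithm is undefined for unobserved states
    }


# Hypothetical counts of one residue type in two structural environments.
counts = {"buried": 30, "exposed": 10}
reference = {"buried": 0.5, "exposed": 0.5}
print(pseudo_energies(counts, reference))
```

States observed more often than the reference predicts receive negative (favourable) energies, and rarer states receive positive ones.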