Results 1 - 6 of 6
A Database Index to Large Biological Sequences
In VLDB, 2001
Cited by 50 (3 self)
We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general purpose persistent Java platform, PJama. Our implementation technique is novel, in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance for the longest human chromosome consisting of 263 million letters. We propose to use such indexes as an alternative to the current practice of serial scanning. We describe our tree creation algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented.
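The contrast between an index lookup and a serial scan can be illustrated with a minimal in-memory sketch. This is not the authors' disk-based construction on PJama: it builds a naive, uncompressed suffix trie in quadratic time and space, which only suits short strings. The function names `build_suffix_trie`, `index_search`, and `serial_scan` are hypothetical, and the pattern is assumed to be non-empty.

```python
def build_suffix_trie(text):
    """Build a naive suffix trie (assumption: small input, O(n^2) space).

    Each node records the start positions of every suffix passing through it.
    """
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def index_search(root, pattern):
    """Follow the pattern from the root; cost depends on len(pattern), not len(text)."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return sorted(node["starts"])


def serial_scan(text, pattern):
    """Baseline: test every position of the text in turn."""
    last = len(text) - len(pattern)
    return [i for i in range(last + 1) if text.startswith(pattern, i)]


dna = "GATTACAGATTA"
trie = build_suffix_trie(dna)
print(index_search(trie, "ATTA"))  # same positions as serial_scan(dna, "ATTA")
```

Both functions return the same positions; the difference is that the indexed lookup walks one path whose length is that of the pattern, while the scan touches the whole sequence on every query.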
Database indexing for large DNA and protein sequence collections
2002
Cited by 20 (3 self)
Our aim is to develop new database technologies for the approximate matching of unstructured string data using indexes. We explore the potential of the suffix tree data structure in this context. We present a new method of building suffix trees, allowing us to build trees in excess of RAM size, which has hitherto not been possible. We show that this method performs in practice as well as the O(n) method of Ukkonen [70]. Using this method we build indexes for 200Mb of protein and 300Mbp of DNA, whose disk-image exceeds the available RAM. We show experimentally that suffix trees can be effectively used in approximate string matching with biological data. For a range of query lengths and error bounds the suffix tree reduces the size of the unoptimised O(mn) dynamic programming calculation required in the evaluation of string similarity, and the gain from indexing increases with index size. In the indexes we built this reduction is significant, and less than 0.3% of the expected matrix is evaluated. We detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes in biological applications.
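For reference, the unoptimised O(mn) dynamic programming calculation mentioned above can be sketched as follows. This is the standard full-matrix baseline for approximate matching under edit distance, kept to a single column at a time. It is not the suffix-tree-pruned evaluation the abstract describes, and the function name `approximate_match_ends` and its parameters are hypothetical.

```python
def approximate_match_ends(pattern, text, max_errors):
    """Return text indices where a match with <= max_errors edits ends.

    Baseline sketch: every cell of the m-by-n matrix is evaluated,
    which is the cost an index is meant to reduce.
    """
    m = len(pattern)
    # column[i] = fewest edits aligning pattern[:i] with some substring
    # of the text that ends at the current text position.
    column = list(range(m + 1))
    ends = []
    for j, text_char in enumerate(text):
        diagonal = column[0]  # value for (i - 1, j - 1)
        column[0] = 0         # a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text_char else 1
            best = min(
                diagonal + cost,    # match or substitution
                column[i] + 1,      # extra character in the text
                column[i - 1] + 1,  # pattern character missing from the text
            )
            diagonal = column[i]
            column[i] = best
        if column[m] <= max_errors:
            ends.append(j)
    return ends


print(approximate_match_ends("ACGT", "TTACGTTT", 0))
```

With `max_errors` set to 0 the function reduces to exact matching; raising it admits end positions reachable with that many substitutions, insertions, or deletions.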
Scalable and Recoverable Implementation of Object Evolution for the PJama Platform
In Persistent Object Systems (POS, 2000
Cited by 17 (2 self)
PJama 1 is the latest version of an orthogonally persistent platform for Java. It depends on a new persistent object store, Sphere, and provides facilities for class evolution. This evolution technology supports an arbitrary set of changes to the classes, which may have arbitrarily large populations of persistent objects. We verify that the changes are safe. When there are format changes, we also convert all of the instances, while leaving their identities unchanged. We aspire to both very large persistent object stores and freedom for developers to specify arbitrary conversion methods in Java to convey information from old to new formats. Evolution operations must be safe and the evolution cost should be approximately linear in the number of objects that must be reformatted. In order that these conversion methods can be written easily, we continue to present the pre-evolution state consistently to Java executions throughout an evolution. At the completion of applying all of these tra...
Architecture of the PEVM: a high-performance orthogonally persistent Java virtual machine
In Proc. of the 9th Workshop on Persistent Object Systems (POS9), 2000
Cited by 10 (0 self)
This paper outlines the design and implementation of the PEVM, a new scalable, high-performance implementation of orthogonal persistence for the Java platform (OPJ). The PEVM is based on the Sun Microsystems Laboratories Virtual Machine for Research, which features an optimizing Just-In-Time compiler, exact generational garbage collection, and fast thread synchronization. The PEVM also uses a new, scalable persistent object store designed to manage 80GB of objects. It is approximately ten times faster than previous OPJ implementations and can run significantly larger programs. Despite its greater speed and scalability, the PEVM's implementation is much simpler (e.g., just 43% of the VM source patches needed by our previous OPJ implementation). This is largely due to the pointer swizzling strategy we chose, the ResearchVM's exact memory management, and simple but effective mechanisms. For example, we implement some key data structures in the Java programming language since this automatically makes them persistent.
Integrating programming languages and databases: What is the problem?
In ODBMS.ORG, Expert Article, 2005
Cited by 8 (0 self)
The problem of integrating databases and programming languages has been open for nearly 45 years. During this time much progress has been made, in exploring specialized database programming languages, orthogonal persistence, object-oriented databases, transaction models, data access libraries, embedded queries, and object-relational mapping. While new solutions are proposed every year, none has yet proven fully satisfactory. One explanation for this situation is that the problem itself is not sufficiently well defined, so that partial solutions continue to be proposed and evaluated based upon incomplete metrics, making directed progress difficult. This paper is an attempt to clarify the problem, rather than propose a new solution. We review issues that arise on the boundary between programming languages and databases, including typing, optimization, and reuse. We develop specific criteria for evaluating solutions and apply these to the solution approaches mentioned above. The analysis shows that progress has been made, yet the key problem of meeting all the criteria simultaneously remains open. Updated 10/12/2005.
"So the solution’s easy enough; each of us stays put in his or her corner and takes no notice of the others. You here, you here, and I there. Like soldiers at our posts. Also, we mustn’t speak. Not one word. That won’t be difficult; each of us has plenty of material for self-communings." (Huis Clos (No Exit) by Jean Paul Sartre)
Comparative study of persistence mechanisms for the Java platform
2004
Cited by 6 (0 self)
Access to persistent data is a requirement for the majority of computer applications. The Java programming language and associated run-time environment provide excellent features for the construction of reliable and robust applications, but currently these do not extend to the domain of persistent data. Many mechanisms for managing persistent data have been proposed, some of which are now included in the standard Java platforms, e.g., J2SE™ and J2EE™. This paper defines a set of criteria by which persistence mechanisms may be compared and then applies the criteria to a representative set of widely used mechanisms. The criteria are evaluated in the context of a widely-known benchmark, which was ported to each of the mechanisms, and include performance and scalability results.

