Indexing and Retrieval for Genomic Databases
- IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 40 (6 self)
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences only and to reduce the costs of the alignments that are attempted. We present an index-based approach for both selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach results in significant savings in computationally intensive local alignments, and that index-based searching is as accurate as existing exhaustive search schemes.
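The abstract does not spell out the index structure. Purely to illustrate index-based candidate selection, the sketch below assumes a simple inverted index from fixed-length substrings (k-mers) to the sequences that contain them, and ranks database sequences by how many distinct query k-mers they share; only sequences above a threshold would then go on to full local alignment. The function names `build_kmer_index` and `select_candidates`, the parameters `k` and `min_shared`, and the toy database are hypothetical, and this is not the authors' indexing scheme.

```python
from collections import Counter, defaultdict

def build_kmer_index(db, k):
    """Map each k-mer to the set of sequence ids whose sequence contains it."""
    index = defaultdict(set)
    for seq_id, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index

def select_candidates(query, index, k, min_shared):
    """Return ids sharing >= min_shared distinct k-mers with query, most shared first."""
    counts = Counter()
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return [sid for sid, c in counts.most_common() if c >= min_shared]

# Hypothetical toy database; only the selected ids would be aligned in full.
db = {"s1": "ACGTACGTGA", "s2": "TTTTGGGCCC", "s3": "ACGTTTACGA"}
index = build_kmer_index(db, 3)
print(select_candidates("ACGTACG", index, 3, 2))  # ['s1', 's3']
```

Here `s1` shares all four distinct query 3-mers and `s3` shares three, while `s2` shares none and is never aligned.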
A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites
- Algorithms in Bioinformatics: Proc. First International Workshop, number 2149 in LNCS, 2001
Cited by 36 (6 self)
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods that elucidate the key components playing a role in these mechanisms.
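The abstract stops before describing the method. The title refers to a hyper-geometric approach; the sketch below shows only the standard hypergeometric upper-tail probability that such an approach could use to score how surprising it is for a candidate site to appear in many promoters of a co-regulated gene set. The reading of `N`, `K`, `n`, `k` as gene counts and the name `hypergeom_pvalue` are assumptions for illustration, not details taken from the paper.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed reading: N genes in total, K of them carry the candidate site,
    n genes are in the co-regulated set, and k of those carry the site.
    """
    total = comb(N, n)
    # Sum the probabilities of seeing k, k+1, ... site-carrying genes in the set.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

print(hypergeom_pvalue(N=20, K=5, n=4, k=3))  # 155/4845, about 0.032
```

A small tail probability suggests the site is over-represented in the co-regulated set relative to chance.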
Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification
- In Proceedings of RECOMB 2000
Inferring regulatory elements from a whole genome. An analysis of Helicobacter pylori sigma(80) family of promoter signals
- J. Mol. Biol., 2000
Cited by 21 (3 self)
Voting Algorithms for Discovering Long Motifs
Cited by 19 (6 self)
Pevzner and Sze [14] introduced the Planted (l,d)-Motif Problem: find similar patterns (motifs) in sequences that represent the promoter regions of co-regulated genes, where l is the length of the motif and d is the maximum Hamming distance between the motif and each of its occurrences. Many algorithms have been developed to solve this problem. However, they either have long running times or do not guarantee that the motif will be found. In this paper, we introduce new algorithms to solve the motif problem. Our algorithms can find motifs in reasonable time not only for the challenging (9,2), (11,3), and (15,5)-motif problems but also for even longer motifs, such as (20,7), (30,11), and (40,15), which have never been seriously attempted by other researchers because of their heavy time and space requirements.
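As a minimal sketch of the basic voting idea, assuming occurrences lie within Hamming distance at most d of the motif: every sequence casts one vote for each l-mer within distance d of at least one of its own l-mers, and l-mers voted for by every sequence are motif candidates. The names `neighbors` and `vote` are hypothetical, and the neighbourhood is enumerated by brute force, so this is only practical for small l and d; it does not reproduce the paper's algorithms for long motifs.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"

def neighbors(lmer, d):
    """All strings over ALPHABET within Hamming distance d of lmer."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(lmer)), r):
            for subs in product(ALPHABET, repeat=r):
                s = list(lmer)
                for pos, ch in zip(positions, subs):
                    s[pos] = ch
                result.add("".join(s))
    return result

def vote(sequences, l, d):
    """Return the l-mers that receive one vote from every sequence."""
    votes = Counter()
    for seq in sequences:
        voted = set()  # a sequence votes at most once for any candidate
        for i in range(len(seq) - l + 1):
            voted |= neighbors(seq[i:i + l], d)
        votes.update(voted)
    return sorted(m for m, v in votes.items() if v == len(sequences))

print(vote(["ACGT", "ACGA", "TCGT"], l=4, d=1))  # ['ACGT']
```

The per-sequence set is what distinguishes voting from plain counting: a sequence containing several near-copies of a candidate still contributes only one vote.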
On finding novel gapped motifs in DNA sequences
- In RECOMB98: Proceedings of the Second Annual International Conference on Computational Molecular Biology, 1998
Cited by 16 (4 self)
In this paper I will describe the concept and implementation of an algorithm for finding motifs in DNA sequences, on which Martin Tompa and I have been working for approximately a year (see also [5]). This section briefly describes the project concept as it existed when I began work on the project. The remaining sections highlight our progress on the problem during the time I was working on it: Section 2 discusses some improvements made to the algorithm, Section 3 shows some results, and Section 4 describes some details of the implementation.
COPIA: A new software for finding consensus patterns in unaligned protein sequences
- University of Waterloo, 2001
Cited by 6 (1 self)
I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.
Pattern Discovery from Biosequences
- 2002
Cited by 6 (0 self)
In this thesis we have developed novel methods for analyzing biological data: primary DNA and protein sequences, microarray-based gene expression data, and other functional genomics data. The main contribution is the development of the pattern discovery algorithm SPEXS, accompanied by several practical applications to real biological problems. To perform these studies, which integrate different types of biological data, we have developed Expression Profiler (http://ep.ebi.ac.uk/), a comprehensive web-based environment for biological data analysis...
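The abstract names SPEXS without describing it. As a generic illustration of exhaustive pattern discovery only, the sketch below grows plain substring patterns one symbol at a time, keeps each pattern's occurrence list so extensions never rescan the input, and prunes any pattern supported by fewer than `min_support` sequences. The function name and parameters are hypothetical, and SPEXS itself is not reproduced here.

```python
from collections import defaultdict

def frequent_patterns(sequences, min_support, max_len):
    """Return {pattern: support} for substrings of length <= max_len
    that occur in at least min_support of the sequences."""
    results = {}
    # An occurrence is (sequence index, start position) of the pattern.
    start = [(s, i) for s, seq in enumerate(sequences) for i in range(len(seq))]
    stack = [("", start)]
    while stack:
        pattern, occs = stack.pop()
        if len(pattern) == max_len:
            continue
        # Group occurrences by the symbol that follows the current pattern.
        extensions = defaultdict(list)
        for s, i in occs:
            pos = i + len(pattern)
            if pos < len(sequences[s]):
                extensions[sequences[s][pos]].append((s, i))
        for symbol, ext_occs in extensions.items():
            support = len({s for s, _ in ext_occs})
            # Support never increases when a pattern is extended, so an
            # infrequent pattern is pruned together with all its extensions.
            if support >= min_support:
                results[pattern + symbol] = support
                stack.append((pattern + symbol, ext_occs))
    return results

print(sorted(frequent_patterns(["ACGT", "CGTA", "GTAC"], min_support=3, max_len=3)))
# ['A', 'C', 'G', 'GT', 'T']
```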
An efficient algorithm for the extended (l,d)-motif problem with unknown number of binding sites
- Proc. BIBE, 2005
Cited by 4 (2 self)
Finding common patterns, or motifs, in a set of DNA sequences is an important problem in molecular biology. Most motif-discovery algorithms and tools require the length of the motif as input. Motivated by the fact that the motif’s length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), in which the motif’s length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g., over 3 months to discover a length-14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Fewer input parameters not only reduce the burden on the user but may also give more realistic and robust results, since restrictions on length or on the number of binding sites make little sense when the best motif may be neither the longest nor the one with the most binding sites. Second, we develop an efficient algorithm to solve the redefined problem. The algorithm also solves the original EMP quickly and without any loss of accuracy, making EMP practical.
On Motifs in Biological Sequences
Cited by 1 (0 self)
Conserved patterns of any kind are of great interest in biology, as they are likely to represent objects upon which strong constraints are potentially acting and which may therefore perform a biological function. Among the objects that may model biological entities, we consider sequences only, whether DNA, RNA, or proteins. Two basic questions may be addressed when searching for known or predicted patterns in biological sequences. The first concerns position: where are these patterns located (pattern localization prediction)? The second concerns identifying and then, possibly, modelling the patterns ab initio: what would be a consensus motif for them (pattern consensus prediction)? What is most interesting to discover is often not which known pattern has matches in one or more sequences, but which patterns, unknown at the start, appear conserved and have therefore...
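For the first question (pattern localization), a minimal sketch: scan a sequence for every position where a known pattern matches with at most a given number of substitutions. This assumes Hamming distance only (no insertions or deletions) and a naive window-by-window scan; the function name `locate` and the toy sequence are hypothetical.

```python
def locate(pattern, sequence, max_mismatches=0):
    """Start positions where pattern matches with at most max_mismatches substitutions."""
    m = len(pattern)
    hits = []
    for i in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, sequence[i:i + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits

seq = "GCTATAATTTAAGG"
print(locate("TATA", seq))     # [2]
print(locate("TATA", seq, 1))  # [2, 7]
```

Allowing one mismatch adds the window `TTTA` at position 7 to the exact hit at position 2.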

