Correction of sequence-based artifacts in serial analysis of gene expression
 Bioinformatics
, 2004
Cited by 17 (0 self)
Motivation: Serial Analysis of Gene Expression (SAGE) is a powerful technology for measuring global gene expression through rapid generation of large numbers of transcript tags. Beyond their intrinsic value in differential gene expression analysis, SAGE tag collections afford abundant information on the size and shape of the sample transcriptome and can accelerate novel gene discovery. These latter SAGE applications are facilitated by the enhanced method of Long SAGE. A characteristic of sequencing-based methods, such as SAGE and Long SAGE, is the unavoidable occurrence of artifact sequences resulting from sequencing errors. By virtue of their low, random incidence, such tag errors have minimal impact on differential expression analysis. However, to fully exploit the value of large SAGE tag datasets, it is desirable to account for and correct tag artifacts. Results: We present estimates for occurrences of tag errors, and an efficient error correction algorithm. Error rate estimates are based on a stochastic model that includes the polymerase chain reaction (PCR) and sequencing error contributions. The correction algorithm, SAGEScreen, is a multi-step procedure that addresses ditag processing, estimation of empirical error rates from highly abundant tags, grouping of similar-sequence tags and statistical testing of observed counts. We apply SAGEScreen to Long SAGE libraries and compare error rates for several processing scenarios. Results with simulated tag collections indicate that SAGEScreen corrects 78% of recoverable tag errors and reduces the occurrences of singleton tags.
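To make the last two steps named in the abstract concrete (grouping of similar-sequence tags and statistical testing of observed counts), here is a minimal Python sketch. It is not the SAGEScreen implementation: the one-mismatch grouping rule, the binomial test, the function names (`hamming1`, `binom_sf`, `correct_tags`) and the default `err_rate` and `alpha` values are all assumptions chosen for illustration.

```python
from math import exp, lgamma, log


def hamming1(a, b):
    """True if two equal-length tags differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, via log-space terms."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return max(0.0, 1.0 - sum(exp(log_pmf(i)) for i in range(k)))


def correct_tags(counts, err_rate=0.01, alpha=0.05):
    """Fold rare tags into a more abundant one-mismatch neighbour.

    Assumption (illustrative only): a rare tag is treated as an artifact of
    its most abundant neighbour when its count is NOT significantly larger
    than what the assumed per-tag error rate would produce.
    """
    corrected = dict(counts)
    # Visit the rarest tags first, so any parent is still present when needed.
    for tag in sorted(counts, key=counts.get):
        k = counts[tag]
        parents = [t for t in counts if counts[t] > k and hamming1(t, tag)]
        if not parents:
            continue
        parent = max(parents, key=counts.get)
        n = counts[parent] + k
        # A large p-value means the count is consistent with pure error.
        if binom_sf(k, n, err_rate) > alpha:
            corrected[parent] += corrected.pop(tag)
    return corrected
```

With counts of 1000 for a tag and 2 for a one-mismatch neighbour, the neighbour is merged into the abundant tag, whereas a neighbour with count 200 is kept as a distinct transcript under these assumed settings.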
Eternal branching Markov processes: averaging properties and PCR applications
 UNIVERSITÉ CLAUDE BERNARD LYON 1 EXLAPCS – DOMAINE DE GERLAND 50, AVENUE TONYGARNIER 69366 LYON CEDEX 07 (FRANCE) DIDIER.PIAU@UNIVLYON1.FR HTTP://LAPCS.UNIVLYON1.FR
, 2001
Cited by 12 (5 self)
The eternal branching Markov process (eBMP) is a modification of the usual branching model in which each particle of generation n is counted, in addition to its offspring, as a member of generation n + 1, its state being unchanged. When the number of offspring is Bernoulli, eBMP accounts, for instance, for the variability of the biological sequences that are produced by polymerase chain reactions (PCR). This variability is due to the mutations and to the incomplete replications that affect the PCR. Estimators of the PCR mutation rate and efficiency have been proposed that are based in particular on the empirical law of the mutations of a sequence. Unfortunately, this empirical law is not analytically tractable. However, its infinite-population limit is easily characterized in the two following, biologically relevant, cases: the Markovian kernel describes a homogeneous random walk, either on the integers or on some finite Cartesian product of a finite set A. In the PCR context, this corresponds to infinite or finite targets, respectively. In this paper, we provide bounds on the discrepancy between the empirical law and its infinite-population limit in these two cases. As a consequence, eBMP exhibits a strong averaging effect, even for surprisingly small starting populations. The bounds are explicit functions of the offspring law, the Markovian kernel, the number of steps n, the size of the initial population and, in the finite-target case, the size of the target. They concern every moment and, what might be less expected, the histogram itself. In the finite-target case, some of the bounds are restricted to mutation rates per site and per cycle below 1 - 1/N, where N > 2 is the size of A. We use precise estimates of the harmonic means of general classical branching processes, whose proofs are included in an appendix.
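The eBMP mechanism described in this abstract can be illustrated with a short simulation. The sketch below assumes the infinite-target case with a particularly simple kernel: each new copy carries its parent's mutation count plus at most one new mutation. The function name `simulate_ebmp` and its parameters (`efficiency`, `mutation_prob`) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def simulate_ebmp(n_cycles, efficiency, mutation_prob, initial_size=1, seed=0):
    """Simulate an eBMP with Bernoulli offspring (illustrative assumptions).

    State of a particle = number of mutations accumulated so far.
    Each cycle, every particle survives unchanged ("eternal") and, with
    probability `efficiency`, produces one copy whose state takes one step
    of a homogeneous random walk on the integers (+1 with `mutation_prob`).
    Returns the histogram of mutation counts after `n_cycles`.
    """
    rng = random.Random(seed)
    population = [0] * initial_size
    for _ in range(n_cycles):
        offspring = []
        for state in population:
            if rng.random() < efficiency:          # replication succeeded
                step = 1 if rng.random() < mutation_prob else 0
                offspring.append(state + step)
        population.extend(offspring)               # parents are kept as-is
    return Counter(population)
```

Dividing the returned counts by the population size gives the empirical law of mutations for one run; repeating the run with different seeds and small `initial_size` is one way to observe how quickly the histogram stabilizes.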
A Stochastic model and simulation algorithm for polymerase chain reaction (PCR) systems
 Proc. of Workshop on Genomics Signal Processing and Statistics
, 2004
Cited by 7 (2 self)
A new stochastic approach to modeling Polymerase Chain Reaction (PCR) kinetics is presented, in which primer and template DNA sequences, enzyme concentration, temperature profile, PCR duration, hybridization kinetics and enzymatic rates are all incorporated. By studying the underlying biochemical processes of PCR, we show that under certain conditions, the extension length of DNA strands during a given time interval can be modeled as a Poisson random variable. We further use this fact to derive the distribution of the number of replicated strands, along with the probability of DNA replication (i.e. efficiency) in each PCR cycle. This in turn enables one to follow the stochastic behavior of the biochemical process as the PCR cycles progress. A simulation algorithm with preliminary results is also included, which demonstrates the feasibility and applicability of this modeling technique for a wide range of PCR applications.
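One way to read the link between a Poisson extension length and per-cycle efficiency is sketched below. It rests on an assumption that is not stated in the abstract: a strand counts as replicated when its extension during the extension step reaches the full template length. The names `poisson_sf` and `cycle_efficiency` and the constant-rate parameterization are hypothetical.

```python
from math import exp, lgamma, log


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, using log-space terms."""
    if k <= 0:
        return 1.0
    cdf = sum(exp(i * log(lam) - lam - lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def cycle_efficiency(template_length, rate_per_s, extension_time_s):
    """Probability that one strand is fully extended in one cycle.

    Assumption (illustrative): extension length ~ Poisson(rate * time),
    and replication succeeds when that length covers the whole template.
    """
    return poisson_sf(template_length, rate_per_s * extension_time_s)
```

Under the same assumption, the number of replicated strands among N independent templates in a cycle would be Binomial(N, efficiency), which is how such a per-strand probability could feed a cycle-by-cycle simulation.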
A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries
, 2005
On the Harmonic Means of Branching Processes
Cited by 1 (1 self)
The random variable X plays the role of 1 + Z in the eBMP context. Thus, m corresponds to 1 + = (1 p) 1 , and $p_1$ in this note is $p_0$ in Piau (2001b), and so on. 0.2 Results. Joe (1993) mentions, as an unpublished result, the fact that, for any $a > 0$, $E(S_n^{-a}) \le c\,q^n$ for any $q > \max\{p_1, m^{-a}\}$, where c is independent of n. In the same vein, Athreya (1994) shows that, if $p_1 m^a > 1$, then $p_1^{-n} E(S_n^{-a} \mid S_0 = 1)$ is a nondecreasing sequence, ...
Probabilistic Methods in Directed Evolution: Library Size, Mutation Rate, and Diversity
Directed evolution has emerged as an important tool for engineering proteins with improved or novel properties. Because of their inherent reliance on randomness, directed evolution protocols are amenable to probabilistic modeling and analysis. This chapter summarizes and reviews in a nonmathematical way some of the probabilistic works related to directed evolution, with particular focus on three of the most widely used methods: saturation mutagenesis, error-prone PCR, and in vitro recombination. The ultimate aim is to provide the reader with practical information to guide the planning and design of directed evolution studies. Importantly, the applications and locations of freely available computational resources to assist with this process are described in detail.
unknown title
, 2005
A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries
Applied Probability Trust (6 June 2008) ASYMPTOTICS OF POSTERIORS FOR BINARY BRANCHING PROCESSES
We compute the posterior distributions of the initial population and parameter of binary branching processes, in the limit of a large number of generations. We compare this Bayesian procedure with a more naïve one, based on hitting times of some random walks. In both cases, central limit theorems are available, with explicit variances.
Applied Probability Trust (11 March 2008) ASYMPTOTICS OF POSTERIORS FOR BINARY BRANCHING PROCESSES
We compute the posterior distributions of the initial population and parameter of binary branching processes, in the limit of a large number of generations. We compare this Bayesian procedure with a more naïve one, based on hitting times of some random walks. In both cases, central limit theorems are available, with explicit variances.
PROCESSES
, 806
We compute the posterior distributions of the initial population and parameter of binary branching processes, in the limit of a large number of generations. We compare this Bayesian procedure with a more naïve one, based on hitting times of some random walks. In both cases, central limit theorems are available, with explicit variances.