Results 1–10 of 44
From Boolean to Probabilistic Boolean Networks as Models of Genetic Regulatory Networks
Proc. IEEE, 2002
Cited by 83 (16 self)
Abstract:
Mathematical and computational modeling of genetic regulatory networks promises to uncover the fundamental principles governing biological systems in an integrative and holistic manner. It also paves the way toward the development of systematic approaches for effective therapeutic intervention in disease. The central theme in this paper is the Boolean formalism as a building block for modeling complex, large-scale, and dynamical networks of genetic interactions. We discuss the goals of modeling genetic networks as well as the data requirements. The Boolean formalism is justified from several points of view. We then introduce Boolean networks and discuss their relationships to nonlinear digital filters. The role of Boolean networks in understanding cell differentiation and cellular functional states is discussed. The inference of Boolean networks from real gene expression data is considered from the viewpoints of computational learning theory and nonlinear signal processing, touching on computational complexity of learning and robustness. Then, a discussion of the need to handle uncertainty in a probabilistic framework is presented, leading to an introduction of probabilistic Boolean networks and their relationships to Markov chains. Methods for quantifying the influence of genes on other genes are presented. The general question of the potential effect of individual genes on the global dynamical network behavior is considered using stochastic perturbation analysis. This discussion then leads into the problem of target identification for therapeutic intervention via the development of several computational tools based on first-passage times in Markov chains. Examples from biology are presented throughout the paper.
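As a rough illustration of the Boolean formalism discussed in this abstract, the sketch below simulates a tiny three-gene Boolean network and finds its attractor states by exhaustive forward simulation. The update rules are invented for illustration, not taken from the paper.

```python
from itertools import product

# Hypothetical 3-gene Boolean network; the update rules are illustrative only.
def step(state):
    a, b, c = state
    return (b and c,   # gene A turns on when both B and C are on
            not a,     # gene B is the negation of A
            a or b)    # gene C turns on when A or B is on

# Follow every start state until a state repeats; the tail of the trajectory
# from the first repeated state onward is an attractor cycle.
attractors = set()
for start in product([False, True], repeat=3):
    seen, s = [], start
    while s not in seen:
        seen.append(s)
        s = step(s)
    attractors.update(seen[seen.index(s):])

print(sorted(attractors))
```

For 3 genes the state space has only 2^3 = 8 states, so exhaustive simulation is trivial; the paper's point is that real networks need the probabilistic and statistical machinery it develops.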
On learning gene regulatory networks under the Boolean network model
Machine Learning, 2003
Cited by 22 (2 self)
Abstract:
Boolean networks are a popular model class for capturing the interactions of genes and the global dynamical behavior of genetic regulatory networks. Recently, a significant amount of attention has been focused on the inference or identification of the model structure from gene expression data. We consider the Consistency as well as Best-Fit Extension problems in the context of inferring the networks from data. The latter approach is especially useful when gene expression measurements are noisy and may lead to inconsistent observations. We propose simple, efficient algorithms that answer the Consistency Problem and find one or all consistent Boolean networks relative to the given examples. The same method is extended to learning gene regulatory networks under the Best-Fit Extension paradigm. We also introduce a simple and fast way of finding all Boolean networks having limited error size in the Best-Fit Extension Problem setting. We apply the inference methods to a real gene expression data set and present the results for a selected set of genes.
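A minimal sketch of the Best-Fit idea for a single target gene: given noisy observations of two candidate regulators, pick for each input pattern the majority output; the number of contradicted examples is the error size. The data below are invented, and this is only the one-gene core of the approach, not the paper's full algorithm.

```python
from collections import Counter

# Hypothetical noisy examples: (regulator inputs) -> observed output of the
# target gene. Contradictory observations appear for (0,1) and (1,1).
examples = [((0, 0), 0), ((0, 0), 0), ((0, 1), 1), ((0, 1), 0),
            ((1, 0), 1), ((1, 1), 1), ((1, 1), 1), ((1, 1), 0)]

# Majority vote per input pattern: +1 for an observed 1, -1 for an observed 0.
votes = Counter()
for inp, out in examples:
    votes[inp] += 1 if out else -1

best_fit = {inp: int(v > 0) for inp, v in votes.items()}
error = sum(1 for inp, out in examples if best_fit[inp] != out)
print(best_fit, error)
```

Here the best-fitting function disagrees with two of the eight observations; a consistent extension (error 0) exists only when the data are contradiction-free, which is the Consistency Problem case.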
Experimental Uncertainty Estimation and Statistics for Data Having Interval Uncertainty
2007
Cited by 20 (14 self)
Abstract:
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
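One of the statistics the report discusses is directly computable: the mean of interval-valued data is itself an interval, obtained by averaging the lower and upper endpoints separately. A small sketch with invented measurements:

```python
# Interval data: each measurement is known only up to an interval [lo, hi].
data = [(2.0, 3.0), (1.5, 2.5), (4.0, 4.2), (3.1, 3.9)]

# The interval mean is exact and cheap: average each endpoint separately.
n = len(data)
mean_lo = sum(lo for lo, hi in data) / n
mean_hi = sum(hi for lo, hi in data) / n
print((mean_lo, mean_hi))
```

Other statistics are much harder: as the report notes, the computability of quantities like the variance depends on sample size and on how the intervals overlap.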
Gene Clustering Based on Clusterwide Mutual Information
2004
Cited by 13 (4 self)
Abstract:
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on minimizing mutual information among gene clusters. Simulated annealing is employed to solve the optimization problem. Bootstrap techniques are employed to obtain more accurate estimates of mutual information when the data sample size is small. Moreover, we propose to combine the mutual information criterion and traditional distance criteria, such as the Euclidean distance and the fuzzy membership metric, in designing the clustering algorithm. The performances of the new clustering methods are compared with those of some existing methods, using both synthesized data and experimental data. The clustering algorithm based on a combined metric of mutual information and fuzzy membership achieves the best performance. The supplemental material is available at www.gspsnap.tamu.edu/gspweb/zxb/glioma_zxb.
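The quantity at the heart of this approach can be estimated from counts. The sketch below computes the mutual information of two quantized (binary) expression profiles from their joint and marginal empirical distributions; the profiles are invented, and this is only the plug-in estimator, not the paper's bootstrap-corrected one.

```python
from math import log2
from collections import Counter

# Two hypothetical quantized expression profiles; MI captures general
# (not merely linear) dependence between them.
x = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]

n = len(x)
pxy = Counter(zip(x, y))          # joint counts
px, py = Counter(x), Counter(y)   # marginal counts
mi = sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
         for (a, b), c in pxy.items())
print(round(mi, 4))
```

With small samples this plug-in estimate is biased, which is exactly why the paper resorts to bootstrap techniques before feeding MI into the simulated-annealing clustering objective.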
Biclustering gene-feature matrices for statistically significant dense patterns
In: IEEE Computer Society Bioinformatics Conf., 2004
Cited by 10 (1 self)
Abstract:
Biclustering is an important problem that arises in diverse applications, including the analysis of gene expression and drug interaction data. The problem can be formalized in various ways through different interpretations of the data and associated optimization functions. We focus on the problem of finding unusually dense patterns in binary (0-1) matrices. This formulation is appropriate for analyzing experimental datasets that come not only from binary quantization of gene expression data, but also from more comprehensive datasets such as gene-feature matrices that include functions of coded proteins and motifs in the coding sequence. We formalize the notion of an “unusually” dense submatrix to evaluate the interestingness of a pattern in terms of statistical significance, based on the assumption of a uniform memoryless source. We then simplify it to assess the statistical significance of discovered patterns. Using statistical significance as an objective function, we formulate the problem as one of finding significant dense submatrices of a large sparse matrix. Adopting a simple iterative heuristic along with randomized initialization techniques, we derive fast algorithms for discovering binary biclusters. We conduct experiments on a binary gene-feature matrix and a quantized breast tumor gene expression matrix. Our experimental results show that the proposed method quickly discovers all interesting patterns in these datasets.
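A toy version of the iterative-heuristic idea: alternately prune columns and rows whose density inside the current candidate bicluster falls below a threshold. The matrix, threshold, and deterministic full-matrix start are invented for illustration (the paper uses randomized initializations and a statistical-significance objective rather than a fixed density).

```python
# Small binary gene-feature matrix (hypothetical); rows are genes, columns
# are features, with a dense 3x3 block seeded in the upper-left corner.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
]

def dense_bicluster(M, rows, cols, min_density=0.6, iters=10):
    """Alternately drop columns and rows whose density inside the current
    bicluster falls below min_density."""
    for _ in range(iters):
        cols = [c for c in cols
                if sum(M[r][c] for r in rows) >= min_density * len(rows)]
        rows = [r for r in rows
                if sum(M[r][c] for c in cols) >= min_density * len(cols)]
    return rows, cols

rows, cols = dense_bicluster(M, rows=list(range(5)), cols=list(range(6)))
print(rows, cols)
```

On this toy matrix the heuristic converges to the planted 3x3 block in one pass; on large sparse matrices the randomized restarts matter.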
Which is better for cDNA-microarray-based classification: ratios or direct intensities
2004
Cited by 8 (3 self)
Abstract:
Motivation: There are two general methods for making gene-expression microarrays: one is to hybridize a single test set of labeled targets to the probe and measure the background-subtracted intensity at each probe site; the other is to hybridize both a test and a reference set of differentially labeled targets to a single detector array and measure the ratio of the background-subtracted intensities at each probe site. Which method is better depends on the variability in the cell system and the random factors resulting from the microarray technology. It also depends on the purpose for which the microarray is being used. Classification is a fundamental application, and it is the one considered here.
Fast Algorithms for Computing Statistics under Interval Uncertainty, with Applications to Computer Science and to Electrical and Computer Engineering
2007
Cited by 6 (3 self)
Abstract:
Computing statistics is important in many engineering applications. For example, in environmental analysis, we observe a pollution level x(t) in a lake at different moments of time t, and we would like to estimate standard statistical characteristics such as the mean, variance, autocorrelation, and correlation with other measurements. For each such characteristic C, there is an expression C(x1, ..., xn) that provides an estimate of C based on the observed values x1, ..., xn. For example, a reasonable statistic for estimating the mean value of a probability distribution is the population average E(x1, ..., xn) = (1/n) · (x1 + ... + xn); a reasonable statistic for estimating the variance V is the population variance V(x1, ..., xn) = (1/n) · Σ_{i=1}^{n} (xi − E(x1, ..., xn))².
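When each xi is only known to lie in an interval, even the population variance becomes a range. Since V is convex in each xi, its maximum over the box of intervals is attained at a corner, so for small n a brute-force sketch over all 2^n endpoint combinations works (the data are invented; the paper's contribution is precisely the fast algorithms that avoid this exponential search):

```python
from itertools import product

# Hypothetical interval measurements: each x_i lies somewhere in [lo, hi].
data = [(2.0, 2.4), (2.1, 2.3), (1.9, 2.5)]

def pop_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# The population variance is convex in each coordinate, so its maximum over
# the box is attained at a corner; enumerate all 2^n corners for small n.
v_max = max(pop_variance(corner) for corner in product(*data))
print(round(v_max, 6))
```

The minimum of the variance, by contrast, can be attained in the interior of the box (overlapping intervals can even make it zero), which is one reason the two bounds call for different algorithms.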
Mushegian A: The choice of optimal distance measure in genome-wide datasets
Bioinformatics, 2005
Cited by 6 (4 self)
Abstract:
Motivation: Many types of genomic data are naturally represented as binary vectors. Numerous tasks in computational biology can be cast as analysis of relationships between these vectors, and the first step is frequently to compute their pairwise distance matrix. Many distance measures have been proposed in the literature, but there is no theory justifying the choice of distance measure. Results: We examine the approaches to measuring distances between binary vectors and study the characteristic properties of various distance measures and their performance in several tasks of genome analysis. Most distance measures between binary vectors turn out to belong to a single parametric family, namely generalized average-based distance with different exponents. We show that descriptive statistics of the distance distribution, such as skewness and kurtosis, can guide the appropriate choice of the exponent. In contrast, the more familiar distance properties, such as metricity and additivity, appear to have much less effect on the performance of distances. Availability: R code GADIST is available from the corresponding author upon request.
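To make the idea of an exponent-parameterized family concrete, the sketch below implements a Minkowski-style power-mean distance between binary vectors. This is an illustrative family only, not necessarily the paper's exact definition of generalized average-based distance; the vectors are invented.

```python
# Power-mean distance with exponent p (illustrative; for binary vectors
# |a - b| is 0 or 1, so it reduces to (mismatch fraction) ** (1/p),
# and the exponent reshapes the distance distribution).
def avg_based_distance(x, y, p):
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1 / p)

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 1, 1]
print(avg_based_distance(x, y, 1), avg_based_distance(x, y, 2))
```

Because the exponent changes the spread of pairwise distances without changing their rank order for a single pair count, it is plausible that distribution statistics such as skewness and kurtosis, rather than metric axioms, drive the choice, which is the paper's claim.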
Statistical methods for microarray assays
J. Appl. Genet., 2002
Cited by 5 (0 self)
Abstract:
The paper briefly reviews statistical methods used in the area of DNA microarray studies. All stages of the experiment are taken into account: planning, data collection, data preprocessing, analysis, and validation. Among the methods of data analysis, algorithms for estimating differential expression, multivariate approaches, clustering methods, and classification and discrimination are reviewed. The need is stressed for routine statistical data-processing protocols and for linking microarray data analysis with quantitative genetic models. Key words: data analysis, data collection, DNA microarrays, planning experiments, statistical methods, validation.
Design of probabilistic Boolean networks under the requirement of contextual data consistency
IEEE Trans. Signal Process., 2006
Cited by 5 (4 self)
Abstract:
A key issue of genomic signal processing is the design of gene regulatory networks. A probabilistic Boolean network (PBN) is composed of a family of Boolean networks; it stochastically switches between its constituent networks (contexts). For network design, connectivity and transition rules must be inferred from data via some optimization criterion. Except in rare cases, the optimal rule for a gene will not be a perfect predictor, because there will be inconsistencies in the data. It is natural to model these inconsistencies as reflecting changes in PBN contexts. If we assume the inconsistencies result from the data arising from a random function, then design involves finding the realizations of a random function, and the probability mass on those realizations, so that the resulting random function best fits the data relative to the expectation of its output, and does so using a minimal number of realizations. We propose a PBN design satisfying the biological assumption that data are consistent within a context, for which the distribution of the network agrees with the empirical distribution of the data, and such that this is accomplished with a minimal number of contexts. The design also satisfies the biological constraint that, because the network spends the great majority of its time in its attractors, all data states should be attractor states in the model. Index Terms—Data consistency, gene regulatory network, graphical model, network inference.
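To make the context-switching mechanism concrete, the sketch below simulates a minimal two-gene PBN with two contexts. Everything here is hypothetical (the rules, context probabilities, and switching probability q are invented, not the paper's designed network): at each step the network may, with probability q, reselect its acting context according to the context probability mass, then applies that context's Boolean rules.

```python
import random

# Hypothetical two-context PBN over two genes; context selection
# probabilities are 0.7 and 0.3.
context_a = lambda a, b: (b, a)            # context 1: the genes swap values
context_b = lambda a, b: (a and b, not a)  # context 2: AND / NOT rules
contexts = [(0.7, context_a), (0.3, context_b)]

def pbn_step(state, q=0.1, current=0, rng=random):
    """With probability q, reselect the context according to its probability
    mass; then apply the current context's rules to the state."""
    if rng.random() < q:
        r, acc = rng.random(), 0.0
        for i, (p, _) in enumerate(contexts):
            acc += p
            if r < acc:
                current = i
                break
    return contexts[current][1](*state), current

random.seed(0)
state, ctx = (True, False), 0
for _ in range(5):
    state, ctx = pbn_step(state, current=ctx)
print(state, ctx)
```

Because switches are rare (small q), the network spends long stretches inside one context, which is the intuition behind requiring data consistency within a context and attractor membership for observed data states.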