## Statistical model selection methods applied to biological networks (2005)


### Download Links

- [www.birc.au.dk]
- [www.math.ku.dk]
- DBLP


Venue: Transactions in Computational Systems Biology

Citations: 8 (2 self-citations)

### BibTeX

@INPROCEEDINGS{Stumpf05statisticalmodel,

author = {Michael P. H. Stumpf and Piers J. Ingram},

title = {Statistical model selection methods applied to biological networks},

booktitle = {Transactions in Computational Systems Biology},

year = {2005}

}


### Abstract

Many biological networks have been labelled scale-free because their degree distribution can be approximately described by a power-law distribution. While the degree distribution does not summarize all aspects of a network, it has often been suggested that its functional form contains important clues about the underlying evolutionary processes that have shaped the network. Generally, the functional form of the degree distribution has been determined in an ad-hoc fashion. Here we apply formal statistical model selection methods to determine which functional form best describes the degree distributions of protein interaction and metabolic networks. We interpret the degree distribution as belonging to a class of probability models and determine which of these models provides the best description of the empirical data using maximum likelihood inference, composite likelihood methods, the Akaike information criterion and goodness-of-fit tests. All of the data are used to determine the parameter that best explains the data under a given model (e.g. scale-free or random graph). As we will show, present protein interaction and metabolic network data from different organisms suggest that simple scale-free models do not provide an adequate description of real network data.
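The pipeline described in the abstract (fit each candidate degree distribution by maximum likelihood, then rank the fits with the AIC) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the ordinary rather than the composite likelihood, compares only two candidates, a power law $\Pr(k;\gamma) = k^{-\gamma}/\zeta(\gamma)$ and a Poisson model, and, as an assumption of this sketch, truncates the Poisson to $k \ge 1$ so that both models share the same support. It uses the standard definition $\mathrm{AIC} = 2(-l(\hat{\theta}) + d)$ and the usual Akaike weights $w_i \propto \exp(-\Delta_i/2)$, where $\Delta_i$ is the AIC difference to the best model. All helper names and the degree sequence are hypothetical toy values, not data from the paper.

```python
import math

def zeta(s, n_terms=1000):
    """Riemann zeta(s) for s > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    N = n_terms
    partial = math.fsum(n ** -s for n in range(1, N))
    return partial + N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12

def golden_max(f, lo, hi, tol=1e-7):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

def fit_power_law(degrees):
    """MLE of gamma for Pr(k; gamma) = k**-gamma / zeta(gamma), k >= 1."""
    n, sum_log_k = len(degrees), math.fsum(math.log(k) for k in degrees)

    def loglik(g):
        return -g * sum_log_k - n * math.log(zeta(g))

    g_hat = golden_max(loglik, 1.01, 6.0)
    return g_hat, loglik(g_hat)

def fit_truncated_poisson(degrees):
    """MLE of lambda for a Poisson distribution conditioned on k >= 1 (an assumption here)."""
    mean_k = sum(degrees) / len(degrees)
    # The MLE solves lambda / (1 - exp(-lambda)) = mean_k; the left side increases in lambda.
    lo, hi = 1e-9, mean_k
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_k:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    log_norm = math.log(-math.expm1(-lam))  # log P(k >= 1) under the untruncated Poisson
    loglik = math.fsum(k * math.log(lam) - lam - math.lgamma(k + 1) - log_norm for k in degrees)
    return lam, loglik

def aic(loglik, n_params):
    return 2 * (-loglik + n_params)

def akaike_weights(aics):
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical toy degree sequence (illustration only, not data from the paper).
degrees = [1] * 8 + [2] * 4 + [3] * 2 + [4, 5, 7, 12, 20]
gamma_hat, ll_pl = fit_power_law(degrees)
lam_hat, ll_po = fit_truncated_poisson(degrees)
aics = {"power-law": aic(ll_pl, 1), "poisson": aic(ll_po, 1)}
weights = akaike_weights(aics)
print(f"gamma_hat={gamma_hat:.3f}, lambda_hat={lam_hat:.3f}")
print(aics, weights)
```

Golden-section search is adequate because the power-law log-likelihood is concave in γ ($\ln\zeta$ is convex); the zero-truncated Poisson MLE has no closed form, so it is found by bisection on $\lambda/(1-e^{-\lambda}) = \bar{k}$. On this heavy-tailed toy sequence the power law receives nearly all of the Akaike weight.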

### Citations

2496 | Emergence of scaling in random networks
- Barabási, Albert
- 1999

1957 | Random Graphs
- Bollobás
- 2001
Citation Context ...e networks, however, it is not necessary that the value of γ is restricted to values greater than 1. These power laws are in marked contrast to the degree distribution of the Erdős-Rényi random graphs [7], which is Poisson, $\Pr(k;\lambda) = e^{-\lambda}\lambda^k/k!$. The study of random graphs is a rich field of research and many important properties can be evaluated analytically. Such Poisson random networks (PRN) are charac...

1441 | Statistical mechanics of complex networks
- Albert, Barabási
Citation Context ... of real network data. 1 Introduction. Network structures which connect interacting particles such as proteins have long been recognised to be linked to the underlying dynamic or evolutionary processes [13, 3]. In particular, the technological advances seen in molecular biology and genetics increasingly provide us with vast amounts of data about genomic, proteomic and metabolomic network structures [15, 22,...

728 | Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd edn)
- Burnham, Anderson
- 2002
Citation Context ...tandard likelihood ratio test but have to employ a different information criterion to distinguish between models: here we use the Akaike information criterion (AIC) to choose between different models [2, 10]. The AIC for a model $\Pr(k;\theta)$ is defined by $\mathrm{AIC} = 2\left(-l_k(\hat{\theta}) + d\right)$ (4), where $\hat{\theta}$ is the maximum likelihood estimate of θ and $d$ is the number of parameters required to define the model, i.e. the dimensi...

423 | Random graphs with arbitrary degree distributions and their applications
- Newman, Strogatz, et al.
- 2001

319 | Evolution of networks
- Dorogovtsev, Mendes
- 2002
Citation Context ... of real network data. 1 Introduction. Network structures which connect interacting particles such as proteins have long been recognised to be linked to the underlying dynamic or evolutionary processes [13, 3]. In particular, the technological advances seen in molecular biology and genetics increasingly provide us with vast amounts of data about genomic, proteomic and metabolomic network structures [15, 22,...

247 | Specificity and stability in topology of protein networks
- Maslov, Sneppen
- 2002
Citation Context ...enzymes and metabolites in the case of metabolic networks (MN), and proteins in the case of protein interaction networks (PIN), interact can yield important insights into basic biological mechanisms [16, 24, 1]. For example the extent of... [Page footer: C. Priami et al. (Eds.): Trans. on Comput. Syst. Biol. III, LNBI 3737, pp. 65–77, 2005. © Springer-Verlag Berlin Heidelberg 2005]

176 | Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins
- Ito, Tashiro
Citation Context ...s [13, 3]. In particular, the technological advances seen in molecular biology and genetics increasingly provide us with vast amounts of data about genomic, proteomic and metabolomic network structures [15, 22, 19]. Understanding the way in which the different constituents of such networks, genes and their protein products in the case of genome regulatory networks, enzymes and metabolites in the case of metab...

143 | The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes
- Wagner
- 2001
Citation Context ...s [13, 3]. In particular, the technological advances seen in molecular biology and genetics increasingly provide us with vast amounts of data about genomic, proteomic and metabolomic network structures [15, 22, 19]. Understanding the way in which the different constituents of such networks, genes and their protein products in the case of genome regulatory networks, enzymes and metabolites in the case of metab...

132 | DIP: The Database of Interacting Proteins
- Xenarios, Rice, et al.
- 2000
Citation Context ...ption of the degree distribution. 3.1 Analysis of PIN Data. In Table 2 we show the maximum composite likelihoods for the degree distributions calculated from PIN data collected in five model organisms [23] (the protein interaction data were taken from the DIP database; http://dip.doe-mbi.ucla.edu). We find that the standard scale-free model (or its finite-size versions) never provides the best fit to ...

87 | Subnets of scale-free networks are not scale-free: Sampling properties of networks
- Stumpf
- 2005

73 | Functional and topological characterization of protein interaction networks
- Yook, Oltvai, et al.
Citation Context ...enzymes and metabolites in the case of metabolic networks (MN), and proteins in the case of protein interaction networks (PIN), interact can yield important insights into basic biological mechanisms [16, 24, 1]. For example the extent of...

61 | Infection dynamics on scale-free networks
- May, M, et al.
- 2004
Citation Context ...t only is this wasteful in the sense that not all of the data are used, but it may obfuscate real, especially finite-size, trends. The same will very likely hold true for other biological networks, too [17]. The approach used here, on the other hand, (i) uses all the data, and (ii) can be extended to assessing levels of confidence by combining a bootstrap procedure with the Akaike weights. What we ...

57 | Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes
- Anderson, Darling
- 1952
Citation Context ...rmation criteria we can also assess a model's performance at describing the degree distribution using a range of other statistical measures. The Kolmogorov-Smirnov (KS) [12] and Anderson-Darling (AD) [4, 5] goodness-of-fit statistics allow us to quantify the extent to which a theoretical or estimated model of the degree distribution describes the observed data. The former is a common and easily implemen...

53 | Information measures and model selection
- Akaike
- 1983
Citation Context ...tandard likelihood ratio test but have to employ a different information criterion to distinguish between models: here we use the Akaike information criterion (AIC) to choose between different models [2, 10]. The AIC for a model $\Pr(k;\theta)$ is defined by $\mathrm{AIC} = 2\left(-l_k(\hat{\theta}) + d\right)$ (4), where $\hat{\theta}$ is the maximum likelihood estimate of θ and $d$ is the number of parameters required to define the model, i.e. the dimensi...

47 | Statistical Models
- Davison
- 2003
Citation Context ...f trial model provides the best description. Here we briefly introduce the basic statistical concepts employed later. These can be found in much greater detail in most modern statistics texts such as [12]. Tools for the analysis of other aspects of network data, e.g. clustering coefficients, path lengths or spectral properties of the adjacency matrix, will also need to be developed in order to understand t...

46 | Inferring confidence sets of possibly misspecified gene trees
- Strimmer, Rambaut
- 2002
Citation Context ...g models then the analysis has to be repeated. The Akaike weight formalism is very flexible and has been applied in a range of contexts, including the assessment of confidence in phylogenetic inference [20]. In the next section we will apply this formalism to PIN data from five species and estimate the level of support for each of the models in Table 1. 2.3 Goodness-of-Fit. In addition to the AIC or simi...

42 | A Note on Pseudolikelihood Constructed from Marginal Densities
- Cox, Reid
Citation Context ...[Table 1 fragment, models M4b, M5 and M6; recoverable entries: $\exp\!\left(-\ln^2\!\left(\frac{k-\theta}{m}\right)/(2\sigma^2)\right) / \left((k-\theta)\,\sigma\sqrt{2\pi}\right)$; "Stretched exponential"; $C\exp(-\alpha k/\bar{k})\,k^{-\gamma}$ for $k>0$] ...the full likelihood is difficult to specify. Reference [11] provides an overview of composite likelihood methods. For a given functional form or model $\Pr(k;\theta)$ of the degree distribution we can use maximum likelihood estimation applied to the composite likeli...

25 | Statistical ensemble of scale-free random graphs
- Burda, Correia, et al.
Citation Context ...e now at a stage where simple models do not necessarily describe the data collected from complex processes to the extent that we would like them to. But as Burda, Diaz-Correia and Krzywicki point out [9], even if a mechanistic model is not correct in detail, a corresponding statistical ensemble may nevertheless offer important insights. We believe that the statistical models employed here will also b...

11 | Comparative analysis of the Saccharomyces cerevisiae and Caenorhabditis elegans protein interaction networks. BMC Evol Biol 5:23
- Agrafioti, Swire, et al.
- 2005
Citation Context ...enzymes and metabolites in the case of metabolic networks (MN), and proteins in the case of protein interaction networks (PIN), interact can yield important insights into basic biological mechanisms [16, 24, 1]. For example the extent of...

6 | Mathematical results on scale-free graphs
- Bollobás, Riordan
- 2002
Citation Context ...r θ (potentially vector-valued), and $\hat{\Pr}(k)$ to denote the empirical degree distribution. Many studies of biological network data have suggested that the underlying networks show scale-free behaviour [8] and that their degree distributions follow a power law, i.e. $\Pr(k;\gamma) = k^{-\gamma}/\zeta(\gamma)$ (1), where $\zeta(x)$ is Riemann's zeta function, which is defined for $x>1$ and diverges as $x \downarrow 1$; for finite networks, how...

6 | Multifractal properties of growing networks
- Dorogovtsev, Samukhin, et al.
- 2002

2 | A test of goodness of fit (J. Am. Stat. Assoc.)
- Anderson, Darling
- 1954
Citation Context ...rmation criteria we can also assess a model's performance at describing the degree distribution using a range of other statistical measures. The Kolmogorov-Smirnov (KS) [12] and Anderson-Darling (AD) [4, 5] goodness-of-fit statistics allow us to quantify the extent to which a theoretical or estimated model of the degree distribution describes the observed data. The former is a common and easily implemen...

2 | Evolution of the yeast interaction network
- Qin, Lu, et al.
Citation Context ...s [13, 3]. In particular, the technological advances seen in molecular biology and genetics increasingly provide us with vast amounts of data about genomic, proteomic and metabolomic network structures [15, 22, 19]. Understanding the way in which the different constituents of such networks, genes and their protein products in the case of genome regulatory networks, enzymes and metabolites in the case of metab...
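The Kolmogorov-Smirnov statistic referred to in the goodness-of-fit citation context is the largest absolute gap between the empirical and model cumulative distribution functions. Because both CDFs of a discrete degree distribution change only at integer $k$, the supremum can be taken over $k = 1, \dots, k_{\max}$. The sketch below does this for the power-law model $\Pr(k;\gamma) = k^{-\gamma}/\zeta(\gamma)$; the function names, the toy degree sequence and the value γ = 1.8 are illustrative assumptions rather than values from the paper, and neither p-values nor the Anderson-Darling statistic are covered.

```python
import math
from collections import Counter

def zeta(s, n_terms=1000):
    """Riemann zeta(s) for s > 1: partial sum plus an Euler-Maclaurin tail estimate."""
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    N = n_terms
    partial = math.fsum(n ** -s for n in range(1, N))
    return partial + N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12

def ks_statistic_power_law(degrees, gamma):
    """Max |F_emp(k) - F_model(k)| for degrees k >= 1 under Pr(k) = k**-gamma / zeta(gamma)."""
    n, z = len(degrees), zeta(gamma)
    counts = Counter(degrees)
    emp_cdf = model_cdf = d_max = 0.0
    # Both CDFs are step functions with jumps at integers; beyond k_max the empirical
    # CDF is 1 and the gap 1 - F_model(k) only shrinks, so k = 1..k_max suffices.
    for k in range(1, max(degrees) + 1):
        emp_cdf += counts[k] / n
        model_cdf += k ** -gamma / z
        d_max = max(d_max, abs(emp_cdf - model_cdf))
    return d_max

# Hypothetical toy degree sequence and exponent, for illustration only.
degrees = [1] * 8 + [2] * 4 + [3] * 2 + [4, 5, 7, 12, 20]
print(f"KS distance: {ks_statistic_power_law(degrees, gamma=1.8):.3f}")
```

If γ is estimated from the same data, the standard KS critical values no longer apply, which is one reason a resampling procedure is typically used to calibrate the statistic.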