Results 1-10 of 33
Structured statistical models of inductive reasoning
Cited by 65 (11 self)
Everyday inductive inferences are often guided by rich background knowledge. Formal models of induction should aim to incorporate this knowledge, and should explain how different kinds of knowledge lead to the distinctive patterns of reasoning found in different inductive contexts. We present a Bayesian framework that attempts to meet both goals and describe four applications of the framework: a taxonomic model, a spatial model, a threshold model, and a causal model. Each model makes probabilistic inferences about the extensions of novel properties, but the priors for the four models are defined over different kinds of structures that capture different relationships between the categories in a domain. Our framework therefore shows how statistical inference can operate over structured background knowledge, and we argue that this interaction between structure and statistics is critical for explaining the power and flexibility of human reasoning.
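The core computation in the abstract above, a probabilistic inference about the extension of a novel property, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy categories, the hand-picked prior, and the simple likelihood that only checks consistency with the observations. The paper's structured priors (taxonomic, spatial, threshold, causal) are not reproduced.

```python
from fractions import Fraction


def generalization(prior, observed, query):
    """Return P(query has the property | every category in `observed` has it).

    `prior` maps each hypothesis (a frozenset of categories that share the
    property) to its prior probability. Assumption: the likelihood is 1 for
    hypotheses that contain all observed categories and 0 otherwise.
    """
    # Keep only the hypotheses consistent with the observations.
    consistent = {h: p for h, p in prior.items() if observed <= h}
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the observations")
    # Posterior mass of the hypotheses that include the query category.
    return sum(p for h, p in consistent.items() if query in h) / total


# Hypothetical toy prior over three candidate extensions.
PRIOR = {
    frozenset({"horse"}): Fraction(1, 4),
    frozenset({"horse", "cow"}): Fraction(1, 2),
    frozenset({"horse", "cow", "dolphin"}): Fraction(1, 4),
}

print(generalization(PRIOR, frozenset({"horse"}), "cow"))
print(generalization(PRIOR, frozenset({"horse", "cow"}), "dolphin"))
```

In this sketch, changing the prior changes the pattern of generalization while the inference rule stays the same, which is the interaction between structure and statistics that the abstract emphasizes.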
A Statistical Test For Host-Parasite Coevolution
SYST. BIOL., 2002
Cited by 48 (1 self)
A new method, ParaFit, has been developed to test the significance of a global hypothesis of coevolution between parasites and their hosts. Individual host-parasite association links can also be tested. The test statistics are functions of the host and parasite phylogenetic trees and of the set of host-parasite association links. Numerical simulations are used to show that the method has a correct rate of type I error and good power except under extreme error conditions. An application to real data (pocket gophers and chewing lice) is presented. [Coevolution; fourth-corner statistic; host-parasite; permutation test; phylogenetic analysis; power analysis; simulations; statistical test.]
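The abstract above lists a permutation test among its keywords. The following is a generic sketch of how a permutation p-value is computed, not ParaFit's own statistic: the cross-product statistic, the function names, and the toy data are assumptions chosen only to keep the example self-contained.

```python
import random


def permutation_p_value(x, y, statistic, n_perm=999, seed=0):
    """One-sided permutation p-value for the association between x and y.

    The values of y are shuffled to break any pairing with x; the p-value is
    the share of statistics (observed one included) at least as large as the
    observed statistic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if statistic(x, y_perm) >= observed:
            hits += 1
    # Counting the observed value itself keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def cross_product(x, y):
    """Hypothetical stand-in statistic: sum of element-wise products."""
    return sum(a * b for a, b in zip(x, y))


# Toy data with a perfect monotone association.
x = list(range(10))
y = list(range(10))
print(permutation_p_value(x, y, cross_product))
```

With 999 permutations the smallest attainable p-value is 1/1000, so the number of permutations bounds the resolution of the test.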
Concerning the NJ algorithm and its unweighted version, UNJ
1997
Cited by 35 (8 self)
In this paper we will present UNJ, an unweighted version of the NJ algorithm (Saitou and Nei 1987; Studier and Keppler 1988). We will demonstrate that UNJ is well suited when the data are of the $(\delta_{ij}) = (d_{ij} + e_{ij})$ type, where $(d_{ij})$ is a tree distance, and when the $e_{ij}$ are independent and identically distributed noise variables. Simulations confirm this theory. On a more general level, we will study the three main components of the agglomerative approach, applied to the reconstruction of tree distances. (i) We will
Reconstruction Of Biogeographic and Evolutionary Networks Using Reticulograms
SYST. BIOL., 2002
Cited by 26 (3 self)
A reticulogram is a general network capable of representing a reticulate evolutionary structure.
Computational tools for evaluating phylogenetic and hierarchical clustering trees
Journal of Computational and Graphical Statistics
Cited by 11 (0 self)
Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and ‘generalizability’ of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes, and Vogtmann (2001), equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful. In particular, since Billera et al. (2001) have shown that the space of trees is negatively curved (a CAT(0) space), a natural representation of a collection of trees is a tree. We compare this representation to the Euclidean approximations of treespace made available through Multidimensional Scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays). 1. CURRENT PRACTICES IN ESTIMATION AND STABILITY OF HIERARCHICAL TREES Trees are often used as a parameter in phylogenetic studies and for data description in hierarchical
Optimal Variable Weighting for Ultrametric and Additive Trees and K-means Partitioning: Methods and Software
JOURNAL OF CLASSIFICATION
Cited by 11 (0 self)
De Soete (1986, 1988) proposed some years ago a method for optimal variable weighting for ultrametric and additive tree fitting. This paper extends De Soete's method to optimal variable weighting for K-means partitioning. We also describe some new features and improvements to the algorithm proposed by De Soete. Monte Carlo simulations have been conducted using different error conditions. In all cases (i.e., ultrametric or additive trees, or K-means partitioning), the simulation results indicate that the optimal weighting procedure should be used for analyzing data containing noisy variables that do not contribute relevant information to the classification structure. However, if the data involve error-perturbed variables that are relevant to the classification or outliers, it seems better to cluster or partition the entities by using variables with equal weights. A new computer program, OVW, which is available to researchers as freeware, implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a new algorithm for optimal variable weighting for K-means partitioning.
Can we have confidence in a tree representation?
Proceedings of JOBIM'2000, Lecture Notes in Computer Science, O. Gascuel and M.F. Sagot (Eds.), 2001, 2002
Cited by 9 (2 self)
A tree representation distance method, applied to any dissimilarity array, always gives a valued tree, even if the tree model is not appropriate. In the first part, we propose some criteria to evaluate the quality of the computed tree. Some of them are metric; their values depend on the edge lengths. The others depend only on the tree topology. In the second part, we calculate the average and the critical values of these criteria, according to parameters. Three models of distance are tested using simulations: the tree model on the one hand, and Euclidean distances and Boolean distances on the other. In each case, we select at random distances fitting these models and add some noise. We show that the criteria values permit one to differentiate the tree model from the others. Finally, we analyze a distance between proteins and its tree representation that is valid according to the criteria values.
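One standard way to check whether a dissimilarity fits the tree model is the four-point condition: for every four items, the two largest of the three pairwise sums must be equal. The sketch below is an illustration under that assumption and is not necessarily one of the criteria proposed in the paper; the function names and the tolerance are hypothetical.

```python
from itertools import combinations


def four_point_ok(d, i, j, k, l, tol=1e-9):
    """True if items i, j, k, l satisfy the four-point condition under d."""
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    # For a tree distance, the two largest sums coincide.
    return sums[2] - sums[1] <= tol


def tree_like_fraction(d, tol=1e-9):
    """Fraction of quadruples of a symmetric matrix d that pass the check."""
    quads = list(combinations(range(len(d)), 4))
    if not quads:
        raise ValueError("need at least four items")
    return sum(four_point_ok(d, *q, tol=tol) for q in quads) / len(quads)


# Distances on the tree ((0,1),(2,3)) with all edge lengths equal to 1.
tree_d = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Euclidean distances between the corners of a unit square.
r2 = 2 ** 0.5
square_d = [
    [0, 1, r2, 1],
    [1, 0, 1, r2],
    [r2, 1, 0, 1],
    [1, r2, 1, 0],
]

print(tree_like_fraction(tree_d))
print(tree_like_fraction(square_d))
```

With noisy data an exact equality is rarely met, so the tolerance `tol` would need to be chosen relative to the noise level; that choice is left open here.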
Structure in Document Browsing Spaces
1996
Cited by 7 (1 self)
This study proposes and evaluates a document analysis strategy for information retrieval with visualization interfaces. The goal of document analysis is to highlight structure that helps searchers make their own relevance judgments, rather than to shift judgments from humans onto machines. Searchers can investigate that structure with tools for visualizing multidimensional data. The structure of interest in this study is discrimination of documents into clusters. Two diagnostic measures may inform selection of document attributes for cluster discrimination: term discrimination value and the sum of pairwise term-vector correlations. A series of experiments tests the reliability of these measures for predicting clustering tendency, as measured by proportion of elongated triples and skewness of the distribution of document dissimilarities.
Additive-Tree Representations
Cited by 6 (1 self)
this paper should be addressed to: Hervé Abdi, The University of Texas at Dallas, Program in Cognition, ms:gr.4.1., Richardson, TX 75083-0688, USA. email: herve@utdallas.edu. The author wishes to thank Sue Viscuso and Alice O'Toole for help and comments on previous drafts. Ref: Abdi, H. (1990). Additive-tree representations. Lecture Notes in Biomathematics, 84, 43-59.
PEOPLE’S INTUITIONS ABOUT RANDOMNESS AND PROBABILITY: AN EMPIRICAL STUDY
Cited by 4 (0 self)
What people mean by randomness should be taken into account when teaching statistical inference. This experiment explored subjective beliefs about randomness and probability through two successive tasks. Subjects were asked to categorize 16 familiar items: 8 real items from everyday life experiences, and 8 stochastic items involving a repeatable process. Three groups of subjects differing according to their background knowledge of probability theory were compared. An important finding is that the arguments used to judge if an event is random and those to judge if it is not random appear to be of different natures. While the concept of probability has been introduced to formalize randomness, a majority of individuals appeared to consider probability as a primary concept.