Learning relational probability trees
- In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
Cited by 96 (24 self)
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
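The aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy movie/actor data, the thresholds, and the helper names (`binary_feature`, `related_actors`) are all hypothetical, and only the idea of applying COUNT, AVERAGE, or MODE to a variable-size set of related objects and thresholding the result into a binary split comes from the abstract.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: each movie (the target instance) links to a
# variable number of actors, each with an age and a gender attribute.
related_actors = {
    "movie_a": [
        {"age": 30, "gender": "F"},
        {"age": 50, "gender": "M"},
        {"age": 40, "gender": "M"},
    ],
    "movie_b": [
        {"age": 20, "gender": "F"},
        {"age": 22, "gender": "F"},
        {"age": 27, "gender": "M"},
    ],
    "movie_c": [],  # an instance with no related objects at all
}


def count(values):
    return len(values)


def average(values):
    return mean(values) if values else None


def mode(values):
    return Counter(values).most_common(1)[0][0] if values else None


def binary_feature(related, attribute, aggregate, predicate):
    """Aggregate one attribute over the related objects, then threshold it.

    The result is a single True/False value per target instance, which is
    the form a binary tree split needs.
    """
    values = [obj[attribute] for obj in related]
    aggregated = aggregate(values)
    return aggregated is not None and predicate(aggregated)


# One row of propositional (flat) binary features per target instance.
features = {
    movie: {
        "count_actors_ge_2": binary_feature(actors, "age", count, lambda v: v >= 2),
        "avg_age_gt_25": binary_feature(actors, "age", average, lambda v: v > 25),
        "mode_gender_is_F": binary_feature(actors, "gender", mode, lambda v: v == "F"),
    }
    for movie, actors in related_actors.items()
}
```

Each candidate feature is a triple of attribute, aggregation function, and threshold, so a search over relational features in this sketch would amount to enumerating such triples and scoring the resulting binary columns.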
Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples
- Human Brain Mapping, 2001
Cited by 73 (6 self)
The statistical analysis of functional mapping experiments usually proceeds at the voxel level, involving the formation and assessment of a statistic image: at each voxel a statistic indicating evidence of the experimental effect of interest, at that voxel, is computed, giving an image of statistics, a statistic ...
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery, 1999
Cited by 70 (1 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Spatial Pattern Analysis of Functional Brain Images Using Partial Least Squares
- Neuroimage, 1996
Cited by 46 (3 self)
This paper introduces a new tool for functional neuroimage analysis: partial least squares (PLS). It is unique as a multivariate method in its choice of emphasis for analysis, that being the covariance between brain images and exogenous blocks representing either the experiment design or some behavioral measure. What emerges are spatial patterns of brain activity that represent the optimal association between the images and either of the blocks. This process differs substantially from other multivariate methods in that rather than attempting to predict the individual values of the image pixels, PLS attempts to explain the relation between image pixels and task or behavior. Data from a face encoding and recognition PET rCBF study are used to illustrate two types of PLS analysis: an activation analysis of task with images and a brain-behavior analysis. The commonalities across the two analyses are suggestive of a general face memory network differentially engaged during encoding and recognition. PLS thus serves as an important extension by extracting new information from imaging data that is not accessible through other currently used univariate and multivariate image analysis tools. © 1996 Academic Press, Inc.
Large Datasets Lead to Overly Complex Models: An Explanation and a Solution
- 1998
Cited by 40 (4 self)
This paper explores unexpected results that lie at the intersection of two common themes in the KDD community: large datasets and the goal of building compact models. Experiments with many different datasets and several model construction algorithms (including tree learning algorithms such as c4.5 with three different pruning methods, and rule learning algorithms such as c4.5rules and ripper) show that increasing the amount of data used to build a model often results in a linear increase in model size, even when that additional complexity results in no significant increase in model accuracy. Despite the promise of better parameter estimation held out by large datasets, as a practical matter, models built with large amounts of data are often needlessly complex and cumbersome. In the case of decision trees, the cause of this pathology is identified as a bias inherent in several common pruning techniques. Pruning errors made low in the tree, where there is insufficient data to make accurate parameter estimates, are propagated and magnified higher in the tree, working against the accurate parameter estimates that are made possible there by abundant data. We propose a general solution to this problem based on a statistical technique known as randomization testing, and empirically evaluate its utility.
A Relational View of Information Seeking and Learning in Social Networks
- 2003
Cited by 27 (1 self)
Research in organizational learning has demonstrated processes and occasionally performance implications of acquisition of declarative (know-what) and procedural (know-how) knowledge. However, considerably less attention has been paid to learned characteristics of relationships that affect the decision to seek information from other people. Based on a review of the social network, information processing, and organizational learning literatures, along with the results of a previous qualitative study, we propose a formal model of information seeking in which the probability of seeking information from another person is a function of (1) knowing what that person knows; (2) valuing what that person knows; (3) being able to gain timely access to that person’s thinking; and (4) perceiving that seeking information from that person would not be too costly. We also hypothesize that the knowing, access, and cost variables mediate the relationship between physical proximity and information seeking. The model is tested using two separate research sites to provide replication. The results indicate strong support for the model and the mediation hypothesis (with the exception of the cost variable). Implications are drawn for the study of both transactive memory and organizational learning, as well as for management practice.
A Comparison of Statistical Significance Tests for Information Retrieval Evaluation
- 2007
Cited by 24 (1 self)
Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student’s paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher’s randomization (permutation) test as nonparametric significance tests for IR but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t tests. Both the Wilcoxon and sign test have a poor ability to detect significance and have the potential to lead to false detections of significance. The Wilcoxon and sign tests are simplified variants of the randomization test and their use should be discontinued for measuring the significance of a difference between means.
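A paired randomization (permutation) test of the kind this abstract compares can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `paired_randomization_test`, the Monte Carlo sampling with a fixed seed, the two-sided form, and the per-topic average precision values are all hypothetical choices made for the example.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided Monte Carlo randomization test on the difference in means.

    scores_a and scores_b hold one score per topic (for example, average
    precision) for two systems evaluated on the same topics.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same topics")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two system labels are exchangeable
        # within a topic, so each difference keeps or flips its sign with
        # equal probability.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            extreme += 1
    # Fraction of random relabelings at least as extreme as the observed one.
    return extreme / trials


# Hypothetical per-topic average precision for two runs on ten topics.
run_a = [0.50, 0.62, 0.41, 0.77, 0.58, 0.69, 0.45, 0.81, 0.66, 0.53]
run_b = [0.40, 0.55, 0.33, 0.65, 0.49, 0.61, 0.39, 0.70, 0.58, 0.47]
p_value = paired_randomization_test(run_a, run_b)
```

Sampling random sign flips rather than enumerating all 2^n relabelings keeps the sketch short; with few topics, exact enumeration would also be feasible.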
Serial and Strategic Effects in Reading Aloud
Cited by 20 (3 self)
Coltheart and Rastle (1994) reported that the size of the regularity effect on word naming latency decreases across position of irregularity, implicating a serial process in reading aloud. In response to criticism by Plaut, McClelland, Seidenberg, and Patterson (1996), we replicate these results here using monosyllabic words which have been controlled for consistency at each of five orthographic segments. A successful simulation of these data by the DRC model (Coltheart, Curtis, Atkins, & Haller, 1993) is presented. These findings were used in a second experiment to produce a strategy effect in reading aloud. Subjects named nonword or regular word targets mixed with either first position irregular fillers or third position irregular fillers. Target naming was slowed when first position irregular fillers were present compared with target naming when third position irregular fillers were present. These data suggest that the use of the nonlexical route is not fixed; subjects can slow its ...
Framework for the statistical shape analysis of brain structures using SPHARM-PDM
- In Insight Journal, Special Edition on the Open Science Workshop at MICCAI, 2006
Cited by 19 (3 self)
Shape analysis has become of increasing interest to the neuroimaging community due to its potential to precisely locate morphological changes between healthy and pathological structures. This manuscript presents a comprehensive set of tools for the computation of 3D structural statistical shape analysis. It has been applied in several studies on brain morphometry, but can potentially be employed in other 3D shape problems. Its main limitation is the necessity of spherical topology. The input of the proposed shape analysis is a set of binary segmentations of a single brain structure, such as the hippocampus or caudate. These segmentations are converted into a corresponding spherical harmonic description (SPHARM), which is then sampled into triangulated surfaces (SPHARM-PDM). After alignment, differences between groups of surfaces are computed using the Hotelling T^2 two-sample metric. Statistical p-values, both raw and corrected for multiple comparisons, result in significance maps. Additional visualization of the group tests is provided via mean difference magnitude and vector maps, as well as maps of the group covariance information. The correction for multiple comparisons is performed via two separate methods that each have a distinct view of the problem. The first one aims to control the family-wise error rate (FWER) or false-positives via the extrema histogram of non-parametric permutations. The second method controls the false discovery rate and results in a less conservative estimate of the false-negatives.
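The two multiple-comparison corrections mentioned in this abstract can be sketched in a few lines. Several things here are assumptions rather than details from the source: the per-location statistic is a plain absolute difference of group means standing in for the Hotelling T^2 metric, the false discovery rate step uses the Benjamini-Hochberg procedure (the abstract does not name a specific procedure), and the function names and toy data are hypothetical.

```python
import random


def mean_diff_stats(group_a, group_b):
    """Absolute difference of group means at each location (column)."""
    n_locations = len(group_a[0])
    stats = []
    for j in range(n_locations):
        mean_a = sum(row[j] for row in group_a) / len(group_a)
        mean_b = sum(row[j] for row in group_b) / len(group_b)
        stats.append(abs(mean_a - mean_b))
    return stats


def fwer_corrected_p_values(group_a, group_b, permutations=2000, seed=0):
    """FWER-corrected p-values from the permutation distribution of the maximum."""
    rng = random.Random(seed)
    observed = mean_diff_stats(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    max_stats = []
    for _ in range(permutations):
        # Shuffle subjects between groups, keeping each subject's row intact,
        # and record only the largest statistic over all locations.
        rng.shuffle(pooled)
        max_stats.append(max(mean_diff_stats(pooled[:n_a], pooled[n_a:])))
    return [
        (1 + sum(m >= obs for m in max_stats)) / (permutations + 1)
        for obs in observed
    ]


def benjamini_hochberg(p_values, q=0.05):
    """Return one reject flag per test under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical data: 4 subjects per group, 2 surface locations each.
# Location 0 differs strongly between groups; location 1 does not.
group_a = [[10.0, 1.0], [11.0, 2.0], [12.0, 3.0], [10.5, 2.5]]
group_b = [[0.0, 1.5], [1.0, 2.5], [2.0, 3.0], [0.5, 1.0]]
corrected = fwer_corrected_p_values(group_a, group_b)
```

Comparing every location against the distribution of the maximum statistic is what makes the first correction family-wise; the Benjamini-Hochberg step instead takes a list of raw p-values and bounds the expected proportion of false discoveries.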
Avoiding bias when aggregating relational data with degree disparity
- In Proceedings of the 20th International Conference on Machine Learning, 2003
Cited by 19 (11 self)
A common characteristic of relational data sets, degree disparity, can lead relational learning algorithms to discover misleading correlations. Degree disparity occurs when the frequency of a relation is correlated with the values of the target variable. In such cases, aggregation functions used by many relational learning algorithms will result in misleading correlations and added complexity in models. We examine this problem through a combination of simulations and experiments. We show how two novel hypothesis testing procedures can adjust for the effects of using aggregation functions in the presence of degree disparity.
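The effect described in this abstract can be reproduced with a small simulation. This sketch only illustrates the problem, not the paper's hypothesis testing procedures, which the abstract does not specify. The degrees (10 links for positive items, 2 for negative items), the uniform attribute distribution, and all names are assumptions chosen for the example.

```python
import random
from statistics import mean

rng = random.Random(0)


def make_items(n_items, degree):
    """Each item links to `degree` objects whose attribute is uniform on [0, 1)."""
    return [[rng.random() for _ in range(degree)] for _ in range(n_items)]


# Assumed degree disparity: the class label determines only how many objects
# an item is linked to. The attribute values themselves are drawn from the
# same distribution for both classes.
positives = make_items(200, degree=10)
negatives = make_items(200, degree=2)


def class_mean(items, aggregate):
    """Average value of an aggregated feature over all items of one class."""
    return mean(aggregate(values) for values in items)


# MAX grows with the number of linked objects, so it separates the classes
# even though no individual attribute value carries class information.
max_gap = class_mean(positives, max) - class_mean(negatives, max)

# AVERAGE has the same expected value regardless of degree, so the gap
# between classes stays close to zero.
avg_gap = class_mean(positives, mean) - class_mean(negatives, mean)
```

In this setup a learner that scores the MAX feature would find an apparent correlation with the class label that is driven entirely by degree, which is the kind of misleading correlation the abstract refers to.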

