Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
Cited by 215 (42 self)
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
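The averaging step described in this abstract can be illustrated with a minimal sketch. It assumes that a log marginal likelihood and a prior probability are already available for each candidate model, and it keeps only models whose posterior probability is within a fixed ratio of the best one. The function names, the cutoff value of 20, and the toy numbers are illustrative assumptions, not the authors' search algorithm.

```python
import math


def posterior_model_probs(log_marginal_likelihoods, prior_probs):
    """Posterior model probabilities from log marginal likelihoods and priors."""
    log_post = [l + math.log(p) for l, p in zip(log_marginal_likelihoods, prior_probs)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def select_window(post_probs, ratio_cutoff=20.0):
    """Indices of models whose posterior probability is within ratio_cutoff of the best.

    The cutoff is a hypothetical default chosen for illustration.
    """
    best = max(post_probs)
    return [i for i, p in enumerate(post_probs) if p * ratio_cutoff >= best]


def model_average(predictions, post_probs, kept):
    """Average per-model predictions over the kept models, renormalising the weights."""
    norm = sum(post_probs[i] for i in kept)
    return sum(predictions[i] * post_probs[i] for i in kept) / norm
```

With three toy models whose unnormalised posterior weights are 0.6, 0.3 and 0.001, the third model falls outside the window, so the average is taken over the first two only.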
A decision theoretic framework for approximating concepts
- International Journal of Man-machine Studies
, 1992
Cited by 27 (13 self)
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by the α-cut in the fuzzy set theory is obtained. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and probabilistic rough set approximations. Moreover, based on the well known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
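As a rough illustration of the one-set and two-set approximations mentioned in this abstract, the sketch below computes an α-cut from a membership function and then a pair of nested sets from two thresholds. The dictionary representation, the function names, and the threshold values are assumptions made for the example; the paper derives its approximations from a Bayesian decision procedure that is not reproduced here.

```python
def alpha_cut(membership, alpha):
    """Crisp set of elements whose membership degree is at least alpha."""
    return {x for x, mu in membership.items() if mu >= alpha}


def two_set_approximation(membership, lower_threshold, upper_threshold):
    """Return (lower, upper) approximations built from two thresholds.

    The lower approximation uses the stricter threshold, so it is always
    contained in the upper approximation. Both thresholds are hypothetical
    parameters introduced for this sketch.
    """
    if lower_threshold < upper_threshold:
        raise ValueError("lower_threshold must be at least upper_threshold")
    lower = alpha_cut(membership, lower_threshold)
    upper = alpha_cut(membership, upper_threshold)
    return lower, upper


# Toy membership degrees for four elements.
membership = {"a": 0.9, "b": 0.6, "c": 0.3, "d": 0.0}
```

Elements in the upper set but not in the lower set form the boundary region, which is where the two-set view differs from a single α-cut.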
Simultaneously Segmenting Multiple Gene Expression Time Courses by Analyzing Cluster Dynamics
Cited by 2 (1 self)
We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and re-arrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant re-grouping across segmentation boundaries. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described.
Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation
- J Amer. Stat. Ass
, 1998
Cited by 1 (0 self)
In this paper we propose a modification to the methods used to motivate many information and posterior probability criteria for the weighted likelihood case. We derive weighted versions for two of the most widely known criteria, namely the AIC and BIC. Via a simple modification, the criteria are also made useful for window span selection. The usefulness of the weighted versions of these criteria is demonstrated through a simulation study and an application to three data sets. KEY WORDS: Information Criteria; Posterior Probability Criteria; Model Selection; Local Likelihood. 1. INTRODUCTION Local regression has become a popular method for smoothing scatterplots and for nonparametric regression in general. It has proven to be a useful tool in finding structure in datasets (Cleveland and Devlin 1988). Local regression estimation is a method for smoothing scatterplots (x_i, y_i), i = 1, ..., n, in which the fitted value at x_0 is the value of a polynomial fit to the data using weighted least squares, where the weight given to (x_i, y_i) is related to the distance between x_i and x_0. Stone (1977) shows that estimates obtained using the local regression methods have desirable theoretical properties. Recently, Fan (1993) has studied minimax properties of local linear regression. Tibshirani and Hastie (1987) extend the ideas of local regression to a local likelihood procedure. This procedure is designed for nonparametric regression modeling in situations where weighted least squares is inappropriate as an estimation method, for example with binary data. Local regression may be viewed as a special case of local likelihood estimation. Tibshirani and Hastie (1987), Staniswalis (1989), and Loader (1999) apply local likelihood estimation to several types of data where local regressio...
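The local regression fit described in the introduction above can be sketched as a weighted least squares line fitted around a target point. The tricube weight function, the fixed bandwidth, and the function names are assumptions chosen for illustration; the excerpt only says that the weights depend on the distance from the target point, and it does not show the weighted AIC and BIC criteria.

```python
def tricube(u):
    """Tricube weight: positive for |u| < 1, zero otherwise (an assumed kernel)."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(xs, ys, x0, bandwidth):
    """Fitted value at x0 from a weighted least squares line.

    Each point (x_i, y_i) gets a weight that shrinks with |x_i - x0| / bandwidth.
    """
    weights = [tricube((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no data points fall inside the window around x0")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if s_xx == 0.0:
        return y_bar  # only one distinct x in the window: fall back to a local mean
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = s_xy / s_xx
    return y_bar + slope * (x0 - x_bar)
```

The bandwidth plays the role of the window span that the abstract's criteria are meant to select: a wider window gives a smoother fit, a narrower one follows the data more closely.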
An Information Theoretic Framework for Exploratory Multivariate Market Segmentation Research
- Decision Sciences
, 1991
Cited by 1 (0 self)
State-of-the-art market segmentation often involves simultaneous consideration of multiple and overlapping variables. These variables are studied to assess their relationships, select a subset of variables which best represent the subgroups (segments) within a market, and determine the likelihood of membership of a given individual in a particular segment. Such information, obtained in the exploratory phase of a multivariate market segmentation study, leads to the construction of more parsimonious models. These models have less stringent data requirements while facilitating substantive evaluation to aid marketing managers in formulating more effective targeting and positioning strategies within different market segments. This paper utilizes the information-theoretic (IT) approach to address several issues in multivariate market segmentation studies. A marketing data set analyzed previously is employed to examine the suitability and usefulness of the proposed approach [12]. Some useful extensions of the IT framework and its applications are also discussed.
Adaptive Statistical Language Modelling
, 1994
The trigram statistical language model is remarkably successful when used in such applications as speech recognition. However, the trigram model is static in that it only considers the previous two words when making a prediction about a future word. The work presented here attempts to improve upon the trigram model by considering additional contextual and longer distance information. This is frequently referred to in the literature as adaptive statistical language modelling because the model is thought of as adapting to the longer term information. This work considers the creation of topic specific models, statistical evidence from the presence or absence of triggers, or related words, in the document history (document triggers) and in the current sentence (in-sentence triggers), and the incorporation of the document cache, which predicts the probability of a word by considering its frequency in the document history. An important result of this work is that the presence of self-triggers, that is, whether or not the word itself occurred in the document history, is an extremely important piece of information. A maximum entropy (ME) approach will be used in many instances to incorporate information from different sources. Maximum entropy considers a model which maximizes entropy while satisfying the constraints presented by the information we wish to incorporate. The generalized iterative scaling (GIS) algorithm can be used to compute the maximum entropy solution. This work also considers various methods of smoothing the information in a maximum entropy model. An important result is that smoothing improves performance noticeably and that Good-Turing discounting is an effective method of smoothing. Thesis Supervisor: Victor Zue Title: Principal Research Scientist, Departme...
Simultaneously Segmenting Multiple Gene Expression Time Courses by Analyzing Cluster Dynamics
We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster rearrangement around putative segment boundaries. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant regrouping across segmentation boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described. Keywords: Time series segmentation, clustering, KL-divergence, temporal regulation.
An Empirical Study of Qualities of Association Rules from a Statistical View Point
Minimum support and confidence have been used as criteria for generating association rules in all association rule mining algorithms. These criteria have natural appeal, such as simplicity, and few researchers have questioned the quality of the generated rules. In this paper, we examine the rules from a more rigorous point of view by conducting statistical tests. Specifically, we use contingency tables and the chi-square test to analyze the data. Experimental results show that one third of the association rules derived from the support and confidence criteria are not significant, that is, the antecedent and consequent of the rules are not correlated. This indicates that minimum support and minimum confidence do not provide adequate discovery of meaningful associations. The chi-square test can be considered as an enhancement or an alternative solution.
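The kind of check this abstract describes can be sketched by building a 2x2 contingency table for a rule A => B from transaction counts and computing the Pearson chi-square statistic. The function names, the argument layout, and the toy counts are assumptions for illustration; the 3.841 threshold is the standard critical value for one degree of freedom at the 0.05 level, and the paper's exact testing procedure may differ.

```python
CHI2_CRITICAL_1DF_05 = 3.841  # critical value for 1 degree of freedom at the 0.05 level


def chi_square_2x2(n_total, n_a, n_b, n_ab):
    """Pearson chi-square statistic for the rule A => B.

    n_total: number of transactions; n_a, n_b: transactions containing A, B;
    n_ab: transactions containing both A and B.
    """
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n_total - n_a - n_b + n_ab],
    ]
    row_totals = [n_a, n_total - n_a]
    col_totals = [n_b, n_total - n_b]
    if min(row_totals + col_totals) <= 0:
        raise ValueError("every margin must be positive for the test to be defined")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


def rule_is_significant(n_total, n_a, n_b, n_ab):
    """True if the antecedent and consequent appear dependent at the 0.05 level."""
    return chi_square_2x2(n_total, n_a, n_b, n_ab) > CHI2_CRITICAL_1DF_05
```

A rule can pass typical support and confidence thresholds yet fail this test: with 100 transactions, 50 containing A, 40 containing B and 20 containing both, the confidence of A => B is 0.4, which is exactly the overall frequency of B, so the statistic is zero.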
Examining Committee:
, 2009
Data mining techniques, such as clustering, have become a mainstay in many applications such as bioinformatics, geographic information systems, and marketing. Over the last decade, due to new demands posed by these applications, clustering techniques have been significantly adapted and extended. One such extension is the idea of finding clusters in a dataset that preserve information about some auxiliary variable. These approaches tend to guide the clustering algorithms that are traditionally unsupervised learning techniques with the background knowledge of the auxiliary variable. The auxiliary information could be some prior class label attached to the data samples or it could be the relations between data samples across different datasets. In this dissertation, we consider the latter problem of simultaneously clustering several vector valued datasets by taking into account the relationships between the data samples. We formulate objective functions that can be used to find clusters that are local in each individual dataset and at the same time maximally similar or dissimilar with respect to clusters across datasets. We introduce diverse applications of these clustering algorithms: (1) time series segmentation (2) reconstructing temporal models from time series segmentations (3) simultaneously

