Model selection and accounting for model uncertainty in graphical models using Occam's window
, 1993
We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P-values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism, which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximising predictive ability. However, this has not been used in practice because computing the posterior model probabilities is hard and the number of models is very large (often greater than 10^11). We argue that the standard Bayesian formalism is unsatisfactory and we propose an alternative Bayesian approach that, we contend, takes full account of the true model uncertainty by averaging over a much smaller set of models. An efficient search algorithm is developed for finding these models. We consider two classes of graphical models that arise in expert systems: the recursive causal models and the decomposable
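The idea of averaging over a reduced set of models can be sketched as follows. This is a minimal illustration, not the authors' search algorithm: it assumes the log posterior model probabilities are already available (up to a shared constant), and the function name `average_over_window`, the cutoff `ratio`, and the rule of keeping only models within a fixed factor of the best one are assumptions made for the example.

```python
import math

def average_over_window(log_post, estimates, ratio=20.0):
    """Average per-model estimates over the models whose posterior
    probability is within a factor `ratio` of the best model.

    log_post  -- log posterior model probabilities (shared constant allowed)
    estimates -- quantity of interest estimated under each model
    ratio     -- hypothetical cutoff; models worse than best/ratio are dropped
    """
    best = max(log_post)
    # Keep only models close enough to the best one.
    keep = [i for i, lp in enumerate(log_post) if best - lp <= math.log(ratio)]
    # Renormalise the posterior weights over the retained models.
    weights = [math.exp(log_post[i] - best) for i in keep]
    total = sum(weights)
    averaged = sum(w * estimates[i] for w, i in zip(weights, keep)) / total
    return keep, averaged

if __name__ == "__main__":
    log_post = [math.log(0.6), math.log(0.3), math.log(0.001)]
    estimates = [1.0, 2.0, 10.0]
    print(average_over_window(log_post, estimates))
```

Working with log probabilities and subtracting the maximum before exponentiating keeps the weights numerically stable when the model space is large.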
A decision theoretic framework for approximating concepts
 International Journal of Man-Machine Studies
, 1992
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by the α-cut in fuzzy set theory is obtained. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and probabilistic rough set approximations. Moreover, based on the well-known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
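The two kinds of approximation mentioned above can be illustrated with their textbook set-theoretic forms. This is a sketch of the standard definitions only, not the paper's decision-theoretic derivation; the function names and the toy data are hypothetical.

```python
def alpha_cut(membership, alpha):
    """One-set approximation: elements whose fuzzy membership is at least alpha."""
    return {x for x, mu in membership.items() if mu >= alpha}

def rough_approximations(blocks, concept):
    """Two-set (algebraic rough set) approximation of `concept`.

    blocks  -- equivalence classes partitioning the universe (sets)
    concept -- the set to approximate
    Returns (lower, upper): the union of blocks fully inside the concept,
    and the union of blocks that intersect it.
    """
    lower, upper = set(), set()
    for block in blocks:
        if block <= concept:   # block entirely contained in the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper

if __name__ == "__main__":
    print(alpha_cut({"a": 0.2, "b": 0.7, "c": 0.5}, 0.5))
    print(rough_approximations([{1, 2}, {3, 4}, {5}], {1, 2, 3}))
```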
Unifying Dependent Clustering and Disparate Clustering for Nonhomogeneous Data
Modern data mining settings involve a combination of attribute-valued descriptors over entities as well as specified relationships between these entities. We present an approach to cluster such nonhomogeneous datasets by using the relationships to impose either dependent clustering or disparate clustering constraints. Unlike prior work that views constraints as boolean criteria, we present a formulation that allows constraints to be satisfied or violated in a smooth manner. This enables us to achieve dependent clustering and disparate clustering using the same optimization framework by merely maximizing versus minimizing the objective function. We present results on both synthetic data and several real-world datasets.
SIMULTANEOUSLY SEGMENTING MULTIPLE GENE EXPRESSION TIME COURSES BY ANALYZING CLUSTER DYNAMICS
We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and rearrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant regrouping across segmentation boundaries. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described.
Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation
 J. Amer. Stat. Assoc.
, 1998
In this paper we propose a modification to the methods used to motivate many information and posterior probability criteria for the weighted likelihood case. We derive weighted versions for two of the most widely known criteria, namely the AIC and BIC. Via a simple modification, the criteria are also made useful for window span selection. The usefulness of the weighted versions of these criteria is demonstrated through a simulation study and an application to three data sets.
KEY WORDS: Information Criteria; Posterior Probability Criteria; Model Selection; Local Likelihood.
1. INTRODUCTION
Local regression has become a popular method for smoothing scatterplots and for nonparametric regression in general. It has proven to be a useful tool in finding structure in datasets (Cleveland and Devlin 1988). Local regression estimation is a method for smoothing scatterplots $(x_i, y_i)$, $i = 1, \ldots, n$, in which the fitted value at $x_0$ is the value of a polynomial fit to the data using weighted least squares, where the weight given to $(x_i, y_i)$ is related to the distance between $x_i$ and $x_0$. Stone (1977) shows that estimates obtained using the local regression methods have desirable theoretical properties. Recently, Fan (1993) has studied minimax properties of local linear regression. Tibshirani and Hastie (1987) extend the ideas of local regression to a local likelihood procedure. This procedure is designed for nonparametric regression modeling in situations where weighted least squares is inappropriate as an estimation method, for example binary data. Local regression may be viewed as a special case of local likelihood estimation. Tibshirani and Hastie (1987), Staniswalis (1989), and Loader (1999) apply local likelihood estimation to several types of data where local regressio...
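The local regression estimate described in the introduction can be sketched for the degree-one (local linear) case. This is a minimal illustration under stated assumptions: the tricube weight function, the `bandwidth` parameter, and the function names are choices made for the example, since the text only says that the weights depend on the distance from the target point.

```python
def tricube(u):
    """Tricube kernel (an assumed choice): positive for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(xs, ys, x0, bandwidth):
    """Fitted value at x0 from a weighted least-squares line.

    Each point (x_i, y_i) gets weight tricube((x_i - x0) / bandwidth),
    so points farther from x0 contribute less.
    """
    weights = [tricube((x - x0) / bandwidth) for x in xs]
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        raise ValueError("need at least two distinct points inside the window")
    # Closed-form solution of the weighted normal equations for a line.
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    print(local_linear_fit(xs, ys, 2.5, 2.0))
```

The bandwidth plays the role of the window span, which is the quantity the weighted criteria in the abstract are meant to help select.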
An Information Theoretic Framework for Exploratory Multivariate Market Segmentation Research
 Decision Sciences
, 1991
State-of-the-art market segmentation often involves simultaneous consideration of multiple and overlapping variables. These variables are studied to assess their relationships, select a subset of variables which best represent the subgroups (segments) within a market, and determine the likelihood of membership of a given individual in a particular segment. Such information, obtained in the exploratory phase of a multivariate market segmentation study, leads to the construction of more parsimonious models. These models have less stringent data requirements while facilitating substantive evaluation to aid marketing managers in formulating more effective targeting and positioning strategies within different market segments. This paper utilizes the information-theoretic (IT) approach to address several issues in multivariate market segmentation studies. A marketing data set analyzed previously is employed to examine the suitability and usefulness of the proposed approach [12]. Some useful extensions of the IT framework and its applications are also discussed.
SEQUENTIAL CATEGORY AGGREGATION AND PARTITIONING APPROACHES FOR MULTIWAY CONTINGENCY TABLES BASED ON SURVEY AND CENSUS DATA
, 2007
Large contingency tables arise in many contexts but especially in the collection of survey and census data by government statistical agencies. Because the vast majority of the variables in this context have a large number of categories, agencies and users need a systematic way of constructing tables which are summaries of such contingency tables. We propose such an approach in this paper by finding members of a class of restricted loglinear models which maximize the likelihood of the data and use this to find a parsimonious means of representing the table. In contrast with more standard approaches for model search in hierarchical loglinear models (HLLM), our procedure systematically reduces the number of categories of the variables. Through a series of examples, we illustrate the extent to which it can preserve the interaction structure found with HLLMs and be used as a data simplification procedure prior to HLL modeling. A feature of the procedure is that it can easily be applied to many tables with millions of cells, providing a new way of summarizing large data sets in many disciplines. The focus is on information and description rather than statistical testing. The procedure may treat each variable in the table in different ways, preserving full detail, treating it as fully nominal, or preserving ordinality.
An Empirical Study of Qualities of Association Rules from a Statistical View Point
Minimum support and confidence have been used as criteria for generating association rules in all association rule mining algorithms. These criteria have natural appeal, such as simplicity, and few researchers have questioned the quality of the generated rules. In this paper, we examine the rules from a more rigorous point of view by conducting statistical tests. Specifically, we use contingency tables and the chi-square test to analyze the data. Experimental results show that one third of the association rules derived from the support and confidence criteria are not significant, that is, the antecedent and consequent of the rules are not correlated. This indicates that minimum support and minimum confidence do not provide adequate discovery of meaningful associations. The chi-square test can be considered as an enhancement or an alternative solution.
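The kind of check described here can be sketched for a single rule A => B using a 2x2 contingency table. This is an illustrative sketch, not the paper's experimental code: the function name, the argument layout, and the example counts are hypothetical, and it assumes all row and column totals are nonzero.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at the 5% significance level.
CRITICAL_95_DF1 = 3.841

def chi_square_2x2(n, n_a, n_b, n_ab):
    """Pearson chi-square statistic for the rule A => B.

    n    -- total number of transactions
    n_a  -- transactions containing the antecedent A
    n_b  -- transactions containing the consequent B
    n_ab -- transactions containing both A and B
    """
    observed = [
        [n_ab, n_a - n_ab],                  # A present: B present / absent
        [n_b - n_ab, n - n_a - n_b + n_ab],  # A absent:  B present / absent
    ]
    row_totals = [n_a, n - n_a]
    col_totals = [n_b, n - n_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of A and B.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    stat = chi_square_2x2(100, 50, 50, 40)
    print(stat, stat > CRITICAL_95_DF1)
```

A rule can pass support and confidence thresholds and still yield a statistic below the critical value, which is the situation the abstract reports for a third of the rules.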
Adaptive Statistical Language Modelling
, 1994
The trigram statistical language model is remarkably successful when used in such applications as speech recognition. However, the trigram model is static in that it only considers the previous two words when making a prediction about a future word. The work presented here attempts to improve upon the trigram model by considering additional contextual and longer distance information. This is frequently referred to in the literature as adaptive statistical language modelling because the model is thought of as adapting to the longer term information. This work considers the creation of topic-specific models, statistical evidence from the presence or absence of triggers, or related words, in the document history (document triggers) and in the current sentence (in-sentence triggers), and the incorporation of the document cache, which predicts the probability of a word by considering its frequency in the document history. An important result of this work is that the presence of self-triggers, that is, whether or not the word itself occurred in the document history, is an extremely important piece of information. A maximum entropy (ME) approach will be used in many instances to incorporate information from different sources. Maximum entropy considers a model which maximizes entropy while satisfying the constraints presented by the information we wish to incorporate. The generalized iterative scaling (GIS) algorithm can be used to compute the maximum entropy solution. This work also considers various methods of smoothing the information in a maximum entropy model. An important result is that smoothing improves performance noticeably and that Good-Turing discounting is an effective method of smoothing. Thesis Supervisor: Victor Zue Title: Principal Research Scientist, Departme...
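The document cache idea can be sketched in a few lines. This is a simplified illustration rather than the thesis's model: the cache estimate is the plain relative frequency of the word in the history, and combining it with a static probability by linear interpolation (with the hypothetical weight `lam`) is an assumption made for the example, whereas the thesis combines information sources through maximum entropy.

```python
from collections import Counter

def cache_probability(word, history):
    """Relative frequency of `word` in the document history (0.0 if empty)."""
    if not history:
        return 0.0
    return Counter(history)[word] / len(history)

def interpolate(static_prob, cache_prob, lam=0.9):
    """Assumed combination rule: weight lam on the static model, 1 - lam on the cache."""
    return lam * static_prob + (1.0 - lam) * cache_prob

if __name__ == "__main__":
    history = ["the", "cat", "the"]
    p_cache = cache_probability("the", history)
    print(p_cache, interpolate(0.1, p_cache, lam=0.5))
```

A nonzero cache probability also corresponds to the self-trigger signal mentioned above: the word has already occurred in the document history.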