Implications of the Dirichlet assumption for discretization of continuous variables in naive Bayesian classifiers
- Machine Learning
, 2003
Cited by 3 (0 self)
Abstract. In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. Among these properties, the most important one is “perfect aggregation,” which allows us to explain why discretization works for a naive Bayesian classifier. Since perfect aggregation holds for Dirichlets, we can explain that in general, discretization can outperform parameter estimation assuming a normal distribution. In addition, we can explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and binlogl, can perform well with insignificant difference. We designed experiments to verify our explanation using synthesized and real data sets and showed that in addition to well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
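As an illustration of the "perfect aggregation" property named in the abstract above, the following is a minimal sketch and not the paper's code. It assumes equal-width ("ten-bin") discretization and a symmetric Dirichlet prior with a hypothetical pseudo-count `alpha` per bin. All function names are invented for this example. The point it shows is that the posterior mean probability of a merged interval equals the sum of the posterior means of the bins it covers.

```python
def equal_width_params(values, bins=10):
    """Return (lowest value, bin width) for equal-width discretization."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return lo, width


def bin_index(x, lo, width, bins=10):
    """Map a continuous value to a bin index, clamped to [0, bins - 1]."""
    idx = int((x - lo) / width)
    return min(max(idx, 0), bins - 1)


def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of each bin probability under a symmetric Dirichlet prior."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def interval_probability(counts, first, last, alpha=1.0):
    """Posterior mean probability of the merged interval of bins first..last.

    Merging bins of a Dirichlet gives a Dirichlet whose parameter is the sum
    of the merged parameters, so counts and pseudo-counts are simply added.
    """
    total = sum(counts) + alpha * len(counts)
    merged = sum(counts[first:last + 1]) + alpha * (last - first + 1)
    return merged / total


# Example: per-bin counts of one attribute within one class (made-up numbers).
counts = [3, 0, 2, 5]
per_bin = dirichlet_posterior_mean(counts)
merged = interval_probability(counts, 1, 2)
# The merged estimate matches the sum of the per-bin estimates.
print(merged, sum(per_bin[1:3]))
```

Because the two quantities agree, the estimate for an interval does not depend on how the rest of the range is partitioned, which is consistent with the abstract's claim that many discretization schemes perform similarly.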
Constructing and Evaluating Sensor-Based Statistical Models of Human Interruptability
, 2006
C.-H.: Cbs: A new classification method by using sequential patterns
- In: SDM 2005: Proc. of the 2005 SIAM International Data Mining Conference
, 2005
Cited by 3 (0 self)
Abstract. Data classification is an important topic in the data mining field because of its wide range of applications. A number of related methods have been proposed based on well-known learning models such as decision trees and neural networks. However, these kinds of classification methods may not perform well when mining time-sequence datasets such as time-series gene expression data. In this paper, we propose a new data mining method, namely Classify-By-Sequence (CBS), for classifying large time-series datasets. The main idea of the CBS method is to integrate sequential pattern mining with probabilistic induction so that the inherent sequential patterns can be extracted efficiently and the classification task can be done more accurately. The CBS method also has the merit of being simple to implement. In experimental evaluation, the CBS method is shown to greatly outperform other methods in classification accuracy.
Dynamic Bayesian Networks for Classification of Business Cycles
, 1999
Cited by 2 (0 self)
We use Dynamic Bayesian networks to classify business cycle phases. We compare classifiers generated by learning the Dynamic Bayesian network structure on different sets of admissible network structures. Included are sets of network structures of the Tree Augmented Naive Bayes (TAN) classifiers of Friedman, Geiger, and Goldszmidt (1997) adapted for dynamic domains. The performance of the developed classifiers on the given data was modest.
Combining Statistics and Semantics for Word and Document Clustering
- In Ontology Learning Workshop, IJCAI’01
, 2001
Cited by 2 (1 self)
A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity reflects both statistical and semantic properties, respectively detected through Latent Semantic Analysis and WordNet.
Methodological Note On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
, 1996
Cited by 1 (0 self)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than for others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.
Model Trees for Hybrid Data Type Classification
Abstract. In the task of classification, most learning methods are suitable only for certain data types. For a hybrid dataset consisting of nominal and numeric attributes, some attributes must be transformed into the appropriate types before the learning algorithms can be applied. This procedure could damage the nature of the dataset. We propose a model tree approach that integrates several characteristically different learning algorithms to solve the classification problem. We employ the decision tree as the classification framework and incorporate support vector machines into the tree construction process. This design removes the discretization procedure usually necessary for tree construction and provides powerful multivariate decisions. Experiments show that our proposed method performs better than other competing methods.
Induction of Shallow Decision Trees
In this paper we describe efficient algorithms that induce shallow (i.e., low depth) decision trees. A key feature of these algorithms is their ability to induce decision trees over real-valued data that have multiple branches at each node (in contrast to algorithms that use binary splits). As a special case, we describe efficient algorithms for computing the optimal partitioning of one dimensional data. These algorithms can be used to discretize numerical datasets either for decision trees or other machine learning methods. We examine the empirical performance of our algorithms on several benchmarks. Several of the algorithms can be shown theoretically to converge quickly to the best fixed-depth decision tree. We also present algorithms to learn "exact" decision trees of minimal depth that correctly classify all training examples. This topic opens up some interesting possibilities for detecting simple patterns in data mining applications.
Orange and Decisions-at-Hand: Bridging Predictive Data Mining and Decision Support
Data mining is often used to develop predictive models from data, but rarely addresses how these models are to be employed. To use the constructed model, the user is usually required to run an often complex data mining suite in which the model has been constructed. A better mechanism for the communication of resulting models and less complex, easy to use tools for their employment are needed. We propose a technological solution to the problem, where a predictive model is encoded in XML and then used through a Web- or Palm handheld-based decision support shell. This schema supports developer-to-user and user-to-user communication. To facilitate the communication between the developers we advocate the use of data mining scripts.
Predicting Web Information Content
In this paper, we propose a novel method to infer the web user's Information Content (IC), which is the information that the user must examine to complete her task. In particular, our method learns to predict which words (called IC-words) will be in these essential web pages (IC-pages). We first collected relevant training data using an empirical study, where users explicitly identified which pages were IC-pages. We then examined page-content information from these clickstreams to determine "browsing properties" of each individual word, i.e., how often it was in the title of a page in each session, or in the anchor to a page that was followed, or in a link that was skipped, etc. This training data also labeled each word as an IC-word or not. We used this to train a classifier to identify the browsing properties associated with IC-words. Note that this classifier can predict which words are IC-words given any page sequence, even if those pages are in websites that have not been visited previously.

