Unsupervised learning of finite mixture models
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
Cited by 201 (16 self)
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.
Index Terms: Finite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering.
MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions
- Statistics and Computing
, 2000
Cited by 29 (8 self)
Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference
Active Virtual Network Management Prediction
- In Parallel and Discrete Event Simulation Conference (PADS) '99
, 1999
Cited by 21 (10 self)
Active Networking provides a framework in which executable code within data packets can execute upon intermediate network nodes. Active Virtual Network Management Prediction (AVNMP) provides a network prediction service that utilizes the capability of Active Networks to easily inject fine-grained models into the communication network to enhance network performance. The models injected into the network allow state to be predicted and propagated throughout an active network enabling the network to operate simultaneously in real time and in the future. State information such as load, security intrusion, mobile location, faults, and other state information found in typical Management Information Bases (MIB) is available for use by the management system both with current values and with values expected to exist in the future. Implementing a load prediction and CPU prediction application has experimentally validated AVNMP. AVNMP implements a distributed, active, and truly proactive network management system. Active Networking enables the implementation of new concepts utilized in AVNMP such as the ability to quickly and easily inject models into a network. In addition, Active Networking enables the ability of messages to refine their prediction as they travel through the network as well as several enhancements to the basic AVNMP algorithm, including migration of AVNMP components and reduction in overhead by means of message fusion.
Information Assurance through Kolmogorov Complexity
, 2001
Cited by 19 (7 self)
The problem of Information Assurance is approached from the point of view of Kolmogorov Complexity and Minimum Message Length criteria. Several theoretical results are obtained, possible applications are discussed and a new metric for measuring complexity is introduced. Utilization of Kolmogorov Complexity-like metrics as conserved parameters to detect abnormal system behavior is explored. Data and process vulnerabilities are put forward as two different dimensions of vulnerability that can be discussed in terms of Kolmogorov Complexity. Finally, these results are utilized to conduct complexity-based vulnerability analysis.
1. Introduction
Information security (or lack thereof) is too often dealt with after security has been lost. Back doors are opened, Trojan horses are placed, passwords are guessed and firewalls are broken down -- in general, security is lost as barriers to hostile attackers are breached and one is put in the undesirable position of detecting and patching holes. In ...
Grammar Model-based Program Evolution
- In Proceedings of the 2004 IEEE Congress on Evolutionary Computation
, 2004
Cited by 19 (1 self)
In Evolutionary Computation, genetic operators, such as mutation and crossover, are employed to perturb individuals to generate the next population. However, these fixed, problem-independent genetic operators may destroy the subsolutions, usually called building blocks, instead of discovering and preserving them. One way to overcome this problem is to build a model based on the good individuals, and sample this model to obtain the next population. There is a wide range of such work in Genetic Algorithms
Privacy issues in knowledge discovery and data mining
- In Proc. of Australian Institute of Computer Ethics Conference (AICEC99)
, 1999
Cited by 13 (0 self)
Recent developments in information technology have enabled collection and processing of vast amounts of personal data, such as criminal records, shopping habits, credit and medical history, and driving records. This information is undoubtedly very useful in many areas, including medical research, law enforcement and national security. However, there is an increasing public concern about individuals' privacy. Privacy is commonly seen as the right of individuals to control information about themselves. The appearance of technology for Knowledge Discovery and Data Mining (KDDM) has revitalized concern about the following general privacy issues: • secondary use of personal information, • handling misinformation, and • granulated access to personal information. They demonstrate that existing privacy laws and policies are well behind the developments in technology, and no longer offer adequate protection. We also discuss new privacy threats posed by KDDM, which include massive data collection, data warehouses, statistical analysis and deductive learning techniques. KDDM uses vast amounts of data to generate hypotheses and discover general patterns. KDDM poses the following new challenges to privacy: • stereotypes, • guarding personal data from KDDM researchers, • individuals from training sets, and • combination of patterns. We discuss the possible solutions and their impact on the quality of discovered patterns.
An algorithm for the unsupervised learning of morphology
- Natural Language Engineering
, 2006
Cited by 11 (2 self)
This paper describes in detail an algorithm for the unsupervised learning of natural language morphology, with emphasis on challenges that are encountered in languages typologically similar to European languages. It utilizes the Minimum Description Length analysis described in Goldsmith 2001 and has been implemented in software that is available for downloading and testing.
1. Scope of this paper
This paper describes in detail an algorithm used for the unsupervised learning of natural language morphology which works well for European languages and other languages in which the average number of morphemes per word is not too high. It has been implemented and tested in Linguistica, and is based on the theoretical principles described in Goldsmith 2001. The present paper describes that framework briefly, but the reader is referred there for a more careful development. The executable for this program, and the source code as well, is available at
Discussion on Kolmogorov Complexity and Statistical Analysis
, 1999
Cited by 10 (0 self)
equality (1) could be explained as follows: any object $x \in A$ has a two-part description. The first part is (a description of a) program $p$. The second part is the number of $x$ in the enumeration of $A$ (the element that appears first has number 1, the next element has number 2, etc.). The first part requires $K(p)$ bits. The second part requires at most $\log_2 |A|$ bits. (Additional $O(\log n)$ bits are needed to form a pair; we omit the details.) We are interested in "efficient" two-part descriptions for which the inequality (1) is close to equality. For any string $x$ there are many efficient descriptions. Here are two "extreme" examples: (a) The set $A$ consists of $x$ only: $A = \{x\}$; the program $p$ that enumerates
Univariate Polynomial Inference by Monte Carlo Message Length Approximation
- in Int. Conf. Machine Learning
, 2002
Cited by 9 (5 self)
We apply the Message from Monte Carlo (MMC) algorithm to inference of univariate polynomials. MMC is an algorithm for point estimation from a Bayesian posterior sample.
Suboptimal behavior of Bayes and MDL in classification under misspecification
- COLT
, 2004
Cited by 9 (1 self)
We show that forms of Bayesian and MDL inference that are often applied to classification problems can be inconsistent. This means that there exists a learning problem such that for all amounts of data the generalization errors of the MDL classifier and the Bayes classifier relative to the Bayesian posterior both remain bounded away from the smallest achievable generalization error. From a Bayesian point of view, the result can be reinterpreted as saying that Bayesian inference can be inconsistent under misspecification, even for countably infinite models. We extensively discuss the result from both a Bayesian and an MDL perspective.

