Fast Full-Search Equivalent Nearest-Neighbour Search Algorithms
, 1999
Abstract

A fundamental activity common to many image processing, pattern classification, and clustering algorithms involves searching a set of n k-dimensional data items for the one that is nearest to a given target item with respect to a distance function. Our goal is to find fast search algorithms that are full-search equivalent, that is, the resulting match is as good as what we could obtain if we were to search the set exhaustively. We propose a framework made up of three components, namely (i) a technique for obtaining a good initial match, (ii) an inexpensive method for determining whether the current match is a full-search equivalent match, and (iii) an effective technique for improving the current match. Our approach is to consider good solutions for each component in order to find an algorithm that balances the overall complexity of the search. We also propose a technique for hierarchical ordering and cluster elimination using a minimal cost spanning tree. Our experiments on vector quantisation coding of images show that the framework and techniques we propose can be used to construct suitable algorithms for most of our data sets, which require full-search equivalent matches, at an average arithmetic cost of less than O(k log n) while using only O(n) space.
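To make the notion of a full-search equivalent match concrete, the following is a minimal Python sketch, not the authors' algorithm. It substitutes a well-known technique, partial distance elimination, for components (ii) and (iii), and uses an assumed heuristic (closest component sum) for the initial match in component (i). The function names and the squared Euclidean distance are assumptions made for illustration; the abstract does not specify them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_full_search(codebook, target):
    """Exhaustive baseline: index of the codeword nearest to target."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], target))


def nearest_partial_distance(codebook, target):
    """Full-search equivalent search using partial distance elimination.

    Assumes a non-empty codebook of vectors with at least one component.
    """
    # (i) Initial match (assumed heuristic): the codeword whose component
    # sum is closest to the component sum of the target.
    t_sum = sum(target)
    best = min(range(len(codebook)),
               key=lambda i: abs(sum(codebook[i]) - t_sum))
    best_d = squared_distance(codebook[best], target)

    # (ii)/(iii) Scan the remaining codewords; abandon a candidate as soon
    # as its accumulated distance can no longer beat the current best.
    for i, codeword in enumerate(codebook):
        if i == best:
            continue
        partial = 0
        for x, y in zip(codeword, target):
            partial += (x - y) ** 2
            if partial >= best_d:
                break  # cannot improve on the current match
        else:
            # Loop finished without a break: strictly better match found.
            best, best_d = i, partial
    return best
```

Because a candidate is rejected only when its partial distance already equals or exceeds the best distance found so far, the returned match has the same distance as the exhaustive result; a good initial match simply makes the rejections happen earlier.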
On the Integration of Learning, Logical Deduction and Probabilistic Inductive Inference
, 1991
Abstract

This paper introduces the conjecture that many kinds of cognition and computing may usefully be seen as a search for efficiency in information, where efficiency is defined in terms of Shannon's (1949) concept of redundancy in information. A prototype of a new kind of computing system which is based on this theory is described in outline. Examples from the prototype are presented showing how a search for efficiency in information may achieve autonomous inductive learning, logical deduction and probabilistic inductive inference.

2 INTRODUCTION

The nascent field of inductive logic programming has been defined by Muggleton (1990) as "the intersection of Logic Programming and Machine Learning". Within this field, two themes may be distinguished:

. Most attention is being paid at present to ways in which logic programs may be created by automatic or semi-automatic induction.
. Some attention has been given to the possibility that inductive learning and logical deduction may be integrated. ...
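The abstract above defines efficiency in terms of Shannon's concept of redundancy. As a rough illustration only, the sketch below computes the standard first-order redundancy of a symbol sequence, 1 - H/H_max, where H is the empirical entropy and H_max = log2 of the alphabet size. The function name and the choice of a first-order (single-symbol) model are assumptions; the paper's prototype may measure redundancy quite differently.

```python
import math
from collections import Counter


def redundancy(symbols, alphabet_size):
    """First-order Shannon redundancy, 1 - H / log2(alphabet_size).

    symbols: non-empty sequence of hashable symbols.
    alphabet_size: number of possible symbols (must be at least 2).
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    counts = Counter(symbols)
    n = len(symbols)
    # Empirical entropy in bits per symbol.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)
```

A sequence that uses both symbols of a binary alphabet equally often, such as "abab", has redundancy 0 under this measure, while a constant sequence such as "aaaa" has redundancy 1.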
Bayesian Posterior Comprehension via Message from Monte Carlo
, 2003
Abstract

We discuss the problem of producing an epitome, or brief summary, of a Bayesian posterior distribution, and then investigate a general solution based on the Minimum Message Length (MML) principle. Clearly, the optimal criterion for choosing such an epitome is determined by the epitome's intended use. The interesting general case is where this use is unknown since, in order to be practical, the choice of epitome criterion becomes subjective. We identify a number of desirable properties that an epitome could have: facilitation of point estimation, human comprehension, and fast approximation of posterior expectations. We call these the properties of Bayesian Posterior Comprehension and show that the Minimum Message Length principle can be viewed as an epitome criterion that produces epitomes having these properties. We then present and extend Message from Monte Carlo as a means for constructing instantaneous Minimum Message Length codebooks (and epitomes) using Markov Chain Monte Carlo methods. The Message from Monte Carlo methodology is illustrated for binary regression, generalised linear model, and multiple change-point problems.
Unsupervised Classification With Stochastic Complexity
, 1992
Abstract

In this paper we are particularly interested in quadratic cluster boundaries, which amounts to selecting the model class of multivariate normal densities. We need to calculate the code length L(X | c)
5th International Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP2000)
, 2000
Abstract
... Abstraction and representation of repeated patterns in high-frequency data
8. Y.L. O: Analysis of primary care data
9. K.M. de Oliveira, A.A. Ximenes, S. Matwin, G. Travassos and A.R. Rocha: A generic architecture for knowledge acquisition tools in cardiology
10. P. Perner: Mining knowledge in X-ray images for lung cancer
11. A. Smith and S.S. Anand: Patient survival estimation with multiple attributes: Adaptation of Cox's regression to give an individual's point prediction
12. W. Stuhlinger, O. Hogl, H. Stoyan and M. Muller: Intelligent data mining for medical quality management
13. K. Vikkii, E. Kentala, M. Juhola and I. Pyykko: Confounding values in decision trees constructed for six otoneurological diseases
14. C. Wroe, W.D. Solomon, A.L. Rector and J.E. Roge...
Mining a database of Fungi for Pharmacological Use via Minimum Message Length Encoding
, 2000
Abstract
This paper concerns the use of fungi in pharmaceutical design. More specifically, this research involves mining a database of fungi to determine which ones have waste products that are unusual in their spectral fingerprints, and therefore worth being tested for medicinal properties. The technique described in this paper involves Minimum Message Length encoding. Minimum Message Length (sometimes called Minimum Description Length) encoding is a method of choosing a binary coding for a set of data. The method's goal is to use the frequency of occurrence of each data point to ensure that frequently occurring data are given short codes. Minimum Message Length encoding provides a solution that is optimal in the sense that if the entire data set is employed in the encoding, then the code generated will have the property that no other unambiguous prefix code will provide a shorter encoded version of the entire set. In this paper, the process is turned on its head. The problem that is addres...
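The idea of giving frequently occurring data short binary codes can be illustrated with Huffman coding, the classic construction of an optimal prefix code from symbol frequencies. The sketch below is an assumption made for illustration: the abstract does not say which coding procedure the paper uses, and the function name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Return a dict mapping each symbol in data to a binary code string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}

    # Heap entries are (count, tie_breaker, partial code table); the unique
    # tie_breaker ensures the dicts themselves are never compared.
    heap = [(count, idx, {sym: ""})
            for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1

    return heap[0][2]
```

For the input "aaaabbc", the most frequent symbol "a" receives a one-bit code and the rarer symbols "b" and "c" receive two-bit codes, so the whole sequence encodes in 10 bits.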
AUTOMATIC CLASSIFICATION
Abstract
In this chapter I shall attempt to present a coherent account of classification in such a way that the principles involved will be sufficiently understood for anyone wishing to use classification techniques in IR to do so without too much difficulty. The emphasis will be
Advance Access publication on June 18, 2008 doi:10.1093/comjnl/bxm117
Abstract
One of the second generation of computer scientists, Chris Wallace completed his tertiary education in 1959 with a Ph.D. in nuclear physics, on cosmic ray showers, under Dr Paul George at Sydney University. Needless to say, computer science was not, at that stage, an established academic discipline. With Max Brennan and John Malos he had designed and built a large automatic data logging system for recording cosmic ray air shower events, and with Max Brennan he also developed a complex computer programme for Bayesian analysis of cosmic ray events on the recently installed SILLIAC computer. Appointed lecturer in Physics at Sydney in 1960, he was sent almost immediately to the University of Illinois to copy the design of ILLIAC II, a duplicate of which was to be built at Sydney. ILLIAC II was not in fact completed at that stage and, after an initial less than warm welcome by a department who seemed unsure exactly what this Australian was doing in their midst, his talents were recognized and he was invited to join their staff (under very generous conditions) to assist in ILLIAC II design. He remained there for two years, helping in particular to design the input-output channels and aspects of the advanced control unit (first stage pipeline). In the event, Sydney decided it would be too expensive to build a copy of ILLIAC II, although a successful copy (the Golem) was built in Israel using circuit designs developed by Wallace and Ken Smith. In spite of the considerable financial and academic inducements to remain in America, Wallace returned to Australia after three months spent in England familiarizing himself with the KDF9 computer being purchased by Sydney University to replace SILLIAC. Returning to the School of Physics he joined the Basser