Results 1 - 4 of 4
Linear Concepts and Hidden Variables
2000
Abstract

Cited by 21 (15 self)
We study a learning problem which allows for a "fair" comparison between unsupervised learning methods (probabilistic model construction) and more traditional algorithms that directly learn a classification. The merits of each approach are intuitively clear: inducing a model is more expensive computationally, but may support a wider range of predictions. Its performance, however, will depend on how well the postulated probabilistic model fits the data. To compare the paradigms we consider a model which postulates a single binary-valued hidden variable on which all other attributes depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn the model with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly.
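The reduction the abstract mentions — MAP inference for one variable in a two-valued hidden-variable model with conditionally independent binary attributes collapses to a linear threshold test — can be sketched as follows. This is a minimal illustration; the function and parameter names are our own, not the paper's notation.

```python
import math

def linear_weights(prior_z1, p_given_z1, p_given_z0):
    """Return (bias, weights) so that sign(bias + w.x) is the MAP value
    of the hidden variable z, given Bernoulli parameters for each
    observed attribute under z=1 and z=0."""
    bias = math.log(prior_z1 / (1 - prior_z1))
    weights = []
    for p1, p0 in zip(p_given_z1, p_given_z0):
        # an attribute contributes log(p1/p0) when x_i = 1
        # and log((1-p1)/(1-p0)) when x_i = 0; folding the x_i = 0
        # term into the bias leaves a purely linear score in x
        weights.append(math.log(p1 / p0) - math.log((1 - p1) / (1 - p0)))
        bias += math.log((1 - p1) / (1 - p0))
    return bias, weights

def map_z(bias, weights, x):
    """Most likely value of z given the observed 0/1 attribute vector x."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0
```

With attributes strongly correlated with z (say, each true with probability 0.9 under z=1 and 0.1 under z=0), the threshold test recovers z from the observations.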
Linear concepts and hidden variables: An empirical study
In Neural Information Processing Systems, 1998
Abstract

Cited by 1 (1 self)
Some learning techniques for classification tasks work indirectly, by first trying to fit a full probabilistic model to the observed data. Whether this is a good idea or not depends on the robustness with respect to deviations from the postulated model. We study this question experimentally in a restricted, yet nontrivial and interesting case: we consider a conditionally independent attribute (CIA) model which postulates a single binary-valued hidden variable z on which all other attributes (i.e., the target and the observables) depend. In this model, finding the most likely value of any one variable (given known values for the others) reduces to testing a linear function of the observed values. We learn CIA with two techniques: the standard EM algorithm, and a new algorithm we develop based on covariances. We compare these, in a controlled fashion, against an algorithm (a version of Winnow) that attempts to find a good linear classifier directly. Our conclusions help delimit the frag...
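The direct competitor in both studies is Winnow, a multiplicative-update linear learner. A minimal sketch of the standard algorithm follows; the promotion factor, threshold, and number of passes here are illustrative defaults, not the paper's experimental configuration.

```python
def winnow_train(examples, n, alpha=2.0, passes=10):
    """Winnow over n boolean attributes.
    examples: list of (x, y) pairs with x a 0/1 list of length n, y in {0, 1}.
    Returns learned weights and the decision threshold."""
    w = [1.0] * n
    theta = float(n)  # the standard threshold of n
    for _ in range(passes):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
            if pred != y:
                # multiplicative update on the active attributes only:
                # promote on a false negative, demote on a false positive
                for i in range(n):
                    if x[i]:
                        w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w, theta
```

On a target that really is a linear (e.g. disjunctive) concept, a few passes suffice; Winnow's appeal in these comparisons is its mistake bound that grows only logarithmically with the number of irrelevant attributes.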
Applying Machine Learning for Ensemble
Abstract
The problem of predicting the outcome of a conditional branch instruction is a prerequisite for high performance in modern processors. It has been shown that combining different branch predictors can yield more accurate prediction schemes, but the existing research only examines selection-based approaches, where one predictor is chosen without considering the actual predictions of the available predictors. The machine learning literature contains many papers addressing the problem of predicting a binary sequence in the presence of an ensemble of predictors or experts. We show that the Weighted Majority algorithm applied to an ensemble of branch predictors yields a prediction scheme that results in a 5-11% reduction in mispredictions. We also demonstrate that a variant of the Weighted Majority algorithm that is simplified for efficient hardware implementation still achieves misprediction rates that are within 1.2% of the ideal case.
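The Weighted Majority combination the abstract describes can be sketched in a few lines. This is an illustrative software model of the classic algorithm, not the paper's hardware-simplified variant; the demotion factor beta and the expert setup in the usage below are assumed values.

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Combine an ensemble of experts (stand-ins for branch predictors).
    expert_preds: one list of 0/1 expert predictions per branch.
    outcomes: the true 0/1 outcome of each branch.
    Returns the combined prediction made at each step."""
    w = [1.0] * len(expert_preds[0])
    combined = []
    for preds, y in zip(expert_preds, outcomes):
        # weighted vote: predict taken iff the taken side carries more weight
        vote_taken = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_not = sum(wi for wi, p in zip(w, preds) if p == 0)
        combined.append(1 if vote_taken >= vote_not else 0)
        # multiplicatively demote every expert that mispredicted this branch
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return combined
```

With one always-correct expert in the pool, the combined predictor's mistakes are bounded: misleading experts lose weight geometrically, so the vote quickly tracks the reliable one.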
Microarchitecture for Billion-Transistor VLSI Superscalar Processors
2002
Abstract
The vast computational resources in billion-transistor VLSI microchips can continue to be used to build aggressively clocked uniprocessors for extracting large amounts of instruction-level parallelism. This dissertation addresses the problems of implementing wide-issue, out-of-order execution, superscalar processors capable of handling hundreds of in-flight instructions. The specific issues covered by this dissertation are the critical circuits that comprise the superscalar core, the increasing level-one data cache latency, the need for more accurate branch prediction to keep such a large processor busy, and the difficulty in quickly evaluating such complex processor designs. Using scalable circuit designs, large instruction windows may be implemented with fast clock speeds. We design and optimize the critical circuits in a superscalar execution core. At comparable clock speeds, an instruction window implemented with our circuits can simultaneously wake up and schedule 128 instructions, compared to only twenty instructions in the Alpha 21264. Augmenting our processor with clustered, speculative Level Zero (L0) data caches provides fast accesses to the data cache despite the increasing distance across the core to the Level One cache. Large superscalar execution cores of future processors may take up so much area that a load from memory requires multiple cycles to propagate across the core, access the cache, and propagate the result back. Multiple L0 caches ...