Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space
Journal of Machine Learning Research, 2003
Cited by 51 (1 self)
We present a novel and flexible approach to the problem of feature selection, called grafting. Rather than considering feature selection as separate from learning, grafting treats the selection of suitable features as an integral part of learning a predictor in a regularized learning framework. To make this regularized learning process sufficiently fast for large scale problems, grafting operates in an incremental iterative fashion, gradually building up a feature set while training a predictor model using gradient descent. At each iteration, a fast gradient-based heuristic is used to quickly assess which feature is most likely to improve the existing model; that feature is then added to the model, and the model is incrementally optimized using gradient descent. The algorithm scales linearly with the number of data points and at most quadratically with the number of features. Grafting can be used with a variety of predictor model classes, both linear and non-linear, and can be used for both classification and regression. Experiments are reported here on a variant of grafting for classification, using both linear and non-linear models, and using a logistic regression-inspired loss function. Results on a variety of synthetic and real world data sets are presented. Finally, the relationship between grafting, stagewise additive modelling, and boosting is explored.
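The incremental loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the selection rule or the regularizer, so the L1 penalty, the threshold `lam`, the step size `lr`, and the function names `loss_gradient` and `graft` are all assumptions made for the example. A feature is added when the magnitude of its loss gradient exceeds `lam`, and the active weights are then refit by plain gradient descent on a logistic loss.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_gradient(X, y, w):
    """Gradient of the mean logistic loss w.r.t. every weight (labels in {0, 1})."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j in range(d):
            grad[j] += err * xi[j] / n
    return grad


def graft(X, y, lam=0.05, lr=0.5, steps=200, max_features=None):
    """Greedy feature selection interleaved with gradient descent.

    Assumption (not from the abstract): an L1 penalty of strength `lam`,
    so a new feature is only worth adding if |gradient| > lam.
    """
    d = len(X[0])
    limit = d if max_features is None else min(d, max_features)
    w = [0.0] * d
    active = []
    while len(active) < limit:
        grad = loss_gradient(X, y, w)
        candidates = [j for j in range(d) if j not in active]
        # Heuristic: the inactive feature with the largest gradient magnitude.
        best = max(candidates, key=lambda j: abs(grad[j]))
        if abs(grad[best]) <= lam:
            break  # no remaining feature is expected to pay for its penalty
        active.append(best)
        # Incrementally re-optimize only the active weights.
        for _ in range(steps):
            grad = loss_gradient(X, y, w)
            for j in active:
                sign = (w[j] > 0) - (w[j] < 0)  # subgradient of |w_j|
                w[j] -= lr * (grad[j] + lam * sign)
    return active, w
```

Calling `graft(X, y)` on a list of feature rows `X` and 0/1 labels `y` returns the indices of the selected features in the order they were added, together with the weight vector; weights of features never selected stay at zero.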
Probabilistic Analysis and Scheduling of Critical Soft Real-Time Systems
1999
Cited by 8 (0 self)
In addition to correctness requirements, a real-time system must also meet its temporal constraints, often expressed as deadlines. We call safety- or mission-critical real-time systems that may miss some deadlines critical soft real-time systems, to distinguish them from hard real-time systems, where all deadlines must be met, and from soft real-time systems, which are not safety or mission critical. The performance of a critical soft real-time system is acceptable as long as the deadline miss rate is below an application-specific threshold. Architectural features of computer systems, such as caches and branch prediction hardware, are designed to improve average performance. Deterministic real-time design and analysis approaches require that such features be disabled to increase predictability. Alternatively, allowances must be made for their effects by designing for the worst case. Either approach leads to a decrease in average performance. Since critical soft real-time systems do not require that all deadlines be met, average performance can be improved by adopting a probabilistic approach. In order to allow a trade-off between deadlines met and average
Quantifiable Data Mining Using Ratio Rules
Cited by 8 (1 self)
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules (Agrawal, Imielinski, & Swami, 1993b; Srikant & Agrawal, 1996). We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the "goodness" of a set of discovered rules. We also propose the "guessing error" as a measure of the "goodness", that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can "guess" the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering "what-if" scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules in a sing...
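The "guessing error" described in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the paper's method: it uses a single hypothetical rule given as a spending ratio (for instance milk : bread : butter = 10 : 3 : 2, where the butter figure is invented for the example), fits the scale of that ratio to the known cells of a row by least squares, and reads the hidden cell off the scaled rule. The names `guess_from_ratio` and `guessing_error` are ours.

```python
import math


def guess_from_ratio(row, hidden, rule):
    """Guess row[hidden] from the other cells, assuming the row follows `rule`.

    The scale is the least-squares fit of the rule to the known cells.
    Assumes the rule has at least one nonzero entry among the known columns.
    """
    known = [k for k in range(len(row)) if k != hidden]
    scale = sum(row[k] * rule[k] for k in known) / sum(rule[k] ** 2 for k in known)
    return scale * rule[hidden]


def guessing_error(matrix, rule):
    """Root-mean-square error when each cell in turn is treated as unknown."""
    squared_error = 0.0
    cells = 0
    for row in matrix:
        for hidden in range(len(row)):
            guess = guess_from_ratio(row, hidden, rule)
            squared_error += (guess - row[hidden]) ** 2
            cells += 1
    return math.sqrt(squared_error / cells)
```

With the hypothetical rule `[10.0, 3.0, 2.0]`, a customer who spent $10 on milk and $3 on bread is guessed to have spent about $2 on butter, and a matrix whose rows all follow the ratio exactly has a guessing error of essentially zero. A lower guessing error indicates a rule set that reconstructs the data better.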
Towards an efficient functional implementation of the NAS benchmark FT
Proceedings of the 7th International Conference on Parallel Computing Technologies (PaCT’03), Nizhni Novgorod, Russia. Volume 2763 of Lecture Notes in Computer Science, 2003
Cited by 6 (4 self)
This paper compares a high-level implementation of the NAS benchmark FT in the functional array language SaC with traditional solutions based on Fortran-77 and C. The impact of abstraction on expressiveness, readability, and maintainability of code as well as on clarity of underlying mathematical concepts is discussed. The associated impact on runtime performance is quantified both in a uniprocessor environment and in a multiprocessor environment based on automatic parallelization and on OpenMP.

1 Introduction

Low-level sequential base languages, e.g. Fortran-77 or C, and message passing libraries, mostly MPI, form the prevailing tools for generating parallel applications, in particular for numerical problems. This choice offers almost literal control over data layout and program execution, including communication and synchronization. Expert programmers can adapt their code to hardware characteristics of target machines, e.g. properties of memory hierarchies, and enhance the runtime performance to whatever a machine is able to deliver. During the process of performance tuning, numerical code inevitably mutates from a (maybe) human-readable representation of an abstract algorithm to one that almost certainly is suitable for machines only. Ideas and concepts of underlying mathematical algorithms are completely disguised. Even minor changes to underlying algorithms may require a major re-design of the implementation. Moreover, particular demands are made on the qualification of programmers, as they have to be experts in computer architecture and programming technique in addition to their specific application domains. As a consequence, development and maintenance of parallel code is prohibitively expensive.
Relaxation Labeling Using Augmented Lagrange-Hopfield Method
Pattern Recognition, 1998
Cited by 3 (0 self)
This paper presents a novel relaxation labeling (RL) method, called the Augmented Lagrangian-Hopfield (ALH) method, based on augmented Lagrange multipliers and the graded Hopfield neural network. In the ALH method, RL is formulated as a problem of constrained real optimization. The augmented Lagrange multiplier method [13,14] is used for optimization with the constraints and the Hopfield method [15,16] for bridging the gap between discrete and continuous optimization. The ALH needs neither gradient projection nor other normalization operations in its updating equations to maintain the labeling constraints. Therefore, it is more amenable to a neural network implementation than the existing RL algorithms. Experiments show that the ALH produces good quality solutions in terms of the optimized objective values at a reasonable number of iterations. A recent result shows that the ALH method significantly improves the Hopfield type networks in solving the traveling salesman problem [17]. The ALH has also been used for image restoration and segmentation [18]. The rest of the paper is organized as follows: Section 2 introduces the continuous RL method. Section 3 poses RL as a constrained optimization problem and presents the ALH method for solving it. Section 4 discusses the constrained optimization methods in connection to RL. Section 5 gives a neural network structure for the ALH computation. Section 6 presents the experimental results.
One-Dimensional Dithering
Int. Symposium on Electronic Image Capture and Publishing (EICP'98), 1998
Cited by 2 (1 self)
The Cambridge Research Laboratory was founded in 1987 to advance the state of the art in both core computing and human-computer interaction, and to use the knowledge so gained to support the Company's corporate objectives. We believe this is best accomplished through interconnected pursuits in technology creation, advanced systems engineering, and business development. We are actively investigating scalable computing; mobile computing; vision-based human and scene sensing; speech interaction; computer-animated synthetic persona; intelligent information appliances; and the capture, coding, storage, indexing, retrieval, decoding, and rendering of multimedia data. We recognize and embrace a technology creation model which is characterized by three major phases: Freedom: The lifeblood of the Laboratory comes from the observations and imaginations of our research staff. It is here that challenging research problems are uncovered (through discussions with customers, through interactions with others in the Corporation, through other professional interactions, through reading, and the like) or that new ideas are born. For any such problem or idea, this phase culminates in the nucleation of a project team around a well-articulated central research question and the outlining of a research plan. Focus: Once a team is formed, we aggressively pursue the creation of new technology based on the plan.
Models and Search Strategies for Applied Molecular Evolution
Annual Reports in Combinatorial Chemistry and Molecular Diversity, 1997
Cited by 2 (1 self)
Introduction. In just a few years, molecular diversity techniques have revolutionized pharmaceutical design and experimental methods for studying receptor binding, consensus sequences, genetic regulatory mechanisms, and many other issues in biochemistry and chemistry [30, 69 71, 78, 79]. Because of the enormous libraries of ligands that can be used and the rapidity of the techniques, methods of applied molecular evolution such as SELEX and phage display have become particularly popular [30, 78, 86, 126, 127, 142, 151]. These methods have been enormously successful, yet the theoretical work developed for them so far is quite limited. The success of these methods is not trivial: the huge number of sequences being searched through, the low concentrations of individual species, and the noise and biases inherent in the techniques would seem to make these experiments very difficult. Understanding why they work so well, and showing how they can perform better and for more complex molecular se
Integration of Volume Visualization and Compression: A Survey
2000
Cited by 2 (0 self)
Volume visualization has become more and more important in the modern world due to its wide applicability. Numerous techniques have been developed to render data sets in the form of regular grids (voxel data) and irregular grids. As volume data sets grow bigger and bigger, data compression algorithms are required to reduce the disk storage size, and potentially the memory size during rendering as well. This paper surveys several techniques of volume visualization and volume compression, together with their integration or interaction. In general the strategies include: decompressing the whole data set before rendering, on-the-fly rendering during decompression, on-the-fly decompression during rendering, and rendering in the compression domain.
Evolutionary Based Autocalibration from the Fundamental Matrix
2002
Cited by 1 (0 self)
We describe a new method of achieving autocalibration that uses a stochastic optimization approach taken from the field of evolutionary computing and we perform a number of experiments on standardized data sets that show the effectiveness of the approach. The basic assumption of this method is that the internal (intrinsic) camera parameters remain constant throughout the image sequence, i.e. they are taken from the same camera without varying the focal length. We show that for the autocalibration of focal length and aspect ratio, the evolutionary method achieves comparable results without the implementation complexity of other methods. Autocalibrating from the fundamental matrix is simply transformed into a global minimization problem utilizing a cost function based on the properties of the fundamental matrix and the essential matrix.
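The abstract does not spell out the cost function. As a hedged illustration only, one well-known cost of this kind in the autocalibration literature relies on the fact that a valid essential matrix has two equal nonzero singular values and one zero singular value. The notation below is ours, not the paper's: K is the constant intrinsic matrix with focal length f and aspect ratio a, zero skew and a known principal point (u_0, v_0) are assumed, F_ij is the fundamental matrix between views i and j, and sigma_1 >= sigma_2 are the two largest singular values of E_ij.

```latex
K = \begin{pmatrix} f & 0 & u_0 \\ 0 & a f & v_0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
E_{ij} = K^{\top} F_{ij} K,
\qquad
C(f, a) = \sum_{i<j} \frac{\sigma_1(E_{ij}) - \sigma_2(E_{ij})}{\sigma_2(E_{ij})}
```

Under these assumptions C(f, a) is zero exactly when every E_ij has two equal nonzero singular values, so a global optimizer such as an evolutionary search can look for the (f, a) pair that minimizes it.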
Choosing SNPs Using Feature Selection
A major challenge for genomewide disease association studies is the high cost of genotyping a large number of single nucleotide polymorphisms (SNPs). The correlations between SNPs, however, make it possible to select a parsimonious set of informative SNPs, known as “tagging” SNPs, able to capture most variation in a population. Considerable research interest has recently focused on the development of methods for finding such SNPs. In this paper, we present an efficient method for finding tagging SNPs. The method does not involve a computation-intensive search for SNP subsets but discards redundant SNPs using a feature selection algorithm. In contrast to most existing methods, the method presented here does not limit itself to using only correlations between SNPs in local groups. By using correlations that occur across different chromosomal regions, the method can reduce the number of globally redundant SNPs. Experimental results show that the number of tagging SNPs selected by our method is smaller than with block-based methods. Supplementary website:

