Results 1 - 10
of
129
The Bayesian image retrieval system, PicHunter: Theory, implementation, and psychophysical experiments
- IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2000
"... This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysica ..."
Abstract
-
Cited by 150 (2 self)
- Add to MetaCart
This paper presents the theory, design principles, implementation, and performance results of PicHunter, a prototype content-based image retrieval (CBIR) system that has been developed over the past three years. In addition, this document presents the rationale, design, and results of psychophysical experiments that were conducted to address some key issues that arose during PicHunter’s development. The PicHunter project makes four primary contributions to research on content-based image retrieval. First, PicHunter represents a simple instance of a general Bayesian framework we describe for using relevance feedback to direct a search. With an explicit model of what users would do, given what target image they want, PicHunter uses Bayes’s rule to predict what is the target they want, given their actions. This is done via a probability distribution over possible image targets, rather than by refining a query. Second, an entropy-minimizing display algorithm is described that attempts to maximize the information obtained from a user at each iteration of the search. Third, PicHunter makes use of hidden annotation rather than a possibly inaccurate/inconsistent annotation structure that the user must learn and make queries in. Finally, PicHunter introduces two experimental paradigms to quantitatively evaluate the performance of the system, and psychophysical experiments are presented that support the theoretical claims.
Clustering Association Rules
, 1997
"... We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segmen ..."
Abstract
-
Cited by 99 (0 self)
- Add to MetaCart
We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum Description Length (MDL) principle of encoding the clusters on several databases including noise and errors. Scale-up experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data. 1 Introduction Data mining, or the efficient discovery of interesting patterns from large collections of data, has been recognized as an important area of database research. The most commonly sought patterns are association rules as introduced in [AIS93b]. Intuitively, an association rule identifies a frequently occuring pattern of information in a database. Consider a supermarket database w...
Parallel Performance Prediction using Lost Cycles Analysis
- IN PROCEEDINGS OF SUPERCOMPUTING '94
, 1994
"... Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying condition ..."
Abstract
-
Cited by 62 (1 self)
- Add to MetaCart
Most performance debugging and tuning of parallel programs is based on the "measure-modify" approach, which is heavily dependent on detailed measurements of programs during execution. This approach is extremely time-consuming and does not lend itself to predicting performance under varying conditions. Analytic modeling and scalability analysis provide predictive power, but are not widely used inpractice, due primarily to their emphasis on asymptotic behavior and the difficulty of developing accurate models that work for real-world programs. In this paper we describe a set of tools for performance tuning of parallel programs that bridges this gap between measurement and modeling. Our approach is based on lost cycles analysis, which involves measurement and modeling of all sources of overhead in a parallel program. We first describe a tool for measuring overheads in parallel programs that we have incorporated into the runtime environment for Fortran programs on the Kendall Square KSR1. We then describe a tool that ts these overhead measurements to analytic forms. We illustrate the use of these tools by analyzing the performance tradeoffs among parallel implementations of 2D FFT. These examples show how our tools enable programmers to develop accurate performance models of parallel applications without requiring extensive performance modeling expertise.
User Interface Affordances in a Planning Representation
- Human Computer Interaction
, 1999
"... This article shows how the concept of affordance in the user interface fits into a wellunderstood artificial intelligence (AI) model of acting in an environment. In this model AI planning research is used to interpret affordances in terms of the costs associated with the generation and execution of ..."
Abstract
-
Cited by 23 (8 self)
- Add to MetaCart
This article shows how the concept of affordance in the user interface fits into a wellunderstood artificial intelligence (AI) model of acting in an environment. In this model AI planning research is used to interpret affordances in terms of the costs associated with the generation and execution of operators in a plan. We motivate our approach with a brief survey of the affordance literature and its connections to the planning literature, and then explore its implications through examples of common user interface mechanisms described in affordance terms. Despite its simplicity, our modeling approach ties together several different threads of practical and theoretical work on affordance into a single conceptual framework. Affordances in a planning representation 3 Contents 1 INTRODUCTION 4 2 PERSPECTIVES ON THE NATURE OF AFFORDANCES 5 3 AFFORDANCES IN PLANNING TERMS 8 4 GENERIC USER INTERFACE AFFORDANCES 13 4.1 Programmable User Models for Affordance Evaluation . . . . . . . . . . ....
Fast and effective orchestration of compiler optimizations for automatic performance tuning
- In Proceedings of the International Symposium on Code Generation and Optimization (CGO
, 2006
"... Although compile-time optimizations generally improve program performance, degradations caused by individual techniques are to be expected. One promising research direction to overcome this problem is the development of dynamic, feedback-directed optimization orchestration algorithms, which automati ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
Although compile-time optimizations generally improve program performance, degradations caused by individual techniques are to be expected. One promising research direction to overcome this problem is the development of dynamic, feedback-directed optimization orchestration algorithms, which automatically search for the combination of optimization techniques that achieves the best program performance. The challenge is to develop an orchestration algorithm that finds, in an exponential search space, a solution that is close to the best, in acceptable time. In this paper, we build such a fast and effective algorithm, called Combined Elimination (CE). The key advance of CE over existing techniques is that it takes the least tuning time (57% of the closest alternative), while achieving the same program performance. We conduct the experiments on both a Pentium IV machine and a SPARC II machine, by measuring performance of SPEC CPU2000 benchmarks under a large set of 38 GCC compiler options. Furthermore, through orchestrating a small set of optimizations causing the most degradation, we show that the performance achieved by CE is close to the upper bound obtained by an exhaustive search algorithm. The gap is less than 0.2 % on average. 1
Remembrance of Circuits Past : Macromodeling by Data Mining in Large Analog Design Spaces
- in Proceedings of DAC
, 2002
"... The introduction of simulation-based analog synthesis tools creates a new challenge for analog modeling. These tools routinely visit 103 to 105 fully simulated circuit solution candidates. What might we do with all this circuit data? We show how to adapt recent ideas from large-scale data mining to ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
The introduction of simulation-based analog synthesis tools creates a new challenge for analog modeling. These tools routinely visit 103 to 105 fully simulated circuit solution candidates. What might we do with all this circuit data? We show how to adapt recent ideas from large-scale data mining to build models that capture significant regions of this visited performance space, parametefized by variables manipulated by synthesis, trained by the data points visited during synthesis. Experimental restfits show that we can automatically build useful nonlinear regression models for large analog design spaces.
Policy Search using Paired Comparisons
- Journal of Machine Learning Research
, 2002
"... Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a determinist ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
Direct policy search is a practical way to solve reinforcement learning (RL) problems involving continuous state and action spaces. The goal becomes finding policy parameters that maximize a noisy objective function. The Pegasus method converts this stochastic optimization problem into a deterministic one, by using fixed start states and fixed random number sequences for comparing policies (Ng and Jordan, 2000). We evaluate Pegasus, and new paired comparison methods, using the mountain car problem, and a difficult pursuer-evader problem. We conclude that: (i) paired tests can improve performance of optimization procedures; (ii) several methods are available to reduce the `overfitting' effect found with Pegasus; (iii) adapting the number of trials used for each comparison yields faster learning; (iv) pairing also helps stochastic search methods such as differential evolution.
Towards Scalable Support Vector Machines using Squashing
- In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2000
"... Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of SVMs is the necessity to solve a large-scale quadratic programming problem. This paper combines like ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of SVMs is the necessity to solve a large-scale quadratic programming problem. This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets. We reduce the problem of training the SVMs on the weighted "squashed" data to a quadratic programming problem and show that it can be solved using Platt's sequential minimal optimization (SMO) algorithm. We compare performance of the SMO algorithm on the squashed and the full data, as well as on simple random and boosted samples of the data. Experiments on a number of datasets show that squashing allows one to speed-up training, decrease memory requirements, and obtain parameter estimates close to that of the full data. More importantly, squashing produces close to optimal classific...
Sampling Strategies for Computer Experiments: Design and Analysis
, 2001
"... Computer-based simulation and analysis is used extensively in engineering for a variety of tasks. Despite the steady and continuing growth of computing power and speed, the computational cost of complex high-fidelity engineering analyses and simulations limit their use in important areas like design ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Computer-based simulation and analysis is used extensively in engineering for a variety of tasks. Despite the steady and continuing growth of computing power and speed, the computational cost of complex high-fidelity engineering analyses and simulations limit their use in important areas like design optimization and reliability analysis. Statistical approximation techniques such as design of experiments and response surface methodology are becoming widely used in engineering to minimize the computational expense of running such computer analyses and circumvent many of these limitations. In this paper, we compare and contrast five experimental design types and four approximation model types in terms of their capability to generate accurate approximations for two engineering applications with typical engineering behaviors and a wide range of nonlinearity. The first example involves the analysis of a two-member frame that has three input variables and three responses of interest. The second example simulates the roll-over potential of a semi-tractor-trailer for different combinations of input variables and braking and steering levels. Detailed error analysis reveals that uniform designs provide good sampling for generating accurate approximations using different sample sizes while kriging models provide accurate approximations that are robust for use with a variety of experimental designs and sample sizes.
Likelihood-based Data Squashing: A Modeling Approach to Instance Construction.
, 2002
"... Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analy ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analyses carried out on the original dataset. Likelihood-based data squashing (LDS) differs from a previously published squashing algorithm insofar as it uses a statistical model to squash the data. The results show that LDS provides excellent squashing performance even when the target statistical analysis departs from the model used to squash the data.

