Results 1–10 of 602
Survey of clustering algorithms
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2005
Cited by 383 (3 self)
Data analysis plays an indispensable role in understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications on some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several closely related topics, including proximity measures and cluster validation, are also discussed.
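By way of illustration, the partitional family such a survey covers can be sketched with Lloyd's k-means; the minimal Python version below (the data and parameter choices are invented for the example, not taken from the survey) alternates nearest-centroid assignment with mean updates.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the centroid with the smallest squared distance.
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, clusters

# Two well-separated 2-D blobs; k-means recovers the split.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (4.9, 5.1)]
centroids, clusters = kmeans(pts, 2)
```

Lloyd's algorithm is only one representative; the choice of proximity measure and the validation of the resulting clusters are exactly the companion topics the survey discusses.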
High-dimensional data analysis: The curses and blessings of dimensionality
AMS CONFERENCE ON MATH CHALLENGES OF THE 21ST CENTURY, 2000
Cited by 141 (0 self)
The coming century is surely the century of data. A combination of blind faith and serious purpose makes our society invest massively in the collection and processing of data of all kinds, on scales unimaginable until recently. Hyperspectral imagery, Internet portals, financial tick-by-tick data, and DNA microarrays are just a few of the better-known sources, feeding data in torrential streams into scientific and business databases worldwide. In traditional statistical data analysis, we think of observations of instances of particular phenomena (e.g. instance ↔ human being), these observations being a vector of values we measured on several variables (e.g. blood pressure, weight, height, ...). In traditional statistical methodology, we assumed many observations and a few, well-chosen variables. The trend today is towards more observations but even more so, to radically larger numbers of variables: voracious, automatic, systematic collection of hyper-informative detail about each observed instance. We are seeing examples where the observations gathered on individual instances are curves, or spectra, or images, or ...
A Multivalued Logic Approach to Integrating Planning and Control
Artificial Intelligence, 1995
Cited by 114 (9 self)
Intelligent agents embedded in a dynamic, uncertain environment should incorporate capabilities for both planned and reactive behavior. Many current solutions to this dual need focus on one aspect, and treat the other one as secondary. We propose an approach for integrating planning and control based on behavior schemas, which link physical movements to abstract action descriptions. Behavior schemas describe behaviors of an agent, expressed as trajectories of control actions in an environment, and goals can be defined as predicates on these trajectories. Goals and behaviors can be combined to produce conjoint goals and complex controls. The ability of multivalued logics to represent graded preferences allows us to formulate tradeoffs in the combination. Two composition theorems relate complex controls to complex goals, and provide the key to using standard knowledgebased deliberation techniques to generate complex controllers. We report experiments in planning and execution on a mobi...
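The graded-preference combination described above can be sketched with a fuzzy-logic-style conjunction. The goal functions below (near_target, avoid_obstacle) are hypothetical stand-ins, and min is one common multivalued conjunction (the Goedel t-norm), not necessarily the exact operator the paper uses.

```python
def near_target(x):
    """Hypothetical graded goal: degree of being close to position 10."""
    return max(0.0, 1.0 - abs(x - 10.0) / 10.0)

def avoid_obstacle(x):
    """Hypothetical graded goal: degree of being clear of an obstacle at 4."""
    return min(1.0, abs(x - 4.0) / 2.0)

def conjoin(*goals):
    """Multivalued conjunction: the combined preference is the minimum
    of the individual degrees, so both goals must be satisfied to some extent."""
    return lambda x: min(g(x) for g in goals)

combined = conjoin(near_target, avoid_obstacle)
# Choose the candidate control (a target position) with the highest combined degree:
best = max([2.0, 4.0, 6.0, 8.0], key=combined)
```

Because degrees are graded rather than Boolean, the conjunction naturally expresses the trade-off: a position that sits exactly on the obstacle scores zero no matter how close it is to the target.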
Data Exploration Using Self-Organizing Maps
ACTA POLYTECHNICA SCANDINAVICA: MATHEMATICS, COMPUTING AND MANAGEMENT IN ENGINEERING SERIES NO. 82, 1997
Cited by 107 (4 self)
Finding structures in vast multidimensional data sets, be they measurement data, statistics, or textual documents, is difficult and time-consuming. Interesting, novel relations between the data items may be hidden in the data. The self-organizing map (SOM) algorithm of Kohonen can be used to aid the exploration: the structures in the data sets can be illustrated on special map displays. In this work, the methodology of using SOMs for exploratory data analysis or data mining is reviewed and developed further. The properties of the maps are compared with the properties of related methods intended for visualizing high-dimensional multivariate data sets. In a set of case studies the SOM algorithm is applied to analyzing electroencephalograms, to illustrating structures of the standard of living in the world, and to organizing full-text document collections. Measures are proposed for evaluating the quality of different types of maps in representing a given data set, and for measuring the robu...
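The core of Kohonen's SOM can be sketched in a few lines: for each sample, find the best-matching unit (BMU) on the map grid and pull it and its neighbours toward the sample. The grid size, decay schedules, and Gaussian neighbourhood below are generic illustrative choices, not the settings used in this work.

```python
import math
import random

def train_som(data, grid_w=4, grid_h=4, iters=500, seed=0):
    """Minimal 2-D self-organizing map with decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(grid_w) for j in range(grid_h)}
    for t in range(iters):
        x = rng.choice(data)
        lr = 0.5 * (1.0 - t / iters)                      # decaying learning rate
        radius = max(1.0, (grid_w / 2.0) * (1.0 - t / iters))  # shrinking neighbourhood
        # Best-matching unit: the grid cell whose weight vector is closest to x.
        bmu = min(weights,
                  key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))
        for u, w in weights.items():
            d = math.dist(u, bmu)                          # distance on the map grid
            if d <= radius:
                h = math.exp(-(d * d) / (2.0 * radius * radius))
                weights[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return weights
```

After training, plotting each unit's weight vector (or the distances between neighbouring units) gives exactly the kind of map display the thesis uses for exploration.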
Discovering hierarchy in reinforcement learning with HEXQ
In Nineteenth International Conference on Machine Learning, 2002
Cited by 90 (5 self)
An open problem in reinforcement learning is discovering hierarchical structure. HEXQ, an algorithm that automatically attempts to decompose and solve a model-free factored MDP hierarchically, is described. By searching for aliased Markov sub-space regions based on the state variables, the algorithm uses temporal and state abstraction to construct a hierarchy of interlinked smaller MDPs.
Approximate Solutions to Markov Decision Processes
1999
Cited by 75 (10 self)
One of the basic problems of machine learning is deciding how to act in an uncertain world. For example, if I want my robot to bring me a cup of coffee, it must be able to compute the correct sequence of electrical impulses to send to its motors to navigate from the coffee pot to my office. In fact, since the results of its actions are not completely predictable, it is not enough just to compute the correct sequence; instead the robot must sense and correct for deviations from its intended path. In order for any machine learner to act reasonably in an uncertain environment, it must solve problems like the above one quickly and reliably. Unfortunately, the world is often so complicated that it is difficult or impossible to find the optimal sequence of actions to achieve a given goal. So, in order to scale our learners up to real-world problems, we usually must settle for approximate solutions. One representation for a learner's environment and goals is a Markov decision process or MDP. ...
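As a minimal sketch of solving an MDP exactly (the baseline that approximate methods are measured against), here is value iteration on a toy two-state chain; the transition and reward tables are invented for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.
    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best one-step reward plus discounted value.
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy chain: from s0, 'go' earns reward 1 and reaches the absorbing state s1.
P = {'s0': {'stay': [(1.0, 's0')], 'go': [(1.0, 's1')]},
     's1': {'stay': [(1.0, 's1')]}}
R = {'s0': {'stay': 0.0, 'go': 1.0},
     's1': {'stay': 0.0}}
V = value_iteration(P, R)
```

Exact iteration like this is only feasible when the state set is small; the thesis's subject is precisely what to do when it is not.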
Offline recognition of unconstrained handwritten texts using HMMs and statistical language models
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2004
Cited by 74 (9 self)
This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of Statistical Language Models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, the error rate is reduced by ∼50% for single writer data and by ∼25% for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature to deal with the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.
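A toy sketch of how a language model can rescore recognizer hypotheses, in the spirit of the system above: the optical scores and bigram probabilities below are invented for illustration, and this is not the paper's actual decoding procedure.

```python
import math

# Hypothetical per-word optical log-likelihoods from an HMM recognizer:
optical = {'form': -2.0, 'farm': -1.8}      # the optical model alone prefers 'farm'
# Toy bigram language model probabilities P(word | previous word):
bigram = {('the', 'form'): 0.02, ('the', 'farm'): 0.001}

def rescore(prev_word, candidates, lm_weight=1.0):
    """Combine optical and language-model evidence in log space."""
    return max(candidates,
               key=lambda w: optical[w] + lm_weight * math.log(bigram[(prev_word, w)]))

best = rescore('the', ['form', 'farm'])   # the language model flips the decision
```

The lm_weight parameter plays the role of the language-model scale factor commonly tuned in HMM recognizers: it balances how much linguistic context is allowed to override the optical evidence.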
Random search for hyperparameter optimization
 In: Journal of Machine Learning Research
Cited by 71 (5 self)
Grid search and manual search are the most widely used strategies for hyperparameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyperparameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyperparameters to validation set performance reveals that for most data sets only a few of the hyperparameters really matter, but that different hyperparameters are important on different data sets. This phenomenon makes ...
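The paper's central comparison can be mimicked on a toy objective with low effective dimensionality; the score function below is invented purely to illustrate why random search probes more distinct values of the important hyperparameter than a grid of the same shape.

```python
import random

def score(lr, width):
    """Hypothetical validation score: only the learning rate matters here,
    mirroring the paper's observation that few hyperparameters are important."""
    return -(lr - 0.1) ** 2

def grid_search(n_per_axis):
    # A grid of n x n trials still only tries n distinct learning rates.
    lrs = [i / (n_per_axis - 1) for i in range(n_per_axis)]
    widths = [16, 64, 256][:n_per_axis]
    return max(score(lr, w) for lr in lrs for w in widths)

def random_search(n_trials, seed=0):
    # Every random trial probes a fresh learning-rate value.
    rng = random.Random(seed)
    return max(score(rng.uniform(0.0, 1.0), rng.choice([16, 64, 256]))
               for _ in range(n_trials))

best_grid = grid_search(3)     # 9 trials, but only lr in {0.0, 0.5, 1.0}
best_rand = random_search(200)
```

On expectation the advantage already appears at equal budgets; 200 random trials are used here only so the toy comparison is robust to the particular seed.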
Optimal operation of multi-reservoir systems: state-of-the-art review
J. Water Resour. Plann. Manag., 2004
Cited by 69 (0 self)
Abstract: With construction of new large-scale water storage projects on the wane in the U.S. and other developed countries, attention must focus on improving the operational effectiveness and efficiency of existing reservoir systems for maximizing the beneficial uses of these projects. Optimal coordination of the many facets of reservoir systems requires the assistance of computer modeling tools to provide information for rational management and operational decisions. The purpose of this review is to assess the state of the art in optimization of reservoir system management and operations and to consider future directions for additional research and application. Optimization methods designed to prevail over the high-dimensional, dynamic, nonlinear, and stochastic characteristics of reservoir systems are scrutinized, as well as extensions into multiobjective optimization. Applications of heuristic programming methods using evolutionary and genetic algorithms are described, along with applications of neural networks and fuzzy rule-based systems for inferring reservoir system operating rules.
Algorithms for numerical analysis in high dimensions
SIAM J. Sci. Comput., 2005
Cited by 68 (10 self)
Abstract. Nearly every numerical analysis algorithm has computational complexity that scales exponentially in the underlying physical dimension. The separated representation, introduced previously, allows many operations to be performed with scaling that is formally linear in the dimension. In this paper we further develop this representation by: (i) discussing the variety of mechanisms that allow it to be surprisingly efficient; (ii) addressing the issue of conditioning; (iii) presenting algorithms for solving linear systems within this framework; and (iv) demonstrating methods for dealing with antisymmetric functions, as arise in the multiparticle Schrödinger equation in quantum mechanics. Numerical examples are given.

Key words. curse of dimensionality; multidimensional function; multidimensional operator; algorithms in high dimensions; separation of variables; separated representation; alternating least squares; separation-rank reduction; separated ...
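A rank-one special case of the separated representation can be sketched directly. The paper works with sums of such products (separation rank r), but even the r = 1 case below shows the linear-in-dimension cost: d one-dimensional factors replace a d-dimensional grid.

```python
import math

def separated_eval(factors, point):
    """Evaluate f(x_1, ..., x_d) = prod_i f_i(x_i) from its 1-D factors,
    at cost linear in the dimension d."""
    value = 1.0
    for f, x in zip(factors, point):
        value *= f(x)
    return value

d = 50
# A 50-dimensional Gaussian exp(-|x|^2) separates exactly into 1-D factors:
factors = [lambda x: math.exp(-x * x) for _ in range(d)]
y = separated_eval(factors, [0.1] * d)   # equals exp(-50 * 0.01) = exp(-0.5)
```

Storing the function as factors costs O(d·n) for n samples per dimension instead of the O(n^d) of a full tensor; the paper's contribution is keeping operations (and the separation rank) under control as such representations are combined.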