Results 1–10 of 95
Efficient Distribution-free Learning of Probabilistic Concepts
 Journal of Computer and System Sciences
, 1993
Abstract

Cited by 197 (8 self)
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior; thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts.

1 Introduction

Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
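A p-concept, as described above, maps each input to a probability of a positive label rather than a fixed label. A minimal sketch of sampling labels from such a concept (the particular function and names are illustrative, not from the paper):

```python
import random

def p_concept(x):
    # Hypothetical p-concept: the probability that input x is labeled
    # positive, here a simple nondecreasing function of one real feature.
    return min(1.0, max(0.0, 0.5 + 0.1 * x))

def sample_label(x, rng):
    # The same input may come back positive or negative on different draws.
    return 1 if rng.random() < p_concept(x) else 0

rng = random.Random(0)
labels = [sample_label(2.0, rng) for _ in range(1000)]
# The empirical positive rate approaches p_concept(2.0) = 0.7.
```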
A Survey of Dimension Reduction Techniques
, 2002
Abstract

Cited by 87 (0 self)
In this paper, we assume that we have n observations, each being a realization of the p-dimensional random variable x = (x_1, ..., x_p) with mean E(x) = mu = (mu_1, ..., mu_p) and covariance matrix E{(x - mu)(x - mu)^T} = Sigma = (sigma_{i,j})_{p x p}. We denote such an observation matrix by X = {x_{i,j} : 1 <= i <= p, 1 <= j <= n}. If mu_i and sigma_i = sqrt(sigma_{i,i}) denote the mean and the standard deviation of the ith random variable, respectively, then we will often standardize the observations x_{i,j} by (x_{i,j} - x̄_i)/s_i, where x̄_i = (1/n) sum_{j=1}^{n} x_{i,j} and s_i = sqrt((1/n) sum_{j=1}^{n} (x_{i,j} - x̄_i)^2).
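The standardization step described above can be sketched directly; a minimal version using only the standard library, with the 1/n convention for the standard deviation (variable names are illustrative):

```python
import math

def standardize(X):
    # X is a p x n observation matrix: p variables, n observations each.
    # Standardize each variable i by (x_ij - mean_i) / std_i, where the
    # standard deviation uses the 1/n convention given in the text.
    Z = []
    for row in X:
        n = len(row)
        mean = sum(row) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
        Z.append([(x - mean) / std for x in row])
    return Z

Z = standardize([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
# Each standardized row now has mean 0 and (1/n) variance 1.
```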
Additivity in protein–DNA interactions: how good an approximation is it?
 Nucleic Acids Res
, 2002
Abstract

Cited by 79 (11 self)
Man and Stormo and Bulyk et al. recently presented their results on the study of the DNA binding affinity of proteins. In both of these studies the main conclusion is that the additivity assumption, usually applied in methods to search for binding sites, is not true. In the first study, the analysis of binding affinity data from the Mnt repressor protein bound to all possible DNA (sub)targets at positions 16 and 17 of the binding site showed that those positions are not independent. In the second study, the authors analysed DNA binding affinity data of the wild-type mouse EGR1 protein and four variants differing in the middle finger. The binding affinity of these proteins was measured against all 64 possible trinucleotide (sub)targets of the middle finger using microarray technology. The analysis of the measurements also showed interdependence among the positions in the DNA target. In the present report, we review the data of both studies and reanalyse them using various statistical methods, including a comparison with a multiple regression approach. We conclude that, despite the fact that the additivity assumption does not fit the data perfectly, in most cases it provides a very good approximation of the true nature of the specific protein–DNA interactions. Therefore, additive models can be very useful for the discovery and prediction of binding sites in genomic DNA.
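The additivity assumption under discussion treats each binding-site position as contributing independently to the total score. A hedged sketch of such an additive (position-weight-matrix-style) model, with invented weights for illustration:

```python
# Additive scoring: the score of a candidate site is the sum of independent
# per-position contributions, with no interaction terms between positions.
# The weight values below are invented for illustration, not taken from
# either study discussed in the abstract.
weights = [
    {"A": 1.2, "C": -0.3, "G": 0.1, "T": -0.8},  # position 1
    {"A": -0.5, "C": 0.9, "G": -0.2, "T": 0.4},  # position 2
    {"A": 0.0, "C": 0.2, "G": 1.1, "T": -1.0},   # position 3
]

def additive_score(site):
    # Under additivity, interdependence between positions (as reported for
    # Mnt positions 16 and 17) cannot be captured by this model.
    return sum(w[base] for w, base in zip(weights, site))

score = additive_score("ACG")  # 1.2 + 0.9 + 1.1 = 3.2
```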
An experimental comparison of several clustering and initialization methods
, 1998
Abstract

Cited by 78 (0 self)
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation–Maximization (EM) algorithm, a “winner take all” version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although the methods are substantially different, they lead to learned models that are strikingly similar in quality.
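The difference between standard EM and the “winner take all” variant lies in the E-step; a minimal sketch for a two-component one-dimensional Gaussian mixture (parameter values are illustrative):

```python
import math

def gaussian(x, mu, sigma):
    # Density of a normal distribution with mean mu and std sigma.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def e_step(x, mus, sigmas, winner_take_all=False):
    # Soft EM assigns fractional responsibilities to each component; the
    # winner-take-all variant (reminiscent of K-means) gives all the mass
    # to the single most likely component.
    likes = [gaussian(x, m, s) for m, s in zip(mus, sigmas)]
    total = sum(likes)
    resp = [l / total for l in likes]
    if winner_take_all:
        best = resp.index(max(resp))
        resp = [1.0 if i == best else 0.0 for i in range(len(resp))]
    return resp

soft = e_step(0.4, mus=[0.0, 1.0], sigmas=[1.0, 1.0])
hard = e_step(0.4, mus=[0.0, 1.0], sigmas=[1.0, 1.0], winner_take_all=True)
```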
Twitter Power: Tweets as Electronic Word of Mouth
Abstract

Cited by 77 (1 self)
In this paper we report research results investigating microblogging as a form of electronic word-of-mouth for sharing consumer opinions concerning brands. We analyzed more than 150,000 microblog postings containing branding comments, sentiments, and opinions. We investigated the overall structure of these microblog postings, the types of expressions, and the movement in positive or negative sentiment. We compared automated methods of classifying sentiment in these microblogs with manual coding. Using a case study approach, we analyzed the range, frequency, timing, and content of tweets in a corporate account. Our research findings show that 19% of microblogs contain mention of a brand. Of the branding microblogs, nearly 20% contained some expression of brand sentiments. Of these, more than 50% were positive and 33% were critical of the company or product. Our comparison of automated and manual coding showed no significant differences between the two approaches. In analyzing microblogs for structure and composition, we find that the linguistic structure of tweets approximates the linguistic patterns of natural language expressions. We find that microblogging is an online tool for customer word-of-mouth communications and discuss the implications for corporations using microblogging as part of their overall marketing strategy.
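Chaining the reported percentages gives a rough back-of-envelope share of all microblogs that carry positive brand sentiment (treating the rounded figures as exact):

```python
# Reported figures: 19% of microblogs mention a brand, about 20% of those
# express some sentiment, and more than 50% of those are positive.
brand = 0.19
sentiment_given_brand = 0.20
positive_given_sentiment = 0.50

positive_share = brand * sentiment_given_brand * positive_given_sentiment
# Roughly 1.9% of all sampled microblogs carry positive brand sentiment.
```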
The psychometric function: I. Fitting, sampling, and goodness of fit
, 2001
Abstract

Cited by 70 (10 self)
The psychometric function relates an observer’s performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function’s parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function’s parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods. The performance of an observer on a psychophysical
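The lapse correction discussed above is commonly written as a psychometric function with guess and lapse parameters; a sketch assuming a logistic core F (this parameterization is a standard form, not necessarily the authors’ exact one):

```python
import math

def psychometric(x, alpha, beta, gamma=0.5, lam=0.02):
    # gamma: guess rate (0.5 for a two-alternative forced-choice task);
    # lam: lapse rate capturing stimulus-independent errors.
    # F is a logistic core with threshold alpha and slope beta.
    F = 1.0 / (1.0 + math.exp(-beta * (x - alpha)))
    return gamma + (1.0 - gamma - lam) * F

# With lapses, performance asymptotes at 1 - lam rather than at 1,
# which is why ignoring lapses biases the fitted threshold and slope.
hi = psychometric(100.0, alpha=0.0, beta=1.0)
```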
Experience-Dependent Integration of Texture and Motion Cues to Depth
, 1999
Abstract

Cited by 44 (3 self)
Previous investigators have shown that observers' visual cue combination strategies are remarkably flexible in the sense that these strategies adapt on the basis of the estimated reliabilities of the visual cues. However, these researchers have not addressed how observers acquire these estimated reliabilities. This article studies observers' abilities to learn cue combination strategies. Subjects made depth judgments about simulated cylinders whose shapes were indicated by motion and texture cues. Because the two cues could indicate different shapes, it was possible to design tasks in which one cue provided useful information for making depth judgments, whereas the other cue was irrelevant. The results of experiment 1 suggest that observers' cue combination strategies are adaptable as a function of training; subjects adjusted their cue combination rules to use a cue more heavily when the cue was informative on a task versus when the cue was irrelevant. Experiment 2 demonstrated that experience-dependent adaptation of cue combination rules is context-sensitive. On trials with presentations of short cylinders, one cue was informative, whereas on trials with presentations of tall cylinders, the other cue was informative. The results suggest that observers can learn multiple cue combination rules, and can learn to apply each rule in the appropriate context. Experiment 3 demonstrated a possible limitation on the context-sensitivity of adaptation of cue combination rules. One cue was informative on trials with presentations of cylinders at a left oblique orientation, whereas the other cue was informative on trials with presentations of cylinders at a right oblique orientation. The results indicate that observers did not learn to use different cue combination rules in differe...
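Reliability-based cue combination of the kind described above is often formalized as an inverse-variance weighted average; a minimal sketch (the cue values and variances are invented for illustration):

```python
def combine_cues(estimates, variances):
    # Weight each cue inversely to its variance (a more reliable cue gets
    # a larger weight), then normalize the weights to sum to 1.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total

# Motion cue indicates depth 10, texture cue indicates 14; motion is the
# more reliable cue here, so the combined estimate falls closer to 10.
depth = combine_cues([10.0, 14.0], variances=[1.0, 4.0])
```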
Perceptual Distortion Contributes to the Curvature of Human Reaching Movements.
 Experimental Brain Research
, 1994
Abstract

Cited by 16 (3 self)
Unconstrained point-to-point human arm movements are generally gently curved, a fact which has been used to assess the validity of models of trajectory formation. In this study we examined the relationship between curvature perception and movement curvature for planar sagittal and transverse arm movements. We found a significant correlation (p < 0.0001, n = 16) between the curvature perceived as straight and the curvature of actual arm movements. We suggest that subjects try to make straight-line movements, but that actual movements are curved because visual perceptual distortion makes the movements appear to be straighter than they really are. We conclude that perceptual distortion of curvature contributes to the curvature seen in human point-to-point arm movements and that this must be taken into account in the assessment of models of trajectory formation.

Introduction

There are several invariant features of point-to-point human arm movements: trajectories tend to be gently curv...
Control variates for probability and quantile estimation
, 1998
Abstract

Cited by 13 (1 self)
In stochastic systems, quantiles indicate the level of system performance that can be delivered with a specified probability, while probabilities indicate the likelihood that a specified level of system performance can be achieved. We present new estimators for use in simulation experiments designed to estimate such quantiles or probabilities of system performance. All of the estimators exploit control variates to increase their precision, which is especially important when extreme quantiles (in the tails of the distribution of system performance) or extreme probabilities (near zero or one) are of interest. Control variates are auxiliary random variables with known properties—in this case, known quantiles—and a strong stochastic association with the performance measure of interest. Since transforming a control variate can increase its effectiveness, we propose both continuous and discrete approximations to the optimal (variance-minimizing) transformation for estimating probabilities, and then invert the probability estimators to obtain corresponding quantile estimators. We also propose a direct control-variate quantile estimator that is not based on inverting a probability estimator. An empirical study using queueing, inventory and project-planning examples shows that substantial reductions in mean squared error can be obtained when estimating the 0.9, 0.95, and 0.99 quantiles. (Simulation; Variance Reduction; Control Variates; Statistics)
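The basic linear control-variate probability estimator described above can be sketched as follows (the toy model and names are illustrative, not the paper's examples):

```python
import random

def cv_probability_estimate(samples, threshold, controls, control_mean, beta):
    # Controlled estimator: p_hat = mean(I{Y <= y}) - beta * (mean(C) - E[C]),
    # where C is a control variate with known mean E[C] and a strong
    # stochastic association with the indicator of interest.
    n = len(samples)
    indicator_mean = sum(1.0 for y in samples if y <= threshold) / n
    control_avg = sum(controls) / n
    return indicator_mean - beta * (control_avg - control_mean)

rng = random.Random(1)
# Toy model: Y = U + small noise, with control variate C = I{U <= 0.9},
# whose mean is known exactly because U is uniform on [0, 1]: E[C] = 0.9.
us = [rng.random() for _ in range(10000)]
ys = [u + 0.1 * (rng.random() - 0.5) for u in us]
controls = [1.0 if u <= 0.9 else 0.0 for u in us]
p_hat = cv_probability_estimate(ys, 0.9, controls, 0.9, beta=1.0)
# p_hat estimates P(Y <= 0.9), which is about 0.9 for this model.
```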