Results 1–10 of 16
Joint distribution:
"... Learning in highdim. space is hard and expensive. Good news: intrinsic dimensionality is often low. Observations lie on a lowdim. manifold embedded in a highdim. space. Manifold learning: uncover the lowdim. manifold structure. Our Goal Recover data manifold in a Bayesian probabilistic way, whil ..."
Abstract
Learning in high-dim. space is hard and expensive. Good news: intrinsic dimensionality is often low. Observations lie on a low-dim. manifold embedded in a high-dim. space. Manifold learning: uncover the low-dim. manifold structure. Our goal: recover the data manifold in a Bayesian probabilistic way, while preserving the geometric properties of local neighbourhoods. Advantages: fully probabilistic; uncertainty estimates available; a principled way to evaluate manifold dimensionality; the learned model can handle unseen data points naturally. Our approach (LL-LVM): assume a locally linear mapping between tangent spaces in the low- and high-dimensional spaces.
Density-difference estimation
 Neural Computation
"... We address the problem of estimating the difference between two probability densities. A naive approach is a twostep procedure of first estimating two densities separately and then computing their difference. However, such a twostep procedure does not necessarily work well because the first step i ..."
Abstract

Cited by 18 (17 self)
We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating the two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step, and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating the two densities. We derive a nonparametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison tasks such as class-prior estimation and change-point detection.
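To make the abstract's critique concrete, here is a minimal sketch of the naive two-step baseline it argues against: estimate each density with a kernel density estimator, then subtract. (The paper's direct single-shot estimator is not reproduced here; the Gaussian KDE and the bandwidth of 0.3 are illustrative choices, not from the paper.)

```python
import math
import random

def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate p_hat(x) for a 1-D sample."""
    n = len(sample)
    c = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def p_hat(x):
        return c * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
    return p_hat

def two_step_density_difference(sample_p, sample_q, bandwidth=0.3):
    """Naive two-step estimate of f(x) = p(x) - q(x): estimate each density, subtract.
    Errors made in each separate KDE propagate directly into the difference."""
    p_hat = gaussian_kde(sample_p, bandwidth)
    q_hat = gaussian_kde(sample_q, bandwidth)
    return lambda x: p_hat(x) - q_hat(x)

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(500)]  # sample from P = N(0, 1)
ys = [random.gauss(0.5, 1.0) for _ in range(500)]  # sample from Q = N(0.5, 1)
f_hat = two_step_density_difference(xs, ys)
# p - q is positive well to the left of the midpoint 0.25 and negative to the right
```

Each stage here is tuned to fit its own density well, not to make the final difference accurate — exactly the failure mode the single-shot procedure is designed to avoid.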
QAST: Question Answering System for Thai Wikipedia
"... We propose an opendomain question answering system using Thai Wikipedia as the knowledge base. Two types of information are used for answering a question: (1) structured information extracted and stored in the form of Resource Description Framework (RDF), and (2) unstructured texts stored as a sear ..."
Abstract

Cited by 1 (0 self)
We propose an open-domain question answering system using Thai Wikipedia as the knowledge base. Two types of information are used for answering a question: (1) structured information extracted and stored in the form of the Resource Description Framework (RDF), and (2) unstructured texts stored as a search index. For the structured information, a query transformed into SPARQL is applied to retrieve a short answer from the RDF base. For the unstructured information, a keyword-based query is used to retrieve the shortest text span containing the question's key terms. In the experimental results, the system integrating both approaches achieved an average MRR of 0.47 on 215 test questions.
Summary
, 2015
"... Given two samples {xi}ni=1 ∼ P and {yi}ni=1 ∼ Q with unknown P,Q defined on Rd, the goal of the two sample test is to test the hypotheses H0: P = Q v.s. H1: P 6 = Q. A nonparameteric kernelbased test which considers such general alternatives was recently proposed by Gretton et al. [2012]. Although ..."
Abstract
Given two samples {xi}ni=1 ∼ P and {yi}ni=1 ∼ Q with unknown P, Q defined on Rd, the goal of the two-sample test is to test the hypotheses H0: P = Q vs. H1: P ≠ Q. A nonparametric kernel-based test which considers such general alternatives was recently proposed by Gretton et al. [2012]. Although the power of the test was studied under the setting that n → ∞ with fixed d, it is unclear how the power is affected when (n, d) → ∞. The main contribution of the paper is to characterize the power of the linear MMD (with a Gaussian kernel) test under a mean-shift alternative (i.e., H0: Ex∼P[x] = Ey∼Q[y] and H1: Ex∼P[x] ≠ Ey∼Q[y]) in the (n, d) → ∞ setting.

1 Hypothesis testing

Let X(n) := {xi}ni=1 and Y(n) := {yi}ni=1. A test for a specific pair of hypotheses H0 and H1 is a function that takes X(n) and Y(n) and outputs either 0 or 1, where 1 indicates rejection of H0, and 0 means failure to reject H0 due to insufficient evidence. The type-1 error α, or false positive rate, is defined as α = p(reject H0 | H0 is true). The type-2 error β, or false negative rate, is defined as β = p(not reject H0 | H1 is true). Generally, decreasing one will increase the other. We refer to 1 − β as the power of the test, i.e., the probability of correctly rejecting H0. Many tests compute a test statistic T := T(X(n), Y(n)) and reject H0 if T > cα, where the rejection threshold cα depends on the distribution of T under H0 and a pre-chosen significance level α.

1.1 Two-sample test with MMD

One of the most popular tests for nonparametric two-sample testing is the kernel two-sample test proposed by Gretton et al. [2012]. The test uses the maximum mean discrepancy (MMD) as the test statistic T. Given a symmetric positive definite kernel function k(x, y), MMD is defined as MMD²(P, Q) := Ex∼P Ex′∼P k(x, x′) + Ey∼Q Ey′∼Q k(y, y′) − 2 Ex∼P Ey∼Q k(x, y). An unbiased estimator is given by
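The MMD² definition above can be turned into the standard unbiased estimator of Gretton et al. [2012] by replacing each expectation with a sample average that excludes i = j within-sample terms. A minimal sketch, assuming scalar data and a Gaussian kernel with an illustrative bandwidth σ = 1 (not the paper's linear-time statistic):

```python
import math
import random

def gauss_k(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) for scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q): within-sample averages omit the
    diagonal i = j terms; the cross term uses the full double sum."""
    n, m = len(xs), len(ys)
    xx = sum(gauss_k(xs[i], xs[j], sigma) for i in range(n) for j in range(n) if i != j)
    yy = sum(gauss_k(ys[i], ys[j], sigma) for i in range(m) for j in range(m) if i != j)
    xy = sum(gauss_k(x, y, sigma) for x in xs for y in ys)
    return xx / (n * (n - 1)) + yy / (m * (m - 1)) - 2.0 * xy / (n * m)

random.seed(1)
same = [random.gauss(0.0, 1.0) for _ in range(200)]       # sample from P
also_same = [random.gauss(0.0, 1.0) for _ in range(200)]  # second sample from P
shifted = [random.gauss(1.0, 1.0) for _ in range(200)]    # mean-shift alternative
# Under H0 the statistic hovers near 0 (it can be slightly negative, being
# unbiased); under the mean-shift alternative it is clearly positive.
```

Rejecting H0 when this statistic exceeds a threshold cα calibrated under H0 gives exactly the test described in the text.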
High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
, 2013
"... The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and outpu ..."
Abstract

Cited by 9 (2 self)
The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this paper, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion (HSIC). We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
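The HSIC measure underlying this abstract can be sketched in a few lines: the empirical HSIC is (1/(n−1)²) tr(K H L H), where K and L are kernel matrices on the inputs and outputs and H = I − 11ᵀ/n is the centering matrix. A minimal illustration with scalar features and Gaussian kernels (the paper's feature-wise Lasso built on top of HSIC is not reproduced here; the bandwidth and data are illustrative):

```python
import math
import random

def gauss_k(a, b, sigma=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def hsic(xs, ys, sigma=1.0):
    """Empirical HSIC: (1/(n-1)^2) * tr(K H L H), computed via double-centered
    kernel matrices Kc = HKH, Lc = HLH, so tr(K H L H) = tr(Kc Lc)."""
    n = len(xs)
    K = [[gauss_k(xs[i], xs[j], sigma) for j in range(n)] for i in range(n)]
    L = [[gauss_k(ys[i], ys[j], sigma) for j in range(n)] for i in range(n)]
    def center(M):
        row = [sum(r) / n for r in M]
        col = [sum(M[i][j] for i in range(n)) / n for j in range(n)]
        tot = sum(row) / n
        return [[M[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]
    Kc, Lc = center(K), center(L)
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / ((n - 1) ** 2)

random.seed(2)
x = [random.uniform(-1.0, 1.0) for _ in range(100)]
dependent = [xi ** 2 + 0.05 * random.gauss(0.0, 1.0) for xi in x]  # nonlinear, near-uncorrelated
independent = [random.uniform(-1.0, 1.0) for _ in range(100)]
# HSIC with a nonlinear kernel picks up the y = x^2 dependence that a
# linear (Lasso-style) dependency measure would largely miss.
```

A feature with a large HSIC score against the output is statistically dependent on it, even when the dependence is nonlinear — which is precisely why the paper swaps HSIC-type measures into the Lasso objective.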
Squared-loss Mutual Information Regularization: A Novel Information-theoretic Approach to Semi-supervised Learning
"... We propose squaredloss mutual information regularization (SMIR) for multiclass probabilistic classification, following the information maximization principle. SMIR is convex under mild conditions and thus improves the nonconvexity of mutual information regularization. It offers all of the followin ..."
Abstract

Cited by 3 (1 self)
We propose squared-loss mutual information regularization (SMIR) for multi-class probabilistic classification, following the information maximization principle. SMIR is convex under mild conditions and thus improves on the non-convexity of mutual information regularization. It offers all of the following abilities to semi-supervised algorithms: analytical solution, out-of-sample classification, multi-class classification, and probabilistic output. Furthermore, novel generalization error bounds are derived. Experiments show SMIR compares favorably with state-of-the-art methods.
Just-In-Time Kernel Regression for Expectation Propagation
"... We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in class ..."
Abstract
We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where it is essential to accurately assess uncertainty and to efficiently and robustly update the message operator.
Notes on mean embeddings and covariance operators
, 2015
"... This note contains more detailed proofs of certain results in the lecture notes on mean embeddings and covariance operators. The notes are not as complete as for lectures 1 and 2, but cover only the trickier concepts. Please let me know ..."
Abstract
This note contains more detailed proofs of certain results in the lecture notes on mean embeddings and covariance operators. The notes are not as complete as for lectures 1 and 2, but cover only the trickier concepts. Please let me know