Results 1–10 of 27
Vector Machines, and Parzen Window Classifiers
Abstract
Abstract: This paper presents a unifying view of two well-known kernel-based classifiers, namely support vector machines (SVMs) and Parzen window classifiers. In particular, given the training data, both learning algorithms can be viewed as a solution to a regularization problem on probability distributions, depending on how the distributions are constructed from the training data. This simple insight may shed light on the unification of various kernel-based learning algorithms.
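The distributional view in this abstract can be made concrete with a small sketch: a Parzen window classifier labels a point by which class's empirical kernel mean assigns it the higher score, i.e. the sign of ⟨µ₊ − µ₋, k(x, ·)⟩ in the RKHS. This is a minimal illustration with an assumed Gaussian RBF kernel, toy 1-D data, and an arbitrary bandwidth gamma = 0.5; the function names are illustrative, not the paper's.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian RBF kernel on scalars/arrays (gamma is an assumed value)."""
    return np.exp(-gamma * (a - b) ** 2)

def parzen_classify(x, pos, neg, gamma=0.5):
    """Parzen-window rule: compare the average kernel similarity of x to
    each class, i.e. the sign of <mu_pos - mu_neg, k(x, .)> in the RKHS."""
    score = rbf(x, pos, gamma).mean() - rbf(x, neg, gamma).mean()
    return 1 if score >= 0 else -1

pos = np.array([2.0, 2.5, 3.0])     # toy 1-D positive class
neg = np.array([-2.0, -2.5, -3.0])  # toy 1-D negative class
assert parzen_classify(2.2, pos, neg) == 1
assert parzen_classify(-2.2, pos, neg) == -1
```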
Learning from distributions via support measure machines
 Advances in Neural Information Processing Systems 25
, 2012
Abstract

Cited by 28 (10 self)
This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provide several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
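As a sketch of the embedding step the abstract describes: the inner product of two mean embeddings, ⟨µ_P, µ_Q⟩, can be estimated by averaging pairwise kernel values between the two sample sets, and a kernel on distributions built this way is what an SMM trains on. A minimal illustration assuming a Gaussian RBF kernel and toy 1-D samples (the helper names and gamma are my own choices):

```python
import numpy as np

def rbf_gram(X, Y, gamma=0.5):
    """Gram matrix of a Gaussian RBF kernel between two 1-D sample sets."""
    d = X[:, None] - Y[None, :]
    return np.exp(-gamma * d ** 2)

def mean_embedding_kernel(X, Y, gamma=0.5):
    """K(P, Q) = <mu_P, mu_Q>_H estimated from samples: the average
    pairwise kernel value between the two sample sets."""
    return rbf_gram(X, Y, gamma).mean()

rng = np.random.default_rng(0)
P1 = rng.normal(0.0, 1.0, 200)   # two samples from N(0, 1)
P2 = rng.normal(0.0, 1.0, 200)
Q  = rng.normal(5.0, 1.0, 200)   # sample from a shifted distribution

# Distributions with similar laws have more-similar mean embeddings.
assert mean_embedding_kernel(P1, P2) > mean_embedding_kernel(P1, Q)
```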
Supplementary Material to Learning from Distributions via Support Measure Machines
Abstract
Theorem (representer theorem for distributions). Given training examples (Pᵢ, yᵢ) ∈ P × R, i = 1, …, m, a strictly monotonically increasing function Ω: [0, +∞) → R, and a loss function ℓ: (P × R²)ᵐ → R ∪ {+∞}, any f ∈ H minimizing the regularized risk functional

  ℓ(P₁, y₁, E_{P₁}[f], …, P_m, y_m, E_{P_m}[f]) + Ω(‖f‖_H)   (1)

admits a representation of the form f = Σᵢ₌₁ᵐ αᵢ µ_{Pᵢ} for some αᵢ ∈ R, i = 1, …, m.

Proof. By virtue of Proposition 2 in [1], the linear functionals E_P[·] are bounded for all P ∈ P. Then, given P₁, P₂, …, P_m, any f ∈ H can be decomposed as f = f_µ + f⊥, where f_µ ∈ H lies in the span of the µ_{Pᵢ}, i.e., f_µ = Σᵢ₌₁ᵐ αᵢ µ_{Pᵢ}, and f⊥ ∈ H satisfies ⟨f⊥, µ_{Pⱼ}⟩ = 0 for all j. Hence, for all j, we have

  E_{Pⱼ}[f] = E_{Pⱼ}[f_µ + f⊥] = ⟨f_µ + f⊥, µ_{Pⱼ}⟩ = ⟨f_µ, µ_{Pⱼ}⟩ + ⟨f⊥, µ_{Pⱼ}⟩ = ⟨f_µ, µ_{Pⱼ}⟩,

which is independent of f⊥. As a result, the loss term ℓ in (1) does not depend on f⊥. For the regularization term Ω, since f⊥ is orthogonal to Σᵢ₌₁ᵐ αᵢ µ_{Pᵢ} and Ω is strictly monotonically increasing, we have

  Ω(‖f‖) = Ω(‖f_µ + f⊥‖) = Ω(√(‖f_µ‖² + ‖f⊥‖²)) ≥ Ω(‖f_µ‖),

with equality if and only if f⊥ = 0 and thus f = f_µ. Consequently, any minimizer must take the form f = Σᵢ₌₁ᵐ αᵢ µ_{Pᵢ} = Σᵢ₌₁ᵐ αᵢ E_{Pᵢ}[k(x, ·)].
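The reproducing step in this proof, E_{P}[f] = ⟨f, µ_P⟩, can be checked numerically for an empirical measure and an f in the span of a few kernel functions. The anchor points, coefficients, Gaussian kernel, and bandwidth below are arbitrary illustrative choices:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Gaussian RBF kernel (illustrative choice of kernel and bandwidth)."""
    return np.exp(-gamma * (a - b) ** 2)

rng = np.random.default_rng(1)
Z = rng.normal(size=50)              # sample defining the empirical measure P
anchors = np.array([-1.0, 0.5, 2.0])
alpha = np.array([0.3, -0.7, 1.1])   # f = sum_i alpha_i k(anchors_i, .)

def f(x):
    return sum(a * rbf(c, x) for a, c in zip(alpha, anchors))

# E_P[f]: average of f over the sample.
lhs = np.mean([f(z) for z in Z])
# <f, mu_P>: the reproducing property gives sum_i alpha_i * mu_P(anchors_i),
# where mu_P(c) = (1/n) * sum_j k(z_j, c).
mu_at_anchors = np.array([rbf(Z, c).mean() for c in anchors])
rhs = float(alpha @ mu_at_anchors)
assert abs(lhs - rhs) < 1e-10   # the two sides agree up to float error
```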
From Data Points to Probability Measures
[Diagram: Data Points → Dirac Measures → Probability Measures]
Abstract
Risk Deviation Bound: Given an arbitrary distribution P with finite variance σ², a Lipschitz continuous function f: R → R with constant C_f, and an arbitrary loss function ℓ: R × R → R that is Lipschitz continuous in the second argument with constant Cℓ, it follows, for any y ∈ R, that

  |E_{x∼P}[ℓ(y, f(x))] − ℓ(y, E_{x∼P}[f(x)])| ≤ 2 Cℓ C_f σ.

Potential applications:
- Learning with noisy/uncertain examples (astronomical/biological data).
- Learning from groups of samples (population genetics, group anomaly detection, and preference learning).
- Learning under changing environments (domain adaptation/generalization).
- Large-scale machine learning (data squashing).

Hilbert Space Embedding: The kernel mean map from a space of distributions P into a reproducing kernel Hilbert space (RKHS) H:

  µ: P → H,  P ↦ ∫ k(x, ·) dP(x).

The kernel k is said to be characteristic if and only if the map µ is injective, i.e., there is no loss of information.

Representer Theorem: Given training examples (Pᵢ, yᵢ) ∈ P × R, i =
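The risk deviation bound can be sanity-checked numerically. Below, ℓ is the absolute loss (so Cℓ = 1), f is a linear function with slope C_f = 0.5 (so C_f-Lipschitz), and P is a Gaussian with σ = 2; all of these choices are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0
X = rng.normal(0.0, sigma, 100_000)   # sample from P with std sigma

C_f, C_l = 0.5, 1.0                   # Lipschitz constants of f and of the loss
f = lambda x: C_f * x                 # a C_f-Lipschitz predictor
y = 1.0

lhs = np.abs(y - f(X)).mean()         # E_P[l(y, f(x))] with absolute loss
rhs = abs(y - f(X).mean())            # l(y, E_P[f(x)])
assert abs(lhs - rhs) <= 2 * C_l * C_f * sigma   # the risk deviation bound
```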
PACS (Picture Archiving Communication System) for Dentistry
Abstract
Abstract: This paper proposes a PACS (Picture Archiving Communication System) to manage and transfer information for the dental field, focusing on two main applications. The first application opens Digital Imaging and Communications in Medicine (DICOM) files of patients from the database via a Local Area Network (LAN) and the Hypertext Transfer Protocol (HTTP). The second application passes a patient's personal and treatment data over the network, applying a MySQL database [4] with a Graphical User Interface (GUI) implemented using Borland C++ Builder™.
Domain Generalization via Invariant Feature Representation
Abstract
This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. A learning-theoretic analysis shows that reducing dissimilarity improves the expected generalization ability of classifiers on new domains, motivating the proposed algorithm. Experimental results on synthetic and real-world datasets demonstrate that DICA successfully learns invariant features and improves classifier performance in practice.

Domain Generalization:
- Standard setting: assume that the training data and test data come from the same distribution; learn a classifier/regressor that generalizes well to the test data.
- Domain adaptation: the training data and test data may come from different distributions. The common assumption is that the test data is observed at training time; the classifier/regressor trained on the training data is adapted to that specific set of test data.
- Covariate shift: the marginal P(X) changes, but the conditional P(Y|X) stays the same.
- Target shift/concept drift: the marginal P(Y) or the conditional P(Y|X) may also change.
- Domain generalization: the training data come from several different distributions; learn a classifier/regressor that generalizes well to unseen test data, which also come from a different distribution.

Applications: medical diagnosis, e.g., transferring the diagnoses of previous patients to new patients who have similar demographic and medical profiles.
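The "dissimilarity across domains" that DICA minimizes can be sketched as the average pairwise distance between the domains' kernel mean embeddings (squared MMD). This is a simplified stand-in for the paper's objective, with an assumed Gaussian kernel and toy 1-D domains; the helper names are illustrative:

```python
import numpy as np

def rbf_gram(X, Y, gamma=0.5):
    """Gram matrix of a Gaussian RBF kernel between two 1-D sample sets."""
    d = X[:, None] - Y[None, :]
    return np.exp(-gamma * d ** 2)

def mmd2(X, Y, gamma=0.5):
    """Squared MMD = ||mu_X - mu_Y||_H^2 between two sample sets."""
    return (rbf_gram(X, X, gamma).mean()
            + rbf_gram(Y, Y, gamma).mean()
            - 2 * rbf_gram(X, Y, gamma).mean())

def distributional_variance(domains, gamma=0.5):
    """Average pairwise dissimilarity of the domains' mean embeddings --
    the kind of quantity a DICA-style method seeks to minimize."""
    m = len(domains)
    return np.mean([mmd2(domains[i], domains[j], gamma)
                    for i in range(m) for j in range(i + 1, m)])

rng = np.random.default_rng(3)
similar = [rng.normal(0, 1, 100) for _ in range(3)]      # near-identical domains
shifted = [rng.normal(3 * i, 1, 100) for i in range(3)]  # shifted domains
assert distributional_variance(similar) < distributional_variance(shifted)
```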
unknown title
Abstract
(i) Since µ̌_λ = µ̂_P/(λ + 1), we have

  ‖µ̌_λ − µ_P‖ = ‖µ̂_P/(λ + 1) − µ_P‖
             ≤ ‖µ̂_P/(λ + 1) − µ_P/(λ + 1)‖ + ‖µ_P/(λ + 1) − µ_P‖
             ≤ ‖µ̂_P − µ_P‖ + λ‖µ_P‖.

From [1], we have that ‖µ̂_P − µ_P‖ = O_P(n^(−1/2)), and therefore the result follows.

(ii) Define ∆ := E_P‖µ̂_P − µ_P‖² = (∫ k(x, x) dP(x) − ‖µ_P‖²)/n. Consider

  E_P‖µ̌_λ − µ_P‖² − ∆ = E_P‖(nβ/(nβ + c))(µ̂_P − µ_P) − µ_P …
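The triangle-inequality step in part (i) can be verified numerically: shrinking the empirical embedding, µ̌_λ = µ̂_P/(λ + 1), moves it at most ‖µ̂_P − µ_P‖ + λ‖µ_P‖ away from µ_P. In the sketch below the population embedding is approximated by a large sample; the kernel, λ, and sample sizes are illustrative assumptions:

```python
import numpy as np

def rbf_gram(X, Y, gamma=0.5):
    """Gram matrix of a Gaussian RBF kernel between two 1-D sample sets."""
    d = X[:, None] - Y[None, :]
    return np.exp(-gamma * d ** 2)

rng = np.random.default_rng(4)
big = rng.normal(0, 1, 1000)   # large sample, proxy for the population mu_P
small = rng.normal(0, 1, 30)   # small sample, gives the empirical hat(mu)_P
lam = 0.2

Gss = rbf_gram(small, small).mean()   # <hat(mu)_P, hat(mu)_P>
Gbb = rbf_gram(big, big).mean()       # <mu_P, mu_P>
Gsb = rbf_gram(small, big).mean()     # <hat(mu)_P, mu_P>

a = 1.0 / (lam + 1.0)                 # shrinkage factor defining mu_check
dist_shrunk = np.sqrt(a**2 * Gss + Gbb - 2 * a * Gsb)  # ||mu_check - mu_P||
dist_plain = np.sqrt(Gss + Gbb - 2 * Gsb)              # ||hat(mu)_P - mu_P||
bound = dist_plain + lam * np.sqrt(Gbb)                # RHS of part (i)
assert dist_shrunk <= bound + 1e-12
```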
One-class support measure machines for group anomaly detection
 In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI)
, 2013
Abstract

Cited by 7 (4 self)
We propose one-class support measure machines (OCSMMs) for group anomaly detection. Unlike traditional anomaly detection, OCSMMs aim at recognizing anomalous aggregate behaviors of data points. The OCSMMs generalize well-known one-class support vector machines (OCSVMs) to a space of probability measures. By formulating the problem as quantile estimation on distributions, we can establish interesting connections to the OCSVMs and variable kernel density estimators (VKDEs) over the input space on which the distributions are defined, bridging the gap between large-margin methods and kernel density estimators. In particular, we show that various types of VKDEs can be considered as solutions to a class of regularization problems studied in this paper. Experiments on the Sloan Digital Sky Survey dataset and a High Energy Particle Physics dataset demonstrate the benefits of the proposed framework in real-world applications.
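A simplified sketch of the group-anomaly idea: embed each group as its kernel mean and score a new group by how far its embedding lies from the centroid of the training groups' embeddings. This is an illustrative stand-in, not the OCSMM quantile formulation itself; the kernel, helper names, and toy data are my own choices:

```python
import numpy as np

def rbf_gram(X, Y, gamma=0.5):
    """Gram matrix of a Gaussian RBF kernel between two 1-D sample sets."""
    d = X[:, None] - Y[None, :]
    return np.exp(-gamma * d ** 2)

def emb_inner(X, Y, gamma=0.5):
    """<mu_X, mu_Y>_H estimated as the average pairwise kernel value."""
    return rbf_gram(X, Y, gamma).mean()

def group_score(group, train_groups, gamma=0.5):
    """Squared distance of a group's mean embedding to the centroid of the
    training groups' embeddings -- large means an anomalous aggregate."""
    cross = np.mean([emb_inner(group, G, gamma) for G in train_groups])
    centroid = np.mean([emb_inner(G, H, gamma)
                        for G in train_groups for H in train_groups])
    return emb_inner(group, group, gamma) - 2 * cross + centroid

rng = np.random.default_rng(5)
train = [rng.normal(0, 1, 80) for _ in range(5)]  # normal groups
normal_group = rng.normal(0, 1, 80)
anomalous_group = rng.normal(0, 3, 80)  # same mean, anomalous spread
assert group_score(anomalous_group, train) > group_score(normal_group, train)
```

Note that the anomalous group has the same mean as the training groups; it is flagged because its distribution (here, its spread) differs, which point-level anomaly detection would miss.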
Domain Generalization via Invariant Feature Representation
Abstract

Cited by 6 (1 self)
This paper investigates domain generalization: How to take knowledge acquired from an arbitrary number of related domains and apply it to previously unseen domains? We propose Domain-Invariant Component Analysis (DICA), a kernel-based optimization algorithm that learns an invariant transformation by minimizing the dissimilarity across domains, whilst preserving the functional relationship between input and output variables. A learning-theoretic analysis shows that reducing dissimilarity improves the expected generalization ability of classifiers on new domains, motivating the proposed algorithm. Experimental results on synthetic and real-world datasets demonstrate that DICA successfully learns invariant features and improves classifier performance in practice.