
## Batch and online learning algorithms for Nonconvex Neyman-Pearson classification

### Citations

12872 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...choose the decision function f within a restricted class H that is unlikely to contain the optimal decision function. This approach is supported by standard results in statistical learning theory [e.g. Vapnik 1998] and their extension to the Neyman-Pearson formulation [Scott and Nowak 2005]. Secondly, the empirical counterparts of problems (1) and (2) involve the 0–1 loss function I{y f(x) ≤ 0} that is neither continuous nor convex... |

1458 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1999
Citation Context ...the cost assigned to positive samples. Both algorithms were tested with these hyperparameters for values of ρ in {0.01, 0.05, 0.1, 0.2}. The algorithms were implemented in C++ using an SMO optimizer [Platt 1999] and a kernel cache. The cache vastly speeds up the algorithms because kernel matrix coefficients computed during the earlier iterations can be used during the later iterations. Table II reports results... |
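The kernel cache described in this context can be sketched as a memoization layer over the kernel function, so that coefficients computed in early iterations are reused in later ones. This is an illustrative sketch, not the authors' C++ implementation; the RBF kernel and dictionary-based cache layout are assumptions:

```python
import math

class KernelCache:
    """Memoize kernel evaluations keyed by sample indices (sketch)."""
    def __init__(self, kernel):
        self.kernel = kernel
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, i, j, xi, xj):
        key = (i, j) if i <= j else (j, i)  # the kernel is symmetric
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = self.kernel(xi, xj)
        return self.cache[key]

def rbf(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
K = KernelCache(rbf)
for _ in range(2):                  # two sweeps over all pairs
    for i in range(len(X)):
        for j in range(len(X)):
            K(i, j, X[i], X[j])
print(K.misses, K.hits)  # 6 distinct pairs computed once; the rest are hits
```

A production cache would bound its memory (e.g. evict least-recently-used rows), but the reuse principle is the same.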

768 | Statistical significance for genome-wide studies
- Storey, Tibshirani
- 2003
Citation Context ...events are tested against a null hypothesis to discriminate real biological effects from noisy data. In microarray experiments, the expression level of genes is tested against the background signal [Storey and Tibshirani 2003]. In mass spectrometry protein screening, potentially positive matches between spectra and peptide sequences are tested for correctness against peptide-spectrum matches that are known to be negative... |

298 | A support vector method for multivariate performance measures
- Joachims
- 2005
Citation Context ...loss, such as the guaranteed convergence to a global optimum, are balanced by the necessity of searching for different asymmetric costs to achieve the desired NP constraint [Bach et al. 2006]. SVMPerf [Joachims 2005] optimizes in polynomial time a convex upper bound of any performance measure computable from the confusion table. Since P̃fa and P̃nd can be computed from the confusion table, SVMPerf can address... |

276 | Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry - Elias, Gygi |

193 | A dual coordinate descent method for large-scale linear SVM
- Hsieh, Chang, et al.
- 2008
Citation Context ...the datasets Covertype and RCV1-V2 were carried out using a number of speed optimizations. The SVM solver for AC-SVM was replaced by LIBLINEAR, which is one of the most efficient solvers for linear SVMs [Hsieh et al. 2008]. For the ONP-SVM experiments on RCV1-V2, we modified the stochastic gradient code of [Bottou 2007] because it handles sparse vectors more efficiently than our baseline code. During experiments car... |

145 | AUC optimization vs. error rate minimization - Cortes, Mohri - 2004 |

101 | Introduction to Numerical Linear Algebra and Optimization - Ciarlet - 1989 |

91 | Large scale transductive SVMs - Collobert, Sinz, et al. |

87 | Trading convexity for scalability - Collobert, Sinz, et al. - 2006 |

43 | Adaptation and Learning in Automatic Systems
- Tsypkin
- 1971
Citation Context ...They differ in the minimization step. The first algorithm uses the DC1 approach [Tao and An 1998] and is suitable for kernel machines. The second algorithm relies on a stochastic gradient approach [Tsypkin 1971; Andrieu et al. 2007] suitable for processing large datasets. 3.2 Batch learning of NP-SVM The most difficult step in the Uzawa algorithm is minimization of L over f for λ fixed. In the case of the SVM... |
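The Uzawa scheme mentioned in this context alternates a full primal minimization of the Lagrangian with projected gradient ascent on the multiplier λ. A minimal sketch on a toy constrained problem (the quadratic objective, step size, and iteration count are illustrative assumptions, not the paper's NP-SVM instance):

```python
# Toy problem: minimize w**2 subject to w >= 1, i.e. g(w) = 1 - w <= 0.
# Lagrangian: L(w, lam) = w**2 + lam * (1 - w).

def uzawa(eta=0.5, iters=200):
    lam = 0.0
    w = 0.0
    for _ in range(iters):
        # Primal step: minimize L over w for lam fixed (closed form here).
        w = lam / 2.0
        # Dual step: projected gradient ascent on lam (kept nonnegative).
        lam = max(0.0, lam + eta * (1.0 - w))
    return w, lam

w, lam = uzawa()
print(round(w, 3), round(lam, 3))  # converges to w = 1, lam = 2
```

In the NP-SVM setting the primal step (minimizing L over f) has no closed form, which is exactly why the cited DC and stochastic gradient approaches are needed.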

37 | Considering Cost Asymmetry in Learning Classifiers
- Bach, Heckerman, et al.
Citation Context ...of problems (1) and (2) involve the 0–1 loss function I{y f(x) ≤ 0} that is neither continuous nor convex. Replacing this 0–1 loss with the SVM hinge loss has been studied for both the Asymmetric Cost [Bach et al. 2006] and Neyman-Pearson [Davenport et al. 2010] formulations. This substitution introduces additional complexities. In particular, in order to hit the specified goals on Pnd and Pfa, one must use asymmetric... |

37 | A Neyman-Pearson approach to statistical learning
- Scott, Nowak
- 2005
Citation Context ...ikely to contain the optimal decision function. This approach is supported by standard results in statistical learning theory [e.g. Vapnik 1998] and their extension to the Neyman-Pearson formulation [Scott and Nowak 2005]. Secondly, the empirical counterparts of problems (1) and (2) involve the 0–1 loss function I{y f(x) ≤ 0} that is neither continuous nor convex. Replacing this 0–1 loss with the SVM hinge loss has been... |

26 | Ranking the best instances
- Clémençon, Vayatis
Citation Context ...ion is most accurate in the threshold area. Theoretical investigations of this problem conclude that Neyman-Pearson classification remains an important primitive for such focussed ranking algorithms [Clémençon and Vayatis 2007; 2009]. This contribution proposes two practical and efficient algorithms for NP classification using nonconvex but continuous and mostly differentiable loss functions. In particular, these algorithm... |

24 | A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets
- Käll
- 2007
Citation Context ...e uncertain nature of the positive labels. Then, they use the ordering induced by the classifier to assign q-values, and, finally, select a subset of examples with q-values below a desired threshold [Käll et al. 2007]. This approach is equivalent to constructing an ROC curve of the classifier and taking a single point on the ROC curve. Alternatively, Spivak et al. [2009] propose an algorithm that directly optimiz... |

16 | An exponential lower bound on the complexity of regularization paths
- Gärtner, Jaggi, et al.
Citation Context ...path procedure [Yu et al. 2009]. However, this would increase the computing times. In particular, the regularization path approach has, in the worst-case scenario, to visit an exponential number of kinks [Gärtner et al. 2009]. SVMPerf matches the speed and the accuracy of ONP-SVM on the Spambase and Pageblocks datasets. It remains competitive in speed on the GammaTelescope dataset, but shows poor accuracy for small val... |

15 | Overlaying classifiers: a practical approach to optimal scoring - Clémençon, Vayatis - 2010 |

12 | Improvements to the percolator algorithm for peptide identification from shotgun proteomics data sets
- Spivak, Weston, et al.
- 2009
Citation Context ...ts are used as positive examples. The q-value optimization experiments were carried out using a proteomics dataset consisting of 139410 samples with positive and negative samples equally represented [Spivak et al. 2009]. 5.2 Speedups achieved by the Annealed NonConvex NP-SVM algorithm This section compares the performance of the Annealed NonConvex NP-SVM (Algorithm 3) with the performance of a more direct implement... |

7 | Tuning Support Vector Machines for Minimax and Neyman-Pearson Classification
- Davenport, et al.
Citation Context ...0–1 loss function I{y f(x) ≤ 0} that is neither continuous nor convex. Replacing this 0–1 loss with the SVM hinge loss has been studied for both the Asymmetric Cost [Bach et al. 2006] and Neyman-Pearson [Davenport et al. 2010] formulations. This substitution introduces additional complexities. In particular, in order to hit the specified goals on Pnd and Pfa, one must use asymmetric costs that are different from C+ and C−... |

6 | Imbalanced learning with a biased minimax probability machine
- Huang, Yang, et al.
- 2006
Citation Context ...Σ̂−w, where Φ is the cumulative distribution function of the standard normal distribution and µ̂± and Σ̂± are empirical estimates. Since the Gaussian assumption proves too restrictive, some authors [Huang et al. 2006; Kim et al. 2006] replace the cumulative Φ by a Chebyshev bound Ψ(u) = [u]+² / (1 + [u]+²), where [u]+ = max(0, u). The scheme is extended to nonlinear discrimination using the kernel trick. A third flav... |
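The bound quoted in this context, Ψ(u) = [u]+² / (1 + [u]+²) with [u]+ = max(0, u), is simple enough to write out directly. A small sketch for illustration (function names are mine, not from the cited papers):

```python
def pos(u):
    """[u]+ = max(0, u), the positive part."""
    return max(0.0, u)

def psi(u):
    """Chebyshev-type bound Psi(u) = [u]+^2 / (1 + [u]+^2), used in place
    of the Gaussian cdf Phi when the Gaussian assumption is too restrictive."""
    p = pos(u)
    return p * p / (1.0 + p * p)

# Like a cdf, psi rises from 0 toward 1: 0 for u <= 0, 1/2 at u = 1,
# approaching 1 as u grows.
print(psi(-1.0), psi(0.0), psi(1.0), psi(10.0))
```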

5 | Pareto optimal linear classification
- Kim, Magnani, et al.
- 2006
Citation Context ...-layered neural network to estimate class-conditional distributions as mixtures of Gaussians. The discriminant function is then inferred with a likelihood ratio test. In the same vein, recent methods [Kim et al. 2006] assume the class-conditional distributions are Gaussian with means µ± and covariances Σ±, and consider a linear classifier f(x) = 〈w, x〉 + b. This amounts to solving (2) with the following definition... |

4 | Stochastic programming with probability constraints. http://fr.arxiv.org/abs/0708.0281
- Andrieu, Cohen, et al.
- 2007
Citation Context ...in the minimization step. The first algorithm uses the DC1 approach [Tao and An 1998] and is suitable for kernel machines. The second algorithm relies on a stochastic gradient approach [Tsypkin 1971; Andrieu et al. 2007] suitable for processing large datasets. 3.2 Batch learning of NP-SVM The most difficult step in the Uzawa algorithm is minimization of L over f for λ fixed. In the case of the SVM classifier, the La... |

3 | Learning with large datasets
- Bottou
- 2007
Citation Context ...point is guaranteed under mild conditions [Tsypkin 1971; Andrieu et al. 2007]. Such direct results are in fact stronger than those available for the generic Uzawa approach. Linear NP-SVM. Following [Bottou 2007], when training a linear SVM (that is, f(x) = 〈w, x〉 + b), we choose a learning rate γt = γ0 (1 + λc t)^−1 with an initial learning rate γ0 chosen to ensure that the initial updates of w are compatible... |
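The learning-rate schedule quoted here, γt = γ0 (1 + λc t)^−1, can be sketched in a plain SGD loop for a linear SVM with hinge loss. This is a minimal illustration, not the paper's ONP-SVM code: the toy data, the hyperparameter values, and reading "λc" as the product of a regularization constant λ and a constant c are all assumptions:

```python
import random

def train_linear_svm(data, dim, lam=0.01, c=1.0, gamma0=0.1, epochs=50):
    """SGD for a linear SVM f(x) = <w, x> + b with hinge loss and the
    decreasing learning rate gamma_t = gamma0 * (1 + lam*c*t)**-1."""
    w = [0.0] * dim
    b = 0.0
    t = 0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            gamma = gamma0 / (1.0 + lam * c * t)
            t += 1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization gradient always applies; the hinge gradient
            # fires only when the margin constraint is violated.
            for i in range(dim):
                g = lam * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= gamma * g
            if margin < 1:
                b += gamma * y
    return w, b

random.seed(0)
# Linearly separable toy data: the label is the sign of the first coordinate.
data = [((1.0, 0.5), 1), ((2.0, -0.3), 1), ((-1.5, 0.2), -1), ((-1.0, -0.8), -1)]
w, b = train_linear_svm(data, dim=2)
print(all(y * (w[0] * x[0] + w[1] * x[1] + b) > 0 for x, y in data))
```

The 1/t decay keeps early steps large (fast progress) while guaranteeing the step sizes shrink enough for the stochastic iterates to settle.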

3 | General solution and learning method for binary classification with performance constraints
- Bounsiar, Beauseroy, et al.
Citation Context ...max(0, u). The scheme is extended to nonlinear discrimination using the kernel trick. A third flavor of generative approach addresses the estimation of class-conditional distributions by Parzen windows [Bounsiar et al. 2008]. These generative methods share the same drawbacks: (1) the final classifiers are derived from estimated distributions whose accuracy is questionable when the datasets are small, and (2) the kernel... |

3 | A neural network for optimum Neyman-Pearson classification
- Streit
- 1990
Citation Context ...tion. 2.1 Previous work The NP classification problem has been extensively studied. Past methods can be roughly divided into two categories: generative and discriminative. One of the earliest attempts [Streit 1990] uses a multi-layered neural network to estimate class-conditional distributions as mixtures of Gaussians. The discriminant function is then inferred with a likelihood ratio test. In the same vein, rece... |

2 | The entire quantile path of a risk-agnostic SVM classifier - Yu, Vishwanathan, et al. |

1 |
- Tao, An
- 1998
Citation Context ...Fig. 1. Approximations of the 0-1 loss. ℓatan(z) = 1/2 + arctan(−βz)/π and ℓerf(z) = 1/2 + erf(−βz)/2. Definitions of the other cost functions are detailed in the text. ...nonconvex optimization techniques [Tao and An 1998]. The second algorithm is a stochastic gradient algorithm suitable for very large datasets. Various experimental results illustrate their properties. 2. EMPIRICAL RISK NP FORMULATION Our approach con... |
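The two sigmoidal surrogates named in the figure caption, ℓatan(z) = 1/2 + arctan(−βz)/π and ℓerf(z) = 1/2 + erf(−βz)/2, are easy to write down. β is a sharpness parameter; the value below is an arbitrary choice for illustration:

```python
import math

BETA = 4.0  # sharpness: larger beta hugs the 0-1 step more tightly

def l_atan(z, beta=BETA):
    """l_atan(z) = 1/2 + arctan(-beta*z)/pi, a smooth surrogate of the 0-1 loss."""
    return 0.5 + math.atan(-beta * z) / math.pi

def l_erf(z, beta=BETA):
    """l_erf(z) = 1/2 + erf(-beta*z)/2, another smooth surrogate."""
    return 0.5 + math.erf(-beta * z) / 2.0

# Both approximate the 0-1 loss I{z <= 0}: near 1 for z << 0,
# exactly 1/2 at z = 0, near 0 for z >> 0.
for z in (-2.0, 0.0, 2.0):
    print(round(l_atan(z), 3), round(l_erf(z), 3))
```

Unlike the hinge loss, these losses are bounded and nonconvex, which is what motivates the DC and stochastic gradient machinery discussed in the contexts above.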