Results 1 - 4 of 4
New methods for splice site recognition
, 2002
Abstract

Cited by 21 (4 self)
Splice sites are locations in DNA which separate protein-coding regions (exons) from noncoding regions (introns). Accurate splice site detectors thus form important components of computational gene finders. We pose splice site recognition as a classification problem with the classifier learnt from a labeled data set consisting of only local information around the potential splice site. Note that finding the correct position of splice sites without using global information is a rather hard task. We analyze the genomes of the nematode Caenorhabditis elegans and of humans using specially designed support vector kernels. One of the kernels is adapted from our previous work on detecting translation initiation sites in vertebrates and another uses an extension to the well-known Fisher kernel. We find excellent performance on both data sets.
Probabilistic Score Estimation with Piecewise Logistic Regression
 In Proc. of ICML ’04
, 2004
Abstract

Cited by 10 (0 self)
Well-calibrated probabilities are necessary in many applications like probabilistic frameworks or cost-sensitive tasks. Based on previous success of asymmetric Laplace method in calibrating text classifiers' scores, we propose to use piecewise logistic regression, which is a simple extension of standard logistic regression, as an alternative method in the discriminative family. We show that both methods have the flexibility to be piecewise linear functions in log-odds, but they are based on quite different assumptions. We evaluated asymmetric Laplace method, piecewise logistic regression and standard logistic regression over standard text categorization collections (Reuters-21578 and TREC-AP) with three classifiers (SVM, Naive Bayes and Logistic Regression Classifier), and observed that piecewise logistic regression performs significantly better than the other two methods in the log-loss metric.
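To make the idea of a calibration map that is piecewise linear in log-odds concrete, here is a minimal Python sketch. It is not the paper's implementation: the hinge-basis parametrization and the names `piecewise_log_odds`, `calibrated_probability`, `knots`, and `slope_changes` are assumptions chosen for illustration. The `log_loss` helper shows the metric the abstract refers to.

```python
import math

def piecewise_log_odds(score, bias, slope, knots, slope_changes):
    """Continuous piecewise-linear function of a classifier score.

    Assumed parametrization: a base line plus one hinge term per knot,
    so the slope changes by `delta` once the score passes `knot`.
    """
    z = bias + slope * score
    for knot, delta in zip(knots, slope_changes):
        z += delta * max(0.0, score - knot)
    return z

def calibrated_probability(score, bias, slope, knots, slope_changes):
    """Map a raw score to a probability through the logistic function."""
    z = piecewise_log_odds(score, bias, slope, knots, slope_changes)
    # Numerically stable sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(probs, labels):
    """Average negative log-likelihood of binary labels (0 or 1)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

With no knots the map reduces to standard logistic regression on the score; each knot adds one extra slope parameter, which is what lets the two sides of a decision threshold be calibrated differently.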
Supervised Learning of Bayesian Network Parameters Made Easy
 Level Perspective on Branch Architecture Performance, IEEE Micro28
, 2002
Abstract

Cited by 4 (1 self)
Bayesian network models are widely used for supervised prediction tasks such as classification. Usually the parameters of such models are determined using `unsupervised' methods such as maximization of the joint likelihood. In many cases, the reason is that it is not clear how to find the parameters maximizing the supervised (conditional) likelihood. We show how the supervised learning problem can be solved efficiently for a large class of Bayesian network models, including the Naive Bayes (NB) and tree-augmented NB (TAN) classifiers. We do this by showing that under a certain general condition on the network structure, the supervised learning problem is exactly equivalent to logistic regression. Hitherto this was known only for Naive Bayes models. Since logistic regression models have a concave log-likelihood surface, the global maximum can be easily found by local optimization methods.
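The previously known Naive Bayes case mentioned in the abstract can be checked numerically. The sketch below assumes binary features and a binary class, and shows only that the Naive Bayes posterior has the form of a logistic regression model; it does not reproduce the paper's general result for other network structures. The function names are hypothetical.

```python
import math

def nb_posterior(x, prior, theta1, theta0):
    """P(y=1 | x) computed directly from the Naive Bayes joint.

    prior     : P(y=1)
    theta1[i] : P(x_i=1 | y=1)
    theta0[i] : P(x_i=1 | y=0)
    """
    joint1 = prior
    joint0 = 1.0 - prior
    for xi, t1, t0 in zip(x, theta1, theta0):
        joint1 *= t1 if xi else 1.0 - t1
        joint0 *= t0 if xi else 1.0 - t0
    return joint1 / (joint1 + joint0)

def nb_as_logistic(prior, theta1, theta0):
    """Rewrite the same model as a bias and weights on the log-odds scale."""
    bias = math.log(prior / (1.0 - prior))
    weights = []
    for t1, t0 in zip(theta1, theta0):
        bias += math.log((1.0 - t1) / (1.0 - t0))
        weights.append(math.log(t1 * (1.0 - t0) / (t0 * (1.0 - t1))))
    return bias, weights

def logistic_posterior(x, bias, weights):
    """P(y=1 | x) under a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

For any binary input `x`, `nb_posterior(x, ...)` and `logistic_posterior(x, *nb_as_logistic(...))` agree up to rounding, which is why maximizing the conditional likelihood can be treated as fitting `bias` and `weights` directly.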
Improving Rocchio with weakly supervised clustering
 In Proceedings of the European Conference on Machine Learning (ECML
, 2003
Abstract

Cited by 2 (2 self)
This paper presents a novel approach for adapting the complexity of a text categorization system to the difficulty of the task. In this study, we adapt a simple text classifier (Rocchio), using weakly supervised clustering techniques. The idea is to identify subtopics of the original classes which can help improve the categorization process. To this end, we propose several clustering algorithms, and report results of various evaluations on standard benchmark corpora such as the Newsgroups corpus.
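The subtopic idea can be illustrated with a small Python sketch. It makes several assumptions: it uses a centroid-only variant of Rocchio (no negative-example term), it takes the subtopic assignment of each training document as given rather than running any of the paper's clustering algorithms, and all names (`centroid`, `cosine`, `train`, `classify`) are hypothetical.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Average a list of sparse term-weight dicts into one prototype."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: w / n for term, w in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(docs_by_group):
    """Build one prototype per (class_label, subtopic_id) group.

    docs_by_group maps (class_label, subtopic_id) to a list of documents;
    the subtopic ids are assumed to come from a separate clustering step.
    """
    return {key: centroid(vecs) for key, vecs in docs_by_group.items()}

def classify(doc, prototypes):
    """Return the class whose closest subtopic prototype best matches doc."""
    best_key = max(prototypes, key=lambda k: cosine(doc, prototypes[k]))
    return best_key[0]
```

With a single subtopic per class this reduces to a plain centroid classifier; adding subtopics gives each class several prototypes, so a class that covers heterogeneous material is no longer summarized by one averaged vector.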