Agnostic active learning
- In ICML, 2006
Cited by 80 (10 self)
We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A2 (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A2 achieves an exponential improvement (i.e., requires only $O\left(\ln \frac{1}{\epsilon}\right)$ samples to find an $\epsilon$-optimal classifier) over the usual sample complexity of supervised learning, for several settings considered before in the realizable case. These include learning threshold classifiers and learning homogeneous linear separators with respect to an input distribution which is uniform over the unit sphere.
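As a rough illustration of where a logarithmic label count can come from, the sketch below handles only the simplest setting the abstract names: threshold classifiers in the noise-free (realizable) case, solved by binary search over an unlabeled pool. It is not the A2 algorithm and does not handle noise. The names `active_learn_threshold`, `query_label`, and `oracle`, the pool size, and the true threshold 0.37 are all assumptions made for this example.

```python
import random


def active_learn_threshold(pool, query_label):
    """Binary-search a pool of unlabeled points for the label boundary.

    Assumes noise-free labels of the form 1 if x >= t else 0 (realizable case).
    Returns (estimated threshold, number of label queries made).
    """
    xs = sorted(pool)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are labeled 0, xs[hi:] are labeled 1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(xs[mid]) == 1:
            hi = mid          # boundary is at mid or to its left
        else:
            lo = mid + 1      # boundary is strictly to the right of mid
    # lo is now the index of the first positive point, if there is one
    estimate = xs[lo] if lo < len(xs) else float("inf")
    return estimate, queries


# Hypothetical setup: 1000 uniform points and a hidden threshold of 0.37.
rng = random.Random(0)
true_threshold = 0.37
pool = [rng.random() for _ in range(1000)]


def oracle(x):
    """Stand-in for a human labeler; returns the noise-free label."""
    return int(x >= true_threshold)


estimate, queries = active_learn_threshold(pool, oracle)
```

With a pool of n points the loop makes at most about log2(n) + 1 label queries, whereas labeling the whole pool would take n. Taking n on the order of 1/ε gives the logarithmic dependence on 1/ε in this toy setting.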
A PAC-style Model for Learning from Labeled and Unlabeled Data
- In Proceedings of the 18th Annual Conference on Computational Learning Theory (COLT), 2005
Cited by 44 (8 self)
There has been growing interest in practice in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what are the basic quantities that these numbers depend on. Our model can be viewed as an extension of the standard PAC model, where in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution.
A General and Multi-lingual Phrase Chunking Model based on Masking Method
- In CICLING, 2006
Cited by 7 (4 self)
Several phrase chunkers have been proposed over the past few years. Some state-of-the-art chunkers achieve better performance by integrating external resources, e.g., parsers and additional training data, or by combining multiple learners. However, in many languages and domains, such external materials are not easily available, and combining multiple learners increases the cost of training and testing. In this paper, we propose a mask method to improve chunking accuracy. The experimental results show that our chunker achieves better performance than other deep parsers and chunkers. On the CoNLL-2000 data set, our system achieves an F rate of 94.12. For the base-chunking task, our system reaches an F rate of 92.95. When ported to Chinese, the F rate on the base-chunking task is 92.36. Our chunker is also quite efficient: the complete chunking time for a 50K-word document is about 50 seconds.
New Theoretical Frameworks for Machine Learning
- 2007
Cited by 2 (0 self)
This thesis develops and analyzes theoretical frameworks for new emerging paradigms of Machine Learning including Semi-supervised, Active, and Similarity-based Learning. These are areas of significant practical importance and significant activity in Machine Learning, and a number of different algorithmic approaches have been developed for each of them. Standard Learning Theory frameworks such as PAC or Statistical Learning Theory models tend to not capture these learning approaches, hence developing sound and rigorous models that provide a thorough understanding of these new paradigms is desirable. The purpose of this thesis is to propose and to study new theoretical frameworks and algorithms for better understanding and extending some of these learning approaches. In addition, this dissertation also presents new applications of techniques from Machine Learning Theory to new emerging areas of Computer Science at large, such as Auction and Mechanism Design. In Machine Learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many applications. As a result, a number of different algorithmic approaches have been developed for this
Arbitrary Phrase Identification using Linear Kernel with Mask Method
In this paper, we propose an efficient and accurate text chunking system using a linear-kernel SVM and a new technique called the mask method. Previous research indicated that system combination or external parsers can improve chunking performance. However, the cost of constructing multiple classifiers is even higher than that of developing a single processor. Besides, the use of external resources complicates the original tagging process. To remedy these problems, we employ richer features and propose a mask-based method that addresses the unknown-word problem to enhance system performance. In this way, no external resources or complex heuristics are necessary for the chunking system. The experiments show that when trained on the CoNLL-2000 chunking data set, our system achieves an F(β) rate of 94.12, and 94.21 with an SVM POS-tagger. Furthermore, our chunker is quite efficient since it adopts a linear-kernel SVM. The turnaround tagging time on the CoNLL-2000 test data is less than 52 seconds.

