Learning to Resolve Natural Language Ambiguities: A Unified Approach
1998
Cited by 154 (75 self)
We analyze a few of the commonly used statistics-based and machine learning algorithms for natural language disambiguation tasks and observe that they can be recast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data-driven approach that merely searches for a good linear separator in the feature space, without further assumptions about the domain or a specific problem. We present such an approach, a sparse network of linear separators utilizing the Winnow learning algorithm, and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having a very large number of attributes. In particular, we present an extensive experimental ...
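As a point of reference, the sketch below implements the textbook Winnow update for a single linear separator over Boolean features; it is not the SNoW system itself. The parameters are assumptions chosen for illustration: promotion factor alpha = 2, demotion by 1/alpha, and threshold theta equal to the number of features. The last lines train it on the disjunction x0 OR x2 over six features.

```python
from typing import List, Sequence


def winnow_train(examples: Sequence[Sequence[int]], labels: Sequence[int],
                 alpha: float = 2.0, epochs: int = 100) -> List[float]:
    """Learn a linear separator over 0/1 features with Winnow's multiplicative updates."""
    n = len(examples[0])
    theta = float(n)          # illustrative threshold: the number of features
    w = [1.0] * n             # every weight starts at 1
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(examples, labels):
            if winnow_predict(w, x, theta) != y:
                mistakes += 1
                # Promote active weights on a missed positive, demote them on a false positive.
                factor = alpha if y == 1 else 1.0 / alpha
                w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        if mistakes == 0:     # a clean pass: the training data is separated
            break
    return w


def winnow_predict(w: Sequence[float], x: Sequence[int], theta: float) -> int:
    """Predict 1 iff the total weight of the active features reaches theta."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0


# Target concept: x0 OR x2 over all 64 assignments of 6 Boolean features.
xs = [[(i >> b) & 1 for b in range(6)] for i in range(64)]
ys = [1 if x[0] or x[2] else 0 for x in xs]
w = winnow_train(xs, ys)
```

Because the updates are multiplicative, the number of mistakes Winnow makes on such disjunctions grows only logarithmically with the number of irrelevant features, which is the sense in which it is attribute-efficient.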
A SNoW-Based Face Detector
Advances in Neural Information Processing Systems 12, 2000
Cited by 98 (16 self)
A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a pre-defined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.
A Learning Approach to Shallow Parsing
In Proceedings of EMNLP-WVLC'99, Association for Computational Linguistics, 1999
Cited by 57 (23 self)
A SNoW-based learning approach to shallow parsing tasks is presented and studied experimentally. The approach learns to identify syntactic patterns by combining simple predictors to produce a coherent inference. Two instantiations of this approach are studied, and experimental results for Noun Phrases (NP) and Subject-Verb (SV) phrases are presented that compare favorably with the best published results. In doing so, we compare two ways of modeling the problem of learning to recognize patterns and suggest that shallow parsing patterns are better learned using open/close predictors than using inside/outside predictors.
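The two modeling choices compared above can be made concrete with a small, hypothetical converter: inside/outside assigns each word one tag (here the common B/I/O variant), while open/close asks two questions per word, whether a phrase opens there and whether one closes there. Phrases are assumed to be non-overlapping (start, end) word spans with inclusive ends, and the greedy pairing in `spans_from_open_close` is a simplification, not the paper's inference procedure over learned SNoW predictors.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (first word index, last word index), inclusive


def to_inside_outside(n_words: int, spans: Sequence[Span]) -> List[str]:
    """One tag per word: B begins a phrase, I continues it, O is outside."""
    tags = ["O"] * n_words
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end + 1):
            tags[i] = "I"
    return tags


def to_open_close(n_words: int, spans: Sequence[Span]) -> Tuple[List[bool], List[bool]]:
    """Two Boolean targets per word: does a phrase open here? does one close here?"""
    opens = [False] * n_words
    closes = [False] * n_words
    for start, end in spans:
        opens[start] = True
        closes[end] = True
    return opens, closes


def spans_from_open_close(opens: Sequence[bool], closes: Sequence[bool]) -> List[Span]:
    """Greedily pair each open with the next close (assumes non-overlapping phrases)."""
    spans: List[Span] = []
    start = None
    for i, (o, c) in enumerate(zip(opens, closes)):
        if o and start is None:
            start = i
        if c and start is not None:
            spans.append((start, i))
            start = None
    return spans


# "The cat sat on the mat" with noun phrases [The cat] and [the mat].
spans = [(0, 1), (4, 5)]
print(to_inside_outside(6, spans))  # ['B', 'I', 'O', 'O', 'B', 'I']
```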
Constraint classification: A new approach to multiclass classification and ranking
In Advances in Neural Information Processing Systems 15, 2002
Cited by 54 (5 self)
We introduce constraint classification, a framework capturing many flavors of multiclass classification, including multilabel classification and ranking, and present a meta-algorithm for learning in this framework. We provide generalization bounds when using a collection of k linear functions to represent each hypothesis. We also present empirical and theoretical evidence that constraint classification is more powerful than existing methods of multiclass classification.
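A minimal sketch of the idea for ordinary multiclass data, assuming a perceptron as the underlying learner: each labeled example (x, y) becomes the constraints w_y·x > w_j·x for every j ≠ y, and a violated constraint triggers a perceptron step on the two weight vectors involved, which is the same as a binary perceptron update in Kesler's expanded space. Ranking or multilabel data would contribute different constraint sets to the same loop. The function names, epoch count and toy data are illustrative assumptions, not the paper's meta-algorithm verbatim.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Constraint = Tuple[int, int]  # (i, j): class i must score strictly above class j


def multiclass_constraints(y: int, k: int) -> List[Constraint]:
    """An ordinary multiclass label y becomes the constraints y > j for all j != y."""
    return [(y, j) for j in range(k) if j != y]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_constraint_perceptron(data: Sequence[Tuple[Vector, List[Constraint]]],
                                k: int, epochs: int = 100) -> List[Vector]:
    """Learn k linear functions so that each constraint (i, j) gives w_i.x > w_j.x."""
    d = len(data[0][0])
    W = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        violations = 0
        for x, constraints in data:
            for i, j in constraints:
                if dot(W[i], x) <= dot(W[j], x):
                    violations += 1
                    # Perceptron step on the expanded example (+x in block i, -x in block j).
                    W[i] = [a + b for a, b in zip(W[i], x)]
                    W[j] = [a - b for a, b in zip(W[j], x)]
        if violations == 0:
            break
    return W


def predict(W: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Multiclass prediction: the class whose linear function scores highest."""
    scores = [dot(w, x) for w in W]
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy 3-class data; the trailing 1.0 in each point is a bias feature.
points = [([2.0, 0.0, 1.0], 0), ([1.5, 0.5, 1.0], 0),
          ([0.0, 2.0, 1.0], 1), ([0.5, 1.5, 1.0], 1),
          ([-1.0, -1.0, 1.0], 2), ([-2.0, -1.0, 1.0], 2)]
data = [(x, multiclass_constraints(y, 3)) for x, y in points]
W = train_constraint_perceptron(data, k=3)
```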
A second-order hidden Markov model for part-of-speech tagging
In Proceedings of the 37th Annual Meeting of the ACL, 1999
Cited by 51 (5 self)
This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contextual and lexical probabilities. This model increases the accuracy of the tagger to state-of-the-art levels. These approximations make use of more contextual information than standard statistical systems. New methods of smoothing the estimated probabilities are also introduced to address the sparse data problem.
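For reference, a standard first-order HMM tagger picks the tag sequence maximizing the product over i of P(t_i | t_{i-1}) P(w_i | t_i). A second-order version in the spirit of the abstract conditions the contextual term on the two previous tags and the lexical term on the current and previous tag, with t_{-1} and t_0 taken as sentence-start markers. The exact conditioning below is an assumption for illustration rather than a transcription of the paper's model, and it omits the smoothing the abstract mentions.

```latex
\hat{t}_{1..n} = \arg\max_{t_1,\dots,t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-2}, t_{i-1}) \, P(w_i \mid t_{i-1}, t_i)
```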
Learning in natural language
Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI ’99); 31 July–6, 1999
Cited by 40 (20 self)
Statistics-based classifiers in natural language are typically developed by assuming a generative model for the data, estimating its parameters from training data, and then using Bayes' rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches work. This paper presents a learning-theory account of the major statistical approaches to learning in natural language. A class of Linear Statistical Queries (LSQ) hypotheses is defined, and learning with it is shown to exhibit some robustness properties. Many statistical learners used in natural language, including naive Bayes, Markov models and maximum entropy models, are shown to be LSQ hypotheses, explaining the robustness of these predictors even when the underlying probabilistic assumptions do not hold. This coherent view of when and why learning approaches work in this context may help to develop better learning methods and an understanding of the role of learning in natural language inferences.
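The linearity behind the LSQ observation can be checked directly for naive Bayes: with Boolean features, the naive Bayes log-odds is an affine function of the features, so the generative classifier is also a linear separator whose weights come from simple count statistics. The sketch assumes two classes, Bernoulli features and add-one smoothing; the function names and toy data are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def fit_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int]):
    """Estimate P(y) and P(x_i = 1 | y) from counts, with add-one smoothing."""
    n = len(X[0])
    count = {0: 0, 1: 0}
    ones = {0: [0] * n, 1: [0] * n}
    for x, label in zip(X, y):
        count[label] += 1
        for i, xi in enumerate(x):
            ones[label][i] += xi
    prior = {c: (count[c] + 1) / (len(y) + 2) for c in (0, 1)}
    p = {c: [(ones[c][i] + 1) / (count[c] + 2) for i in range(n)] for c in (0, 1)}
    return prior, p


def nb_log_odds(prior, p, x: Sequence[int]) -> float:
    """Generative view: log P(y=1)P(x|y=1) - log P(y=0)P(x|y=0)."""
    s = math.log(prior[1]) - math.log(prior[0])
    for i, xi in enumerate(x):
        p1, p0 = p[1][i], p[0][i]
        s += math.log(p1 if xi else 1 - p1) - math.log(p0 if xi else 1 - p0)
    return s


def nb_as_linear(prior, p) -> Tuple[List[float], float]:
    """Linear view: the same log-odds rewritten as w . x + b."""
    b = math.log(prior[1] / prior[0])
    w = []
    for p1, p0 in zip(p[1], p[0]):
        w.append(math.log(p1 / p0) - math.log((1 - p1) / (1 - p0)))
        b += math.log((1 - p1) / (1 - p0))
    return w, b


X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1]]
y = [1, 1, 0, 0, 1]
prior, p = fit_bernoulli_nb(X, y)
w, b = nb_as_linear(prior, p)
```

Predicting class 1 whenever w · x + b > 0 therefore gives exactly the naive Bayes decision, even though the weights were never fit discriminatively.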
A Classification Approach to Word Prediction
2000
Cited by 33 (8 self)
The eventual goal of a language model is to accurately predict the value of a missing word given its context. We present an approach to word prediction that is based on learning a representation for each word as a function of words and linguistic predicates in its context. This approach raises a few new questions that we address. First, in order to learn good word representations, it is necessary to use an expressive representation of the context. We present a method that uses external knowledge to generate expressive context representations, along with a learning method capable of handling the large number of features generated this way, any of which can potentially contribute to each prediction. Second, since the number of words "competing" for each prediction is large, there is a need to "focus the attention" on a smaller subset of these. We exhibit the contribution of a "focus of attention" mechanism to the performance of the word predictor. Finally, we describe a large-scale experimental study in which the approach presented is shown to yield significant improvements in word prediction tasks.
A Sequential Model for Multi-Class Classification
EMNLP ’01, 2001
Cited by 32 (11 self)
Many classification problems require decisions among a large number of competing classes. These tasks, however, are not handled well by general-purpose learning methods and are usually addressed in an ad hoc fashion. We suggest a general approach: a sequential learning model that utilizes classifiers to sequentially restrict the number of competing classes while maintaining, with high probability, the presence of the true outcome in the candidate set. Some theoretical and computational properties of the model are discussed, and we argue that these are important in NLP-like domains. The advantages of the model are illustrated in an experiment in part-of-speech tagging.
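The sequential model can be pictured as a cascade: each stage scores the surviving candidate classes and drops those it considers unlikely, so later classifiers choose among fewer classes. In this sketch each stage is a hypothetical scoring function, and a stage keeps every candidate whose score is within a fixed margin of its best score; that margin rule is an assumption standing in for the paper's way of keeping the true outcome with high probability, and the toy tagging stages are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A stage maps (input, surviving candidates) to a score for each candidate.
Stage = Callable[[Tuple[str, str], Sequence[str]], Dict[str, float]]


def sequential_restrict(x: Tuple[str, str], classes: Sequence[str],
                        stages: Sequence[Stage], margin: float) -> List[str]:
    """Apply the stages in order, each keeping only candidates near its best score."""
    candidates = list(classes)
    for stage in stages:
        if len(candidates) <= 1:
            break
        scores = stage(x, candidates)
        best = max(scores[c] for c in candidates)
        candidates = [c for c in candidates if scores[c] >= best - margin]
    return candidates


# Hypothetical toy stages for tagging a word given the previous tag.
TAGS = ["NN", "VB", "JJ", "DT", "IN"]
LEXICON = {"run": {"NN", "VB"}, "the": {"DT"}}


def lexicon_stage(x, cands):
    """Cheap filter: tags the word was seen with (every tag for unknown words)."""
    word, _prev_tag = x
    allowed = LEXICON.get(word, set(cands))
    return {c: 1.0 if c in allowed else 0.0 for c in cands}


def context_stage(x, cands):
    """Context filter: after a determiner, prefer the noun reading."""
    _word, prev_tag = x
    return {c: 1.0 if (prev_tag == "DT" and c == "NN") else 0.0 for c in cands}


print(sequential_restrict(("run", "DT"), TAGS, [lexicon_stage, context_stage], 0.5))  # ['NN']
```

When a stage cannot discriminate (for example, "run" after a pronoun in this toy setup), all surviving candidates tie and are passed on unchanged rather than being guessed away.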
Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software
Proc. of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006
Cited by 29 (4 self)
Software errors are a major cause of system failures. Effectively designing tools and support for detecting and recovering from software failures requires a deep understanding of bug characteristics. Recently, software and its development process have changed significantly in many ways, including more help from bug detection tools, a shift towards multi-threaded architectures, the open-source development paradigm, and increasing concerns about security and user-friendly interfaces. Therefore, results from previous studies may not be applicable to present software. Furthermore, many new aspects, such as security, concurrency and open-source-related characteristics, have not been well studied. Additionally, previous studies were based on a small number of bugs, which may lead to non-representative results. To investigate the impacts of the new factors on software errors,
Relational Learning for NLP using Linear Threshold Elements
1999
Cited by 28 (12 self)
We describe a coherent view of learning and reasoning with relational representations in the context of natural language processing. In particular, we discuss the Neuroidal Architecture, Inductive Logic Programming and the SNoW system, explaining the relationships among these, and thereby offer an explanation of the theoretical basis for the SNoW system. We suggest that extensions of this system along the lines suggested by the theory may provide new levels of scalability and functionality.
1 Introduction
The paper explores some aspects of relational knowledge representation and their learnability. While the discussion is to a large extent general, it is made in the context of low-level natural language processing (NLP) tasks. Recent efforts in NLP emphasize empirical approaches that attempt to learn how to perform various natural language tasks by being trained on an annotated corpus. These approaches have been used for a wide variety of fairly low-level tasks such as part-of-speech...

