Online Learning of Approximate Dependency Parsing Algorithms
- In Proc. of EACL
, 2006
Cited by 111 (8 self)
In this paper we extend the maximum spanning tree (MST) dependency parsing framework of McDonald et al. (2005c) to incorporate higher-order feature representations and allow dependency structures with multiple parents per word. We show that those extensions can make the MST framework computationally intractable, but that the intractability can be circumvented with new approximate parsing algorithms. We conclude with experiments showing that discriminative online learning using those approximate algorithms achieves the best reported parsing accuracy for Czech and Danish.
Scalable Techniques for Mining Causal Structures
- Data Mining and Knowledge Discovery
, 1998
Cited by 82 (1 self)
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variab...
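The confidence measure mentioned in this abstract can be illustrated with a minimal sketch. The toy baskets and the function names `confidence` and `lift` are assumptions made for illustration, not taken from the paper. Lift is used here as a simple stand-in for a correlation-style measure and is not necessarily the measure the authors use.

```python
from typing import List, Set

# Hypothetical toy market-basket data (not from the paper).
baskets: List[Set[str]] = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def confidence(data: List[Set[str]], a: str, b: str) -> float:
    """Estimate P(b in basket | a in basket) for the rule a -> b."""
    with_a = [basket for basket in data if a in basket]
    if not with_a:
        raise ValueError(f"item {a!r} never occurs, confidence is undefined")
    return sum(1 for basket in with_a if b in basket) / len(with_a)

def lift(data: List[Set[str]], a: str, b: str) -> float:
    """Ratio P(a, b) / (P(a) * P(b)); values above 1 suggest positive association."""
    n = len(data)
    p_a = sum(1 for basket in data if a in basket) / n
    p_b = sum(1 for basket in data if b in basket) / n
    p_ab = sum(1 for basket in data if a in basket and b in basket) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("lift is undefined when an item never occurs")
    return p_ab / (p_a * p_b)

if __name__ == "__main__":
    print(confidence(baskets, "bread", "butter"))
    print(confidence(baskets, "butter", "bread"))
    print(lift(baskets, "bread", "butter"))
```

Note that confidence is asymmetric (bread -> butter differs from butter -> bread) while lift is symmetric, and neither says anything about which item, if either, causes the other. That gap is the one the abstract points to.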
Discriminative Learning and Spanning Tree Algorithms for Dependency Parsing
, 2006
Cited by 7 (0 self)
In this thesis we develop a discriminative learning method for dependency parsing using online large-margin training combined with spanning tree inference algorithms. We will show that this method provides state-of-the-art accuracy, is extensible through the feature set and can be implemented efficiently. Furthermore, we display the language-independent nature of the method by evaluating it on over a dozen diverse languages as well as show its practical applicability through integration into a sentence compression system.
We start by presenting an online large-margin learning framework that is a generalization of the work of Crammer and Singer [34, 37] to structured outputs, such as sequences and parse trees. This will lead to the heart of this thesis: discriminative dependency parsing. Here we will formulate dependency parsing in a spanning tree framework, yielding efficient parsing algorithms for both projective and non-projective tree structures. We will then extend the parsing algorithm to incorporate features over larger substructures without an increase in computational complexity for the projective case. Unfortunately, the non-projective problem then becomes NP-hard so we provide structurally motivated approximate algorithms. Having defined a set of parsing algorithms, we will also define a rich feature set and train various parsers using the online large-margin learning framework. We then compare our trained dependency parsers to other state-of-the-art parsers on 14 diverse languages: Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, German, Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.
Having built an efficient and accurate discriminative dependency parser, this thesis will then turn to improving and applying the parser. First we will show how additional resources can provide useful features to increase parsing accuracy and to adapt parsers to new domains. We will also argue that the robustness of discriminative inference-based learning algorithms lends itself well to dependency parsing when feature representations or structural constraints do not allow for tractable parsing algorithms. Finally, we integrate our parsing models into a state-of-the-art sentence compression system to show their applicability to a real world problem.
Structural Learning of Dynamic Bayesian Networks in Speech Recognition
, 2001
Cited by 6 (3 self)
this paper, $X_i$ denotes a continuous or discrete random variable. Values of the random variable will be indicated by lower case letters as in $x_i$. For a discrete variable that takes $r$ values, $x_i^k$ denotes a specific assignment for $1 \le k \le r$. A set of variables is denoted in boldface letters $\mathbf{X} = \{X_1, \ldots, X_n\}$
Implementing Data Mining Algorithms with SQL Server™
- Proc. of the Data Mining III, 3rd Int'l Conf. on Data Mining
, 2002
Cited by 1 (1 self)
The OLE DB for DM (Microsoft's object-based technology for sharing information and services across process and machine boundaries focused on database mining applications) specification provides an industry standard for implementation of data mining algorithms aggregated with Microsoft SQL Server 2000. The Simple Naive Bayes classifier is implemented using the OLE DB for DM Resource Kit. Numeric input attributes, multiple prediction trees and incremental classification are considered. All necessary steps to implement this algorithm are explained and discussed. Some results are shown to illustrate the capabilities of the implementation.
How an Expert can use Imperfect Knowledge to Improve an Imperfect Theory
This report addresses the challenge of using auxiliary information $I_A$ to improve a given theory, encoded as a belief net BE. In contrast with many other "knowledge revision" systems, we deal with the situation where this $I_A$ may be imperfect, which means BE should not necessarily incorporate that information. Instead, we provide tools to help the expert decide how to use $I_A$. After presenting objective criteria for measuring how much $I_A$ differs from BE, we discuss ways to evaluate whether this difference is statistically significant. We then provide tools to isolate the differences: to tell the domain expert which parts of the belief net (e.g., which links, and/or which nodes) account for the discrepancy. Two of our tools involve techniques that are of independent interest: viz., the use of a noncentral $\chi^2$-test to compute the relative likelihood of two similar belief nets, and a sensitivity analysis that provides the "error-bars" around the answers returned by a belief...
A New Algorithm For Learning Bayesian Classifiers From Data
We introduce a new algorithm for the induction of classifiers from data, based on Bayesian networks. Basically this problem has already been examined from two perspectives: first, the induction of classifiers by learning algorithms for Bayesian networks, second, the induction of classifiers based on the naive Bayesian classifier. Our approach is located between these two perspectives; it eliminates the disadvantages of both while exploiting their advantages. In contrast to recently appeared refinements of the naive Bayes classifier, which captures single correlations in the data, we have developed an approach which captures multiple correlations and furthermore does a trade-off between complexity and accuracy. In this paper we evaluate the implementation of our approach with data sets from the machine learning repository and data sets artificially generated by Bayesian networks.
Evaluating the Scalability of Data Mining
Two classifiers implemented as Data Mining Providers are considered. These providers run as stand-alone servers or aggregated with Microsoft SQL Server. One of these classifiers is the Microsoft Decision Trees algorithm. The other is the Simple Naive Bayes incremental classifier, which supports continuous input attributes, multiple discrete predictable attributes and incremental updating of the training data set. The performance study carried out to verify the scalability of the classifiers includes factors of cardinality (number of training cases), number of input attributes, number of states of the input attributes and number of predictable attributes.
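The idea of an incrementally updated naive Bayes classifier can be sketched as follows. This is only an illustration under stated assumptions: it is not the OLE DB for DM provider described in the abstract, it handles discrete attributes and a single predictable attribute only, and the class name `IncrementalNaiveBayes`, the `alpha` smoothing parameter and the toy cases are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Optional

class IncrementalNaiveBayes:
    """Count-based naive Bayes over discrete attributes (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                      # assumed Laplace smoothing constant
        self.class_counts = defaultdict(int)    # label -> number of cases
        self.feature_counts = defaultdict(int)  # (attr, value, label) -> count
        self.values = defaultdict(set)          # attr -> values seen so far

    def update(self, case: Dict[str, Hashable], label: Hashable) -> None:
        """Fold one training case into the counts; no retraining is needed."""
        self.class_counts[label] += 1
        for attr, value in case.items():
            self.feature_counts[(attr, value, label)] += 1
            self.values[attr].add(value)

    def predict(self, case: Dict[str, Hashable]) -> Optional[Hashable]:
        """Return the label with the highest smoothed log-posterior, or None if untrained."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for attr, value in case.items():
                count = self.feature_counts.get((attr, value, label), 0)
                n_values = len(self.values[attr] | {value})
                score += math.log((count + self.alpha) / (n_label + self.alpha * n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

model = IncrementalNaiveBayes()
model.update({"outlook": "sunny"}, "play")
model.update({"outlook": "sunny"}, "play")
model.update({"outlook": "rainy"}, "stay")
```

Because the model is just a set of counts, each new training case costs time proportional to its number of attributes. That property is one plausible reason such a classifier suits the scalability factors listed in the abstract, though the abstract itself does not state the implementation details.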

