Results 1–10 of 110
Convolution Kernels for Natural Language
 Advances in Neural Information Processing Systems 14
, 2001
Abstract

Cited by 254 (7 self)
We describe the application of kernel methods to Natural Language Processing (NLP) problems. In many NLP tasks the objects being modeled are strings, trees, graphs or other discrete structures which require some mechanism to convert them into feature vectors. We describe kernels for various natural language structures, allowing rich, high dimensional representations of these structures. We show how a kernel over trees can be applied to parsing using the voted perceptron algorithm, and we give experimental results on the ATIS corpus of parse trees.
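The tree kernel the abstract applies to parsing can be sketched concretely. The following is a minimal, illustrative implementation of a subtree-counting kernel in the style this paper describes; the tuple encoding of trees and all function names are our own, not from the paper.

```python
# A minimal sketch of a subtree-counting tree kernel, as described above.
# Trees are nested tuples: (label, child, child, ...); a leaf is a bare string.
# The encoding and names here are illustrative assumptions, not the paper's.

def production(node):
    """The rule at a node: its label plus the labels of its children."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def common_subtrees(n1, n2):
    """C(n1, n2): the number of common subtrees rooted at n1 and n2."""
    if isinstance(n1, str) or isinstance(n2, str):
        return 0
    if production(n1) != production(n2):
        return 0
    score = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1 + common_subtrees(c1, c2)
    return score

def nodes(tree):
    """All internal nodes of a tree, in preorder."""
    if isinstance(tree, str):
        return []
    return [tree] + [n for c in tree[1:] for n in nodes(c)]

def tree_kernel(t1, t2):
    """K(t1, t2): sum of C over all pairs of nodes from the two trees."""
    return sum(common_subtrees(a, b) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", "John"), ("VP", ("V", "runs")))
t2 = ("S", ("NP", "Mary"), ("VP", ("V", "runs")))
print(tree_kernel(t1, t2))  # → 6
```

The recursion makes the kernel computable in time quadratic in the number of nodes, even though the implicit feature space (one dimension per subtree) is exponentially large.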
New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron
, 2002
Abstract

Cited by 214 (6 self)
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponentially sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 98), or a representation tracking all subfragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
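A rough sketch of the perceptron reranking setup, under our own simplifications: the paper scores candidates implicitly through a kernel and uses the voted variant, whereas this sketch uses explicit sparse feature dicts, the plain update, and an invented toy feature.

```python
# Perceptron reranking sketch: each training example is a list of candidate
# parses, the first being the gold parse, each reduced to a sparse feature
# dict. Simplified relative to the paper, which avoids explicit features.

from collections import defaultdict

def score(weights, feats):
    return sum(weights[f] * v for f, v in feats.items())

def best_candidate(weights, candidates):
    best = candidates[0]
    for c in candidates[1:]:
        # ties go to the later candidate, so the model must learn to separate
        if score(weights, c) >= score(weights, best):
            best = c
    return best

def perceptron_rerank(train, epochs=5):
    """train: list of candidate lists; candidates[0] is the gold parse."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates in train:
            best = best_candidate(weights, candidates)
            gold = candidates[0]
            if best is not gold:
                for f, v in gold.items():   # promote gold-parse features
                    weights[f] += v
                for f, v in best.items():   # demote the mistaken winner
                    weights[f] -= v
    return weights

# Toy data: the invented feature "good_attach" marks the gold analysis.
train = [
    [{"good_attach": 1.0}, {"bad_attach": 1.0}],
    [{"good_attach": 1.0, "len": 2.0}, {"bad_attach": 1.0, "len": 2.0}],
]
w = perceptron_rerank(train)
print(w["good_attach"] > w["bad_attach"])  # → True
```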
The Negotiation and Acquisition of Recursive Grammars as a Result of Competition Among Exemplars
 Linguistic Evolution through Language Acquisition: Formal and Computational Models
, 1999
Abstract

Cited by 63 (0 self)
... this paper is an investigation of how recursive communication systems can come to be. In particular, the investigation explores the possibility that such a system could emerge among the members of a population as the result of a process I characterize as "negotiation," because each individual both contributes to, and conforms with, the system.
Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods
, 2001
Abstract

Cited by 53 (10 self)
A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which maximum-likelihood estimation is justified are arguably quite strong. This paper discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms which depend on alternatives to (smoothed) maximum-likelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature, specifically generalization results based on margins on training data, can be derived for parsing models. Finally, we describe parameter estimation algorithms which are motivated by these generalization bounds.
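The maximum-likelihood baseline the abstract contrasts against is, for a PCFG, just relative-frequency estimation over treebank rules. A minimal sketch, with a toy two-tree treebank invented for illustration:

```python
# Relative-frequency (maximum-likelihood) estimation of PCFG rule
# probabilities from a toy treebank. Trees are nested tuples:
# (label, child, ...); leaves are strings. The data is invented.

from collections import Counter

def rule_counts(trees):
    """Count each rule (lhs, rhs-labels) across all trees."""
    counts = Counter()
    for tree in trees:
        stack = [tree]
        while stack:
            node = stack.pop()
            if isinstance(node, str):
                continue  # leaves carry no rule
            children = node[1:]
            rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
            counts[(node[0], rhs)] += 1
            stack.extend(children)
    return counts

def mle_pcfg(trees):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = rule_counts(trees)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

trees = [
    ("S", ("NP", "John"), ("VP", ("V", "runs"))),
    ("S", ("NP", "Mary"), ("VP", ("V", "sees"), ("NP", "John"))),
]
probs = mle_pcfg(trees)
print(probs[("S", ("NP", "VP"))])  # → 1.0: every S rewrites as NP VP
```

This estimator maximizes the likelihood of the treebank, which is precisely the property whose justifying assumptions the paper questions.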
ABL: Alignment-Based Learning
, 2000
Abstract

Cited by 36 (1 self)
This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25% non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.
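The alignment step can be sketched with Python's standard difflib in place of the paper's edit-distance alignment; this is an approximation of the idea, and the function name is ours, not the paper's.

```python
# A rough sketch of alignment learning: identical parts of two sentences are
# aligned, and the unequal parts in between are hypothesized to be
# interchangeable constituents of the same type. difflib's longest-matching
# blocks stand in for the paper's edit-distance alignment here.

from difflib import SequenceMatcher

def hypothesize_constituents(sent1, sent2):
    """Return the pairs of unequal spans between two tokenized sentences."""
    matcher = SequenceMatcher(a=sent1, b=sent2, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # an unequal part: a candidate constituent pair
            pairs.append((sent1[i1:i2], sent2[j1:j2]))
    return pairs

s1 = "show me flights from Boston to New York".split()
s2 = "show me flights from Dallas to Denver".split()
print(hypothesize_constituents(s1, s2))
# → [(['Boston'], ['Dallas']), (['New', 'York'], ['Denver'])]
```

A selection step, as the abstract notes, would then choose among overlapping hypotheses; that part is omitted here.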
Parsing with the Shortest Derivation
 Proceedings of COLING-2000
, 2000
Abstract

Cited by 36 (14 self)
Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as the Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric on the ATIS and OVIS corpora, while it obtains competitive results on the Wall Street Journal (WSJ) corpus. This paper also contains the first published experiments with DOP on the WSJ.
Probabilistic Syntax
, 2002
Abstract

Cited by 34 (1 self)
... probabilistic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason); (ii) probabilistic models don't model grammaticality (neither "Colorless green ideas sleep furiously" nor "Furiously sleep ideas green colorless" has previously been uttered, and hence, Chomsky wrongly assumes, each must be estimated to have probability zero, yet the former is grammatical while the latter is not); and (iii) use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments (Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion), but some of these concerns are still important ...
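Point (ii), the zero-probability problem for unsmoothed maximum-likelihood estimates, can be made concrete with a tiny relative-frequency bigram model; the corpus and test sentences below are invented for illustration.

```python
# An unsmoothed (maximum-likelihood) bigram model assigns probability exactly
# zero to any sentence containing an unseen bigram, grammatical or not,
# which is the estimation behavior the argument above turns on.

from collections import Counter

corpus = [
    "green ideas are common".split(),
    "ideas sleep poorly".split(),
]

bigrams = Counter(bg for sent in corpus for bg in zip(sent, sent[1:]))
contexts = Counter(w for sent in corpus for w in sent[:-1])

def mle_prob(sentence):
    """Product of relative-frequency bigram probabilities; 0.0 if any unseen."""
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        if contexts[w1] == 0:
            return 0.0  # the context word itself was never seen
        p *= bigrams[(w1, w2)] / contexts[w1]
    return p

print(mle_prob("green ideas sleep".split()))  # → 0.5 (both bigrams seen)
print(mle_prob("ideas green sleep".split()))  # → 0.0 (unseen bigrams)
```

Smoothed estimators avoid the hard zero, which is why the dichotomy Chomsky assumed does not hold for modern probabilistic models.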
The DOP estimation method is biased and inconsistent
 Computational Linguistics
, 1998
Abstract

Cited by 33 (0 self)
... this paper, an estimator is unbiased iff $P_{E_\theta[\phi(X)]} = P_\theta$ for all $\theta$, i.e., its expected parameter estimate specifies the same distribution as the true parameters. Similarly, the loss function is the mean squared difference between the "true" and estimated distributions, i.e., if $\Omega$ is the event space (in DOP1, the space of all phrase-structure trees) then: $L(\theta, \phi(x)) = \sum_{\omega \in \Omega} \left( P_\theta(\omega) - P_{\phi(x)}(\omega) \right)^2$
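The loss just defined is straightforward to compute; a toy illustration with two invented distributions over a three-tree event space:

```python
# Squared-error loss between a "true" and an estimated distribution, summed
# over the event space Omega, as defined above. The distributions are invented.

def squared_loss(p_true, p_est, omega):
    return sum((p_true[w] - p_est[w]) ** 2 for w in omega)

omega = ["t1", "t2", "t3"]
p_true = {"t1": 0.5, "t2": 0.3, "t3": 0.2}
p_est  = {"t1": 0.6, "t2": 0.2, "t3": 0.2}
print(squared_loss(p_true, p_est, omega))  # ≈ 0.02
```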
An SVM-Based Voting Algorithm with Application to Parse Reranking
 In Proc. of CoNLL 2003
, 2003
Abstract

Cited by 32 (4 self)
This paper introduces a novel Support Vector Machine (SVM) based voting algorithm for reranking, which provides an indirect way to solve sequential models. We present a risk formulation under the PAC framework for this voting algorithm. We have applied the algorithm to the parse reranking problem, achieving labeled recall and precision of 89.4%/89.8% on WSJ section 23 of the Penn Treebank.