Results 1-10 of 39
Joint inference in information extraction
- In Proceedings of the 22nd National Conference on Artificial Intelligence (2007)
Cited by 53 (7 self)
The goal of information extraction is to extract database records from text or semi-structured sources. Traditionally, information extraction proceeds by first segmenting each candidate record separately, and then merging records that refer to the same entities. While computationally efficient, this approach is suboptimal, because it ignores the fact that segmenting one candidate record can help to segment similar ones. For example, resolving a well-segmented field with a less-clear one can disambiguate the latter’s boundaries. In this paper we propose a joint approach to information extraction, where segmentation of all records and entity resolution are performed together in a single integrated inference process. While a number of previous authors have taken steps in this direction (e.g., Pasula et al. (2003), Wellner et al. (2004)), to our knowledge this is the first fully joint approach. In experiments on the CiteSeer and Cora citation matching datasets, joint inference improved accuracy, and our approach outperformed previous ones. Further, by using Markov logic and the existing algorithms for it, our solution consisted mainly of writing the appropriate logical formulas, and required much less engineering than previous ones.
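To make the "well-segmented field helps a less-clear one" example concrete, here is a minimal sketch in the style of Markov logic: weighted formulas over ground atoms, with the MAP world found by brute-force enumeration. The atom names, formulas, and weights are hypothetical and hand-chosen. This is not the paper's model, and it does not use the existing Markov logic algorithms the abstract mentions.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

World = Dict[str, bool]
# A weighted formula: (weight, predicate over a truth assignment to ground atoms).
WeightedFormula = Tuple[float, Callable[[World], bool]]


def map_world(atoms: List[str], formulas: List[WeightedFormula]) -> Tuple[World, float]:
    """Return the truth assignment maximizing the total weight of satisfied formulas.

    In Markov logic a world's log-probability is, up to a constant, the weighted
    count of satisfied ground formulas, so this is the MAP state. Enumeration is
    exact but exponential, so it only suits toy examples like this one.
    """
    best_world: World = {}
    best_score = float("-inf")
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        score = sum(w for w, f in formulas if f(world))
        if score > best_score:
            best_world, best_score = world, score
    return best_world, best_score


# Hypothetical ground atoms for two citation records A and B: A's title is
# already well segmented, while B's title boundary is ambiguous.
ATOMS = ["SameCitation", "B_TitleEndMatchesA"]

# Hypothetical hand-set weights, for illustration only.
LOCAL_ONLY: List[WeightedFormula] = [
    # B's local cues slightly favor the other boundary.
    (0.5, lambda w: not w["B_TitleEndMatchesA"]),
]
JOINT: List[WeightedFormula] = LOCAL_ONLY + [
    # String similarity suggests A and B are the same paper.
    (1.0, lambda w: w["SameCitation"]),
    # Same paper implies the same segmentation.
    (2.0, lambda w: (not w["SameCitation"]) or w["B_TitleEndMatchesA"]),
]
```

With `LOCAL_ONLY`, the MAP state leaves B's boundary mismatched. Adding the entity-resolution formulas in `JOINT` flips it, because segmentation and resolution are scored together.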
A robust combination strategy for semantic role labeling
- Journal of Artificial Intelligence Research, 2005
Cited by 25 (7 self)
This paper focuses on semantic role labeling using automatically generated syntactic information. A simple and robust strategy for system combination is presented, which allows the combined system to partially recover from input parsing errors and to significantly boost the results of the individual systems. This combination scheme is also very flexible, since the individual systems are not required to provide any information other than their solution. Extensive experimental evaluation in the CoNLL-2005 shared task framework supports these claims. The proposed architecture outperforms the best results reported in that evaluation exercise.
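As a rough illustration of combining systems that expose only their final solutions, the sketch below pools the labeled argument spans proposed by several hypothetical SRL systems. It scores each candidate by how many systems proposed it and greedily keeps the highest-voted spans that do not overlap. Plain vote counting is a simplification assumed here for clarity, not the paper's scoring scheme. The `(start, end, label)` span format and the `min_votes` parameter are also hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical argument representation: (start_token, end_token_inclusive, role_label).
Arg = Tuple[int, int, str]


def overlaps(a: Arg, b: Arg) -> bool:
    """True if the two token spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def combine(system_outputs: List[List[Arg]], min_votes: int = 2) -> List[Arg]:
    """Combine SRL outputs using only each system's final solution.

    Candidates are ranked by vote count (ties broken by shorter span, then position)
    and accepted greedily if they do not overlap an already accepted argument.
    """
    votes = Counter(arg for output in system_outputs for arg in set(output))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0][1] - kv[0][0], kv[0]))
    chosen: List[Arg] = []
    for arg, count in ranked:
        if count < min_votes:
            break  # every remaining candidate has even fewer votes
        if all(not overlaps(arg, kept) for kept in chosen):
            chosen.append(arg)
    return sorted(chosen)
```

For example, if three systems agree on `(0, 1, "A0")`, two propose `(3, 5, "A1")`, and one proposes `(3, 4, "A1")`, the combination keeps `(0, 1, "A0")` and `(3, 5, "A1")`.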
/* iComment: Bugs or Bad Comments? */
- Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007
Cited by 18 (4 self)
Commenting source code has long been a common practice in software development. Compared to source code, comments are more direct, descriptive, and easy to understand. Comments and source code provide relatively redundant and independent information regarding a program’s semantic behavior. As software evolves, they can easily grow out of sync, indicating two problems: (1) bugs, where the source code does not follow the assumptions and requirements specified by correct program comments; and (2) bad comments, i.e., comments that are inconsistent with correct code, which can confuse programmers and mislead them into introducing bugs in subsequent versions. Unfortunately, as most comments are written in natural language, no solution has been proposed to automatically analyze comments and detect inconsistencies between comments and source code. This paper takes the first step in automatically analyzing comments written in natural language to extract implicit program rules and use these rules to automatically detect inconsistencies between comments and source code, indicating either bugs or bad comments. Our solution, iComment, combines Natural Language Processing (NLP), Machine Learning, Statistics and Program Analysis techniques to achieve these goals. We evaluate iComment on four large code bases: Linux, Mozilla, Wine and Apache. Our experimental results show that iComment automatically extracts 1832 rules from comments with 90.8-100% accuracy and detects 60 comment-code inconsistencies (33 new bugs and 27 bad comments) in the latest versions of the four programs. Nineteen of them (12 bugs and 7 bad comments) have already been confirmed by the corresponding developers, while the others are still being analyzed by the developers.
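iComment's pipeline (NLP, machine learning, statistics, and program analysis) is far richer than anything that fits here. Still, the core idea of turning a comment into a checkable rule can be shown with a toy. The sketch below assumes a single hypothetical rule template, "must be called with LOCK held", extracted with a regular expression. It then checks callers by tracking `lock(...)`/`unlock(...)` lines. The template, the lock-call convention, and all function names are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule template: "must be called with <lock> held".
RULE_PATTERN = re.compile(r"must be called with (\w+) held", re.IGNORECASE)


def extract_rules(comments: Dict[str, str]) -> Dict[str, str]:
    """Map function name -> lock name required by its comment (if any)."""
    rules = {}
    for func, comment in comments.items():
        m = RULE_PATTERN.search(comment)
        if m:
            rules[func] = m.group(1)
    return rules


def check_callers(rules: Dict[str, str], bodies: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Report (caller, callee) pairs where callee is called without its lock held.

    Crude straight-line check: a lock counts as held after a `lock(<name>)` line
    until a later `unlock(<name>)` line; control flow is ignored.
    """
    violations = []
    for caller, lines in bodies.items():
        held = set()
        for line in lines:
            for m in re.finditer(r"\b(un)?lock\((\w+)\)", line):
                if m.group(1):
                    held.discard(m.group(2))
                else:
                    held.add(m.group(2))
            for callee, lock_name in rules.items():
                if re.search(rf"\b{re.escape(callee)}\(", line) and lock_name not in held:
                    violations.append((caller, callee))
    return violations
```

A real checker would work on parsed code and control-flow paths rather than raw lines. Whether a reported mismatch is a bug or a bad comment still needs a human decision, as the abstract notes.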
Modeling Discriminative Global Inference
- Proceedings of the First IEEE International Conference on Semantic Computing (ICSC), 2007
Cited by 16 (12 self)
Many recent advances in complex domains such as Natural Language Processing (NLP) have taken a discriminative approach in conjunction with the global application of structural and domain-specific constraints. We introduce LBJ, a new modeling language for specifying exact inference systems of this type, combining ideas from machine learning, optimization, First Order Logic (FOL), and Object Oriented Programming (OOP). Expressive constraints are specified declaratively as arbitrary FOL formulas over functions and objects. The language's run-time library translates them into a mathematical programming representation, from which an exact solution is computed. In addition, the compiler leverages an existing OOP language: objects and functions are grouped as the OOP objects and methods that encapsulate the user's data.
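The sketch below illustrates the objective such a system solves: pick the joint labeling that maximizes the sum of local classifier scores, subject to declarative constraints. It does not use LBJ's syntax or its mathematical-programming back end. Constraints are written as ordinary Python predicates, and exhaustive search stands in for an integer linear programming solver; this is exact but exponential. The scores, labels, and the "each core role at most once" constraint are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Labeling = Tuple[str, ...]


def exact_inference(
    scores: List[Dict[str, float]],
    constraints: Sequence[Callable[[Labeling], bool]],
) -> Tuple[Labeling, float]:
    """Return the labeling with the highest total local score among those that
    satisfy every constraint (exhaustive search: exact, but exponential)."""
    label_sets = [sorted(s) for s in scores]
    best: Optional[Labeling] = None
    best_score = float("-inf")
    for labeling in product(*label_sets):
        if not all(c(labeling) for c in constraints):
            continue
        total = sum(scores[i][lab] for i, lab in enumerate(labeling))
        if total > best_score:
            best, best_score = labeling, total
    if best is None:
        raise ValueError("no labeling satisfies the constraints")
    return best, best_score


def unique_core_roles(labeling: Labeling) -> bool:
    """Illustrative constraint: each core role (A0..A5) is used at most once."""
    core = [lab for lab in labeling if lab.startswith("A") and lab[1:].isdigit()]
    return len(core) == len(set(core))
```

With local scores `[{"A0": 0.9, "A1": 0.1, "O": 0.0}, {"A0": 0.6, "A1": 0.5, "O": 0.0}]`, the unconstrained optimum labels both candidates `A0`. Adding `unique_core_roles` yields `("A0", "A1")` instead. In practice an ILP solver would replace the enumeration loop.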
Combining lexical resources: Mapping between propbank and verbnet
- In Proceedings of the 7th International Workshop on Computational Linguistics, 2007
Cited by 15 (2 self)
A wide variety of lexical resources have been created to allow automatic semantic processing of novel text. However, each resource has its own practical and theoretical idiosyncrasies, making it difficult to combine the information from different resources. We discuss the form that these differences can take, and describe how we overcame some of them in creating a mapping between two important resources: PropBank and VerbNet. Furthermore, we present experimental results that show that this mapping improves performance for PropBank-style semantic role labeling. Since PropBank was designed on a verb-by-verb basis, the argument labels Arg2-Arg5 are used for a wide variety of argument roles. As a result, it can be difficult for automatic classifiers to learn to distinguish these arguments. But by using the mapping that we have created between PropBank and VerbNet, we can train a classifier based on VerbNet argument labels, which are more consistent and therefore easier to learn.
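In practice, such a mapping behaves like a lookup keyed by predicate and PropBank argument. A classifier can predict the more consistent VerbNet roles, and the mapping converts between the two label sets. The entries below are illustrative assumptions chosen to show the Arg2 problem the abstract describes (one PropBank label, different VerbNet roles). They are not taken from the paper's resource, and the `(roleset, label)` key format is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Illustrative (hypothetical) entries: (PropBank roleset, PropBank label) -> VerbNet role.
PB_TO_VN: Dict[Tuple[str, str], str] = {
    ("give.01", "Arg0"): "Agent",
    ("give.01", "Arg1"): "Theme",
    ("give.01", "Arg2"): "Recipient",
    ("send.01", "Arg2"): "Destination",  # same PropBank label, different thematic role
}


def to_verbnet(roleset: str, pb_label: str) -> Optional[str]:
    """Map a PropBank label to a VerbNet role, or None if unmapped."""
    return PB_TO_VN.get((roleset, pb_label))


def to_propbank(roleset: str, vn_role: str) -> Optional[str]:
    """Invert the mapping for one roleset (assumes roles are unique per roleset)."""
    for (rs, pb), vn in PB_TO_VN.items():
        if rs == roleset and vn == vn_role:
            return pb
    return None
```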
Factorie: Probabilistic programming via imperatively defined factor graphs
- In Advances in Neural Information Processing Systems 22, 2009
Cited by 15 (2 self)
Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%, achieving a new state of the art.
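FACTORIE itself is a Scala library, and nothing below reflects its API. The Python sketch illustrates the general idea of an imperatively defined factor graph: factor templates are methods that compute on demand which factors touch a variable. A proposed change is scored using only those factors, which is where the efficiency comes from. The token/label model, the weights, and the greedy sweep are illustrative assumptions. The greedy sweep (iterated conditional modes) stands in for the sampling-based inference a real system would use.

```python
from typing import Dict, Iterable, List, Tuple

# A factor is identified by its template name and the indices of its variables.
Factor = Tuple[str, Tuple[int, ...]]


class SequenceModel:
    """Toy factor graph over a label sequence whose factors are found by code.

    Each template is a method that, given one variable index, yields the factors
    touching that variable; no static graph is ever materialized.
    """

    def __init__(self, tokens: List[str], labels: List[str],
                 unary: Dict[Tuple[str, str], float], same_label_bonus: float):
        self.tokens = tokens
        self.labels = labels  # current assignment, mutated by inference
        self.unary = unary  # (token, label) -> weight
        self.same_label_bonus = same_label_bonus

    def unary_factors(self, i: int) -> Iterable[Factor]:
        yield ("unary", (i,))

    def pair_factors(self, i: int) -> Iterable[Factor]:
        # Neighbors are computed imperatively from the index.
        if i > 0:
            yield ("pair", (i - 1, i))
        if i + 1 < len(self.labels):
            yield ("pair", (i, i + 1))

    def factors_touching(self, i: int) -> List[Factor]:
        return list(self.unary_factors(i)) + list(self.pair_factors(i))

    def factor_score(self, factor: Factor) -> float:
        kind, idx = factor
        if kind == "unary":
            (i,) = idx
            return self.unary.get((self.tokens[i], self.labels[i]), 0.0)
        a, b = idx
        return self.same_label_bonus if self.labels[a] == self.labels[b] else 0.0

    def total_score(self) -> float:
        """Score of the full assignment, counting each factor once."""
        factors = {f for i in range(len(self.labels)) for f in self.factors_touching(i)}
        return sum(self.factor_score(f) for f in factors)

    def delta(self, i: int, new_label: str) -> float:
        """Score change from setting labels[i] to new_label, using only local factors."""
        local = self.factors_touching(i)
        before = sum(self.factor_score(f) for f in local)
        old = self.labels[i]
        self.labels[i] = new_label
        after = sum(self.factor_score(f) for f in local)
        self.labels[i] = old  # restore; the caller decides whether to accept
        return after - before


def greedy_pass(model: SequenceModel, candidates: List[str]) -> None:
    """One sweep of iterated conditional modes driven by local score deltas."""
    for i in range(len(model.labels)):
        best = max(candidates, key=lambda lab: model.delta(i, lab))
        if model.delta(i, best) > 0:
            model.labels[i] = best
```

The local `delta` always equals the difference between full-graph scores, so inference never needs to rescore the whole assignment.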
Transliteration as constrained optimization
- In Proc. EMNLP, 2008
Cited by 14 (3 self)
This paper introduces a new method for identifying named-entity (NE) transliterations in bilingual corpora. Recent work has shown the advantage of discriminative approaches to transliteration: given two strings (w_s, w_t) in the source and target language, a classifier is trained to determine if w_t is the transliteration of w_s. This paper shows that the transliteration problem can be formulated as a constrained optimization problem and can thus take into account contextual dependencies and constraints among character bigrams in the two strings. We further explore several methods for learning the objective function of the optimization problem and show the advantage of learning it discriminatively. Our experiments show that the new framework results in over 50% improvement in translating English NEs to Hebrew.
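The sketch below gives a feel for transliteration as constrained optimization. It assumes two constraints of the kind such a formulation might impose: each character is aligned at most once, and alignments do not cross. Under exactly these constraints, dynamic programming finds the best-scoring alignment, so it is used here in place of a general solver. The paper's learned objective, constraint set, and solver are not reproduced, and the character-pair scores are hypothetical.

```python
from typing import Callable, List, Tuple


def best_alignment(
    src: str, tgt: str, pair_score: Callable[[str, str], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Maximize the total pair score under one-to-one, non-crossing constraints.

    Unaligned characters contribute nothing, so negative-scoring pairs are never chosen.
    """
    n, m = len(src), len(tgt)
    # dp[i][j] = best score for aligning src[:i] with tgt[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # leave src[i-1] unaligned
                dp[i][j - 1],  # leave tgt[j-1] unaligned
                dp[i - 1][j - 1] + pair_score(src[i - 1], tgt[j - 1]),  # align them
            )
    # Backtrack to recover which character pairs were aligned.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    return dp[n][m], pairs[::-1]
```

For example, suppose the hypothetical scores are d↔ד = 1.0 and v↔ו = 0.8, with every other pair scoring -0.5. Then `best_alignment("david", "דוד", ...)` aligns positions (0, 0), (2, 1), and (4, 2), for a total of 2.8. A decision rule such as thresholding a length-normalized score would then classify the pair.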
Semantic role labeling via tree kernel joint inference
- In Proceedings of CoNLL-X, 2006
Cited by 11 (6 self)
Recent work on Semantic Role Labeling (SRL) has shown that, to achieve high accuracy, joint inference over the whole predicate-argument structure should be applied. In this paper, we use syntactic subtrees that span potential argument structures of the target predicate in tree kernel functions. This allows Support Vector Machines to discern between correct and incorrect predicate structures and to re-rank them based on the joint probability of their arguments. Experiments on the PropBank data show that both classification and re-ranking based on tree kernels can improve SRL systems.
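For readers unfamiliar with tree kernels, the sketch below implements the standard subset-tree kernel of Collins and Duffy, a common choice in this line of work. Whether it matches the exact kernel variant used in the paper is an assumption. The kernel counts shared tree fragments between two parse trees, with the decay factor `lam` down-weighting larger fragments. Trees are nested tuples `(label, child, ...)` with words as plain strings; that representation is hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple, Union

Tree = Union[str, Tuple]  # a word (str) or (label, child1, child2, ...)


def production(t: Tuple) -> Tuple[str, ...]:
    """Node label followed by the labels (or words) of its children."""
    return (t[0],) + tuple(c if isinstance(c, str) else c[0] for c in t[1:])


def is_preterminal(t: Tuple) -> bool:
    return len(t) == 2 and isinstance(t[1], str)


def internal_nodes(t: Tree) -> List[Tuple]:
    if isinstance(t, str):
        return []
    nodes = [t]
    for c in t[1:]:
        nodes.extend(internal_nodes(c))
    return nodes


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 0.4) -> float:
    """K(T1, T2) = sum of Delta(n1, n2) over all node pairs (Collins-Duffy recursion)."""

    @lru_cache(maxsize=None)
    def delta(n1: Tuple, n2: Tuple) -> float:
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if isinstance(c1, tuple) and isinstance(c2, tuple):
                result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))
```

With `lam=1.0`, the kernel of `("NP", ("D", "the"), ("N", "dog"))` with itself is 6, the number of subset trees of that NP. In an SVM, this function would serve as the kernel for classifying or re-ranking candidate argument structures.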
A global joint model for semantic role labeling
- Computational Linguistics, 2008
Cited by 11 (0 self)
We present a model for semantic role labeling that effectively captures the linguistic intuition that a semantic argument frame is a joint structure, with strong dependencies among the arguments. We show how to incorporate these strong dependencies in a statistical joint model with a rich set of features over multiple argument phrases. The proposed model substantially outperforms a similar state-of-the-art local model that does not include dependencies among different arguments. We evaluate the gains from incorporating this joint information on the PropBank corpus, both when using correct syntactic parse trees as input and when using automatically derived parse trees. The gains amount to 24.1% error reduction on all arguments and 36.8% on core arguments for gold-standard parse trees on PropBank. For automatic parse trees, the error reductions are 8.3% and 10.3% on all and core arguments, respectively. We also present results on the CoNLL 2005 shared task data set. Additionally, we explore considering multiple syntactic analyses to cope with parser noise and uncertainty.
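One common way to realize a joint model like this is to rerank the top-N frames proposed by a local model, using features that look at the whole frame. The sketch below takes that approach under explicit assumptions. The local probabilities are given, candidates are enumerated by brute force, and a single hypothetical joint feature ("repeated core role") has a hand-set weight. The paper's actual feature set and learned weights are not reproduced.

```python
import heapq
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

Frame = Tuple[str, ...]
CORE_ROLES = {"A0", "A1", "A2", "A3", "A4", "A5"}


def top_n_frames(local_probs: List[Dict[str, float]], n: int) -> List[Tuple[float, Frame]]:
    """Top-n joint labelings under the local (independent) model, by log-probability.
    Brute-force enumeration, adequate for a handful of candidate nodes."""
    scored = []
    for frame in product(*[sorted(p) for p in local_probs]):
        logp = sum(math.log(local_probs[i][lab]) for i, lab in enumerate(frame))
        scored.append((logp, frame))
    return heapq.nlargest(n, scored)


def frame_features(frame: Frame) -> Dict[str, float]:
    """Hypothetical joint feature: how many core roles are repeated within the frame."""
    core = [lab for lab in frame if lab in CORE_ROLES]
    return {"repeated_core": float(len(core) - len(set(core)))}


def rerank(
    candidates: List[Tuple[float, Frame]],
    joint_features: Callable[[Frame], Dict[str, float]],
    weights: Dict[str, float],
) -> Frame:
    """Pick the frame maximizing local log-probability plus weighted joint features."""
    def joint_score(item: Tuple[float, Frame]) -> float:
        logp, frame = item
        feats = joint_features(frame)
        return logp + sum(weights.get(k, 0.0) * v for k, v in feats.items())

    return max(candidates, key=joint_score)[1]
```

For example, with local probabilities `[{"A0": 0.6, "A1": 0.4}, {"A0": 0.55, "A1": 0.45}]`, the local model alone prefers `("A0", "A0")`. With `{"repeated_core": -2.0}` the reranker selects `("A0", "A1")`.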
Fast Semantic Extraction Using a Novel Neural Network Architecture
Cited by 7 (2 self)
We describe a novel neural network architecture for the problem of semantic role labeling. Many current solutions are complicated, consist of several stages and hand-built features, and are too slow to be applied as part of real applications that require such semantic labels, partly because of their use of a syntactic parser (Pradhan et al., 2004; Gildea and Jurafsky, 2002). Our method instead learns a direct mapping from source sentence to semantic tags for a given predicate without the aid of a parser or a chunker. Our resulting system obtains accuracies comparable to the current state of the art at a fraction of the computational cost.

