Results 1 - 10 of 67
An Intrinsic Information Content Metric for Semantic Similarity in WordNet, 2004
Cited by 39 (2 self)
Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus. In this paper we present a wholly intrinsic measure of IC that relies on hierarchical structure alone. We report that this measure is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis.
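As a rough illustration of what an IC measure based on hierarchical structure alone can look like, the sketch below scores each concept by how many hyponyms it has. The formula IC(c) = 1 - log(hypo(c) + 1) / log(N) is an assumption made for this example and is not stated in the abstract above; the toy taxonomy and the function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy: each concept maps to its direct hyponyms.
# It is a tree, so no concept is counted twice.
TAXONOMY = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["car"],
    "dog": [],
    "cat": [],
    "car": [],
}


def count_hyponyms(concept, taxonomy):
    """Count every descendant (direct or indirect) of a concept."""
    return sum(1 + count_hyponyms(child, taxonomy) for child in taxonomy[concept])


def intrinsic_ic(concept, taxonomy):
    """Assumed formulation: IC(c) = 1 - log(hypo(c) + 1) / log(N),
    where N is the total number of concepts in the taxonomy."""
    total_concepts = len(taxonomy)
    hyponyms = count_hyponyms(concept, taxonomy)
    return 1.0 - math.log(hyponyms + 1) / math.log(total_concepts)
```

Under this assumed formulation, leaf concepts such as "dog" receive the maximum score of 1, the root receives 0, and no corpus counts are needed at any point.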
A survey of statistical machine translation, 2007
Cited by 30 (3 self)
Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
Corpus-based, Statistical Goal Recognition, 2003
Cited by 18 (3 self)
Goal recognition for dialogue systems needs to be fast, make early predictions, and be portable. We present initial work which shows that using statistical, corpus-based methods to build goal recognizers may be a viable way to meet those needs. Our goal recognizer is trained on data from a plan corpus and then used to determine the agent's most likely goal based on that data. The algorithm is linear in the number of goals, and performs very well in terms of accuracy and early prediction. In addition, it is more easily portable to new domains as it does not require a hand-crafted plan library.
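To make the idea of a corpus-trained recognizer concrete, here is a minimal sketch that picks the most likely goal with a single pass over the candidate goals. It is not the model from the paper: the scoring (goal prior times add-one-smoothed per-goal action probabilities), the toy plan corpus, and all names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical plan corpus: (goal, sequence of observed actions).
PLAN_CORPUS = [
    ("make_tea", ["boil_water", "get_cup", "add_teabag"]),
    ("make_tea", ["get_cup", "boil_water", "add_teabag"]),
    ("make_toast", ["get_bread", "use_toaster"]),
]


def train(corpus):
    """Collect goal frequencies and per-goal action frequencies."""
    goal_counts = Counter(goal for goal, _ in corpus)
    action_counts = defaultdict(Counter)
    for goal, actions in corpus:
        action_counts[goal].update(actions)
    vocab = {action for _, actions in corpus for action in actions}
    return goal_counts, action_counts, vocab


def most_likely_goal(observed, model):
    """Return the goal with the highest log score for the observed actions."""
    goal_counts, action_counts, vocab = model
    total_plans = sum(goal_counts.values())
    best_goal, best_score = None, -math.inf
    for goal, plan_count in goal_counts.items():  # one pass over the goals
        score = math.log(plan_count / total_plans)  # log prior of the goal
        denom = sum(action_counts[goal].values()) + len(vocab)
        for action in observed:
            # Add-one smoothing keeps unseen actions from zeroing the score.
            score += math.log((action_counts[goal][action] + 1) / denom)
        if score > best_score:
            best_goal, best_score = goal, score
    return best_goal
```

Because each goal is scored independently, the cost grows linearly with the number of goals, and a prediction can be requested after any prefix of the action sequence, for example `most_likely_goal(["get_bread"], train(PLAN_CORPUS))`.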
Language Modeling Using Efficient Best-First Bottom-Up Parsing - In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2003
Cited by 13 (6 self)
In this paper we present a two-stage best-first bottom-up word-lattice parser which we use as a language model for speech recognition. The parser works by using a "Figure of Merit" that selects lattice paths while simultaneously selecting syntactic category edges for parsing. Additionally, we introduce a modified version of the Inside-Outside algorithm used as a pruning stage between syntactic context-free parsing and lexicalized context-dependent parsing. We report our results in terms of Word Error Rate on the HUB-1 word-lattices and compare these results to other syntactic language modeling techniques.
Incremental hierarchical clustering of text documents - in 16th CIKM, 2006
Cited by 13 (0 self)
Incremental hierarchical text document clustering algorithms are important in organizing documents generated from streaming on-line sources, such as newswire and blogs. However, this is a relatively unexplored area in the text document clustering literature. Popular incremental hierarchical clustering algorithms, namely Cobweb and Classit, have not been widely used with text document data. We discuss why, in their current form, these algorithms are not suitable for text clustering and propose an alternative formulation that includes changes to the underlying distributional assumption of the algorithm in order to conform with the data. Both the original Classit algorithm and our proposed algorithm are evaluated using Reuters newswire articles and the Ohsumed dataset.
Inducing criteria for mass noun lexical mappings using the Cyc KB, and its extension to WordNet - In Proc. of the Fifth International Workshop on Computational Semantics (IWCS-5), 2003
Cited by 12 (4 self)
This paper presents an automatic approach for learning semantic criteria for the mass versus count noun distinction by induction over the lexical mappings contained in the Cyc knowledge base. This produces accurate results (89.5%) using a decision tree that only incorporates semantic features (i.e., Cyc ontological types). Comparable results (86.9%) are obtained using OpenCyc, the publicly available version of Cyc. For broader applicability, the mass noun criteria using Cyc are converted into criteria using WordNet, preserving the general accuracy (86.3%).
A Mathematical Model for Context and Word-Meaning, 2003
Cited by 7 (4 self)
Context is vital for deciding which of the possible senses of a word is being used in a particular situation, a task known as disambiguation. Motivated by a survey of disambiguation techniques in natural language processing, this paper presents a mathematical model describing the relationship between words, meanings and contexts, giving examples of how context-groups can be used to distinguish different senses of ambiguous words. Many aspects of this model have interesting similarities with quantum theory.
Dissimilarity in graph-based semisupervised classification - Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS), 2007
Cited by 7 (2 self)
Label dissimilarity specifies that a pair of examples probably have different class labels. We present a semi-supervised classification algorithm that learns from dissimilarity and similarity information on labeled and unlabeled data. Our approach uses a novel graph-based encoding of dissimilarity that results in a convex problem, and can handle both binary and multiclass classification. Experiments on several tasks are promising.
Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks
Cited by 7 (4 self)
Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases only partly address the problem at the cost of huge feature spaces and sparseness. To address this, we introduce a recursive neural network architecture for jointly parsing natural language and learning vector space representations for variable-sized inputs. At the core of our architecture are context-sensitive recursive neural networks (CRNN). These networks can induce distributed feature representations for unseen phrases and provide syntactic information to accurately predict phrase structure trees. Most excitingly, the representation of each phrase also captures semantic information: For instance, the phrases “decline to comment” and “would not disclose the terms” are close by in the induced embedding space. Our current system achieves an unlabeled bracketing F-measure of 92.1% on the Wall Street Journal dataset for sentences up to length 15.

