Similarity-based approaches to natural language processing (1997)

by L Lee
Venue: Harvard University

Results 11 - 20 of 21

Automatic Semantic Classification of Verbs According to their Alternation Behaviour

by Sabine Schulte im Walde (supervisor: Prof. Mats Rooth), 1998
Cited by 8 (0 self)
This thesis aims at the automatic acquisition of a semantic classification for verbs. As a starting point, I assume that the....

Automatising the Learning of Lexical Patterns: an Application to the Enrichment of WordNet by Extracting Semantic Relationships from Wikipedia

by Maria Ruiz-Casado, Enrique Alfonseca, Pablo Castells - Journal of Data and Knowledge Engineering, 2007
Cited by 6 (1 self)
This paper describes an automatic approach to identify lexical patterns that represent semantic relationships between concepts in an on-line encyclopedia. These patterns can then be applied to extend existing ontologies or semantic networks with new relations. The experiments have been performed with the Simple English Wikipedia and WordNet 1.7. A new algorithm has been devised for automatically generalising the lexical patterns found in the encyclopedia entries. We have found general patterns for the hyperonymy, hyponymy, holonymy and meronymy relations and, using them, we have extracted more than 2600 new relationships that did not appear in WordNet originally. The precision of these relationships depends on the degree of generality chosen for the patterns and the type of relation, and is around 60-70% for the best combinations proposed.
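
To make the pattern-and-precision idea in this abstract concrete, here is a minimal sketch. It uses a single hand-written "X is a Y" pattern, whereas the paper generalises its patterns automatically; the regular expression, the function names, the toy sentences, and the gold set are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical hand-written hyponymy pattern: "A/An/The X is a/an Y".
# The paper learns and generalises patterns; this fixed regex is an assumption.
PATTERN = re.compile(r"\b(?:An?|The)\s+(\w+)\s+is\s+an?\s+(\w+)")


def extract_hyponym_pairs(sentences):
    """Return (hyponym, hypernym) candidates matched by the pattern."""
    pairs = []
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            pairs.append((match.group(1).lower(), match.group(2).lower()))
    return pairs


def precision(extracted, gold):
    """Fraction of extracted pairs that appear in the gold set."""
    if not extracted:
        return 0.0
    return sum(1 for pair in extracted if pair in gold) / len(extracted)


# Toy data (hypothetical): the third sentence matches the pattern
# but does not express a real hyponymy relation.
sentences = [
    "A dog is an animal.",
    "The oak is a tree.",
    "A holiday is a pleasure.",
]
gold = {("dog", "animal"), ("oak", "tree")}

pairs = extract_hyponym_pairs(sentences)
score = precision(pairs, gold)
```

A more general pattern matches more text but also admits more false pairs such as the third one here, which is the trade-off between generality and precision that the abstract reports.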

From Resource Discovery to Knowledge Discovery on the Internet

by Osmar R. Zaïane, 1998
Cited by 3 (0 self)
More than 50 years ago, at a time when modern computers didn't exist yet, Vannevar Bush wrote about a multimedia digital library containing human collective knowledge and filled with "trails" linking materials of the same topic. At the end of World War II, Vannevar urged scientists to build such a knowledge store and make it useful, continuously extendable and more importantly, accessible for consultation. Today, the closest to the materialization of Vannevar's dream is the World-Wide Web hypertext and multimedia document collection. However, the ease of use and accessibility of the knowledge described by Vannevar is yet to be realized. Since the 60s, extensive research has been accomplished in the information retrieval field, and free-text search was finally adopted by many text repository systems in the late 80s. The advent of the World-Wide Web in the 90s helped text search become routine as millions of users use search engines daily to pinpoint resources on the Internet. However, r...

Textual Similarities based on a Distributional Approach

by Romaric Besançon, Martin Rajman, Jean-Cédric Chappelier - in Proceedings of the Tenth International Workshop on Database and Expert Systems Applications (DEXA99), 1999
Cited by 3 (0 self)
The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection structuring (e.g. clustering), or in Information Retrieval (IR) which relies on the computation of textual similarities for measuring the adequacy between a query and documents. The objective of this paper is to present and compare several textual similarity measures in the framework of the Distributional Semantics (DS) model for IR. This model is an extension of the standard Vector Space model, which further takes the co-frequencies between the terms in a given reference corpus into account. These co-frequencies are considered to provide a distributional representation of the "semantics" of the terms. The co-occurrence profiles are used to represent the documents as vectors. Practical retrieval experiments using DS-based similarity models have been conducted in the framework of the AMARYLLIS evaluation campaig...
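
As a rough illustration of the representation described in this abstract, the sketch below builds co-occurrence profiles from a toy reference corpus and represents a document as the frequency-weighted sum of its terms' profiles. The function names, the toy corpus, the unit of co-occurrence (one text unit per list), and the use of cosine as the comparison are assumptions for illustration, not the weightings or measures evaluated in the paper.

```python
from collections import Counter
from math import sqrt


def cooccurrence_profiles(corpus):
    """Map each term to counts of the other terms sharing a text unit with it."""
    profiles = {}
    for unit in corpus:
        terms = set(unit)
        for term in terms:
            row = profiles.setdefault(term, Counter())
            for other in terms:
                if other != term:
                    row[other] += 1
    return profiles


def document_vector(doc, profiles):
    """Sum the co-occurrence profiles of a document's terms, weighted by frequency."""
    vec = Counter()
    for term, freq in Counter(doc).items():
        for context, count in profiles.get(term, {}).items():
            vec[context] += freq * count
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy reference corpus (hypothetical).
corpus = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["banana", "fruit"],
]
profiles = cooccurrence_profiles(corpus)

car_doc = document_vector(["car"], profiles)
auto_doc = document_vector(["automobile"], profiles)
```

In this toy setting the documents ["car"] and ["automobile"] share no terms, so a plain term-count vector comparison gives them zero similarity, while their co-occurrence-based vectors coincide because both terms appear with "engine" and "road".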

Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relations Extraction from Text

by Emiliano Giovannetti, Simone Marchi, Simonetta Montemagni
Cited by 1 (0 self)
We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexico-syntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in “syntagmatic” relations. On the other hand, a statistical unsupervised association system is used to obtain a second set of pairs of “distributionally similar” terms that appear to occur in similar contexts and are thus possibly involved in “paradigmatic” relations. The approach aims at learning ontological information by filtering the candidate relations obtained through generic lexico-syntactic patterns and by labelling the anonymous relations obtained through the statistical system. The resulting set of relations can be used to enrich existing ontologies and for semantic annotation of documents or web pages.

A Practical Semantic Type Representation for Natural Language Understanding

by Myroslava Dzikovska, 2000
Reasoning about semantic classes and determining compatibility of the words in a given context is an important procedure used in many modules of natural language understanding systems. However, most existing systems do not devote much attention to their ontological knowledge representations, resulting in implementations that are not portable to other domains. At the same time, statistical methods are more robust and less labor-intensive to develop, but typically result in models that are not easily interpretable by humans. We propose a semantic feature representation for use in practical dialogue systems and argue that it can offer advantages in terms of lexicon development and portability - in particular for defining selectional restrictions - and can also be useful for other system modules that do logical inference. We then propose to develop statistical methods allowing us to learn parts of our representation from corpus data.

Department of Linguistics

by Magnus Sahlgren, David Swanberg (University of Stockholm; supervisors: Jussi Karlgren, Anders Holst)
A new method for vector-based semantic analysis is described. The technique takes advantage of the distributional patterns of words in large text data, and represents each word as a 1,800-dimensional sparse vector, obtained by adding together the vectors denoting the context of each unique word. The angles between these vectors are then compared in order to establish the semantic similarity between different words. The performance of the technique is evaluated through a standardized synonym test (Test Of English as a Foreign Language), and the results reported from this first experiment are promising. Possible applications of the technique are discussed, and the conclusion is drawn that this method can be seen as a viable implementation of linguistic knowledge in computer systems.
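
The abstract gives only the outline of the method, so the following is a sketch under stated assumptions rather than the authors' implementation. Each word receives a sparse random vector of 1,800 dimensions (the figure from the abstract); a word's representation is the sum of the vectors of its neighbours; and words are compared by the cosine of the angle between their representations. The number of non-zero entries, the window size, the seed, and all names are hypothetical.

```python
import random
from math import sqrt

DIM = 1800    # dimensionality mentioned in the abstract
NONZERO = 8   # assumption: number of +1/-1 entries per sparse vector
WINDOW = 2    # assumption: neighbours considered on each side


def index_vector(rng):
    """Sparse random vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((-1, 1)) for p in positions}


def context_vectors(tokens, seed=0):
    """Represent each word as the sum of its neighbours' sparse vectors."""
    rng = random.Random(seed)
    index = {}
    for word in tokens:
        if word not in index:
            index[word] = index_vector(rng)
    context = {word: {} for word in index}
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                for pos, val in index[tokens[j]].items():
                    context[word][pos] = context[word].get(pos, 0) + val
    return context


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy text (hypothetical): "cat" and "dog" occur in similar contexts.
tokens = "the cat drinks milk the dog drinks water".split()
vectors = context_vectors(tokens)
similarity = cosine(vectors["cat"], vectors["dog"])
```

Because "cat" and "dog" share most of their neighbours in the toy text, their summed vectors point in similar directions, which is the property a synonym test such as the one in the abstract relies on.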

Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL),

by Dayne Freitag, Matthias Blume, John Byrnes, Edmond Chow, Sadik Kapadia, Richard Rohwer, Zhiqiang Wang - In Proceedings of CoNLL2005, 2005
Recent work on the problem of detecting synonymy through corpus analysis has used the Test of English as a Foreign Language (TOEFL) as a benchmark. However, this test involves as few as 80 questions, prompting questions regarding the statistical significance of reported results.
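
The concern about an 80-question benchmark can be illustrated with a confidence interval on the accuracy. The sketch below uses the Wilson score interval, a standard choice for a binomial proportion; it is not taken from the paper, and the score of 64 correct answers is a hypothetical example rather than a reported result.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical score: 64 of 80 questions correct (80% accuracy).
low, high = wilson_interval(64, 80)
width = high - low
```

With only 80 questions the interval around an 80% score spans more than 15 percentage points, so two systems a few points apart cannot be reliably separated on this test alone.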

EXTRACTOR FOR PATTERN DISAMBIGUATION

by Sheng Yin, Major Professor, Ismailcem Budak Arpinar, Khaled Rasheed, Prashant Doshi, Maureen Grasso
One difficulty that prevents a machine from searching, retrieving and processing web content through the World Wide Web (WWW) is that most web content is presented in natural language, which cannot be processed by a machine. Current pattern-based annotation approaches can generate patterns for a given relation from unrestricted text, and they can use those generalized patterns to extract concepts that stand in the same relation from other text. However, those approaches all leave one problem unsolved: the pattern ambiguity problem. Our approach can generate lexical patterns for a particular relation from unrestricted text. These patterns can then be used to recognize concepts that have the same relation in other text. We propose an ontology-driven pattern disambiguation process. This process can dramatically improve the performance of existing pattern-based annotation approaches.

Discriminative Training of . . .

by Xin Li, et al. - Proceedings of the 9th Conference on Computational Natural Language Learning (CoNLL), 2005
Clustering is an optimization procedure that partitions a set of elements to optimize some criteria, based on a fixed distance metric defined between the elements. Clustering approaches have been widely applied in natural language processing and it has been shown repeatedly that their success depends on defining a good distance metric, one that is appropriate for the task and the clustering algorithm used. This paper develops a framework in which clustering is viewed as a learning task, and proposes a way to train a distance metric that is appropriate for the chosen clustering algorithm in the context of the given task. Experiments in the context of the entity identification problem exhibit significant performance improvements over state-of-the-art clustering approaches developed for this problem.