From Distributional to Semantic Similarity, 2003
Cited by 59 (11 self)
Lexical-semantic resources, including thesauri and WORDNET, have been successfully incorporated into a wide range of applications in Natural Language Processing. However, they are very difficult and expensive to create and maintain, and their usefulness has been severely hampered by their limited coverage, bias and inconsistency. Automated and semi-automated methods for developing such resources are therefore crucial for further resource development and improved application performance.
Towards large-scale, open-domain and ontology-based named entity classification
In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP’05), 2005
Semantic Density Analysis: Comparing word meaning across time and phonetic space
Cited by 5 (0 self)
This paper presents a new statistical method for detecting and tracking changes in word meaning, based on Latent Semantic Analysis. By comparing the density of semantic vector clusters, this method allows researchers to make statistical inferences on questions such as whether the meaning of a word changed across time or whether a phonetic cluster is associated with a specific meaning. Possible applications of this method are then illustrated in tracing the semantic change of ‘dog’, ‘do’, and ‘deer’ in early English and in examining and comparing phonaesthemes.
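The density comparison described in this abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading that the abstract itself does not spell out, that the density of a cluster is the mean pairwise cosine similarity of a word's context vectors. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def density(vectors):
    """Assumed density measure: mean cosine similarity over all vector pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two context vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy context vectors for one word in two periods (made-up data).
early = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # contexts all alike
late = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # contexts spread out

print(density(early), density(late))
```

Under this reading, a lower density in the later period would be consistent with a broadening of the word's meaning; a real analysis would use LSA-derived vectors and a significance test rather than a raw comparison.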
Categorization-driven cross-language retrieval of medical information
Journal of the American Society for Information Science and Technology, 2006
Cited by 4 (0 self)
The Web has become a large repository of documents (or pages) written in many different languages. In this context, traditional information retrieval (IR) techniques cannot be used whenever the user query and the documents being retrieved are in different languages. To address this problem, new cross-language information retrieval (CLIR) techniques have been proposed. In this work, we describe a method for cross-language retrieval of medical information. This method combines query terms and related medical concepts obtained automatically through a categorization procedure. The medical concepts are used to create a linguistic abstraction that allows retrieving information in a language-independent way, minimizing linguistic problems such as polysemy. To evaluate our method, we carried out experiments using the OHSUMED test collection, whose documents are written in English, with queries expressed in Portuguese, Spanish, and French. The results indicate that our cross-language retrieval method is as effective as a standard vector space model algorithm operating on queries and documents in the same language. Further, our results improve on previous results in the literature.
Semantic Indexing of a Competence Map to support Scientific Collaboration in a Research Community
Cited by 2 (0 self)
This paper describes a methodology to semiautomatically acquire a taxonomy of terms and term definitions in a specific research domain. The taxonomy is then used for semantic search and indexing of a knowledge base of scientific competences, called the Knowledge Map (KMap). The KMap is a system to support research collaborations and sharing of results within and beyond a European Network of Excellence. The methodology is general: it can be applied to model any web community, starting from the documents shared and exchanged among the community members, and this model can then be used to improve the accessibility of data and knowledge repositories.
Long-Answer Question Answering and Rhetorical-Semantic Relations, 2007
Cited by 1 (0 self)
Over the past decade, Question Answering (QA) has generated considerable interest and participation in the fields of Natural Language Processing and Information Retrieval. Conferences such as TREC, CLEF and DUC have examined various aspects of the QA task in the academic community. In the commercial world, major search engines from Google, Microsoft and Yahoo have integrated basic QA capabilities into their core web search. These efforts have focused largely on so-called “factoid” questions seeking a single fact, such as the birthdate of an individual or the capital city of a country. Yet in the past few years, there has been growing recognition of a broad class of “long-answer” questions which cannot be satisfactorily answered in this framework, such as those seeking a definition, explanation, or other descriptive information in response. In this thesis, we consider the problem of answering such questions, with particular focus on the contribution to be made by integrating rhetorical and semantic models. We present DefScriber, a system for answering definitional (“What is X?”), biographical (“Who is X?”) and other long-answer questions using a hybrid of goal- and data-driven methods. Our goal-driven, or top-down, approach is motivated by a set of definitional pred-
Combining Statistical Techniques and Lexico-syntactic Patterns for Semantic Relations Extraction from Text
Cited by 1 (0 self)
We describe here a methodology to combine two different techniques for Semantic Relation Extraction from texts. On the one hand, generic lexico-syntactic patterns are applied to the linguistically analyzed corpus to detect a first set of pairs of co-occurring words, possibly involved in “syntagmatic” relations. On the other hand, a statistical unsupervised association system is used to obtain a second set of pairs of “distributionally similar” terms that appear to occur in similar contexts and are thus possibly involved in “paradigmatic” relations. The approach aims at learning ontological information by filtering the candidate relations obtained through generic lexico-syntactic patterns and by labelling the anonymous relations obtained through the statistical system. The resulting set of relations can be used to enrich existing ontologies and for semantic annotation of documents or web pages.
Quantum Logic of Word Meanings: Concept Lattices in Vector Space Models, 2003
This paper systematically develops the logical and algebraic possibilities inherent in vector space models for language, considerably beyond those which are customarily used in semantic applications such as information retrieval and word sense disambiguation. The cornerstone of the approach lies in a simple implementation of the connectives of quantum logic as introduced by Birkhoff and von Neumann (1936), which defines the negation of a concept as the projection onto its orthogonal subspace, and the disjunction and conjunction of two concepts as the vector sum and intersection of their subspaces. This enables us to use the full lattice structure of a vector space, bringing these models much closer to traditional semantic lattice representations such as taxonomic concept hierarchies.
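The negation connective described in this abstract can be illustrated for the simplest case, where each concept is a single vector. The sketch below assumes that "a NOT b" is obtained by projecting a onto the subspace orthogonal to b; the function names and example vectors are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def negate(a, b):
    """Return 'a NOT b': the component of a orthogonal to b.

    Implemented as a - (a.b / b.b) * b, i.e. projection of a onto the
    orthogonal complement of the one-dimensional subspace spanned by b.
    """
    bb = dot(b, b)
    if bb == 0.0:
        # Nothing to remove when b is the zero vector.
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


# Toy example: remove the second axis (the unwanted sense) from a query vector.
a = [1.0, 1.0, 0.0]
b = [0.0, 1.0, 0.0]
result = negate(a, b)
print(result, dot(result, b))
```

The resulting vector has zero dot product with b, so any item that lies entirely along b scores zero against the negated query.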
Text-Based Ontology Enrichment Using Hierarchical Self-organizing Maps
The success of Semantic Web research depends on the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an extended model of hierarchical self-organizing maps. Because it is founded on an unsupervised neural network architecture, the framework can be applied to different languages and domains. Terms extracted by mining a text corpus encode contextual content information in a distributional vector space. The enrichment behaves like a classification of the extracted terms into the existing taxonomy, attaching them as hyponyms of the nodes of the taxonomy. The experiments reported are in the “Lonely Planet” tourism domain. The taxonomy and the corpus are the ones proposed in the PASCAL ontology learning and population challenge. The experimental results prove that the quality of the enrichment is considerably improved by using semantics-based vector representations for the classified (newly added) terms, like the document category histograms (DCH) and the document frequency times inverse term frequency (DF-ITF) weighting scheme.

