Results 1 - 10 of 18
Personalized search based on user search histories
- In Proc. of International Conference of Knowledge Management (CIKM), Washington D.C., 2004, 2005
Cited by 45 (1 self)
User profiles, descriptions of user interests, can be used by search engines to provide personalized search results. Many approaches to creating user profiles collect user information through proxy servers (to capture browsing histories) or desktop bots (to capture activities on a personal computer). Both these techniques require participation of the user to install the proxy server or the bot. In this study, we explore the use of a less-invasive means of gathering user information for personalized search. In particular, we build user profiles based on activity at the search site itself and study the use of these profiles to provide personalized search results. By implementing a wrapper around the Google search engine, we were able to collect information about individual user search activities. In particular, we collected the queries for which at least one search result was examined, and the snippets (titles and summaries) for each examined result. User profiles were created by classifying the collected information (queries or snippets) into concepts in a reference concept hierarchy. These profiles were
Word sense disambiguation: a survey
- ACM Computing Surveys, 2009
Cited by 28 (9 self)
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Ontology and the Lexicon
- In Handbook on Ontologies in Information Systems, 2003
Cited by 21 (0 self)
ly have a separate entry for each category; for example, flap would have one entry as a noun and another as a verb. Separate entries are usually also appropriate for each of the senses of a homonym---a word that has more than one unrelated sense even within a single syntactic category; for example, the noun pen would have distinct entries for the senses writing instrument, animal enclosure, and swan. Polysemy---related or overlapping senses---is a more complex situation; sometimes the senses may be discrete enough that we can treat them as distinct: for example, window as both opening in wall and glass pane in opening in wall (fall through the window; break the window). But this is not always so; the word open, for example, has many overlapping senses concerning unfolding, expanding, revealing, moving to an open position, making openings in, and so on, and separating them into discrete senses, as the writers of dictionary definitions try to do, is not possible (see also sections 1.2.3 a
OntoNotes: A Unified Relational Semantic Representation
Cited by 11 (2 self)
The OntoNotes project is creating a corpus of large-scale, accurate, and integrated annotation of multiple levels of the shallow semantic structure in text. Such rich, integrated annotation covering many levels will allow for richer, cross-level models enabling significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, scalable mechanism for storing and accessing these complex inter-dependent annotations. We describe a relational database representation that captures both the inter- and intra-layer dependencies and provide details of an object-oriented API for efficient, multi-tiered access to this data.
Extending Metadata Definitions by Automatically Extracting and Organizing Glossary Definitions
- In Proceedings of the National Conference on Digital Government Research, 2003
Cited by 11 (6 self)
Metadata descriptions of database contents are required to build and use systems that access and deliver data in response to user requests. When numerous heterogeneous databases are brought together in a single system, their various metadata formalizations must be homogenized and integrated in order to support the access planning and delivery system. This integration is a tedious process that requires human expertise and attention. In this paper we describe a method of speeding up the formalization and integration of new metadata. The method takes advantage of the fact that databases are often described in web pages containing natural language glossaries that define pertinent aspects of the data. Given a root URL, our method identifies likely glossaries, extracts and formalizes aspects of relevant concepts defined in them, and automatically integrates the new formalized metadata concepts into a large model of the domain and associated conceptualizations.
Methodologies for the Reliable Construction of Ontological Knowledge
- In Proceedings of ICCS 2005, 2005
Cited by 9 (3 self)
This paper addresses the methodology of ontology construction. It identifies five styles of approach to ontologizing (deriving from philosophy, cognitive science, linguistics, AI/computational linguistics, and domain reasoning) and argues that they do not provide the same results. It then provides a more detailed example of one of the approaches.
Tackling the internet glossary glut: Automatic extraction and evaluation of genus phrases
- In: Proceedings of the SIGIR’03 Workshop on Semantic Web, 2003
Cited by 8 (0 self)
This paper addresses the problem of developing methods to be used in the identification and extraction of meaningful semantic components from large online glossaries. We present two sets of results. First, we report on the algorithm, ParseGloss, which was used to analyze definitions, and extract the main concept, or genus phrase. We ran the system on over 12,000 online glossary entries. Second, we present a method to evaluate our results, using human judgments on a collection of definitions from six different sources. This paper discusses our approach to the evaluation process, since the creation of a standard for evaluation is in itself a contribution to the field. The methods we have developed have required addressing the significant challenges of abstracting a single gold standard from multiple naive, human judgments on a highly subjective task. Once the method for creating the standard was developed, we then established the gold standard data. We report on our performance in running ParseGloss over this controlled collection of definitions. Our first set of results presents precision and recall on system performance. Our second results are presented in terms of techniques for determining agreement between human subjects. Success in the ParseGloss algorithm will contribute to the automatic creation of ontologies.
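The precision and recall figures mentioned in the abstract above can be illustrated with a minimal sketch. This is not the ParseGloss evaluation code; the function name `precision_recall`, the exact-match scoring rule, and the toy glossary entries are all assumptions made purely for illustration.

```python
def precision_recall(predicted, gold):
    """Score extracted genus phrases against a gold standard.

    Both arguments map a glossary term to its genus phrase.
    Assumption: an extraction counts as correct only on exact string match.
    """
    correct = sum(1 for term, genus in predicted.items() if gold.get(term) == genus)
    # Precision: share of extracted phrases that are correct.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard phrases that were recovered.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
gold = {"pen": "instrument", "flap": "piece", "window": "opening"}
predicted = {"pen": "instrument", "flap": "motion"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here one of the two extracted phrases matches the gold standard, and one of the three gold entries is recovered.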
Linguistic Watermark 3.0: an RDF framework and a software library for bridging language and ontologies in the Semantic Web
Cited by 2 (2 self)
In this paper, we present a framework for representing heterogeneous linguistic resources and for integrating their content with Semantic Web ontologies. This work, which extends and improves previous research by the same authors, yields two main results. The first is a set of coordinated RDF vocabularies providing descriptors for representing linguistic resources and their software counterparts, as well as a collection of metadata for describing the linguistic enrichment of ontologies on both quantitative and qualitative grounds. The second is a software library for accessing resources described according to the above vocabularies and for evaluating the quality of linguistically enriched ontologies.
OntoNotes: Sense Pool Verification Using Google N-gram and Statistical Tests
Cited by 2 (0 self)
The OntoNotes project has developed a methodology for producing a large multilingual corpus with annotation of predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes involves word senses that are grouped into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Once senses have been created and verified by annotation, sense pools are formed by an expert. Verification of sense pools is the topic of this paper. This paper describes a two-stage framework that combines machine and human verification of sense pools. The machine verification acts as a filter to select candidate pool members based on n-gram frequencies obtained from Google and subjected to appropriate statistical measures. The remaining candidates are then passed to humans for final verification. Our experimental results demonstrate that the machine verification can save much human verification work and thus facilitate the development of sense pools.
Using a Natural Language Understanding System to Generate Semantic Web Content
Cited by 2 (0 self)
We describe our research on automatically generating rich semantic annotations of text and making them available on the Semantic Web. In particular, we discuss the challenges involved in adapting the OntoSem natural language processing system for this purpose. OntoSem, an implementation of the theory of ontological semantics under continuous development for over fifteen years, uses a specially constructed NLP-oriented ontology and an ontological-semantic lexicon to translate English text into a custom ontology-motivated knowledge representation language, the language of text meaning representations (TMRs). OntoSem concentrates on a variety of ambiguity resolution tasks as well as processing unexpected input and reference. To adapt OntoSem’s representation to the Semantic Web, we developed a translation system, OntoSem2OWL, from the TMR language into the Semantic Web language OWL. We next used OntoSem and OntoSem2OWL to support SemNews, an experimental web service that monitors RSS news sources, processes the summaries of the news stories, and publishes a structured representation of the meaning of the text in each news story.

