Collective Annotation of Wikipedia Entities in Web Text
Cited by 20 (3 self)
To take the first step beyond keyword-based search toward entity-based search, suitable token spans (“spots”) on documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but in limited ways. We propose a general collective disambiguation approach. Our premise is that coherent documents refer to entities from one or a few related topics or domains. We give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard. We investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, our approaches significantly outperform recently proposed algorithms.
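The trade-off described in this abstract can be illustrated with a small sketch of local hill-climbing. This is not the authors' formulation: the additive objective, the `weight` parameter, the function names, and the toy scores and entity labels below are all assumptions chosen only to show how a globally coherent assignment can override the best purely local choice.

```python
from itertools import combinations


def objective(assignment, local, coherence, weight=1.0):
    """Local spot-to-entity scores plus weighted pairwise entity coherence.

    The additive form is an illustrative assumption, not the paper's objective.
    """
    score = sum(local[spot][ent] for spot, ent in assignment.items())
    for (_, e1), (_, e2) in combinations(assignment.items(), 2):
        score += weight * coherence.get(frozenset((e1, e2)), 0.0)
    return score


def hill_climb(local, coherence, weight=1.0):
    """Greedy local search: change one spot's entity while the objective improves."""
    # Start from the best purely local candidate for each spot.
    assignment = {spot: max(cands, key=cands.get) for spot, cands in local.items()}
    best = objective(assignment, local, coherence, weight)
    improved = True
    while improved:
        improved = False
        for spot, cands in local.items():
            for ent in cands:
                if ent == assignment[spot]:
                    continue
                trial = {**assignment, spot: ent}
                score = objective(trial, local, coherence, weight)
                if score > best:  # accept only strict improvements
                    assignment, best = trial, score
                    improved = True
    return assignment, best


# Hypothetical scores: the local evidence slightly prefers the animal sense,
# but the car sense is coherent with the other entity on the page.
local = {
    "Jaguar": {"Jaguar_(animal)": 0.6, "Jaguar_Cars": 0.5},
    "engine": {"Engine": 0.9},
}
coherence = {frozenset(("Jaguar_Cars", "Engine")): 0.8}

assignment, best = hill_climb(local, coherence)
```

Because each accepted move strictly increases the objective and the set of assignments is finite, the search terminates, though only at a local optimum; the abstract notes that the exact problem is NP-hard.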
V.: A Study on Automated Relation Labelling in Ontology Learning
- Ontology Learning from Text: Methods, Evaluation and Applications. IOS
, 2005
Annotating and Searching Web Tables Using Entities, Types and Relationships
Cited by 4 (0 self)
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational world knowledge is usually considerably better than completely unstructured, free-format text. At the same time, unlike manually-created knowledge bases, relational information mined from “organic” Web tables need not be constrained by availability of precious editorial time. Unfortunately, in the absence of any formal, uniform schema imposed on Web tables, Web search cannot take advantage of these high-quality sources of relational information. In this paper we propose new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express. We propose a new graphical model for making all these labeling decisions for each table simultaneously, rather than making separate local decisions for entities, types and relations. Experiments using the YAGO catalog, DB-Pedia, tables from Wikipedia, and over 25 million HTML tables from a 500 million page Web crawl uniformly show the superiority of our approach. We also evaluate the impact of better annotations on a prototype relational Web search tool. We demonstrate clear benefits of our annotations beyond indexing tables in a purely textual manner.
Magpie: Experiences in supporting Semantic Web browsing
, 2007
Cited by 4 (2 self)
Magpie has been one of the first truly effective approaches to bringing semantics into the web browsing experience. The key innovation brought by Magpie was the replacement of a manual annotation process by an automatically associated ontology-based semantic layer over web resources, which ensured added value at no cost for the user. Magpie also differs from older open hypermedia systems: its associations between entities in a web page and semantic concepts from an ontology enable link typing and subsequent interpretation of the resource. The semantic layer in Magpie also facilitates locating semantic services and making them available to the user, so that they can be manually activated by a user or opportunistically triggered when appropriate patterns are encountered during browsing. In this paper we track the evolution of Magpie as a technology for developing open and flexible Semantic Web applications. Magpie emerged from our research into user-accessible Semantic Web, and we use this viewpoint to assess the role of tools like Magpie in making semantic content useful for ordinary users. We see such tools as crucial in bootstrapping the Semantic Web through the automation of the knowledge generation process.
Metadata-driven personal knowledge publishing
- In Proceedings of 3rd International Semantic Web Conference 2004
, 2004
Cited by 2 (2 self)
We propose a personal knowledge publishing system called Semblog, realized by integrating Semantic Web techniques and Weblog tools. The Semblog suite provides an integrated environment for gathering, authoring, publishing, and forming human relationships seamlessly, enabling people to exchange information and knowledge in an easy and casual fashion. We use a lightweight metadata format like RSS to activate the information flow and its activities. We define three levels of interest in information gathering and publishing, i.e., “check”, “clip” and “post”, and provide suitable ways to distribute information depending on the interest level. Our system, called the Semblog platform, consists of two types of extended content aggregator and information retrieval / recommendation applications. We also design a new metadata module to define a personal ontology that realizes semantic relations among people and Weblog sites.
Targeted Disambiguation of Ad-hoc, Homogeneous Sets of Named Entities
, 2012
Cited by 2 (0 self)
In many entity extraction applications, the entities to be recognized are constrained to be from a list of “target entities”. In many cases, these target entities are (i) ad-hoc, i.e., do not exist in a knowledge base and (ii) homogeneous (e.g., all the entities are IT companies). We study the following novel disambiguation problem in this unique setting: given the candidate mentions of all the target entities, determine which ones are true mentions of a target entity. Prior techniques only consider target entities present in a knowledge base and/or having a rich set of attributes. In this paper, we develop novel techniques that require no knowledge about the entities except their names. Our main insight is to leverage the homogeneity constraint and disambiguate the candidate mentions collectively across all documents. We propose a graph-based model, called MentionRank, for that purpose. Furthermore, if additional knowledge is available for some or all of the entities, our model can leverage it to further improve quality. Our experiments demonstrate the effectiveness of our model. To the best of our knowledge, this is the first work on targeted entity disambiguation for ad-hoc entities.
Towards Constructing a Chinese Information Extraction System to Support Innovations in Library Services. [Vers l’élaboration d’un système chinois d’extraction d’informations pour soutenir les innovations dans le cadre des service bibliothécaires.]
- IFLA Journal
The Message without the Medium: Unifying Modern Messaging Paradigms through the Semantic Web
, 2004
Cited by 1 (1 self)
Since its inception, the Internet has been a hotbed of successful communications channels, starting off with e-mail, Internet Relay Chat and Usenet newsgroups and more recently adding weblogs, instant messaging, and news feeds. These systems have been developed quite independently over the past half century and continue to be extended with new functionality that addresses the broadening needs of their users and supports the full range of semantic expression. Stepping back, however, we observe that having a variety of messaging frameworks creates significant problems for users when attempting to manage and collate messages on a single topic or context that may be discussed via multiple media. We posit that no message should be constrained in this way by its medium. As it is, messaging applications are slowly converging in their functionalities. We show that a unified approach to messaging can be achieved in a single step through appropriate use of the RDF, a Semantic Web technology, as a data model. We further exploit this data model to develop appropriate user interface elements that allow aggregation of messages across protocols, and discuss the benefits that arise from such a scenario.
Discovery of Lexical Entries for Non-Taxonomic
- In: Proceedings of SOFSEM 2004: Theory and Practice of Computer Science, LNCS 2932
, 2004
Ontology learning from texts has recently been proposed as a new technology helping ontology designers in the modelling process.
The Instance Store:
- In Proc. of the 2004 Description Logic Workshop (DL 2004)
, 2004
We present an application -- the Instance Store -- aimed at solving some of the scalability problems that arise when reasoning with the large numbers of individuals envisaged in the semantic web. The approach uses well-known techniques for reducing description logic reasoning with individuals to reasoning with concepts. Crucial to the implementation is the combination of a description logic terminological reasoner with a traditional relational database. The resulting form of inference, although specialised, is sound and complete and sufficient for several interesting applications. Most importantly, the application scales to sizes (over 100,000s individuals) where all other existing applications fail. This claim is substantiated by a detailed empirical evaluation of the Instance Store in contrast with existing alternative approaches.

