Gimme’ The Context: Context-driven Automatic Semantic Annotation with C-PANKOW
2005
Cited by 60 (2 self)
Without the proliferation of formal semantic annotations, the Semantic Web is certainly doomed to failure. In earlier work we presented a new paradigm to avoid this: the ‘Self Annotating Web’, in which globally available knowledge is used to annotate resources such as web pages. In particular, we presented a concrete method instantiating this paradigm, called PANKOW (Pattern-based ANnotation through Knowledge On the Web). In PANKOW, a named entity to be annotated is put into several linguistic patterns that convey competing semantic meanings. The patterns that are matched most often on the Web indicate the meaning of the named entity, leading to automatic or semi-automatic annotation. In this paper we present C-PANKOW (Context-driven PANKOW), which alleviates several shortcomings of PANKOW. First, by downloading abstracts and processing them off-line, we avoid generating a large number of linguistic patterns and a correspondingly large number of Google queries. Second, by linguistically analyzing and normalizing the downloaded abstracts, we increase the coverage of our pattern matching mechanism and overcome several limitations of the earlier pattern generation process. Third, we use the annotation context in order to distinguish the significance of a pattern match for the given annotation task. Our experiments show that C-PANKOW inherits all the advantages of PANKOW (no training required, etc.), but in addition it is far more efficient and effective.
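The pattern-counting idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pattern templates, the function names `count_hits` and `annotate`, and the tiny in-memory corpus standing in for Web hit counts are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical Hearst-style templates; the paper's actual pattern set may differ.
PATTERNS = [
    "{concept}s such as {entity}",   # naive plural: just appends "s"
    "{entity} is a {concept}",
    "{entity} and other {concept}s",
]

def count_hits(phrase, corpus):
    """Count case-insensitive occurrences of phrase in a list of text snippets.

    Stands in for the number of Web matches a search engine would report.
    """
    phrase = phrase.lower()
    return sum(doc.lower().count(phrase) for doc in corpus)

def annotate(entity, concepts, corpus):
    """Return (best_concept, scores) for the entity.

    Each candidate concept is scored by how often its instantiated
    patterns occur in the corpus; the highest-scoring concept wins.
    Returns None as the concept when no pattern matches at all.
    """
    scores = Counter()
    for concept in concepts:
        for template in PATTERNS:
            phrase = template.format(concept=concept, entity=entity)
            scores[concept] += count_hits(phrase, corpus)
    best, hits = max(scores.items(), key=lambda kv: kv[1])
    return (best if hits > 0 else None), dict(scores)

# Toy corpus (invented sentences) used in place of real Web queries.
corpus = [
    "Rivers such as Niger cross several countries.",
    "The Niger is a river in West Africa.",
    "Niger is a country north of Nigeria.",
    "Niger and other rivers flood seasonally.",
]
best, scores = annotate("Niger", ["river", "country"], corpus)
print(best, scores)
```

On this toy corpus the entity is assigned the concept whose patterns match most often. A real system would need proper pluralization and, as the abstract notes for C-PANKOW, some use of the annotation context to weigh matches.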
Toward an architecture for never-ending language learning
In AAAI, 2010
Cited by 36 (5 self)
We consider here the problem of building a never-ending language learner; that is, an intelligent computer agent that runs forever and that each day must (1) extract, or read, information from the web to populate a growing structured knowledge base, and (2) learn to perform this task better than on the previous day. In particular, we propose an approach and a set of design principles for such an agent, describe a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs with an estimated precision of 74% after running for 67 days, and discuss lessons learned from this preliminary attempt to build a never-ending learning agent.
Unsupervised Activity Recognition Using Automatically Mined Common Sense
In AAAI, 2005
Cited by 34 (5 self)
A fundamental difficulty in recognizing human activities is obtaining the labeled data needed to learn models of those activities. Given emerging sensor technology, however, it is possible to view activity data as a stream of natural language terms. Activity models are then mappings from such terms to activity names, and may be extracted from text corpora such as the web.
Extracting personal names from emails: Applying named entity recognition to informal text
In HLT-EMNLP, 2005
Cited by 29 (8 self)
There has been little prior work on Named Entity Recognition for “informal” documents like email. We present two methods for improving performance of person name recognizers for email: email-specific structural features and a recall-enhancing method which exploits name repetition across multiple documents.
Unsupervised discovery of generic relationships using pattern clusters and its evaluation by automatically generated SAT analogy questions
In Proc. of the Annual Meeting of the ACL, 2008
Cited by 18 (5 self)
We present a novel framework for the discovery and representation of general semantic relationships that hold between lexical items. We propose that each such relationship can be identified with a cluster of patterns that captures this relationship. We give a fully unsupervised algorithm for pattern cluster discovery, which searches, clusters and merges high-frequency words-based patterns around randomly selected hook words. Pattern clusters can be used to extract instances of the corresponding relationships. To assess the quality of discovered relationships, we use the pattern clusters to automatically generate SAT analogy questions. We also compare to a set of known relationships, achieving very good results in both methods. The evaluation (done in both English and Russian) substantiates the premise that our pattern clusters indeed reflect relationships perceived by humans.
Fully unsupervised discovery of concept-specific relationships by web mining
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007
Cited by 16 (7 self)
We present a web mining method for discovering and enhancing relationships in which a specified concept (word class) participates. We discover a whole range of relationships focused on the given concept, rather than generic known relationships as in most previous work. Our method is based on clustering patterns that contain concept words and other words related to them. We evaluate the method on three different rich concepts and find that in each case the method generates a broad variety of relationships with good precision.
Mining Web Data for Competency Management
In Proc. of Web Intelligence (WI 2005), 2005
Cited by 13 (5 self)
We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.
Sensor-Based Understanding of Daily Life via Large-Scale Use of Common Sense
Cited by 13 (1 self)
The use of large quantities of common sense has long been thought to be critical to the automated understanding of the world. To this end, various groups have collected repositories of common sense in machine-readable form. However, efforts to apply these large bodies of knowledge to enable correspondingly large-scale sensor-based understanding of the world have been few. Challenges have included semantic gaps between facts in the repositories and phenomena detected by sensors, fragility of reasoning in the face of noise, incompleteness of repositories, and slowness of reasoning with these large repositories. We show how to address these problems with a combination of novel sensors, probabilistic representation, web-scale information retrieval and approximate reasoning. In particular, we show how to use the 50,000-fact hand-entered OpenMind Indoor Common Sense database to interpret sensor traces of day-to-day activities with 88% accuracy (which is easy) and 32/53% precision/recall (which is not).
A Geographic Knowledge Base for Semantic Web Applications
In Proceedings of SBBD-05, the 20th Brazilian Symposium on Databases, 2005
Cited by 10 (7 self)
This paper introduces GKB, a repository based on a domain-independent meta-model for integrating geographic knowledge collected from multiple sources. We present the architecture, the repository design and the data cleaning and knowledge integration processes. We also describe the rules developed to add new knowledge to GKB. GKB includes tools for generating ontologies, which are being used by multiple semantic web applications. To illustrate how it is being used, we present some of the applications that interact with the repository or load ontologies created with GKB.
Learning Text Patterns for Web Information Extraction and Assessment
2002
Cited by 10 (2 self)
Learning text patterns that suggest a desired type of information is a common strategy for extracting information from unstructured text on the Web. In this paper, we introduce the idea that learned patterns can be used as both extractors (to generate new information) and discriminators (to assess the truth of extracted information). We demonstrate experimentally that a Web information extraction system (KnowItAll) can be improved (in terms of coverage and accuracy) through the addition of a simple pattern-learning algorithm. By using learned patterns as extractors, we are able to boost recall by 50% to 80%; and by using such patterns as discriminators we are able to reduce classification errors by 28% to 35%. In addition, the paper reports theoretical results on optimally selecting and ordering discriminators, and shows that this theory yields a heuristic that further reduces classification errors by an additional 19% to 35%, giving an overall error reduction of 47% to 53%.

