Organizing and searching the World Wide Web of facts - step one: the one-million fact extraction challenge
- In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 2006
Cited by 48 (4 self)
Due to the inherent difficulty of processing noisy text, the potential of the Web as a decentralized repository of human knowledge remains largely untapped during Web search. The access to billions of binary relations among named entities would enable new search paradigms and alternative methods for presenting the search results. A first concrete step towards building large searchable repositories of factual knowledge is to derive such knowledge automatically at large scale from textual documents. Generalized contextual extraction patterns allow for fast iterative progression towards extracting one million facts of a given type (e.g., Person-BornIn-Year) from 100 million Web documents of arbitrary quality. The extraction starts from as few as 10 seed facts, requires no additional input knowledge or annotated text, and emphasizes scale and coverage by avoiding the use of syntactic parsers, named entity recognizers, gazetteers, and similar text processing tools and resources.
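The seed-driven loop described in this abstract can be illustrated with a toy sketch. This is not the authors' system: the function names `learn_patterns` and `apply_patterns`, the sample sentences, and the capitalized-word regular expression standing in for entity detection are all assumptions made for illustration, and the paper's generalized patterns are richer than the literal infix strings used here.

```python
import re
from collections import Counter

def learn_patterns(sentences, seeds):
    """Collect the literal text found between a seed person and its seed year."""
    patterns = Counter()
    for sentence in sentences:
        for person, year in seeds:
            match = re.search(re.escape(person) + r"(.+?)" + re.escape(year), sentence)
            if match:
                patterns[match.group(1)] += 1
    return patterns

def apply_patterns(sentences, patterns):
    """Extract (person, year) pairs wherever a learned infix appears.

    Assumption: a run of capitalized words approximates a person name.
    """
    facts = set()
    for infix in patterns:
        regex = re.compile(
            r"((?:[A-Z][a-z]+ )*[A-Z][a-z]+)" + re.escape(infix) + r"(\d{4})"
        )
        for sentence in sentences:
            for person, year in regex.findall(sentence):
                facts.add((person, year))
    return facts

# Hypothetical mini-corpus and a single seed fact.
sentences = [
    "Wolfgang Mozart was born in 1756 in Salzburg.",
    "Albert Einstein was born in 1879 in Ulm.",
    "Marie Curie was born in 1867 in Warsaw.",
]
seeds = [("Wolfgang Mozart", "1756")]

patterns = learn_patterns(sentences, seeds)
facts = apply_patterns(sentences, patterns)
print(sorted(facts))
```

In an iterative setting, the newly extracted facts would be fed back in as seeds for the next round; ranking and filtering of patterns and facts, which a real system needs to control noise, is omitted here.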
Unsupervised discovery of generic relationships using pattern clusters and its evaluation by automatically generated SAT analogy questions
- In Proc. of the Annual Meeting of the ACL, 2008
Cited by 18 (5 self)
We present a novel framework for the discovery and representation of general semantic relationships that hold between lexical items. We propose that each such relationship can be identified with a cluster of patterns that captures this relationship. We give a fully unsupervised algorithm for pattern cluster discovery, which searches, clusters and merges patterns based on high-frequency words around randomly selected hook words. Pattern clusters can be used to extract instances of the corresponding relationships. To assess the quality of discovered relationships, we use the pattern clusters to automatically generate SAT analogy questions. We also compare to a set of known relationships, achieving very good results with both methods. The evaluation (done in both English and Russian) substantiates the premise that our pattern clusters indeed reflect relationships perceived by humans.
Fully unsupervised discovery of concept-specific relationships by web mining
- Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007
Cited by 16 (7 self)
We present a web mining method for discovering and enhancing relationships in which a specified concept (word class) participates. We discover a whole range of relationships focused on the given concept, rather than generic known relationships as in most previous work. Our method is based on clustering patterns that contain concept words and other words related to them. We evaluate the method on three different rich concepts and find that in each case the method generates a broad variety of relationships with good precision.
Extracting Semantic Networks from Text Via Relational Clustering
Cited by 14 (5 self)
Extracting knowledge from text has long been a goal of AI. Initial approaches were purely logical and brittle. More recently, the availability of large quantities of text on the Web has led to the development of machine learning approaches. However, to date these have mainly extracted ground facts, as opposed to general knowledge. Other learning approaches can extract logical forms, but require supervision and do not scale. In this paper we present an unsupervised approach to extracting semantic networks from large volumes of text. We use the TextRunner system [1] to extract tuples from text, and then induce general concepts and relations from them by jointly clustering the objects and relational strings in the tuples. Our approach is defined in Markov logic using four simple rules. Experiments on a dataset of two million tuples show that it outperforms three other relational clustering approaches, and extracts meaningful semantic networks.
Mining Web Data for Competency Management
- In Proc. of Web Intelligence (WI 2005), 2005
Cited by 13 (5 self)
We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.
Unsupervised methods for determining object and relation synonyms on the web
- Journal of Artificial Intelligence Research, 2009
Cited by 13 (2 self)
The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of Resolver’s probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.
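The abstract reports precision and recall for each task but mentions F1 only as a relative improvement. As a reading aid, the following sketch computes the F1 score (the harmonic mean of precision and recall) implied by each reported pair. The function name `f1_score` and the dictionary labels are illustrative choices, and the resulting values are derived here rather than quoted from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as stated in the abstract.
reported = {
    "objects": (0.78, 0.68),
    "relations": (0.90, 0.35),
    "polysemous names": (0.97, 0.95),
}

for task, (precision, recall) in reported.items():
    print(f"{task}: F1 = {f1_score(precision, recall):.3f}")
```

The low recall on relations pulls that F1 down to roughly 0.50 despite 90% precision, which is the usual behavior of a harmonic mean when the two inputs are far apart.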
Topic Identification for Fine-Grained Opinion Analysis
Cited by 9 (0 self)
Within the area of general-purpose fine-grained subjectivity analysis, opinion topic identification has, to date, received little attention due to both the difficulty of the task and the lack of appropriately annotated resources. In this paper, we provide an operational definition of opinion topic and present an algorithm for opinion topic identification that, following our new definition, treats the task as a problem in topic coreference resolution. We develop a methodology for the manual annotation of opinion topics and use it to annotate topic information for a portion of an existing general-purpose opinion corpus. In experiments using the corpus, our topic identification approach statistically significantly outperforms several non-trivial baselines according to three evaluation measures.
Discovering Relations between Named Entities from a Large Raw Corpus Using Tree Similarity-based Clustering
Cited by 6 (0 self)
We propose a tree-similarity-based unsupervised learning method to extract relations between Named Entities from a large raw corpus. Our method regards relation extraction as a clustering problem on shallow parse trees. First, we modify previous tree kernels on relation extraction to estimate the similarity between parse trees more efficiently. Then, the similarity between parse trees is used in a hierarchical clustering algorithm to group entity pairs into different clusters. Finally, each cluster is labeled by an indicative word and unreliable clusters are pruned out. Evaluation on the New York Times (1995) corpus shows that our method outperforms the only previous work by 5 in F-measure. It also shows that our method performs well on both high-frequency and less frequent entity pairs. To the best of our knowledge, this is the first work to use a tree similarity metric in relation clustering.
Discovering overlapping communities of named entities
- Knowledge Discovery in Databases: PKDD 2006 (LNCS 4213), 2006
Cited by 5 (0 self)
Although community discovery based on social network analysis has been studied extensively in the Web hyperlink environment, limited research has been done in the case of named entities in text documents. The co-occurrence of entities in documents usually implies some connections among them. Investigating such connections can reveal important patterns. In this paper, we mine communities among named entities in Web documents and text corpora. Most existing works on community discovery generate a partition of the entity network, assuming each entity belongs to one community. However, in the scenario of named entities, an entity may participate in several communities. For example, a person is in the communities of his/her family, colleagues, and friends. In this paper, we propose a novel technique to mine overlapping communities of named entities. This technique is based on triangle formation, expansion, and clustering with content similarity. Our experimental results show that the proposed technique is highly effective.
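To make the "triangle formation" step concrete, here is a minimal sketch that builds an entity co-occurrence graph from documents and enumerates its triangles. It covers only that first step as a generic graph operation; the expansion and content-similarity clustering stages from the abstract are not shown, and the function names, the document representation (a list of entity names per document), and the sample data are all assumptions rather than details from the paper.

```python
from itertools import combinations

def build_cooccurrence_graph(documents):
    """Link every pair of entities that appear in the same document."""
    graph = {}
    for entities in documents:
        for a, b in combinations(sorted(set(entities)), 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def find_triangles(graph):
    """Return every set of three mutually linked entities as a sorted tuple."""
    triangles = set()
    for a in graph:
        for b, c in combinations(sorted(graph[a]), 2):
            if c in graph[b]:
                triangles.add(tuple(sorted((a, b, c))))
    return triangles

# Hypothetical documents, each reduced to the entities it mentions.
documents = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Dave"],
    ["Alice", "Erin"],
    ["Dave", "Erin"],
]

graph = build_cooccurrence_graph(documents)
triangles = find_triangles(graph)
print(sorted(triangles))
```

In this toy data Alice belongs to two triangles, which is the kind of overlap a partition-based method would be unable to represent.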
Web information extraction and user modeling: towards closing the gap
- IEEE Data Engineering Bulletin, 2005
Cited by 4 (0 self)
Web search engines have become the primary method of accessing information on the web. Billions of queries are submitted to major web search engines, reflecting a wide range of information needs. While significant progress has been made on improving the relevance of the results, the web search process often remains a frustrating experience. At the same time, web information extraction has seen tremendous progress, such that knowledge bases of millions of facts extracted from the web are now a reality. Yet it is not clear how effectively these knowledge bases support common user information needs. We posit that a key for web information extraction to significantly impact the web search experience is to connect the extraction process with user modeling, particularly with automatic methods for inferring user information needs and anticipated interaction patterns. In this paper we give an overview of some recent efforts for user modeling and inferring user preferences in the context of closing the gap between web information extraction and user modeling.

