Biperpedia: An Ontology for Search Applications
Cited by 9 (2 self)
Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases. While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of attributes known to the search engine can enable it to more precisely answer queries from the long and heavy tail, extract a broader range of facts from the Web, and recover the semantics of tables on the Web. We describe Biperpedia, an ontology with 1.6M (class, attribute) pairs and 67K distinct attribute names. Biperpedia extracts attributes from the query stream, and then uses the best extractions to seed attribute extraction from text. For every attribute Biperpedia saves a set of synonyms and text patterns in which it appears, thereby enabling it to recognize the attribute in more contexts. In addition to a detailed analysis of the quality of Biperpedia, we show that it can increase the number of Web tables whose semantics we can recover by more than a factor of 4 compared with Freebase.
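To make the idea of mining attributes from a query stream concrete, here is a minimal sketch. It is not Biperpedia's actual pipeline: the query shape "<attribute> of <entity>", the toy ENTITY_CLASS map, and the function name extract_class_attributes are all assumptions introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lookup from entity name to class; not taken from Biperpedia.
ENTITY_CLASS = {
    "france": "country",
    "japan": "country",
    "toyota": "company",
}

# Assumed query shape: "<attribute> of <entity>", optionally prefixed
# by "what is" and/or "the", e.g. "what is the capital of japan".
QUERY_PATTERN = re.compile(
    r"^(?:what is )?(?:the )?(?P<attr>.+?) of (?P<entity>.+)$"
)


def extract_class_attributes(queries):
    """Count (class, attribute) pairs for queries matching the assumed pattern."""
    counts = Counter()
    for query in queries:
        match = QUERY_PATTERN.match(query.strip().lower())
        if match is None:
            continue  # query does not look like "<attribute> of <entity>"
        entity_class = ENTITY_CLASS.get(match.group("entity"))
        if entity_class is not None:
            counts[(entity_class, match.group("attr"))] += 1
    return counts


queries = [
    "capital of France",
    "what is the capital of Japan",
    "GDP of France",
    "ceo of toyota",
    "weather tomorrow",
    "capital of atlantis",
]
counts = extract_class_attributes(queries)
```

Pairs with high counts could then serve as seeds for a text-based extractor, in the spirit of the seeding step the abstract describes.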
Class-driven attribute extraction
- In Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08), 2008
Cited by 6 (1 self)
Attribute extraction and scoring: A probabilistic approach.
- In ICDE, 2013
Cited by 6 (3 self)
Knowledge bases, which consist of concepts, entities, attributes and relations, are increasingly important in a wide range of applications. We argue that knowledge about attributes (of concepts or entities) plays a critical role in inferencing. In this paper, we propose methods to derive attributes for millions of concepts and we quantify the typicality of the attributes with regard to their corresponding concepts. We employ multiple data sources such as web documents, search logs, and existing knowledge bases, and we derive typicality scores for attributes by aggregating different distributions derived from different sources using different methods. To the best of our knowledge, ours is the first approach to integrate concept- and instance-based patterns into probabilistic typicality scores that scale to a broad concept space. We have conducted extensive experiments to show the effectiveness of our approach.
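As a rough illustration of aggregating per-source distributions into one typicality score, the sketch below normalizes raw counts into P(attribute | concept) for each source and then takes a weighted average. The abstract does not say how the paper combines its sources, so the linear interpolation, the fixed weights, and the names normalize and aggregate_typicality are assumptions.

```python
from collections import defaultdict


def normalize(counts):
    """Turn raw (concept, attribute) counts into P(attribute | concept)."""
    totals = defaultdict(float)
    for (concept, _attr), n in counts.items():
        totals[concept] += n
    return {
        (concept, attr): n / totals[concept]
        for (concept, attr), n in counts.items()
        if totals[concept] > 0
    }


def aggregate_typicality(source_counts, weights):
    """Weighted average of per-source distributions.

    Assumes the weights sum to 1, so the result is again a
    distribution over attributes for each concept.
    """
    scores = defaultdict(float)
    for name, counts in source_counts.items():
        for pair, p in normalize(counts).items():
            scores[pair] += weights[name] * p
    return dict(scores)


# Hypothetical toy counts from two sources.
source_counts = {
    "web": {("country", "capital"): 3, ("country", "anthem"): 1},
    "logs": {("country", "capital"): 1, ("country", "gdp"): 1},
}
weights = {"web": 0.5, "logs": 0.5}
scores = aggregate_typicality(source_counts, weights)
```

An attribute seen in only one source still receives a score, scaled down by that source's weight.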
Acquiring knowledge about human goals from search query logs
- Information Processing and Management, 2011
Cited by 3 (2 self)
A better understanding of what motivates humans to perform certain actions is relevant for a range of research challenges, including generating action sequences that implement goals (planning). A first step in this direction is the task of acquiring knowledge about human goals. In this work, we investigate whether Search Query Logs are a viable source for extracting expressions of human goals. For this purpose, we devise an algorithm that automatically identifies queries containing explicit goals such as "find home to rent in Florida". In our evaluation, the algorithm achieves useful precision/recall values. We apply the classification algorithm to two large Search Query Logs, recorded by AOL and Microsoft Research in 2006, and obtain a set of ∼110,000 queries containing explicit goals. To study the nature of human goals in Search Query Logs, we conduct qualitative, quantitative and comparative analyses. Our findings suggest that Search Query Logs (i) represent a viable source for extracting human goals, (ii) contain a great variety of human goals and (iii) contain human goals that can be employed to complement existing commonsense knowledge bases. Finally, we illustrate the potential of goal knowledge for addressing the following application scenario: to refine and extend commonsense knowledge with human goals from Search Query Logs. This work is relevant for (i) knowledge engineers interested in acquiring human goals from textual corpora and constructing knowledge bases of human goals and (ii) researchers interested in studying characteristics of human goals in Search Query Logs.
A Scalable Machine-Learning Approach for Semi-Structured Named Entity Recognition
- 2010
Cited by 3 (0 self)
Named entity recognition studies the problem of locating and classifying parts of free text into a set of predefined categories. Although extensive research has focused on the detection of person, location and organization entities, there are many other entities of interest, including phone numbers, dates, times and currencies (to name a few examples). We refer to these types of entities as semi-structured named entities, since they usually follow certain syntactic formats according to some conventions, although their structure is typically not well-defined. Regular expression solutions require a significant amount of manual effort, and supervised machine learning approaches rely on large sets of labeled training data. Therefore, these approaches do not scale when we need to support many semi-structured entity types in many languages and regions. In this paper, we study this problem and propose a novel three-level bootstrapping framework for the detection of semi-structured entities. We describe the proposed techniques for phone, date and time entities, and perform extensive evaluations on English, German, Polish, Swedish and Turkish documents. Despite the minimal input from the user, our approach can achieve 95% precision and ...
Studying Databases of Intentions: Do Search Query Logs Capture Knowledge about Common Human Goals?
Cited by 2 (1 self)
Access to knowledge about common human goals has been found critical for realizing the vision of intelligent agents acting upon user intent on the web. Yet, the acquisition of knowledge about common human goals represents a major challenge. In a departure from existing approaches, this paper investigates a novel resource for knowledge acquisition: the utilization of search query logs for this task. By relating goals contained in search query logs with goals contained in existing commonsense knowledge bases such as ConceptNet, we aim to shed light on the usefulness of search query logs for capturing knowledge about common human goals. The main contribution of this paper consists of insights generated from an empirical study comparing common human goals contained in two large search query logs (AOL and Microsoft Research) with goals contained in the commonsense knowledge base ConceptNet. The paper sketches ways in which goals from search query logs could be used to address the goal acquisition and goal coverage problems related to commonsense knowledge bases.
Instance Sense Induction from Attribute Sets
Cited by 1 (0 self)
This paper investigates the new problem of automatic sense induction for instance names using automatically extracted attribute sets. Several clustering strategies and data sources are described and evaluated. We also discuss the drawbacks of the evaluation metrics commonly used in similar clustering tasks. The results show improvements in most metrics with respect to the baselines, especially for polysemous instances.
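One simple way to picture sense induction over attribute sets is to group sets that overlap strongly. The sketch below is a greedy, single-pass clustering by Jaccard similarity. It is not one of the strategies evaluated in the paper (the abstract does not name them); the threshold, the toy attribute sets for the instance name "jaguar", and the function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_attribute_sets(attribute_sets, threshold=0.3):
    """Greedy single-pass clustering.

    Each attribute set joins the first existing cluster that contains
    a member with Jaccard similarity >= threshold; otherwise it starts
    a new cluster. Clusters are returned as lists of input indices.
    """
    clusters = []
    for i, attrs in enumerate(attribute_sets):
        for cluster in clusters:
            if any(jaccard(attrs, attribute_sets[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical attribute sets observed for the ambiguous name "jaguar".
jaguar_sets = [
    {"top speed", "engine", "price"},
    {"habitat", "diet", "lifespan"},
    {"engine", "price", "mpg"},
    {"diet", "habitat", "weight"},
]
senses = cluster_attribute_sets(jaguar_sets)
```

Here the car-like sets and the animal-like sets end up in separate clusters, each of which can be read as one induced sense.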
Notes on the Acquisition of Conditional Knowledge
- 2008
Cited by 1 (1 self)
Supported by NSF grants IIS-0328849 and IIS-0535105.
Research in Information Extraction has been overly focused on the extraction of facts concerning individuals as compared to general knowledge pertaining to classes of entities and events. In addition, preference has been given to simple techniques in order to enable high volume throughput. In what follows we give examples of existing work in the field of knowledge acquisition, then follow with ideas on areas for exploration beyond the current state of the art, specifically with respect to the extraction of conditional knowledge, making use of deeper linguistic analysis than is currently ...
1. EXTRACTION OF ATTRIBUTES
As an alternative to previous studies on extracting class attributes from unstructured text, which consider either Web documents or query logs as the source of textual data, a bootstrapped method extracts class attributes simultaneously from both sources, using a small set of seed attributes. The method improves extraction precision and also improves attribute relevance across 40 test classes.
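A single bootstrapping round from seed attributes can be sketched as follows: collect the word contexts in which seeds occur, then harvest other words that appear in the same contexts. This is an illustrative simplification, not the method of the paper. It assumes single-word attributes, uses one (left word, right word) pair as the context, and pools documents and queries into one list; the names contexts_of and bootstrap_once are hypothetical.

```python
from collections import Counter


def contexts_of(text, attribute):
    """Yield (left word, right word) around each occurrence of a one-word attribute."""
    tokens = text.lower().split()
    for i, token in enumerate(tokens):
        if token == attribute and 0 < i < len(tokens) - 1:
            yield (tokens[i - 1], tokens[i + 1])


def bootstrap_once(texts, seeds):
    """One bootstrapping round: learn contexts from seeds, then count new candidates."""
    patterns = {
        context
        for text in texts
        for seed in seeds
        for context in contexts_of(text, seed)
    }
    candidates = Counter()
    for text in texts:
        tokens = text.lower().split()
        for i in range(1, len(tokens) - 1):
            if (tokens[i - 1], tokens[i + 1]) in patterns and tokens[i] not in seeds:
                candidates[tokens[i]] += 1
    return candidates


# Hypothetical mix of document sentences and queries.
texts = [
    "the capital of france is paris",
    "the population of france",
    "the anthem of japan",
    "cheap flights to japan",
]
candidates = bootstrap_once(texts, seeds={"capital"})
```

In a fuller loop, the highest-scoring candidates would be added to the seed set and the round repeated.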