Results 1 - 10 of 49
Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases, 2010
"... Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a general-purpose inference engine at a high cost. This paper proposes a new approach by which every query is e ..."
Abstract
-
Cited by 11 (8 self)
- Add to MetaCart
Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a general-purpose inference engine at a high cost. This paper proposes a new approach by which every query is evaluated like a safe query inside the database engine, by using a new method called dissociation. A dissociated query is obtained by adding extraneous variables to some atoms until the query becomes safe. We show that the probability of the original query and that of the dissociated query correspond to two well-known scoring functions on graphs, namely graph reliability (which is #P-hard) and the propagation score (which is related to PageRank and is in PTIME): when restricted to graphs, standard query probability is graph reliability, while the dissociated probability is the propagation score. We define a propagation score for conjunctive queries without self-joins and prove (i) that it is always an upper bound on query reliability, and (ii) that the two scores coincide for all safe queries. Given the widespread and successful use of graph propagation methods in practice, we argue for dissociation as a good and efficient way to rank probabilistic query results, especially for queries that are highly intractable for exact probabilistic inference.
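To make the reliability/propagation contrast concrete, here is a small self-contained sketch on a toy probabilistic graph (hypothetical data, not from the paper): exact s-t reliability enumerates all possible worlds and is #P-hard in general, while the propagation score combines incoming paths as if they were independent, runs in polynomial time, and upper-bounds reliability.

```python
import itertools

# Tiny probabilistic graph: edge -> probability of being present.
edges = {
    ("s", "a"): 0.6, ("s", "b"): 0.5,
    ("a", "b"): 0.4, ("a", "t"): 0.7, ("b", "t"): 0.8,
}

def reachable(present, src, dst):
    """DFS over the edges present in one possible world."""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for (x, y) in present:
            if x == u and y not in seen:
                seen.add(y)
                stack.append(y)
    return False

def reliability(edges, src, dst):
    """Exact s-t reliability: sum over all 2^m possible worlds."""
    total = 0.0
    items = list(edges.items())
    for bits in itertools.product([0, 1], repeat=len(items)):
        p, present = 1.0, []
        for (e, pe), b in zip(items, bits):
            p *= pe if b else (1.0 - pe)
            if b:
                present.append(e)
        if reachable(present, src, dst):
            total += p
    return total

def propagation(edges, order, src, dst):
    """Propagation score: combine incoming edges as if independent (PTIME)."""
    score = {v: 0.0 for v in order}
    score[src] = 1.0
    for v in order:  # `order` must be a topological order of the DAG
        if v == src:
            continue
        fail = 1.0
        for (x, y), pe in edges.items():
            if y == v:
                fail *= 1.0 - score[x] * pe
        score[v] = 1.0 - fail
    return score[dst]

print(reliability(edges, "s", "t"))                        # ~0.681 (exact)
print(propagation(edges, ["s", "a", "b", "t"], "s", "t"))  # ~0.708 (upper bound)
```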
RoboBrain: Large-scale knowledge engine for robots, 2014
"... Abstract-In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, ro ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
(Show Context)
In this paper we introduce a knowledge engine, which learns and shares knowledge representations, for robots to carry out a variety of tasks. Building such an engine brings with it the challenge of dealing with multiple data modalities including symbols, natural language, haptic senses, robot trajectories, visual features, and many others. The knowledge stored in the engine comes from multiple sources, including physical interactions that robots have while performing tasks (perception, planning, and control), knowledge bases from the WWW, and learned representations from leading robotics research groups. We discuss various technical aspects and associated challenges, such as modeling the correctness of knowledge, inferring latent information, and formulating different robotic tasks as queries to the knowledge engine. We describe the system architecture and how it supports different mechanisms for users and robots to interact with the engine. Finally, we demonstrate its use in three important research areas: grounding natural language, perception, and planning, which are the key building blocks for many robotic tasks. This knowledge engine is a collaborative effort, and we call it RoboBrain.
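The abstract's idea of "robotic tasks as queries to the knowledge engine" can be illustrated with a deliberately simplified triple-store lookup; the schema, relation names, and data below are hypothetical and are not RoboBrain's actual query language or contents.

```python
# Minimal sketch of querying a knowledge graph of (subject, relation, object)
# facts; entirely illustrative, not RoboBrain's API or schema.
facts = {
    ("cup", "has_affordance", "pour"),
    ("cup", "has_affordance", "grasp"),
    ("mug", "is_a", "cup"),
    ("kettle", "has_affordance", "pour"),
}

def query(facts, relation, obj):
    """Return all subjects connected to `obj` by `relation`."""
    return {s for (s, r, o) in facts if r == relation and o == obj}

# A planner could ask: which objects can the robot pour from?
print(query(facts, "has_affordance", "pour"))  # {'cup', 'kettle'}
```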
What's Cookin'? Interpreting cooking videos using text, speech and vision. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015), May 31 - June 5, 2015
"... Abstract We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically genera ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. In particular, we focus on the cooking domain, where the instructions correspond to the recipe. Our technique relies on an HMM to align the recipe steps to the (automatically generated) speech transcript. We then refine this alignment using a state-of-the-art visual food detector based on a deep convolutional neural network. We show that our technique outperforms simpler techniques based on keyword spotting. It also enables interesting applications, such as automatically illustrating recipes with keyframes, and searching within a video for events of interest.
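As a rough illustration of the HMM alignment step, the sketch below runs Viterbi over a monotone model where hidden states are recipe steps and each transcript segment either stays on the current step or advances to the next. The word-overlap emission score and the 0.5/0.5 transition probabilities are assumptions for the toy example, not the paper's model.

```python
import math

recipe = ["crack eggs into bowl", "whisk the eggs", "pour into hot pan"]
transcript = ["so first crack your eggs", "give them a good whisk",
              "now pour them into the pan"]

def emission(step, segment):
    """Log-probability proxy: word overlap between step and segment."""
    s, t = set(step.split()), set(segment.split())
    return math.log((len(s & t) + 1) / (len(s) + 1))  # add-one smoothing

STAY, ADVANCE = math.log(0.5), math.log(0.5)  # assumed transition probs

n, m = len(recipe), len(transcript)
best = [[-math.inf] * n for _ in range(m)]
back = [[0] * n for _ in range(m)]
best[0][0] = emission(recipe[0], transcript[0])  # alignment starts at step 1
for t in range(1, m):
    for j in range(n):
        stay = best[t - 1][j] + STAY
        adv = best[t - 1][j - 1] + ADVANCE if j > 0 else -math.inf
        best[t][j], back[t][j] = max((stay, j), (adv, j - 1))
    for j in range(n):
        best[t][j] += emission(recipe[j], transcript[t])

# Backtrace the most likely recipe step for each transcript segment.
j = max(range(n), key=lambda k: best[m - 1][k])
path = [j]
for t in range(m - 1, 0, -1):
    j = back[t][j]
    path.append(j)
print(list(reversed(path)))  # e.g. [0, 1, 2]
```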
KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
"... Classical approaches to clean data have relied on using in-tegrity constraints, statistics, or machine learning. These approaches are known to be limited in the cleaning accu-racy, which can usually be improved by consulting master data and involving experts to resolve ambiguity. The advent of knowl ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
(Show Context)
Classical approaches to clean data have relied on using integrity constraints, statistics, or machine learning. These approaches are known to be limited in cleaning accuracy, which can usually be improved by consulting master data and involving experts to resolve ambiguity. The advent of knowledge bases (KBs), both general-purpose and within enterprises, and of crowdsourcing marketplaces provides yet more opportunities to achieve higher accuracy at a larger scale. We propose Katara, a knowledge base and crowd powered data cleaning system that, given a table, a KB, and a crowd, interprets table semantics to align it with the KB, identifies correct and incorrect data, and generates top-k possible repairs for incorrect data. Experiments show that Katara can be applied to various datasets and KBs, and can efficiently annotate data and suggest possible repairs.
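The core alignment-and-repair loop can be caricatured in a few lines: pick the KB type that best covers a column, flag values outside it, and propose the closest KB instance as a repair candidate. The mini-KB, the coverage scoring, and the edit-distance repair below are illustrative assumptions; Katara itself also validates ambiguous cases with the crowd.

```python
import difflib

# A toy KB: type -> known instances (hypothetical data).
kb_instances = {
    "Country": {"France", "Italy", "Spain"},
    "Capital": {"Paris", "Rome", "Madrid"},
}

column = ["France", "Italy", "Spania"]  # "Spania" is a typo to repair

def best_type(column, kb_instances):
    """Pick the KB type covering the largest fraction of the column."""
    return max(kb_instances,
               key=lambda t: sum(v in kb_instances[t] for v in column))

t = best_type(column, kb_instances)
errors = [v for v in column if v not in kb_instances[t]]
# Repair candidate: the closest KB instance by string similarity.
repairs = {e: difflib.get_close_matches(e, kb_instances[t], n=1)
           for e in errors}
print(t, repairs)  # Country {'Spania': ['Spain']}
```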
Data X-Ray: A diagnostic tool for data errors. In SIGMOD Conference, 2015
"... ABSTRACT A lot of systems and applications are data-driven, and the correctness of their operation relies heavily on the correctness of their data. While existing data cleaning techniques can be quite effective at purging datasets of errors, they disregard the fact that a lot of errors are systemat ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
Many systems and applications are data-driven, and the correctness of their operation relies heavily on the correctness of their data. While existing data cleaning techniques can be quite effective at purging datasets of errors, they disregard the fact that many errors are systematic, inherent to the process that produces the data, and thus will keep occurring unless the problem is corrected at its source. In contrast to traditional data cleaning, in this paper we focus on data diagnosis: explaining where and how the errors happen in a data generative process. We develop a large-scale diagnostic framework called Data X-Ray. Our contributions are three-fold. First, we transform the diagnosis problem into the problem of finding common properties among erroneous elements, with minimal domain-specific assumptions. Second, we use Bayesian analysis to derive a cost model that implements three intuitive principles of good diagnoses. Third, we design an efficient, highly parallelizable algorithm for performing data diagnosis on large-scale data. We evaluate our cost model and algorithm using both real-world and synthetic data, and show that our diagnostic framework produces better diagnoses and is orders of magnitude more efficient than existing techniques.
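The first contribution, reducing diagnosis to finding common properties of erroneous elements, is easy to prototype. The toy below only ranks single properties by error rate and coverage; it stands in for, and is much weaker than, the paper's Bayesian cost model and parallel algorithm, and the records are invented.

```python
from collections import defaultdict

# Each element: (properties, is_error). Hypothetical extraction records.
elements = [
    ({"source": "crawlerA", "type": "date"}, True),
    ({"source": "crawlerA", "type": "date"}, True),
    ({"source": "crawlerA", "type": "name"}, False),
    ({"source": "crawlerB", "type": "date"}, False),
    ({"source": "crawlerB", "type": "name"}, False),
]

groups = defaultdict(lambda: [0, 0])  # (attr, value) -> [errors, total]
for props, is_err in elements:
    for item in props.items():
        groups[item][0] += is_err
        groups[item][1] += 1

# Rank candidate diagnoses: high error rate, weighted by error coverage.
ranked = sorted(groups.items(),
                key=lambda kv: -(kv[1][0] / kv[1][1]) * kv[1][0])
for (attr, val), (err, tot) in ranked[:3]:
    print(f"{attr}={val}: {err}/{tot} erroneous")
```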
On the Discovery of Evolving Truth
"... In the era of big data, information regarding the same objects can be collected from increasingly more sources. Unfortunately, there usually exist conflicts among the information coming from differ-ent sources. To tackle this challenge, truth discovery, i.e., to inte-grate multi-source noisy informa ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
In the era of big data, information regarding the same objects can be collected from an increasing number of sources. Unfortunately, there usually exist conflicts among the information coming from different sources. To tackle this challenge, truth discovery, i.e., integrating multi-source noisy information by estimating the reliability of each source, has emerged as a hot topic. In many real-world applications, however, the information may arrive sequentially, and as a consequence, the truth of objects as well as the reliability of sources may be dynamically evolving. Existing truth discovery methods, unfortunately, cannot handle such scenarios. To address this problem, we investigate the temporal relations among both object truths and source reliability, and propose an incremental truth discovery framework that can dynamically update object truths and source weights upon the arrival of new data. Theoretical analysis is provided to show that the proposed method is guaranteed to converge at a fast rate. Experiments on three real-world applications and a set of synthetic data demonstrate the advantages of the proposed method over state-of-the-art truth discovery methods.
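A minimal incremental loop conveys the flavor of this setup: estimate each object's truth by weighted voting, then update source weights as new batches arrive, discounting old evidence so that both truths and reliabilities can drift. The exponential-decay update below is an assumed stand-in for the paper's actual update rules.

```python
# Source weights start uniform; sources and values are invented.
weights = {"s1": 1.0, "s2": 1.0, "s3": 1.0}
DECAY = 0.8  # older evidence counts less as truths evolve

def discover(claims):
    """claims: {source: value}. Return the weighted-majority value."""
    votes = {}
    for src, val in claims.items():
        votes[val] = votes.get(val, 0.0) + weights[src]
    return max(votes, key=votes.get)

stream = [  # each batch: one object's claims at a new timestamp
    {"s1": "red", "s2": "red", "s3": "blue"},
    {"s1": "red", "s2": "blue", "s3": "blue"},
]
for claims in stream:
    truth = discover(claims)
    for src, val in claims.items():  # incremental weight update
        weights[src] = DECAY * weights[src] + (1.0 if val == truth else 0.0)
    print(truth, {s: round(w, 2) for s, w in weights.items()})
```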
Approximate Lifted Inference with Probabilistic Databases
"... This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluat-ing a fixed number of query plans, each providing an upper bound on the true probability, then taking th ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results on PTIME self-join-free conjunctive queries: a query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.
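The "min over plan upper bounds" idea can be shown on the classic unsafe query Q() :- R(x), S(x,y), T(y): each of the two dissociation-style plans below wrongly treats some tuples as independent across branches and therefore overestimates the probability, so their minimum is the tighter bound. The data and the two hand-written plans are a toy illustration, not the paper's plan-enumeration algorithm.

```python
# Tuple-independent toy tables: tuple -> marginal probability.
R = {1: 0.9, 2: 0.5}
T = {10: 0.7, 20: 0.6}
S = {(1, 10): 0.8, (2, 10): 0.4, (2, 20): 0.9}

def indep_or(ps):
    """P(at least one event) assuming independence."""
    out = 1.0
    for p in ps:
        out *= 1.0 - p
    return 1.0 - out

def plan_over_x():
    """Project on x first: treats each T(y) as independent across x."""
    per_x = []
    for x, rp in R.items():
        branch = [S[(xx, y)] * T[y] for (xx, y) in S if xx == x]
        per_x.append(rp * indep_or(branch))
    return indep_or(per_x)

def plan_over_y():
    """Symmetric plan: treats each R(x) as independent across y."""
    per_y = []
    for y, tp in T.items():
        branch = [S[(x, yy)] * R[x] for (x, yy) in S if yy == y]
        per_y.append(tp * indep_or(branch))
    return indep_or(per_y)

# Both plans upper-bound the true probability; report their minimum.
print(plan_over_x(), plan_over_y())        # ~0.670 and ~0.667
print(min(plan_over_x(), plan_over_y()))   # the tighter upper bound
```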
Open-world probabilistic databases. In Proc. of KR'16, 2016
"... Introduction and Motivation Driven by the need to learn from vast amounts of text data, efforts throughout natural language processing, information extraction, databases. and AI are coming together to build large-scale knowledge bases. We propose an alternative semantics for probabilistic knowled ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Introduction and Motivation. Driven by the need to learn from vast amounts of text data, efforts throughout natural language processing, information extraction, databases, and AI are coming together to build large-scale knowledge bases. We propose an alternative semantics for probabilistic knowledge bases that drops the closed-world assumption of classical probabilistic databases, which results in open-world PDBs (OpenPDBs). We show that OpenPDBs provide more meaningful answers. Finally, we pinpoint limitations of OpenPDBs and discuss ontology-based data access (OBDA) as a promising approach to further strengthen this framework.
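To see what open-world semantics changes, the sketch below evaluates a simple existential query twice: once with missing tuples at probability 0 (the closed-world reading) and once at an assumed completeness threshold λ, yielding a probability interval rather than a point answer. The relation, data, and λ value are hypothetical.

```python
LAMBDA = 0.3  # assumed max probability of any fact missing from the PDB
known = {("alice", "paris"): 0.9}  # LivesIn facts stored in the PDB
candidates = [("alice", "paris"), ("alice", "rome")]  # tuples the query touches

def q_prob(get_p):
    """P(exists city: LivesIn('alice', city)) under tuple independence."""
    fail = 1.0
    for t in candidates:
        fail *= 1.0 - get_p(t)
    return 1.0 - fail

lower = q_prob(lambda t: known.get(t, 0.0))     # closed-world reading
upper = q_prob(lambda t: known.get(t, LAMBDA))  # open-world upper bound
print(f"[{lower:.3f}, {upper:.3f}]")            # [0.900, 0.930]
```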