Results 1 - 5 of 5
Finding a minimal tree pattern under neighborhood constraints
- In PODS, 2011
Cited by 2 (1 self)
Tools that automatically generate queries are useful when schemas are hard to understand due to size or complexity. Usually, these tools find minimal tree patterns that contain a given set (or bag) of labels. The labels could be, for example, XML tags or relation names. The only restriction is that, in a tree pattern, adjacent labels must be among some specified pairs. A more expressive framework is developed here, where a schema is a mapping of each label to a collection of bags of labels. A tree pattern conforms to the schema if for all nodes v, the bag comprising the labels of the neighbors is contained in one of the bags to which the label of v is mapped. The problem at hand is to find a minimal tree pattern that conforms to the schema and contains a given bag of labels. This problem is NP-hard even when using the simplest conceivable language for describing schemas. In practice, however, the set of labels is small, so efficiency is realized by means of an algorithm that is fixed-parameter tractable (FPT). Two languages for specifying schemas are discussed. In the first, one expresses pairwise mutual exclusions between labels. Though W[1]-hardness (hence, unlikeliness of an FPT algorithm) is shown, an FPT algorithm is described for the case where the mutual exclusions form a circular-arc graph (e.g., disjoint cliques). The second language is that of regular expressions, and for that another FPT algorithm is described.
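The conformance condition in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it only checks whether a given tree pattern conforms to a schema and contains a required bag of labels, and it does not search for a minimal pattern. The function names, the representation of a tree as a label map plus an edge list, and the example labels (`order`, `customer`, `item`) are all assumptions made for illustration.

```python
from collections import Counter


def bag_contained(small, big):
    """Return True if every label occurs in `big` at least as often as in `small`."""
    return all(big[label] >= count for label, count in small.items())


def conforms(labels, edges, schema):
    """Check the neighborhood constraint for every node of a tree pattern.

    labels: dict mapping node id -> label (hypothetical representation)
    edges:  list of undirected (u, v) pairs
    schema: dict mapping label -> list of Counter bags of allowed neighbor labels
    """
    # Build, for each node, the bag of labels of its neighbors.
    neighbors = {node: Counter() for node in labels}
    for u, v in edges:
        neighbors[u][labels[v]] += 1
        neighbors[v][labels[u]] += 1
    # Each node's neighbor bag must fit inside one of the bags of its label.
    return all(
        any(bag_contained(neighbors[node], allowed)
            for allowed in schema.get(labels[node], []))
        for node in labels
    )


def contains_labels(labels, required):
    """Check that the pattern contains the required bag of labels."""
    return bag_contained(Counter(required), Counter(labels.values()))


# Illustrative schema: an order may touch one customer and up to two items.
schema = {
    "order": [Counter({"customer": 1, "item": 2})],
    "customer": [Counter({"order": 1})],
    "item": [Counter({"order": 1})],
}
labels = {1: "order", 2: "customer", 3: "item", 4: "item"}
edges = [(1, 2), (1, 3), (1, 4)]

ok = conforms(labels, edges, schema) and contains_labels(labels, ["customer", "item"])

# Adding a third item exceeds the bag allowed for "order".
too_many = conforms({**labels, 5: "item"}, edges + [(1, 5)], schema)
```

Here `ok` evaluates to `True` and `too_many` to `False`. Bags are modeled with `Counter` so that multiplicities matter, which is what distinguishes this framework from the simpler "allowed adjacent pairs" restriction.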
Entity-Centric Search For Enterprise Services
The consumption of APIs, such as Enterprise Services (ESs) in an enterprise Service-Oriented Architecture (eSOA), has largely been a task for experienced developers. With the rapidly growing number of such (Web) APIs, users with little or no experience in a given API, such as mashup developers, face the problem of finding relevant API operations. However, building an effective search has been a challenge: Information Retrieval (IR) methods struggle with the brevity of text in API descriptions, whereas semantic search technologies require domain ontologies and formal queries. Motivated by the search behavior of users, we propose an iterative keyword search based on entities. The entities are part of a knowledge base, whose content stems from model-driven engineering. We implemented our approach and conducted a user study showing significant improvements in search effectiveness.
Lenses: An On-Demand Approach to ETL
Three mentalities have emerged in analytics. One view holds that reliable analytics is impossible without high-quality data, and relies on heavy-duty ETL processes and upfront data curation to provide it. The second view takes a more ad-hoc approach, collecting data into a data lake, and placing responsibility for data quality on the analyst querying it. A third, on-demand approach has emerged over the past decade in the form of numerous systems like Paygo or HLog, which allow for incremental curation of the data and help analysts to make principled trade-offs between data quality and effort. Though quite useful in isolation, these systems target only specific quality problems (e.g., Paygo targets only schema matching and entity resolution). In this paper, we explore the design of a general, extensible infrastructure for on-demand curation that is based on probabilistic query processing. We illustrate its generality through examples and show how such an infrastructure can be used to gracefully make existing ETL workflows “on-demand”. Finally, we present a user interface for On-Demand ETL and address ensuing challenges, including that of efficiently ranking potential data curation tasks. Our experimental results show that On-Demand ETL is feasible and that our greedy ranking strategy for curation tasks, called CPI, is effective.
Wearable Queries: Adapting Common Retrieval Needs to Data and Users
The wealth of information generated by users interacting with the network and its applications is often under-utilized due to complications in accessing heterogeneous and dynamic data and retrieving relevant information from sources having possibly unknown formats and structures. Processing complex requests on such information sources can, thus, be costly, though not guaranteeing user satisfaction. Furthermore, dynamic contexts prevent substantial user involvement in the interpretation of the request. The paper envisions an innovative solution to process the above-mentioned requests, limiting user involvement by exploiting information on: (a) user context (geo-location, interests, needs); (b) data and processing quality; (c) similar requests repeated over time. By interpreting a request in a novel way by means of a Wearable Query (WQ), i.e., a query that captures the user and request specificities, we envision a methodological and technological solution for WQs in the presence of repeated information needs in distributed, heterogeneous, dynamic environments, with emphasis on the geo-spatial dimension and on data quality.
A Personal Perspective on Keyword Search over Data Graphs
Theoretical and practical issues pertaining to keyword search over data graphs are discussed. A formal model and algorithms for enumerating answers (by operating directly on the data graph) are described. Various aspects of a system are explained, including the object-connector-property data model, how it is used to construct a data graph from an XML document, how to deal with redundancies in the source data, what constitutes a duplicate answer, the implementation, and the GUI. An approach to ranking that combines textual relevance with semantic considerations is described. It is argued that search over data graphs is inherently a two-dimensional process, where the goal is not just to find particular content but also to collect information on how the desired data may be semantically connected.