Automatic Annotation of Content-Rich HTML Documents: Structural and Semantic Analysis
- In Intl. Semantic Web Conf. (ISWC), 2003
Cited by 30 (10 self)
Although RDF/XML has been widely recognized as the standard vehicle for representing semantic information on the Web, an enormous amount of semantic data is still being encoded in HTML documents that are designed primarily for human consumption and are not directly amenable to machine processing. This paper seeks to bridge this semantic gap by addressing the fundamental problem of automatically annotating HTML documents with semantic labels. Exploiting the key observation that, in template-based content-rich HTML documents, semantically related items exhibit consistency in presentation style as well as spatial locality, we have developed a novel framework for automatically partitioning such documents into semantic structures. Our framework tightly couples structural analysis of documents with semantic analysis incorporating domain ontologies and lexical databases such as WordNet. We present experimental evidence of the effectiveness of our techniques on a large collection of HTML documents from various news portals.
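A minimal sketch of the observation behind this framework, under assumptions that are not taken from the paper: each page is pre-flattened into blocks (the `Block` type is hypothetical), two blocks share a presentation style when their tag path and CSS class match, and spatial locality is approximated by looking for consecutive repetitions of the same style sequence. It illustrates the idea only; it is not the authors' partitioning algorithm and it omits the ontology and WordNet analysis entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical flattened DOM block: tag path from the root, CSS class, and text."""
    tag_path: str
    css_class: str
    text: str


def style_signature(block):
    # Assumption: two blocks share a presentation style when tag path and class agree.
    return (block.tag_path, block.css_class)


def find_repeats(sigs, max_period=4):
    """Greedily cover a signature sequence with (start, period, repeats) runs.

    At each position, choose the period whose consecutive repetitions cover the
    most adjacent blocks (spatial locality); ties go to the shorter period.
    """
    runs, i, n = [], 0, len(sigs)
    while i < n:
        best_period, best_reps = 1, 1
        for k in range(1, max_period + 1):
            reps = 1
            while (i + (reps + 1) * k <= n
                   and sigs[i + reps * k : i + (reps + 1) * k] == sigs[i : i + k]):
                reps += 1
            if reps >= 2 and reps * k > best_period * best_reps:
                best_period, best_reps = k, reps
        runs.append((i, best_period, best_reps))
        i += best_period * best_reps
    return runs


def group_records(blocks, max_period=4):
    """Return one list of records per run; each record is a list of blocks."""
    sigs = [style_signature(b) for b in blocks]
    return [
        [blocks[s + r * k : s + (r + 1) * k] for r in range(reps)]
        for s, k, reps in find_repeats(sigs, max_period)
    ]


if __name__ == "__main__":
    page = [
        Block("body/div/h2", "headline", "Markets rally"),
        Block("body/div/p", "summary", "Stocks rose on Monday."),
        Block("body/div/h2", "headline", "Storm nears coast"),
        Block("body/div/p", "summary", "Residents prepare."),
        Block("body/ul/li", "nav", "Sports"),
        Block("body/ul/li", "nav", "Weather"),
    ]
    for run in group_records(page):
        print([[b.text for b in record] for record in run])
```

On a news page where headline/summary pairs repeat, the detector returns one run whose records are those pairs, and the navigation items form a second run; a later semantic step could then label each run.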
A realistic architecture for the semantic web
- In RuleML, 2005
Cited by 19 (3 self)
Harold.Boley AT nrc-cnrc.gc.ca
In this paper we argue that a realistic architecture for the Semantic Web must be based on multiple independent, but interoperable, stacks of languages. In particular, we argue that there is a very important class of rule-based languages, with over thirty years of history and experience, which cannot be layered on top of OWL and must be included in the Semantic Web architecture alongside the stack of OWL-based languages. The class of languages we are after includes rules in the Logic Programming style, which support default negation. We briefly survey the logical foundations of these languages and then discuss an interoperability framework in which such languages can co-exist with OWL and its extensions.
Bootstrapping semantic annotation for content-rich html documents
- In Intl. Conf. on Data Engineering (ICDE), 2005
Cited by 14 (8 self)
An enormous amount of semantic data is still being encoded in HTML documents. Identifying and annotating the semantic concepts implicit in such documents makes them directly amenable to Semantic Web processing. In this paper we describe a highly automated technique for annotating HTML documents, especially template-based content-rich documents containing many different semantic concepts per document. Starting with a (small) seed of hand-labeled instances of semantic concepts in a set of HTML documents, we bootstrap an annotation process that automatically identifies unlabeled concept instances present in other documents. The bootstrapping technique exploits the observation that semantically related items in content-rich documents exhibit consistency in presentation style and spatial locality to learn a statistical model for accurately identifying different semantic concepts in HTML documents drawn from a variety of Web sources. We also present experimental results on the effectiveness of the technique.
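The bootstrapping loop can be illustrated with generic self-training. In this sketch, which is an assumption rather than the paper's statistical model, each candidate item is already a numeric feature vector, each concept is represented by the centroid of its labeled vectors, and an unlabeled item is adopted only when its nearest centroid beats the runner-up by a fixed distance `margin` (a hypothetical parameter). Newly labeled items refine the centroids in the next round.

```python
import math


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def bootstrap(seed, unlabeled, rounds=3, margin=0.5):
    """Self-training over feature vectors (nearest-centroid stand-in model).

    seed: dict mapping concept label -> list of hand-labeled vectors.
    unlabeled: list of vectors to annotate.
    Returns (assigned, leftover); assigned is a list of (vector, label) pairs.
    """
    labeled = {label: list(vecs) for label, vecs in seed.items()}  # do not mutate seed
    pool, assigned = list(unlabeled), []
    for _ in range(rounds):
        centroids = {label: centroid(vecs) for label, vecs in labeled.items()}
        confident, remaining = [], []
        for x in pool:
            # Distances to every concept centroid, nearest first.
            ranked = sorted((math.dist(x, c), label) for label, c in centroids.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                confident.append((x, ranked[0][1]))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new is confident enough: the bootstrap has converged
        for x, label in confident:
            labeled[label].append(x)
        assigned.extend(confident)
        pool = remaining
    return assigned, pool


if __name__ == "__main__":
    seed = {"headline": [[1.0, 0.0]], "byline": [[0.0, 1.0]]}
    assigned, leftover = bootstrap(seed, [[0.9, 0.1], [0.1, 0.8], [0.5, 0.5]])
    print(assigned)
    print(leftover)
```

Items that stay ambiguous remain in the returned pool rather than being forced into a concept, which is the usual guard against error propagation in self-training.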
Model-directed Web Transactions under Constrained Modalities
- 2006
Cited by 6 (4 self)
Online transactions (e.g., buying a book on the Web) typically involve a number of steps spanning several pages. Conducting such transactions under constrained interaction modalities, as exemplified by small-screen handhelds or interactive speech interfaces (the primary mode of communication for visually impaired individuals), is a strenuous, fatigue-inducing activity. But usually one needs to browse only a small fragment of a Web page to perform a transactional step such as filling out a form or selecting an item from a search results list. We exploit this observation to develop an automata-based process model that delivers only the “relevant” page fragments at each transactional step, thereby reducing information overload on such narrow interaction bandwidths. We realize this model by coupling techniques from content analysis of Web documents, automata learning and statistical classification. The process model and associated techniques have been incorporated into Guide-O, a prototype system that facilitates online transactions through a speech/keyboard interface (Guide-O-Speech) or on handhelds with limited display size (Guide-O-Mobile). The performance of Guide-O and its user experience are reported.
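The process model can be pictured as a deterministic automaton whose states are transactional steps and whose state-to-fragment mapping decides what is delivered at each step. The sketch below hand-codes a hypothetical book-purchase automaton; in Guide-O the model is learned and the fragments are identified by classifiers, neither of which is reproduced here. All state names, action names and fragment labels are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessModel:
    """Hypothetical deterministic automaton over transactional steps."""
    start: str
    transitions: dict  # (state, action) -> next state
    relevant: dict     # state -> fragment labels to deliver in that state
    final: set = field(default_factory=set)

    def step(self, state, action):
        """Follow one transition; actions not allowed in `state` are rejected."""
        try:
            return self.transitions[(state, action)]
        except KeyError:
            raise ValueError(f"action {action!r} not allowed in state {state!r}") from None

    def run(self, actions):
        """Replay actions and return (state, fragments delivered) for every visited state."""
        state = self.start
        trace = [(state, self.relevant.get(state, []))]
        for action in actions:
            state = self.step(state, action)
            trace.append((state, self.relevant.get(state, [])))
        return trace

    def completes(self, actions):
        """True if the action sequence ends in a final (transaction-complete) state."""
        return self.run(actions)[-1][0] in self.final


# Illustrative book-purchase model; every name here is made up.
BOOKSTORE = ProcessModel(
    start="search",
    transitions={
        ("search", "submit_query"): "results",
        ("results", "submit_query"): "results",
        ("results", "select_item"): "item",
        ("item", "add_to_cart"): "cart",
        ("cart", "checkout"): "confirmation",
    },
    relevant={
        "search": ["search_form"],
        "results": ["result_list"],
        "item": ["item_details", "add_to_cart_button"],
        "cart": ["cart_summary", "checkout_button"],
        "confirmation": ["order_confirmation"],
    },
    final={"confirmation"},
)

if __name__ == "__main__":
    for state, fragments in BOOKSTORE.run(["submit_query", "select_item", "add_to_cart"]):
        print(state, fragments)
```

Only the fragments listed for the current state would be rendered on the small screen or spoken, which is how the model narrows what the user has to wade through at each step.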
Browsing Fatigue in Handhelds: Semantic Bookmarking Spells Relief
- In Intl. World Wide Web Conf. (WWW), 2005
Cited by 6 (1 self)
Focused Web browsing activities such as periodically looking up headline news, weather reports, etc., which require only selective fragments of particular Web pages, can be made more efficient for users of limited-display-size handheld mobile devices by delivering only the target fragments. Semantic bookmarks provide a robust conceptual framework for recording and retrieving such targeted content not only from the specific pages used in creating the bookmarks but also from any user-specified page with similar content semantics. This paper describes a technique for realizing semantic bookmarks by coupling machine learning with Web page segmentation to create a statistical model of the bookmarked content. These models are used to identify and retrieve the bookmarked content from Web pages that share a common content domain. In contrast to ontology-based approaches where semantic bookmarks are limited to available concepts in the ontology, the learning-based approach allows users to bookmark ad-hoc personalized semantic concepts to effectively target content that fits the limited display of handhelds. User evaluation measuring the effectiveness of a prototype implementation of learning-based semantic bookmarking at reducing browsing fatigue in handhelds is provided.
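One way to realize a statistical model of the bookmarked content is a two-class naive Bayes classifier over segment words, trained on the bookmarked segment versus the other segments of the same page. This is an assumed stand-in, not necessarily the model used in the paper, and Web page segmentation is taken as already done: the hypothetical `retrieve` method simply returns the segment with the highest log-odds of being the target.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BookmarkModel:
    """Two-class multinomial naive Bayes: bookmarked ('target') vs 'other' segments.

    Both training lists must be non-empty so that both class priors are defined.
    """

    def __init__(self, target_segments, other_segments, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.counts = {
            "target": Counter(t for s in target_segments for t in tokenize(s)),
            "other": Counter(t for s in other_segments for t in tokenize(s)),
        }
        self.totals = {cls: sum(c.values()) for cls, c in self.counts.items()}
        vocab = set(self.counts["target"]) | set(self.counts["other"])
        self.vocab_size = len(vocab) + 1  # one extra slot shared by unseen words
        n_target, n_other = len(target_segments), len(other_segments)
        n = n_target + n_other
        self.log_prior = {"target": math.log(n_target / n), "other": math.log(n_other / n)}

    def _log_joint(self, segment, cls):
        denom = self.totals[cls] + self.alpha * self.vocab_size
        return self.log_prior[cls] + sum(
            math.log((self.counts[cls][t] + self.alpha) / denom) for t in tokenize(segment)
        )

    def score(self, segment):
        """Log-odds that `segment` is the bookmarked content."""
        return self._log_joint(segment, "target") - self._log_joint(segment, "other")

    def retrieve(self, segments):
        """Pick the segment of an already segmented page most likely to be the target."""
        return max(segments, key=self.score)


if __name__ == "__main__":
    model = BookmarkModel(["Weather: sunny, high 75, low 60"],
                          ["Sports scores tonight", "Top headlines: politics"])
    print(model.retrieve(["Forecast: rain, high 68, low 55",
                          "Business headlines today", "Movie reviews"]))
```

With a weather bookmark trained on one forecast block, a new page's forecast block outscores unrelated headline or review blocks because it shares tokens such as `high` and `low`, even though the values differ.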
MWeb: a Principled Framework for Modular Web Rule Bases and its Semantics
Cited by 4 (2 self)
We present a principled framework for modular web rule bases, called MWeb. According to this framework, each predicate defined in a rule base is characterized by its defining reasoning mode, scope, and exporting rule base list. Each predicate used in a rule base is characterized by its requesting reasoning mode and importing rule base list. For legal MWeb modular rule bases S, the MWebAS and MWebWFS semantics of each rule base s ∈ S w.r.t. S are defined model-theoretically. These semantics extend the answer set semantics (AS) and the well-founded semantics with explicit negation (WFSX) on ELPs, respectively, keeping all of their semantic and computational characteristics. Our framework supports: (i) local semantics and different points of view, (ii) local closed-world and open-world assumptions, (iii) scoped negation-as-failure, (iv) restricted propagation of local inconsistencies, and (v) monotonicity of reasoning, for “fully shared” predicates.
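MWebAS builds on the answer set (stable model) semantics. As a reference point only, the sketch below computes the stable models of a small ground normal program by brute force: a candidate set of atoms is stable when it equals the least model of its Gelfond-Lifschitz reduct. It handles default negation (`not`) but none of MWeb's scoping, explicit negation, or WFSX, and enumerating every subset is exponential, so it suits toy programs only. The `rule` helper is a hypothetical convenience.

```python
from itertools import chain, combinations

# A ground normal rule  head :- p1, ..., pn, not n1, ..., not nm
# is stored as (head, frozenset of positive atoms, frozenset of default-negated atoms).


def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))


def least_model(definite_rules):
    """Least model of a negation-free program, by naive fixpoint iteration."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def reduct(rules, candidate):
    """Gelfond-Lifschitz reduct: delete every rule with some 'not n' where n is in
    candidate, then drop the remaining 'not' literals."""
    return [(head, pos) for head, pos, neg in rules if not (neg & candidate)]


def stable_models(rules):
    """All stable models, found by checking every subset of the program's atoms."""
    atoms = sorted(set(chain.from_iterable({h} | p | n for h, p, n in rules)))
    models = []
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            candidate = set(subset)
            if least_model(reduct(rules, candidate)) == candidate:
                models.append(candidate)
    return models


if __name__ == "__main__":
    print(stable_models([rule("p", neg=["q"]), rule("q", neg=["p"])]))
    print(stable_models([rule("a", neg=["b"])]))
    print(stable_models([rule("p", neg=["p"])]))
```

The program `p :- not q. q :- not p.` has two stable models, `{p}` and `{q}`; `a :- not b.` has the single model `{a}` because nothing derives `b`; and `p :- not p.` has none.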
Incorporating defeasible knowledge and argumentative reasoning in web-based forms
- In Proc. of the 3rd Intl. Workshop on Intelligent Techniques for Web Personalization (ITWP 2005), 19th Intl. Joint Conf. on Artificial Intelligence (IJCAI 2005), Edinburgh, UK, 2005
Cited by 1 (1 self)
The notion of forms as a way of organizing and presenting data has been used since the beginning of the WWW. Web-based forms have evolved together with the development of new markup languages (e.g., XML), in which it is possible to provide validation scripts as part of the form code in order to test whether the intended meaning of the form is correct. However, for the form designer, part of this intended meaning frequently involves other features that are not constraints themselves, but rather attributes emerging from the form, which provide plausible conclusions in the context of incomplete and potentially inconsistent information. As the value of such attributes may change in the presence of new knowledge, we call them defeasible attributes. In this paper we propose extending traditional web-based forms to incorporate defeasible attributes as part of the knowledge that can be encoded in a form. The proposed extension allows the specification of scripts for reasoning about form fields using a defeasible knowledge base, expressed in terms of a Defeasible Logic Program.
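Defeasible Logic Programming decides warrant through a dialectical analysis of arguments and counter-arguments, which is too involved for a short example. The sketch below is a deliberately simplified stand-in, not DeLP: a defeasible attribute holds if some applicable rule supporting it is strictly more specific (has a strictly larger body) than every applicable rule for its complement, and it is undecided otherwise. The field names, facts and rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefeasibleRule:
    head: str        # literal such as "discount" or its complement "~discount"
    body: frozenset  # facts that must all hold for the rule to apply


def complement(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal


def holds(literal, rules, facts):
    """Simplified warrant (NOT DeLP): some applicable rule for `literal` has a strictly
    larger body than every applicable rule for the complement."""
    applicable = [r for r in rules if r.body <= facts]
    support = [r for r in applicable if r.head == literal]
    attack = [r for r in applicable if r.head == complement(literal)]
    return any(all(s.body > a.body for a in attack) for s in support)


def attribute_value(attr, rules, facts):
    """True or False when one side wins; None when undecided (e.g. equal specificity)."""
    if holds(attr, rules, facts):
        return True
    if holds(complement(attr), rules, facts):
        return False
    return None


def facts_from_form(fields):
    """Hypothetical mapping from raw form fields to facts."""
    facts = set()
    if fields.get("age", 0) >= 65:
        facts.add("senior")
    if fields.get("account_suspended"):
        facts.add("suspended")
    return facts


RULES = [
    DefeasibleRule("discount", frozenset({"senior"})),
    DefeasibleRule("~discount", frozenset({"senior", "suspended"})),
]

if __name__ == "__main__":
    print(attribute_value("discount", RULES, facts_from_form({"age": 70})))
    print(attribute_value("discount", RULES,
                          facts_from_form({"age": 70, "account_suspended": True})))
```

Adding the fact `suspended` flips `discount` from true to false, which mirrors the abstract's point that a defeasible attribute may change in the presence of new knowledge.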
Automated Semantic Analysis of Schematic Data
Cited by 1 (0 self)
Content in numerous Web data sources, designed primarily for human consumption, is not directly amenable to machine processing. Automated semantic analysis of such content facilitates its transformation into machine-processable, richly structured, semantically annotated data. This paper describes a learning-based technique for semantic analysis of schematic data, which are characterized by being template-generated from backend databases. Starting with a seed set of hand-labeled instances of semantic concepts in a set of Web pages, the technique learns statistical models of these concepts using light-weight content features. These models direct the annotation of diverse Web pages possessing similar content semantics. The principles behind the technique find application in information retrieval and extraction problems. Focused Web browsing activities require only selective fragments of particular Web pages but are often performed using bookmarks, which fetch the contents of the entire page. This results in information overload for users of devices with constrained interaction modalities, such as small-screen handhelds. Fine-grained information extraction from Web pages, which is typically performed using page-specific and syntactic expressions known as wrappers, suffers from a lack of scalability and robustness. We report on the application of our technique in developing semantic bookmarks for retrieving targeted browsing content and semantic wrappers for robust and scalable information extraction from Web pages sharing a semantic domain.
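The abstract does not list its light-weight content features, so the feature set below is an assumption chosen for illustration: token count, digit ratio, share of capitalized tokens, presence of a clock time, and a trailing period. The point is that such features are cheap to compute and do not depend on page-specific markup, which is what lets a model trained on one site transfer to pages from another site in the same domain.

```python
import re

# Hypothetical feature set; the paper's actual features are not specified here.
FEATURE_NAMES = ("n_tokens", "digit_ratio", "capitalized_ratio",
                 "has_clock_time", "ends_with_period")


def content_features(text):
    """Cheap, layout-independent features of one text segment."""
    tokens = text.split()
    if not tokens:
        return {name: 0.0 for name in FEATURE_NAMES}
    chars = [c for c in text if not c.isspace()]
    return {
        "n_tokens": float(len(tokens)),
        "digit_ratio": sum(c.isdigit() for c in chars) / len(chars),
        "capitalized_ratio": sum(t[0].isupper() for t in tokens) / len(tokens),
        "has_clock_time": 1.0 if re.search(r"\b\d{1,2}:\d{2}\b", text) else 0.0,
        "ends_with_period": 1.0 if text.rstrip().endswith(".") else 0.0,
    }


def feature_vector(text):
    """Features in a fixed order, ready to feed a statistical concept model."""
    features = content_features(text)
    return [features[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    print(content_features("Updated 10:45 AM"))
    print(feature_vector("Residents prepare for the storm."))
```

A timestamp line such as `Updated 10:45 AM` gets a high digit ratio and the clock-time flag, while a summary sentence gets more tokens and a trailing period, so even these crude signals separate some concept types.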
Defeasible Reasoning in Web-Based Forms through Argumentation
Cited by 1 (0 self)
The notion of forms as a way of organizing and presenting data has been used since the beginning of the World Wide Web. Web-based forms have evolved together with the development of new markup languages, in which it is possible to provide validation scripts as part of the form code to test whether the intended meaning of the form is correct. However, for the form designer, part of this intended meaning frequently involves other features that are not constraints by themselves, but rather attributes emerging from the form, which provide plausible conclusions in the context of incomplete and potentially inconsistent information. As the value of such attributes may change in the presence of new knowledge, we call them defeasible attributes. In this paper we propose extending traditional web-based forms to incorporate defeasible attributes as part of the knowledge that can be encoded by the form designer. The proposed extension allows the specification of scripts for reasoning about form fields using a defeasible knowledge base, expressed in ...
HearSay: Enabling Audio Browsing on Hypertext Content
- In Intl. World Wide Web Conf. (WWW), 2004
In this paper we present HearSay, a system for browsing hypertext Web documents via audio. The HearSay system is based on our novel approach to automatically creating audio browsable content from hypertext Web documents. It combines two key technologies: (1) automatic partitioning of Web documents through tightly coupled structural and semantic analysis, which transforms raw HTML documents into semantic structures so as to facilitate audio browsing; and (2) VoiceXML, an already standardized technology which we adopt to represent voice dialogs automatically created from the XML output of partitioning. This paper describes the software components of HearSay and presents an initial system evaluation.
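The second ingredient, representing voice dialogs in VoiceXML, can be sketched as follows. The input is a hypothetical partition tree (nested dicts with a `title` and either `text` or `children`), not HearSay's actual XML output format. Each inner node becomes a VoiceXML 2.0 `<menu>` whose `<choice>` elements point at child dialogs, and each leaf becomes a `<form>` that reads its text and then returns to the parent menu with `<goto>`; the generated dialog ids are arbitrary.

```python
import xml.etree.ElementTree as ET
from itertools import count

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml(tree):
    """Turn a partition tree into a VoiceXML 2.0 document string.

    Inner node: {"title": str, "children": [...]} -> <menu> with one <choice> per child.
    Leaf node:  {"title": str, "text": str}       -> <form> that reads the text aloud.
    """
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    ids = count()

    def emit(node, parent_id):
        dialog_id = f"d{next(ids)}"
        if "children" in node:
            menu = ET.SubElement(root, "menu", {"id": dialog_id})
            prompt = ET.SubElement(menu, "prompt")
            prompt.text = f"{node['title']}. Say one of: "
            ET.SubElement(prompt, "enumerate")  # speaks the list of choice phrases
            for child in node["children"]:
                child_id = emit(child, dialog_id)
                choice = ET.SubElement(menu, "choice", {"next": f"#{child_id}"})
                choice.text = child["title"]
        else:
            form = ET.SubElement(root, "form", {"id": dialog_id})
            block = ET.SubElement(form, "block")
            prompt = ET.SubElement(block, "prompt")
            prompt.text = node["text"]
            if parent_id is not None:
                # Return to the enclosing menu once the content has been read.
                ET.SubElement(block, "goto", {"next": f"#{parent_id}"})
        return dialog_id

    emit(tree, None)  # the root dialog is created first, so it is the entry dialog
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = {"title": "Front page", "children": [
        {"title": "Headlines", "text": "Markets rally."},
        {"title": "Weather", "text": "Sunny, high of 75."},
    ]}
    print(build_vxml(page))
```

Mapping each partition to its own dialog lets a listener jump straight to one section by name instead of hearing the whole page read linearly.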

