Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices
- In Intl. World Wide Web Conf. (WWW), 2003
- Cited by 73 (5 self)
Mobile devices are already widely used to access the Web. However, because most available web pages are designed with desktop PCs in mind, it is inconvenient to browse these large web pages on a mobile device with a small screen. In this paper, we propose a new browsing convention to facilitate navigation and reading on a small-form-factor device. A web page is organized into a two-level hierarchy, with a thumbnail representation at the top level providing a global view and an index to a set of subpages at the bottom level for detailed information. A page adaptation technique is also developed to analyze the structure of an existing web page and split it into small, logically related units that fit the screen of a mobile device. For a web page not suitable for splitting, auto-positioning or scrolling-by-block is used to assist browsing as an alternative. Our experimental results show that the proposed browsing convention and page adaptation scheme greatly improve the user’s browsing experience on a device with a small display.
Improving pseudo-relevance feedback in web information retrieval using web page segmentation
- In Intl. World Wide Web Conf. (WWW), 2003
- Cited by 56 (10 self)
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. In this paper, we propose a VIsion-based Page Segmentation (VIPS) algorithm to detect the semantic content structure in a web page. Compared with simple DOM-based segmentation methods, our page segmentation scheme utilizes useful visual cues to obtain a better partition of a page at the semantic level. By using our VIPS algorithm to assist the selection of query expansion terms in pseudo-relevance feedback in web information retrieval, we achieve a 27% performance improvement on the Web Track dataset.
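To make the idea of block-level pseudo-relevance feedback concrete, here is a minimal sketch. It is not the VIPS algorithm: it assumes the top-ranked pages have already been segmented into text blocks by some other means. The function name `expansion_terms`, its parameters, and the overlap-based block scoring are all hypothetical choices made for illustration.

```python
from collections import Counter


def expansion_terms(query: str, blocks: list[str],
                    top_blocks: int = 2, num_terms: int = 3) -> list[str]:
    """Pick query expansion terms from the blocks that best match the query.

    Blocks are scored by how many query-term occurrences they contain.
    This scoring and the frequency-based term choice are simplifications
    assumed for this sketch, not the method of the paper.
    """
    query_terms = set(query.lower().split())

    def score(block: str) -> int:
        return sum(1 for token in block.lower().split() if token in query_terms)

    # sorted() is stable, so blocks with equal scores keep their input order.
    ranked = sorted(blocks, key=score, reverse=True)[:top_blocks]

    # Count non-query tokens in the selected blocks only, so that terms from
    # navigation or advertisement blocks do not leak into the expansion.
    counts = Counter(
        token
        for block in ranked
        for token in block.lower().split()
        if token not in query_terms
    )
    return [term for term, _ in counts.most_common(num_terms)]


# Hypothetical blocks from one segmented page.
page_blocks = [
    "home news sports contact login",
    "jaguar top speed cheetah speed comparison cheetah",
    "jaguar habitat rainforest cheetah",
    "buy cheap tickets now",
]
terms = expansion_terms("jaguar speed", page_blocks)
```

Because only the two best-matching blocks contribute candidate terms, words from the navigation bar and the advertisement never enter the expanded query, which is the intuition behind segmenting before feedback.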
Learning block importance models for web pages
- In Intl. World Wide Web Conf. (WWW), 2004
- Cited by 45 (6 self)
Previous work shows that a web page can be partitioned into multiple segments or blocks, and that the importance of those blocks in a page is usually not equivalent. It has also been shown that differentiating noisy or unimportant blocks from pages can facilitate web mining, search and accessibility. But in these works, no uniform approach or model is presented to measure the importance of different portions of web pages. Through a user study, we found that people do have a consistent view about the importance of blocks in web pages. In this paper, we investigate how to find a model to automatically assign importance values to blocks in a web page. We define block importance estimation as a learning problem. First, we use the VIPS (VIsion-based Page Segmentation) algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms, such as SVM and neural networks, are applied to train various block importance models. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person’s.
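The feature-extraction step described above can be sketched as follows. This is a minimal illustration, not the paper's feature set: the `Block` fields, the normalization by page size, and the function name `block_features` are assumptions, and the learning step (SVM or neural network) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A segmented page block; every field here is an illustrative assumption."""
    x: float        # left edge, in pixels
    y: float        # top edge, in pixels
    width: float
    height: float
    num_images: int
    num_links: int


def block_features(block: Block, page_width: float, page_height: float) -> list[float]:
    """Build one feature vector from spatial and content features.

    Spatial features are divided by the page size so that blocks from
    pages of different dimensions are comparable (an assumption of this
    sketch).
    """
    center_x = (block.x + block.width / 2) / page_width
    center_y = (block.y + block.height / 2) / page_height
    relative_area = (block.width * block.height) / (page_width * page_height)
    return [center_x, center_y, relative_area,
            float(block.num_images), float(block.num_links)]


# Two hypothetical blocks on a 1000 x 1000 pixel page.
nav = Block(x=0, y=0, width=200, height=1000, num_images=0, num_links=30)
main = Block(x=200, y=100, width=600, height=800, num_images=2, num_links=5)
features = [block_features(b, page_width=1000, page_height=1000) for b in (nav, main)]
```

Each resulting vector could then be paired with a human-assigned importance label and passed to whatever classifier is being trained.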
Extracting content structure for web pages based on visual representation
- Proc. 5th Asia Pacific Web Conference, 2003
- Cited by 37 (6 self)
A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and automatic page adaptation can benefit from this structure. This paper presents an automatic top-down, tag-tree independent approach to detect web content structure. It simulates how a user understands web layout structure based on visual perception. Compared to other existing techniques, our approach is independent of the underlying document representation, such as HTML, and works well even when the HTML structure is far different from the layout structure. Experiments show satisfactory results.
Web-page Classification through Summarization
- 2004
- Cited by 18 (2 self)
Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement compared to a pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about a 12.9% improvement over pure-text-based methods.
Visual Based Content Understanding towards Web Adaptation
- In Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems (AH2002), 2002
- Cited by 17 (3 self)
Web content structure is proposed to facilitate automatic web page adaptation in this paper. By identifying the logical relationships of web content based on layout information, web content structure effectively represents authors' presentation intention. An automatic top-down, tag-tree independent approach to detect web content structure is presented. It simulates how a user understands web layout structure based on vision. Compared to other content analysis techniques, our approach is independent of the physical realization and works well even when the physical structure is far different from the layout structure.
Staging Transformations for Multimodal Web Interaction Management
- 2004
- Cited by 13 (11 self)
Multimodal interfaces are becoming increasingly ubiquitous with the advent of mobile devices, accessibility considerations, and novel software technologies that combine diverse interaction media. In addition to improving access and delivery capabilities, such interfaces enable flexible and personalized dialogs with websites, much like a conversation between humans. In this paper, we present a software framework for multimodal web interaction management that supports mixed-initiative dialogs between users and websites. A mixed-initiative dialog is one where the user and the website take turns changing the flow of interaction. The framework supports the functional specification and realization of such dialogs using staging transformations -- a theory for representing and reasoning about dialogs based on partial input. It supports multiple interaction interfaces, and offers sessioning, caching, and co-ordination functions through the use of an interaction manager. Two case studies are presented to illustrate the promise of this approach.
Efficient browsing of web search results on mobile devices based on block importance model
- Proc. Pervasive Computing and Communications, IEEE, 2005
- Cited by 8 (1 self)
It is expected that more and more people will search the web when they are on the move. Though conventional search engines can be directly visited from mobile devices with web browsing capabilities, the information is not as conveniently accessible from a handheld device as it is from desktops. Existing information discovery mechanisms for searching the web are not well-suited to mobile devices. In this paper, a block importance model is employed to assign importance values to different segments of a web page, in order to extract and present more condensed search results to mobile users. Based on the block importance model, three presentations for displaying the result pages in different levels of detail have been designed to reduce both the number of user interactions and the overall search time. A set of user study experiments has been carried out to compare the three presentations and a commercial service on typical mobile devices. Experimental results show that our approaches can help users to explore web search results more efficiently.
Building a Web Thesaurus from Web Link Structure
- 2003
- Cited by 6 (2 self)
Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach is able to identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality and representative websites of a specific domain is selected. After filtering out navigational links, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. The experimental results on automatic query expansion based on our constructed thesaurus show a 20% improvement in search precision compared to the baseline.
Learning Important Models for Web Page Blocks based on Layout and Content Analysis
- ACM SIGKDD Explorations Newsletter, 2004
- Cited by 5 (0 self)
Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. It has also been shown that differentiating noisy and unimportant blocks from pages can facilitate web mining, search and accessibility. However, no uniform approach or model has been presented to measure the importance of different blocks in a web page. Through a user study, we found that people do have a consistent view about the importance of blocks in a web page. Thus, we investigate how to find a model to automatically assign importance values to blocks in a web page. We formulate block importance estimation as a learning problem. First, we use a vision-based page segmentation technique to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Then, learning algorithms are used to train a model to assign importance to each block in the web page. In our experiments, the best model can achieve the performance with Micro-F1 80.2% and Micro-

