Learning block importance models for web pages
- In Intl. World Wide Web Conf. (WWW), 2004
Cited by 45 (6 self)
Previous work shows that a web page can be partitioned into multiple segments or blocks, and that the blocks in a page are usually not equally important. It has also been shown that differentiating noisy or unimportant blocks from pages can facilitate web mining, search and accessibility. But in these works, no uniform approach or model is presented to measure the importance of different portions of web pages. Through a user study, we found that people do have a consistent view about the importance of blocks in web pages. In this paper, we investigate how to find a model to automatically assign importance values to blocks in a web page. We define block importance estimation as a learning problem. First, we use the VIPS (VIsion-based Page Segmentation) algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms, such as SVMs and neural networks, are applied to train various block importance models. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person’s performance.
Hierarchical clustering of WWW image search results using visual, textual and link analysis
- ACM Multimedia, 2004
Cited by 39 (2 self)
We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different semantic clusters facilitates users’ browsing. In this paper, we propose a hierarchical clustering method using visual, textual and link analysis. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By using block-level link analysis techniques, an image graph can be constructed. We then apply spectral techniques to find a Euclidean embedding of the images which respects the graph structure. Thus for each image, we have three kinds of representations, i.e., a visual feature based representation, a textual feature based representation and a graph based representation. Using spectral clustering techniques, we can cluster the search results into different semantic clusters. An image search example illustrates the potential of these techniques.
Collapse-to-Zoom: Viewing Web Pages on Small Screen Devices by Interactively Removing Irrelevant Content
- 2004
Cited by 31 (1 self)
Overview visualizations for small-screen web browsers were designed to provide users with visual context and to allow them to rapidly zoom in on tiles of relevant content. Given that content in the overview is reduced, however, users are often unable to tell which tiles hold the relevant material, which can force them to adopt a time-consuming hunt-and-peck strategy. Collapse-to-zoom addresses this issue by offering an alternative exploration strategy. In addition to allowing users to zoom into relevant areas, collapse-to-zoom allows users to collapse areas deemed irrelevant, such as columns containing menus, archive material, or advertising. Collapsing content causes all remaining content to expand in size and reveal more detail, which increases the user's chance of identifying relevant content. Collapse-to-zoom navigation is based on a hybrid between a marquee selection tool and a marking menu, called a marquee menu. It offers four commands for collapsing content areas at different granularities and for switching to a full-size reading view of what is left of the page.
Block-level Link Analysis
- In SIGIR, 2004
Cited by 30 (4 self)
Link analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics and hence might not be appropriately treated as an atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block and block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node represents exactly one semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, whose performance we study extensively using web data.
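The idea of combining page-to-block and block-to-page relationships can be illustrated with a small sketch. It is not the authors' algorithm: it assumes, for illustration only, that the page graph weight from page p to page q is the sum over blocks of (importance of the block within p) times (link weight from the block to q), and then runs an ordinary PageRank power iteration on that graph. The function name `block_level_pagerank` and the parameters `damping` and `iterations` are hypothetical.

```python
def block_level_pagerank(page_to_block, block_to_page, damping=0.85, iterations=50):
    """Rank pages on a graph derived from block-level relationships.

    page_to_block[p][b]: assumed importance of block b inside page p.
    block_to_page[b][q]: assumed link weight from block b to page q.
    Both matrices and the composition rule are illustrative assumptions.
    """
    n_pages = len(page_to_block)
    n_blocks = len(block_to_page)

    # Compose the two relationships into page-to-page weights.
    weights = [
        [sum(page_to_block[p][b] * block_to_page[b][q] for b in range(n_blocks))
         for q in range(n_pages)]
        for p in range(n_pages)
    ]

    # Row-normalize; a page with no outgoing weight jumps uniformly.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n_pages] * n_pages)

    # Standard PageRank power iteration with uniform teleportation.
    rank = [1.0 / n_pages] * n_pages
    for _ in range(iterations):
        rank = [
            (1 - damping) / n_pages
            + damping * sum(rank[p] * transition[p][q] for p in range(n_pages))
            for q in range(n_pages)
        ]
    return rank
```

In this sketch, a link placed in an important block passes on more rank than a link placed in a minor block of the same page, which is the intuition the abstract describes.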
Block-based web search
- In ACM SIGIR Conference, 2004
Cited by 19 (7 self)
The multiple topics and varying lengths of web pages are two negative factors that significantly affect the performance of web search. In this paper, we explore the use of page segmentation algorithms to partition web pages into blocks and investigate how to take advantage of block-level evidence to improve retrieval performance in the web context. Because of the special characteristics of web pages, different page segmentation methods will have different impacts on web search performance. We compare four types of methods: fixed-length page segmentation, DOM-based page segmentation, vision-based page segmentation, and a combined method which integrates both semantic and fixed-length properties. Experiments on block-level query expansion and retrieval are performed. Among the four approaches, the combined method achieves the best performance for web search. Our experimental results also show that such a semantic partitioning of web pages effectively deals with the problem of multiple drifting topics and mixed lengths, and thus has great potential to boost the performance of current web search engines.
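As a rough illustration of the simplest of the four methods compared above, fixed-length page segmentation can be sketched as windows over whitespace tokens. The function name `fixed_length_blocks` and the `window` and `overlap` parameters are assumptions for illustration; the abstract does not state the window size or whether windows overlap.

```python
def fixed_length_blocks(text, window=200, overlap=0):
    """Split text into blocks of at most `window` whitespace tokens.

    `window` and `overlap` are illustrative parameters, not values
    taken from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    tokens = text.split()
    step = window - overlap
    blocks = []
    for start in range(0, len(tokens), step):
        blocks.append(" ".join(tokens[start:start + window]))
        if start + window >= len(tokens):
            break  # this window already reached the end of the text
    return blocks
```

Unlike DOM-based or vision-based segmentation, such windows ignore page structure entirely, so a block may straddle two unrelated topics.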
Multi-model similarity propagation and its application for web image retrieval
- In Proc. ACM Multimedia, 2004
Cited by 15 (1 self)
In this paper, we propose an iterative similarity propagation approach to explore the inter-relationships between Web images and their textual annotations for image retrieval. By considering Web images as one type of object, their surrounding texts as another type, and constructing the link structure between them via webpage analysis, we can iteratively reinforce the similarities between images. The basic idea is that if two objects of the same type are both related to one object of another type, these two objects are similar; likewise, if two objects of the same type are related to two different but similar objects of another type, then to some extent, these two objects are also similar. The goal of our method is to fully exploit the mutual reinforcement between images and their textual annotations. Our experiments based on 10,628 images crawled from the Web show that our proposed approach can significantly improve Web image retrieval performance.
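The mutual reinforcement described above can be sketched with a SimRank-style update on a bipartite image-text graph. This is a stand-in technique chosen for illustration, not the paper's update rule, which the abstract does not give. The function name `propagate_similarity` and the `decay` and `iterations` parameters are hypothetical.

```python
def propagate_similarity(links, n_images, n_texts, decay=0.8, iterations=5):
    """Iteratively reinforce image-image and text-text similarities.

    links: iterable of (image_index, text_index) pairs.
    Uses a SimRank-style averaging rule as an illustrative assumption.
    """
    links = list(links)
    img_nbrs = [[t for (i, t) in links if i == img] for img in range(n_images)]
    txt_nbrs = [[i for (i, t) in links if t == txt] for txt in range(n_texts)]

    def identity(n):
        return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

    def update(nbrs, other_sim):
        # Two objects are similar to the extent that the objects of the
        # other type they are linked to are similar on average.
        n = len(nbrs)
        new = identity(n)
        for a in range(n):
            for b in range(n):
                if a == b or not nbrs[a] or not nbrs[b]:
                    continue
                total = sum(other_sim[x][y] for x in nbrs[a] for y in nbrs[b])
                new[a][b] = decay * total / (len(nbrs[a]) * len(nbrs[b]))
        return new

    img_sim, txt_sim = identity(n_images), identity(n_texts)
    for _ in range(iterations):
        # Both sides are updated from the previous iteration's values.
        img_sim, txt_sim = update(img_nbrs, txt_sim), update(txt_nbrs, img_sim)
    return img_sim, txt_sim
```

Two images that share a text become similar after one iteration; two images linked only to different but similar texts become similar in later iterations, which matches the second half of the basic idea stated above.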
A probabilistic semantic model for image annotation and multi-modal image retrieval
- In Proc. Int’l Conf. Computer Vision, 2005
Cited by 11 (1 self)
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer which constitutes the semantic concepts to be discovered, in order to explicitly exploit the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework such that the confidence of the association can be provided. (3) Extensive evaluation on a large-scale, visually and semantically diverse image collection crawled from the Web is reported to evaluate the prototype system based on the model. In the proposed probabilistic model, a hidden concept layer which connects the visual feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7,736 automatically extracted annotation words from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
Data-Driven Approach for Bridging the Cognitive Gap in Image Retrieval
- Proc. IEEE ICME, 2004
Cited by 9 (2 self)
Bridging the cognitive gap in image retrieval has been an active research direction in recent years. Existing solutions typically require a large volume of training data that could be difficult to obtain in practice. In this paper, we propose a data-driven approach that uses Web images and their surrounding textual annotations as the source of training data to bridge the cognitive gap. We construct an image thesaurus that contains a set of codewords, each representing a semantically related subspace in the feature space. We also explore the use of query expansion based on the constructed image thesaurus for improving image retrieval performance.
Efficient browsing of web search results on mobile devices based on block importance model
- Proc. Pervasive Computing and Communications, IEEE, 2005
Cited by 8 (1 self)
It is expected that more and more people will search the web when they are on the move. Though conventional search engines can be directly visited from mobile devices with web browsing capabilities, the information is not as conveniently accessible from a handheld device as it is from desktops. Existing information discovery mechanisms for searching the web are not well-suited to mobile devices. In this paper, a block importance model is employed to assign importance values to different segments of a web page, in order to extract and present more condensed search results to mobile users. Based on the block importance model, three presentations for displaying the result pages at different levels of detail have been designed to reduce both the number of user interactions and the overall search time. A set of user study experiments has been carried out to compare the three presentations and a commercial service on typical mobile devices. Experimental results show that our approaches can help users to explore web search results more efficiently.
ImageSeer: Clustering and Searching WWW Images Using Link and Page Layout Analysis
- Microsoft, 2004
Cited by 5 (1 self)
Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving the images available. This paper describes iFind, a system for clustering and searching WWW images. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image representation. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods like PicASHOW, and hence the image graph can better reflect the semantic relationship between images. Using the notion of a Markov chain, we can infer the semantic degrees of the images, i.e., ImageRanks, which characterize the importance of the images. The ImageRanks are combined with the relevance scores to produce the final ranking for image search. With the graph models, we can also use techniques from spectral graph theory for image clustering and embedding, or 2-D visualization. Some experimental results on 11.6 million images downloaded from the Web are provided in the paper.

