Results 1 - 10 of 113
Web Mining in Soft Computing Framework: Relevance, State of the Art and Future Directions
- IEEE Transactions on Neural Networks, 2002
Cited by 43 (2 self)
This paper summarizes the different characteristics of web data, the basic components of web mining and its different types, and their current states of the art. The reason for considering web mining a separate field from data mining is explained. The limitations of some of the existing web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft web mining" is provided along with the commercially available systems. The prospective areas of web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing "soft web mining" systems is explained. An extensive bibliography is also provided.
Efficient phrase-based document indexing for Web document clustering
- IEEE Transactions on Knowledge and Data Engineering, 2004
Cited by 31 (1 self)
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To achieve more accurate document clustering, more informative features including phrases and their weights are particularly important in such scenarios. Document clustering is particularly useful in many applications such as automatic categorization of documents, grouping search engine results, building a taxonomy of documents, and others. This paper presents two key parts of successful document clustering. The first part is a novel phrase-based document index model, the Document Index Graph, which allows for incremental construction of a phrase-based index of the document set with an emphasis on efficiency, rather than relying on single-term indexes only. It provides efficient phrase matching that is used to judge the similarity between documents. The model is flexible in that it could revert to a compact representation of the vector space model if we choose not to index phrases. The second part is an incremental document clustering algorithm based on maximizing the tightness of clusters by carefully watching the pair-wise document similarity distribution inside clusters. The combination of these two components creates an underlying model for robust and accurate document similarity calculation that leads to much improved results in Web document clustering over traditional methods.
Toward a basic framework for webometrics
- Journal of the American Society for Information Science and Technology, 2004
Cited by 20 (1 self)
In this article, we define webometrics within the framework of informetric studies and bibliometrics, as belonging to library and information science, and as associated with cybermetrics as a generic subfield. We develop a consistent and detailed link typology and terminology and make explicit the distinction among different Web node levels when using the proposed conceptual framework. As a consequence, we propose a novel diagram notation to fully appreciate and investigate link structures between Web nodes in webometric analyses. We warn against taking the analogy between citation analyses and link analyses too far.
Design and evaluation of a multi-agent collaborative Web Mining System
- 2003
Cited by 18 (8 self)
Most existing Web search tools work only with individual users and do not help a user benefit from previous search experiences of others. In this paper, we present the Collaborative Spider, a multi-agent system designed to provide post-retrieval analysis and enable across-user collaboration in Web search and mining. This system allows the user to annotate search sessions and share them with other users. We also report a user study designed to evaluate the effectiveness of this system. Our experimental findings show that subjects' search performance was degraded, compared to individual search scenarios in which users had no access to previous searches, when they had access to a limited number (e.g., 1 or 2) of earlier search sessions done by other users. However, search performance improved significantly when subjects had access to more search sessions. This indicates that gain from collaboration through collaborative Web searching and analysis does not outweigh the overhead of browsing and comprehending other users' past searches until a certain number of shared sessions have been reached. In this paper, we also catalog and analyze several different types of user collaboration behavior observed in the context of Web mining. © 2002 Elsevier Science B.V. All rights reserved.
Web Page Classification: Features and Algorithms
- 2007
Cited by 16 (0 self)
Classification of web page content is essential to many tasks in web information retrieval such as maintaining web directories and focused crawling. The uncontrolled nature of web content presents additional challenges to web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. As we review work in web page classification, we note the importance of these web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
Web Usage Mining - Languages and Algorithms
- In Studies in Classification, Data Analysis, and Knowledge Organization, 2001
Cited by 15 (2 self)
We propose two new XML applications, XGMML and LOGML. XGMML is a graph description language and LOGML is a web-log report description language. We generate a web graph in XGMML format for a web site using the web robot of the WWWPal system (developed for web visualization and organization). We generate web-log reports in LOGML format for a web site from web log files and the web graph. In this paper, we further illustrate the usefulness of these two XML applications with a web data mining example. Moreover, we show the simplicity with which this mining algorithm can be specified and implemented efficiently using our two XML applications. We provide sample results, namely frequent patterns of users in a web site, with our web data mining algorithm.
Web Log Data Warehousing and Mining for Intelligent Web Caching
- 2001
Cited by 15 (1 self)
We introduce intelligent web caching algorithms that employ predictive models of web requests; the general idea is to extend the LRU policy of web and proxy servers by making it sensitive to web access models extracted from web log data using data mining techniques. Two approaches have been studied in particular: frequent patterns and decision trees. The experimental results of the new algorithms show substantial improvement over existing LRU-based caching techniques in terms of hit rate. We designed and developed a prototype system, which supports data warehousing of web log data, extraction of data mining models and simulation of the web caching algorithms.
Analysis of the query logs of a web site search engine
- Journal of the American Society for Information Science and Technology, 2005
Cited by 14 (1 self)
A large number of studies have investigated the transaction log of general-purpose search engines such as Excite and AltaVista, but few studies have reported on the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research on analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, of Web users are the same for general-purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. Possible reasons for the differences include the focused domain of Web site search engines and users’ different information needs. The findings are useful for Web site developers to improve the performance of their services provided on the Web and for researchers to conduct further research in this area. The analysis also can be applied in e-government research by investigating how information should be delivered to users in government Web sites.
An Online Recommender System for Large Web Sites
- In Proc. of ACM/IEEE Web Intelligence Conference (WI’04), 2004
Cited by 11 (1 self)
In this paper we propose a WUM recommender system, called SUGGEST 3.0, that dynamically generates links to pages that a user has not yet visited and that might be of interest to that user. Unlike the recommender systems proposed so far, SUGGEST 3.0 does not make use of any off-line component and is able to manage Web sites made up of dynamically generated pages. To this end, SUGGEST 3.0 incrementally builds and maintains historical information by means of an incremental graph partitioning algorithm, requiring no off-line component. The main innovation proposed here is a novel strategy that can be used to manage large Web sites. Experiments conducted to evaluate the performance of SUGGEST 3.0 demonstrated that our system is able to anticipate users' requests that will be made farther in the future, while introducing a limited overhead on the Web server activity.
Web Dynamics
- Software Focus, 2001
Cited by 10 (1 self)
The global usage and continuing exponential growth of the World-Wide-Web pose a host of challenges to the research community. In particular, there is an urgent need to understand and manage the dynamics of the Web, in order to develop new techniques which will make the Web tractable. We provide an overview of recent statistics relating to the size of the Web graph and its growth. We then briefly review some of the key areas relating to Web dynamics with reference to the recent literature. Finally, we summarise the talks given in a recent workshop devoted to Web dynamics which was held in the beginning of January 2001 at the University of London. Keywords: Web dynamics, Web graph, information retrieval, collaborative filtering, Web navigation, Web site design, data-intensive Web applications, workflow management, e-commerce, mobile computation.

