Data Preparation for Mining World Wide Web Browsing Patterns
- KNOWLEDGE AND INFORMATION SYSTEMS
, 1999
Cited by 367 (39 self)
The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site has increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, there are several preprocessing tasks that must be performed prior to applying data mining algorithms to the data collected from server logs. This paper presents several data preparation techniques in order to identify unique users and user sessions. Also, a method to divide user sessions into semantically meaningful transactions is defined and successfully tested against two other methods. Transactions identified by the proposed methods are used to discover association rules from real world data using the WEBMINER system [15].
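As a rough illustration of the user and session identification step mentioned in this abstract, the sketch below groups server log entries into sessions. It is not the paper's method: it assumes a user can be approximated by the (IP address, user agent) pair and that a gap longer than a fixed timeout starts a new session. The function name `sessionize`, the tuple layout, and the 30-minute threshold are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Assumed threshold (30 minutes, in seconds); not taken from the abstract.
SESSION_TIMEOUT = 30 * 60


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Group log entries into per-user sessions.

    entries: iterable of (ip, user_agent, timestamp_seconds, url) tuples.
    A "user" is approximated by the (ip, user_agent) pair. A new session
    starts when the gap between consecutive requests exceeds `timeout`.
    Returns a list of ((ip, user_agent), [url, ...]) pairs.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in entries:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions
```

Real logs would also need handling for proxies, caching, and missing agent fields, which this sketch ignores.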
Efficiently Mining Frequent Trees in a Forest
, 2002
Cited by 138 (6 self)
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.
Using Information Scent to Model User Information Needs and Actions on the Web
, 2001
Cited by 108 (2 self)
On the Web, users typically forage for information by navigating from page to page along Web links. Their surfing patterns or actions are guided by their information needs. Researchers need tools to explore the complex interactions between user needs, user actions, and the structures and contents of the Web. In this paper, we describe two computational methods for understanding the relationship between user needs and user actions. First, for a particular pattern of surfing, we seek to infer the associated information need. Second, given an information need, and some pages as starting points, we attempt to predict the expected surfing patterns. The algorithms use a concept called “information scent”, which is the subjective sense of value and cost of accessing a page based on perceptual cues. We present an empirical evaluation of these two algorithms, and show their effectiveness.
Integrating Web Usage and Content Mining for More Effective Personalization
- IN E-COMMERCE AND WEB TECHNOLOGIES, LECTURE NOTES IN COMPUTER SCIENCE (LNCS) 1875
, 2000
Cited by 64 (9 self)
Recent proposals have suggested Web usage mining as an enabling mechanism to overcome the problems associated with more traditional Web personalization techniques such as collaborative or content-based filtering. These problems include lack of scalability, reliance on subjective user ratings or static profiles, and the inability to capture a richer set of semantic relationships among objects (in content-based systems). Yet, usage-based personalization can be problematic when little usage data is available pertaining to some objects or when the site content changes regularly. For more effective personalization, both usage and content attributes of a site must be integrated into a Web mining framework and used by the recommendation engine in a uniform manner. In this
Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data
, 2000
Cited by 57 (0 self)
Web Usage Mining is the application of data mining techniques to Web clickstream data in order to extract usage patterns. As Web sites continue to grow in size and complexity, the results of Web Usage Mining have become critical for a number of applications such as Web site design, business and marketing decision support, personalization, usability studies, and network traffic analysis. The two major challenges involved in Web Usage Mining are preprocessing the raw data to provide an accurate picture of how a site is being used, and filtering the results of the various data mining algorithms in order to present only the rules and patterns that are potentially interesting. This thesis develops and tests an architecture and algorithms for performing Web Usage Mining. An evidence combination framework referred to as the information filter is developed to compare and combine usage, content, and structure information about a Web site. The information filter automatically identifies the discovered ...
Mining Partially Periodic Event Patterns With Unknown Periods
- Proc. ICDE
, 2000
Cited by 43 (1 self)
Periodic behavior is common in real-world applications. However, in many cases, periodicities are partial in that they are present only intermittently. Herein, we study such intermittent patterns, which we refer to as p-patterns. Our formulation of p-patterns takes into account imprecise time information (e.g., due to unsynchronized clocks in distributed environments), noisy data (e.g., due to extraneous events), and shifts in phase and/or periods. We structure mining for p-patterns as two sub-tasks: (1) finding the periods of p-patterns and (2) mining temporal associations. For (2), a level-wise algorithm is used. For (1), we develop a novel approach based on a chi-squared test, and study its performance in the presence of noise.
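The period-finding sub-task can be illustrated with a simplified sketch. This is not the algorithm from the paper: it assumes a null hypothesis of random (Poisson) arrivals, counts how many inter-arrival gaps fall within a tolerance of each candidate period, and flags periods whose count is significantly higher than expected using a chi-squared statistic with one degree of freedom. The function name `candidate_periods`, the `tolerance` parameter, and the choice of null model are assumptions made for illustration.

```python
import math

# 95th percentile of the chi-squared distribution with 1 degree of freedom.
CHI2_95 = 3.84


def candidate_periods(timestamps, tolerance=1.0, threshold=CHI2_95):
    """Return (period, chi2) pairs for periods seen more often than chance.

    Assumed null model: events arrive as a Poisson process, so gaps between
    consecutive events are exponentially distributed with the observed rate.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    n = len(gaps)
    if n == 0 or ts[-1] == ts[0]:
        return []
    rate = n / (ts[-1] - ts[0])  # mean arrival rate under the null model

    results = []
    # Candidate periods are the distinct observed gaps, rounded to integers.
    for p in sorted({round(g) for g in gaps}):
        lo, hi = max(p - tolerance, 0.0), p + tolerance
        observed = sum(1 for g in gaps if lo <= g <= hi)
        # Probability that an exponential gap lands in [lo, hi].
        prob = math.exp(-rate * lo) - math.exp(-rate * hi)
        expected = n * prob
        variance = n * prob * (1 - prob)
        if variance == 0:
            continue
        chi2 = (observed - expected) ** 2 / variance
        if observed > expected and chi2 > threshold:
            results.append((p, chi2))
    return results
```

The tolerance window is one simple way to allow for imprecise timestamps; the sketch considers only gaps between adjacent events and does not address noise events or phase shifts.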
Discovery of Interesting Usage Patterns from Web Data
- Advances in Web Usage Analysis and User Profiling. LNAI 1836
, 1999
Cited by 43 (0 self)
Web Usage Mining is the application of data mining techniques to large Web data repositories in order to extract usage patterns. As with many data mining application domains, the identification of patterns that are considered interesting is a problem that must be solved in addition to simply generating them. A necessary step in identifying interesting results is quantifying what is considered uninteresting in order to form a basis for comparison. Several research efforts have relied on manually generated sets of uninteresting rules. However, manual generation of a comprehensive set of evidence about beliefs for a particular domain is impractical in many cases. Generally, domain knowledge can be used to automatically create evidence for or against a set of beliefs. This paper develops a quantitative model based on support logic for determining the interestingness of discovered patterns. For Web Usage Mining, there are three types of domain information available; usage, co...
WebSIFT: The Web Site Information Filter System
- In Proceedings of the Web Usage Analysis and User Profiling Workshop
, 1999
Cited by 36 (1 self)
Web Usage Mining is the application of data mining techniques to large Web data repositories in order to extract usage patterns. As with many data mining application domains, the identification of patterns that are considered interesting is a problem that must be solved in addition to simply generating them. A necessary step in identifying interesting results is quantifying what is considered uninteresting in order to form a basis for comparison. Several research efforts have relied on manually generated sets of uninteresting rules. However, manual generation of a comprehensive set of evidence about beliefs for a particular domain is impractical in many cases. Generally, domain knowledge can be used to automatically create evidence for or against a set of beliefs. For Web Usage Mining, there are three types of domain information available; usage, content, and structure. The Web Site Information Filter (WebSIFT) system uses the content and structure information from a Web site...
A Data Miner analyzing the Navigational Behaviour of Web Users
- In Proc. of the Workshop on Machine Learning in User Modelling of the ACAI99
, 1999
Cited by 33 (2 self)
Web site design is currently based on thorough investigations about the interests of web site visitors and on less investigated assumptions about their exact behaviour. Concrete knowledge on the way visitors navigate in a web site could prevent disorientation and help owners in placing important information exactly where the visitors look for it. Our Web Utilization Miner tool can provide such knowledge. The general problem we address is: Given a number of traversed paths, discover subpaths with structural or statistical properties of interest. In fact, we anticipate that not all nodes in a subpath are of equal importance. Hence, we allow that subpaths having only some nodes in common be combined into a pattern that shows the desired properties as a whole. To capture the ambiguous expressions of this problem, we provide a powerful mining language, by which the expert can specify the desired structural and statistical properties of the patterns to be constructed. To efficient...
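The general problem stated in this abstract (given traversed paths, discover subpaths with statistical properties of interest) can be illustrated in a much reduced form. The sketch below is not the Web Utilization Miner or its mining language: it only counts contiguous subpaths of a fixed length and keeps those that occur in at least `min_support` paths. The function name `frequent_subpaths` and its parameters are hypothetical.

```python
from collections import Counter


def frequent_subpaths(paths, length, min_support):
    """Count contiguous subpaths of a given length across traversed paths.

    paths: list of page sequences, e.g. [["a", "b", "c"], ...].
    A subpath is counted at most once per path, so the count is the
    number of paths that contain it. Returns {subpath_tuple: count}
    for subpaths whose count is at least `min_support`.
    """
    counts = Counter()
    for path in paths:
        # Use a set so repeats within one path do not inflate the support.
        seen = {
            tuple(path[i:i + length])
            for i in range(len(path) - length + 1)
        }
        counts.update(seen)
    return {sp: c for sp, c in counts.items() if c >= min_support}
```

Combining subpaths that share only some nodes, as the abstract describes, would require a more expressive pattern representation than this fixed-length count.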

