Automatic New Topic Identification in Search Engine Transaction Logs using Multiple Linear Regression
Cited by 1 (0 self)
Content analysis of user queries is an important task in search engine research, and identifying topic changes within a user's search session is a key part of that analysis. The purpose of this study is to identify new topics in search engine query logs automatically, and to estimate the effect of the statistical characteristics of queries on new topic identification. Applying multiple linear regression and ANOVA to a sample data log from the FAST search engine yielded two findings: 1) the statistical characteristics of Web search queries have an effect on shifts to a new topic; 2) multiple linear regression is a successful tool for estimating topic shifts and continuations. This study provides statistical evidence for the relationship between the non-semantic characteristics of Web search queries and the occurrence of topic shifts and continuations.
USING MONTE-CARLO SIMULATION FOR AUTOMATIC NEW TOPIC IDENTIFICATION OF SEARCH ENGINE TRANSACTION LOGS
Content-based behavior is one of the most important dimensions of search engine users' information-seeking behavior and of search engine research, yet limited research has focused on it. The purpose of this study is to present an application of simulation in information science: automatic new topic identification in search engine transaction logs using Monte Carlo simulation. Sample data logs from FAST and Excite are used in the study. The findings show that Monte Carlo simulation yields satisfactory results in identifying topic continuations; however, its performance in identifying topic shifts needs improvement.

