A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering
, 2005
Cited by 54 (3 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.
Predicting Searcher Frustration
Cited by 10 (1 self)
When search engine users have trouble finding information, they may become frustrated, possibly resulting in a bad experience (even if they are ultimately successful). In a user study in which participants were given difficult information-seeking tasks, half of all queries submitted resulted in some degree of self-reported frustration. A third of all successful tasks involved at least one instance of frustration. By modeling searcher frustration, search engines can predict the current state of user frustration and decide when to intervene with alternative search strategies to prevent the user from becoming more frustrated, giving up, or switching to another search engine. We present several models to predict frustration using features extracted from query logs and physical sensors. We are able to predict frustration with a mean average precision of 66% from the physical sensors, and 87% from the query log features.
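The results above are reported as mean average precision (MAP). As a reminder of how that metric is computed, here is a minimal sketch of the standard definition. The abstract does not say how the ranked lists were formed, so this assumes each list ranks items (for example, queries) by predicted frustration, marks each as truly frustrating or not, and contains every relevant item; the function names are illustrative, not taken from the paper.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[bool]) -> float:
    """Average precision of one ranked list.

    ranked_labels[i] is True when the item at rank i + 1 is relevant
    (here, hypothetically: a query the user reported as frustrating).
    Assumes every relevant item appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For instance, a list whose relevant items sit at ranks 1 and 3 scores (1/1 + 2/3) / 2, about 0.83, so a MAP of 0.66 means relevant items tend to appear near, but not always at, the top of the ranking.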
An empirical comparison of techniques for extracting concept abbreviations from identifiers
- In Proceedings of IASTED International Conference on Software Engineering and Applications (SEA 2006)
, 2006
Cited by 8 (3 self)
When a programmer is faced with the task of modifying code written by others, he or she must first gain an understanding of the concepts and entities used by the program. Comments and identifiers are the two main sources of such knowledge. In the case of identifiers, the meaning can be hidden in abbreviations that make comprehension more difficult. A tool that can automatically replace abbreviations with their full word meanings would help programmers (especially less experienced ones) understand and work with the code. Such a tool first needs to isolate abbreviations within the identifiers. When identifiers contain division markers such as underscores or camel-casing, this isolation task is trivial. However, many identifiers lack these division markers. Therefore, the first task of automatic expansion is separation of identifiers into their constituent parts. Presented here is a comparison of three techniques that accomplish this task: a random algorithm (used as a straw man), a greedy algorithm, and a neural-network-based algorithm. The greedy algorithm's performance ranges from 75 to 81 percent correct, while the neural network's performance ranges from 71 to 95 percent correct.
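The abstract does not describe how its greedy algorithm works. As an illustration of what "greedy" splitting of a marker-free identifier can look like, here is a minimal sketch of one common variant, longest-match-first against a vocabulary of words and abbreviations. The function name and the example vocabularies are hypothetical, and this should not be read as the paper's exact algorithm.

```python
def greedy_split(identifier: str, vocabulary: set[str]) -> list[str]:
    """Split an identifier that has no underscores or camel-casing.

    Longest-match-first greedy strategy (an assumed variant): at each
    position, take the longest vocabulary entry that matches; if none
    matches, emit a single character and move on.
    """
    text = identifier.lower()
    max_len = max((len(w) for w in vocabulary), default=1)
    parts: list[str] = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in vocabulary:
                break
        else:
            length = 1  # no vocabulary entry matched: fall back to one character
        parts.append(text[i:i + length])
        i += length
    return parts


if __name__ == "__main__":
    print(greedy_split("getfilename", {"get", "file", "name"}))
    print(greedy_split("themessage", {"the", "them", "message"}))
```

The first call yields `['get', 'file', 'name']`. The second shows the usual weakness of a greedy strategy: it commits to the longest prefix `them` and then cannot recover, yielding `['them', 'e', 's', 's', 'a', 'g', 'e']` instead of `the` + `message`. A backtracking or dynamic-programming split avoids that mistake at some extra cost.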
Quantifying Identifier Quality: An Analysis of Trends
- Empirical Software Engineering manuscript
Identifiers, which represent the defined concepts in a program, account for, by some measures, almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concepts. An empirical study of identifier quality based on almost 50 million lines of code, covering thirty years, four programming languages, and both open and proprietary source is presented. For the purposes of the study, identifier quality is conservatively defined as the possibility of constructing the identifier out of dictionary words or known abbreviations. Four hypotheses related to identifier quality are considered using linear mixed effect regression models. For example, the first hypothesis is that modern programs include higher quality identifiers than older ones. In this case, the results show that better programming practices are producing higher quality identifiers. Results also confirm some commonly held beliefs, such as proprietary code having more acronyms than open source code.
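The quality criterion above, whether an identifier can be built out of dictionary words or known abbreviations, is a word-break problem. Here is a minimal dynamic-programming sketch of that check, assuming identifiers are lowercased and underscores dropped before testing. The function names and the small word set are hypothetical; the study's actual dictionary, abbreviation list, and preprocessing are not given in the abstract.

```python
def is_constructible(identifier: str, words: set[str]) -> bool:
    """True if the identifier (lowercased, underscores removed) is a
    concatenation of entries from `words` (dictionary words or abbreviations)."""
    text = identifier.replace("_", "").lower()
    n = len(text)
    # reachable[i] is True when text[:i] can be built from vocabulary entries.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            if reachable[start] and text[start:end] in words:
                reachable[end] = True
                break
    return n > 0 and reachable[n]


def quality_ratio(identifiers: list[str], words: set[str]) -> float:
    """Fraction of identifiers that pass the constructibility check."""
    if not identifiers:
        return 0.0
    return sum(is_constructible(i, words) for i in identifiers) / len(identifiers)
```

Because the check considers every split point, it accepts `themessage` as `the` + `message` even when `them` is also in the word set. An aggregate such as `quality_ratio` is one simple way to turn the per-identifier test into a per-program score that could be compared over time.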
Tracking Changes in Language
One recent field of study is the extraction of useful information from changes in a data stream, including natural language. Statistical tests upon single word occurrences can reveal many apparent differences. However, automatically ascertaining the causes of changes in the data stream requires methods for finding structure within the entire set of individual changed items. This work presents a methodology for understanding how a language model has changed, based upon utterance clustering and statistical tests on individual features. It further examines clustering of lexical items via profiles of changes in association scores. A machine using an analysis package based on these techniques can isolate novel portions of the data stream. Human inspection of such data then readily determines the nature of the observed change. We investigate several variants of this analysis upon data drawn from an automated call center.
Keywords: clustering, change detection, speech data mining
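The abstract mentions statistical tests on single word occurrences without naming the test. One common choice for comparing a word's frequency in two samples is the log-likelihood ratio statistic G² on a 2×2 contingency table, sketched minimally below. The choice of G², the 10.83 threshold (the chi-square critical value for one degree of freedom at p = 0.001), and the function names are all assumptions for illustration; the sketch covers only the single-word step, not the utterance clustering or association-profile clustering.

```python
import math
from collections import Counter


def log_likelihood(count_a: int, count_b: int, total_a: int, total_b: int) -> float:
    """G^2 for a 2x2 table: (this word, all other tokens) x (period A, period B)."""
    observed = [
        [count_a, count_b],
        [total_a - count_a, total_b - count_b],
    ]
    n = total_a + total_b
    row_totals = [sum(row) for row in observed]
    col_totals = [total_a, total_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero count contribute 0 (0 * ln 0 taken as 0)
                expected = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def changed_words(tokens_a: list[str], tokens_b: list[str],
                  threshold: float = 10.83) -> list[tuple[str, float]]:
    """Words whose relative frequency differs between the two token streams,
    sorted by G^2 (largest first); only scores >= threshold are kept."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    scores = {}
    for word in freq_a.keys() | freq_b.keys():
        g2 = log_likelihood(freq_a[word], freq_b[word], total_a, total_b)
        if g2 >= threshold:
            scores[word] = g2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A word with the same relative frequency in both periods scores exactly 0, while words that appear in only one period score high. As the abstract notes, such a flat list of changed words still has to be grouped (for example, by clustering utterances) before the cause of a change becomes clear.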

