Improving Text Classification by Shrinkage in a Hierarchy of Classes
, 1998
Cited by 203 (5 self)
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.
Context-Sensitive Modeling of Web-Surfing Behaviour using Concept Trees
- in Proceedings of the 5th WEBKDD Workshop
, 2003
Cited by 14 (0 self)
Early approaches to mathematically abstracting web-surfing behavior were largely based on first-order Markov models. Most humans, however, do not surf in a "memoryless" fashion; rather, they are guided by their time-dependent situational context and associated information needs. This belief is corroborated by the non-exponential revisit times observed in many site-centric weblogs. In this paper, we propose a general framework for modeling users whose surfing behavior is dynamically governed by their current topic of interest. This allows a modeled surfer to behave differently on the same page, depending on his situational context. The proposed methodology involves mapping each visited page to a topic or concept, (conceptually) imposing a tree hierarchy on these topics, and then estimating the parameters of a semi-Markov process defined on this tree based on the observed transitions among the underlying visited pages. The semi-Markovian assumption imparts additional flexibility by allowing for non-exponential state re-visit times, and the concept hierarchy provides a nice way of capturing context and user intent. Our approach is computationally much less demanding than the alternative approach of using higher-order Markov models for capturing history-sensitive surfing behavior. Several practical applications are described. The application of better predicting which outlink a surfer may take is illustrated using web-log data from a rich community portal, www.sulekha.com, though the focus of the paper is on forming a plausible generative model rather than solving any specific task.
Music Genre Classification with Taxonomy
- in Proc. of IEEE Int. Conference on Acoustics, Speech and Signal Processing
, 2005
Cited by 12 (0 self)
Automatic music genre classification is a fundamental component of music information retrieval systems and has been gaining importance and enjoying a growing amount of attention with the emergence of digital music on the Internet. Although considerable research has been conducted in automatic music genre classification, little has been done on hierarchical classification with taxonomies. The underlying hierarchical taxonomy identifies the relationships of dependence between different genres and provides valuable sources of information for genre classification. This paper investigates the use of taxonomy for music genre classification. Our empirical experiments on two datasets show that using taxonomy improves the classification performance. We also propose an approach for automatically generating genre taxonomies based on the confusion matrix via linear discriminant projection. Our work also provides some insights for future research.
Gene function classification using Bayesian models with hierarchy-based priors
- BMC Bioinformatics
, 2006
Cited by 7 (2 self)
We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences including phylogenic descriptors, sequence-based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate when compared to models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.
Hierarchical Text Categorization and Its Application to Bioinformatics
, 2005
Cited by 7 (0 self)
In a hierarchical categorization problem, categories are partially ordered to form a hierarchy. In this dissertation, we explore two main aspects of hierarchical categorization: learning algorithms and performance evaluation. We introduce the notion of consistent hierarchical classification that makes classification results more comprehensible and easily interpretable for end-users. Among the previously introduced hierarchical learning algorithms, only a local top-down approach produces consistent classification. The present work extends this algorithm to the general case of DAG class hierarchies and possible internal class assignments. In addition, a new global hierarchical approach aimed at performing consistent classification is proposed. This is a general framework of converting a conventional “flat” learning algorithm into a hierarchical one. An extensive set of experiments on real and synthetic data indicate that the proposed approach significantly outperforms the corresponding “flat” as well as the local top-down method. For evaluation purposes, we use a novel hierarchical evaluation measure that is superior to the existing hierarchical and non-hierarchical evaluation techniques according to a number
Classifying web documents in a hierarchy of categories: a comprehensive study
- J INTELL INF SYST
, 2007
Cited by 5 (0 self)
Most of the research on text categorization has focused on classifying text documents into a set of categories with no structural relationships among them (flat classification). However, in many information repositories, documents are organized in a hierarchy of categories to support a thematic search by browsing topics of interest. The consideration of the hierarchical relationship among categories opens several additional issues in the development of methods for automated document classification. Questions concern the representation of documents, the learning process, the classification process and the evaluation criteria of experimental results. They are systematically investigated in this paper, whose main contribution is a general hierarchical text categorization framework where the hierarchy of categories is involved in all phases of automated document classification, namely feature selection, learning and classification of a new document. An automated threshold determination method for classification scores is embedded in the proposed framework. It can be applied to any classifier that returns a degree of membership of a document to a category. In this work three learning methods are considered for the construction
Towards a Comprehensive Topic Hierarchy for News
, 2000
To date, a comprehensive, Yahoo-like hierarchy of topics has yet to be offered for the domain of news. The Yahoo approach of managing such a hierarchy --- hiring editorial staff to read documents and correctly assign them to topics --- is simply not practical in the domain of news. Far too many stories are written and made available online every day. While many Machine Learning methods exist for organising documents into topics, these methods typically require a large number of labelled training examples before performing accurately. When managing a large and ever-changing topic hierarchy, it is unlikely that there would be enough time to provide many examples per topic. For this reason, it would be useful to identify extra information within the domain of news that could be harnessed to minimise the number of labelled examples required to achieve reasonable accuracy. To this end, the notion of a semi-labelled document is introduced. These documents, which are partially labelled by th...

