Learning to Classify Documents According to Genre
- In IJCAI-03 Workshop on Computational Approaches to Style Analysis and Synthesis
, 2003
Cited by 56 (0 self)
Genre or style analysis can be used to improve results achieved using standard IR techniques. A genre class is a group of documents that are written in a similar style. Genre classification can identify documents that are written in a style most likely to satisfy a user's information need.
Automatically Analyzing and Organizing Music Archives
- In Proceedings of the European Conference on Research and Advanced Technology for Digital Libraries (ECDL)
, 2001
Cited by 40 (17 self)
We are experiencing a tremendous increase in the amount of music being made available in digital form. With the creation of large multimedia collections, however, we need to devise ways to make those collections accessible to the users. While music repositories exist today, they mostly limit access to their content to query-based retrieval of their items based on textual meta-information, with some advanced systems supporting acoustic queries. What we would additionally like to have is a way to facilitate exploration of musical libraries. We thus need to automatically organize music according to its sound characteristics in such a way that we find similar pieces of music grouped together, allowing us to find a classical section, a hard-rock section, etc. in a music repository. In this paper we present an approach to obtain such an organization of music data based on an extension to our SOMLib digital library system for text documents. In particular, we employ the Self-Organizing Map to create a map of a musical archive, where pieces of music with similar sound characteristics are organized next to each other on the two-dimensional map display. Locating a piece of music on the map then leaves you with related music next to it, allowing intuitive exploration of a music archive. Keywords: Multimedia, Music Library, Self-Organizing Map (SOM), Exploration of Information Spaces, User Interface, MP3
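The map-building idea in the abstract above can be illustrated with a minimal Self-Organizing Map sketch. This is not the SOMLib implementation: the function names, the linear decay schedules, the Gaussian neighbourhood, and the use of small numeric feature vectors in place of real sound characteristics are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols map on feature vectors (hypothetical parameters)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map unit, initialised uniformly in [0, 1].
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood radius shrinks
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull the unit toward x, scaled by its grid distance to the winner.
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training, calling `best_matching_unit(weights, features)` for each piece places it on the grid, so pieces with similar feature vectors tend to land on the same or neighbouring units.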
Genre Classification and Domain Transfer for Information Filtering
, 2002
Cited by 26 (3 self)
The World Wide Web is a vast repository of information, but the sheer volume makes it difficult to identify useful documents. We identify document genre as an important factor in retrieving useful documents and focus on the novel document genre dimension of subjectivity.
Automatic Genre Classification of MIDI Recordings
, 2004
Cited by 20 (12 self)
A software system that automatically classifies MIDI files into hierarchically organized taxonomies of musical genres is presented. This extensible software includes an easy to use and flexible GUI. An extensive library of high-level musical features is compiled, including many original features. A novel hybrid classification system is used that makes use of hierarchical, flat and round robin classification. Both k-nearest neighbour and neural network-based classifiers are used, and feature selection and weighting are performed using genetic algorithms. A thorough review of previous research in automatic genre classification is presented, along with an overview of automatic feature selection and classification techniques. Also included is a discussion of the theoretical issues relating to musical genre, including but not limited to what mechanisms humans use to classify music by genre and how realistic genre taxonomies can be constructed.
A document corpus browser for in-depth reading
- In JCDL ’04: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries
, 2004
Cited by 12 (7 self)
Software tools, including Web browsers, e-books, electronic document formats, search engines, and digital libraries are changing the way people read, making it easier for them to find and view documents. However, while these tools provide significant help with short-term reading projects involving small numbers of documents, they provide less help with longer-term reading projects, in which a topic is to be understood in depth by reading many documents. For such projects, readers must find and manage many documents and citations, remember what has been read, and prioritize what to read next. This paper describes three integrated software tools that facilitate in-depth reading. A first tool extracts citation information from documents. A second finds on-line documents from their citations. The last is a document corpus browser that uses a zoomable user interface to show a corpus at multiple granularities while supporting reading tasks that take days, weeks, or longer. We describe these tools and the design principles that motivated them.
Document style recognition using shallow statistical analysis
- In Proc. of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP
, 2004
The Naming of Cats: Automated genre classification
- International Journal for Digital Curation
Cited by 4 (3 self)
“The Naming of Cats is a difficult matter, It isn’t just one of your holiday games; You may think at first I’m as mad as a hatter, When I tell you, a cat must have three different names.” - T.S. Eliot, The Naming of Cats
Abstract. This paper builds on the work presented at ECDL 2006 ([29]) in automated genre classification as a step toward automating metadata extraction from digital documents for ingest into digital repositories such as those run by archives, libraries and eprint services. We divide features of the documents into five types: features for visual layout, linguistically modeled syntactic features, stylometric features, features for semantic structure, and contextual features as an object linked to previously classified objects and other external sources. Results concerning the first two types have been described elsewhere ([29]). The current paper discusses results from testing classifiers based on image and stylometric features and shows that genres for which image features fail to cluster are the genres for which stylometric features cluster very well.
1 Background and Objective
In [29], we summarised the valuable role of automated metadata extraction in the cost-effective, efficient management of digital collections: metadata play a key role in management processes ([43], [23]) and the manual creation of metadata is expensive ([15], [23], [40]). As we pointed out in [29], ERPANET’s ([18]) Packaged Object Ingest Project ([19]) identified automatic extraction tools for technical metadata (e.g. [33], [35]), and substantial work on descriptive metadata extraction within specific domains has been conducted (e.g. [32], [13], [2],
Searching Documents Based on Relevance and Type
- Advances in Information Retrieval, Proceedings of the 29th European Conference on IR Research (ECIR 2007)
, 2007
Cited by 3 (0 self)
This paper extends previous work on document retrieval and document type classification, addressing the problem of ‘typed search’. Specifically, given a query and a designated document type, the search system retrieves and ranks documents based not only on their relevance to the query, but also on the likelihood of their being the designated document type. The paper formalizes the problem in a general framework consisting of a ‘relevance model’ and a ‘type model’. The relevance model indicates whether or not a document is relevant to a query. The type model indicates whether or not a document belongs to the designated document type. We consider three methods for combining the models: linear combination of scores, thresholding on the type score, and a hybrid of the previous two methods. We take course page search and instruction document search as examples and have conducted a series of experiments. Experimental results show our proposed approaches can significantly outperform the baseline methods.
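The three combination methods named in the abstract above can be sketched roughly as follows. The abstract does not give formulas, so everything here is an assumption for illustration: the function names, the weight `lam`, the cut-off `tau`, scores normalised to [0, 1], and in particular the reading of "hybrid" as thresholding first and then ranking by the linear combination.

```python
from typing import List, Tuple

# (doc_id, relevance_score, type_score); both scores assumed to lie in [0, 1].
Doc = Tuple[str, float, float]


def rank_linear(docs: List[Doc], lam: float = 0.5) -> List[str]:
    """Rank by lam * relevance + (1 - lam) * type score (hypothetical weighting)."""
    scored = [(doc_id, lam * rel + (1.0 - lam) * typ) for doc_id, rel, typ in docs]
    return [doc_id for doc_id, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


def rank_threshold(docs: List[Doc], tau: float = 0.5) -> List[str]:
    """Drop documents whose type score is below tau, then rank by relevance."""
    kept = [(doc_id, rel) for doc_id, rel, typ in docs if typ >= tau]
    return [doc_id for doc_id, _ in sorted(kept, key=lambda p: p[1], reverse=True)]


def rank_hybrid(docs: List[Doc], lam: float = 0.5, tau: float = 0.5) -> List[str]:
    """Assumed hybrid: filter by the type threshold, then rank linearly."""
    kept = [doc for doc in docs if doc[2] >= tau]
    return rank_linear(kept, lam)


docs = [("a", 0.9, 0.2), ("b", 0.6, 0.8), ("c", 0.4, 0.9)]
print(rank_linear(docs))      # most relevant doc "a" drops because of its low type score
print(rank_threshold(docs))   # "a" is filtered out entirely
print(rank_hybrid(docs))
```

In this toy input, document "a" is the most relevant but least likely to be of the designated type, which is the case where the three strategies differ most visibly.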
Classifying XML Documents by Using Genre Features
- TIR-07 4th International Workshop on Text-Based Information Retrieval (DEXA 2007)
, 2007
Cited by 2 (0 self)
The categorization of documents is traditionally topic-based. This paper presents a complementary analysis of research and experiments on genre to show that encouraging results can be obtained by using genre structure (form) features. We conducted an experiment to assess the effectiveness of using extensible mark-up language (XML) tag information and part-of-speech (P-O-S) features for the classification of genres, testing the hypothesis that if a focus on genre can lead to high precision on normal textual documents, then good results can be achieved using XML tag information in addition to P-O-S information. An experiment was carried out on a subsection of the Initiative for the Evaluation of XML (INEX) 1.4 collection. The features were extracted and documents were classified using machine learning algorithms, which yielded encouraging results for logistic regression and neural networks. We propose that utilizing these features and training a classifier may benefit retrieval for most World Wide Web (WWW) technologies such as XML and extensible hypertext markup language (XHTML).
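As a rough illustration of what "XML tag information" could look like as classifier input, the sketch below turns a document into relative tag frequencies. The abstract does not specify the feature definition, so the function name and the choice of relative frequencies are assumptions; part-of-speech features and the classifier itself are left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict


def tag_frequencies(xml_text: str) -> Dict[str, float]:
    """Map each element tag to its share of all elements in the document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and every descendant element.
    counts = Counter(element.tag for element in root.iter())
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


sample = "<article><sec><p>one</p><p>two</p></sec></article>"
print(tag_frequencies(sample))
```

A dictionary like this can be converted to a fixed-length vector over a shared tag vocabulary before being passed to a learner such as logistic regression.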
Examining Variations of Prominent Features in Genre Classification
Cited by 2 (0 self)
This paper investigates the correlation between features of three types (visual, stylistic and topical) and genre classes. The majority of previous studies in automated genre classification have created models based on an amalgamated representation of a document using a combination of features. In these models, the inseparable roles of different features make it difficult to determine a means of improving the classifier when it exhibits poor performance in detecting selected genres. In this paper we use classifiers independently modeled on three groups of features to examine six genre classes and show that the strongest features for making one classification are not necessarily the best features for carrying out another.

