CiteSeerX
Drum sound recognition for polyphonic audio signals by adaptation and matching of spectrogram templates with harmonic structure suppression (2007)

by K. Yoshii, M. Goto, H. Okuno
Venue: IEEE Transactions on Audio, Speech and Language Processing
Results 1 - 10 of 16

Harmonic/percussive separation using median filtering

by Derry Fitzgerald - In Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), 2010
Abstract - Cited by 39 (1 self)
In this paper, we present a fast, simple and effective method to separate the harmonic and percussive parts of a monaural audio signal. The technique involves the use of median filtering on a spectrogram of the audio signal: median filtering is performed across successive frames to suppress percussive events and enhance harmonic components, and across frequency bins to enhance percussive events and suppress harmonic components. The two resulting median-filtered spectrograms are then used to generate masks, which are applied to the original spectrogram to separate the harmonic and percussive parts of the signal. We illustrate the use of the algorithm in the context of remixing audio material from commercial recordings.
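The median-filtering scheme described in this abstract can be sketched in a few lines of NumPy/SciPy; the filter length and soft-mask exponent below are illustrative choices, not the paper's tuned settings:

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17, power=2.0):
    """Split a magnitude spectrogram S (freq x time) into harmonic and
    percussive parts via median filtering, after Fitzgerald (DAFx-10).
    kernel and power are illustrative, not the paper's settings."""
    # Median filter across time frames: suppresses percussive spikes,
    # keeps horizontal (harmonic) ridges.
    H = median_filter(S, size=(1, kernel))
    # Median filter across frequency bins: suppresses harmonic ridges,
    # keeps vertical (percussive) spikes.
    P = median_filter(S, size=(kernel, 1))
    # Wiener-style soft masks built from the two filtered spectrograms.
    eps = 1e-12
    mask_h = H**power / (H**power + P**power + eps)
    mask_p = P**power / (H**power + P**power + eps)
    return S * mask_h, S * mask_p
```

In practice the masks would be applied to the magnitude STFT, with the original phase reused to resynthesise each part.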

Citation Context

...the effects of pitched instruments can help improve results for the automatic transcription of drum instruments, rhythm analysis and beat tracking. Recently, the authors proposed a tensor factorisation based algorithm capable of obtaining good quality separation of harmonic and percussive sources [1]. This algorithm incorporated an additive synthesis based source-filter model for pitched instruments, as well as constraints to encourage temporal continuity on pitched sources. A principal advantage of this approach was that it required little or no pretraining in comparison to many other approaches [2, 3, 4]. Unfortunately, a considerable shortcoming of the tensor factorisation approach is that it is both processor and memory intensive, making it impractical for use when whole songs need to be processed, for example when remixing a song. In an effort to overcome this, it was decided to investigate other approaches capable of separating harmonic and percussive components without pretraining, but which were also computationally less intensive. Of particular interest was the approach developed by Ono et al. [5]. This technique was based on the intuitive idea that stable harmonic or stationary...

ACTIVE MUSIC LISTENING INTERFACES BASED ON SIGNAL PROCESSING

by Masataka Goto , 2007
Abstract - Cited by 28 (12 self)
This paper introduces our research aimed at building “active music listening interfaces”. This research approach is intended to enrich end-users’ music listening experiences by applying music-understanding technologies based on signal processing. Active music listening is a way of listening to music through active interactions. We have developed seven interfaces for active music listening, such as interfaces for skipping sections of no interest within a musical piece while viewing a graphical overview of the entire song structure, for displaying virtual dancers or song lyrics synchronized with the music, for changing the timbre of instrument sounds in compact-disc recordings, and for browsing a large music collection to encounter interesting musical pieces or artists. These interfaces demonstrate the importance of music-understanding technologies and the benefit they offer to end users. Our hope is that this work will help change music listening into a more active, immersive experience.

Citation Context

...r can actively change the volume or timbre of the sounds of bass and snare drums during music playback. The onset times of those drums are automatically estimated by our drum-sound recognition method [12], which is based on the adaptation and matching of drum-sound templates. 3.5. Drumix: Audio player with a real-time drum-part editing function Drumix [13] is a user interface for playing bac...

Drum Sound Detection in Polyphonic Music with Hidden Markov Models

by Jouni Paulus, Anssi Klapuri , 2009
Abstract - Cited by 6 (0 self)
This paper proposes a method for transcribing drums from polyphonic music using a network of connected hidden Markov models (HMMs). The task is to detect the temporal locations of unpitched percussive sounds (such as bass drum or hi-hat) and recognise the instruments played. Contrary to many earlier methods, separate sound-event segmentation is not performed; instead, connected HMMs are used to carry out the segmentation and recognition jointly. Two ways of using HMMs are studied: modelling combinations of the target drums and a detector-like modelling of each target drum. Acoustic feature parametrisation is done with mel-frequency cepstral coefficients and their first-order temporal derivatives. The effect of lowering the feature dimensionality with principal component analysis and linear discriminant analysis is evaluated. Unsupervised acoustic model parameter adaptation with maximum likelihood linear regression is evaluated for compensating the differences between the training and target signals. The performance of the proposed method is evaluated on a publicly available data set containing signals with and without accompaniment, and compared with two reference methods. The results suggest that the transcription is possible using connected HMMs, and that using detector-like models for each target drum provides better performance than modelling drum combinations.
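The "detector-like" use of an HMM per drum can be illustrated with a toy two-state model decoded by Viterbi. A real system uses MFCC vectors and trained GMM emissions; the scalar Gaussian emissions and hand-set parameters here are purely illustrative:

```python
import numpy as np

def viterbi_detector(x, means=(0.0, 3.0), var=1.0, p_stay=0.9):
    """Toy detector HMM: state 0 = drum absent, state 1 = drum sounding,
    decoded with Viterbi over a scalar feature sequence x. Emission means,
    variance and transition probabilities are illustrative, not trained."""
    x = np.asarray(x, float)
    # Log-likelihood of each frame under each state's Gaussian emission.
    ll = -0.5 * (x[:, None] - np.asarray(means)) ** 2 / var
    log_a = np.log([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])
    delta = ll[0].copy()
    back = np.zeros((len(x), 2), int)
    for t in range(1, len(x)):
        scores = delta[:, None] + log_a      # scores[prev, cur]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + ll[t]
    # Backtrack the best state path.
    path = np.empty(len(x), int)
    path[-1] = delta.argmax()
    for t in range(len(x) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Frames where the path enters state 1 would be reported as drum onsets; running one such detector per target drum mirrors the paper's detector-like configuration.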

Citation Context

...Some methods cannot be assigned to either of the categories above. These include template matching and adaptation methods operating with time-domain signals [33], or with a spectrogram representation [31]. The main weakness with the “segment and classify” methods is the segmentation. The classification phase is not able to recover any events missed in the segmentation without an explicit error correct...

USING TENSOR FACTORISATION MODELS TO SEPARATE DRUMS FROM POLYPHONIC MUSIC

by Derry Fitzgerald, Eugene Coyle, Matt Cranitch
Abstract - Cited by 5 (1 self)
This paper describes the use of Non-negative Tensor Factorisation models for the separation of drums from polyphonic audio. Improved separation of the drums is achieved through the incorporation of Gamma Chain priors into the Non-negative Tensor Factorisation framework. In contrast to many previous approaches, the method used in this paper requires little or no pre-training or use of drum templates. The utility of the technique is shown on real-world audio examples.

Citation Context

...from a limitation in that it could not deal with simultaneous occurrences of these drums. A more advanced template adaptation scheme was used by Yoshii et al. in the context of both drum transcription [7] and drum sound remixing [8]. In these papers initial seed templates consisting of spectrograms of the various drums to be separated are iteratively adapted to provide a better match to the drums in t...

An error correction framework based on drum pattern periodicity for improving drum sound detection

by Kazuyoshi Yoshii, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno - In Proc. ICASSP’06 , 2006
Abstract - Cited by 4 (0 self)
This paper presents a framework for correcting errors of automatic drum sound detection, focusing on the periodicity of drum patterns. We define drum patterns as periodic structures found in onset sequences of bass and snare drum sounds. Our framework extracts periodic drum patterns from imperfect onset sequences of detected drum sounds (bottom-up processing) and corrects errors using the periodicity of the drum patterns (top-down processing). We implemented this framework on our drum-sound detection system. We first obtained onset sequences of the drum sounds with our system and extracted drum patterns. On the basis of our observation that the same drum patterns tend to be repeated, we detected time points which deviate from the periodicity as error candidates. Finally, we verified each error candidate to judge whether it is an actual onset. Experiments on drum sound detection for polyphonic audio signals of popular CD recordings showed that our correction framework improved the average detection accuracy from 77.4% to 80.7%.
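The periodicity idea can be sketched on a quantized onset grid. The fold-and-vote period estimate below is a simplification of the paper's pattern extraction, and the function name and grid handling are hypothetical:

```python
import numpy as np

def error_candidates(onsets, max_period=8):
    """Given a binary onset grid for one drum, estimate the dominant
    pattern period by folding the grid and measuring per-bin consistency,
    then flag cells that disagree with the majority vote of the cells one
    period apart as error candidates (possible missed/spurious onsets)."""
    x = np.asarray(onsets, int)
    best_p, best_score = 1, -1.0
    for p in range(1, max_period + 1):
        # A good period makes every folded bin nearly constant (all 0 or all 1).
        score = np.mean([max(np.mean(x[b::p]), 1 - np.mean(x[b::p]))
                         for b in range(p)])
        if score > best_score:          # strict >: ties keep the smaller period
            best_p, best_score = p, score
    flags = [i for i in range(len(x))
             if x[i] != int(np.mean(x[i % best_p::best_p]) > 0.5)]
    return best_p, flags
```

With a bass-drum grid such as `[1,0,0,0]` repeated, plus one spurious onset, the sketch recovers the four-step period and flags the extra cell as an error candidate; a verification stage (as in the paper) would then decide whether each flagged cell is a real onset.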

Citation Context

... we are working on an automatic rhythm description. Because drums are closely related to the rhythm, many drum-sound detection systems [1, 2] have been proposed. We developed a system, called AdaMast [3, 4], based on adaptation and matching of drum-sound spectrogram templates. However, bottom-up methods are required to describe higher-level content (e.g., tempo). Although AdaMast can automatically detec...

SIMULTANEOUS PROCESSING OF SOUND SOURCE SEPARATION AND MUSICAL INSTRUMENT IDENTIFICATION USING BAYESIAN SPECTRAL MODELING

by Katsutoshi Itoyama, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Abstract - Cited by 3 (1 self)
This paper presents a method of both separating audio mixtures into sound sources and identifying the musical instruments of the sources. A statistical tone model of the power spectrogram, called an integrated model, is defined, and source separation and instrument identification are carried out on the basis of Bayesian inference. Since the parameter distributions of the integrated model depend on each instrument, the instrument name is identified by selecting the one that has the maximum relative instrument weight. Experimental results showed that correct instrument identification enables precise source separation even when many overtones overlap. Index Terms — Source separation, instrument identification, Bayesian methods, spectrogram

Citation Context

...(F0) estimates [3, 4, 5], tempo estimates and beat tracking [6, 7, 8]. Methods of sound source separation have also been reported for separating harmonic sounds [9, 10] and separating percussive ones [11, 12]. Although methods of blind source separation and source (talker) identification have been reported [13] for multi-channel audio signals recorded by using a microphone array, these methods cannot be a...

Music listening in the future: Augmented Music Understanding Interfaces and Crowd Music Listening

by Masataka Goto - in Proc. of the AES 42nd International Conf. on Semantic Audio, 2011
Abstract - Cited by 3 (1 self)
Correspondence should be addressed to Masataka Goto (m.goto[at]aist.go.jp) In the future, music listening can be more active, more immersive, richer, and deeper by using automatic music-understanding technologies (semantic audio analysis). In the first half of this invited talk, four Augmented Music-Understanding Interfaces that facilitate deeper understanding of music are introduced. In our interfaces, visualization of music content and music touch-up (customization) play important roles in augmenting people's understanding of music because understanding is deepened through seeing and editing. In the second half, a new style of music listening called Crowd Music Listening is discussed. By posting, sharing, and watching time-synchronous comments (semantic information), listeners can enjoy music together with the crowd. Such Internet-based music listening with shared semantic information also helps music understanding because understanding is deepened through communication. Two systems that deal with new trends in music listening, time-synchronous comments and mashup music videos, are finally introduced.

Citation Context

...an casually switch drum sounds and drum patterns as the urge arises during music playback in real time. The onset times of those drums are automatically estimated by our drum-sound recognition method [15], which is based on the adaptation and matching of drum-sound templates. To deal with drum patterns in units of bar (measure), it also uses our beat-tracking method [16]. Other drum-sound recognition m...

Automatic Transcription of Pitch Content in Music and Selected Applications

by Matti Ryynänen
Abstract - Cited by 2 (0 self)
Transcription of music refers to the analysis of a music signal in order to produce a parametric representation of the sounding notes in the signal. This is conventionally carried out by listening to a piece of music and writing down the symbols of common musical notation to represent the occurring notes in the piece. Automatic transcription of music refers to the extraction of such representations using signal-processing methods. This thesis concerns the automatic transcription of pitched notes in musical audio and its applications. Emphasis is laid on the transcription of realistic polyphonic music, where multiple pitched and percussive instruments are sounding simultaneously. The methods included in this thesis are based on a framework which combines both low-level acoustic modeling and high-level musicological modeling. The emphasis in the acoustic modeling has been set to note events so that the methods produce discrete-pitch notes with onset times and durations

Citation Context

...ern recognition and separation-based methods. Despite the rapid development of the methods, their performance is still somewhat limited for polyphonic music. For different approaches and results, see [30, 138]. Instrument recognition methods aim at classifying the sounding instrument, or instruments, in music signals. The methods usually model the instrument timbre via various acoustic features. Classif...

DRUM TRANSCRIPTION USING PARTIALLY FIXED NON-NEGATIVE MATRIX FACTORIZATION

by Chih-Wei Wu, Alexander Lerch
Abstract - Cited by 2 (0 self)
In this paper, a drum transcription algorithm using partially fixed non-negative matrix factorization is presented. The proposed method allows users to identify percussive events in complex mixtures with a minimal training set. The algorithm decomposes the music signal into two parts: a percussive part with pre-defined drum templates and a harmonic part with undefined entries. The harmonic part is able to adapt to the music content, allowing the algorithm to work in polyphonic mixtures. Drum event times can then simply be picked from the percussive activation matrix with onset detection. The system is efficient and robust even with a minimal training set. The recognition rates for the ENST dataset vary from 56.7% to 78.9% for three percussive instruments extracted from polyphonic music.
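A minimal sketch of the partially fixed decomposition, assuming a Euclidean cost with standard multiplicative updates (the paper's exact cost function and update rules may differ): the first columns of W are fixed drum templates, the remaining columns are free to absorb the harmonic content.

```python
import numpy as np

def partially_fixed_nmf(V, W_drums, n_free=4, n_iter=200, seed=0):
    """Factorise V ~= W H where W = [W_drums | W_free]. W_drums (freq x
    n_drums) stays fixed; W_free adapts to the non-drum content. Euclidean
    multiplicative updates; component counts here are illustrative."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    n_d = W_drums.shape[1]
    W = np.hstack([W_drums, rng.random((F, n_free))])
    H = rng.random((n_d + n_free, T))
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W_new[:, :n_d] = W_drums          # keep drum templates fixed
        W = W_new
    return W, H                           # H[:n_d] holds drum activations
```

Onset detection on the rows `H[:n_d]` would then yield the drum event times, as the abstract describes.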

Citation Context

...tive enough. Another difficulty is the determination of the rank required for the decomposition process. The third type of approaches (match and adapt) uses pretrained templates to detect drum events [12]. The templates are searched for the closest match and adapted in an iterative process. 3. METHOD 3.1. Algorithm Description In this paper, we propose a method using partially fixed NMF to transcribe ...

AN OPEN-SOURCE DRUM TRANSCRIPTION SYSTEM FOR PURE DATA AND MAX MSP

by Marius Miron, Matthew E. P. Davies, Fabien Gouyon
Abstract - Cited by 2 (1 self)
This paper presents a drum transcription algorithm adjusted to the constraints of real-time audio. We introduce an instance filtering (IF) method using sub-band onset detection, which improves the performance of a system having at its core a feature-based K-nearest neighbor classifier (KNN). The architecture proposed allows for adapting different parts of the algorithm for either bass drum, snare drum or hi-hat cymbals. The open-source system is implemented in the graphic programming languages Pure Data (PD) and Max MSP, and aims to work with a large variety of drum sets. We evaluated its performance on a database of audio samples generated from a well-known collection of MIDI drum loops randomly matched with a diverse collection of drum sets. Both of the evaluation stages, testing and validation, show a significant improvement in the performance when using the instance filtering algorithm. Index Terms — drum transcription, feature-based classification, real-time audio, Pure Data, Max MSP
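The two ingredients, sub-band onset detection for instance filtering and a feature-based KNN vote, can be sketched as follows; bin ranges, feature vectors and thresholds are illustrative, not the system's settings:

```python
import numpy as np

def subband_flux(S, lo, hi):
    """Onset strength from positive spectral flux restricted to frequency
    bins [lo, hi) of a magnitude spectrogram S (freq x time): the sub-band
    idea behind the instance filtering (IF) step."""
    band = S[lo:hi]
    flux = np.maximum(band[:, 1:] - band[:, :-1], 0.0).sum(axis=0)
    return np.concatenate([[0.0], flux])   # align with frame indices

def knn_classify(query, train_X, train_y, k=3):
    """Minimal K-nearest-neighbour majority vote, standing in for the
    feature-based classifier at the system's core."""
    d = np.linalg.norm(train_X - query, axis=1)
    votes = train_y[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()
```

A detected drum instance would then be kept only if the sub-band flux at its frame exceeds a threshold (e.g. a low band for bass drum), before its feature vector is passed to the KNN classifier.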

Citation Context

...d hi-hats. Paulus and Virtanen [10] combine principal component analysis and non-negative matrix factorization to decompose the spectrogram into spectrograms of the targeted drum sounds. Yoshii et al [11] proposed an approach which computes an adaptive spectrogram template for each drum class. Despite the number of methods published, very little code has been made available and most systems are not ta...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University