Audio content analysis for online audiovisual data segmentation and classification
- 62 IEEE SIGNAL PROCESSING MAGAZINE MARCH 2004
, 2001
Cited by 46 (2 self)
Abstract: While current approaches for audiovisual data segmentation and classification are mostly focused on visual cues, audio signals may actually play a more important role in content parsing for many applications. An approach to automatic segmentation and classification of audiovisual data based on audio content analysis is proposed. The audio signal from movies or TV programs is segmented and classified into basic types such as speech, music, song, environmental sound, speech with music background, environmental sound with music background, silence, etc. Simple audio features, including the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak tracks, are extracted to ensure the feasibility of real-time processing. A heuristic rule-based procedure is proposed to segment and classify audio signals; it is built upon morphological and statistical analysis of the time-varying functions of these audio features. Experimental results show that the proposed scheme achieves an accuracy rate of more than 90% in audio classification. Index Terms: Audio analysis, audio indexing, audio segmentation, audiovisual content parsing, information filtering and retrieval, multimedia database management.
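Two of the features named in this abstract, the energy function and the average zero-crossing rate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the non-overlapping framing, the frame length, and the silence threshold are all assumptions chosen for illustration.

```python
def frame_features(samples, frame_len=256):
    """Return one (short-time energy, zero-crossing rate) pair per frame.

    Frames are non-overlapping and frame_len is an assumed value;
    the abstract does not specify the framing.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats


def is_silent(energy, threshold=1e-4):
    """Toy rule: treat a frame as silence when its energy is below a
    hypothetical threshold (not a value taken from the paper)."""
    return energy < threshold
```

A rule-based classifier of the kind the abstract describes would examine how these per-frame values evolve over time rather than thresholding single frames.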
An efficient pitch-tracking algorithm using a combination of Fourier transforms
- Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01)
, 2001
Cited by 6 (0 self)
In this paper we present a technique for detecting the pitch of a sound using a series of two forward Fourier transforms. We use an enhanced version of the Fourier transform for better accuracy, as well as a tracking strategy among pitch candidates for increased robustness. This efficient technique allows us to determine precisely the pitches of harmonic sounds such as the voice or classical musical instruments, and also of more complex sounds such as rippled noises.
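The basic idea of applying two forward Fourier transforms in sequence can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it uses a plain radix-2 FFT instead of the enhanced transform mentioned in the abstract, it omits the tracking among pitch candidates, and the function names and the `fmin`/`fmax` search bounds are assumptions. The magnitude spectrum of a harmonic sound has peaks spaced by the fundamental, so a second transform of that spectrum peaks at the period expressed in samples.

```python
import cmath


def fft(x):
    """Recursive radix-2 forward FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency in Hz from one block of samples."""
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    # First transform: magnitude spectrum of the signal (all n bins).
    mags = [abs(c) for c in fft(samples)]
    # Second forward transform, applied to the magnitude spectrum.
    second = [abs(c) for c in fft(mags)]
    # Index q of the second transform corresponds to a period of q samples.
    lo = max(1, int(sample_rate / fmax))
    hi = min(n // 2, int(sample_rate / fmin))
    best = max(range(lo, hi + 1), key=lambda q: second[q])
    return sample_rate / best
```

Multiples of the true period produce peaks of similar height, so the result depends on the chosen search range; the candidate tracking described in the abstract is one way to address that kind of ambiguity.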
Extracting Sinusoids From Harmonic Signals
- Proceedings of the 2nd COST G-6 Workshop on Digital Audio Effects (DAFx99)
, 1999
Cited by 4 (0 self)
This paper presents a special window function for a Fast Fourier Transform (FFT) based spectral modeling approach for signals consisting of sinusoids plus noise. The main new idea is to choose a time window function with a simple Fourier transform. With knowledge of the Fourier transform of the window function, we are able to extract the parameters (frequency, amplitude, and phase) of sinusoids in real time with a digital signal processor. 1. INTRODUCTION For any application that deals with power spectrum estimation and harmonic analysis, it is important to extract the line spectrum components before performing any spectral noise "smoothing", because otherwise the lines would lose their sharpness [1]. In speech and audio coders, too, an important task is to extract harmonic signals and to calculate masking thresholds for adaptive bit allocation. Many approaches [2] have been proposed for this task, e.g. time-domain Prony's method, subband modeling, least-squares fitting [3] and frequ...
From Raw Polyphonic Audio to Locating Recurring Themes
- Proceedings of the 1st Annual International Symposium on Music Information Retrieval (ISMIR 2000). Retrieved February 7, 2002, from http://ciir.cs.umass.edu/music2000/posters/shroeter_ruger.pdf
, 2000
Cited by 1 (0 self)
We present research studies of two related strands in content-based music retrieval: the automatic transcription of raw audio from a single polyphonic instrument with discrete pitch (e.g., piano) and the location of recurring themes from a Humdrum score.
A Study on Automatic Phonetic Segmentation for Mandarin Speech/Singing Voice Synthesis
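... model) forced-alignment method to carry out the initial phonetic segmentation. For the singing-voice corpus, in addition to the former method, we also apply the dynamic time warping algorithm. Because the accuracy of these two initial segmentations is not high, we use a post-processing segmentation ...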
Fundamental Frequency Estimation in the SMS Analysis
- Proceedings of the Digital Audio Effects Workshop (DAFX98)
, 1998
This paper deals with fundamental frequency estimation for monophonic sounds in the SMS analysis environment. The importance of the fundamental frequency, as well as some of its uses in SMS, is discussed. The particular method of F0 estimation, based on a two-way mismatch measure, is described along with some modifications. Finally, we explain how the pitched/unpitched decision is performed.
Development of a Methodology for Identification of Indian Musical Instruments
, 2001
In this work, an attempt is made to develop a methodology for Identification of Indian Musical Instruments. Given a digital audio file with a mono recording of an Indian instrument, we identify the instrument played. The approach involves feature extraction from the signal based on Digital Signal Processing techniques. The spectral moments and pitch of the music signal are used as features. The features extracted from the training data are stored in a database for a learning system based on the k-Nearest Neighbor classifier (k-NN). The k-NN method uses a priori information from the training data set to estimate posterior probabilities for an unknown sample. We implement the same and test our approach for 4 Indian Instruments: Sitar, Sarod, Tabla and Bansuri. A total of 60 files, consisting of 15 recordings of each of the 4 instruments, were tested. The recognition rate was as high as 73.33% for the Tabla and as low as 26.67% for the Sitar.
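The k-NN step described here, estimating class posteriors for an unknown sample from its nearest labelled neighbours, can be sketched in a few lines. This is a generic illustration rather than the authors' system: the function names, the Euclidean distance, and the value of `k` are assumptions, and feature extraction (spectral moments, pitch) is left out.

```python
import math
from collections import Counter


def knn_posteriors(train, query, k=3):
    """Estimate class posteriors as the label fractions among the k
    training vectors closest to `query` (Euclidean distance).

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}


def knn_classify(train, query, k=3):
    """Return the label with the highest estimated posterior."""
    posteriors = knn_posteriors(train, query, k)
    return max(posteriors, key=posteriors.get)
```

In the setting of the abstract, each feature vector would hold the spectral moments and pitch of one recording, and the label would be the instrument name.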
Image Analysis Lab.
Abstract: An auditory scene in a natural environment contains multiple sources. Auditory scene analysis (ASA) is the process in which the auditory system segregates a scene into streams corresponding to different sources. Determining the range of pitch frequency is necessary for segmentation. We propose a system to determine the range of pitch frequency by analyzing onsets and offsets in the modulation frequency domain. In the proposed system, the modulation spectrum of speech is calculated first, and then onsets and offsets are detected in each subband. Thereafter, segments are generated by matching corresponding onset and offset fronts. Finally, by choosing the desired segments, the range of pitch frequency is determined. Systematic evaluation shows that the range of pitch frequency is estimated with good accuracy. Keywords: pitch frequency; onset/offset algorithm; modulation frequency domain

