## Pitch Extraction and Fundamental Frequency: History and Current Techniques (2003)

Citations: 18 (0 self)

### BibTeX

@TECHREPORT{Gerhard03pitchextraction,
    author = {David Gerhard},
    title = {Pitch Extraction and Fundamental Frequency: History and Current Techniques},
    institution = {},
    year = {2003}
}

### Abstract

Pitch extraction (also called fundamental frequency estimation) has been a popular topic in many fields of research since the age of computers. Yet in the course of some 50 years of study, current techniques are still not to a desired level of accuracy and robustness. When

### Citations

484 | Auditory scene analysis
- Bregman
- 1990
Citation Context: ... perception of pitch. Pitch perception also changes with intensity, duration and other physical features of the waveform. There is some controversy as to how the human auditory system perceives pitch [1, 18, 30]. One group of people have traditionally used pure tone pitches to measure phenomena like critical bands, masking, and pitch perception. The other group of people use more complex tones to see how hum...

282 | Construction and evaluation of a robust multifeature speech/music discriminator
- Scheirer, M
- 1997
Citation Context: ...ter [22]) It has since been shown that ZCR is an informative feature in and of itself, unrelated to how well it tracks f0. Many researchers have examined statistical features of the ZCR. For example, [25] uses the ZCR as a correlate of the spectral centroid, or balance point, of the waveform, which, unless the spectrum is bimodal, is often the location of most of the power in the waveform. If the spec...

266 | The Computer Music Tutorial
- Roads
- 1996
Citation Context: ... for was f0. The thought was that the ZCR should be directly related to the number of times the waveform repeated per unit time. It was soon made clear that there are problems with this measure of f0 [22]. If the spectral power of the waveform is concentrated around f0, then it will cross the zero line twice per cycle, as in Figure 5a. However, if the waveform contains higher-frequency spectral compon...
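The zero-crossing argument in the context above can be sketched in a few lines. This is a minimal illustration and not code from the cited works: the function names `zero_crossings`, `zcr_f0` and `tone`, the sample rate, and the synthetic signals are all assumptions introduced here. It assumes the ideal case where the waveform crosses zero twice per cycle, and then shows how a strong higher harmonic breaks that assumption.

```python
import math

def zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:])
               if (prev < 0) != (cur < 0))

def zcr_f0(samples, sample_rate):
    """Estimate f0 in Hz as half the zero-crossing rate.

    Only meaningful under the assumption that the waveform crosses
    zero exactly twice per period.
    """
    duration = (len(samples) - 1) / sample_rate  # seconds spanned by the samples
    return zero_crossings(samples) / duration / 2.0

def tone(freq, sample_rate, n, harmonic_gain=0.0):
    """Hypothetical test signal: a sine plus an optional 5th harmonic."""
    return [math.sin(2 * math.pi * freq * t / sample_rate + 0.1)
            + harmonic_gain * math.sin(2 * math.pi * 5 * freq * t / sample_rate)
            for t in range(n)]

pure = tone(100.0, 8000, 8000)
rich = tone(100.0, 8000, 8000, harmonic_gain=2.0)
print(zcr_f0(pure, 8000))  # close to the true 100 Hz
print(zcr_f0(rich, 8000))  # far too high: the harmonic adds extra crossings
```

The second call illustrates why the ZCR alone is an unreliable f0 measure once energy is no longer concentrated around the fundamental.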

132 | Speech Analysis, Synthesis and Perception
- Flanagan
- 1972
Citation Context: ...the disk appear stationary. 5.3 Cepstrum Analysis: Cepstrum analysis is a form of spectral analysis where the output is the Fourier transform of the log of the magnitude spectrum of the input waveform [9]. This procedure was developed in an attempt to make a non-linear system more linear. Naturally occurring partials in a frequency spectrum are often slightly inharmonic, and the cepstrum attempts to m...
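The definition quoted in the context above (the Fourier transform of the log of the magnitude spectrum) can be illustrated with a small sketch. It is not taken from the report: the helper names `dft`, `cepstrum` and `cepstral_period`, the `floor` constant that keeps the logarithm finite, and the synthetic frame are assumptions made for illustration. A naive DFT is used so the example needs only the standard library; a real implementation would use an FFT.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]

def cepstrum(samples, floor=1e-9):
    """Magnitude of the DFT of the log magnitude spectrum.

    `floor` is an assumed small constant that keeps log() away from zero.
    """
    log_mag = [math.log(abs(c) + floor) for c in dft(samples)]
    return [abs(c) for c in dft(log_mag)]

def cepstral_period(samples, min_lag, max_lag):
    """Return the lag (in samples) of the largest cepstral value in range."""
    ceps = cepstrum(samples)
    return max(range(min_lag, max_lag + 1), key=lambda lag: ceps[lag])

# Hypothetical test frame: 15 harmonics of a tone with a 32-sample period.
frame = [sum(math.sin(2 * math.pi * h * t / 32) / h for h in range(1, 16))
         for t in range(256)]
period = cepstral_period(frame, 8, 48)
print(period, "samples per period")
```

Evenly spaced harmonics make the log spectrum itself periodic, so the second transform concentrates that regularity at the lag corresponding to the pitch period. The search range (`min_lag`, `max_lag`) is a design choice that excludes the large values near lag zero.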

71 | Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise
- Schroeder
- 1991
Citation Context: ...ental. Humans do this as well, and it is a result more of the signal itself than of the recognition algorithm. A period-k signal can become a period-2k signal through a process called period doubling [26, 13]. At the transition point, it is unclear whether it is appropriate to count the period as k or 2k. This transition point is unstable, so it is uncommon to hear signals of ambiguous pitch in nature. Ho...

61 | YIN, a fundamental frequency estimator for speech and music
- de Cheveigné, Kawahara
- 2002
Citation Context: ...form, but it reduces the robustness and increases the computational complexity to have the algorithm try to distinguish between “large” and “small” peaks. 4.2.1 The YIN Estimator: The YIN f0 estimator [3], developed by Alain de Cheveigné and Hideki Kawahara, is named after the oriental yin-yang philosophical principle of balance, representing the authors’ attempts to balance between autocorrelation a...

44 | Tracking of partials for additive sound synthesis using Hidden Markov Models
- Depalle, Garcia, et al.
- 1993
Citation Context: ...ypothesis is a simple operation for each past time frame considered. A more involved comparison method is the use of hidden Markov models (HMMs), statistical models which track variables through time [4]. These models have been used for linguistics and circuit theory as well as f0 estimation. HMMs are state machines, with a hypothesis available for the output variable at each state. At each time fram...

40 | Sensation and Perception
- Coren, Porac, et al.
- 1978
Citation Context: ... The perceived pitch of a sinusoid increases with intensity when the sinusoid is above 3000 Hz, and a sinusoid with frequency below 2000 Hz is perceived to drop in pitch as the intensity increases [2]. It is important to note that these measurements of the differences between frequency and the perception were made on isolated sinusoids. Real-world sounds have many harmonics above the fundamental f...

39 | Spectral analysis and discrimination by zero-crossings
- Kedem
- 1986
Citation Context: ...er period in the waveform, such as a discontinuity in slope or amplitude, it may be identified and counted in the same way as the other methods. Zero-crossing rate (ZCR). Since it was made popular in [15], the utility of the zero-crossing rate has often been in doubt, but lately it has been revived. Put simply, the ZCR is a measure of how often the waveform crosses zero per unit time. The idea is that...

34 | Fundamental frequency estimation and tracking using maximum likelihood harmonic matching and HMMs
- Doval, Rodet
- 1993
Citation Context: ...doing what it does without knowing why or how. 6.2 Maximum Likelihood Estimators: Boris Doval and Xavier Rodet have presented a series of papers on f0 estimation using maximum likelihood estimators [6, 7]. This statistical technique compares different variable value hypotheses based on the likelihood of their being correct in context with the past values of these variables. The intent is to recognize ...

34 | On the transcription of musical sound by computer
- Moorer
- 1977
Citation Context: ... peak lines up with the passband of a filter, the result is a higher value in the output of the filter than when the passband does not line up. 5.2.1 Optimum Comb Filter: The optimum comb f0 estimator [19] is a robust but computationally intensive algorithm. A comb filter has many equally spaced passbands. In the case of the optimum comb filter algorithm, the locations of the passbands are based on the...

26 | Feature extraction and temporal segmentation of acoustic signals [Online]. Paris: IRCAM. Available from: http://www.ircam.fr/equipes/analyse-synthese/rossigno/icmc98/article6.html [Accessed 19.11.1999]
- Rossignol, Rodet, et al.
Citation Context: ... spectral centroid is of fairly high frequency, it could mean that the signal is a fricative, or an unvoiced human speech phoneme. The ZCR has been used in the context of f0 estimation as recently as [23], where the mean and the variance of the zero crossing rate were calculated to increase the robustness of a feature extractor. The feature is used to track the constancy of the f0 across time frames. ...

24 | Estimation of Fundamental Frequency of Musical Sound Signals
- Doval, Rodet
- 1991
Citation Context: ...doing what it does without knowing why or how. 6.2 Maximum Likelihood Estimators: Boris Doval and Xavier Rodet have presented a series of papers on f0 estimation using maximum likelihood estimators [6, 7]. This statistical technique compares different variable value hypotheses based on the likelihood of their being correct in context with the past values of these variables. The intent is to recognize ...

19 | Predicting musical pitch from component frequency ratios
- Piszczalski, Galler
- 1979
Citation Context: ...al in this manner. 5.1 Component Frequency Ratios: As early as 1979, Martin Piszczalski was working on a complete automatic music transcription system, the first step of which would be pitch detection [20, 21]. His system would extract the pitch of the signal (assuming that a single note was present at each point in time) and then find note boundaries, infer pitch key, and present a score. Piszczalski’s or...

14 | A neural network model for Pitch Perception
- Sano, Jenkins
- 1989
Citation Context: ...uld likely output a frequency hypothesis, which could then be translated to pitch. Another approach to using connectionist models for f0 estimation is the modeling of the human auditory system, as in [24], where the authors present a neural network model based on the cochlear mechanisms of the human ear. Other neural network models could be based on the functioning of the neural pathways (although a g...

13 | Handbook for acoustic ecology
- Truax
- 1999
Citation Context: ...reinforce the sensation of the pitch, making the octave seem more “in-tune”. The more sine-like a waveform is, the more distinct the notion of frequency, but the less distinct the perception of pitch [29]. This sensation also varies with the relationship between the partials. The more harmonically related the partials of a tone are, the more distinct the perception of pitch. Pitch perception also chan...

10 | A Computational Model for Music Transcription
- Piszczalski
- 1986
Citation Context: ...al in this manner. 5.1 Component Frequency Ratios: As early as 1979, Martin Piszczalski was working on a complete automatic music transcription system, the first step of which would be pitch detection [20, 21]. His system would extract the pitch of the signal (assuming that a single note was present at each point in time) and then find note boundaries, infer pitch key, and present a score. Piszczalski’s or...

9 | Robust pitch determination using nonlinear state-space embedding
- Terez
Citation Context: ...presentations that include multi-dimensional phase space and pseudo-phase space representations, unless otherwise stated. For a more detailed discussion of a theoretical phase space f0 estimator, see [11, 27, 28]. 4.3.1 Phase Space and Frequency: Any periodic signal forms a closed cycle in phase space, and the shape of the cyclic path depends on the harmonic composition of the signal. The f0 of a signal is rel...

7 | Improved musical pitch tracking using principal decomposition analysis
- Dorken, Nawab
- 1994
Citation Context: ... require that the fundamental frequency of the signal be present, and it works well with inharmonic partials and missing partials. Dorken and Nawab presented an improvement to Piszczalski’s method in [5]. They suggest “conditioning” the spectrum using a method they had previously used for principal decomposition analysis. This conditioning had the effect of identifying the frequency partials more acc...

6 | Chaos: Making a New Science
- Gleick
Citation Context: ...ental. Humans do this as well, and it is a result more of the signal itself than of the recognition algorithm. A period-k signal can become a period-2k signal through a process called period doubling [26, 13]. At the transition point, it is unclear whether it is appropriate to count the period as k or 2k. This transition point is unstable, so it is uncommon to hear signals of ambiguous pitch in nature. Ho...

6 | Pitch detection using a tunable IIR filter
- Lane
- 1990
Citation Context: ... filters that will have the same output amplitude, wherever a passband of the comb filter lines up with that fundamental. 5.2.2 Tunable IIR Filter: A more recent filter-based f0 estimator suggested in [16], this method consists of a narrow user-tunable band-pass filter, which is swept across the frequency spectrum. When the filter is in line with a strong frequency partial, a maximum output will be pre...

5 | Audio visualization in phase space
- Gerhard
- 1999
Citation Context: ...presentations that include multi-dimensional phase space and pseudo-phase space representations, unless otherwise stated. For a more detailed discussion of a theoretical phase space f0 estimator, see [11, 27, 28]. 4.3.1 Phase Space and Frequency: Any periodic signal forms a closed cycle in phase space, and the shape of the cyclic path depends on the harmonic composition of the signal. The f0 of a signal is rel...

4 | The multi-lag-window method for robust extended-range f0 determination
- Geoffrois
- 1996
Citation Context: ...signals, which are spectrally rich and have evenly spaced partials. 5.4 Multi-Resolution Methods: An improvement that can be applied to any spectral f0 estimation method is to use multiple resolutions [10]. The idea is relatively simple: If the accuracy of a certain algorithm at a certain resolution is somewhat suspect, confirm or deny any f0 estimator hypothesis by using the same algorithm at a higher...

4 | Phase space representations of acoustical musical signals
- Gibiat
- 1988
Citation Context: ...story of a waveform in a way that makes repetitive cycles clear. The basic phase space representation is to plot the value of the waveform at time t versus the slope of the waveform at the same point [12]. A periodic signal should produce a repeating cycle in phase space, returning to a point with the same value and slope. Higher dimension phase space representations plot the value and n − 1 derivativ...
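The value-versus-slope representation described in the context above can be sketched as follows. This is only an illustration of the idea that a periodic signal returns to the same point in phase space, not the estimator from the cited works: the names `phase_space` and `recurrence_period`, the central-difference slope, the unscaled Euclidean distance, and the lag search range are all assumptions made here.

```python
import math

def phase_space(samples):
    """Return (value, slope) pairs, using a central-difference slope."""
    return [(samples[t], (samples[t + 1] - samples[t - 1]) / 2.0)
            for t in range(1, len(samples) - 1)]

def recurrence_period(samples, min_lag, max_lag):
    """Return the lag at which the phase-space trajectory best repeats.

    For each candidate lag, average the distance between every point and
    the point `lag` samples later; a periodic signal gives a distance near
    zero at its period. Value and slope are left unscaled, which is a
    simplifying assumption.
    """
    points = phase_space(samples)

    def mean_distance(lag):
        dists = [math.hypot(a[0] - b[0], a[1] - b[1])
                 for a, b in zip(points, points[lag:])]
        return sum(dists) / len(dists)

    return min(range(min_lag, max_lag + 1), key=mean_distance)

# Hypothetical test signal: a sine with a 50-sample period.
wave = [math.sin(2 * math.pi * t / 50) for t in range(400)]
estimated = recurrence_period(wave, 10, 80)
print(estimated, "samples per period")
```

Using the slope as a second coordinate matters because a sine returns to the same value twice per cycle, but to the same value and slope only once. The search range is kept below twice the period here, since integer multiples of the period also repeat.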

3 | Colea: A MATLAB software tool for speech analysis
- Loizou
Citation Context: ...ators For this report, four off-the-shelf f0 estimators are evaluated and compared. The first two f0 estimators are part of a speech analysis software package called Colea, developed by Philip Loizou [17] for the MATLAB programming environment. This package contains tools for analyzing speech using f0 estimation, formants, and spectral content. There are two f0 estimators built into this package, one ...

2 | Fundamental frequency estimation using signal embedding in state space
- Terez
- 2002
Citation Context: ...presentations that include multi-dimensional phase space and pseudo-phase space representations, unless otherwise stated. For a more detailed discussion of a theoretical phase space f0 estimator, see [11, 27, 28]. 4.3.1 Phase Space and Frequency: Any periodic signal forms a closed cycle in phase space, and the shape of the cyclic path depends on the harmonic composition of the signal. The f0 of a signal is rel...

2 | Human Psychophysics
- Yost, Popper, et al.
- 1993
Citation Context: ... perception of pitch. Pitch perception also changes with intensity, duration and other physical features of the waveform. There is some controversy as to how the human auditory system perceives pitch [1, 18, 30]. One group of people have traditionally used pure tone pitches to measure phenomena like critical bands, masking, and pitch perception. The other group of people use more complex tones to see how hum...