Pitch detection with Harmonic Product Spectrum: how is it supposed to work?
Oct 13 2011, 23:23
Post
#1
|
|
Group: Members Posts: 13 Joined: 28-December 10 Member No.: 86869 |
Hi,
I tried to implement the Harmonic Product Spectrum as described, for instance, in this Introduction to Signal Processing chapter. The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested. I'm certainly doing something wrong, so I'll describe the process I've followed so far. First, the basis:
Those first steps are verified and OK, so I won't detail the implementation here. So now, concerning HPS: I first create an f0 histogram of length (N/2 + 1) / M, where M is the number of spectra multiplied together, i.e. the original plus M - 1 downsampled copies (here, M=3). Each window processed increments the histogram entry for the fundamental frequency index found in that window. Here is the code run for each window:

CODE
for (i = 0; i < (N/2 + 1) / M; i++) {
    // multiply downsampled (M-1 times) magnitudes of length N/2 + 1
    float mul = 1;
    for (n = 1; n <= M; n++)
        mul *= magnitude[i * n];

    // update maximum magnitude and get its related frequency
    if (mul > max)
        max = mul, freq_id = i;
}
f0[freq_id]++;

And at the end I pick the highest value in f0 in order to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS results (peak in freq_id=0) are to be expected. So the question is: how is that really supposed to work?
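One thing visible in the loop above: at i = 0 every factor is magnitude[0], so the product is simply the DC term raised to the power M, and nothing in the snippet shows max being reset between windows. Both could contribute to freq_id = 0 winning every time. Below is a minimal sketch of the same product with those two points changed. The helper name `hps_peak_bin` and the `min_bin` parameter are assumptions made for illustration, not part of the original post, and the product is accumulated in a double only to reduce the risk of underflow.

```c
#include <stddef.h>

/* Hypothetical helper: returns the bin whose harmonic product is largest.
 * magnitude     : num_bins magnitudes (num_bins = N/2 + 1 in the post)
 * num_harmonics : number of spectra multiplied together (M in the post)
 * min_bin       : first bin to consider; pass 1 or more to skip the DC bin
 * Returns 0 if the search range is empty. */
size_t hps_peak_bin(const float *magnitude, size_t num_bins,
                    size_t num_harmonics, size_t min_bin)
{
    size_t best_bin = 0;
    double best = -1.0;                 /* reset on every call (every window) */

    if (num_harmonics == 0)
        return 0;

    /* Same upper bound as the original loop: i * num_harmonics stays in range. */
    for (size_t i = min_bin; i < num_bins / num_harmonics; i++) {
        double product = 1.0;
        for (size_t n = 1; n <= num_harmonics; n++)
            product *= magnitude[i * n];   /* bin i*n of the original spectrum */

        if (product > best) {
            best = product;
            best_bin = i;
        }
    }
    return best_bin;
}
```

With `min_bin` set to 0 this behaves like the original loop, which makes it easy to compare both variants on the same window.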
|
|
|
Oct 14 2011, 21:27
Post
#2
|
|
|
Group: Members Posts: 2117 Joined: 24-August 07 From: Silicon Valley Member No.: 46454 |
QUOTE And at the end I pick the highest value in f0 in order to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS results (peak in freq_id=0) are to be expected.

Sorry, I don't know what you mean by "fundamental frequency of the whole song". I understand how the fundamental relates to a note or chord, but I don't know about a whole song... I would assume that means the lowest frequency in the song?

That might work for a solo instrument, but if you are analyzing a recording of a rock band, the "fundamental frequency" is probably the kick drum. If you want to analyze the musical notes, you might need to filter out (or ignore) the percussion. You might also need to ignore the attack and analyze the sustained part of the note/chord.

This post has been edited by DVDdoug: Oct 14 2011, 21:28
|
|
|
Oct 15 2011, 09:09
Post
#3
|
|
Group: Members Posts: 13 Joined: 28-December 10 Member No.: 86869 |
QUOTE And at the end I pick the highest value in f0 in order to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS results (peak in freq_id=0) are to be expected.

QUOTE Sorry, I don't know what you mean by "fundamental frequency of the whole song". I understand how the fundamental relates to a note or chord, but I don't know about a whole song... I would assume that means the lowest frequency in the song?

I am looking for the overall pitch of the song, so the histogram is there to count the fundamental frequency of each window and grab the dominant one.

QUOTE That might work for a solo instrument, but if you are analyzing a recording of a rock band, the "fundamental frequency" is probably the kick drum. If you want to analyze the musical notes, you might need to filter out (or ignore) the percussion. You might also need to ignore the attack and analyze the sustained part of the note/chord.

I'm looking for a way to extract the pitch of songs of any kind as well as possible; maybe HPS isn't what I need. Trying to filter out specific sounds might require a lot of heuristics I don't really want to deal with at first… If you have a few samples where HPS applies, I'm interested in them: I could check whether the algorithm is at least implemented correctly and whether it is just my target (a whole song instead of specific musical notes) that is wrong. Note that I'm fairly new to all of this, so I'm certainly mixing up a bunch of things (you have certainly noticed already).
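The histogram step described here (count the winning bin of each window, then take the most frequent one) can be sketched as follows. The function names `histogram_argmax` and `bin_to_hz` are assumptions for illustration. The conversion relies only on the standard DFT relation that bin i of an N-point transform sits at i * sample_rate / N.

```c
#include <stddef.h>

/* Index of the largest count; the lowest index wins ties. */
size_t histogram_argmax(const unsigned *hist, size_t len)
{
    size_t best = 0;
    for (size_t i = 1; i < len; i++)
        if (hist[i] > hist[best])
            best = i;
    return best;
}

/* Centre frequency in Hz of a DFT bin, for an fft_size-point transform. */
double bin_to_hz(size_t bin, double sample_rate, size_t fft_size)
{
    return (double)bin * sample_rate / (double)fft_size;
}
```

For example, with a 1024-point FFT at 44100 Hz the bins are about 43 Hz apart, so at low frequencies neighbouring bins are several semitones apart. That resolution limit is worth keeping in mind when reading the dominant bin as "the pitch".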
|
|
|
Oct 16 2011, 11:35
Post
#4
|
|
|
Group: Members Posts: 107 Joined: 3-April 09 Member No.: 68627 |
QUOTE The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested.

Maybe I'm wrong, but my guess would be that you should apply some sort of equal-loudness curve compensation to the spectrum. Also, the window size probably has to be optimized, maybe even dynamically optimized. Again, I can't tell you exactly how, but the word "autocorrelation" comes to mind.
|
|
|
Oct 16 2011, 13:44
Post
#5
|
|
Group: Members Posts: 13 Joined: 28-December 10 Member No.: 86869 |
QUOTE The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested.

QUOTE Maybe I'm wrong, but my guess would be that you should apply some sort of equal-loudness curve compensation to the spectrum. Also, the window size probably has to be optimized, maybe even dynamically optimized. Again, I can't tell you exactly how, but the word "autocorrelation" comes to mind.

Unfortunately, I can't easily change the window size in the context of my app. However, I started implementing the YIN method, and it seems much more efficient, so I'll stick with that. It is autocorrelation-based, so no spectrum comes into play, but the results sound better.
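For readers unfamiliar with YIN, here is a minimal sketch of its core steps (difference function, cumulative mean normalized difference, absolute threshold). It is not the poster's implementation, which was never shown; it leaves out the parabolic interpolation and best-local-estimate refinements of the published method, and the name `yin_estimate_f0` and its parameters are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Estimate the fundamental of x[0..n-1] in Hz, or return 0.0 if no period
 * is found. Lags 1..max_tau are searched; max_tau must be >= 2 and < n.
 * threshold is the YIN absolute threshold (values around 0.1 are typical). */
double yin_estimate_f0(const float *x, size_t n, double sample_rate,
                       size_t max_tau, double threshold)
{
    if (max_tau < 2 || max_tau >= n)
        return 0.0;

    size_t window = n - max_tau;       /* samples compared at each lag */
    double *d = malloc((max_tau + 1) * sizeof *d);
    if (d == NULL)
        return 0.0;

    /* Step 1: difference function d(tau) = sum (x[j] - x[j + tau])^2 */
    for (size_t tau = 1; tau <= max_tau; tau++) {
        double sum = 0.0;
        for (size_t j = 0; j < window; j++) {
            double diff = (double)x[j] - (double)x[j + tau];
            sum += diff * diff;
        }
        d[tau] = sum;
    }

    /* Step 2: cumulative mean normalized difference, with d'(0) = 1 */
    d[0] = 1.0;
    double running = 0.0;
    for (size_t tau = 1; tau <= max_tau; tau++) {
        running += d[tau];
        d[tau] = (running > 0.0) ? d[tau] * (double)tau / running : 1.0;
    }

    /* Step 3: first lag below the threshold, then walk to the local minimum */
    size_t best = 0;
    for (size_t tau = 2; tau <= max_tau; tau++) {
        if (d[tau] < threshold) {
            while (tau + 1 <= max_tau && d[tau + 1] < d[tau])
                tau++;
            best = tau;
            break;
        }
    }

    free(d);
    return best ? sample_rate / (double)best : 0.0;
}
```

Because the normalization in step 2 keeps small lags from looking like good matches, this approach does not have the bias toward index 0 discussed earlier in the thread. The cost is O(n * max_tau) per window in this naive form.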
|
|
|