Skip to main content

Notice

Please note that most of the software linked on this forum is likely to be safe to use. If you are unsure, feel free to ask in the relevant topics, or send a private message to an administrator or moderator. To help curb the problems of false positives, or in the event that you do find actual malware, you can contribute through the article linked here.
Topic: FFT averaging (windowed sampling) (Read 11889 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

FFT averaging (windowed sampling)

Hello, greetings from a newbie DSP'er   

I am looking for some advice from you more experienced fellows on how to proceed,

I am re-implementing a sound processing algorithm that requires a stream of real audio data to be framed into overlapping windows.

The frame length is 2048 samples and the frame hop size for each iteration is just 64 samples (the overlap factor is huge!).

At each iteration, I'm required to compute the PSD (power spectral density) by means of a fast fourier transform. (The frame is weighted by a hamming window before the FFT computation)

The algorithm description I possess doesn't state exactly how I'm supposed to average or weight the overlapping FFT results together.

I've spent almost two days researching the net, and I have found sources that state that the FFT results should be averaged, while others state that the overlapping results should simply be summed.  (As far as I understood, academically this sort of algorithm is known as Short Time Fourier Transform).

Currently, here's what I'm doing, how does this sound to you?

I'm keeping a history record with the sum of the overlapping FFT results contributing to an iteration result (I visualize 64). For each iteration I add the ouptut bin values of the iteration's current FFT to this array, and subtract the values of the oldest FFT (the one that is leaving the overlap area).

Then, before I compute the PSD I average the real and imaginary parts by dividing them by 64. My intuition tells me that perhaps I should be weighting the results by the overlap percentage (which I can visualise in the time domain), but I'm not sure how to perform this weighting in the frequency domain.

Any input is greatly appreciated!

hippIguana

FFT averaging (windowed sampling)

Reply #1
Quote
The frame length is 2048 samples and the frame hop size for each iteration is just 64 samples (the overlap factor is huge!).
[a href="index.php?act=findpost&pid=254215"][{POST_SNAPBACK}][/a]

Yes, it is.  (Do you have a good reason for such a low 'hop' ?)

Quote
The algorithm description I possess doesn't state exactly how I'm supposed to average or weight the overlapping FFT results together.
[a href="index.php?act=findpost&pid=254215"][{POST_SNAPBACK}][/a]

Depends on what it's supposed to do actually.

Quote
I've spent almost two days researching the net, and I have found sources that state that the FFT results should be averaged, while others state that the overlapping results should simply be summed.  (As far as I understood, academically this sort of algorithm is known as Short Time Fourier Transform).
[a href="index.php?act=findpost&pid=254215"][{POST_SNAPBACK}][/a]

The only difference of these methods is: The first computes the average energy while the second computes a multiple of the total energy ('multiple' due to the low hop). These results will only differ by a certain factor.

Quote
Currently, here's what I'm doing, how does this sound to you?

I'm keeping a history record with the sum of the overlapping FFT results contributing to an iteration result (I visualize 64). For each iteration I add the ouptut bin values of the iteration's current FFT to this array, and subtract the values of the oldest FFT (the one that is leaving the overlap area).
[a href="index.php?act=findpost&pid=254215"][{POST_SNAPBACK}][/a]

Now I'm pretty sure I did not understand what you mean by 'hop'. It does not seem to make sense for me.

Quote
Then, before I compute the PSD I average the real and imaginary parts by dividing them by 64. My intuition tells me that perhaps I should be weighting the results by the overlap percentage (which I can visualise in the time domain), but I'm not sure how to perform this weighting in the frequency domain.
[a href="index.php?act=findpost&pid=254215"][{POST_SNAPBACK}][/a]

Why do you do that ? (Averaging the real and imaginary parts)
Doesn't seem to make sense for me.

Just tell us what this algorithm is supposed to do. A Cool Edit-like spectral view perhaps ?


SebastianG

FFT averaging (windowed sampling)

Reply #2
Hi Sebastian, thanks for your reply!

I'm reimplementing an audio recognition solution.

Quote
Yes, it is.  (Do you have a good reason for such a low 'hop' ?)


The huge overlap is stated on an academic document I found that describes the algorithm, and according to it, the reason for it is to reduce misalignment between the frames stored in the database and the frames being sampled in real-time.  The documentation is definitely geared towards people experienced in DSP and fluent in math (I'm neither), so it is somewhat vague (at least for me)   

Quote
The only difference of these methods is: The first computes the average energy while the second computes a multiple of the total energy ('multiple' due to the low hop). These results will only differ by a certain factor.


Thanks for the information!  Since it is a linear operation, I don't think it is relevant. ( the algorithm works either with pure addition or averaging, exhibiting the behavior I'm describing below)

Quote
Why do you do that ? (Averaging the real and imaginary parts)
Doesn't seem to make sense for me.

I found the following power point presentation Fourier Analysis  (regarding FFT where it names the process of averaging the real and imaginary parts independently as "coherent averaging",
It sounded better to me than "incoherent averaging" 

I think I have a clue regarding what is happening.  Please tell me what you think:

The program is almost working, but it is presenting a strange behavior which I have definitely traced to frame misalignment between the database frames and the frames taken from the realtime audio stream:

Depending on how aligned the frames being compared are, I am getting the following periodic recognition patterns  (1 being a positive recognition of a frame group belonging to a previously stored audio stream)


1 1 1 1 1 1 ....    ideal case
1 0 1 0 1 0 ....      \
1 1 0 1 1 0 ....      most common cases
1 0 0 1 0 0 ....      /
0 0 0 0 0 0 ...      worst case 



The overlapping frames can be misaligned by  a range of 0 to 31 samples, the worst case happens when the misalignment is between 15 or 16 samples.  I think that despite the huge overlap in the windows, I am somehow introducing aliasing into the framing process.  ( I saw a picture describing aliasing by a sampling frequency that rendered a sine wave as a straight line - my intuition tells me I'm doing something like that) 

As an additional info, there is an inner loop which is driving the recognition algorithm.  A match can be declared during any iteration of this loop. The iteration indexes at which positive ID occur are periodic.

Let's say that an audiostream is giving the following recognition pattern
 
1 1 0 1 1 0 1 1 ...

The indexes at which positive ID recognition occurs might look like:

223,    97  ,  NA,    223,    97,    NA    224,    97,    NA


The behavior is exactly the same even with my "coherently incoherent" averaging
being applied to the overlapping FFTs. 

Could you give me any thoughts on this behavior? 

I really appreciate it!

hippi

FFT averaging (windowed sampling)

Reply #3
Quote
I found the following power point presentation Fourier Analysis  (regarding FFT where it names the process of averaging the real and imaginary parts independently as "coherent averaging",
It sounded better to me than "incoherent averaging"  
[a href="index.php?act=findpost&pid=254374"][{POST_SNAPBACK}][/a]


Interesting. A few comments:

She mentioned "Hamming" and "Hanning" windows. There is no "Hanning" window. It's "Hann". Unfortunately many people out there do the same mistake.

On page 12 (incoherent averaging) she averages absolute DFT bins and talks about averaging the power whereas averaging power is equivalent to averaging the squares of the DFT bins ( |X|^2 = (re(X))^2 + (im(X))^2 )

I'm totally non-convinced about the coherent averaging thing. IMHO this is bull pie.

Quote
I think I have a clue regarding what is happening.  Please tell me what you think:

The program is almost working, but it is presenting a strange behavior which I have definitely traced to frame misalignment between the database frames and the frames taken from the realtime audio stream:

(...)

The overlapping frames can be misaligned by  a range of 0 to 31 samples, the worst case happens when the misalignment is between 15 or 16 samples.  I think that despite the huge overlap in the windows, I am somehow introducing aliasing into the framing process.  ( I saw a picture describing aliasing by a sampling frequency that rendered a sine wave as a straight line - my intuition tells me I'm doing something like that)  
[a href="index.php?act=findpost&pid=254374"][{POST_SNAPBACK}][/a]


This is no surprise since a delay is an LTI system which affects the phase of all frequencies (the complex DFT bins get rotated around zero). You should be comparing absolute values instead of real and imaginary parts separately. Also this 'coherent' averaging thing does not seem to make sense. I may be wrong on this.

Quote
As an additional info, there is an inner loop which is driving the recognition algorithm.  A match can be declared during any iteration of this loop. The iteration indexes at which positive ID occur are periodic.

(...)

Could you give me any thoughts on this behavior? 

I really appreciate it!
[a href="index.php?act=findpost&pid=254374"][{POST_SNAPBACK}][/a]


I'm sorry, but I'm not that experienced in the audio recognition area. I guess it also depends strongly on what kind of audio signals you are working on... For speech for example it's NOT important what the exact pitch is. The only things that are importent (I guess) are:
1) Is the current time segment voiced or unvoiced ?
2) spectral envelope

I once successfully coded an experimental pitch-independant vowel classifier. My neighbours must have thought I was going nuts  I recorded my voice in real time (saying "eeeehh" and "ooohh") ran an FFT, averaged the squares of some DFT bins within a certain audio band to compute a small "spectral envelope vector" and compared it to some spectral vectors I previosly recorded.

Well, it did work, but I lost interest in doing further R/D in this sector. This is really tough.
BTW: AFAIK IBM made its speech recognition software open sourse. (?)


SebastianG

edit: typo & shortened quotations

FFT averaging (windowed sampling)

Reply #4
I'm no DSP guy at all, so take what I say with care. My FFT experience was over 10 years back.

I have only 2 suggestions.
If you do not need phase information, then convert to PSD early and work with PSD. That hides all the phase shift issues of moving window completely. (a la waterfall analysis)
So perhaps you should reorder your algoritm to go to PSD before averaging, ie. incoherent averaging. (Background noise fluctuations reduced)

If you have realtime linear signal and window sliding over it, you might want to investigate (weighted) moving-average. After initial startup, your set of moving averages would be describing instantaneous average of last N samples. This avoids sudden changes yet tracks changing input well and you have averaged data always handy.

Neither suggestion might be applicable to you, but you asked for any input 
It really really did sound different. Not in a placebo way.

FFT averaging (windowed sampling)

Reply #5
wimms, Sebastian:

Thanks a lot for your input!

I finally got it working reliably!!     

The problem turned to be so simple!  (it seems no DFT averaging was required at all).

The core of the algorithm is working with energy differences obtained from the PSD.  A different program flow is taken when one of those differences is less than zero.

When the frames were misaligned, there were very small differences that approached zero.  I was very naive as I was comparing exactly with zero.
When I clamped this ultra-small values to zero, everything began working smoothly.

Now I get a 1 -1 -1 -1 recognition pattern consistently.
Well, sometimes, after a lot of trials, I can make  a 1 - 1 - 0 pattern appear (that usually corrects itself after a few iterations). 

I'm going to try the moving average filter as you suggest, wimms, to see if I can make this tiny glitch completely go away.

Thanks again for your input!

hippi

FFT averaging (windowed sampling)

Reply #6
Quote
Quote
I found the following power point presentation Fourier Analysis  (regarding FFT where it names the process of averaging the real and imaginary parts independently as "coherent averaging",
It sounded better to me than "incoherent averaging"  
[a href="index.php?act=findpost&pid=254374"][{POST_SNAPBACK}][/a]


Interesting. A few comments:

She mentioned "Hamming" and "Hanning" windows. There is no "Hanning" window. It's "Hann". Unfortunately many people out there do the same mistake.
[a href="index.php?act=findpost&pid=254523"][{POST_SNAPBACK}][/a]


Just one comment on your comments: Hanning windows do exist. They are basically Hann windows without zero-weighted samples at the beginning and end of the window.

FFT averaging (windowed sampling)

Reply #7
Quote
Just one comment on your comments: Hanning windows do exist. They are basically Hann windows without zero-weighted samples at the beginning and end of the window.
[a href="index.php?act=findpost&pid=267411"][{POST_SNAPBACK}][/a]


Duh! You mean Hamming windows - with double m in the middle.
There are those 2 guys: Hamming and Julius von Hann.
A Hamming window is given by 0.54+0.46*cos(x*pi)
A Hann window is given by 0.5+0.5*cos(x*pi)

Unfortunately the Hann window is often called Hanning window although the guy's name who came up with it is Julius von Hann.


SebastianG

FFT averaging (windowed sampling)

Reply #8
Quote
Quote
Just one comment on your comments: Hanning windows do exist. They are basically Hann windows without zero-weighted samples at the beginning and end of the window.
[a href="index.php?act=findpost&pid=267411"][{POST_SNAPBACK}][/a]


Duh! You mean Hamming windows - with double m in the middle.
[a href="index.php?act=findpost&pid=267439"][{POST_SNAPBACK}][/a]


No I don't 

There's a small difference between Hann and Hanning windows. For discrete time n and total number of smaples N, Hann follows the equation:

Hann(n) = 0.5 - 0.5*cos(2*pi*(n-1)/(N-1))

and Hanning:

Hanning(n) = 0.5 - 0.5*cos(2*pi*n/(N+1))

With n ranging from 1 to N. The difference is subtle, but it's there. The Hamming equation is:

Hamming(n) = 0.54 - 0.46*cos(2*pi*(n-1)/(N-1))

* EDIT * As you can see from the equations, the only difference between Hann and Hanning, is the fact that Hann windows are forced to zero at the outher edges.

FFT averaging (windowed sampling)

Reply #9
Wow. Obviously there are quite some contradicotry definitions / opinions. I personally find it ridiculous to make a distinction for such a small difference in the discrete case for the same continuous function.

BTW: I often use
Code: [Select]
window = 0.5-0.5*cos(((1:n)-0.5)./n*2*pi);

(Matlab code)

According to your definitions this would neither be a Hann nor a Hanning window.



SebastianG

FFT averaging (windowed sampling)

Reply #10
Quote
Wow. Obviously there are quite some contradicotry definitions / opinions. I personally find it ridiculous to make a distinction for such a small difference in the discrete case for the same continuous function.

BTW: I often use
Code: [Select]
window = 0.5-0.5*cos(((1:n)-0.5)./n*2*pi);

(Matlab code)

According to your definitions this would neither be a Hann nor a Hanning window.



SebastianG
[a href="index.php?act=findpost&pid=267498"][{POST_SNAPBACK}][/a]


I agree with you that it doesn't really matter, because the difference is so small (certainly for windows with a large number of samples). However, the equations I wrote down are copied out of the "official" Matlab v6 scripts hann.m, hanning.m and hamming.m 

FFT averaging (windowed sampling)

Reply #11
Quote
However, the equations I wrote down are copied out of the "official" Matlab v6 scripts hann.m, hanning.m and hamming.m 
[a href="index.php?act=findpost&pid=267500"][{POST_SNAPBACK}][/a]


Yeah, the existing of MatLab code for Hann and Hanning is finally convincing me.
Thanks for this enlightment. I was so sure about "Hanning" only being a naming mistake . I'm sorry for my rather inpolite first reaction.


SebastianG

FFT averaging (windowed sampling)

Reply #12
Quote
Quote
However, the equations I wrote down are copied out of the "official" Matlab v6 scripts hann.m, hanning.m and hamming.m 
[a href="index.php?act=findpost&pid=267500"][{POST_SNAPBACK}][/a]


Yeah, the existing of MatLab code for Hann and Hanning is finally convincing me.
Thanks for this enlightment. I was so sure about "Hanning" only being a naming mistake . I'm sorry for my rather inpolite first reaction.


SebastianG
[a href="index.php?act=findpost&pid=267694"][{POST_SNAPBACK}][/a]


Nevermind 

It's just that DSP is kind of what I do at the moment and I probably on avarage use thousands of hanning windows each day, so I was rather surprised when I read your statement that they don't exist 

Those Matlab functions come in very handy if you need a window quick, by the way.