Full Version: mfcc calculation
eDSP
Hi everyone. I'm new here, so apologies if this is the wrong forum to post in. I'm working on a project that groups similar MP3 files. As a start, I'd like to compute the MFCCs of each MP3 file. I'm using FMOD (although I could use BASS if anyone has examples for that) and C#. I need some help calculating them, and hopefully someone here can guide me. Here's what I got from Wikipedia, with my pseudocode for each step:
1) Take the Fourier transform of (a windowed excerpt of) a signal.
Code:
I use getSpectrum with the FMOD_DSP_FFT_WINDOW_TRIANGLE parameter

2) Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.

I go through the spectrum array and use this equation on each value:
Code:
mel = 1127.01048 * ln(1 + f / 700)


3) Take the logs of the powers at each of the mel frequencies.
I then go through the new mel array and take the log of each value:
Code:
mLog[i] = Math.Log(melArray[i]);


4) Take the discrete cosine transform of the list of mel log powers, as if it were a signal. (Then find the amplitudes of the DCT result.)
I'm not sure what to do here. How do I calculate the DCT?

I hope I'm on the right path?
Thanks for any help.
Martel

I guess you want this; simply imagine that n is i and x_n is mLog[i].
I know that cepstrum works well to discriminate human voice but I'm not really convinced that it will work well for complex material such as music.
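
To make the DCT step concrete, here is a minimal sketch in plain Python. It assumes the unnormalised DCT-II, which is the variant usually meant for MFCCs; the names dct_ii and m_log are made up for illustration, with m_log standing in for mLog from the first post.

```python
import math

def dct_ii(x):
    """Unnormalised DCT-II: X[k] = sum over n of x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(n_len)
    ]

# m_log is a placeholder for the list of log mel-band energies of one frame.
m_log = [1.0, 2.0, 3.0, 4.0]
coefficients = dct_ii(m_log)
```

The output has one coefficient per input value. People commonly keep only the first dozen or so as the MFCC vector for the frame, but how many to keep is a tuning choice, not something fixed by the transform.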

IIRC, you apply the overlapping triangular windows in the frequency domain, not in the time domain. For each frame you typically have 512/1024/2048... spectrum samples, which is too many, so you split the spectrum into bands that each cover multiple spectrum samples. For each band, you calculate a single energy value as a triangular-weighted average of the samples in that band. This way you throw out a lot of information but still keep some "fingerprint" of the frame's content. I would also use a Hann (Hanning) or Hamming window in the time domain instead of a triangular one.
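
A rough sketch of that band-splitting step in plain Python, as one possible reading of the description above. The mel formula is applied to the frequency of each band edge, not to the spectrum values themselves. The function names, the default of 20 bands, and the choice to spread the band edges evenly in mel from 0 Hz to the Nyquist frequency are all assumptions for illustration.

```python
import math

def hz_to_mel(f):
    # Same formula as in the first post: f is a frequency in Hz.
    return 1127.01048 * math.log(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (math.exp(m / 1127.01048) - 1.0)

def mel_band_energies(power, sample_rate, n_bands=20):
    """power: power values for bins 0..N/2, covering 0 Hz to sample_rate / 2."""
    n_bins = len(power)
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_bands + 2 edge frequencies, evenly spaced in mel, converted back to Hz.
    edges = [mel_to_hz(max_mel * i / (n_bands + 1)) for i in range(n_bands + 2)]
    hz_per_bin = nyquist / (n_bins - 1)
    energies = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        total = 0.0
        weight_sum = 0.0
        for k in range(n_bins):
            f = k * hz_per_bin
            if lo < f < hi:
                # Triangle rises from lo to mid, then falls from mid to hi.
                w = (f - lo) / (mid - lo) if f <= mid else (hi - f) / (hi - mid)
                total += w * power[k]
                weight_sum += w
        # A band with no bins inside it yields 0.0; guard before taking logs.
        energies.append(total / weight_sum if weight_sum > 0 else 0.0)
    return energies
```

Each band overlaps its neighbours by half, because the centre of one triangle is the edge of the next. With short FFTs and many bands, the lowest bands can end up containing no bins at all, so it is worth checking for zeros before the log step.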
I think you also need to calculate the energy (power) spectrum after the FFT, which means taking the real and imaginary parts of each frequency sample and summing their squares (you can also do this while calculating the triangular-weighted averages).
I did this quite a long time ago, so I can't guarantee that I'm not talking nonsense.
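
As a small illustration of the windowing and power-spectrum steps, here is a sketch in plain Python. The slow textbook DFT is only there to keep the example self-contained; a real project would use an FFT routine from its audio or maths library. All function names are hypothetical.

```python
import math

def hann_window(n):
    # Hann window: 0 at both ends, 1 in the middle.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def naive_dft(frame):
    # O(N^2) DFT of a real frame, returning bins 0..N/2 as (real, imag) lists.
    n = len(frame)
    re, im = [], []
    for k in range(n // 2 + 1):
        re.append(sum(frame[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n)))
        im.append(-sum(frame[t] * math.sin(2.0 * math.pi * k * t / n) for t in range(n)))
    return re, im

def power_spectrum(re, im):
    # Power of each bin: real part squared plus imaginary part squared.
    return [r * r + i * i for r, i in zip(re, im)]

# Example: window one frame of samples, transform it, then take the power.
frame = [math.sin(2.0 * math.pi * 3 * t / 32) for t in range(32)]
window = hann_window(len(frame))
windowed = [s * w for s, w in zip(frame, window)]
power = power_spectrum(*naive_dft(windowed))
```

If the library hands back magnitudes per bin instead of complex values, squaring each magnitude gives the same power values.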