objective evaluation of audio encoders

I'm currently working on a project on audio encoder quality evaluation. To be more specific, I'm using PEAQ/EAQUAL with MP3 codecs. The initial goal is to map the NMR values (on a frame-by-frame basis) obtained from PEAQ and LAME onto each other. The problem is that the two codebases use different frame sizes and different numbers of critical bands. How do I overcome this problem? Also, can anyone tell me the main differences between the psychoacoustic modelling done in PEAQ and in LAME? Please help.

Reply #1
PEAQ/EAQUAL aren't codecs.

I think it's impossible to answer this sufficiently without you being more specific about what you actually want to do.

The difference in the number of critical bands, for example, is probably because the scalefactor bands in MP3 don't map directly onto the Bark bands used in psychoacoustics.
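
To illustrate the mismatch, here is a minimal Python sketch that expresses MP3 long-block scalefactor band widths on the Bark scale. It assumes the Schroeder approximation z = 7·asinh(f/650), which is the frequency-to-Bark mapping I understand BS.1387 to use, and a 44.1 kHz scalefactor band table quoted from memory, so check both against the standard and the LAME source before relying on them. The function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Schroeder approximation of the Bark scale: z = 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)

# Long-block scalefactor band boundaries (MDCT line indices) for 44.1 kHz.
# ASSUMPTION: quoted from memory; verify against the tables in the LAME source.
SFB_LONG_44100 = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90,
                  110, 134, 162, 196, 238, 288, 342, 418, 576]

def sfb_widths_in_bark(sfb_edges, sample_rate=44100, n_lines=576):
    """Width of each scalefactor band in Bark, given MDCT line boundaries."""
    hz_per_line = (sample_rate / 2.0) / n_lines
    edges_bark = [hz_to_bark(i * hz_per_line) for i in sfb_edges]
    return [hi - lo for lo, hi in zip(edges_bark, edges_bark[1:])]

widths = sfb_widths_in_bark(SFB_LONG_44100)
for band, w in enumerate(widths):
    print(f"sfb {band:2d}: {w:.2f} Bark wide")
```

Under these assumptions every scalefactor band comes out considerably wider than the roughly quarter-Bark bands of the basic PEAQ model, so the two sets of bands cannot be compared one-to-one without regrouping.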

There's no way to know what to do with that without more information.

Reply #2
Sorry, I'll be more specific, and yes, I am aware that EAQUAL is not a codec. Instead of describing the goal of my work, I think I'll just explain where exactly I'm stuck.

I take a WAV file, encode it with LAME, decode it (with LAME) and compare the output with the original file using PEAQ/EAQUAL. PEAQ/EAQUAL gives me ODG, NMR, etc., which is NOT what I'm currently interested in. I tap the noise and mask values (for each band over all frames) from the LAME encoder and also from EAQUAL. Now, I want to compare the noise and mask values on a frame-by-frame basis. Here, I hit a roadblock:
1. LAME uses a frame size of 1152 samples, whereas EAQUAL uses 1024.
2. The psychoacoustic models used are different.
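
For reference, the quantity being compared can be sketched as follows. This is a minimal Python illustration of the usual noise-to-mask ratio definition (noise energy divided by masking threshold, expressed in dB), not the code from LAME or EAQUAL. The function names and the sample values are hypothetical, and the inputs are assumed to be positive linear energies per band for one frame.

```python
import math

def band_nmr_db(noise, mask):
    """Per-band NMR in dB for one frame: 10 * log10(noise / mask)."""
    return [10.0 * math.log10(n / m) for n, m in zip(noise, mask)]

def frame_nmr_db(noise, mask):
    """One NMR figure per frame: average the linear ratios, then convert to dB."""
    ratios = [n / m for n, m in zip(noise, mask)]
    return 10.0 * math.log10(sum(ratios) / len(ratios))

# Hypothetical noise and mask energies for a three-band frame.
noise = [1e-4, 2e-4, 5e-5]
mask = [1e-3, 2e-4, 5e-4]
per_band = band_nmr_db(noise, mask)
per_frame = frame_nmr_db(noise, mask)
print(per_band, per_frame)
```

A negative value means the noise sits below the masking threshold in that band. Averaging the linear ratios before taking the logarithm keeps a single badly masked band from being hidden by many well masked ones.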

To tackle (1), my idea was to use pure tones/multitones so that the spectrum would be independent of which frame I consider. However, even for multitones, I observed that the mask plots obtained from the two sources were different. Only later did I notice that the psychoacoustic models used are different for the two. I have also tried the ISO MP3 codec instead of LAME, only to get similar results. Moreover, for a tone, where the mask values are expected to be constant over all frames, LAME gives different mask values for different frames.

Hence, can anyone explain the above observations and also give me some information on what kind of psychoacoustic model is used in LAME and in EAQUAL, especially the masking threshold calculation part?

Reply #3
Quote
Hence, can anyone explain the above observations and also give me some information on what kind of psychoacoustic model is used in LAME and in EAQUAL, especially the masking threshold calculation part?


Well, this brings up some interesting issues, I think.  I'd personally think that one would need a lot more time resolution than either "every 1152 samples" or a 1/2 overlap-added version, in order to capture things like coarticulation unmasking and pre-echo, which are some of the most lame (not the codec) artifacts out there.

Personally, I haven't been very satisfied with automatic evaluation algorithms. I am aware that they do match the output of some of the various tests pretty well, but I fear greatly the outliers. I also fear greatly what happens when we introduce binaural hearing.
-----
J. D. (jj) Johnston

Reply #4
Quote
Hence, can anyone explain the above observations and also give me some information on what kind of psychoacoustic model is used in LAME and in EAQUAL, especially the masking threshold calculation part?


Well, this brings up some interesting issues, I think.  I'd personally think that one would need a lot more time resolution than either "every 1152 samples" or a 1/2 overlap-added version, in order to capture things like coarticulation unmasking and pre-echo, which are some of the most lame (not the codec) artifacts out there.

Personally, I haven't been very satisfied with automatic evaluation algorithms. I am aware that they do match the output of some of the various tests pretty well, but I fear greatly the outliers. I also fear greatly what happens when we introduce binaural hearing.


Where did you get the source code for PEAQ?

Reply #5
EAQUAL is one of two publicly available implementations of the basic model of PEAQ. The other one, from McGill, is much faster.
PEAQ is described in ITU-R BS.1387, which can be downloaded from the ITU site. There is also "An Examination and Interpretation of ITU-R BS.1387" from McGill.
You can analyze the LAME psychoacoustic model directly in the LAME source code.

I agree with Woodinville that there are a lot of problems with many outliers, but automatic evaluation algorithms can perform very well in predicting average codec score from a subjective listening test.

Reply #6
... Here, I hit a roadblock :
1. lame uses a framesize of 1152 samples whereas eaqual uses 1024.

I may be waaaaay off base here (after all, I've never tried to do what you are attempting!), but couldn't you treat every eight LAME frames as a sequence of nine EAQUAL frames, to get an equivalent number of samples for analysis?

    - M.
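
The arithmetic behind the eight-to-nine suggestion can be checked with a short Python sketch. The frame sizes are the ones quoted in the thread; the function name is hypothetical, and any encoder/decoder delay offset between the two signals is ignored here.

```python
import math

LAME_FRAME = 1152    # samples per LAME frame, as stated in the thread
EAQUAL_FRAME = 1024  # samples per EAQUAL frame, as stated in the thread

def common_block(a, b):
    """Smallest sample count that is a whole number of both frame sizes."""
    block = a * b // math.gcd(a, b)  # least common multiple
    return block, block // a, block // b

block, n_lame, n_eaqual = common_block(LAME_FRAME, EAQUAL_FRAME)
print(f"{n_lame} LAME frames = {n_eaqual} EAQUAL frames = {block} samples")
```

So the frame boundaries of the two tools coincide every 9216 samples, which is exactly 8 LAME frames and 9 EAQUAL frames.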

Reply #7
I may be waaaaay off base here (after all, I've never tried to do what you are attempting!), but couldn't you treat every eight LAME frames as a sequence of nine EAQUAL frames, to get an equivalent number of samples for analysis?


Basically this is a good idea, but you also need to figure out how to map those 8 values from LAME onto 9 PEAQ values.
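
One possible way to do that mapping, sketched in Python: weight each source frame's value by the number of samples it shares with the destination frame. This is only an illustration under stated assumptions, not something taken from LAME or EAQUAL. It assumes the values are linear energies (not dB) for a single band, that frames are contiguous and non-overlapping, and that both frame grids start at the same sample. The function name is hypothetical.

```python
def resample_by_overlap(values, src_hop, dst_hop):
    """Re-grid per-frame values from src_hop-sample frames to dst_hop-sample
    frames, weighting each source value by its sample overlap."""
    total = len(values) * src_hop
    if total % dst_hop != 0:
        raise ValueError("input does not cover a whole number of output frames")
    out = []
    for j in range(total // dst_hop):
        d0, d1 = j * dst_hop, (j + 1) * dst_hop
        acc = 0.0
        # Visit only the source frames that intersect [d0, d1).
        for i in range(d0 // src_hop, (d1 - 1) // src_hop + 1):
            s0, s1 = i * src_hop, (i + 1) * src_hop
            overlap = min(d1, s1) - max(d0, s0)
            acc += values[i] * overlap
        out.append(acc / dst_hop)
    return out

# Hypothetical per-frame values for one band over 8 LAME frames.
lame_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
peaq_values = resample_by_overlap(lame_values, 1152, 1024)
print(len(peaq_values), peaq_values)
```

A constant input stays constant and the sample-weighted total is preserved, which is the behaviour you would want for a steady tone. It does nothing about the different window shapes or band layouts of the two models, so those would still have to be reconciled separately.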