Full Version: objective evaluation of audio encoders
Hydrogenaudio Forums > Hydrogenaudio Forum > Listening Tests
dsp_cuppa
I'm currently working on a project on audio encoder quality evaluation. To be more specific, I'm using PEAQ/EAQUAL with MP3 codecs. The initial goal is to map the NMR values (on a frame-by-frame basis) obtained from PEAQ and LAME. The problem is that both the codes use different frame sizes and different numbers of critical bands. How do I overcome this problem? Also, can anyone tell me the main differences between the psychoacoustic modelling done in PEAQ and in LAME? Please help.
Garf
peaq/eaqual aren't codecs

I think it's impossible to answer this sufficiently without you being more specific about what you actually want to do.

The difference in critical bands for example is probably because the scalefactor bands in MP3 don't map directly to bark bands in psychoacoustics.

There's no way to know what to do with that without more information.
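
To illustrate the band mismatch, here is a minimal sketch of one way to bring values from two different band partitions onto a common one. Everything in it is an assumption for illustration: the function names are made up, `hz_to_bark` uses the Schroeder approximation (which I believe is the one BS.1387 uses, but check the spec), and the regrouping assumes each value is a linear energy spread uniformly across its source band. NMR itself is a ratio, so noise and mask would need to be regrouped separately and divided afterwards.

```python
import math

def hz_to_bark(f_hz):
    """Schroeder approximation of the Bark scale: z = 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)

def regroup_energy(src_edges_hz, src_energy, dst_edges_hz):
    """Redistribute per-band energies from one band partition to another.

    Assumes linear (not dB) energies, spread uniformly in Hz inside
    each source band. Band i spans src_edges_hz[i]..src_edges_hz[i + 1].
    """
    dst = [0.0] * (len(dst_edges_hz) - 1)
    for i, energy in enumerate(src_energy):
        lo, hi = src_edges_hz[i], src_edges_hz[i + 1]
        for j in range(len(dst)):
            # Width in Hz shared by source band i and destination band j.
            overlap = min(hi, dst_edges_hz[j + 1]) - max(lo, dst_edges_hz[j])
            if overlap > 0:
                dst[j] += energy * overlap / (hi - lo)
    return dst

# Hypothetical band edges, not taken from LAME or EAQUAL.
print(regroup_energy([0, 100, 200, 400], [1.0, 2.0, 4.0], [0, 150, 400]))
```

The real scalefactor band edges and the PEAQ band edges would have to come from the LAME source and from BS.1387 respectively.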
dsp_cuppa
Sorry... I'll be more specific, and yes, I am aware that EAQUAL is not a codec :) . Instead of describing the goal of my work, I think I'll just explain where exactly I'm stuck.

I take a WAV file, encode it with lame, decode it (with lame) and compare the output with the original file using PEAQ/EAQUAL. PEAQ/EAQUAL gives me ODG, NMR, etc., which is NOT what I'm currently interested in. I tap the noise and mask values (for each band over all frames) from the lame encoder and also from EAQUAL. Now, I want to compare the noise and mask values on a frame-by-frame basis. Here, I hit a roadblock :
1. lame uses a framesize of 1152 samples whereas eaqual uses 1024.
2. the psychoacoustic models used are different.

To tackle (1), my idea was to use pure tones/multitones so that the spectrum would become independent of the frame I consider. However, even for multitones, I observed that the mask plots obtained from the two sources were different. Only later did I notice that the psychoacoustic models used are different for the two. I have also tried the ISO MP3 codec instead of lame, only to get similar results. Moreover, for a tone, where mask values are expected to be constant over all frames, lame gives different mask values for different frames.

Hence, can anyone explain the above observations and also provide me with some info as to what kind of psycho model is used in lame and in eaqual.....especially the masking threshold calculation part.

Woodinville
QUOTE(dsp_cuppa @ Jun 15 2005, 02:45 AM)
Hence, can anyone explain the above observations and also provide me with some info as to what kind of psycho model is used in lame and in eaqual.....especially the masking threshold calculation part.


Well, this brings up some interesting issues, I think. I'd personally think that one would need a lot more time resolution than either "every 1152 samples" or a 1/2 overlap-added version, in order to capture things like coarticulation unmasking and pre-echo, which are some of the most lame (not the codec) artifacts out there.

Personally, I haven't been very satisfied with automatic evaluation algorithms. I am aware that they do match the output of some of the various tests pretty well, but I fear greatly the outliers. I also fear greatly what happens when we introduce binaural hearing.
Futum
QUOTE(Woodinville @ Jun 17 2005, 16:56) *

QUOTE(dsp_cuppa @ Jun 15 2005, 02:45 AM)
Hence, can anyone explain the above observations and also provide me with some info as to what kind of psycho model is used in lame and in eaqual.....especially the masking threshold calculation part.


Well, this brings up some interesting issues, I think. I'd personally think that one would need a lot more time resolution than either "every 1152 samples" or a 1/2 overlap-added version, in order to capture things like coarticulation unmasking and pre-echo, which are some of the most lame (not the codec) artifacts out there.

Personally, I haven't been very satisfied with automatic evaluation algorithms. I am aware that they do match the output of some of the various tests pretty well, but I fear greatly the outliers. I also fear greatly what happens when we introduce binaural hearing.


Where did you get the source code for PEAQ?

muaddib
EAQUAL is one of two publicly available implementations of the basic model of PEAQ. The other one, from McGill, is much faster.
PEAQ is described in ITU-R BS.1387, which can be downloaded from the ITU site. There is also "An Examination and Interpretation of ITU-R BS.1387" from McGill.
You can analyze the LAME psychoacoustic model directly in the LAME source code.

I agree with Woodinville that there are a lot of problems with many outliers, but automatic evaluation algorithms can perform very well in predicting average codec score from a subjective listening test.
M
QUOTE(dsp_cuppa @ Jun 15 2005, 05:45) *

... Here, I hit a roadblock :
1. lame uses a framesize of 1152 samples whereas eaqual uses 1024.

I may be waaaaay off base here (after all, I've never tried to do what you are attempting!), but couldn't you treat every eight LAME frames as a sequence of nine EAQUAL frames, to get an equivalent number of samples for analysis?

- M.
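
The arithmetic behind the eight-to-nine idea can be checked with a short sketch. The frame sizes are the ones given in the thread; the function name is made up, and the sketch ignores encoder/decoder delay and any window overlap inside either tool, so it only shows where the frame boundaries coincide.

```python
from math import gcd

LAME_FRAME = 1152    # samples per frame, as stated in the thread
EAQUAL_FRAME = 1024  # samples per frame, as stated in the thread

# Smallest block of samples that holds a whole number of both frame types.
block = LAME_FRAME * EAQUAL_FRAME // gcd(LAME_FRAME, EAQUAL_FRAME)
lame_per_block = block // LAME_FRAME
eaqual_per_block = block // EAQUAL_FRAME

def frame_spans(frame_len, count):
    """Return (start, end) sample indices, end exclusive, for each frame."""
    return [(i * frame_len, (i + 1) * frame_len) for i in range(count)]

print(block, lame_per_block, eaqual_per_block)
print(frame_spans(LAME_FRAME, lame_per_block)[-1])
print(frame_spans(EAQUAL_FRAME, eaqual_per_block)[-1])
```

Both sequences end on the same sample, so frame boundaries line up once per block of 9216 samples.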
muaddib
QUOTE(M @ Apr 20 2007, 13:13) *

I may be waaaaay off base here (after all, I've never tried to do what you are attempting!), but couldn't you treat every eight LAME frames as a sequence of nine EAQUAL frames, to get an equivalent number of samples for analysis?


Basically this is a good idea, but you also need to figure out how to map those 8 values from LAME onto 9 PEAQ values.
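
One possible mapping, sketched here as an assumption rather than an established method, is to give each destination frame the time-weighted average of the source frames it overlaps. The function name is hypothetical. The values should be in a linear domain (energies, not dB), and the sketch treats every frame as a plain non-overlapping block of samples, which neither tool strictly does.

```python
def resample_frames(values, src_len, dst_len):
    """Map one value per src_len-sample frame onto dst_len-sample frames.

    Each output is the average of the source values, weighted by how many
    samples of each source frame fall inside the destination frame.
    """
    total = len(values) * src_len
    out = []
    for j in range(total // dst_len):
        d0, d1 = j * dst_len, (j + 1) * dst_len
        acc = 0.0
        # Only source frames that intersect [d0, d1) contribute.
        for i in range(d0 // src_len, (d1 - 1) // src_len + 1):
            s0, s1 = i * src_len, (i + 1) * src_len
            acc += values[i] * (min(d1, s1) - max(d0, s0))
        out.append(acc / dst_len)
    return out

# Eight per-frame values at 1152 samples become nine at 1024 samples.
mapped = resample_frames([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], 1152, 1024)
print(len(mapped), mapped[1])
```

A constant input stays constant under this mapping, which makes it a convenient sanity check with a steady test tone.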