mp3, m4a – how close to the original sound they are.

2007-08-20 22:23:25

This all started a few months ago when I saw a statement on a tracker that went something like this:
“We share (as an mp3 file) the best quality Armin van Buuren’s ASOT show even better than the DI.fm’s.”
It was intriguing, so I downloaded the file, checked the spectrum and saw … the bandwidth (BW) was around 16 kHz. So their source seemed to have been an FM radio broadcast, and the recording had been encoded with Lame’s V0 option. I thought it was impossible for an mp3 with a BW of 19 kHz, as mp3 @192 kbps has, to have worse quality than any other mp3 originating from the same source with a BW of less than 19 kHz. The guys did not agree, so I decided to use a simulation model to check this.

First of all I decided to measure the distance between two sounds using the mean squared error:

One of the sounds, the reference, is the original, while the second is derived from it through filtering and encoding. With this metric I can say how close the two sounds are and conclude that the one closer to the original is better. So I needed .wav files in order to work with the sound samples.
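A minimal sketch of this distance, assuming 16-bit PCM .wav files that are already time-aligned and share the same channel layout. The function names `read_pcm16` and `rmse` are made up for illustration and are not taken from the original measurements.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Read all samples of a 16-bit PCM .wav file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # .wav data is little-endian
    return samples


def rmse(reference, test):
    """sqrt(mse) between two sample sequences, in sample units.

    Assumption: the signals are already aligned; only the common
    length is compared if one of them is longer.
    """
    n = min(len(reference), len(test))
    if n == 0:
        raise ValueError("nothing to compare")
    total = sum((reference[i] - test[i]) ** 2 for i in range(n))
    return math.sqrt(total / n)
```

Called as `rmse(read_pcm16("22.wav"), read_pcm16("19.wav"))`, this gives one number per pair of files, expressed in 16-bit sample units.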
As a test (reference) sound I used a CD rip with a full BW of 22.05 kHz. I passed this sound through two filters with cutoff frequencies of 19 kHz and 16 kHz. I encoded/decoded these two sounds and measured the differences between the results and the reference sound. Initially I intended to compare all the results to the test signal itself, but I couldn’t obtain clearly distinguishable measurements. That is why I used a third filter with a cutoff frequency of 22.05 kHz, only so that the reference signal also carries the filter’s phase distortions.
The filters are FIR filters designed by the Kaiser window method with parameters: beta = 0.001 (attenuation of 60 dB in the stop band) and a transition band of 250 Hz. I used a type I FIR filter of order 638 (639 coefficients).
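A rough sketch of such a Kaiser-window low-pass design, using only the Python standard library. Two things are assumptions here rather than facts from the post: the Kaiser shape parameter is derived from the 60 dB figure with the usual textbook formula (the value 0.001 reads like the ripple that corresponds to 60 dB), and the cutoff is taken as the middle of the transition band. All function names are made up for illustration.

```python
import cmath
import math


def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_beta(atten_db):
    """Textbook Kaiser formula: stop-band attenuation (dB) -> shape parameter."""
    if atten_db > 50:
        return 0.1102 * (atten_db - 8.7)
    if atten_db >= 21:
        return 0.5842 * (atten_db - 21) ** 0.4 + 0.07886 * (atten_db - 21)
    return 0.0


def kaiser_lowpass(num_taps, cutoff_hz, fs_hz, atten_db):
    """Symmetric (linear-phase) low-pass FIR: ideal sinc times Kaiser window."""
    beta = kaiser_beta(atten_db)
    mid = (num_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        r = t / mid
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(beta)
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [h / dc_gain for h in taps]  # normalise to unity gain at 0 Hz


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at one frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


# 639 coefficients (order 638) at the CD sampling rate, 19 kHz cutoff.
taps_19k = kaiser_lowpass(639, 19000.0, 44100.0, 60.0)
```

With an odd number of symmetric coefficients this is a type I filter, so its only phase effect is a constant delay of 319 samples. With the cutoff set to 22.05 kHz the same routine produces something very close to a pure delay.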
Here is the model.

The distances (as sqrt(mse)) for the two band-limited sounds are: 110 for 19.wav and 246 for 16.wav.
To check the model, here are the spectra for the model sounds, where the spectra 22-19.wav, 22-16.wav, and 19-16.wav are for the corresponding difference signals.

I encoded the test signals using Lame 3.98 beta 5 with the options:
-q 0 -m j for all cases (the best possible results per bit),
-b with 192, 224, 256, 320 for constant bit rate,
-Vx --lowpass YY.Y where x = [0, 1, 2] and YY.Y = [19.5, 16.5], for variable bit rate.
(I used the --lowpass option to stop the encoder from choosing its own low-pass filter and thereby changing the spectrum bandwidth according to the Vx parameter.)
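The option grid above could be scripted roughly as follows. This only builds the command lines and does not run them. The output file names and the pairing of the 19.5 kHz low-pass with 19.wav and 16.5 kHz with 16.wav are assumptions, and the helper names are made up for illustration.

```python
def lame_cbr_command(src, dst, bitrate_kbps):
    """Constant bit rate: -b <kbps>, with -q 0 -m j as in all cases."""
    return ["lame", "-q", "0", "-m", "j", "-b", str(bitrate_kbps), src, dst]


def lame_vbr_command(src, dst, vbr_level, lowpass_khz):
    """Variable bit rate: -V <level> with an explicit --lowpass in kHz."""
    return ["lame", "-q", "0", "-m", "j",
            "-V", str(vbr_level), "--lowpass", str(lowpass_khz), src, dst]


def lame_decode_command(src, dst):
    """Decode an mp3 back to .wav so it can be compared with the reference."""
    return ["lame", "--decode", src, dst]


def build_test_grid():
    """All encode commands for the two band-limited test files."""
    # Assumed pairing: (source file, low-pass in kHz used for VBR).
    sources = [("19.wav", 19.5), ("16.wav", 16.5)]
    commands = []
    for src, lowpass_khz in sources:
        stem = src[:-len(".wav")]
        for kbps in (192, 224, 256, 320):
            commands.append(lame_cbr_command(src, f"{stem}_cbr{kbps}.mp3", kbps))
        for level in (0, 1, 2):
            commands.append(
                lame_vbr_command(src, f"{stem}_v{level}.mp3", level, lowpass_khz))
    return commands
```

Each list can be passed to `subprocess.run` as is, followed by a `lame_decode_command` call for the resulting mp3 before measuring the distance.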
Here are the results (given as sqrt(mse)) for the Lame 3.98 beta 5 mp3 encoder.

What is easy to see:
• At CBR the encoder distinguishes the BW difference only at 320 kbps.
• At VBR the encoder distinguishes the BW difference at bit rates higher than, let's say, 210 kbps, although the bit rate is not a parameter you can control directly.
(The lines at 110 and 246 are the differences between the reference 22 kHz signal and the 19 kHz and 16 kHz tests.)

As to the question “Is it possible for a 16 kHz BW sound to be closer to the original than an 18.9 kHz one (mp3 @192 kbps)?”: yes, in this model it is possible. To check this I encoded/decoded the TEST.wav and the 22.wav signals, and their closeness to the original is 592 and 561. So the parameters V0 and V1 can generate a file from the 16 kHz test file which is closer to the original.
At this point I have to say: I was wrong when I said this was impossible. Sorry for that. The guys, maybe, were right.
But what about AAC @192 kbps? The 22.wav encoded @192 CBR has a distance of 432, which Lame reaches @232 kbps VBR. The 22.wav encoded @192 VBR has a distance of 402, which Lame reaches @320 kbps. So the signal bandwidth does essentially matter.

When I was done I decided to see what an AAC encoder (neroaacenc) is capable of.
Here are the results for both Lame and neroaacenc. The new information I added for the Lame encoder is the ABR results obtained for the 19 kHz test signal. This option is close to the CBR option.
As can be seen, neroaacenc’s results are closer to the originals at any bit rate for both CBR and VBR.
Just for curiosity: Lame with the 19 kHz sound and neroaacenc at CBR with the 16 kHz sound have the same closeness only @320 kbps. Which encoder do you think is better?

Reply #1 – 2007-08-20 22:32:21

Quote from: SpasV on 2007-08-20 22:23:25

First of all I decided to measure the distance between two sounds using the mean squared error:

One of the sounds, the reference, is the original, while the second is derived from it through filtering and encoding. With this metric I can say how close the two sounds are and conclude that the one closer to the original is better.

No. Your "mean squared error measure" has no relation to how we perceive sound. This has been discussed before. Your method is 100% useless.

Reply #2 – 2007-08-20 23:16:51

What about mse = 0?
What about when it is small?

Reply #3 – 2007-08-20 23:23:15

Why would you type all that up without first looking into what you're measuring? That method is obviously worthless.

Reply #4 – 2007-08-21 20:11:20

I do not share your opinions about the metric I used.
It is a metric to measure a distance between functions.
Its interpretation is up to your understanding.
These functions just happen to represent sound as well.

Reply #5 – 2007-08-22 01:51:50

You are still missing the point. Two functions can be mathematically very close but sound very different, and they can be mathematically very different but sound indistinguishable from one another. This has been rehashed many, many times. Do a search.
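A small illustration of the second case, as a sketch with made-up numbers: a 1 kHz tone and the same tone with its polarity flipped. Polarity inversion of a whole signal is generally regarded as inaudible, yet the sqrt(mse) between the two comes out larger than the tone's own RMS level. The tone frequency, amplitude and function names are assumptions chosen for the example.

```python
import math


def rmse(reference, test):
    """sqrt(mse) between two equally long sample sequences."""
    total = sum((a - b) ** 2 for a, b in zip(reference, test))
    return math.sqrt(total / len(reference))


def polarity_inversion_rmse(freq_hz=1000, amplitude=10000.0, fs_hz=44100):
    """Distance between one second of a sine tone and its inverted copy."""
    tone = [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(fs_hz)]
    inverted = [-s for s in tone]
    return rmse(tone, inverted)


# Analytically the result is amplitude * sqrt(2), about 14142 here.
distance = polarity_inversion_rmse()
```

By this metric the inverted tone is a very poor copy, far worse than the distances in the hundreds reported in the first post, even though the two signals would normally be judged to sound the same.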

Reply #6 – 2007-08-22 08:41:50

Quote from: SpasV on 2007-08-21 20:11:20

I do not share your opinions about the metric I used.
It is a metric to measure a distance between functions.
Its interpretation is up to your understanding.
These functions just happen to represent sound as well.

Hi SpasV. People often come here with exactly the same metric and very similar analysis to what you've just presented, and they all get the same response. It's not a good metric because it makes no reference to psycho-acoustics. Your metric is blind to the ear's differential sensitivity at different frequencies, it's blind to the fact that some sounds mask others, and it's blind to the fact that some forms of distortion sound really bad and others don't. The metric is just too simple and is no substitute for actual listening tests.

More sophisticated metrics that do include a psycho-acoustic model can have a place, at least in the initial testing of an encoder, but even they are usually replaced with actual listening tests in the final tuning.
