Full Version: LAME 3.96b1 vs. 3.90.3 Discussion
tigre
This thread has been split from the LAME 3.90.3 vs. 3.96b1 test thread. It is only for discussion about "LAME 3.96b1 vs. 3.90.3, A new 'recommended version' for HA.org?!". The test thread contains related discussion too, so you might want to read it before posting here.
emtee
Shouldn't we wait for the developers to improve some of those regression examples? That topic had a lot of replies, and I'm sure the feedback will be useful for improving the codec. Maybe we should wait for a 3.96 beta 2 with some bugfixes before comparing it with the mighty 3.90.3, otherwise the regression thread was in vain.
john33
I believe I'm correct in saying that 3.96 is scheduled to go final at the end of this month and that any further commits will be bugfixes only and not any further tuning, etc., at this stage. Gabriel, correct me if I'm wrong. ;)
dev0
Testing has to be done anyway, and if another version is released in the meantime, another testing thread will be opened. It's as simple as that.
amano
One problem is that with 3.96b bitrates aren't comparable with 3.90.3 anymore. With --preset 128 the difference between 116 and 138 kbps is quite significant. With --preset standard the differences are much bigger, up to 50 kbps.

Isn't there any chance for Gabriel and the other devs to tune the presets to be comparable in size (at least the target-kbps presets)?

Otherwise I foresee the result that all samples are significantly smaller but tend to sound worse. Then the results don't tell us anything, because the quality/size ratio is what counts, and it can only be guessed that way.
dev0
I assumed that --preset standard is still meant to be transparent on 99% of all samples, just like 3.90.3 --alt-preset standard.
The bitrate is a rather uninteresting factor here.
kwanbis
QUOTE (dev0 @ Mar 19 2004, 04:13 PM)
I assumed that --preset standard is still meant to be transparent on 99% of all samples, just like 3.90.3 --alt-preset standard.
The bitrate is a rather uninteresting factor here.

Totally agree on that one ;)
evereux
QUOTE (dev0 @ Mar 19 2004, 04:13 PM)
I assumed that --preset standard is still meant to be transparent on 99% of all samples, just like 3.90.3 --alt-preset standard.
The bitrate is a rather uninteresting factor here.

I wholeheartedly disagree. I don't think it's a good idea to have a new recommended LAME version that achieves transparency at the cost of file size (if that is indeed what happens on average) in comparison to an encoder that achieves transparency with a smaller file, especially when that LAME version has many samples where it has regressed from 3.90.3, even in its wrongly compiled state.
dev0
If it saves bitrate that's a nice bonus, but the new recommended version should be at least as transparent using --preset standard as 3.90.3 is now.
Sacrificing quality for bitrate is not an option!
Fede
Is it really progress to compress with the same sound quality at a higher bitrate? Is it progress to reach, at 256 kbps with another encoder, the same quality we can have at 192 kbps with 3.90.3? If the answer is yes, I think there are many other encoders that work well.
We are talking about reaching the best quality at the best size; if size isn't a priority, go for lossless.

Just my 2 cents.
evereux
I see some samples are being tested with 3.90.2 (the user hasn't stated whether -Z was used) and not 3.90.3. Whilst this goes against the test requirements, I actually think this is a better idea since we have little idea of what effect the compilation issues have had on 3.90.3 (I'm referring to this discussion).
[proxima]
QUOTE (evereux @ Mar 21 2004, 02:26 PM)
I see some samples are being tested with 3.90.2 (the user hasn't stated whether -Z was used) and not 3.90.3.

If you are referring to me, I tested 3.90.2 only with abr 128 presets, so -Z is not included in 3.90.3.
I've only tested rebel.wav with --aps -Z and 3.90.2, and I specified this in my post.
QUOTE
Whilst this goes against the test requirements, I actually think this is a better idea since we have little idea of what effect the compilation issues have had on 3.90.3

I think the compilation issues should be very small since john33's compile includes the same switches used by Dibrom. Nevertheless I agree with you; this is the reason why I tested 3.90.2 (with -Z when needed) ;)
tigre
Some clarifications about ABX tests in this test:

Especially at low bitrates like ~128kbps many artifacts are so obvious that ABXing appears to be a waste of time, but even if it doesn't give the tester any additional security about the result, ...
- it gives security to others that a result is valid AND
- you can be sure that artifacts in one encoded version are really different from the ones in another version if you ABX two encoded versions against each other

To prove that one encoded version is worse than the other in such obvious cases, it should be enough to ABX
- Original vs. worse sounding encoded version
- better sounding vs. worse sounding encoded version

ABC/HR results without ABXing (especially the encoded versions that seem to be different against each other) can't be used for public tests like this because we need at least a minimum security (=ABX results) that the individual results of the participants are reliable. It would be different, if a big number of participants submitted ABC/HR results of the same sample(s). In this case the results could be evaluated using a statistical analysis similar to rjamorim's tests.
Jojo
QUOTE (evereux @ Mar 21 2004, 05:26 AM)
I see some samples are being tested with 3.90.2 (the user hasn't stated whether -Z was used) and not 3.90.3. Whilst this goes against the test requirements, I actually think this is a better idea since we have little idea of what effect the compilation issues have had on 3.90.3 (I'm referring to this discussion).

I agree with that to some extent. If the goal was just to get the closest to transparent, the recommended setting should be --preset 320.
But then, --preset 128 offers the best ratio of quality and size... so it's a really tricky question. However, I use LAME 3.95.1 since it uses lower bitrates than LAME 3.92 or LAME 3.90.3 in most cases. If LAME 3.96 is found to be just as good as LAME 3.90.3, it should be recommended since it's faster and uses a lower bitrate.
Jojo
QUOTE (dev0 @ Mar 19 2004, 08:53 AM)
If it saves bitrate that's a nice bonus, but the new recommended version should be at least as transparent using --preset standard as 3.90.3 is now.
Sacrificing quality for bitrate is not an option!

Why isn't --preset 320 recommended then? :lol:
PowerMacG4
QUOTE (Jojo @ Mar 21 2004, 08:08 AM)
QUOTE (dev0 @ Mar 19 2004, 08:53 AM)
If it saves bitrate that's a nice bonus, but the new recommended version should be at least as transparent using --preset standard as 3.90.3 is now.
Sacrificing quality for bitrate is not an option!

Why isn't --preset 320 recommended then? :lol:

Because "--preset insane" wastes bitrate without much gain in quality at all. --preset standard is designed to achieve transparency on a vast majority of samples at the lowest bitrate possible. Transparency doesn't mean "kinda close". It means "transparent".
evereux
QUOTE (PowerMacG4 @ Mar 21 2004, 04:15 PM)
QUOTE (Jojo @ Mar 21 2004, 08:08 AM)
QUOTE (dev0 @ Mar 19 2004, 08:53 AM)
If it saves bitrate that's a nice bonus, but the new recommended version should be at least as transparent using --preset standard as 3.90.3 is now.
Sacrificing quality for bitrate is not an option!

Why isn't --preset 320 recommended then? :lol:

Because "--preset insane" wastes bitrate without much gain in quality at all. --preset standard is designed to achieve transparency on a vast majority of samples at the lowest bitrate possible. Transparency doesn't mean "kinda close". It means "transparent".

His point was, when do we draw the line and say that these bit-rates are becoming too inflated for a standard preset? Transparency in the true sense of the word will never be reached.

That being said, I'm in the process of encoding around 20GBs of wav files and the file size increase doesn't look alarming at all. I'll be more specific when the encoding is complete.
evereux
QUOTE ([proxima] @ Mar 21 2004, 01:37 PM)
QUOTE (evereux @ Mar 21 2004, 02:26 PM)
I see some samples are being tested with 3.90.2 (the user hasn't stated whether -Z was used) and not 3.90.3.

If you are referring to me, I tested 3.90.2 only with abr 128 presets, so -Z is not included in 3.90.3.
I've only tested rebel.wav with --aps -Z and 3.90.2, and I specified this in my post.

You are quite right, my apologies. :)
evereux
I've encoded 22.4GB of wav files using the LAME versions 3.96b1 (--preset standard) and 3.90.3 (--alt-preset standard). The resultant directory sizes are as follows:

3.90.3 = 3.24GB
3.96b1 = 2.99GB

I can provide an album list if required (I first have some sorting to do to ease the compilation of it).
Jebus
QUOTE (evereux @ Mar 21 2004, 08:45 AM)
QUOTE (PowerMacG4 @ Mar 21 2004, 04:15 PM)
QUOTE (Jojo @ Mar 21 2004, 08:08 AM)
QUOTE (dev0 @ Mar 19 2004, 08:53 AM)
If it saves bitrate that's a nice bonus, but the new recommended version should be at least as transparent using --preset standard as 3.90.3 is now.
Sacrificing quality for bitrate is not an option!

Why isn't --preset 320 recommended then? :lol:

Because "--preset insane" wastes bitrate without much gain in quality at all. --preset standard is designed to achieve transparency on a vast majority of samples at the lowest bitrate possible. Transparency doesn't mean "kinda close". It means "transparent".

His point was, when do we draw the line and say that these bit-rates are becoming too inflated for a standard preset? Transparency in the true sense of the word will never be reached.

That being said, I'm in the process of encoding around 20GBs of wav files and the file size increase doesn't look alarming at all. I'll be more specific when the encoding is complete.

No, your definition of "transparency" is wrong. Transparency does not mean bitwise equivalence; it means the files are audibly indistinguishable. This is the case with --alt-preset standard, or at least that is the intention. This is ALSO the case with --alt-preset insane, but the extra bits don't make the files sound any better since transparency was already achieved using --alt-preset standard.
evereux
QUOTE (Jebus @ Mar 22 2004, 01:22 AM)
No, your definition of "transparency" is wrong. Transparency does not mean bitwise equivalence; it means the files are audibly indistinguishable. This is the case with --alt-preset standard, or at least that is the intention. This is ALSO the case with --alt-preset insane, but the extra bits don't make the files sound any better since transparency was already achieved using --alt-preset standard.

I didn't define transparency, I said in the true sense of the word. Using your analogy, what then is the point of the insane preset? I still think the whole point is being missed and for me this discussion is redundant now anyway.
evereux
QUOTE (evereux @ Mar 21 2004, 10:21 PM)
I've encoded 22.4GB of wav files using the LAME versions 3.96b1 (--preset standard) and 3.90.3 (--alt-preset standard). The resultant directory sizes are as follows:

3.90.3 = 3.24GB
3.96b1 = 2.99GB

I can provide an album list if required (I first have some sorting to do to ease the compilation of it).

Further to this:

comparison

Be aware that the table is 412kb, I couldn't find a better way to trim the size down (html tables are crap, or at least my knowledge of them).

Generated from EncSpot output; original files here.
tigre
Thanks evereux.

If possible, could you (or anyone else) please do a similar test with 3.96b1 VBR?

With 3.90.3 there are no recommended VBR settings for quality lower than --alt-preset standard -Y because ABR sounds better in most cases at comparable bitrate. AFAIK Gabriel has changed a lot to improve -V 3 and lower, so this could lead to 3.96b1 VBR replacing 3.90.3 ABR as the recommended LAME compile/settings combination for < 192kbps bitrates. To make ABR vs. VBR tests useful, we need a test similar to the one you did, evereux.

In theory V-settings should give those average bitrates:
-V 5 : 128kbps
-V 4 : 144kbps
-V 3 : 160kbps
-V 2 : 192kbps

With the samples I tried so far (mainly -V 5) the numbers have been 10-20kbps higher on average. So could you please do some mass-encoding at -V 2 - -V 6, maybe a (representative) part of the tracks you used for (preset)standard bitrate comparison is enough...
LoFiYo
I think there should be an official guideline for a recommended version of LAME since HA will be endorsing it as its official recommendation. In other words what do HA admins consider a superior version of LAME to the current recommended version (3.90.3)?

I would think:
Version X is superior to Version Y when the sound quality of the encoded file is found to be higher 9 times (or more) out of 10 in all preset modes (from insane down to abr/cbr [x] kbps).

What would be the official HA stance on this? I am asking this, because we can keep testing and testing many many samples, but if there isn't a definition as to what constitutes a superior version, the test might never end...
evereux
QUOTE (tigre @ Mar 23 2004, 01:43 PM)
With the samples I tried so far (mainly -V 5) the numbers have been 10-20kbps higher on average. So could you please do some mass-encoding at -V 2 - -V 6, maybe a (representative) part of the tracks you used for (preset)standard bitrate comparison is enough...

Sure, I can do that. I'll start at V6, work my way down the numbers and post my results as they're completed.

If anyone else would like to start at V3, please do so.

edit: V3 was V2
sony666
Just wanted to thank the LAME devs because they added --noreplaygain in the CVS :)
tigre
evereux, I MESSED IT UP. I didn't remember correctly the numbers Gabriel gave me via PM ~ 1 week ago. Correct 'official' numbers are:

-V 6 : 128kbps
-V 5 : 144kbps
-V 4 : 160kbps
-V 3 : 192kbps

testing -V 2 isn't necessary since the result will be the same as --preset standard.



Testing this independently should make sense anyway - thanks a lot.
tigre
LoFiYo: Good point. I thought about this too (but I never came to a conclusion, so thanks for bringing this up again). IMO a good way would be to use some statistical analysis similar to the one used in rjamorim's recent tests. Every quality setting should be analysed this way independently. When a certain level of confidence is reached (e.g. 95%) that one encoder is better than the other, the officially recommended version can be announced for this quality setting. Testing wouldn't have to stop: if new test results / problem samples ... appear, the statistics can be calculated again and the recommended version could change.

The remaining question is how to calculate the statistics... I don't know if using ABC/HR rankings without a large number of results for each sample (at a given quality setting) and without an anchor makes any sense. A simple choice between two outcomes, "a is better than b" or "b is better than a", might be enough. I could be wrong, but I think in this case the "chance that codec a isn't better than codec b, but by picking test samples randomly it still performed better on x out of y samples" can be calculated the same way as ABX p-values.
E.g. if 3.90.3 --alt-preset standard wins on 37 out of 60 samples and 3.96b1 --preset standard wins on 23 samples, the chance of getting this result even though both codecs are equal over a large number of samples would be 4.62%.
Right now (3.90.3 --alt-preset standard won on 8 samples out of 12) the chance that 3.90.3 is not better (simplified wording) would be 19%.
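
For anyone who wants to redo this calculation as results come in, here is a minimal sketch of the "same way as ABX p-values" idea: a one-sided binomial (sign) test that assumes each sample is an independent coin flip when both encoders are equally good, and that ties are left out. The function name `sign_test_p` is just a placeholder. 8 wins out of 12 gives exactly 794/4096, about 19%, and 37 out of 60 should land close to the figure quoted above.

```python
from math import comb


def sign_test_p(wins: int, trials: int) -> float:
    """One-sided p-value: chance of at least `wins` wins in `trials`
    samples if both encoders were equally likely to win each sample."""
    # Count every outcome with `wins` or more successes out of `trials`.
    tail = sum(comb(trials, k) for k in range(wins, trials + 1))
    # Each of the 2**trials win/lose sequences is equally likely at p = 0.5.
    return tail / 2 ** trials


p_60 = sign_test_p(37, 60)  # hypothetical example: 37 wins vs. 23
p_12 = sign_test_p(8, 12)   # current state: 8 wins out of 12

print(f"37 of 60: {p_60:.2%}")
print(f" 8 of 12: {p_12:.2%}")
```

With a 95% confidence threshold, a setting would only get a verdict once the p-value drops below 0.05, so 8 out of 12 is not enough yet.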

Anyone with enough statistics knowledge around? ff123?
Jojo
I still wonder about the criteria an encoder has to meet in order to be recommended by HA, though. If the goal was to get the closest to transparent, --preset 320 should be chosen. However, it hasn't been. So it must be somewhat connected to the file size. But to what extent? --preset 128 probably gives the best ratio of quality / file size, but it's still not recommended. So who sets the criteria for that? LAME 3.96 in general gives a significantly lower bitrate at --aps than 3.90.3. Let's assume LAME 3.96 would overall perform a bit worse than 3.90.3 does (I know that you can only tell from the tested samples). Why wouldn't the people here accept this slight quality drawback for the benefit of the much lower bitrates? I mean they did the exact same thing when choosing LAME 3.90.3 --aps... otherwise --preset 320 would have been chosen, right?!
I think it's essential to clarify first what's expected from a HA recommended version.

In addition, I think the samples should be rated in terms of their improvements made. For instance, if 5 files sound slightly better with LAME 3.90.3 but only one file was turned from totally horrible to perfectly transparent using LAME 3.96, 3.96 should still be considered as recommended. If all improvements were just counted equally it wouldn't give a true picture of the encoders’ quality. Overall an encoder could sound much better, but still be beaten by another one that just produces tiny enhancements on samples but fails even harder on others...
2Bdecided
Jojo,

I think it's simpler than you think!

--alt-preset standard was (is?) simply the best VBR algorithm available at the time in terms of making as many samples as possible transparent for as many possible people.

--alt-preset insane was (is?) simply the best CBR algorithm available at the time in terms of making as many samples as possible transparent for as many possible people.


Of the few problem samples which remain, there was nothing else efficient and straight-forward Dibrom could do to --alt-preset standard to make them better. Forcing the bitrate up across the board (inefficient, but obvious!) will often reduce any remaining problems, but may not make them go away entirely, and will waste bits on the 99.999% of already transparent signals. That's why more people choose --alt-preset standard than --alt-preset insane.

To replace --alt-preset standard, I think something should be either
a) as good, at a lower bitrate, or
b) better at the same bitrate, or
c) better, at a lower bitrate ;)

In short, any improvement must fix more samples than it breaks, and/or reduce the bitrate. Any "improvement" which pushes the bitrate through the roof for samples which were already transparent is inefficient. I think the aim is psychoacoustic coding which is transparent and efficient.
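
One possible reading of criteria a) to c) as a simple rule, purely as a sketch: `quality_change` and `bitrate_change` are made-up inputs (for example, samples fixed minus samples broken, and the difference in average kbps), not anything measured in this thread. The candidate must be no worse on either axis and strictly better on at least one.

```python
def can_replace_aps(quality_change: float, bitrate_change: float) -> bool:
    """Return True if a candidate setting meets a), b) or c).

    quality_change: > 0 means the candidate sounds better, 0 means as good.
    bitrate_change: < 0 means the candidate uses fewer bits, 0 means the same.
    """
    no_worse_quality = quality_change >= 0
    no_higher_bitrate = bitrate_change <= 0
    # "As good at the same bitrate" is no reason to switch.
    real_gain = quality_change > 0 or bitrate_change < 0
    return no_worse_quality and no_higher_bitrate and real_gain


print(can_replace_aps(0, -15))   # a) as good, lower bitrate
print(can_replace_aps(2, 0))     # b) better, same bitrate
print(can_replace_aps(2, -15))   # c) better, lower bitrate
print(can_replace_aps(-1, -30))  # worse quality, however small the files
```

Note that this strict reading rejects any quality loss, no matter how much bitrate is saved; whether a trade-off should be allowed is exactly what is being debated here.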

If you want more efficiency you can go lower than aps, but you can forget about transparency (for the most critical listeners) for many signals. If you want more transparency, you can go higher than aps, but you can forget about efficiency for many signals!

The aim is a setting which uses just as many bits as necessary, intelligently, to make a signal transparent (or as close to transparent as is possible for that format), but no more.

Cheers,
David.
evereux
The results from encoding 22.2GB of wav files using LAME 3.96b1 are as follows.

-V3
Compressed to 2.73GB
With an average of 172.4kbps
Details (195KB)

-V4
Compressed to 2.60GB
With an average of 163.7kbps
Details (195KB)

-V5
Compressed to 2.04GB
With an average of 128.7kbps
Details (195KB)

-V6
Compressed to 1.92GB
With an average of 120.7kbps
Details (195KB)




Here is a zip file containing the delimited text should you wish to present the information in a better way.

I'll edit this post to add more results (this will most likely be at a similar time tomorrow).
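
As a rough cross-check of the figures above, the average bitrate can also be estimated from the directory sizes alone. This sketch assumes the source WAVs are standard CD audio (44.1 kHz, 16-bit, stereo, i.e. 1411.2 kbps), which was not stated, and it ignores WAV headers and MP3 tags. The names `implied_kbps` and `CD_WAV_KBPS` are just illustrative.

```python
CD_WAV_KBPS = 1411.2  # assumption: 44.1 kHz * 16 bit * 2 channels
WAV_GB = 22.2         # total size of the source WAV files


def implied_kbps(mp3_gb: float, wav_gb: float = WAV_GB) -> float:
    """Average MP3 bitrate implied by the compressed/uncompressed size ratio."""
    return CD_WAV_KBPS * mp3_gb / wav_gb


# setting -> (compressed size in GB, reported average kbps)
reported = {
    "-V3": (2.73, 172.4),
    "-V4": (2.60, 163.7),
    "-V5": (2.04, 128.7),
    "-V6": (1.92, 120.7),
}

implied = {s: implied_kbps(size) for s, (size, _) in reported.items()}

for setting, (size_gb, kbps) in reported.items():
    print(f"{setting}: reported {kbps:.1f} kbps, implied {implied[setting]:.1f} kbps")
```

Under that assumption the implied values agree with the reported averages to within about 2 kbps, which is about what the rounding of the sizes to two decimals allows.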
Jojo
@ 2Bdecided

thanks for your answer!
QUOTE
Of the few problem samples which remain, there was nothing else efficient and straight-forward Dibrom could do to --alt-preset standard to make them better

Well, if that is true, what are the developers of LAME still doing? Are they trying to make LAME more efficient? Do they lower the overall bitrate and get it to sound almost as good as LAME 3.90.3? I think I'm starting to understand. If you were comparing 3.90.3 vs. 3.96 at --aps but with the same bitrate, 3.96 would most likely be the winner... so I think I see the improvements and goals of the developers. I actually use LAME 3.95 because of that :) - I can accept some quality drawbacks, which I probably won't notice anyway, but what I do notice is the lower bitrate, plus there are samples that have been improved (which I probably won't hear either ;))

Anyway, I'll still follow the listening test thread with great interest :)
2Bdecided
QUOTE (Jojo @ Mar 25 2004, 12:50 PM)
@ 2Bdecided

thanks for your answer!
QUOTE
Of the few problem samples which remain, there was nothing else efficient and straight-forward Dibrom could do to --alt-preset standard to make them better

well, if that is true what are the developers of LAME still doing?

Just because one person can't solve a problem, doesn't mean that no one can!

If you search way back on this board, I think you'll find Dibrom suggesting that to go much further, the entire lame psychoacoustic model would need to be overhauled. I believe this is planned by at least one developer for Lame 4.

However, that's not to belittle the work which has been done since, on Lame 3.9x, which does give improved quality on some samples, and achieves this at lower bitrates.

Remember too that lame is developed for free, by people in their spare time.


As for your other questions - you can read posts from Gabriel and others, and see the Lame History file to see exactly what they're doing. See the Lame site, and the lame mp3encoder mailing list.

Cheers,
David.
evereux
QUOTE (tigre @ Mar 23 2004, 03:16 PM)
evereux, I MESSED IT UP. I didn't remember correctly the numbers Gabriel gave me via PM ~ 1 week ago. Correct 'official' numbers are:

-V 6 : 128kbps
-V 5 : 144kbps
-V 4 : 160kbps
-V 3 : 192kbps

testing -V 2 isn't necessary since the result will be the same as --preset standard.



Testing this independently should make sense anyway - thanks a lot.


My figures are a little different:

-V 6 : 121kbps
-V 5 : 129kbps
-V 4 : 164kbps
-V 3 : 172kbps
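
To put the two lists side by side, a small sketch that computes how far the measured averages sit from the nominal targets quoted above. The numbers are the ones from this thread (measured values rounded to whole kbps); the dictionary names are just for illustration.

```python
# -V level -> kbps
nominal = {6: 128, 5: 144, 4: 160, 3: 192}   # 'official' targets quoted above
measured = {6: 121, 5: 129, 4: 164, 3: 172}  # averages from the mass-encoding

# Positive means the encoder spent more bits than the nominal target.
deviation = {v: measured[v] - nominal[v] for v in nominal}

for v in sorted(deviation, reverse=True):
    print(f"-V {v}: nominal {nominal[v]} kbps, "
          f"measured {measured[v]} kbps, difference {deviation[v]:+d} kbps")
```

On this music collection only -V 4 comes out above its target; the other three settings land 7 to 20 kbps below it.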
Gabriel
Lame 3.96 is now in beta2 stage.

In the context of this thread, the interesting point is that V1 and V2 are now using 128kbps as minimal bitrate.

It means that some samples could be improved with this new beta. As this can not decrease quality, I think that you only need to check again samples where 3.90.3 was superior to 3.96b1.

3.96b2 is expected to go final in the next two weeks.
xmixahlx
3.96.b2 available at RareWares (Debian Repository) as lame-cvs

...I'm sure john will be right behind me :ph34r: :)


later
Jojo
thanks to Gabriel and all the other LAME developers for their great work! Keep up the good work!!!
Vietwoojagig
QUOTE (Gabriel @ Mar 28 2004, 02:28 PM)
In the context of this thread, the interesting point is that V1 and V2 are now using 128kbps as minimal bitrate.

It means that some samples could be improved with this new beta. As this can not decrease quality, I think that you only need to check again samples where 3.90.3 was superior to 3.96b1.

Does this mean, that -b 128 (for V1 and V2) is the only change between beta1 and beta2? No differences for V0, V3, V4...?
Gabriel
QUOTE
Does this mean, that -b 128 (for V1 and V2) is the only change between beta1 and beta2? No differences for V0, V3, V4...?

Regarding vbr-old, this is the only change.
tigre
I've performed a mass-encoding bitrate test similar to evereux's. The result is here (might be updated):



3.96b1 was used, so the numbers for --preset standard (= -V 2) are too low compared to the latest version, but according to what Gabriel has said, for all other settings in the table there shouldn't be a difference.
For direct comparison:
results by | evereux | tigre   | average
-----------+---------+---------+---------
-V 3       | 172kbps | 170kbps | 171kbps
-V 4       | 164kbps | 157kbps | 160kbps
-V 5       | 129kbps | 122kbps | 126kbps
-V 6       | 121kbps | 112kbps | 117kbps


So IMO it should be safe to modify Test instructions:
QUOTE
(alt)presets + VBR/ABR
(320kbps) 3.96 --preset insane vs. 3.90.3 --alt-preset insane
(~256kbps) 3.96 --preset extreme vs. 3.90.3 --alt-preset extreme
(~210kbps) 3.96 --preset standard vs. 3.90.3 --alt-preset standard
(~160kbps) 3.96 -V 4 vs. 3.96 --preset 160 vs. 3.90.3 --alt-preset 160
(~128kbps) 3.96 -V 5 vs. 3.96 --preset 128 vs. 3.90.3 --alt-preset 128

It seems like there's not much interest in testing bitrates between 128kbps and (alt)preset standard anyway so this should be enough.
Comments?