lame --ap-s bitrate goes wild
sony666
Hi all,
while playing around with wavegain, I made a rather surprising discovery.
I grabbed a track from a very "hot" (pushed to max volume) CD and encoded it twice: once as-is and once after applying wavegain. Both times with Dibrom's lame 3.90.2 and --ap-s, no other switches.

lame output of the "normal" encoding (the original wav file):

lame --alt-preset standard "Warlock - Time to Die.wav" orig_aps.mp3
LAME version 3.90.2 MMX (http://www.mp3dev.org/)
-- Compiled at http://www.hydrogenaudio.org

CPU features: i387, MMX (ASM used), SIMD, SIMD2
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding Warlock - Time to Die.wav
to orig_aps.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
32 [ 2] %
128 [ 20] %
160 [ 337] %%*****
192 [ 1630] %%%%%%%%%************************
224 [ 3341] %%%%%%%%%%%%%%%%%%%%%%********************************************
256 [ 3287] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**************************
320 [ 1641] %%%%%%%%%%%%%%%%%%%%%%%%%%*******
average: 242.2 kbps LR: 4859 (47.37%) MS: 5399 (52.63%)


now the lame result of the wavegained input:

lame --alt-preset standard wavgain_std.wav wavgained.mp3
LAME version 3.90.2 MMX (http://www.mp3dev.org/)
-- Compiled at http://www.hydrogenaudio.org

CPU features: i387, MMX (ASM used), SIMD, SIMD2
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding wavgain_std.wav
to wavgained.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
32 [ 5] %
128 [ 52] %
160 [ 1090] %%%%%%%********
192 [ 4908] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*********************************
224 [ 3325] %%%%%%%%%%%%%%%%%%%%*************************
256 [ 765] %%%%%%%****
320 [ 113] %*
average: 204.8 kbps LR: 4860 (47.38%) MS: 5398 (52.62%)


to make it complete, the output of wavegain.exe (0.9.8 win32) for the second encoded wav:

Gain | Peak | Scale | New Peak | Track
---------------------------------------------------------
-8.42 dB | 32549 | 0.38 | 12346 | wavgain_std.wav
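
As a sanity check on the wavegain table above, the Scale and New Peak columns can be reproduced from the Gain column. This is only a sketch, assuming the gain is a plain amplitude gain in dB applied to 16-bit sample values; the function name is made up for illustration and is not part of wavegain.

```python
def gain_to_scale(gain_db):
    """Convert an amplitude gain in dB to a linear scale factor."""
    return 10 ** (gain_db / 20)

# Values taken from the wavegain output above.
gain_db = -8.42
peak = 32549

scale = gain_to_scale(gain_db)
new_peak = peak * scale

print(round(scale, 2))   # 0.38, as in the Scale column
print(round(new_peak))   # close to the reported New Peak of 12346
```

So the wavegained file is simply the original with every sample multiplied by roughly 0.38.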


So basically the average bitrate for the original wav is 242.2k and the bitrate of the wavegained encode is 204.8k.
I mean, I expected a small difference, but 37.4 kbit/s is VERY hefty imho when you consider that both mp3s sound exactly the same after RG scanning them in fb2k.

Should wavegaining before encoding be recommended to reduce bitrates for a large portion of today's music? Dang, I wonder how much disk space is wasted here on all my "hot" Hard&Heavy tracks I didn't wavegain
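
The averages quoted above can be recomputed from the two histograms, since each row is a bitrate followed by a frame count. A minimal sketch, assuming the reported average is just the frame-weighted mean of the bitrates; the dictionaries are typed in by hand from the logs above.

```python
def average_bitrate(histogram):
    """Frame-weighted mean bitrate in kbps for a {bitrate: frame_count} dict."""
    total_frames = sum(histogram.values())
    weighted = sum(kbps * frames for kbps, frames in histogram.items())
    return weighted / total_frames

# Frame counts copied from the two 3.90.2 --alt-preset standard logs above.
original = {32: 2, 128: 20, 160: 337, 192: 1630, 224: 3341, 256: 3287, 320: 1641}
wavegained = {32: 5, 128: 52, 160: 1090, 192: 4908, 224: 3325, 256: 765, 320: 113}

print(round(average_bitrate(original), 1))    # 242.2
print(round(average_bitrate(wavegained), 1))  # 204.8
print(round(average_bitrate(original) - average_bitrate(wavegained), 1))  # 37.4
```

Both files have the same number of frames (10258), so the difference in average bitrate translates directly into a difference in file size.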
sony666
--a-p extreme does not differ so much in bitrate, but look at the Joint Stereo handling:

normal:

32 [ 2] %
128 [ 8] %
160 [ 141] %**
192 [ 895] %%*************
224 [ 2224] %%%%%%********************************
256 [ 3036] %%%%%%%%%%%%%%%%%%%%%%%****************************
320 [ 3952] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*************
average: 266.7 kbps LR: 4952 (48.27%) MS: 5306 (51.73%)


wavegained:

32 [ 4] %
128 [ 21] %
160 [ 156] %**
192 [ 800] %%%%%%*******
224 [ 2671] %%%%%%%%%%%%%%%%%%%%%%%%%%%%***************
256 [ 4109] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%************
320 [ 2497] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%****
average: 256.4 kbps LR: 7699 (75.05%) MS: 2559 (24.95%)

The wavegained encode has 75% "true stereo" (LR) frames against 48% for the normal wav file. Very strange.
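
For anyone who wants to compare several encodes without reading the numbers off by eye, the summary line can be pulled apart with a regular expression. This is a sketch that assumes the line looks exactly like the ones pasted in this thread; the pattern and the function name are my own and not part of LAME.

```python
import re

# Matches lines such as:
# "average: 256.4 kbps LR: 7699 (75.05%) MS: 2559 (24.95%)"
SUMMARY = re.compile(
    r"average:\s*(?P<kbps>[\d.]+)\s*kbps\s+"
    r"LR:\s*(?P<lr>\d+)\s*\([\d.]+%\)\s+"
    r"MS:\s*(?P<ms>\d+)\s*\([\d.]+%\)"
)

def parse_summary(line):
    """Return average bitrate, LR/MS frame counts and the LR share of a summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a LAME summary line: %r" % line)
    lr, ms = int(match["lr"]), int(match["ms"])
    return {
        "kbps": float(match["kbps"]),
        "lr": lr,
        "ms": ms,
        "lr_share": lr / (lr + ms),
    }

normal = parse_summary("average: 266.7 kbps LR: 4952 (48.27%) MS: 5306 (51.73%)")
gained = parse_summary("average: 256.4 kbps LR: 7699 (75.05%) MS: 2559 (24.95%)")
print(round(normal["lr_share"] * 100, 2), round(gained["lr_share"] * 100, 2))
```

The LR share is recomputed from the frame counts rather than read from the percentage in parentheses, which doubles as a check on the pasted log.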
JohnV
Try the same test, but also using the -Y switch. One reason for this big difference with APS may be the sfb21 problem of VBR mp3.
tigre
This test shows similar things. So a possible reason could be that by applying wavegain some parts of the signals are moved below the encoder's ATH, so it decides there's no need to encode them.
sony666
Thanks for your answer John

-lame 3.90.2 --alt-preset standard -Y (on the original wav rip):

*** WARNING *** the meaning of the experimental -Y has changed!
now it tells LAME to ignore sfb21 noise shaping (VBR)
LAME version 3.90.2 MMX (http://www.mp3dev.org/)
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
32 [ 2] %
128 [ 141] %*
160 [ 4782] %%%%%%%%%%%%%%%%%%%%******************************************
192 [ 5130] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%************************
224 [ 106] %*
256 [ 22] %
320 [ 75] %
average: 177.6 kbps LR: 4859 (47.37%) MS: 5399 (52.63%)

*************

-lame 3.90.2 --alt-preset standard -Y (on the wavegained wav):

Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
32 [ 5] %
128 [ 172] %**
160 [ 4584] %%%%%%%%%%%%%%%%%%%****************************************
192 [ 5212] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*************************
224 [ 181] %%*
256 [ 27] %
320 [ 77] %
average: 178.2 kbps LR: 4860 (47.38%) MS: 5398 (52.62%)

*************

-lame 3.90.2 --alt-preset extreme -Y (on the original rip):

Using polyphase lowpass filter, transition band: 19383 Hz - 19916 Hz
32 [ 2] %
128 [ 135] %*
160 [ 4609] %%%%%%%%%%%%%%%%%%%%***************************************
192 [ 5205] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%************************
224 [ 170] %%*
256 [ 52] %
320 [ 85] %*
average: 178.7 kbps LR: 4952 (48.27%) MS: 5306 (51.73%)

*************

-lame 3.90.2 --alt-preset extreme -Y (on the wavegained wav):

Using polyphase lowpass filter, transition band: 19383 Hz - 19916 Hz
32 [ 4] %
128 [ 82] %*
160 [ 2409] %%%%%%%%%%%%%*******************
192 [ 5035] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*************
224 [ 1858] %%%%%%%%%%%%%%%%%%%%%%%%*
256 [ 664] %%%%%%%%%
320 [ 206] %%%
average: 196.4 kbps LR: 7699 (75.05%) MS: 2559 (24.95%)

*************

Hmm.. that looks a lot more reasonable for --a-p s, but --a-p e is still acting up on the wavegained file (JS frames, 18k higher bitrate than the original).
Now I don't know what to make of this... doesn't -Y cut off at 16 kHz (that's what I read here a lot), or is the lowpass shown in the encoder output correct?
What's the impact on quality when the bitrate differs by a whole 65k between the aps and aps -Y files...
JohnV
With APS the situation is clear. At first there were louder high frequencies which, according to the psychoacoustics, needed more resolution. Because of the SFB21 problem of mp3, the bitrate bloats with VBR if the psy model thinks it needs to give very high resolution to frequencies over 16 kHz. Since there is no scalefactor for scalefactor band 21 (sfb21 covers the frequencies over 16 kHz), the frequencies below 16 kHz end up being encoded with unnecessarily high resolution (as funny as it sounds), thus the bloat!

When you applied wavegain, the higher frequencies didn't need that kind of resolution according to the psychoacoustics (because they were probably relatively less audible, lower-energy high frequencies), so it didn't result in that kind of bloat.

When you applied APS -Y, it disabled noise shaping for the frequencies over 16 kHz (sfb21), and the bloating effect disappears, but only some of the strongest high frequencies will be encoded.

With --alt-preset extreme -Y the bitrate difference obviously results from the higher use of stereo frames with the wavegained file. The extreme preset switches to stereo frames more easily than APS. The lr/ms frame distribution changed because some of the masking properties changed when you applied wavegain. Since the decision between lr and ms is based on masking calculations, and since extreme gives stereo frames more easily, the distribution changed quite a lot, and the bitrate increased.
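
The sfb21 mechanism described above can be illustrated with a toy model. To be clear, this is not LAME's actual quantization loop: the band values, the "required step" idea and all function names are assumptions made up for the sketch. It only encodes two rules taken from the explanation: a band with a scalefactor can be quantized finer than the global step but not coarser, and a band without one (sfb21) is stuck with the global step.

```python
from math import log2

def effective_steps(required_steps, has_scalefactor, ignore_unscaled=False):
    """Quantizer step each band ends up with in this toy model (smaller = finer)."""
    # Start from the coarsest step any band would tolerate.
    global_step = max(required_steps)
    if not ignore_unscaled:
        # A band without a scalefactor is quantized at the global step, so the
        # global step has to be as fine as that band demands.
        for required, scaled in zip(required_steps, has_scalefactor):
            if not scaled:
                global_step = min(global_step, required)
    steps = []
    for required, scaled in zip(required_steps, has_scalefactor):
        if scaled:
            # Scalefactors can only refine a band relative to the global step.
            steps.append(min(required, global_step))
        else:
            steps.append(global_step)
    return steps

def wasted_bits(required_steps, steps):
    """Rough extra bits per coefficient: one bit for each halving beyond what was required."""
    return sum(log2(req / step) for req, step in zip(required_steps, steps) if step < req)

# Hypothetical frame: four ordinary bands plus sfb21 (last entry, no scalefactor),
# where loud high frequencies make the psy model ask for a very fine step.
required = [8, 8, 4, 8, 1]
scaled = [True, True, True, True, False]

normal = effective_steps(required, scaled)
with_y = effective_steps(required, scaled, ignore_unscaled=True)
print(normal, wasted_bits(required, normal))
print(with_y, wasted_bits(required, with_y))
```

In the first case every band is dragged down to the step sfb21 asked for, which is the "unnecessarily high resolution" below 16 kHz. In the second case (standing in for -Y) sfb21's demand is ignored, nothing is wasted, and sfb21 itself gets a coarser step than it asked for.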
Lev
ahhh

- thanks, I followed that thread with great interest
sony666
why not try 3.94 b12 as well (the binary reports itself as alpha 12)

-lame 3.94 b12 --alt-preset standard (on the original wav):

LAME version 3.94 MMX (alpha 12, Mar 31 2003 12:01:50) (http://www.mp3dev.org/)
warning: alpha versions should be used for testing only
CPU features: i387, MMX (ASM used), SIMD, SIMD2
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding orig.wav to 394b12_orig_aps.mp3
Encoding as 44.1 kHz VBR(q=4) j-stereo MPEG-1 Layer III (ca. 10x) qval=3
32 [ 2] *
96 [ 8] *
112 [ 40] %
128 [ 162] %**
160 [ 1588] %%************************
192 [ 4114] %%%%%%%%%%%%%%%%%*************************************************
224 [ 3383] %%%%%%%%%%%%%%%%%%%%%%%%%******************************
256 [ 843] %%%%%%********
320 [ 118] %%
average: 202.9 kbps LR: 3022 (29.46%) MS: 7236 (70.54%)

*********

-lame 3.94 b12 --alt-preset standard (on the wavegained sample):

Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding as 44.1 kHz VBR(q=4) j-stereo MPEG-1 Layer III (ca. 10x) qval=3
32 [ 5] %
96 [ 20] %
112 [ 46] %
128 [ 215] %**
160 [ 2574] %%%***************************
192 [ 5824] %%%%%%%%%%%%%%%%%%%%%%%%******************************************
224 [ 1303] %%%%%%*********
256 [ 191] %%*
320 [ 80] %
average: 188.3 kbps LR: 3025 (29.49%) MS: 7233 (70.51%)

***********

-lame 3.94 b12 --alt-preset standard -Y (on the original wav):

32 [ 2] *
96 [ 56] *
112 [ 277] %****
128 [ 781] %************
160 [ 3814] %%%%%%%%%****************************************************
192 [ 4174] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%***********************************
224 [ 907] %%%%%%%********
256 [ 166] %%*
320 [ 81] %*
average: 177.4 kbps LR: 3022 (29.46%) MS: 7236 (70.54%)

*************

-lame 3.94 b12 --alt-preset standard -Y (on the wavegained sample):

32 [ 5] %
96 [ 75] %*
112 [ 277] %****
128 [ 814] %************
160 [ 3694] %%%%%%%%%*************************************************
192 [ 4228] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%************************************
224 [ 927] %%%%%%%********
256 [ 163] %%*
320 [ 75] %*
average: 177.3 kbps LR: 3025 (29.49%) MS: 7233 (70.51%)

*********
*********

-lame 3.94 b12 --alt-preset extreme (on the original wav):

Using polyphase lowpass filter, transition band: 19383 Hz - 19916 Hz
Encoding as 44.1 kHz VBR(q=4) j-stereo MPEG-1 Layer III (ca. 10x) qval=3
32 [ 2] %
128 [ 12] %
160 [ 51] %
192 [ 446] %%%%%%**
224 [ 3088] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**
256 [ 3868] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*
320 [ 2791] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*
average: 260.3 kbps LR: 9855 (96.07%) MS: 403 (3.929%)

*********

-lame 3.94 b12 --alt-preset extreme (on the wavegained sample):

32 [ 3] %
128 [ 22] %
160 [ 54] %
192 [ 655] %%%%%%%%%**
224 [ 4115] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**
256 [ 4248] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*
320 [ 1161] %%%%%%%%%%%%%%%%%%*
average: 245.5 kbps LR: 9854 (96.06%) MS: 404 (3.938%)

*******

-lame 3.94 b12 --alt-preset extreme -Y (on the original wav):

32 [ 2] %
128 [ 32] %
160 [ 1094] %%%%%%%%%%%*
192 [ 6156] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**
224 [ 2157] %%%%%%%%%%%%%%%%%%%%%%%*
256 [ 586] %%%%%%%
320 [ 231] %%%
average: 201.6 kbps LR: 9855 (96.07%) MS: 403 (3.929%)

*********

-lame 3.94 b12 --alt-preset extreme -Y (on the wavegained sample):

32 [ 3] %
128 [ 38] %
160 [ 1090] %%%%%%%%%%%*
192 [ 6165] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**
224 [ 2165] %%%%%%%%%%%%%%%%%%%%%%%*
256 [ 569] %%%%%%*
320 [ 228] %%%
average: 201.5 kbps LR: 9854 (96.06%) MS: 404 (3.938%)

******

phew, sorry for spamming the forum. My n00bish observations for 3.94 b12:
- --ap extreme looks VERY aggressive on the LR-Stereo usage in 3.94 b12, whether or not the file is gained and/or -Y is used
- the -Y switch gives steady and, to my mind, reasonable bitrates
sony666
QUOTE (JohnV @ Apr 2 2003 - 04:50 PM)
With APS the situation is clear. At first there were louder high frequencies which, according to the psychoacoustics, needed more resolution. Because of the SFB21 problem of mp3, the bitrate bloats with VBR if the psy model thinks it needs to give very high resolution to frequencies over 16 kHz. Since there is no scalefactor for scalefactor band 21 (sfb21 covers the frequencies over 16 kHz), the frequencies below 16 kHz end up being encoded with unnecessarily high resolution (as funny as it sounds), thus the bloat!

When you applied wavegain, the higher frequencies didn't need that kind of resolution according to the psychoacoustics (because they were probably relatively less audible, lower-energy high frequencies), so it didn't result in that kind of bloat.

When you applied APS -Y, it disabled noise shaping for the frequencies over 16 kHz (sfb21), and the bloating effect disappears, but only some of the strongest high frequencies will be encoded.

With --alt-preset extreme -Y the bitrate difference obviously results from the higher use of stereo frames with the wavegained file. The extreme preset switches to stereo frames more easily than APS. The lr/ms frame distribution changed because some of the masking properties changed when you applied wavegain. Since the decision between lr and ms is based on masking calculations, and since extreme gives stereo frames more easily, the distribution changed quite a lot, and the bitrate increased.

most interesting
Might there be any chance that "the stuff -Y does" (or something very similar with a bit less impact but still stabilizing bitrates) will be defaulted in 3.94 for -a-p's?

It's just that this was not a very difficult sample imho (standard Heavy Metal); I stumbled upon the bitrate discrepancy just by accident while testing wavegain. -Y reduced the bitrate by 65k for aps/3.90.2, 88k (jeez) for ape/3.90.2, 25k for aps/3.94b12, and 58k for ape/3.94b12 (the wavegain stuff not considered).
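
Those savings come straight from the averages reported in the logs earlier in the thread (original wav only, with and without -Y). A small sketch to recompute them; the table is typed in by hand and the function name is just for illustration.

```python
# (average kbps without -Y, average kbps with -Y), original wav only,
# copied from the encoder output earlier in this thread.
averages = {
    ("3.90.2", "standard"): (242.2, 177.6),
    ("3.90.2", "extreme"): (266.7, 178.7),
    ("3.94 b12", "standard"): (202.9, 177.4),
    ("3.94 b12", "extreme"): (260.3, 201.6),
}

def y_savings(table):
    """Bitrate saved by -Y in kbps for each (version, preset) pair."""
    return {key: round(plain - with_y, 1) for key, (plain, with_y) in table.items()}

for (version, preset), saved in y_savings(averages).items():
    print(version, preset, saved)
```

The exact differences are 64.6, 88.0, 25.5 and 58.7 kbps, so the figures quoted above are rounded (or, for the last one, rounded down).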
Gabriel
Heavy metal is considered to be difficult for mp3.
JohnV
QUOTE (sony666 @ Apr 2 2003 - 09:01 PM)
Might there be any chance that "the stuff -Y does" (or something very similar with a bit less impact but still stabilizing bitrates) will be defaulted in 3.94 for -a-p's?

Well.. if you increase masking (decrease resolution) of the sfb21 frequencies, it prevents the bloat (prevents unnecessarily high resolution for the frequencies below sfb21).
APS already does this by using --ns-sfb21 3.75, but of course if you increase the masking above 16 kHz, fewer high frequencies will be encoded. As said before, this whole thing is a problem in the mp3 technical specs. If MP3 had a scalefactor for adjusting sfb21 resolution, this wouldn't be an issue.
I think Dibrom pretty much found out that increasing sfb21 masking by 3.75 dB is quite optimal considering both quality and bloat, so the same figure is used in alpha builds. Maybe it's worth checking again, because the 3.94 alphas have slightly different masking properties than the stable builds.

Heavy metal is pretty much the type of music that often results in this bloat effect.
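
To get a feel for the size of a 3.75 dB masking increase, it can be converted to linear ratios. This sketch assumes the figure is an ordinary decibel offset; how LAME applies --ns-sfb21 internally is not covered in this thread, so the numbers are for scale only.

```python
def db_to_power_ratio(db):
    """Decibels to a ratio of powers (energies)."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Decibels to a ratio of amplitudes."""
    return 10 ** (db / 20)

print(round(db_to_power_ratio(3.75), 2))      # 2.37
print(round(db_to_amplitude_ratio(3.75), 2))  # 1.54
```

In other words, raising the masking threshold by 3.75 dB would allow roughly 2.4 times as much noise energy in sfb21 before the encoder asks for more resolution.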