Yes, I know that MP3Gain is lossless. No, I am not talking about how MP3Gain is limited to 1.5dB steps (though this is another reason).
What I am thinking is this:
Psychoacoustic models tend to encode louder signals with more bits, right? This is why remasters (compressed) tend to use higher bitrates than older versions. Correct?
If we are then encoding a VERY LOUD album with LAME -aps for instance, then normalizing it down say 10.5dB (for a really bad one), aren't we wasting bits? Wouldn't we be able to save some bits by normalizing it down to 89dB FIRST, then encoding it?
I know wavegain is NOT LOSSLESS, but I am (incorrectly) assuming that this is just a theoretical lossiness, not a perceptible one. If this is the case, couldn't we argue that wavegained tracks will be of lower bitrates while still maintaining transparency (the goal, after all, of -aps)?
---------------------------------
| UPDATE (May 21st, 2005) |
---------------------------------
The discussion below basically concludes that running a wavegain analysis, then applying the recommended scalefactor to lame (via the --scale switch) will save bits due to high-frequency bloat inherent in the mp3 format. Lots of people have since chosen to use this method over mp3gain, and in fact it can be automated now from within EAC using Wack, or my own Omni Encoder. Read on!
Doing that right now... ETA 5min on the encode
Echizen
Jun 21 2003, 15:10
*lol* a new way to save bits. first decrease the volume, encode and then increase with mp3gain
john33
Jun 21 2003, 15:12
The theory does work in practice, although I don't believe the difference was huge. But then you wouldn't expect it to be that large would you? I should be quoting numbers here, but I don't have any to hand without retesting!
Hmmm... I'll hazard a simple guess and say "yes," since I know that you can save bits by first using WaveGain and then compressing with a lossless algorithm (tested extensively for FLAC, Shorten and WavPack, although I suspect Monkey's audio and the rest would exhibit similar behavior).
- M.
Here are the exciting results:
"Mer De Noms" by "A Perfect Circle"
MP3Gain wanted to do a -9dB adjustment, so this album is pretty loud, though not the loudest by any means.
The files replaygained after the fact (MP3Gain) totalled 70.6MB
The files replaygained before encoding (Wavegain) totalled 63.4MB
So, using --alt-preset standard, you end up with about a 10% savings for a relatively loud modern recording. I assume the savings will be less for older recordings and more for hypercompressed drivel like they release today.
I think this is pretty significant...
My only thought is, are the files ABXable against each other? I don't have the ears to do the test unfortunately... Anyone care to try? I certainly hope they are indistinguishable.
HansHeijden
Jun 21 2003, 15:22
For mp3, it would be an idea if wavegain could pass on just the replaygain value to lame's --scale, thereby avoiding the rounding to 16 bits.
QUOTE(Jebus @ Jun 21 2003 - 12:34 PM)
Psychoacoustic models tend to encode louder signals with more bits, right? This is why remasters (compressed) tend to use higher bitrates than older versions. Correct?
I believe that assertion may be false. If I understand correctly, the overall loudness of a track should have little effect on the encoding. The total amount of perceptible detail in a track is what affects bitrate. The remastered bitrates are probably like that because the process of compression brings previously inaudible details up to the level of audibility.
By decreasing the volume level, you decrease the amount of detail available overall, with some small-amplitude details being obliterated by the noise floor. Thus, saved bits.
Edit: This is my theory, anyhow... Any knowledgeable rebuttals?
QUOTE(HansHeijden @ Jun 21 2003 - 01:22 PM)
For mp3, it would be an idea if wavegain could pass on just the replaygain value to lame's --scale, thereby avoiding the rounding to 16 bits.
Dude, that is a wonderful idea! Any LAME/wavegain developers listening?? This would be SO AWESOME!
john33
Jun 21 2003, 15:35
QUOTE(HansHeijden @ Jun 21 2003 - 09:22 PM)
For mp3, it would be an idea if wavegain could pass on just the replaygain value to lame's --scale, thereby avoiding the rounding to 16 bits.
You can do that manually by just having the gain calculated.
QUOTE(john33 @ Jun 21 2003 - 01:35 PM)
QUOTE(HansHeijden @ Jun 21 2003 - 09:22 PM)
For mp3, it would be an idea if wavegain could pass on just the replaygain value to lame's --scale, thereby avoiding the rounding to 16 bits.
You can do that manually by just having the gain calculated.
yeh but what a pain! And AFAIK, replaygain won't display an album --scale value, so I'd have to calculate it manually, then apply this with --scale to every encoded MP3 i do. That's pretty time consuming...
john33
Jun 21 2003, 17:12
QUOTE(Jebus @ Jun 21 2003 - 09:39 PM)
QUOTE(john33 @ Jun 21 2003 - 01:35 PM)
QUOTE(HansHeijden @ Jun 21 2003 - 09:22 PM)
For mp3, it would be an idea if wavegain could pass on just the replaygain value to lame's --scale, thereby avoiding the rounding to 16 bits.
You can do that manually by just having the gain calculated.
yeh but what a pain! And AFAIK, replaygain won't display an album --scale value, so I'd have to calculate it manually, then apply this with --scale to every encoded MP3 i do. That's pretty time consuming...
WaveGain will display the Album Gain value in calculation mode and if you have it write the log file, all the values will be saved in the log file for future reference.
BTW, I am the author of WaveGain.
ah but i need to manually calculate scalefactor from that number right?
AFAICT, this is simply 10^(replaygain * 0.05)
am I correct? So an album with a recommended gain adjustment of -10.02dB should have a --scale applied to lame that is 3.17? Seems to be roughly 10x too large. I'm going to write a script to automate this process now (once i figure this calc out), since I think this is all-around preferable to using MP3Gain.
EDIT: Nevermind, got it. I forgot to make the replaygain value negative (duh!).
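For reference, the conversion being worked out above is the usual decibel-to-amplitude relation, scale = 10^(gain / 20). Here is a minimal Python sketch of it; the function name is made up for illustration and is not part of WaveGain or LAME:

```python
def gain_db_to_scale(gain_db):
    """Convert a gain in dB to a linear amplitude factor: 10^(gain / 20)."""
    return 10 ** (gain_db / 20)

# With the sign kept, -10.02 dB comes out near the 0.3155 used later in the thread.
print(round(gain_db_to_scale(-10.02), 4))

# With the minus sign dropped, the result is the "10x too large" value of about 3.17.
print(round(gain_db_to_scale(10.02), 2))
```

The two results are reciprocals of each other, which is why forgetting the sign gives a factor that boosts instead of attenuates.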
mrosscook
Jun 21 2003, 20:48
Jebus,
In a problem like yours it can help to consider the asymptotic case, i.e., take the process to its logical conclusion. Suppose that we first wavegain the file to zero volume (digital silence). Then it takes only 1 bit to encode it. (Or is it 0 bits? Anyhow, it is a pretty good savings in file size). But if we try to gain back the volume of the encoded file, we find that we've lost all the information.
It's reasonable to expect that for intermediate amounts of wavegain, we should get intermediate savings in file size, and also lose intermediate amounts of information. (Of course, there's no guarantee that the process is linear or even monotonic; but we know where it has to start and end, so it has to pass through the points in between.)
westgroveg
Jun 21 2003, 21:03
The question of whether wavegain artifacts are perceptible still remains
Well as it stands, I have just ripped a couple of trial albums...
As recommended, what I did was extract using EAC, then run a wavegain ANALYSIS ONLY on the album. From the resulting suggested album gain adjustment I did 10^(gain * 0.05) to get the scalefactor, then ran LAME using:
lame --alt-preset standard --scale 0.3155 (for example - this is By The Way by the Red Hot Chili Peppers)
The resulting files are 10% smaller than if I had used MP3Gain after the encode, there is no theoretically lossy wavegaining being done, and I get much better than 1.5dB per step precision (the above files are 0.02dB over the ideal as set by replaygain).
I can't think of any reason why this is not the best possible way to encode MP3s. Not even much more work. Anyone have comments?
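The workflow described in this post could be scripted roughly as follows. This is only a sketch: it assumes the album gain has already been read off a WaveGain analysis by hand, the file names are placeholders, and the helper name is invented for illustration. Only the `--alt-preset standard` and `--scale` switches come from the thread itself.

```python
def lame_scale_command(album_gain_db, wav_path, mp3_path):
    """Build a LAME command line that applies the album gain via --scale.

    album_gain_db is the album gain suggested by a ReplayGain analysis
    (for example -10.02); it is converted to a linear factor first.
    """
    scale = 10 ** (album_gain_db / 20)
    return ["lame", "--alt-preset", "standard",
            "--scale", f"{scale:.4f}", wav_path, mp3_path]

# Hypothetical file names, used only to show the resulting command.
cmd = lame_scale_command(-10.02, "track01.wav", "track01.mp3")
print(" ".join(cmd))
```

The list form can be handed to `subprocess.run` once per track, using the same album gain for every track of the album so that the relative levels between tracks are preserved.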
mrosscook
Jun 21 2003, 21:35
Westgroveg, I think it might be better to say that the question is not whether wavegain artifacts are perceptible, but when they become perceptible.
If we wavegained all the way to silence, anybody could ABX that as an artifact. But of course practical values of wavegain are going to lose much less information, and it wouldn't surprise me if 99.9% of all actual wavegain applications were impossible to ABX, by even the battiest of bat-ears on the forum.
It would be interesting to know when the information loss in wavegain becomes great enough to ABX, and it probably depends on the style of music, degree of compression and clipping in the original file, dynamic range of the music, etc. Any volunteers?
Jebus, Are you sure that --scale isn't lossy in itself? You could in principle use it to reduce the volume to zero, couldn't you, just as wavegain could do?
QUOTE(Jebus @ Jun 21 2003 - 07:23 PM)
As recommended, what I did was extract using EAC, then run a wavegain ANALYSIS ONLY on the album. From the resulting suggested album gain adjustment I did 10^(gain * 0.05) to get the scalefactor, then ran LAME using:
lame --alt-preset standard --scale 0.3155 (for example - this is By The Way by the Red Hot Chili Peppers)
So, just to be absolutely certain, you are not altering the source WAV file in any way, yet are getting smaller file sizes?
There should be no difference between the MP3 files, or very, very little at most, if my understanding is correct.
QUOTE(Canar @ Jun 21 2003 - 08:14 PM)
QUOTE(Jebus @ Jun 21 2003 - 07:23 PM)
As recommended, what I did was extract using EAC, then run a wavegain ANALYSIS ONLY on the album. From the resulting suggested album gain adjustment I did 10^(gain * 0.05) to get the scalefactor, then ran LAME using:
lame --alt-preset standard --scale 0.3155 (for example - this is By The Way by the Red Hot Chili Peppers)
So, just to be absolutely certain, you are not altering the source WAV file in any way, yet are getting smaller file sizes?
There should be no difference between the MP3 files, or very, very little at most, if my understanding is correct.
Yes that is correct. No modification of the source wave. They are quite a bit smaller. I've done a number of albums now from the last 10 years and they are coming out an average of 10% smaller.
QUOTE(mrosscook @ Jun 21 2003 - 07:35 PM)
Jebus, Are you sure that --scale isn't lossy in itself? You could in principle use it to reduce the volume to zero, couldn't you, just as wavegain could do?
--scale is absolutely lossy, as is the entire encoding/quantization process during which it is set to a particular --scale anyhow (doesn't --aps use a scalefactor of 0.98 anyhow?). By using wavegain you are introducing an extra lossy step which involves additional dithering. This cuts that step out.
indybrett
Jun 21 2003, 23:29
Geez...
Does this mean I have to re-encode everything again
QUOTE(Jebus @ Jun 22 2003 - 06:37 AM)
(doesn't --aps use a scalefactor of 0.98 anyhow?).
No, only the CBR presets.
- to avoid loss due to applying wavegain (= lowering volume at 16 bit res.) you could use fb2k's diskwriter @ 24 bit and encode the resulting .wav files with lame
- in my MP3 vs. MPC vs. Ogg: Low volume test I got similar results, but I focussed on quality problems not on bitrate.
- I *think* messing with lame ATH settings could have the same effect (--athlower switch with negative values).
It does not surprise me that files get lower bitrates when encoding to mp3 after first wavegaining them.
The simple fact that I throw away something like 0.5 to 1.0 bit of digital resolution could explain the lower bitrate easily.
For example a full range 16 bit signal scaled with let's say a 0.7 factor gives ~15.48 bits digital resolution left.
In general the loss of bits can be expressed as:
CODE
n = -log2(scalefactor)
Where n = the loss of bit depth of the signal, and log2 is the base-2 logarithm
Of course this effect could be avoided by first resampling to 24 bit wav as mentioned in another post
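The bit-depth formula above is easy to check numerically. The sketch below only applies to gain that is written back to fixed-point samples (for example a 16-bit WAV); the function name is made up for illustration.

```python
import math

def bits_lost(scalefactor):
    """Bits of resolution lost when a fixed-point signal is scaled down."""
    return -math.log2(scalefactor)

# A scale factor of 0.7 costs roughly half a bit of a 16-bit signal.
lost = bits_lost(0.7)
print(f"lost: {lost:.3f} bits, remaining: {16 - lost:.3f} bits")

# Halving the amplitude (about -6 dB) costs exactly one bit.
print(bits_lost(0.5))
```

For the 0.3155 scale factor mentioned earlier in the thread, the same formula gives a loss of about 1.66 bits.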
HansHeijden
Jun 22 2003, 04:53
--scale is applied to float values so the lower volume doesn't throw away bits.
The question is, if it would be better to let lame encode music that is already at a (supposedly) fixed listening volume level, rather than just encode at original level and make the correction afterwards. I would think the first, but perhaps lame (presets) needs some ath 'retuning' to correct for the usually much lower input volumes.
john33
Jun 22 2003, 05:06
When applying the gain, WaveGain converts the input data to float, performs all gain/hard limiting adjustment and then converts, with or without dithering, back to desired output bitwidth. You can write out 8 bit, 16 bit, 24 bit, 32 bit integers or floats as you wish regardless of the input bitwidth.
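To make the sequence described above concrete (convert to float, apply gain, convert back with or without dither), here is a rough Python sketch. It is not WaveGain's code: the TPDF dither, the plain clamp used in place of a proper hard limiter, and the function name are all assumptions chosen for illustration.

```python
import random

def apply_gain_16bit(samples, gain_db, dither=True, rng=None):
    """Apply gain in floating point, then requantize to 16-bit integers."""
    rng = rng or random.Random()
    scale = 10 ** (gain_db / 20)
    out = []
    for s in samples:
        x = float(s) * scale                      # gain applied in float
        if dither:
            # TPDF dither: sum of two uniform values, spanning +/- 1 LSB
            x += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = round(x)                              # back to an integer sample
        out.append(max(-32768, min(32767, q)))    # clamp to the 16-bit range
    return out

# Hypothetical sample values, attenuated by 6 dB with and without dither.
pcm = [12000, -8000, 3, -3, 0]
print(apply_gain_16bit(pcm, -6.0, dither=False))
print(apply_gain_16bit(pcm, -6.0, dither=True, rng=random.Random(1)))
```

The rounding step at the end is where the 16-bit route loses information; passing the gain to lame's --scale skips that step because the scaled values are never written back to integers.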
john33
Jun 22 2003, 05:45
QUOTE(Jebus @ Jun 21 2003 - 11:35 PM)
ah but i need to manually calculate scalefactor from that number right?
I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain.
QUOTE(john33 @ Jun 22 2003 - 06:45 AM)
I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain.
Nice! I had just thought to myself, "I wonder how long it will take John to add a scale-factor calculation to WaveGain," and thirty seconds later you did so.
Any chance the author of those CoolEdit plugins shared his algorithms...?
- M.
de Mon
Jun 22 2003, 07:17
Maybe I misunderstood something... If we are going to gain before encoding what will happen to signal/noise ratio? Gaining them down will increase noise level. Am I wrong?
Any non-zero value will be raised by the same amount as the gain applied to the file overall. But the difference between the floor and the peak values will stay constant (assuming no clipping). That's why there's no substitute for strongly recorded track versus a weak one that has been RG'd.
xen-uno
mrosscook
Jun 22 2003, 07:44
Tigre, Hanky, HansHeijden, and John33,
I don't think that going to a greater bitdepth, or even to floating-point calculations, alters the problem in principle. If we carried out the entire wavegain process, for example, from a floating-point input all the way down to floating-point silence, we have still lost all the information in the original signal; the asymptote is still complete loss.
I agree that in practice the amount of loss is likely to be so small as to be imperceptible, and that higher bitdepths will eliminate noise that would be introduced by dithering otherwise. But the basic problem is still there.
After using the search function and finally realizing that this issue was discussed many times before on HA, I came to the conclusion that the whole ReplayGain concept will always be a compromise.
Surprisingly this was stated very clearly in the Replay Gain FAQ as early as December 2001 by David Robinson himself....
QUOTE
To maintain full dynamic range, the ideal solution is to feed the Replay Level value out of your PC, to your volume control. Obviously this requires dedicated hardware, and few people are going to do this, but it would be possible for those who demand highest quality to put an end to stupid fluctuations in level in this manner. No compromise. No downside.
QUOTE(john33 @ Jun 22 2003 - 03:45 AM)
QUOTE(Jebus @ Jun 21 2003 - 11:35 PM)
ah but i need to manually calculate scalefactor from that number right?
I just uploaded WaveGain V1.0.1 that does that for you and displays the Scale next to the Album Gain.
Thanks John! I actually made this small change myself, but didn't have access to a windows compiler. This shaves a few seconds off my work.
Whether this process is lossy is I suppose still debatable, but as far as I can tell, inputting a --scale value is in all ways superior to MP3Gaining or Wavegaining...
1) you get precisely 89.0 dB, not a value rounded to MP3Gain's 1.5dB steps.
2) you don't have to dither before encoding like Wavegain requires.
3) newer (louder) albums do not receive an inflated bitrate like they would with MP3Gain.
4) signals below the threshold of hearing AFTER GAINING are simply discarded by the psymodel, instead of being artificially turned down later (by MP3Gain).
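Point 1 can be illustrated with a little arithmetic. The sketch below assumes that an MP3Gain-style adjustment is simply rounded to the nearest whole step of 2^(1/4) in amplitude (about 1.505 dB, the granularity of the MP3 global gain field); MP3Gain's actual rounding rules may differ, and the function name is invented.

```python
import math

STEP_DB = 20 * math.log10(2 ** 0.25)   # about 1.505 dB per global-gain step

def stepped_adjustment(target_gain_db):
    """Round a desired gain to whole steps; return (steps, applied dB, error dB)."""
    steps = round(target_gain_db / STEP_DB)
    applied = steps * STEP_DB
    return steps, applied, applied - target_gain_db

steps, applied, error = stepped_adjustment(-10.02)
print(f"{steps} steps = {applied:.3f} dB, error {error:+.3f} dB")
```

Under this assumption the worst-case error is half a step, roughly 0.75 dB, whereas a --scale factor can match the computed gain as closely as the number of decimals you type.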
ibm2080
Jun 22 2003, 13:27
QUOTE(mrosscook @ Jun 22 2003 - 05:44 AM)
Tigre, Hanky, HansHeijden, and John33,
I don't think that going to a greater bitdepth, or even to floating-point calculations, alters the problem in principle. If we carried out the entire wavegain process, for example, from a floating-point input all the way down to floating-point silence, we have still lost all the information in the original signal; the asymptote is still complete loss.
I agree that in practice the amount of loss is likely to be so small as to be imperceptible, and that higher bitdepths will eliminate noise that would be introduced by dithering otherwise. But the basic problem is still there.
I completely agree with you.
You explain it in an earlier post too (same thread, page 1).
I am not sure how the code works at all, I haven't even looked at it to tell you the truth, but I think mrosscook's point is valid from a logical point of view, assuming that the floor is at a constant value and that values below that floor are dropped. If we apply a negative gain, everything in that file will drop by that much. Keep in mind that before the gain was applied there was some sound info that was just above the floor limit. If all that is true, then there should be some sound info that drops below the floor limit if we apply a negative gain (ie is dropped from the file), meaning there is less sound info after the gain process to encode. It also means that if you take the same track, and apply two different gains to it, one being more negative than the other, the resulting files should have two different sizes.
The thing is, when you MP3gain a file down a few dB, it ends up clipping sounds from the low end anyhow... so it is still throwing away sound. It is still there in the MP3, just inaudible in the same sense that a too-loud signal is clipped WITHOUT the normalizing. By doing it my way with --scale you are still losing the same information, but instead of keeping it in the MP3, it is discarded during the encode process.
Essentially what I am saying is that yes, if you --scale 0.0 you will have a 0kB file with no information in it. But if you MP3gain a file down to 0, you are also getting a file that will have no audible information - it's still there, but it's all been clipped below playback threshold. So in this sense, both methods are just as lossy... the MP3Gain method is reversible however, while the --scale method has lost that data forever in the interest of space savings.
mmortal03
Jun 22 2003, 14:08
QUOTE(Jebus @ Jun 22 2003 - 12:19 PM)
4) signals below the threshold of hearing AFTER GAINING are simply discarded by the psymodel, instead of being artificially turned down later (by MP3Gain).
This is what I was thinking. If we aren't going to amplify (or RG) these mp3s later, who cares if we are leaving out sounds below the threshold? As long as we were going to listen to them MP3Gained in the end anyway, and probably not ever touch them again, it shouldn't matter to the majority of people, and plus there is a 10% file size savings.
My question is, without amplifying these --scale'd mp3s, is there ANY perceptible difference with an equally MP3Gain scaled mp3? Perceptibly, there shouldn't be.
I personally can't ABX a difference, but my ears aren't as well tuned as others... Someone wish to do an ABX test on a highly compressed album done both with MP3Gain and --scale?
_Shorty
Jun 23 2003, 03:10
don't certain portions of the mp3 encoding process rely on the volume level of the signal to begin with? I'm sure it must determine the audibility of certain things in relation to full-scale, no? Or is it all relative to other portions of the signal content, with no regard for full-scale at all? Seems to me that if full-scale matters then encoding should be done first with replaygain/mp3gain only being done afterwards. If full-scale doesn't matter at all though, then it would seem to make sense to --scale the data after all. But this also makes me wonder something else. Wouldn't mp3 encoder quality, or any lossy format at all for that matter, sound best with a certain standardized playback volume so it could actually match up with what the psy-model is throwing away? Surely absolute volume matters too, and not just relative volume of differing frequency components.
2Bdecided
Jun 23 2003, 03:57
In most of this thread, the real problem is only just hinted at.
Don't worry about what replay gain is doing or what scale is doing - you're not losing anything at all here when using the calculated value to set --scale in lame. Also, don't worry about what happens at the very limit: --scale 0.0 isn't possible - it's silence, and it's infinite attenuation. --scale 0.0 is less than -50dB, less than -100dB, less than -200dB, - it's -infinity dB! As such, it's misleading and irrelevant. Take -100dB as the limit case: perfectly possible without loss in 32-bit floating point calculations. So, no need to worry there.
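The -100dB limit case can be checked with a small experiment. The sketch below emulates 32-bit floats with Python's `struct` module, attenuates every possible 16-bit sample value by 100 dB, restores it, and counts how many values fail to come back. It says nothing about what lame does internally; it only tests the arithmetic claim, and the helper names are made up.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def attenuate_and_restore(sample, gain_db=-100.0):
    """Scale a 16-bit sample down in float32, scale it back up, and requantize."""
    scale = to_float32(10 ** (gain_db / 20))   # about 1e-5 for -100 dB
    quiet = to_float32(sample * scale)         # attenuated value kept as float32
    restored = to_float32(quiet / scale)       # gain undone, again in float32
    return round(restored)

mismatches = sum(1 for s in range(-32768, 32768)
                 if attenuate_and_restore(s) != s)
print(mismatches)   # expected to print 0
```

The rounding error of a float32 is relative (around one part in ten million), so it stays far below one 16-bit step regardless of how far the signal is turned down, short of underflow.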
We're left with two worries: what lame does, and what happens at the 16-bit output when decoded. OK, forget the second one of these - this is always an issue with Replay Gain, and was discussed before there was even any RG software available (thanks for the quote Hanky!).
So, we have one, and only one issue: what does lame do? IIRC It supposedly looks at the "loudness" of the signal, makes some assumptions from this, and enforces a sensible absolute threshold of hearing. This is most critical for quiet tracks - if it assumed that you will never turn up the volume to hear them, then most of the audio signal is way below hearing threshold, and can be discarded. This used to happen, but people often turn up the volume, and heard problems: so (again, IIRC) for a few years now, lame has taken this into account, and shifted the ath down for quieter tracks. There's a time constant involved, and it may not be a linear process - this is why changing the volume before encoding can change the output file size. If lame were adjusting everything immediately and linearly, scaling a file in the floating point domain would make no difference at all to the resulting filesize.
There are probably other factors at work here. What worries me slightly is that lame --aps has been tuned with tracks ripped from CDs. Not with tracks ripped from CDs and then dropped by 10dB. If you get a smaller file, then by definition it has less information in it. Something has been lost. The question is what? and does it matter?
Maybe it's possible to tell lame that the file has been replay gained, and that you will be listening to this track at a particular loudness - that then fixes the ath. It would be interesting to do this, so fixing the ath at the "correct" value, to see what kind of bitrates this yields.
Replay gain originally targeted 83dB. This calibration assumes that a full scale sine wave will give 103dB SPL. Most of the current implementations target 89dB. This calibration assumes that a full scale sine wave will give 97dB SPL.
I bet that lame is guessing that a full scale sine wave would give about 85-90dB SPL for highly compressed tracks. What switches can people use to turn off lame's automatic ath adjustment, and to enforce it at a level which makes sense if a full scale sine wave = 97 dB SPL?
(I'm assuming that no one will listen louder than this, and that listening quieter than this will not break the psychoacoustics. Both these assumptions are false, but so are any assumptions that assume how loud people will listen - including the current automatic one in lame - so I'm hoping this doesn't cause any problems).
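To illustrate why a fixed absolute threshold of hearing would make level matter, here is a small sketch. It uses Terhardt's well-known approximation of the threshold in quiet and the calibration suggested above (full scale = 97 dB SPL). It is not lame's ATH code, which is adaptive as described in this post; the function names are invented.

```python
import math

def ath_db_spl(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def above_threshold(level_dbfs, freq_hz, full_scale_db_spl=97.0):
    """True if a component at level_dbfs would sit above a fixed ATH."""
    return level_dbfs + full_scale_db_spl > ath_db_spl(freq_hz)

# A faint 1 kHz component at -90 dBFS, before and after a 10 dB reduction.
print(above_threshold(-90.0, 1000.0))
print(above_threshold(-100.0, 1000.0))
```

With these assumptions the component is above threshold before the reduction and below it afterwards, which is the kind of detail an encoder with a fixed ATH would be free to drop.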
John - is WaveGain using 83 or 89dB target?
Lame experts - which switches can fix the ath where we want it?
Jebus - can you try whatever gets suggested with your test track and report back please?
Dibrom/other devs = if you're reading this, can you see any way in which this would break --aps or lame in general?
Cheers,
David.
john33
Jun 23 2003, 04:10
QUOTE(2Bdecided @ Jun 23 2003 - 09:57 AM)
John - is WaveGain using 83 or 89dB target?
Cheers,
David.
89dB.
ibm2080
Jun 23 2003, 06:14
QUOTE(2Bdecided @ Jun 23 2003 - 01:57 AM)
There are probably other factors at work here. What worries me slightly is that lame --aps has been tuned with tracks ripped from CDs. Not with tracks ripped from CDs and then dropped by 10dB. If you get a smaller file, then by definition it has less information in it. Something has been lost. The question is what? and does it matter?
Cheers,
David.
That is the heart of the problem as I see it.
Let's just take a look at the current possibilities to apply ReplayGain to an MP3 (as pointed out in this thread; excluding fb2k's implementation):
1. Original -> lame -> MP3Gain
2. Original -> Wavegain (+ Dither/NS?) -> lame
3. Original -> (Wavegain to calc. scale) -> lame + scale
Another interesting point to consider is, how the Dithering/NoiseShaping of Wavegain influences/confuses the behaviour of lame (ATH?).
dev0
2Bdecided
Jun 23 2003, 08:33
QUOTE(dev0 @ Jun 23 2003 - 01:45 PM)
1. Original -> lame -> MP3Gain
2. Original -> Wavegain (+ Dither/NS?) -> lame
3. Original - > (Wavegain to calc. scale) -> lame + scale
Another interesting point to consider is, how the Dithering/NoiseShaping of Wavegain influences/confuses the behaviour of lame (ATH?).
The thing is, 2 can never be better than 3 - as you suggest, there could be interesting consequences of using it, but because 3 must be better than 2, I'd rather forget 2 and explore 3.
3 may be more useful, because the only quality concerns are within Lame itself.
Cheers,
David.
john33
Jun 23 2003, 09:59
These are the results from using one track only. It may, or may not, be representative, but the results are fairly interesting. I make no comment, I am posting for others to comment.
CODE
******** WaveGain, NO DITHER
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:35/ 0:35| 0:35/ 0:35| 6.0438x| 0:00
32 [ 151] ****
128 [ 790] %%***************
160 [2880] %%%%%********************************************************
192 [3157] %%%%%%%%%%%%%%%%%*************************************************
224 [ 803] %%%%%************
256 [ 247] %%****
320 [ 180] %***
average: 179.5 kbps LR: 1384 (16.86%) MS: 6824 (83.14%)
Writing LAME Tag...done
******** WaveGain, DITHER with NO NOISE SHAPING
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 5.9545x| 0:00
32 [ 150] %%%*
128 [ 801] %%***************
160 [2848] %%%%%******************************************************
192 [3201] %%%%%%%%%%%%%%%%%%************************************************
224 [ 782] %%%%%************
256 [ 240] %%***
320 [ 186] %%**
average: 179.5 kbps LR: 1520 (18.52%) MS: 6688 (81.48%)
Writing LAME Tag...done
******** WaveGain, DITHER with LIGHT NOISE SHAPING
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 5.9339x| 0:00
32 [ 14] %
40 [ 1] *
128 [ 916] %%%%****************
160 [2877] %%%%%*******************************************************
192 [3172] %%%%%%%%%%%%%%%%%%************************************************
224 [ 797] %%%%%************
256 [ 249] %%****
320 [ 182] %***
average: 181.2 kbps LR: 1534 (18.69%) MS: 6674 (81.31%)
Writing LAME Tag...done
******** WaveGain, DITHER with MEDIUM NOISE SHAPING
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 5.9831x| 0:00
32 [ 10] %
128 [ 916] %%%%****************
160 [2895] %%%%%********************************************************
192 [3156] %%%%%%%%%%%%%%%%%*************************************************
224 [ 810] %%%%%************
256 [ 245] %%****
320 [ 176] %%**
average: 181.2 kbps LR: 1485 (18.09%) MS: 6723 (81.91%)
Writing LAME Tag...done
******** WaveGain, DITHER with HEAVY NOISE SHAPING
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 5.9390x| 0:00
40 [ 1] *
128 [1115] %%%%%*******************
160 [3176] %%%%%%%%**********************************************************
192 [2890] %%%%%%%%%%%%%%%%*********************************************
224 [ 653] %%%%**********
256 [ 222] %%***
320 [ 151] %***
average: 177.5 kbps LR: 1583 (19.29%) MS: 6625 (80.71%)
Writing LAME Tag...done
******** ORIGINAL WAVE FILE + LAME --scale
D:\testdir>lame --preset standard --scale 0.691 13.wav 13.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 6.0251x| 0:00
32 [ 152] %***
128 [ 810] %*****************
160 [2876] %%%%%********************************************************
192 [3130] %%%%%%%%%%%%%%%%%*************************************************
224 [ 815] %%%%%*************
256 [ 238] %%****
320 [ 187] %***
average: 179.5 kbps LR: 1372 (16.72%) MS: 6836 (83.28%)
Writing LAME Tag...done
******** ORIGINAL WAVE FILE + LAME AS REFERENCE
D:\testdir>lame --preset standard 13.wav 13.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:35/ 0:35| 0:35/ 0:35| 6.0544x| 0:00
32 [ 151] ****
128 [ 779] %****************
160 [2771] %%%%%*****************************************************
192 [3188] %%%%%%%%%%%%%%%%%*************************************************
224 [ 872] %%%%%**************
256 [ 251] %%****
320 [ 196] %%***
average: 180.6 kbps LR: 1373 (16.73%) MS: 6835 (83.27%)
Writing LAME Tag...done
D:\testdir>
Could you please decode the test files and verify if any of these are the same? If they are we could try grouping some options.
dev0
QUOTE(dev0 @ Jun 23 2003 - 08:07 AM)
Could you please decode the test files and verify if any of these are the same? If they are we could try grouping some options.
dev0
None of them are exactly the same as you'll see e.g. by comparing the numbers of 160 kbps frames for each file.
mrosscook
Jun 23 2003, 10:32
In light of 2Bdecided's comments about the possible role of psymodels and ath in this effect, I thought it might be useful to see what happens if we compress both an original file and a wavegained version using ZIP (which of course has no psychoacoustics in it).
I used the track, "When I Paint My Masterpiece", from Rock of Ages, Disc 2, by Bob Dylan and the Band. The original is a 44.38 MB wav file. This was transformed into a gained file using wavegain with a -18.95 dB setting -- the -6.95 dB recommended by the program itself, plus another -12 dB for emphasis. Using ZIP at max compression,
Original compresses 44.38 MB to 42.27 MB (95%)
Gained compresses 44.38 MB to 35.63 MB (80%)
Certainly, no psymodel has a role here. I also thought it might be interesting to compress the original and gained files using a lossless audio codec (FLAC high) and two lossy codecs (LAME v.3.92 -aps, and MPC -q5); that gives the little table,
CODE
            ZIP           FLAC          LAME         MPC
Original    42.27 (95%)   27.70 (62%)   4.79 (11%)   4.51 (10%)
Gained      35.63 (80%)   18.98 (43%)   4.92 (11%)   4.25 (10%)
FLAC, like ZIP, gets a big boost in compression by aggressive wavegaining. The lossy codecs don't. LAME actually makes the gained file a little bit larger than the original, but I doubt that is a meaningful difference.
What is the moral of the story? Maybe that wavegaining does produce loss (as suggested by ZIP and FLAC) but that the lossy-mode psychoacoustics are much larger effects and swamp out wavegain? Any comments would be welcome.
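For anyone who wants to reproduce the ZIP part of this without the original track, here is a rough sketch. It assumes a synthetic signal (a 440 Hz tone plus Gaussian noise) standing in for real music, uses zlib's deflate as a stand-in for ZIP at max compression, and the helper names (synth_pcm, apply_gain, deflate_ratio) are made up for illustration. A gain of -18.95 dB multiplies every sample by about 0.113, which throws away roughly three bits per 16-bit sample, so a generic compressor has less to store.

```python
import math
import random
import struct
import zlib

def synth_pcm(n_samples, seed=0):
    """Hypothetical test signal: 440 Hz tone plus noise, as 16-bit sample values."""
    rng = random.Random(seed)
    samples = []
    for i in range(n_samples):
        tone = 12000 * math.sin(2 * math.pi * 440 * i / 44100)
        noise = rng.gauss(0, 4000)
        s = int(round(tone + noise))
        samples.append(max(-32768, min(32767, s)))  # clip to the int16 range
    return samples

def apply_gain(samples, gain_db):
    """Scale samples by a dB gain and round back to integers (no dither)."""
    scale = 10 ** (gain_db / 20)
    return [int(round(s * scale)) for s in samples]

def deflate_ratio(samples):
    """Compressed size divided by raw size for little-endian 16-bit PCM."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return len(zlib.compress(raw, 9)) / len(raw)

original = synth_pcm(44100)            # one second of mono audio
gained = apply_gain(original, -18.95)  # same gain as in the post above

print("original: %.1f%% of raw size" % (100 * deflate_ratio(original)))
print("gained:   %.1f%% of raw size" % (100 * deflate_ratio(gained)))
```

The exact percentages will differ from the figures for the real track, but the gained version should come out smaller for the same reason: the top bits of every sample become predictable.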
Let's forget the actual wavegaining, as 2Bdecided said. I would like to hear from Gabriel or maybe Dibrom regarding the issues 2Bdecided brought up with --scale and ATH.
Fellas?
In addition, what this eventually comes down to is an ABX test. I can post some highly compressed tracks done with both mp3gain and --scale if you'd like, but I think you may be more successful with tracks you know well. I for one cannot identify a difference, beyond the slight volume difference due to the 89.0dB accuracy of the --scale method.
Seriously though, forget the wavegain/dithering method - there is no point in pursuing that - it will just confuse matters.
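For reference, moving between a gain in dB (what WaveGain reports) and a linear factor (what lame --scale takes) is just the standard amplitude formula, scale = 10^(dB/20). A small sketch, with function names made up for illustration:

```python
import math

def db_to_scale(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def scale_to_db(scale):
    """Convert a linear amplitude factor back to a gain in dB."""
    return 20 * math.log10(scale)

# --scale 0.691, as used in john33's test, is roughly -3.2 dB
print("scale 0.691 -> %.2f dB" % scale_to_db(0.691))

# the -6.95 dB that WaveGain recommended for mrosscook's track
print("-6.95 dB -> scale %.3f" % db_to_scale(-6.95))
```

Since --scale takes an arbitrary factor, it is not tied to mp3gain's 1.5 dB steps.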
From a simple numeric comparison (considering the frame breakdown as well as the average bitrate) of john33's examples, it seems the following two are the closest to each other:
CODE
******** WaveGain, NO DITHER
D:\testdir>lame --preset standard 132.wav 132.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 132.wav to 132.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:35/ 0:35| 0:35/ 0:35| 6.0438x| 0:00
32 [ 151] ****
128 [ 790] %%***************
160 [2880] %%%%%********************************************************
192 [3157] %%%%%%%%%%%%%%%%%*************************************************
224 [ 803] %%%%%************
256 [ 247] %%****
320 [ 180] %***
average: 179.5 kbps LR: 1384 (16.86%) MS: 6824 (83.14%)
Writing LAME Tag...done
******** ORIGINAL WAVE FILE + LAME --scale
D:\testdir>lame --preset standard --scale 0.691 13.wav 13.mp3
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
CPU features: i387, MMX (ASM used), 3DNow!, SIMD
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 13.wav to 13.mp3
Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
8206/8208 (100%)| 0:36/ 0:36| 0:36/ 0:36| 6.0251x| 0:00
32 [ 152] %***
128 [ 810] %*****************
160 [2876] %%%%%********************************************************
192 [3130] %%%%%%%%%%%%%%%%%*************************************************
224 [ 815] %%%%%*************
256 [ 238] %%****
320 [ 187] %***
average: 179.5 kbps LR: 1372 (16.72%) MS: 6836 (83.28%)
Writing LAME Tag...done
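The "simple numeric comparison" can be made explicit with a few lines of Python. The frame counts below are copied from the LAME histograms in this thread; the sum of absolute differences in frame counts is only my own ad hoc choice of a closeness measure, not anything LAME reports.

```python
# Frames per bitrate (kbps), copied from the LAME histograms in this thread
scale_run = {32: 152, 128: 810, 160: 2876, 192: 3130, 224: 815, 256: 238, 320: 187}
no_dither = {32: 151, 128: 790, 160: 2880, 192: 3157, 224: 803, 256: 247, 320: 180}
reference = {32: 151, 128: 779, 160: 2771, 192: 3188, 224: 872, 256: 251, 320: 196}

def average_kbps(hist):
    """Frame-weighted average bitrate of a histogram."""
    frames = sum(hist.values())
    return sum(rate * count for rate, count in hist.items()) / frames

def frame_distance(a, b):
    """Sum of absolute differences in frame counts (ad hoc closeness measure)."""
    return sum(abs(a[rate] - b[rate]) for rate in a)

for name, hist in [("--scale", scale_run), ("WaveGain, no dither", no_dither),
                   ("original", reference)]:
    print("%-20s %d frames, %.1f kbps" % (name, sum(hist.values()), average_kbps(hist)))

print("--scale vs no dither:", frame_distance(scale_run, no_dither))
print("--scale vs original: ", frame_distance(scale_run, reference))
```

The recomputed averages match the ones LAME printed (179.5, 179.5 and 180.6 kbps), and by this measure the --scale run sits much closer to the undithered WaveGain run (80 frames) than to the unscaled reference (274 frames).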
Now, most of the time when I use WaveGain I don't use dither anyway (look, if it were really an essential step it should be the default, no?), but I know a lot of folks here do use it - all the time. However, from the bitrate counts it would appear --scale is not pre-dithering anything... so the question becomes, how essential is dither to producing decent-sounding MP3s? Or does it even affect the sound to a measurable degree, since the process of psychoacoustic modeling is later in the chain?
- M.