Full Version: Improving ReplayGain
Hydrogenaudio Forums > Hydrogenaudio Forum > General Audio
2Bdecided
Every now and again I wish I had the time to update the ReplayGain website and add some new ideas, and maybe even clarify some old ones. I don't, so this thread will have to do.


Firstly, the format used to store ReplayGain info in files is not documented correctly on the ReplayGain website, and it would be good to "publish" what has emerged as the standard for each format.

Secondly, what is stored is not documented correctly on the ReplayGain website, and I'd like to re-examine what is stored...


One change has already happened, and I think it's a good change:

Forget Radio and Audiophile - Track and Album are much better names.
(that's an open admission of me being wrong, for anyone who discussed this with me previously!)


So, we store:

ReplayGain Track adjustment
ReplayGain Album adjustment
(ReplayGain) Track peak
(ReplayGain) Album peak

(this last one wasn't in the original proposal, but it has been widely used - I've put it in bold to remind me to include it in the update)

That makes sense, and most software supports this. I'd like to formalise some extensions, some of which were there from the start, and others that have cropped up more recently:


1. (ReplayGain) undo adjustment
- this is written when the gain of the file is changed (e.g. by mp3gain, or by decoding with ReplayGain enabled), and is the gain change required to put the file back to where it started.

e.g. If I apply -8dB gain change using mp3gain, then
(ReplayGain) undo adjustment = +8dB

e.g. If I use --scale 0.5 when encoding (for whatever reason?!), then
(ReplayGain) undo adjustment = +6dB

If the gain of an already ReplayGained file is changed, the original four values (Track and Album adjustment and peak) should be updated so that they are correct for the new audio data. (see an example in this thread: http://www.hydrogenaudio.org/forums/index....topic=15412&hl= )

I can't see any argument against defining this field. It would be zero (or absent) if the audio file hasn't been altered. It's useful in all formats because you can always apply wavgain before encoding, and it would be nice to know that this has been done.


2. ReplayGain calculation method

OK - I've had this argument before, but this really is important. ReplayGain can be improved, but you'll never know whether files are tagged using the old or new ReplayGain calculation unless the calculation method (actually a number which corresponds to the method) is stored. This doesn't increase the complexity of players, as they won't care - it just makes it very easy to pick out files that were tagged with the old version, and update them.


3. ReplayGain lossy approximation

This is just a single bit: 0 or 1.

0= this ReplayGain info has been calculated from the data in this file
1=this file has been lossily encoded/transcoded since this ReplayGain info was calculated.

What's the point of this? If you have a file with ReplayGain info, you can transcode it and copy the RG info across. It'll be close enough to give you excellent loudness equalisation, and you won't have to re-calculate it. Yet there'll be a label there to tell all you anal retentives that it's not quite right, and should be recalculated if you want to be 100% sure (especially important for peak amplitude).

You could (should?) have one “ReplayGain lossy approximation” bit for each of the four values, which gives you the chance (for example) of re-calculating the peak values (quick, and important - so let's do it), but leaving the ReplayGain values (slow, and unimportant - so let's not do it).
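One way the per-value variant could look, as a sketch only. The flag names and the choice of a 4-bit mask are assumptions; the proposal does not fix a layout.

```c
#include <assert.h>

/* Hypothetical flag names: one "lossy approximation" bit per stored value. */
enum {
    RG_APPROX_TRACK_GAIN = 0x01,
    RG_APPROX_ALBUM_GAIN = 0x02,
    RG_APPROX_TRACK_PEAK = 0x04,
    RG_APPROX_ALBUM_PEAK = 0x08,
    RG_APPROX_ALL        = 0x0F
};

/* After a lossy transcode that copies the tags across, mark every value approximate. */
unsigned rg_mark_transcoded(unsigned flags)
{
    assert((flags & ~(unsigned)RG_APPROX_ALL) == 0);  /* no unknown bits */
    return flags | (unsigned)RG_APPROX_ALL;
}

/* After re-scanning the peaks only (the quick part), clear the two peak bits. */
unsigned rg_mark_peaks_recalculated(unsigned flags)
{
    assert((flags & ~(unsigned)RG_APPROX_ALL) == 0);
    return flags & ~(unsigned)(RG_APPROX_TRACK_PEAK | RG_APPROX_ALBUM_PEAK);
}
```

After a transcode followed by a peak re-scan, only the two gain bits stay set, which matches the "quick and important first" idea.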


4. ReplayGain user adjustment

Instead of suggesting that users should change the calculated values if they wish, give them a field to enter their own value if they really have to. Players should give the option to read the user value in preference to any others (i.e. let it act as an over-ride), and taggers should give the option of removing the user values from all (downloaded) files.
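The over-ride behaviour could be sketched like this. The struct, the use of NULL for an absent field, and the `prefer_user` option are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag view: a NULL pointer means the field is absent. */
typedef struct {
    const double *calculated_db; /* Track or Album adjustment from the scanner */
    const double *user_db;       /* optional user-entered adjustment */
} rg_gain_fields;

/* Pick the gain to apply; the user value wins only if the player option is on. */
double rg_select_gain(const rg_gain_fields *f, bool prefer_user)
{
    assert(f != NULL);
    if (prefer_user && f->user_db != NULL)
        return *f->user_db;
    if (f->calculated_db != NULL)
        return *f->calculated_db;
    return 0.0; /* no ReplayGain info: leave the level alone */
}
```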


5. ReplayGain RealLife adjustment

The gain required to give the actual SPL of the original event (in a calibrated system), or a human judged sensible replay level (see the explanation behind the original "Audiophile" level and the work of Bob Katz if you think this is an impossible idea). I've found a few DVD-A discs that have this information (it's in the MLP stream), so it would be nice to have somewhere to store it. It's unlikely to get used much, but it would be a useful thing to have. It would be the last link in some of the best recordings out there.



I'd like to come to a consensus of which ones of these (if any/all) should be included, and then get some specs as to how they are/should be stored in each file format (especially APE2.0 tags) finalised and published on-line.

Comments? Suggestions? Offers of help?


btw I've received a couple of suggestions for improving the ReplayGain calculation. One is trivial, and seems like a great idea. I'll post it for testing when the problem of version numbering is solved. If anyone else has slightly or totally re-worked the ReplayGain algorithm/concept, now would be a good time to step forward! We could do listening tests to find the best candidate for "calculation version 2".


Cheers,
David.

Newbie warning: this thread is not for asking questions about ReplayGain that are already answered on www.replaygain.org or in previous threads on HA. (I'm always happy to answer "silly" questions via email – half of them aren't silly at all.)

However, if you do already have some understanding of ReplayGain then this thread is the perfect place for clarifying anything to do with the above proposals which is not clear.
Gabriel
I think that a point that should be clarified/updated is which format to store the RG values in.

In the current Lame header, it is stored as floating point data. However, this could be a source of problems on some platforms. It appeared that we probably need an integer representation.
2Bdecided
Frank heavily criticised my proposal for storing RG info in .wav files because I used floating-point representation for the peak values; in what was basically a fixed point format.

I agree - there needs to be a resolution of this problem. Frank's idea was to use fixed point 16-bit, representing 0-65535 (i.e. 0-200% peak). It's one solution, with its own advantages and disadvantages: You can't store peaks above 200% (which do happen! Lossy encoding of modern CDs), and you can't do perfectly accurate 100% normalisation or clip prevention on 24-bit decodes (though you can get close enough - depends how anal you are).

A 32-bit INT would offer greater flexibility, but would it bring its own problems? I'm thinking: middle 16-bit as normal, lower 8 bits for increased resolution, upper 8 bits for >100%. Would this be difficult to program?


EDIT: Aren't the RG values themselves fixed point, using that horrible binary format I invented for the task? I don't propose using that binary format anymore, but fixed point would be good.

Cheers,
David.
Gabriel
You might be interested by:
http://sourceforge.net/mailarchive/forum.p...8&forum_id=5500
2Bdecided
Moderators:

On this page:
http://replaygain.hydrogenaudio.org//typic...al_results.html

The downloadable files aren't there. Moved? Deleted? Never there? I can find them if you want them, but it'll take a while.


Developers:

Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Cheers,
David.
2Bdecided
QUOTE(Gabriel @ Nov 18 2003, 04:32 PM)

Thanks - I agree. So - what range? And how?

It certainly needs to hold values above 100%, otherwise it's useless.

Cheers,
David.
Digga
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

I think it's a good idea to store the actual number instead of the adjustment (just in terms of looking good and being clearer; I don't really know about the technical problems involved).
Why too confusing? Because people have got used to how it is now?
People can change. For me, this change would be something nice.
Mike Giacomelli
I would just like a way to write ReplayGain info into the gain/volume/whatever-it's-called field on MP4 files. That way I could get some hardware support (iPod).
Gabriel
QUOTE
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

Agree.


QUOTE
So - what range? And how?

Our problem is only with the peak value. In the Lame tag, it is stored using 32bits, so we have 32bits to define a format.

I would suggest just using an unsigned integer. Our needs are:
*being able to have enough precision for the 0-100% range
*being able to store values higher than 100% (btw, how much higher?)

Ideally, 24-bit precision for the 0-100% range would be nice.

First proposal:
Use 0 - 100 000 as 0 - 100% range.
Precision is more than 24 bits (a little more than 26bits), and this would allow for about up to 4000% (considering that the maximum unsigned int value is 4G). Moreover, it is quite simple, just a linear scale.
2Bdecided
That would be fine.

Or... ;)


Would the following work? It's the same, but using a different linear scale factor, which fits in neatly with 16- and 24-bit data, like this:

Field = 32-bit INT.


For 16-bit audio data, use

00000000xxxxxxxxxxxxxxxx00000000

Where xxxxxxxxxxxxxxxx is the peak value.
(1000000000000000 is the largest possible value for linear 16-bit data, e.g. a .wav file)


For 24-bit audio data

00000000xxxxxxxxxxxxxxxxxxxxxxxx

Where xxxxxxxxxxxxxxxxxxxxxxxx is the peak value.

etc

If the peaks are greater than 200% then obviously the leading 0s would be used to indicate this. So, in the mp3 case, you find the peak using a decoder which allows headroom, and multiply the normalised result by (2^23).

Using (2^23) rather than 100000 (which you suggested) as the scale factor sounds strange, but it means 16 and 24-bit data can simply be pasted into the field just by shifting the bits, which would avoid multiplication and rounding errors.


digital full scale is
00000000100000000000000000000000
i.e. 2^23

You get exactly 24-bit accuracy, and 54dB of headroom (i.e. 51200%, I think!)


Would this be easy to program?


Should we change peak values to fixed point in all implementations?

Would it be easy for players to use, because I'm thinking about this being a useful convention to employ in all formats, since floating point isn't strictly needed, and is causing rounding confusion.

Or would it be stupid to change to fixed point for the peak value in other formats, because this would break compatibility with old players?

Cheers,
David.
Gabriel
Seems interesting. It would be nice to hear other opinions.
2Bdecided
Do none of the developers have any comments?


Two more issues:

1. Is there any chance of a service like freedb storing the replay gain values for tracks and albums to save us all a lot of time?

2. MTRH has reminded me that a ReplayGain logo is long overdue. Shall I launch a competition? If so, I'll wait until the HA one is well out of the way.

Cheers,
David.
phwip
QUOTE(2Bdecided @ Nov 21 2003, 12:24 PM)
1. Is there any chance of a service like freedb storing the replay gain values for tracks and albums to save us all a lot of time?

Would people really trust replay gain values stored on freedb to be correct? I use freedb with EAC to get track titles, etc. But this is because I know I can check these titles against the correct ones on the CD cover and change them where appropriate. Often there are spelling errors or other issues.

With replay gain values the only way I would know whether they are correct would be to scan the files, and if I'm going to do that I don't need freedb anyway.
2Bdecided
QUOTE(phwip @ Nov 21 2003, 11:31 AM)
QUOTE(2Bdecided @ Nov 21 2003, 12:24 PM)
1. Is there any chance of a service like freedb storing the replay gain values for tracks and albums to save us all a lot of time?

Would people really trust replay gain values stored on freedb to be correct? I use freedb with EAC to get track titles, etc. But this is because I know I can check these titles against the correct ones on the CD cover and change them where appropriate. Often there are spelling errors or other issues.

Certainly the information on there has many errors. But these are human errors, and there's no room for human error when calculating and automatically submitting ReplayGain values.

There could be other problems:

1. Different releases of the same CD with different loudnesses.
Hopefully the different mastered versions will have slightly different TOCs. This is usually the case. In which case, they can be detected and catalogued as different versions by freedb.

2. The values are calculated from a different format (e.g. .wav when you have mp3, mp3 when you have mpc etc etc)
That's one reason for suggestion 3 in my first post. See there.

3. Someone has intentionally submitted incorrect values / someone changed the gain of an album before calculating the ReplayGain
Yes - that's a problem. As with other fields, people can correct the data, and/or the server can weed out erroneous entries because they'll be swamped with correct ones.


QUOTE
With replay gain values the only way I would know whether they are correct would be to scan the files, and if I'm going to do that I don't need freedb anyway.


You would have to calculate the peak values yourself anyway (easier and quicker than the ReplayGains) because they're encoding dependent, so the accuracy of the peak values is not an issue. (freedb should hold the peak values for the lossless versions).

For the actual ReplayGain values (Track and Album), if they make the tracks sound the same loudness as other tracks on playback, then it's doing its job, and that's fine. If they don't, then you'll notice, and you can recalculate them if you want.

But it doesn't matter if they're "correct" to how ever many decimal places, because ReplayGain is just an estimate. What matters is that the ReplayGain values work. If you want to, you can check if they work or not very quickly just by skipping through the album. If it's too loud or too quiet, they're wrong!

So there's no reason to recalculate them all to check their accuracy. If it was me, I'd happily grab all the ReplayGain values I needed from freedb, and only re-tag them myself if I heard a problem.

But maybe that's just me?

Cheers,
David.
n68
QUOTE(2Bdecided @ Nov 21 2003, 02:04 PM)
QUOTE


1. Different releases of the same CD with different loudnesses.
Hopefully the different mastered versions will have slightly different TOCs. This is usually the case. In which case, they can be detected and catalogued as different versions by freedb.



gday..

i guess the UPC code will take care of that..
(assuming there is a original rip)


:)
robUx4
Just a question from a user's point of view. iTunes has the ability to calculate the replay gain of a track. Is it based on the same thing as the ReplayGain values? (I never checked whether iTunes stores the value in the file or not.)
robUx4
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

Instead of storing +1dB compared to a reference (83dB). Why don't you store 84dB directly ? This way anyone can decide for his/her reference playback loudness.
guruboolez
QUOTE(robUx4 @ Nov 23 2003, 05:03 PM)
Just a question from a user's point of view. iTunes has the ability to calculate the replay gain of a track. Is it based on the same thing as the ReplayGain values? (I never checked whether iTunes stores the value in the file or not.)

I played a short and very quiet sample (part of an orchestral recording) in iTunes: it was very much quieter than the RG recommendation (+20 dB).
I can't be sure that we can extrapolate from this difference, but I suppose that the iTunes gain system is different (and less accurate too, judging by the calculation speed).
2Bdecided
QUOTE(robUx4 @ Nov 23 2003, 04:05 PM)
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

Instead of storing +1dB compared to a reference (83dB). Why don't you store 84dB directly ? This way anyone can decide for his/her reference playback loudness.

If you have time, please read the entire thread ;)

You'll see I suggest switching back to this method.

However, I don't think it's realistic to switch now, because it would dramatically break compatibility with existing players. This would be a very bad thing, unless someone can see a way around it.

The other additions will not break compatibility with existing players, so it's just a question of whether developers want to implement them.

Cheers,
David.
Case
QUOTE(robUx4 @ Nov 23 2003, 06:05 PM)
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

Instead of storing +1dB compared to a reference (83dB). Why don't you store 84dB directly ? This way anyone can decide for his/her reference playback loudness.

This would not work. Players would still need to figure out how much the gain needs to be changed, since playback loudness isn't calibrated in any way. Media Jukebox would calculate the volume change needed with the formula 83dB - 84dB = -1dB when others would calculate it with 89dB - 84dB = +5dB.
PS. your example is incorrect, 82dB + +1dB = 83dB, thus value to store would be 82 and not 84.
dev0
3. ReplayGain lossy approximation

Storing this seems pointless to me, since ReplayGain calculations will become inaccurate after transcoding and no tool should be copying ReplayGain values when transcoding.
/\/ephaestous
QUOTE(2Bdecided @ Nov 21 2003, 06:24 AM)
1. Is there any chance of a service like freedb storing the replay gain values for tracks and albums to save us all a lot of time?

I RG my discs before burning a backup, so if a friend pops that copy, the TOC will match and give back erroneous RG info.

To solve this we could store the RG info, plus a RG value for, say, the first 30 seconds of the album, so the RG info for that part of the disc is calculated and sent with the query (generated Disc ID). This way one could be sure the RG info is correct if the sent value and the value in the db match (with a +-5% confidence).
/\/ephaestous
QUOTE(dev0 @ Nov 23 2003, 12:36 PM)
3. ReplayGain lossy approximation

Storing this seems pointless to me, since ReplayGain calculations will become inaccurate after transcoding and no tool should be copying ReplayGain values when transcoding.

The inaccuracy is minimal enough to be dismissed. This is a small test I made:

Iron Maiden - [Dance of Death #04] Montségur [5:48]

PCM
-10.54 dB

PCM --> MP3
-10.55 dB

PCM --> MP3 -->Musepack
-10.53 dB

PCM --> MP3 -->Musepack --> Vorbis
-10.57 dB

PCM --> MP3 -->Musepack --> Vorbis --> Wavpack (Lossy)
-10.57 dB

PCM --> MP3 -->Musepack --> Vorbis --> Wavpack (Lossy) --> Nero MP4
-10.54 dB

The biggest difference was -0.03dB, which is a -0.284% diff from the original. I picked this track because it is loud enough to make most lossy encoders go beyond full scale.
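The comparison can be reproduced with a few lines. This is only a sketch; the helper name is made up, and the inputs are the track gains listed above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Largest absolute deviation of the transcoded results from the reference, in dB. */
double max_abs_deviation(double reference, const double *values, size_t count)
{
    assert(values != NULL);
    double worst = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double d = fabs(values[i] - reference);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Fed the PCM value -10.54 as reference and the five transcoded values (-10.55, -10.53, -10.57, -10.57, -10.54), it returns 0.03 dB, which is about 0.28% of 10.54.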
guruboolez
Try lame abr for example (there's a --scale 0.98 included in the preset). The difference will be higher.
2Bdecided
QUOTE(Case @ Nov 23 2003, 04:55 PM)
QUOTE(robUx4 @ Nov 23 2003, 06:05 PM)
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

Instead of storing +1dB compared to a reference (83dB). Why don't you store 84dB directly ? This way anyone can decide for his/her reference playback loudness.

This would not work. Players would still need to figure out how much the gain needs to be changed, since playback loudness isn't calibrated in any way. Media Jukebox would calculate the volume change needed with the formula 83dB - 84dB = -1dB when others would calculate it with 89dB - 84dB = +5dB.

Sorry Case, but I think you're wrong.

At the moment, people store the gain change needed to match a standard loudness. Most use 89dB as that standard, but some use 83dB. So, there's confusion.

But they all measure the "perceived" loudness of the track the same way. (They're all taking my "pink_ref.wav" file, or whatever it was called, to be 83dB, after SMPTE RP-200 - after a real, and long existing standard). So if you store the "perceived" loudness, there's no confusion.

e.g. perceived loudness of track = 93dB SPL.
Musepack relates this to 89dB, and stores a ReplayGain of -4dB
MediaPlayer relates this to 83dB, and stores a ReplayGain of -10dB
But in both cases, the perceived loudness of the original track is 93dB.

It should be apparent that by just storing 93dB, any player can figure out what to do. (target volume - 93dB = required gain change, e.g. 89-93=-4dB).
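The player-side arithmetic is a single subtraction either way. A small sketch, with hypothetical function names:

```c
#include <assert.h>
#include <math.h>

/* Gain a player would apply, given the stored perceived loudness and its own target. */
double gain_from_loudness(double target_db, double loudness_db)
{
    assert(isfinite(target_db) && isfinite(loudness_db));
    return target_db - loudness_db;
}

/* Recover the perceived loudness from an existing gain tag,
   given the reference level the tag was written against. */
double loudness_from_gain(double reference_db, double gain_db)
{
    assert(isfinite(reference_db) && isfinite(gain_db));
    return reference_db - gain_db;
}
```

Both the -4dB tag written against 89dB and the -10dB tag written against 83dB map back to the same 93dB loudness, which is the point of the example.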


BUT, though I think it would be nice to do this, I'm not saying we should; it would break compatibility with existing players, wouldn't it? They're expecting the gain change in the tag, and would read it as a +93dB ReplayGain - that's just a bit too loud!

Cheers,
David.
2Bdecided
QUOTE(dev0 @ Nov 23 2003, 05:36 PM)
3. ReplayGain lossy approximation

Storing this seems pointless to me, since ReplayGain calculations will become inaccurate after transcoding and no tool should be copying ReplayGain values when transcoding.

The ReplayGain values will be close enough. The peak values may not be, but they are much quicker to re-calculate.

Cheers,
David.
2Bdecided
QUOTE(guruboolez @ Nov 24 2003, 10:18 AM)
Try lame abr for example (there's a --scale 0.98 included in the preset). The difference will be higher.

But in the case where an intentional gain change is applied, it's easy to correct the ReplayGain values.

No software currently "transcodes" the ReplayGain values by default, so nothing is copying over incorrect values.

I'd suggest that any software that does "transcode" the ReplayGain values should
a) set the "lossy" (or whatever it gets called) flag, and
b) correct the values for any known gain change applied during the process

Both are much much quicker and easier than re-calculating the ReplayGain values.



I'd better mention something here: All this should make things a lot easier for the user. Any extra complexity introduced by these additions will go into the software, and be hidden from the user. The result should be that the software is able to do the "right thing" by default. Very simple.

Cheers,
David.
guruboolez
And what about the idea of a personal track/album gain, set by the user? Purpose:
- useful if there are flaws in the RG calculation model
- useful if an audiophile wants to keep better coherence between different albums. For example, RG makes a gamba, a harpsichord or a flute sound as loud as an orchestra or a heavy metal band. That is the purpose of RG. The idea is nice, but in some cases it doesn't make sense. I recently bought a CD, an anthology of the best sound recordings of the year. The booklet is clear: some instrumental tracks have to be played much quieter than others in order to maintain high-fidelity principles. I have another disc, with an instrument called a "clavicorde" (small harpsichord). The mastering level is very quiet; why? Because the instrument's sound is covered by the human voice. RG will blow up the volume (and the background noise), and ruin the engineer's and artist's intent.

I suppose that RG can't determine whether an instrument should be louder than another. Therefore, manual correction (and a software tool for batch correction) is really needed.
2Bdecided
See my reply in your other thread, and my suggestion for "user" and "real" ReplayGains in my first post in this thread.

Please reply in this thread.

Cheers,
David.
Case
QUOTE(2Bdecided @ Nov 24 2003, 12:50 PM)
Sorry Case, but I think you're wrong.

Yup, I realized it seconds after posting. I had reference levels and calibrations in my mind and didn't consider the possibility of skipping all that during scanning.
2Bdecided
Another suggestion (this isn't fundamental)...

It would be useful to copy over the DialNorm and MixLev values from Dolby Digital (AC-3) data when it's transcoded.

MixLev should go into the new "ReplayGain Real" field, and DialNorm could probably go into the existing Album Gain field (in which case the new field to indicate how the gain was calculated would be useful).

I'll figure out appropriate conversion factors, and maybe seek help from the Doom9 crowd.

Cheers,
David.
andyh
I'm confused as to how the RealLife level is different from the artist/producer origin code in the id3 proposal. Would it be more consistent to include a separate track and album setting for this? I also don't understand how storing the calculation method would work. Theoretically the track gain could have been calculated by version 1 of the algorithm and the album gain could have been read from the CD. Which value would be stored in the calculation method field? Is it stored separately for the track and the album field?

I think it would be a good idea to keep the id3 tag spec up to date with these suggestions. Since nobody has implemented it yet, I think that we should not worry about keeping it compatible.

Since David has said that he doesn't like the format of the gain values, I would like to change those as well. I think that the gain values should be stored as signed integers by simply multiplying the value by ten (or one hundred if the extra precision is useful). Information about which values are set could be stored in a bitfield along with the lossy bit.

If the lame header is going to be changed to use an int for the peak value, now would probably be the best time to change the formats of the gain values as well. It might be nice if they would allocate space for the album peak value as well.

Here is my proposal for the contents of the id3 frame:

/* Each flag occupies its own bit so that flags can be combined in the bitfield. */
#define LOSSY                   0x01
#define HAS_AUTO_TRACK_GAIN     0x02
#define HAS_AUTO_ALBUM_GAIN     0x04
#define HAS_USER_TRACK_GAIN     0x08
#define HAS_USER_ALBUM_GAIN     0x10
#define HAS_PRODUCER_TRACK_GAIN 0x20
#define HAS_PRODUCER_ALBUM_GAIN 0x40

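#include <stdint.h>

/* Fixed-width types keep the field sizes the same on every platform.
   The struct name is illustrative. */
struct replaygain_frame {
    uint32_t track_peak;
    uint32_t album_peak;
    uint8_t  calculation_method;
    int16_t  reference_gain;
    uint16_t bitfield;            /* OR of the flags defined above */
    int16_t  auto_track_gain;
    int16_t  auto_album_gain;
    int16_t  user_track_gain;
    int16_t  user_album_gain;
    int16_t  producer_track_gain;
    int16_t  producer_album_gain;
    int16_t  right_undo;
    int16_t  left_undo;
};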

I have included both left and right undo values because mp3gain is storing both in the APE tags. I don't think anybody really scales the channels separately, but I think that it would be good to store the same data in the different tag formats.

I would also like to know whether anyone intends to request that the new frame be added to the id3 spec. Section 3.3 of the 2.3.0 spec says:

The frame ID made out of the characters capital A-Z and 0-9. Identifiers beginning with "X", "Y" and "Z" are for experimental use and free for everyone to use, without the need to set the experimental bit in the tag header. Have in mind that someone else might have used the same identifier as you. All other identifiers are either used or reserved for future use.

If no one intends to propose adding replaygain to id3, we will need to rename the frame. Would "XRGA" be acceptable?

Any comments or suggestions would be welcome.
Lear
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Interesting... I suggested changing it like that a year and a half ago, but you weren't too fond of the idea then (see here)... :P

QUOTE(2Bdecided @ Nov 24 2003, 11:50 AM)
At the moment, people store the gain change needed to match a standard loudness. Most use 89dB as that standard, but some use 83dB. So, there's confusion.

But they all measure the "perceived" loudness of the track the same way. (They're all taking my "pink_ref.wav" file, or whatever it was called, to be 83dB, after SMPTE RP-200 - after a real, and long existing standard). So if you store the "perceived" loudness, there's no confusion.

And this is the very reason why I suggested the change! :D

(Btw, I must've missed this thread when it started... I should read through it, in case I have any comments.)

(Edit: Added second quote.)
Lear
QUOTE(2Bdecided @ Nov 20 2003, 12:43 PM)
Field = 32-bit INT.


For 16-bit audio data, use

00000000xxxxxxxxxxxxxxxx00000000

Where xxxxxxxxxxxxxxxx is the peak value.
(1000000000000000 is the largest possible value for linear 16-bit data, e.g. a .wav file)


For 24-bit audio data

00000000xxxxxxxxxxxxxxxxxxxxxxxx

Where xxxxxxxxxxxxxxxxxxxxxxxx is the peak value.

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough... 8)

QUOTE
Should we change peak values to fixed point in all implementations?

Would it be easy for players to use, because I'm thinking about this being a useful convention to employ in all formats, since floating point isn't strictly needed, and is causing rounding confusion.

Or would it be stupid to change to fixed point for the peak value in other formats, because this would break compatibility with old players?

Doing it only for consistency isn't that important, IMO. Both are about as easy, I'd say (not that I've done much fixed-point stuff). It could be good to keep the precision about the same though (VorbisGain does that).

If they are stored in human readable format (i.e. Vorbis or APE tags), I'd say floating point is preferable, as it is easier to understand, even if it would require a bit more code on (embedded) systems without an FPU.
SamK
I think it's the right time to switch to absolute replaygain value (90dB instead of +1dB).
Most of the programs that support ReplayGain at the moment are frequently updated, so backward compatibility of players shouldn't be a problem for long.
If some player takes months to update, its users would just have to stick to relative gain values.

Anyway, changing the representation of the number (fixed / float / ...) would break compatibility all the same, wouldn't it? So it's definitely the right time to make both changes at once.

I don't think it's a problem as long as there is backward compatibility for the files themselves, i.e. an old file with a ReplayGain value should still be supported by players that support the new ReplayGain.

If both the meaning and the encoding of the value are to be changed, it sounds safer to choose between the old and new meaning based on another piece of data. The presence of the proposed 'calculation method version' field would be enough to know it's a new gain tag.

I'm for applying all the good changes at once.

If you're really concerned about the risk of someone suing you after playing a new ReplayGained file with an old player and blowing his ears out due to ludicrous pre-amping, let's just use another name for the gain value. RG2, whatever, and this won't be a risk anymore.

--
SamK
knik
QUOTE(2Bdecided @ Nov 18 2003, 07:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.

(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Does the '+92dB' approach use 16-bit min RMS (+-1 samples) as a reference or am I missing something?
I think the reference level should be bit-depth independent, e.g. max RMS.
If the current ref level is some (maxrms - 7dB) then I think it's not bad.

Edit:
After a closer look:
83dB = 14125.4 and 16-bit maxrms = 32768 (about 90.3dB), hence 83dB = maxrms - 7.3dB

I would suggest redefining the reference level from 83dB to maxrms-7dB. It would be much less confusing.
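The arithmetic behind those figures can be checked with a few lines of Python. This assumes, as the numbers above imply, that dB values are taken relative to an amplitude of one 16-bit sample step.

```python
import math

# 83 dB above a reference amplitude of 1 sample step
level_83db = 10 ** (83 / 20)

# 16-bit full scale (32768) expressed on the same dB scale
max_rms_db = 20 * math.log10(32768)

print(round(level_83db, 1))        # 14125.4
print(round(max_rms_db, 1))        # 90.3
print(round(max_rms_db - 83, 1))   # 7.3
```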
Mike Giacomelli
Stupid question: Is 0dB relative also 96 dB in 16 bit? I'm not sure what it means when I set the volume to -89dB.
SamK
QUOTE(knik @ Jan 5 2004, 12:16 PM)
After closer look:
83dB = 14125.4 and 16-bit maxrms = 32768, hence 83dB = maxrms - 7.3dB

I would suggest to redefine reference level from 83dB to maxrms-7dB. It would be much less confusing.

Ah OK, I see what you mean.
Consider a signal as a stream of unitless, infinite-precision numbers.
ReplayGain computes a reference level (the 95th percentile of the RMS values of all 0.05s frames). This unitless number is converted to dB; let's call it absRL.

If a signal in [-1, 1] is multiplied by 2^depth, the absRL is shifted:
8 bit : (max/oldmax)^2 = (2^8)^2 ~= 6.5 * 10^4 ~= 10^4.8 => absRL += 48dB
16 bit : (max/oldmax)^2 = (2^16)^2 ~= 4.3 * 10^9 ~= 10^9.6 => absRL += 96dB
24 bit : (max/oldmax)^2 = (2^24)^2 ~= 2.8 * 10^14 ~= 10^14.4 => absRL += 144dB

So let's call those values:
fullScale_dB(bit_depth) = (bit_depth / 8) * 10*log10(2^16)
(adds 48.165dB every 8 bits)

If files of varying bit depths were common, someone looking at their absRL values would need to subtract fullScale_dB(bit_depth) = 48.165 * bit_depth / 8 from each in order to know which one sounds louder when played at full volume.

So you're right, it's better to store:

(absRL(song) - fullScale_dB(bit_depth))

which is in fact the absRL of the song if its samples are scaled back to [-1, 1].
It would be the 'absolute normalized Reference Level', ANRL.

By the way, I think the ANRL can still be positive, due to the filtering done before computing the RMS values, which boosts the frequencies humans are most sensitive to and attenuates others, so it can produce some samples > 1.0 from a [-1,1]-normalized signal.
(A song in 16 bit can be at absRL = 100 dB or even a bit more.)
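To make the bit-depth correction concrete, here is a small Python sketch of the two quantities named above. It follows the convention used in this post (a [-1, 1] signal multiplied by 2^depth); the function names are hypothetical, and the absRL inputs are made-up numbers for illustration only.

```python
import math


def full_scale_db(bit_depth: int) -> float:
    """dB shift caused by scaling a [-1, 1] signal by 2**bit_depth."""
    return 20 * math.log10(2 ** bit_depth)


def anrl(abs_rl_db: float, bit_depth: int) -> float:
    """'Absolute normalized Reference Level': absRL with the bit-depth shift removed."""
    return abs_rl_db - full_scale_db(bit_depth)


for depth in (8, 16, 24):
    print(depth, round(full_scale_db(depth), 3))

# The same (illustrative) content stored at two bit depths yields the same ANRL.
normalized_level = -14.0
print(round(anrl(normalized_level + full_scale_db(16), 16), 6))
print(round(anrl(normalized_level + full_scale_db(24), 24), 6))
```

Note that 20*log10(2^bit_depth) is the same number as (bit_depth / 8) * 10*log10(2^16), i.e. about 48.165 dB per 8 bits.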
2Bdecided
QUOTE(Lear @ Jan 2 2004, 06:10 PM)
QUOTE(2Bdecided @ Nov 18 2003, 05:35 PM)
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now I think)

Interesting... I suggested changing it like that a year and a half ago, but you weren't too fond of the idea then (see here)...

QUOTE(2Bdecided @ Nov 24 2003, 11:50 AM)
At the moment, people store the gain change needed to match a standard loudness. Most use 89dB as that standard, but some use 83dB. So, there's confusion.

But they all measure the "perceived" loudness of the track the same way. (They're all taking my "pink_ref.wav" file, or whatever it was called, to be 83dB, after SMPTE RP-200 - after a real, and long existing standard). So if you store the "perceived" loudness, there's no confusion.

And this is the very reason why I suggested the change!

(Btw, I must've missed this thread when it started... I should read through it, in case I have any comments.)

(Edit: Added second quote.)

Hi Lear!

I remember that thread! There was no way I was going to change it back again and confuse everyone again, since the argument was basically about whether or not to add 83dB at the end. I naively assumed that everyone would follow the suggestion, and there would be no confusion. Ha - some chance!

It's reminded me of something though: I expected people to think that things were too quiet, so I suggested the player should default to adding 6dB to the values. What people chose to do instead was to make the calculation add 6dB to the values (if you think about it, the values stored in every file are 6dB greater than I suggested - because they get you to 89dB, not 83dB).

I wonder whether, if I'd stuck with my original thought (what you proposed), there would still have been confusion because someone would get the calculation to add 6dB to the value to have the same effect. Or else they would see that all the players used 89dB as a reference, but their calculator used 83dB as a reference, and change it. Or they'd just take the ref_pink.wav file and boost it by 6dB.

I do, in retrospect, think adding 83dB (and hence storing 92dB instead of -3dB or -9dB) is a better solution. But I have a feeling that someone would still have managed to mess it up!
knik
QUOTE(2Bdecided @ Jan 6 2004, 05:12 PM)
I do, in retrospect, think adding 83dB (and hence storing 92dB instead of -3dB or -9dB) is a better solution. But I have a feeling that someone would still have managed to mess it up!

I really think we should forget about 16-bit dynamic range and use maxrms as a reference, otherwise we will always have some confusion.
knik
QUOTE(SamK @ Jan 6 2004, 04:52 PM)
So you're right, it's better to store :

(absRL(song) -  fullScale_dB(bit_depth) )

which is in the fact the absRL of the songs if its samples are scaled back to [-1, 1].
it would be the 'absolute normalized Reference Level', ANRL.

Yes, that's the point. We should use 1.0 as a reference for [-1,1] samples and we don't need any sample bit-depth assumption here.
It can always be rescaled to the actual output sample depth.
2Bdecided
QUOTE(knik @ Jan 6 2004, 07:56 PM)
QUOTE(SamK @ Jan 6 2004, 04:52 PM)
So you're right, it's better to store :

(absRL(song) -  fullScale_dB(bit_depth) )

which is in the fact the absRL of the songs if its samples are scaled back to [-1, 1].
it would be the 'absolute normalized Reference Level', ANRL.

Yes, that's the point. We should use 1.0 as a reference for [-1,1] samples and we don't need any sample bit-depth assumption here.
It can always be rescaled to the actual output sample depth.

knik,

I didn't get around to replying to your (and other people's) posts because I didn't have the time, but I'd better squash this idea before it goes any further.

ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).
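As a rough sketch of that calibration: if a -20dB FS RMS signal corresponds to 83dB SPL, a flat offset puts 0dB FS at 103dB SPL. The Python below assumes exactly that flat offset and deliberately ignores the frequency dependence mentioned above; the function and constant names are made up for illustration.

```python
REF_DBFS = -20.0   # RMS level of the SMPTE RP 200 pink noise signal
REF_SPL = 83.0     # SPL it is calibrated to produce


def nominal_spl(level_dbfs: float) -> float:
    """Map a digital RMS level to a nominal SPL using a flat offset (assumption)."""
    return REF_SPL + (level_dbfs - REF_DBFS)


print(nominal_spl(-20.0))  # 83.0
print(nominal_spl(0.0))    # 103.0
```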

The exact values depend on the "psychoacoustic" model used to determine the loudness of a given track or album. Different psychoacoustic models can be calibrated to the SMPTE RP 200 standard and used interchangeably (This means people can improve or change the ReplayGain calculation without messing everything up - compatibility and interchangeability is ensured).

Taking a non psychoacoustic standard (i.e. choosing digital full scale to equal some dB value) would make it very difficult to update the psychoacoustic model and calibrate it with previous versions. There are already several incompatible, uncalibrated, and largely unused methods for “correcting the loudness differences between tracks or albums”. I didn’t want to create yet another one!

The common sense approach to calibrating a system which judges perceived loudness is to define a specific test signal, and how loud this signal should be. As the industry has already done this, it made sense to follow this existing calibration.

Hope this helps. Please read http://www.replaygain.org/ for more information.

Cheers,
David.
2Bdecided
QUOTE(Lear @ Jan 2 2004, 09:46 PM)
QUOTE(2Bdecided @ Nov 20 2003, 12:43 PM)
Field = 32-bit INT.


For 16-bit audio data, use

00000000xxxxxxxxxxxxxxxx00000000

Where xxxxxxxxxxxxxxxx is the peak value.
(1000000000000000 is the largest possible value for linear 16-bit data, e.g. a .wav file)


For 24-bit audio data

00000000xxxxxxxxxxxxxxxxxxxxxxxx

Where xxxxxxxxxxxxxxxxxxxxxxxx is the peak value.

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough...

But it is fixed point, and I don't see why you'd need to "differentiate" between 24-bits (last 8 bits zero) and 16-bits. Can you explain?


QUOTE
QUOTE

Should we change peak values to fixed point in all implementations?

Would it be easy for players to use, because I'm thinking about this being a useful convention to employ in all formats, since floating point isn't strictly needed, and is causing rounding confusion.

Or would it be stupid to change to fixed point for the peak value in other formats, because this would break compatibility with old players?

Doing it only for consistency isn't that important, IMO. Both are about as easy, I'd say (not that I've done much fixed-point stuff). It could be good to keep the precision about the same though (VorbisGain does that).

If they are stored in human readable format (i.e. Vorbis or APE tags), I'd say floating point is preferable, as it is easier to understand, even if it would require a bit more code on (embedded) systems without an FPU.


You might say that, but Frank Klemm simply said "Floating point is a stupid idea" and coded it fixed point, 16-bit, with 6dB headroom above digital full scale. And he did that on the format "MusePack" which has 24-bit encoders and decoders, and can easily peak above 6dB above digital full scale. His argument was that he had 16 bits spare, he didn't want to use floating point, and what he stored should be enough to prevent clipping in all but the most severe situations.

When other people are coding it, you have to try to please them as well as yourself!

Cheers,
David.
Gabriel
LAME has been using David's fixed-point representation since 3.94b.
SamK
QUOTE(2Bdecided @ Jan 7 2004, 02:07 PM)
ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).

Ah OK, ReplayGain is already bit-depth independent. I had read most of the ReplayGain documents, but this wasn't clearly stated anywhere.
If I had known that Matlab's wavread function returns an array of numbers in [-1, 1], I would have gotten the clue from the Matlab demonstration code.
Maybe you should add a first step to the 4-step "General Concept" at http://replaygain.hydrogenaudio.org/rms_energy.html, like:
0. The signal is converted to floating point numbers and divided by the full scale of the original format (which is 2^15 for 16-bit integer encoding)

or something, to ensure everyone gets this point.

To sum up what I understood,
ReplayGain computations are bit-depth independent from the start,
and the proposal is to store

Vrms = 83 + (replaygain(filename) - ref_Vrms);

(with ref_Vrms being the level measured for the standard digital signal corresponding to 83dB SPL:
ref_Vrms = replaygain("pink_ref.wav"); )

instead of the previous:
Vrms = - (replaygain(filename) - ref_Vrms);

Then players would use the stored value like this:
average_song_Vrms = 89; // user setting
rel_gain = average_song_Vrms - Vrms;
ratio = 10^(rel_gain/20);
// multiply decoded samples by ratio.
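The pseudocode above could be written in Python roughly as follows. This is only a sketch of the proposal as summarised here: `measured_db` and `ref_measured_db` stand in for replaygain(filename) and replaygain("pink_ref.wav"), and all function names are hypothetical.

```python
def stored_level(measured_db: float, ref_measured_db: float,
                 calibration_db: float = 83.0) -> float:
    """Absolute value to store: 83 + (track measurement - reference measurement)."""
    return calibration_db + (measured_db - ref_measured_db)


def playback_ratio(stored_db: float, target_db: float = 89.0) -> float:
    """Linear factor a player would apply to reach the user's target level."""
    rel_gain = target_db - stored_db
    return 10 ** (rel_gain / 20)


def apply_gain(samples, ratio):
    """Multiply decoded samples by the linear ratio."""
    return [s * ratio for s in samples]


# Illustrative numbers only: a track measured 12 dB above the reference signal.
level = stored_level(measured_db=-8.0, ref_measured_db=-20.0)
ratio = playback_ratio(level)            # target 89 dB, so rel_gain is -6 dB
print(level, round(ratio, 4))
print(apply_gain([0.5, -0.5], ratio))
```

The point of storing the absolute value is that the target (89 here) becomes a pure player setting, so changing it never requires rescanning files.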
SamK
Reading http://home.earthlink.net/~bobkatz24bit/integrated.html, and about the K-N VU meters, I realized there is no reason why the magic 83dB number from the SMPTE RP 200 standard should appear in ReplayGain. The computation is all in the digital domain; no SPL number should arise.

What SMPTE RP 200 brings us is only a standard -20dBFS signal to calibrate measurements on.
The fact that this signal is supposed to actually produce sound at 83dB SPL in a calibrated hi-fi system is of no importance here, as we're only doing things *before* the actual sound system.

In fact, if you have a song with replaygain = +20 dB (relative to the original 83dB reference), it really means it is measured to perceptually sound 10 times louder overall than the reference pink noise signal (which is used as the calibration reference for replaygain = +0dB).

That's all.

The real point of ReplayGain is to compute
HR = replaygain - 20
- aka: (AbsReplaygain - 83) - 20
as a good measure of the overall headroom of the song (the ratio between the peak capability of the medium and the "average level").

Indeed, if you take the -20dB FS standard pink noise signal, whose replaygain is exactly 83dB (by definition), HR will be exactly -20dB.
Translate that to any signal, and you get HR as an indication of the overall headroom.
It will be a slightly negative value for most pop songs (possibly slightly positive for a really loud sound concentrated in frequencies boosted by the psychoacoustic filter in use),
and could be lower than -10dB for classical music or anything with a bit more dynamic range.

So, if it is decided to switch to storing an absolute value, I'm suggesting storing the value HR
(which is in fact the relative value minus 20).
It gives all the info ReplayGain has to give, and is independent of bit depth AND of the psychoacoustic filter used, just as the current ReplayGain is.
Plus it only takes from the SMPTE RP 200 standard what it really uses: the choice of a reference signal so that different psychoacoustic implementations can calibrate on it.

And its value is much more intuitive and much less confusing than expressing the value in terms of the SPL produced by a calibrated system, which does not belong here.

Is it not?
Lear
QUOTE(2Bdecided @ Jan 7 2004, 02:16 PM)
QUOTE(Lear @ Jan 2 2004, 09:46 PM)

One problem is that you can't differentiate "24 bit where the low 8 bits just happen to be 0" from "16 bit". So why not keep it simple, i.e. fixed point, where 1.0 is full scale. 23 bits fraction is enough, but I think 24 bits would be "cleaner" (e.g., 1.0 would then be 0x01000000). Allowing 256 times full scale ought to be enough...

But it is fixed point, and I don't see why you'd need to "differentiate" between 24-bits (last 8 bits zero) and 16-bits. Can you explain?


If you decode the value in the same way, regardless of bit depth, you'll get a kind of rounding error (or whatever it should be called) when dealing with the 16-bit value. E.g., 0x3FFF00 (half scale in 16 bit) is not the same as 0x3FFFFF (half scale in 24 bit). Sure, the error will be small, but it'll be there. (Of course, if the processing is all done in 16 bits it doesn't matter, as the low bits will be thrown away.)
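To put a number on how small that error is, here is a quick Python check. It assumes full scale is taken as the largest positive sample value (0x7FFF for 16 bit, 0x7FFFFF for 24 bit), which is what makes 0x3FFF and 0x3FFFFF "half scale"; the variable names are just for illustration.

```python
import math

FULL_SCALE_16 = 0x7FFF      # assumption: largest positive 16-bit sample
FULL_SCALE_24 = 0x7FFFFF    # assumption: largest positive 24-bit sample

peak_16 = 0x3FFF            # half scale in 16 bit
stored = peak_16 << 8       # 0x3FFF00, as it would sit in a 24-bit field

as_16 = (stored >> 8) / FULL_SCALE_16   # decoded knowing the data was 16 bit
as_24 = stored / FULL_SCALE_24          # decoded as if it were 24-bit data

error_db = 20 * math.log10(as_16 / as_24)
print(hex(stored), as_16, as_24)
print(f"error: {error_db:.4f} dB")
```

Under these assumptions the discrepancy is a few thousandths of a dB. If full scale is instead taken as 0x8000 and 0x800000, the two decodings agree exactly, since shifting by 8 bits scales both the value and full scale by 256.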

QUOTE
You might say that, but Frank Klemm simply said "Floating point is a stupid idea" and coded it fixed point, 16-bit, with 6dB headroom above digital full scale. And he did that on the format "MusePack" which has 24-bit encoders and decoders, and can easily peak above 6dB above digital full scale. His argument was that he had 16 bits spare, he didn't want to use floating point, and what he stored should be enough to prevent clipping in all but the most severe situations.


I'd guess he did it that way because there were 16 bits of reserved space in the file format he could use, so he squeezed in what he could. But that doesn't mean other file formats should do it like that. Still, the actual format in the tag isn't very important, IMO, as long as the necessary resolution is there.
knik
Thanks for explanation, 2Bdecided. It really helped.
Now I see RG reference level is well defined.
2Bdecided
QUOTE(SamK @ Jan 7 2004, 02:57 PM)
QUOTE(2Bdecided @ Jan 7 2004, 02:07 PM)
ReplayGain is referenced to SMPTE RP 200, a calibration by which a -20dB FS RMS pink noise signal will give a real world SPL of 83dB. All RG figures come from this concept, and all ReplayGain values are the gain adjustments needed to make that track (or album) match the perceived loudness of that test signal. (+6dB in most implementations)

The values are not based on bit depth. The notion of "how loud" a full scale sine wave is flows from SMPTE RP 200, and it is not 90dB, 96dB or 144dB. It's frequency dependent, but will be 103dB SPL for 2kHz (IIRC in the calculations I originally proposed).

ah ok, the replaygain is already bitdepth independant. I had read most of replaygian documents, but this wasnt clearly stated anywhere.
If I had known matlab's wavread function returns an array of numbers in [-1, 1], I would have gotten the clue from the matlab demonstration code..
Maybe you should add a first step in the 4-step "General Concept" at http://replaygain.hydrogenaudio.org/rms_energy.html, like :
0. the signal is converted to floating point numbers, and divided by the full scale of the original format. (which is 2^15 for 16 bit integer encoding)

or something, to insure everyone gets this point.

I think, if you follow it through, it doesn't matter whether wavread returns [1,-1] or [-32768,32767] (you're right saying that it returns the former). As long as the value "ref_Vrms" has been calculated by the same method (which is essential anyway), then calibrating to it (i.e. subtracting it at the end) will cancel out whatever scaling or units or whatever are used at the input. That's because both the file in question, and the ref_pink.wav file will be scaled the same on the way in (to [1,-1] or [-32768,32767] or whatever). Subtracting in the logarithmic domain (which dB is) is the same as dividing in the linear domain. So any scaling is cancelled in this last step.
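A small numerical check of that cancellation, in Python. The `rms_db` function here is only a plain RMS-in-dB stand-in for the real ReplayGain measurement (no equal-loudness filter, no percentile step), and the sample values are arbitrary.

```python
import math


def rms_db(samples):
    """Plain RMS level in dB (a stand-in for the real loudness measurement)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def relative_level(track, reference, scale):
    """Measure track and reference with the same input scaling, then subtract."""
    scaled_track = [s * scale for s in track]
    scaled_ref = [s * scale for s in reference]
    return rms_db(scaled_track) - rms_db(scaled_ref)


track = [0.5, -0.25, 0.75, -0.5]       # arbitrary samples in [-1, 1]
reference = [0.1, -0.1, 0.1, -0.1]     # arbitrary "reference" samples

unit_scaled = relative_level(track, reference, 1.0)       # [-1, 1] input
int_scaled = relative_level(track, reference, 32768.0)    # 16-bit style input

print(abs(unit_scaled - int_scaled) < 1e-9)  # True
```

The scale factor adds the same number of dB to both measurements, so it drops out of the difference.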


QUOTE

To sum up what I understood,
replaygain computations are bitdepth independant from the start,
and the proposal is to store

Vrms = 83+ (replaygain(filename) - ref_Vrms);

(with ref_Vms being the gain of the standard digital signal corresponding to 83db SPL
ref_Vrms = replaygain("pink_ref.wav");  )

instead of previous :
Vrms = - (replaygain(filename) - ref_Vrms);

Then players would now use the stored value like that :
average_song_Vrms = 89; // user setting
rel_gain = average_song_Vrms - Vrms;
ratio = 10^(rel_gain/20);
// multiplies decoded samples by ratio.


Yes exactly - though I'm not strongly suggesting we change it. I was saying it's a pity it isn't like this already, but should it be changed now?

I'll answer your other post, and then expand on that point...


EDIT: 1000th post! Should have made it better!
2Bdecided
QUOTE(SamK @ Jan 7 2004, 07:34 PM)
reading http://home.earthlink.net/~bobkatz24bit/integrated.html, and the K-N VU meters, I realized there is no reason why the magic 83dB number from SMPTE RP200 standard should appear in ReplayGain. The computation is all in the digital domain, no SPL number should arise.

[snip]

And its value is much more intuitive, much less confusing than expressing the value in terms of SPL produced by calibrated system, which does not belong here.

Is it not ?

No, because perceived loudness depends on loudness!

This isn't built into the current psychoacoustic model, but could well be implemented in a future improvement....

If you're listening to a bass heavy track at 60dB, you'll hear much less bass (relatively) than you will at 80dB. This means that increasing the gain on a bass heavy track by 20dB will cause its subjective loudness to be increased more than a 20dB boost to a bass light track. What's more, the perceived loudness increase of that 20dB boost will be different if it's a boost from 40dB to 60dB than if it's a boost from 80dB to 100dB.

If the equal loudness curves were parallel lines, then we wouldn't really have to worry about real world sound pressure. They're not, so it's an issue, and it can only be solved if we make some kind of guess (like the floating ATH in the lame encoder), or calibrate the system properly to a real world loudness - which is what I've chosen to do.

Hope this makes sense.

Cheers,
David.

EDIT: plus see my previous response about how many other schemes exist which are unused because no one knows how they are supposed to be calibrated, or re-calibrated.