R128GAIN: An EBU R128 compliant loudness scanner |
Jan 7 2011, 19:56
Post
#51
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE Anyhow, great work in progress
Thanks.
QUOTE I am excited to see further development and how it could work with RG.
R128GAIN writes the RG tags (currently only for FLAC). I've re-tagged all my FLACs and listen to them (shuffled) all the time with RG enabled. That's my most important test case. |
|
|
|
Jan 7 2011, 21:34
Post
#52
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
What R128GAIN does is the following (in principle):
I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.
I think it would be simpler to work with block indices (a list, array, or bitmap) than with a ring buffer.* Read the input stream twice, block by block, and skip the calculated indices. Usually, with the buffering left to the OS, the second pass should be read from memory automatically.
PS The wording in Annex 1 could be better. In particular, the "gated loudness" LKG in (6) and (8) should use different symbols. Mathematically it is not ambiguous, though it took me over an hour to work through the whole thing.
* Of course, you can still use something like a ring buffer for I/O. But I would put a block-wise abstraction layer on top of it to make the overall design simpler.
This post has been edited by googlebot: Jan 7 2011, 22:14 |
|
|
|
Jan 7 2011, 22:17
Post
#53
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.
You are probably both right; I have to think about it. What I have in mind (probably not correct) is the following:
What do you mean by "loudness of a set of blocks"? Doesn't it imply counting samples more than once? It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%. If this is true, then it would be fully compliant, because they require 50% at a minimum. This post has been edited by pbelkner: Jan 7 2011, 22:21 |
|
|
|
Jan 7 2011, 22:35
Post
#54
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
QUOTE It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%.
No. Because, for true overlap, the answer to this
QUOTE Doesn't it imply counting samples more than once?
is "Yes!". Within the overlapping area, the same sample can belong to both eliminated and non-eliminated blocks (zero or more of each). All non-eliminated blocks are part of the final calculation.
QUOTE What do you mean by "loudness of a set of blocks"?
Conceptually: concatenate all non-eliminated blocks and calculate the "ungated" loudness for the whole interval. In practice you basically average the pre-calculated loudness values of all non-eliminated blocks, see step (8).
PS My comments are not supposed to obscure the fact that you have done a great job so far!
This post has been edited by googlebot: Jan 7 2011, 23:05 |
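To make the overlap point concrete, here is a minimal Python sketch. It is not R128GAIN's code; the function name `block_ranges` and its parameters are made up for illustration. It lists the sample ranges of 400 ms blocks advanced in 200 ms steps (the 50% overlap discussed in this thread) and shows that an interior sample belongs to two blocks, so it really is counted more than once.

```python
def block_ranges(num_samples, sample_rate, block_s=0.4, overlap=0.5):
    """Return (start, end) sample ranges of overlapping gating blocks.

    Only complete blocks are returned; a trailing partial block is dropped.
    """
    block_len = int(round(block_s * sample_rate))
    hop = int(round(block_len * (1.0 - overlap)))
    if hop < 1:
        raise ValueError("overlap must be below 100%")
    return [(start, start + block_len)
            for start in range(0, num_samples - block_len + 1, hop)]


# One second of audio at 48 kHz.
ranges = block_ranges(num_samples=48000, sample_rate=48000)

# Blocks that contain sample 12000 (250 ms into the signal).
covering = [r for r in ranges if r[0] <= 12000 < r[1]]
print(len(ranges), covering)
```

With these assumptions the hop is 9600 samples, which matches the "every 9600 samples (at 48 kHz)" figure that comes up later in the thread.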
|
|
|
Jan 7 2011, 23:21
Post
#55
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE PS My comments are not supposed to obscure the fact that you have done a great job so far!
Many thanks to C.R.Helmrich and you for the great comments! Meanwhile I've taken another look at the papers, and I think the point is clear now. The next version will probably offer the two-pass approach (and may keep the current one-pass approach as a very good approximation). |
|
|
|
Jan 8 2011, 00:28
Post
#56
|
|
|
Group: Members Posts: 11 Joined: 6-January 11 Member No.: 87101 |
Please pardon the noob here; hopefully I'm keeping up with the discussion even though most of this is far outside my normal domain. I'm sure you'll all set me straight if I'm on the wrong track!
Is a two-pass approach over the input really required? While I'm sure it's a reasonable approach, from a library perspective a single-pass interface seems convenient (like the common ReplayGainAnalysis C code). In googlebot's steps 1 through 6, the loudness per block is calculated implicitly during step #2 and, if I'm understanding correctly, only that per-block loudness is needed for all of the remaining steps.
Now, in a "maximum overlap" approach as suggested by pbelkner, each input sample results in a block, so the block count per second is of course very high (equal to the sample rate). In this case buffering the per-block loudness in a single-pass approach sounds ridiculous compared to a two-pass algorithm.
But with the minimum 50% overlap laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly, it means that buffering the per-block loudness would "only" require 18K samples per hour (versus 172 million for near 100% overlap). If the loudness samples are stored in 64 bits, that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc.). It looks to me like there is a good reason to stay near the 50% minimum overlap.
-Jeff |
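The estimate above is easy to re-run. The following Python sketch is only a back-of-the-envelope check; the function name and the `values_per_block` parameter are hypothetical. Note that a single 64-bit value per block comes to roughly 141 KiB per hour. The figure of a little over 700 KiB is reproduced only if five 64-bit values (for example one per channel) are kept per block; that reading is an assumption, not something the post states.

```python
def loudness_buffer_estimate(duration_s, blocks_per_s=5, bytes_per_value=8,
                             values_per_block=1):
    """Return (block_count, bytes_needed) for buffering per-block values."""
    blocks = int(duration_s * blocks_per_s)
    return blocks, blocks * values_per_block * bytes_per_value


# One hour at 5 blocks per second, one 64-bit value per block.
blocks, size = loudness_buffer_estimate(3600)
print(blocks, f"{size / 1024:.0f} KiB")        # roughly 141 KiB

# Hypothetical variant: five 64-bit values per block.
_, size5 = loudness_buffer_estimate(3600, values_per_block=5)
print(f"{size5 / 1024:.0f} KiB")               # roughly 703 KiB

# Near-100% overlap: one block per input sample at 48 kHz.
blocks_full, _ = loudness_buffer_estimate(3600, blocks_per_s=48000)
print(blocks_full)
```

Either way the buffer stays far below what near-100% overlap would need, so the conclusion of the post is unaffected.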
|
|
|
Jan 8 2011, 01:51
Post
#57
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
Completely agree! One doesn't really have to pass twice over the whole input. Only the loudness values of non-eliminated blocks need to be saved during the first pass. The second "pass" can then just further decimate those (in the same loop as the final averaging).
I often do not start to look for speed optimization potential before I have a simple-to-understand, correctly working first sketch. In my experience this leads to better code in the long run. But you are right: 2 passes over the whole input are overkill, probably even for a first sketch... This post has been edited by googlebot: Jan 8 2011, 01:52 |
|
|
|
Jan 8 2011, 09:32
Post
#58
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE But with the minimum 50% overlap laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly, it means that buffering the per-block loudness would "only" require 18K samples per hour (versus 172 million for near 100% overlap). If the loudness samples are stored in 64 bits, that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc.). It looks to me like there is a good reason to stay near the 50% minimum overlap.
Thanks a lot for this estimate. For album gain calculation we have to buffer "loudness samples" of this order of magnitude. |
|
|
|
Jan 8 2011, 14:29
Post
#59
|
|
|
Group: Developer Posts: 618 Joined: 6-December 08 From: Erlangen Germany Member No.: 64012 |
QUOTE Many thanks to C.R.Helmrich and you for the great comments!
You're welcome. Thank you for taking the implementation initiative! I also agree with Jeff and googlebot and suggest doing it exactly as they proposed: compute a new block loudness measure every 9600 samples (at 48 kHz) and store all blocks with loudness > -70 LUFS in a linked list (or an array if you know the track/album length ahead of time, which you do in our scenario, I guess). Then you can apply the relative gate to this list.
Actually, I think that to avoid calculating the logarithm and the division by T every 200 ms, you can simply store the block energies in your list. Assuming your block energy = left energy + right energy + center energy + 1.41* ..., the comparison
block loudness > -70 LUFS
is equivalent to
block energy > 0.4 * sample rate * 10^((-70+0.691)/10),
with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz).
Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by
block energy > 0.1584893 * average 70-gated energy
Chris
This post has been edited by C.R.Helmrich: Jan 8 2011, 15:13
--------------------
If I don't reply to your reply, it means I agree with you.
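The energy-domain gating described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not R128GAIN's implementation: the function names are hypothetical, the input is assumed to be a list of channel-weighted energies of K-weighted 400 ms blocks, and the relative gate uses the factor 0.1584893 (which equals 10^(-0.8)) exactly as quoted in the post.

```python
import math

GATE_ABS_LUFS = -70.0        # absolute gate from the post
GATE_REL = 10 ** (-8 / 10)   # 0.1584893..., the relative factor from the post


def abs_energy_threshold(sample_rate, block_s=0.4):
    """Energy equivalent of 'block loudness > -70 LUFS' for one block."""
    n = block_s * sample_rate
    return n * 10 ** ((GATE_ABS_LUFS + 0.691) / 10)


def gated_loudness(block_energies, sample_rate, block_s=0.4):
    """Gated loudness in LUFS from a list of block energies.

    Returns None if no block passes the absolute gate.
    """
    n = block_s * sample_rate
    abs_thr = abs_energy_threshold(sample_rate, block_s)
    kept = [e for e in block_energies if e > abs_thr]      # the "70-gated list"
    if not kept:
        return None
    rel_thr = GATE_REL * sum(kept) / len(kept)             # relative gate
    final = [e for e in kept if e > rel_thr]
    # The only logarithm and division by n happen once, at the very end.
    return -0.691 + 10 * math.log10(sum(final) / len(final) / n)


print(abs_energy_threshold(48000), abs_energy_threshold(44100))
```

The two printed thresholds can be compared against the constants given in the post for 48 kHz and 44.1 kHz.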
|
|
|
|
Jan 8 2011, 18:42
Post
#60
|
|
|
Group: Members Posts: 581 Joined: 17-August 09 Member No.: 72373 |
I have it on good authority that the calculation can be done in a single pass. This was a design requirement as R128 was designed to be workable for live broadcast applications. I will make inquiries and try and scare up the technical details. If anyone happens to be in Switzerland in February all will be revealed.
|
|
|
|
Jan 8 2011, 19:40
Post
#61
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
A fully standard-compliant single-pass outline has been on the table since at least post #54 (bottom). For I-scale measurements, some state has to be accumulated, though, because the loudness of a programme's last block can in principle decide whether its first block gets gated or not. Hardware with limited memory will have to limit the maximum integrable time span (which can be huge at moderate cost if you look at Jeff's post). The S- and M-scales, on the other hand, are suited for measurements of infinite length.
I'm looking forward, however, to what you can dig up at the workshop and share here! Great optimization by C.R.Helmrich, btw; this should save several orders of magnitude of CPU time! This post has been edited by googlebot: Jan 8 2011, 19:47 |
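The remark that the last block can decide the fate of the first one can be shown with a toy example. The Python sketch below uses made-up block energies and a hypothetical helper `surviving_blocks`; the 48 kHz absolute threshold and the relative factor are the constants quoted earlier in the thread.

```python
GATE_REL = 10 ** (-8 / 10)    # relative gate factor quoted earlier in the thread
ABS_THR = 0.00225113          # 48 kHz constant from C.R.Helmrich's post


def surviving_blocks(block_energies, abs_threshold):
    """Indices of blocks that pass the absolute and then the relative gate."""
    kept = [(i, e) for i, e in enumerate(block_energies) if e > abs_threshold]
    if not kept:
        return []
    rel_threshold = GATE_REL * sum(e for _, e in kept) / len(kept)
    return [i for i, e in kept if e > rel_threshold]


quiet, loud = 1.0, 100.0      # made-up block energies

before = surviving_blocks([quiet, quiet, quiet], ABS_THR)
after = surviving_blocks([quiet, quiet, quiet, loud], ABS_THR)
print(before, after)
```

Appending one loud block raises the average 70-gated energy enough that the three earlier blocks fall below the relative threshold. This is why the per-block values have to be kept until the end of the programme.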
|
|
|
Jan 9 2011, 18:47
Post
#62
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
v0.3 released
I've just uploaded the new version; it's available at http://sourceforge.net/projects/r128gain/files/
What's new?
|
|
|
|
Jan 10 2011, 09:58
Post
#63
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
Works perfectly, great job! Even for multichannel and high resolution files.
I'm wondering why the EBU-provided test samples don't match their own descriptions in Tech 3341. That should be fixed. |
|
|
|
Jan 10 2011, 17:38
Post
#64
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE Actually, I think that to avoid calculating the logarithm and the division by T every 200 ms, you can simply store the block energies in your list. Assuming your block energy = left energy + right energy + center energy + 1.41* ..., the comparison block loudness > -70 LUFS is equivalent to block energy > 0.4 * sample rate * 10^((-70+0.691)/10), with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by block energy > 0.1584893 * average 70-gated energy
Please let me summarize how I understand this:
Pass 1 of the EBU R128 algorithm only has to cache the weighted mean squares wmsq_i of the EBU R128 segmentation. From that all the rest can easily be derived. This post has been edited by pbelkner: Jan 10 2011, 17:44 |
|
|
|
Jan 10 2011, 18:01
Post
#65
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE The BS.1770 loudness measure is defined as -0.691 + 10*lg(wmsq), where wmsq = sum_i_j G_i*x_i_j*x_i_j/n, is the (per channel) weighted mean square of the interval under consideration.
should read
QUOTE The BS.1770 loudness measure is defined as
-0.691 + 10*lg(wmsq),
where
wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
is the (per channel) weighted mean square of the interval under consideration. |
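As a reading aid, the formula above translates almost literally into Python. This is a sketch only: the function name is made up, the K-weighting pre-filter is deliberately left out (the samples are assumed to be filtered already), and the channel weights G_i are passed in by the caller rather than taken from the standard.

```python
import math


def bs1770_loudness(channels, weights):
    """-0.691 + 10*lg(wmsq) for already K-weighted samples.

    channels: list of equally long sample sequences, one per channel.
    weights:  channel weights G_i, in the same order as `channels`.
    """
    n = len(channels[0])
    # wmsq = sum_i_j G_i * x_i_j * x_i_j / n
    wmsq = sum(g * sum(x * x for x in ch)
               for g, ch in zip(weights, channels)) / n
    return -0.691 + 10 * math.log10(wmsq)


# Toy input: two channels of constant full-scale samples, both with weight 1.
print(bs1770_loudness([[1.0] * 100, [1.0] * 100], [1.0, 1.0]))
```

Doubling the weighted mean square, as the second channel does here, raises the result by 10*lg(2), about 3.01 dB.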
|
|
|
Jan 10 2011, 18:20
Post
#66
|
|
|
Group: Developer Posts: 618 Joined: 6-December 08 From: Erlangen Germany Member No.: 64012 |
Exactly, and if you pull the "/n" out of
wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
which you can do since n is the same in all blocks and channels, you get what I wrote, because n = 0.4 * sample rate and
wmsq = block energy / n,
and you save many divisions. Of course you still need the division when computing the final R128 loudness measure.
I think you mean "x_i_j the i-th channel's voltage of the j-th sample" though, right?
Chris
--------------------
If I don't reply to your reply, it means I agree with you.
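Written out, the equivalence between the loudness comparisons and the energy comparisons looks as follows. The symbols are introduced here for illustration only: E is the block energy, f_s the sample rate, n = 0.4 f_s the number of samples in a 400 ms block, and \bar{E} the average 70-gated energy. The last line assumes that the factor 0.1584893 stands for a relative gate 8 LU below the 70-gated loudness, since 10^{-0.8} = 0.1584893.

```latex
\begin{align*}
\mathrm{wmsq} &= \frac{E}{n}, \qquad
E = \sum_{i}\sum_{j} G_i\, x_{i,j}^{2}, \qquad
n = 0.4\, f_s \\[4pt]
-0.691 + 10\log_{10}\frac{E}{n} > -70
  &\iff E > n \cdot 10^{(-70 + 0.691)/10} \\[4pt]
-0.691 + 10\log_{10}\frac{E}{n} > -0.691 + 10\log_{10}\frac{\bar{E}}{n} - 8
  &\iff E > 10^{-0.8}\,\bar{E} \approx 0.1584893\,\bar{E}
\end{align*}
```

Because n cancels in the relative comparison, the relative gate needs no per-block division or logarithm at all.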
|
|
|
|
Jan 10 2011, 18:35
Post
#67
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
Just out of curiosity, where does that 0.4 come from?
|
|
|
|
Jan 10 2011, 19:27
Post
#68
|
|
|
Group: Members Posts: 11 Joined: 6-January 11 Member No.: 87101 |
|
|
|
|
Jan 10 2011, 21:40
Post
#69
|
|
|
Group: Members Posts: 698 Joined: 6-March 10 Member No.: 78779 |
Duh. My bad!
|
|
|
|
Jan 11 2011, 23:14
Post
#70
|
|
|
Group: Members Posts: 1540 Joined: 13-August 03 Member No.: 8353 |
I have a proposal.
New standard tag fields:
EBU_R128_REFERENCE_LOUDNESS
EBU_R128_TRACK_GAIN
EBU_R128_TRACK_PEAK
EBU_R128_ALBUM_GAIN
EBU_R128_ALBUM_PEAK
or R128GAIN_*, EBUR128_*, ...
Replay Gain tag fields should become optional, only activated by a command line option. So no loss there if people want to test your implementation without having an EBU R128 DSP plugin. Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins. |
|
|
|
Jan 12 2011, 02:02
Post
#71
|
|
|
Group: Members Posts: 1540 Joined: 13-August 03 Member No.: 8353 |
PS: I'd say that using GAIN in your prefix, as I had suggested, is a bad idea (unfortunately I can't edit the post anymore). The GAIN in REPLAYGAIN_* is part of the proper name of that loudness measurement system. Choose wisely: AFAIK you're the first with an implementation that uses tag field names, or even the first with a PC implementation of EBU R128. The tag fields will probably become the standard (in the PC sound community).
EBUR128_* would be consistent with Replay Gain's omission of the whitespace between the two words, but it is confusing, so people might think the standard's name is Ebur 128 or EBUR 128. Hence I would vote for EBU_R128_* This post has been edited by Fandango: Jan 12 2011, 02:07 |
|
|
|
Jan 12 2011, 04:11
Post
#72
|
|
|
Group: Members Posts: 581 Joined: 17-August 09 Member No.: 72373 |
QUOTE Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins.
Why not? There is a simple and reasonably accurate mapping between R128 and Replay Gain metrics. This post has been edited by Notat: Jan 12 2011, 04:14 |
|
|
|
Jan 12 2011, 06:40
Post
#73
|
|
|
Group: Members Posts: 11 Joined: 6-January 11 Member No.: 87101 |
I had been hoping that the written tags were being converted into REPLAYGAIN compatible units (although I wondered). How are the flacs being tested; with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?
New tags seem very unfortunate (given hardware device support, etc.). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc.; sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.
Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).
-Jeff |
|
|
|
Jan 12 2011, 10:18
Post
#74
|
|
|
Group: Members Posts: 11 Joined: 6-January 11 Member No.: 87101 |
Just compared r128gain output versus ReplayGain for ref_pink.wav. ReplayGain defines ref_pink.wav as +6.00 dB. This was originally 0 when compared to 83 dB SPL but shifted up when 6 dB was added to make typical music "loud enough" on non-calibrated systems.
CODE
C:\development\replaygain>r128gain.exe ref_pink.wav
args
analyzing ...
ref_pink.wav (1/1): -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
ALBUM: -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
Since the whole point is to come up with a scaling ratio, and relative LU are scaled in dB, it looks to me like this algorithm will generate ReplayGain-compatible values simply by adding 5.6 to the reported LU to compensate for the different base loudness of ref_pink.wav (-0.4 difference due to fundamental differences in the algorithms and +6 difference due to the ReplayGain reference point shift).
However, this whole comparison is based on my assumption that the goal is to use the new algorithm for computing the loudness and adjustment while calibrating to the original reference sound. I don't know if this is a valid comparison or if, for example, the new algorithm would specifically not be expected to behave ideally on ref_pink.wav. Anyway, for the few real music files I compared, the results were similar enough to the ReplayGain-calculated values that this seems plausible, but different enough that I don't know if this is a valid conversion method or not.
-Jeff |
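The tentative conversion proposed above amounts to a constant offset. The Python sketch below only restates that proposal; it is not an established mapping, the function names are hypothetical, and it assumes that the "0.4 LU" in the log is the gain needed to reach -23 LUFS.

```python
R128_REFERENCE_LUFS = -23.0   # target level implied by the log above
RG_OFFSET_DB = 5.6            # tentative offset from the post: +6 shift, -0.4 algorithm difference


def r128_gain_lu(loudness_lufs):
    """Gain in LU that brings a measured loudness to -23 LUFS."""
    return R128_REFERENCE_LUFS - loudness_lufs


def replaygain_estimate_db(loudness_lufs, offset_db=RG_OFFSET_DB):
    """Tentative ReplayGain-style gain derived from an R128 measurement."""
    return r128_gain_lu(loudness_lufs) + offset_db


# ref_pink.wav was measured at -23.4 LUFS in the log above.
print(f"{replaygain_estimate_db(-23.4):+.2f} dB")
```

For ref_pink.wav this gives the +6.00 dB that ReplayGain assigns to the file, but only by construction, since the offset was calibrated on that one file. Whether the same offset holds for real music is exactly the open question raised in the post.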
|
|
|
Jan 12 2011, 10:21
Post
#75
|
|
Group: Members Posts: 395 Joined: 13-June 10 Member No.: 81467 |
QUOTE I had been hoping that the written tags were being converted into REPLAYGAIN compatible units (although I wondered). How are the flacs being tested; with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?
EBU R128 and ReplayGain are two different approaches to reach the same goal: uniform loudness at replay time. Common to both approaches is to define an algorithm in order to determine at scan time
Tests were performed using Winamp in conjunction with my own SoX and FFmpeg based input plugin. Native WA should do as well.
QUOTE New tags seem very unfortunate (given hardware device support, etc.). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc.; sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.
Playback software (e.g. Winamp) makes use of the peak values (e.g. by providing a clipping prevention mode). Whether it is the amplitude peak or the true peak becomes interesting when there is some up-sampling in the playback chain, and there probably is, because every contemporary DAC does it. Hence you should always store true peaks.
QUOTE Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).
Maybe it could become part of the RG standard:
|
|
|
|