Improving ReplayGain, some ideas for Devs etc
Jul 19 2004, 23:01, Post #76
Group: Members; Posts: 393; Joined: 23-July 02; From: Blue Grass, IA; Member No.: 2760
2B > The peak sample value tells you _exactly_ how much headroom the material has - usually none!

...and I am at a loss as to how RG actually does this. If I take a 20 kHz sine wave (or whatever frequency yields 2 sample points per cycle) and encode it at 44.1 kHz, there is no guarantee that the sample points will fall on the amplitude maxima (unless phase-locked). They could fall on the x-axis crossings...or anywhere on the waveform up to the peaks. Given my example above, is RG actually reconstructing the wave to determine the peak value...or is it using the data in the file?

xen-uno
--------------------
No one can be told what Ogg Vorbis is...you have to hear it for yourself
- Morpheus
Jul 20 2004, 00:52, Post #77
Group: Members; Posts: 57; Joined: 4-January 04; Member No.: 10938
QUOTE (Xenno @ Jul 19 2004, 11:01 PM)
2B > The peak sample value tells you _exactly_ how much headroom the material has - usually none!
...and I am at a loss as to how RG actually does this. If I take a 20 kHz sine wave (or whatever frequency yields 2 sample points per cycle)

In fact, 2 + epsilon sample points per cycle are required by the Shannon theorem in the case of real-valued samples, if you look at it closely enough. (Notice that $\cos(Wt) = \tfrac{1}{2}\left(e^{iWt} + e^{-iWt}\right)$, so its spectrum has a component at $+W$ as well as at $-W$ and does not fit in the half-open band $[-W, +W)$.)

QUOTE
and encode it at 44.1 kHz, there is no guarantee that the sample points will fall on the amplitude maxima (unless phase-locked). They could fall on the x-axis crossings...or anywhere on the waveform up to the peaks.

On a long enough sequence of such a sine wave, some of the sample points will fall very close to maxima of the continuous wave. By a quick estimate, 4000 samples are enough to ensure that the discrete peak lies within $100/4000^2$ percent of the continuous wave's real peak, for high frequencies up to $22.05/(1 + 1/4000) \approx 22.044$ kHz. (I'm using $1 - x^2/2$ as an approximation of the sine wave near its maxima.) Even with only 10 samples, you get 1% peak precision for high-frequency sines up to 20.04 kHz (i.e., from 2.2 kHz to 20.04 kHz; low frequencies are of no interest here, since they don't show much difference between the discrete and continuous maxima).

Conclusion: for a sine wave, you don't really have to worry about the difference between the discrete peak and the underlying continuous peak.

QUOTE
Given my example above, is RG actually reconstructing the wave to determine the peak value...or is it using the data in the file?

My opinion is that it doesn't matter, even slightly, though I have only made my point with sine waves and not the general case of an arbitrary sampled sound.

This post has been edited by SamK: Jul 20 2004, 00:58
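The sine-wave argument above can be checked numerically. The sketch below is an illustration only (the function name and the brute-force phase scan are assumptions, not anything ReplayGain does): it samples a unit-amplitude sine at a given frequency and start phase, and reports the worst shortfall of the largest |sample| below the continuous peak of 1.0 across a grid of start phases.

```python
import math


def worst_case_peak_shortfall(freq_hz, fs_hz, n_samples, phase_steps=180):
    """Return the largest relative gap between a unit sine's continuous peak (1.0)
    and its largest absolute sample value, scanning `phase_steps` start phases.

    |sin| repeats every pi radians, so start phases in [0, pi) cover every case,
    up to the resolution of the scan.
    """
    step = 2 * math.pi * freq_hz / fs_hz  # phase advance per sample
    worst = 0.0
    for p in range(phase_steps):
        phase = math.pi * p / phase_steps
        sample_peak = max(abs(math.sin(phase + step * n)) for n in range(n_samples))
        worst = max(worst, 1.0 - sample_peak)
    return worst


if __name__ == "__main__":
    for n in (10, 100, 4000):
        gap = worst_case_peak_shortfall(20000, 44100, n)
        print(f"20 kHz, {n:5d} samples: discrete peak up to {100 * gap:.4f}% low")
```

With a few thousand samples the shortfall at 20 kHz comes out negligible, in line with the conclusion above; for very short runs the result depends strongly on the start phase, so the exact figures for small sample counts are worth computing rather than estimating.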
Jul 20 2004, 00:55, Post #78
Group: Members; Posts: 57; Joined: 4-January 04; Member No.: 10938
QUOTE (SamK @ Jul 20 2004, 12:52 AM)
My opinion is that it doesn't matter, even slightly, though I have only made my point with sine waves and not the general case of an arbitrary sampled sound.

On top of that, you can consider that it's not really clipping as long as the digital signal is preserved. Then, if the DAC chops off the true peaks of the analog signal because of that kind of issue, I'd say that's the DAC's fault.
Jul 20 2004, 13:07, Post #79
ReplayGain developer
Group: Developer; Posts: 4583; Joined: 5-November 01; From: Yorkshire, UK; Member No.: 409
The RG peak value is the largest absolute value within the digital data.
I'm aware that the true reconstructed peak can be between sample values (and oversampling DACs will often clip this), but isn't the ReplayGain calculation slow enough already - without taking this into account? It's worth remembering that the people who master squashed CDs don't take this into account either.

The reason I didn't worry about this with ReplayGain is because the highest reconstructed inter-sample value you can contrive is around 1.5x digital full scale. As ReplayGain drops most over-compressed tracks by 6-12dB, you've got more than enough headroom.

I suppose you could store an "analogue" peak value, and use this for clipping prevention. That's a nice project, if anyone wants it!

However, ReplayGain will keep most music away from clipping. If you don't use ReplayGain, simply dropping the gain by 3-6dB will keep everything away from clipping. What's more, the existing peak value is more than good enough in most cases, and leaving an extra fraction of a dB of headroom will make it fine in all but contrived cases.

You've got to wonder: if someone puts a signal onto a CD where the analogue peak is at digital full scale plus 50%, maybe the intention is to make the DAC in your CD player clip? If so, what's the point in de-clipping it?

Cheers,
David.

This post has been edited by 2Bdecided: Jul 20 2004, 13:08
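In decibel terms (a worked conversion added for illustration, using the 1.5x figure above and the low end of the 6-12dB range), an inter-sample overshoot of r times full scale needs $20\log_{10} r$ dB of headroom, and a 6dB cut leaves a 1.5x overshoot comfortably below full scale:

$$
20\log_{10} 1.5 \approx 3.52\ \text{dB}, \qquad 1.5 \times 10^{-6/20} \approx 1.5 \times 0.501 \approx 0.75 < 1
$$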
Jul 20 2004, 14:08, Post #80
Group: Members; Posts: 57; Joined: 4-January 04; Member No.: 10938
QUOTE (2Bdecided @ Jul 20 2004, 01:07 PM)
The RG peak value is the largest absolute value within the digital data. I'm aware that the true reconstructed peak can be between sample values (and oversampling DACs will often clip this)

Are there any DACs that can reconstruct the analog signal with its full peak above full scale? I guess DACs can behave very differently on such digital signals.

QUOTE
The reason I didn't worry about this with ReplayGain is because the highest reconstructed inter-sample value you can contrive is around 1.5x digital full scale.

Wow, 1.5x is much more than is possible with sine waves. Can you tell me how to make such a signal, or did you get this value by mathematically bounding the reconstructed signal formula?
Jul 20 2004, 14:19, Post #81
Group: Members; Posts: 57; Joined: 4-January 04; Member No.: 10938
QUOTE (2Bdecided @ Jul 20 2004, 01:07 PM)
isn't the ReplayGain calculation slow enough already - without taking this into account?

It might be possible to take it into account without much more computation. From the DCT, you can bound the analog peak by adding up the moduli of the DCT coefficients. I don't know how imprecise that would be on real music signals, but my guess is it shouldn't be too bad.
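SamK's bound can be sketched as follows. This swaps in a plain DFT of one block, treated as periodic, in place of the DCT he mentions, because the bound is then easy to state: the band-limited reconstruction is a sum of the DFT bins, so by the triangle inequality its magnitude can never exceed the sum of the bin magnitudes divided by N, at any instant including those between samples. Function names are hypothetical, and how loose the bound gets on real music is exactly the open question raised in the post.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short illustrative blocks."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def spectral_peak_bound(block):
    """Upper bound on |x(t)| for the band-limited periodic reconstruction of `block`.

    x(t) = (1/N) * sum_k X[k] * exp(2j*pi*k*t/N), with bins taken symmetrically
    around 0, so |x(t)| <= (1/N) * sum_k |X[k]| for every real t.
    """
    spectrum = dft(block)
    return sum(abs(c) for c in spectrum) / len(block)


if __name__ == "__main__":
    # 11.025 kHz tone at 44.1 kHz with a 45-degree phase offset, scaled so the
    # largest sample sits exactly at digital full scale.
    tone = [math.sin(math.pi * m / 2 + math.pi / 4) for m in range(64)]
    scale = max(abs(s) for s in tone)
    tone = [s / scale for s in tone]
    print("sample peak:", max(abs(s) for s in tone))
    print("spectral bound:", spectral_peak_bound(tone))
```

For a single pure tone the bound is tight (here it comes out at about 1.414); for broadband material the sum of magnitudes grows with the number of significant bins, so the bound will be looser.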
Jul 20 2004, 14:50, Post #82
ReplayGain developer
Group: Developer; Posts: 4583; Joined: 5-November 01; From: Yorkshire, UK; Member No.: 409
QUOTE (SamK @ Jul 20 2004, 01:08 PM)
Wow, 1.5x is much more than is possible with sine waves. Can you tell me how to make such a signal, or did you get this value by mathematically bounding the reconstructed signal formula?

OK, it was 1.41, but it's possible with an 11.025kHz sine wave (44.1kHz sampling)...
http://www.hydrogenaudio.org/forums/index....ype=post&id=818

I'd imagine that's the maximum you can get from a sine wave, but if you drag samples around in Cool Edit you can get bigger peaks between samples. If you drag 2 more samples high in the above example, you can reach 1.78x digital full scale between samples (verified by resampling to 10x the sample rate and checking the middle sample value). The "true" peak will be slightly higher still. I'll leave you to figure out which two samples you have to drag up!

Cheers,
David.
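David's check (resample to 10x and read the in-between values) can be imitated in a few lines. This is a simplified stand-in, assuming the block is exactly periodic so that DFT-based trigonometric interpolation gives the exact band-limited reconstruction; a practical resampler or meter would use a finite interpolation filter instead. It also assumes the 1.41x case means samples straddling each peak symmetrically (a 45-degree phase offset). All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; fine for a short test block."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def bandlimited_value(spectrum, t):
    """Value at fractional sample time t of the band-limited periodic signal with this DFT."""
    n = len(spectrum)
    total = spectrum[0].real
    for k in range(1, (n + 1) // 2):
        # Bins k and N-k are complex conjugates for real input; together they give 2*Re(...).
        total += 2 * (spectrum[k] * cmath.exp(2j * math.pi * k * t / n)).real
    if n % 2 == 0:
        # Nyquist bin, split evenly between +N/2 and -N/2, becomes a cosine.
        total += spectrum[n // 2].real * math.cos(math.pi * t)
    return total / n


def oversampled_peak(samples, factor=10):
    """Largest |value| of the reconstruction on a grid `factor` times finer than the samples."""
    spectrum = dft(samples)
    n = len(samples)
    return max(abs(bandlimited_value(spectrum, i / factor)) for i in range(n * factor))


if __name__ == "__main__":
    # 11.025 kHz sine sampled at 44.1 kHz with a 45-degree phase offset, so every
    # sample sits halfway (in phase) between a zero crossing and a peak.
    tone = [math.sin(2 * math.pi * 11025 * m / 44100 + math.pi / 4) for m in range(64)]
    sample_peak = max(abs(s) for s in tone)
    tone = [s / sample_peak for s in tone]  # largest sample now at digital full scale
    print("sample peak:", max(abs(s) for s in tone))
    print("reconstructed peak (10x grid):", oversampled_peak(tone))
```

The sample peak is full scale while the reconstructed peak comes out at about 1.414, i.e. the 1.41x figure. The same function can be pointed at hand-edited sample values to explore the larger inter-sample peaks described above.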
Jul 20 2004, 14:59, Post #83
Group: Developer; Posts: 1679; Joined: 23-December 01; From: Germany; Member No.: 731
```
replaygain_track_gain = -10.99 dB
replaygain_track_peak = 1.647949
```

Transcoded from a Musepack --standard encode to AoTuV b2 -q 0.
--------------------
"To understand me, you'll have to swallow a world." Or maybe your words.
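Tags like these are what a player's clipping prevention works from. The sketch below assumes the common "reduce the gain so that peak times gain stays at or below full scale" rule; the function name is hypothetical and individual players may differ. Note that it uses the stored sample peak, so the inter-sample overshoots discussed earlier in the thread are not covered.

```python
def playback_scale(gain_db, peak, prevent_clipping=True):
    """Linear factor to apply to decoded samples (1.0 = digital full scale).

    gain_db: e.g. the value of replaygain_track_gain.
    peak:    e.g. the value of replaygain_track_peak (may exceed 1.0 for lossy decodes).
    With prevent_clipping, the factor is reduced so that peak * factor <= 1.0.
    """
    scale = 10.0 ** (gain_db / 20.0)
    if prevent_clipping and peak > 0 and peak * scale > 1.0:
        scale = 1.0 / peak
    return scale


if __name__ == "__main__":
    s = playback_scale(-10.99, 1.647949)
    print(f"scale = {s:.4f}, scaled peak = {1.647949 * s:.4f}")
    # A hypothetical quiet track boosted by +6 dB with a sample peak of 0.9:
    print(playback_scale(6.0, 0.9))  # limited to 1/0.9 instead of ~1.995
```

For the posted values, -10.99 dB is a linear factor of about 0.282, so even the 1.648 peak ends up near 0.465 of full scale and no limiting is needed.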
|
Jul 20 2004, 15:13, Post #84
ReplayGain developer
Group: Developer; Posts: 4583; Joined: 5-November 01; From: Yorkshire, UK; Member No.: 409
That's different to the latest issue discussed in this thread, because that Peak value is based on actual samples, not inter-sample reconstructed peaks.
However, it illustrates Pio's earlier point very well!

Cheers,
David.
Jul 26 2004, 02:18, Post #85
Group: Members; Posts: 65; Joined: 19-July 03; Member No.: 7864
QUOTE
The peak sample value tells you _exactly_ how much headroom the material has - usually none!

I'm referring more to peak-to-average ratio here. Headroom is a rather vague term that could mean anything, so I probably shouldn't have used it.

QUOTE
If you want a pure RMS measurement, then measure the RMS. It's got little to do with judging or matching loudness, so it's not part of ReplayGain. Sorry!

Yes, but current methods of calculating RMS are rather cumbersome. You have to open each individual file in a wave editor, run the analysis feature, write the RMS down, and do that over and over again for every track on an album. Plus, the RMS scanners in wave editors don't have the "intelligent" calculation factors that ReplayGain uses; they simply average all the samples in a selection (unless you tell the scanner to ignore everything under a certain level, which already adds work that shouldn't be necessary for the user). Adding a non-contour feature to ReplayGain would give people a quick and easy way to measure RMS values.
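A batch RMS scan of the kind asked for here is short to write outside ReplayGain. The sketch below is plain, unweighted RMS in dBFS with an optional level gate on fixed blocks; it is not ReplayGain's algorithm (no equal-loudness filter, no statistics over blocks), and the 2205-sample block, the -60 dBFS gate, the 16-bit-only WAV reader, and the script name are illustrative assumptions.

```python
import array
import math
import sys
import wave


def rms_dbfs(samples, block_size=2205, gate_dbfs=None):
    """Unweighted RMS level in dBFS, where 1.0 is digital full scale.

    If gate_dbfs is given, blocks of block_size samples whose own RMS falls
    below that level are skipped (like telling an editor's scanner to ignore
    quiet passages). For interleaved stereo, block_size counts individual
    samples, so 2205 samples is 25 ms rather than 50 ms at 44.1 kHz.
    """
    total = 0.0
    count = 0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        energy = sum(s * s for s in block)
        if gate_dbfs is not None:
            block_rms = math.sqrt(energy / len(block))
            if block_rms == 0.0 or 20 * math.log10(block_rms) < gate_dbfs:
                continue
        total += energy
        count += len(block)
    if count == 0 or total == 0.0:
        return float("-inf")
    return 10 * math.log10(total / count)  # 10*log10(mean square) == 20*log10(RMS)


def read_wav_16bit(path):
    """Read a 16-bit PCM WAV file as floats in [-1, 1), channels left interleaved."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        data = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in data]


if __name__ == "__main__":
    # Usage (hypothetical script name): python rms_scan.py track01.wav track02.wav ...
    for path in sys.argv[1:]:
        level = rms_dbfs(read_wav_16bit(path), gate_dbfs=-60.0)
        print(f"{level:8.2f} dBFS  {path}")
```

A full-scale sine measures about -3.01 dBFS with this definition; appending an equal length of silence drops the ungated figure by about 3 dB, while the gated figure stays put.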
|
Oct 27 2004, 17:54, Post #86
Group: Members; Posts: 225; Joined: 19-February 02; From: Plymouth, UK; Member No.: 1355
QUOTE (2Bdecided @ Nov 18 2003, 04:35 PM)
Almost everyone is using a reference level of 89dB, rather than the 83dB in the original ReplayGain proposal. Unless there are any objections, I'll change the official reference level to 89dB.
(It's a pity I didn't stick with the original idea of storing the ReplayGain level in the file, e.g. 92dB instead of -3dB, because then the reference level wouldn't matter. Too confusing to change back now, I think.)

Could this be solved by storing the reference level in the file as well as the ReplayGain value?

Edit: Didn't realise I was such a latecomer to this thread!

This post has been edited by danbee: Oct 27 2004, 17:56
--------------------
:: danbee :: pixelhum.com ::
|
Oct 27 2004, 18:23, Post #87
FLAC Developer
Group: Developer; Posts: 1526; Joined: 27-February 02; Member No.: 1408
Not a bad idea, e.g.

replaygain_reference_level=90dB

where absence of the tag implies 89dB.
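How a player might use such a tag can be sketched as follows, under the semantics suggested above: the stored gain brings the track to the file's reference level, a missing tag means 89dB, and moving to a different target reference shifts the gain by the same number of dB. The tag name is the one proposed in this post (nothing here establishes that any software reads it), lowercase keys are assumed as in the posted examples, and the function names are hypothetical.

```python
def parse_db(value):
    """Parse a tag value such as '-10.99 dB' or '90dB' into a float."""
    text = value.strip()
    if text.lower().endswith("db"):
        text = text[:-2]
    return float(text)


def gain_for_reference(tags, target_reference_db=89.0, default_reference_db=89.0):
    """Re-express replaygain_track_gain relative to the player's preferred reference level."""
    gain = parse_db(tags["replaygain_track_gain"])
    ref_text = tags.get("replaygain_reference_level")
    file_ref = parse_db(ref_text) if ref_text is not None else default_reference_db
    # A gain computed for a louder reference must come down by the difference.
    return gain + (target_reference_db - file_ref)


if __name__ == "__main__":
    tags = {"replaygain_track_gain": "-10.99 dB", "replaygain_reference_level": "90dB"}
    print(gain_for_reference(tags))                            # gain for an 89dB target
    print(gain_for_reference(tags, target_reference_db=83.0))  # original 83dB proposal
```

This is also why storing an absolute level (e.g. 92dB rather than -3dB at an 89dB reference) would make the reference level irrelevant: the gain for any target is just the target minus the stored level.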