Full Version: Why RG 89.0 dB when it doesn't clip?
udauda
I have a question regarding ReplayGain.

I thoroughly read the RG database and all, and understood the meaning behind 89.0 dB.

It is certainly a very reasonable proposition, and I do believe it is our only way to survive this insane 'loudness war'. tongue.gif

But what if your DAP's (iPod, etc.) output is not high enough to compensate for 89.0 dB? Sometimes 89.0 dB is too low for certain DAPs, so you turn up the volume to get sufficient output for your ears, and the DAP starts to introduce background noise into the music itself. In that case, you are actually sacrificing sound quality in order to prevent clipping. Hmm..


So my question is...

For the low-output DAPs, can you RG beyond 89.0 dB as long as it doesn't clip?

(mp3gain conveniently tells you whether a file clips or not when you analyze gain)





Thank you.



pdq
The number one function of ReplayGain is to make all of your music equal loudness. Secondarily it can be set to avoid clipping on virtually all tracks, but if you have different requirements, such as for your DAP, or you don't mind a little clipping now and then, then feel free to select any level you like.
Mike Giacomelli
Yes. You can pick any reference level you like. 89dB is merely the default.
Martel
QUOTE(udauda @ May 26 2008, 16:23) *
But what if your DAPs(iPod, etc)' output is not high enough to compensate 89.0 dB? Sometimes 89.0 dB is too low for certain DAPs- so you turn up the volume to obtain a sufficient output to your ears, ...

The second sentence doesn't make much sense since 89.0 dB is, well, just 89.0 dB, unless, perhaps, it wasn't really 89.0 dB so you had to crank up the volume to actually reach 89.0 dB. Then, using the reference dB scale (as seems to be the case) makes no sense and moreover you don't get any clue about clipping at all, since there is no "ceiling" on that scale. biggrin.gif
I thought that RG used information relative to digital 0 dB where you get the idea about clipping level (0 dB).
Alex B
QUOTE(Martel @ May 27 2008, 11:21) *
... Then, using the reference dB scale (as seems to be the case) makes no sense and moreover you don't get any clue about clipping at all, since there is no "ceiling" on that scale. biggrin.gif
I thought that RG used information relative to digital 0 dB where you get the idea about clipping level (0 dB).

I think the OP is speaking specifically about the MP3Gain program's GUI.

The scale that the MP3Gain program uses has a "ceiling" of 103 dB. A 0 dBfs peak in a digital file produces 103 dB output on that scale. In addition, MP3Gain displays (in 1.5 dB steps) how big a margin (if any) the maximum peak has to the 0 dBfs (=103 dB) level.

The so-called "loudness war" has caused most modern popular music albums to have extremely high average volume levels. For instance, if the measured replay gain correction value is -9 dB (= 98 dB inside MP3Gain's GUI) there is only a 5 dB margin for the maximum peaks. If the albums (or tracks) that the OP listens to generally fall in this "victims of war" category, it is unnecessary to process the files with MP3Gain. The peaks are already at the maximum possible level and MP3Gain cannot increase the volume without causing clipping.

MP3 gain cannot fix the audible effect of overly compressed albums. It can only make them quieter so that they can be listened to at the same average volume level that old less compressed albums have. For practical purposes the same effect can be achieved by adjusting the output volume level with the portable device's volume control.

In the rare case of an old quieter release that also has relatively low peak volume levels it is possible to increase the volume level with MP3Gain (though, usually the older quieter releases are more dynamic and don't have much margin for increasing the average volume level).
2Bdecided
QUOTE(Alex B @ May 27 2008, 10:59) *
The scale that the MP3Gain program uses has a "ceiling" of 103 dB. A 0 dBfs peak in a digital file produces 103 dB output on that scale. In addition, MP3Gain displays (in 1.5 dB steps) how big a margin (if any) the maximum peak has to the 0 dBfs (=103 dB) level.
Neither of these statements is strictly true. The ReplayGain values (including the one you choose to set as a target, e.g. 89dB) are estimates of how loud the audio would sound when played back (with the volume control set at a reference level). This isn't really correlated with the peak sample value at all - a single click can hit digital full scale, but it won't sound very loud. A hyper compressed track can sound very loud without ever going near digital full scale.

Of course the "clipping Y/N" indicator is determined by whether the track (with the suggested gain applied) will go above 0dB FS.
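As a rough sketch of that check (this is not MP3Gain's actual code; the function name and the example peak values are made up for illustration), a track clips when its largest sample, scaled by the gain, ends up above full scale:

```python
def will_clip(peak, gain_db):
    """Return True if a track's largest sample would exceed full scale.

    peak    -- largest absolute sample value, with 1.0 meaning 0 dBFS
    gain_db -- gain to be applied, in dB (negative makes the track quieter)
    """
    scale = 10 ** (gain_db / 20)   # convert dB to a linear amplitude factor
    return peak * scale > 1.0


# Hypothetical loud modern track: peaks near full scale, suggested gain -9 dB.
print(will_clip(0.99, -9.0))   # False
# Hypothetical quiet track with peaks at half scale, pushed up by 8 dB.
print(will_clip(0.5, 8.0))     # True
```

Note that the loudness estimate never enters this test; only the peak and the gain do.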


In reply to the original question, if you want ReplayGain/mp3gain to do its job (i.e. try to make all your music sound equally loud), you need to pick one target volume and stick to it. In mp3gain, you're free to pick as high or as low a one as you want, depending on how much clipping you're happy to tolerate. Not all clipping is audible.

Hope this helps.

Cheers,
David.
Alex B
QUOTE(2Bdecided @ May 27 2008, 14:43) *

QUOTE(Alex B @ May 27 2008, 10:59) *
The scale that the MP3Gain program uses has a "ceiling" of 103 dB. A 0 dBfs peak in a digital file produces 103 dB output on that scale. In addition, MP3Gain displays (in 1.5 dB steps) how big a margin (if any) the maximum peak has to the 0 dBfs (=103 dB) level.
Neither of these statements is strictly true. The ReplayGain values (including the one you choose to set as a target, e.g. 89dB) are estimates of how loud the audio would sound when played back (with the volume control set at a reference level). This isn't really correlated with the peak sample value at all - a single click can hit digital full scale, but it won't sound very loud. A hyper compressed track can sound very loud without ever going near digital full scale.

Hmm... maybe I couldn't make myself correctly understood. My statements are not about how loud a file with certain measured qualities will sound, and I didn't make any correlation between the peak volume level and the perceived loudness. The terminology used always seems to be a problem in these replay gain threads. It doesn't help that different applications use the terms and displayed dB values differently.

As far as I understand, both of my statements are true if we are speaking about decoded PCM audio (and that is what MP3Gain measures) and the scale that is used in the MP3Gain application.

In MP3Gain the reference value of 89 dB has a 14 dB margin to the full scale, i.e. it tries to adjust the perceived average volume level to be 14 dB below the 0 dBfs point. In this case the maximum peaks can be 14 dB louder than the average volume level. This margin may or may not be enough depending on how dynamic the file is. (89+14=103).

Pio2001 has explained the 14 dB margin in this old post:
QUOTE(Pio2001 @ Aug 16 2003, 11:55) *
89 db is also called K-14. It means that the average perceived level is set at -14 db.
89 db refers to professional calibrated studio monitors : when a -14 db digital file is played (the reference is a pink noise), if the ampli gain is properly calibrated, the sound pressure must be 89 db at the listening point.
Above this, you risk clipping, because if somewhere in the track there is a peak that is more than 14 db above the average level of the track, then it will be pushed above 0db and will clip.

If, for example, a 92 dB reference value is used in MP3Gain the margin is reduced to 11 dB.
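The arithmetic on this scale can be sketched as follows. This only encodes the rule of thumb from this post (103 dB as the position of 0 dBfs, and a displayed volume equal to the target minus the suggested gain change); the function names are hypothetical and it is not a guarantee about clipping:

```python
FULL_SCALE_DB = 103.0      # where 0 dBfs sits on the scale described above
DEFAULT_TARGET_DB = 89.0


def headroom_db(level_db):
    """Nominal margin between a level on this scale and 0 dBfs."""
    return FULL_SCALE_DB - level_db


def displayed_volume_db(gain_correction_db, target_db=DEFAULT_TARGET_DB):
    """Volume shown for a track whose suggested gain change is gain_correction_db."""
    return target_db - gain_correction_db


print(headroom_db(89.0))                        # 14.0
print(headroom_db(92.0))                        # 11.0
print(displayed_volume_db(-9.0))                # 98.0
print(headroom_db(displayed_volume_db(-9.0)))   # 5.0
```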

In theory, if a test signal has no dynamic variation at all and a reference of 103 dB is used, the adjusted average volume and the peak level should both be 0 dBfs (within the program's accuracy, which is limited to 1.5 dB steps because of the MP3 format's limitation). And no, I have not actually tried that kind of test signal.

Unfortunately many recent album releases are almost like such a test signal. A -11 dB album gain value is not a rarity.


[off topic]

I didn't mention the common MP3 decoder clipping phenomenon, to avoid making things more complicated. For the purpose of this thread that is irrelevant, and the phenomenon itself has not even been proven to be audible. In my experience the peaks that have increased peak levels (over 0 dBfs) seem to always be too brief to be audible, so it really doesn't matter if this artificial peaking is truncated or preserved by reducing the file volume. (I have tried to ABX the difference with my properly prepared "worst case" test samples without success, but I would be happy to receive a sample that would prove me wrong. Also, just as it seems to be impossible to hear the effect of a truncated peak, I don't think it is possible to determine how much a preserved "increased" peak audibly differs from the original peak.)

[/off topic]
2Bdecided
QUOTE(Alex B @ May 27 2008, 15:31) *
In MP3Gain the reference value of 89 dB has a 14 dB margin to the full scale, i.e. it tries to adjust the perceived average volume level to be 14 dB below the 0 dBfs point. In this case the maximum peaks can be 14 dB louder than the average volume level. This margin may or may not be enough depending on how dynamic the file is. (89+14=103).
I see what you meant, and yes, that's true - more or less.

It's complicated by the fact that actual digital signal levels (i.e. the ones that can clip), and perceived loudness (i.e. what you hear, or what ReplayGain thinks you hear) are not that simply related.

However, saying that 89dB gives you 14dB headroom is a good enough way to think about it, unless you have the time to read the whole ReplayGain website wink.gif

Just don't come moaning to me if something with 14dB peak-to-average measurement in your audio editor is clipped to **** by ReplayGaining it to 89dB!

Cheers,
David.
Alex B
QUOTE
Just don't come moaning to me if something with 14dB peak-to-average measurement in your audio editor is clipped to **** by ReplayGaining it to 89dB!

smile.gif Yes, the RMS levels in audio editors are often quite different from what foobar2000's or MP3Gain's replay gain implementations measure. I understand that this happens because the replay gain analyzers take several factors into account and try to simulate how human hearing actually works.

(I have briefly read most of the Replay Gain site a couple of times, but I'm not saying I fully understand how the calculation is done. Maybe I should revisit the technical explanation...)
greynol
It's worth mentioning that the level of my iPod's line out is far lower than the line out of any CD player I've ever owned (the spec for my model is 1 Vpp). For this reason I set the target to 92 dB and deal with whatever additional clipping may occur, which I find (for my music) is rarely audible, if at all.

As far as the notion that -14 dB is the lowest possible RG value assuming a reference of 89dB, this is not the least bit true. A full-scale sine wave @ 2kHz will give a value of -16.46dB, whereas a full-scale sine wave @ 2.5kHz will give a value of -18.98dB. RG numbers are frequency dependent because perceived loudness is frequency dependent.
lvqcl
Full-scale square wave @ 3670...3680 Hz gave me an RG value of -24.03dB. Anything louder?
Alex B
QUOTE(greynol @ May 27 2008, 20:18) *
As far as the notion that -14 dB is the lowest possible RG value assuming a reference of 89dB, this is not the least bit true. A full-scale sine wave @ 2kHz will give a value of -16.46dB, whereas a full-scale sine wave @ 2.5kHz will give a value of -18.98dB. RG numbers are frequency dependent because perceived loudness is frequency dependent.

Of course, you are correct. I completely forgot the frequency dependency factor. With standard music files my false "89 dB reference value => 14 dB headroom" rule has apparently worked consistently, which led me to think it always works like that.

I tried a few foobar generated test tones (the window in the background is from foobar):

[Image: screenshot of the test-tone results in MP3Gain, with a foobar2000 window in the background]

BTW, the MP3Gain GUI appears to have a minor bug. The -19.6 dB value should be -19.5 dB. Not that it makes any difference. It's just a display thing.
Dynamic
QUOTE(Alex B @ May 27 2008, 20:02) *

BTW, the MP3Gain GUI appears to have a minor bug. The -19.6 dB value should be -19.5 dB. Not that it makes any difference. It's just a display thing.


It's not a bug, it's correctly rounded.

The 1.5 dB step size you'll have read about is correct to only 1 decimal place (1.505149978 dB is the figure to 9 decimal places).

As a voltage gain, doubling voltage (i.e. gain = 2.0) is about 6.02 dB. Power gain is voltage gain squared, so 6.02 dB is 4 times the power. The specifiers of the MP3 standard chose to divide this interval into 4 steps, which means taking the fourth root of 2 (= 2^¼) for the voltage gain step.

In decibels, this is: 20 log(2^¼) = 20 * 1/4 * log(2) = 5 * log(2) = 1.505149978 dB

n steps is 5 * n * log(2), so n = -13 gives a dB gain of:
-65 * log (2) = -19.5669497 dB

This rounds to -19.6 dB to 1 decimal place.
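For anyone who wants to check the numbers, here is a minimal sketch of the same calculation (the names `STEP_DB` and `steps_to_db` are just for illustration):

```python
import math

# One MP3 gain step is a voltage ratio of 2**(1/4), i.e. a quarter of a doubling.
STEP_DB = 20 * math.log10(2 ** 0.25)


def steps_to_db(n):
    """Gain in dB for n steps (n may be negative)."""
    return n * STEP_DB


print(round(STEP_DB, 9))            # 1.505149978
print(round(steps_to_db(-13), 7))   # -19.5669497
print(round(steps_to_db(-13), 1))   # -19.6
```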
udauda
QUOTE
In MP3Gain the reference value of 89 dB has a 14 dB margin to the full scale, i.e. it tries to adjust the perceived average volume level to be 14 dB below the 0 dBfs point. In this case the maximum peaks can be 14 dB louder than the average volume level. This margin may or may not be enough depending on how dynamic the file is. (89+14=103).


Doesn't 0 dBfs represent the highest possible level?
AFAIK, the dynamic range of 16-bit (CD?) only extends up to 98 dB.
(1.761+6.0206*16) = 98

Isn't 0 dBfs then 98 dB?

If the maximum peaks are 14 dB louder than the avg volume 89 dB,
what happens to the signals that go over 98 dB?

Unfortunately, I do not have enough background knowledge to comprehend this... wacko.gif
Is there anyone who can explain these dynamic range to dBfs relationships for me? biggrin.gif
greynol
QUOTE(udauda @ May 29 2008, 17:25) *
Isn't 0 dBfs then 98 dB?

Have a look at the unit which you (correctly) chose: dBfs, fs meaning full-scale implying a peak.

Replaygain does not do anything based on a peak sample. It is based on a series of averages which were weighted by frequency when calculated.

I tried to dispel the myth that RG numbers are directly related to peak amplitude earlier. It would seem that I wasn't very successful. unsure.gif

Maybe a link or two might help stem the speculation...
http://replaygain.hydrogenaudio.org/calculating_rg.html
http://replaygain.hydrogenaudio.org/calibration.html
udauda
QUOTE(greynol @ May 29 2008, 18:53) *

Replaygain does not do anything based on a peak sample. It is based on a series of averages which were weighted by frequency when calculated.


I don't know much about technical details..
but it seems RG does do something about peak samples,
though it sticks around 95% of the RMS values:
http://replaygain.hydrogenaudio.org/statistical_process.html


QUOTE

The average RMS value is similarly misleading with the speech sample, and also with classical music. A good method to determine the overall perceived loudness is to sort the RMS energy values into numerical order, and then pick a value near the top of the list...The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level.


Anyway, my question was whether dynamic range (16-bit/24-bit) has something to do with the dBfs.
when a signal goes over 0dBfs, it clips.
In the same manner, when a signal goes over the dynamic range, it clips as well.

Can someone help me to understand this??
lvqcl
QUOTE(udauda @ May 30 2008, 06:22) *

Anyway, my question was whether dynamic range (16-bit/24-bit) has something to do with the dBfs.
when a signal goes over 0dBfs, it clips.
In the same manner, when a signal goes over the dynamic range, it clips as well.

Can someone help me to understand this??

Of course. A signal at 0dBFS is a signal of max. possible amplitude.
"It is an abbreviation for decibel amplitude levels in digital systems which have a maximum available level (like PCM encoding). 0 dBFS is assigned to the maximum possible level."

But your original question was about ReplayGain, wasn't it? According to http://replaygain.hydrogenaudio.org/ , RG deals with SPL (sound pressure level) and not with amplitude values.

Added: and, note that decibel is a "unit of measurement that expresses the magnitude of a physical quantity relative to a specified or implied reference level"
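To illustrate the "relative to a reference" part, here is a small sketch (the helper name `dbfs` is mine, and I take 32768 as full scale for 16-bit samples) that turns a sample value into dBFS. The result only says how far below the ceiling the sample is; it says nothing about SPL:

```python
import math


def dbfs(sample, bits=16):
    """Level of a single non-zero sample value relative to full scale, in dBFS."""
    full_scale = 2 ** (bits - 1)          # 32768 for 16-bit audio
    return 20 * math.log10(abs(sample) / full_scale)


print(round(dbfs(32768), 2))   # 0.0
print(round(dbfs(16384), 2))   # -6.02
print(round(dbfs(8192), 2))    # -12.04
```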
Dynamic
Let us not forget that dB is always a measure of a power ratio but in logarithmic terms. I.e. a dB figure represents a multiplier.

It might represent a plain gain factor (plain dB) such as a 6.02 dB amplifier, which will quadruple the power going in (double the voltage).

...Or it might be referenced to a fixed power.

Example one: the power of full-scale, whose power is 0 dBFS
Example two: a power of 1 milliwatt (1mW), 1mW is denoted as 0 dBm
Example three: the power of a specific calibration signal (such as a pink noise signal whose RMS power is itself set to -20dBFS).
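A minimal sketch of the plain-ratio and fixed-reference cases (function names are only illustrative):

```python
import math


def power_ratio_to_db(ratio):
    """Express a power ratio in dB."""
    return 10 * math.log10(ratio)


def voltage_ratio_to_db(ratio):
    """Express a voltage ratio in dB (power goes with voltage squared, hence 20)."""
    return 20 * math.log10(ratio)


def dbm(power_watts):
    """Power relative to the fixed reference of 1 milliwatt."""
    return power_ratio_to_db(power_watts / 1e-3)


print(round(voltage_ratio_to_db(2), 2))   # 6.02
print(round(power_ratio_to_db(4), 2))     # 6.02
print(dbm(1e-3))                          # 0.0
print(round(dbm(0.1), 1))                 # 20.0
```

Doubling the voltage and quadrupling the power give the same dB figure, which is the point: the dB value is one multiplier, seen from two sides.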

QUOTE(udauda @ May 30 2008, 03:22) *

I don't know much about technical details..
but it seems RG does do something about peak samples,
though it sticks around 95% of the RMS values:
http://replaygain.hydrogenaudio.org/statistical_process.html


No, it measures the perceptual loudness (relative to the perceptual loudness of a pink noise calibration signal) over short time intervals throughout the track or album. Now the simple average of all these perceptual loudness values (either averaging the linear power or the logarithmic (dB) power) isn't very good as a measure of the impression of loudness we perceive for the track.

Instead all the calculated instantaneous loudness values are collected together and sorted into numerical order, with the quietest first and the loudest last.

Imagine them as 1000 people sorted into height order, standing in a line. The 95th percentile of those people's heights would be the height of the 950th person from the left in the line. Incidentally, the median is the height of the 500th person, and can also be described as the 50th percentile.

Going back to loudness power or energy values, the median value will be the one half way through the list, but this isn't representative of how loud we perceive the music. What David has found is that somewhere close to the 95th percentile is representative of how loud we perceive the music overall.

Now re-read the quote below with that in mind, and you'll see it means to pick out the 95th percentile from the sorted list of instantaneous loudness values and use that as a good representation of how we perceive the loudness.

QUOTE

The average RMS value is similarly misleading with the speech sample, and also with classical music. A good method to determine the overall perceived loudness is to sort the RMS energy values into numerical order, and then pick a value near the top of the list...The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level.


Hopefully you can see that we're simply choosing a value from 95% of the way along a sorted list, and we're not measuring an amplitude or a power that is 95% of (or 0.95 times) any other power level. For example, if the 1000th person in the line were 2.7 metres tall, or were swapped for one 2.4 metres tall, it wouldn't tell us the height of the person in position 950 in the line.
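Here is a simplified sketch of that "sort and pick" step. It is not the real ReplayGain code: there is no equal-loudness filtering, the block size is arbitrary, and the function names are made up. It only shows that the pick depends on position in the sorted list, not on the largest value:

```python
import math


def block_levels_db(samples, block_size):
    """RMS level in dB (1.0 = full scale) of each consecutive full block."""
    levels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        mean_square = sum(x * x for x in block) / block_size
        levels.append(10 * math.log10(mean_square + 1e-12))  # avoid log(0)
    return levels


def pick_percentile(values, percent=95):
    """Value found percent% of the way along the sorted list (percent is an int)."""
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)   # 1-based, rounded up
    return ordered[rank - 1]


# The line of 1000 people: heights 1501..2500 mm, shortest first.
heights = list(range(1501, 2501))
print(pick_percentile(heights))        # 2450, the 950th person
heights[-1] = 2700                     # swap the tallest person
print(pick_percentile(heights))        # still 2450

# A mostly quiet signal with a few loud blocks.
signal = [0.01] * 900 + [0.5] * 100
levels = block_levels_db(signal, 10)
print(round(pick_percentile(levels), 1))   # -6.0
```

In the last example 90% of the blocks sit at about -40 dB, yet the picked value is the level of the loud blocks, which is why this behaves so differently from a plain average.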

QUOTE

Anyway, my question was whether dynamic range (16-bit/24-bit) has something to do with the dBfs.
when a signal goes over 0dBfs, it clips.
In the same manner, when a signal goes over the dynamic range, it clips as well.

Can someone help me to understand this??


With 16-bit or 24-bit quantization, the maximum sample value on playback will be the same voltage, whichever you choose. So +32767 and -32768 might be +1 volt and -1 volt respectively for a 16-bit file. For 24-bit signed integers, the maximum positive value is +8388607, which would also represent +1 volt.

The minimum non-zero voltage at 24-bit is 0.00000012 volt (0.12 microvolt), but at 16-bit it is 0.00003052 volt (30.52 microvolt) - i.e. 256-times larger. These values determine the resolution with which one can define a constant voltage, but not the maximum voltage (where any larger value of input is "clipped" to the maximum value because it's impossible to go higher)

Loosely, the ratio of smallest step size to largest (which could be properly called the normalised quantization resolution) could be thought of as the dynamic range, at least for measuring constant voltages, or more specifically individual sample values.
This is often quoted as about 96 dB for CD audio (16-bit PCM) = 20 log(65535/1) = 96.3 dB.
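The step sizes and the ratio can be reproduced with a short sketch, assuming (as above) that the largest positive sample corresponds to +1 volt; the function names are illustrative:

```python
import math


def smallest_step_volts(bits, full_scale_volts=1.0):
    """Voltage of one step if the largest positive sample equals full_scale_volts."""
    return full_scale_volts / (2 ** (bits - 1) - 1)


def step_ratio_db(bits):
    """Ratio of the largest to the smallest step count, in dB."""
    return 20 * math.log10(2 ** bits - 1)


for bits in (16, 24):
    microvolts = smallest_step_volts(bits) * 1e6
    print(bits, round(microvolts, 2), round(step_ratio_db(bits), 1))
# 16 30.52 96.3
# 24 0.12 144.5
```

The bit depth changes the size of the smallest step, not the position of the ceiling.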

Things are slightly different when you consider the frequency domain or consider measuring over multiple time samples. When dithered adequately, the system can then be said to have infinite dynamic range, regardless of the resolution, because signals or constant voltages below the quantization resolution can be measured by using longer averaging times (or bigger FFT window sizes) tending towards infinity if there is adequate dither.

Even maximum signal-to-noise ratio for full scale signal versus quantization noise isn't a straightforward concept, because it could be considered over the whole frequency bandwidth or divided into frequency bins, which then depends on the FFT length. On a human level, and for CD audio, one might consider a power spectrum derived from a 1024-point FFT to be roughly appropriate for the time and frequency resolution of the human auditory system. With flat dither at 16-bits, 44.1 kHz, this gives about 120 dB of signal-to-noise ratio in each frequency bin for a full-scale signal. 120 dB is enough

With noise shaped dither for CD audio, the maximum SNR might be much lower (worse) at high frequencies where it doesn't much matter, but at the most audible frequencies, it might be higher (better=lower noise floor) by 15 dB or more.

You can find out more details and visualise what's happening in this old post of mine.
greynol
QUOTE(Dynamic @ May 30 2008, 10:44) *
No, it measures the perceptual loudness (relative to the perceptual loudness of a pink noise calibration signal) over short time intervals throughout the track or album. Now the simple average of all these perceptual loudness values (either averaging the linear power or the logarithmic (dB) power) isn't very good as a measure of the impression of loudness we perceive for the track.

Instead all the calculated instantaneous loudness values are collected together and sorted into numerical order, with the quietest first and the loudest last.

Don't forget this part...
http://replaygain.hydrogenaudio.org/equal_loudness.html
udauda
Thnx a bunch, me bredas.

Your inputs are very informative and highly educational. biggrin.gif I am glad that I asked.



Now I see how RG does its own way of normalization.. Pretty impressive, indeed.

And if a signal which is 100dB loud was played under a 16-bit/44.1kHz condition, it should clip.

lvqcl
QUOTE(udauda @ May 31 2008, 01:14) *

And if a signal which is 100dB loud was played under a 16-bit/44.1kHz condition, it should clip.


Really? blink.gif Can you explain that conclusion to me?
udauda
Since dynamic range is around 98dB in a Compact Disc, that must be the 0 dBFs which is the maximum digital level. If a signal goes over the range, it should clip...



...Am I still wrong??? blink.gif Dohh!

lvqcl
QUOTE(udauda @ May 31 2008, 02:10) *

Since dynamic range is around 98dB in a Compact Disc, that must be the 0 dBFs which is the maximum digital level. If a signal goes over the range, it should clip...



...Am I still wrong??? blink.gif Dohh!


I've read posts of Dynamic (big thanks) and still don't know what is the dynamic range of CD-DA (44.1kHz 16bit LPCM). sad.gif

And, you said that 0 dBFS (max. possible digital level) is around 98dB and min. possible digital level is 0dB?
tot
QUOTE(lvqcl @ May 31 2008, 02:27) *

And, you said that 0 dBFS (max. possible digital level) is around 98dB and min. possible digital level is 0dB?


They are all just relative levels, 0dBFS is the maximum possible and on 16bit signal -96dBFS is the minimum. It is your volume knob that maps those levels to sound pressure level.

Let's say your average volume on the signal is -20dBFS and you adjust the volume so that it gives you nice 85dB SPL level on your chair. That means that the maximum peaks at 0dBFS will be 105dB SPL at your chair, and the noise floor of -96dBFS will be at 9dB SPL (i.e., inaudible).

Now if you turn the volume 10dB higher (that would be perceived to sound about twice as loud), the SPL numbers all will go up 10dB.

The numbering is reversed (0dBFS as the maximum) simply because that is most convenient. The maximum possible signal is what matters most. You could quote the numbers the other way, but then the maximum signal level (the one that would either blow your system or make you deaf) would depend on the bit depth.
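The example above boils down to a fixed offset set by the volume knob. A minimal sketch (the helper `make_spl_mapper` is hypothetical):

```python
def make_spl_mapper(ref_dbfs, ref_spl):
    """Return a function mapping dBFS to dB SPL for one fixed volume setting."""
    offset = ref_spl - ref_dbfs
    return lambda level_dbfs: level_dbfs + offset


# Volume set so that an average level of -20 dBFS gives 85 dB SPL at the chair.
at_chair = make_spl_mapper(ref_dbfs=-20, ref_spl=85)
print(at_chair(0))     # 105
print(at_chair(-96))   # 9

# Volume knob turned up by 10 dB: every SPL figure moves up by 10 dB.
louder = make_spl_mapper(ref_dbfs=-20, ref_spl=95)
print(louder(0))       # 115
print(louder(-96))     # 19
```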
Slipstreem
QUOTE(Dynamic @ May 29 2008, 23:29) *
As a voltage gain, doubling voltage (i.e. gain = 2.0) is about 6.02 dB. Power gain is voltage gain squared, so 6.02 dB is 4 times the power.

A doubling in voltage is exactly 6dB, not "about 6.02dB". When did the laws of physics suddenly change?

Cheers, Slipstreem. cool.gif
greynol
QUOTE(Slipstreem @ May 30 2008, 20:36) *
A doubling in voltage is exactly 6dB, not "about 6.02dB". When did the laws of physics suddenly change?

Now don't get all Karl Rovian on us!

Is your calculator handy?

20 * log (2) = ???
Slipstreem
You're right. It's 6.02dB. I was deliberately lied to by every professional tutor who ever taught me about logs. smile.gif

Cheers, Slipstreem. cool.gif
Dynamic
QUOTE(lvqcl @ May 31 2008, 01:27) *

QUOTE(udauda @ May 31 2008, 02:10) *

Since dynamic range is around 98dB in a Compact Disc, that must be the 0 dBFs which is the maximum digital level. If a signal goes over the range, it should clip...



...Am I still wrong??? blink.gif Dohh!


I've read posts of Dynamic (big thanks) and still don't know what is the dynamic range of CD-DA (44.1kHz 16bit LPCM). sad.gif

And, you said that 0 dBFS (max. possible digital level) is around 98dB and min. possible digital level is 0dB?


The dynamic range of CD-DA is infinite if properly dithered, though that's not useful to you.

Imagine a deep fade-out of a full-scale sine wave (i.e. 0 dBfs to start). If you don't dither, the signal must be rounded to the nearest quantization level. So from peaks at -32768 and +32767 it would reach -1 and 0 and turn into a square wave (truncation distortion is thus adding odd harmonics of the fundamental) then it suddenly drops to a level where the negative and positive peaks round to the same value, be it 0 or -1 and the sound stops completely.

In that situation, you have about 96.3 dB dynamic range. (Vp-pmax = 65536, Vp-pmin = 1, work out 20 * log(65536/1) to get dB) where the fundamental signal is present in some form, below which it's absent. In this circumstance, you can usefully consider dynamic range.

However, not dithering is bad practice and shouldn't be considered intrinsic to CD-DA performance.

If you don't dither, the quantization noise might be peaky and related to the signal, or completely cancel the signal when it dips below a peak-to-peak amplitude of about 1. When you use flat dither, this complicated distortion is exchanged for flat, white noise that is constant and uncorrelated with the original signal.

Imagine trying to view the signal waveform on an oscilloscope, whether properly dithered or not. Imagine trying to place a decision threshold (e.g. at a comparator input) to produce a logic-level output, say to count the cycles or generate a clock signal from the live waveform. That sort of engineering situation is where the idea of dynamic range is particularly useful - determining the gap between the noise floor and, say, a saturation level.

With a digital sampling oscilloscope supplied with a suitable clock source you could average the signal over numerous cycles (e.g. 16 or 64), and be able to discern signals below the unaveraged noise floor if they were properly dithered in the first place. But being able to see this isn't the same as dealing with the live waveform.

On many sampling oscilloscopes you can also perform FFT analysis for a spectral view and approximate frequency analysis. Like waveform averaging, that involves a form of averaging: any white noise present in, say, the 1024-sample FFT window is spread over the 511 or 512 frequency bins in the power spectrum (remember positive and negative frequencies end up in the same bin, hence the number of bins is half the number of samples in the FFT window). Dividing the power equally among 512 bins, the noise in each bin will be at a level of about 10 * log(1/512) = -27.1 dB relative to (i.e. 27.1 dB below) the supposed noise floor measured from the waveform view. The waveform view noise floor is over the whole bandwidth present (e.g. 22.05 kHz for CD-DA), while the FFT noise within one bin is over that bin's bandwidth (e.g. 22.05 kHz/512 = 43 Hz for CD-DA with a 1024-point FFT). So from -96 dBfs, we can resolve sinusoids around 27 dB lower, say -123 dBfs, which would add to the equal noise power in that bin bandwidth, giving a bin power of -120 dBfs, standing out above the relatively flat noise.

The maximum sinusoid we could see would be full scale, i.e. 0 dBfs. So the dynamic range when analysed by a 1024-point FFT is about 120 dB.
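The per-bin figures can be checked with a short sketch. It just repeats the arithmetic above (flat noise at -96 dBfs, 512 bins); `db_sum` is an illustrative helper for adding uncorrelated powers:

```python
import math


def db_sum(*levels_db):
    """Add uncorrelated powers given in dB and return the total in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


fft_size = 1024
bins = fft_size // 2                       # 512 bins in the power spectrum
noise_total_dbfs = -96.0                   # broadband noise floor used above

per_bin_drop = 10 * math.log10(bins)       # noise is spread over the bins
noise_per_bin = noise_total_dbfs - per_bin_drop
bin_width_hz = 22050 / bins

print(round(per_bin_drop, 1))              # 27.1
print(round(noise_per_bin, 1))             # -123.1
print(round(bin_width_hz, 1))              # 43.1
# A sinusoid as strong as the noise in its bin raises that bin by about 3 dB.
print(round(db_sum(noise_per_bin, noise_per_bin), 1))   # -120.1
```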

1024-point FFTs on CD-DA aren't far from how the ear perceives sound (frequency and time resolution of the ear are in the same ball park as those of the 1024-point FFT), so you might argue that for humans, CD-DA has around 120 dB of usable dynamic range (assuming adequate flat-spectrum dither), though the signal gradually and gracefully sinks into the noise floor without suddenly disappearing.

However, one can frequency-shape the noise spectrum while remaining adequately dithered, to concentrate the noise into high frequencies where the ear is insensitive and reduce the noise in the areas of peak sensitivity by perhaps 15-18 dB. Thus, the useful dynamic range of noise-shaped CDs might be in the region of 135-138 dB in the frequency regions where the ear tends to have its widest dynamic range.

If you're not a human ear, you can go for much longer FFT lengths (smaller frequency bins) and move that white noise per bin down further, thereby obtaining a greater dynamic range.

So, take your pick and decide whether dynamic range is relevant to you.

For 1024-point FFT pictures and audio files that demonstrate the visibility / audibility of signals above the noise floor with various kinds of dither, see the old post I referred to previously.

I'd tend to say that all of the following are true:
  • True dynamic range of CD-DA is infinite providing it is adequately dithered, as tonal signals far below the apparent noise floor can be discerned correctly with sufficient averaging time (if synchronised) or autocorrelation averaging time (even if unsynchronised) or sufficient FFT size. Just name your dynamic range and we can calculate the required averaging time or FFT size.
  • The effective signal-to-noise ratio (SNR) of CD-DA for tonal signals with adequate spectrally white dither is probably about 120 dB thanks to the frequency-selectivity of the ear - similar to a 1024-point FFT.
  • With adequate strong ATH noise shaped dither, the SNR of CD-DA for tonal signals at frequencies where the ear is most sensitive, is approximately 135-138 dB at the expense of extra high noise floor (reduced SNR) at the mostly higher frequencies where the ear is far less sensitive to small signals anyway.
  • For undithered (i.e. truncation-quantized) reproduction - not the case for real CD audio these days unless very badly done - CD-DA has about 96 dB of dynamic range for tonal signals, though the quietest signal before it jumps to silence is a square wave, there's 96 dB range on the FFT bin for the fundamental frequency. If the signal is not a test tone (e.g. real music), there usually will be a degree of self dithering, and the dynamic range will vary from time to time as will the level of quantization distortion, depending on the actual signal.

To put this in context, most 18 inch professional chainsaws are labelled at about 116 dB above the threshold of hearing, not far from the pain threshold. That means that 120 to 138 dB should be more than adequate for music reproduction. Recording and processing may benefit from greater bit-depths, but 16-bit/44.1 kHz should be ample for playback.
2Bdecided
Note that, in his great explanation, Dynamic has snook (is that the past tense of sneak?) in yet another scale - something related to (but not exactly) dB / Hz - loudness per frequency bin.


I think the confusion arises when people think dB means something more than it does (i.e. think it's a unit of measurement, like an inch, which it isn't).

Even when they realise it's just an easy way of writing a ratio, they still think that somehow everything that can be expressed in dB must be related somehow. Well, sometimes it is - but that relationship can be quite complex - e.g. the relationship between dB in ReplayGain and dB FS - so complicated, that it's near useless to think of the two as being correlated in any way.


Anyway, back to ReplayGain: the scale is what the scale is - instead of trying to guess whether it'll clip or not from some vague understanding of it, or relating it to some other barely related scale, try this: look in the column marked "clip?" wink.gif
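
For the original question (how far above 89 dB you can go before clipping), a rough sketch of the arithmetic behind such a "clip?" check. This is not mp3gain's actual code; it assumes the track peak is stored as a linear value where 1.0 is digital full scale, and the function names are made up for illustration:

```python
import math

def would_clip(track_peak: float, gain_db: float) -> bool:
    """True if applying gain_db pushes the peak above full scale (1.0)."""
    return track_peak * 10 ** (gain_db / 20) > 1.0

def max_gain_without_clipping_db(track_peak: float) -> float:
    """Largest gain in dB that keeps the peak at or below full scale."""
    return -20 * math.log10(track_peak)

def highest_safe_reference_db(track_peak: float, gain_at_89_db: float,
                              default_reference_db: float = 89.0) -> float:
    """Highest reference level for this track before the peak would clip."""
    headroom_db = max_gain_without_clipping_db(track_peak) - gain_at_89_db
    return default_reference_db + headroom_db

# Hypothetical track: peak at half of full scale, suggested gain of -2 dB at 89 dB.
print(round(highest_safe_reference_db(0.5, -2.0), 2))
```

For a whole collection you would take the lowest such value across all tracks if you want no clipping anywhere.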

Cheers,
David.
Dynamic
QUOTE(2Bdecided @ Jun 3 2008, 19:32) *

Note that, in his great explanation, Dynamic has snook (is that the past tense of sneak?) in yet another scale - something related to (but not exactly) dB / Hz - loudness per frequency bin.


Oh yes. The unit of dB / Hz is commonly used in engineering, but it carries a risk of misunderstanding or unclear thinking. You don't divide the number of decibels by the bandwidth, as the units alone would tend to suggest; you must first convert out of the logarithmic domain into linear power units or a linear power ratio. That is why my explanation had to subtract 10 * log (512) in the decibel domain to divide the white noise power equally among 512 bins in the power spectrum.
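
A small sketch of that conversion, assuming an illustrative total noise level of -96 dB (the helper names are not from any library). It shows that dividing in the linear domain matches subtracting 10 * log10(512), while dividing the dB figure itself gives nonsense:

```python
import math

def db_to_power_ratio(db: float) -> float:
    """Convert decibels to a linear power ratio."""
    return 10 ** (db / 10)

def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio back to decibels."""
    return 10 * math.log10(ratio)

total_noise_db = -96.0  # illustrative figure, an assumption
bins = 512

# Correct: go linear, divide the power among the bins, come back to dB.
per_bin_db = power_ratio_to_db(db_to_power_ratio(total_noise_db) / bins)

# Equivalent shortcut in the decibel domain.
shortcut_db = total_noise_db - 10 * math.log10(bins)

# What the unit "dB / Hz" wrongly suggests: dividing the dB number directly.
wrong_db = total_noise_db / bins

print(round(per_bin_db, 2), round(shortcut_db, 2), round(wrong_db, 4))
```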

I hope this small digression is acceptable in the Scientific/R&D sub-forum.

As David has pointed out, while dB-based units can be really handy, you need to remember that a dB figure is just a power ratio expressed in logarithmic terms, and that to do any other mathematics based on these figures you should convert into the linear domain first.

You must also bear in mind the reference level you are implicitly making the ratio relative to.

Furthermore, bear in mind that to Joe Public, there's little understanding of dB's peculiarities (rather like other logarithmic scales, such as the Richter scale for earthquakes, or like non-absolute linear scales like temperatures in °C or °F). A 4.0 earthquake has a measured amplitude 1000 times smaller than a 7.0 earthquake, I believe (the difference in released energy is larger still). It's not "twice as hot" in Madrid as in Oslo when it's 32°C and 16°C respectively (90°F and 61°F respectively, or in absolute temperature, 305 Kelvin and 289 Kelvin respectively).
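
The arithmetic behind those two examples, as a quick sketch. It assumes the usual convention that one magnitude unit is a factor of 10 in measured amplitude and a factor of 10^1.5 in energy; the function names are illustrative:

```python
def quake_amplitude_ratio(magnitude_a: float, magnitude_b: float) -> float:
    """Amplitude ratio between two quakes (factor of 10 per magnitude unit)."""
    return 10 ** (magnitude_a - magnitude_b)

def quake_energy_ratio(magnitude_a: float, magnitude_b: float) -> float:
    """Energy ratio between two quakes (factor of 10**1.5 per magnitude unit)."""
    return 10 ** (1.5 * (magnitude_a - magnitude_b))

def celsius_to_kelvin(celsius: float) -> float:
    """Convert to an absolute scale so that ratios become meaningful."""
    return celsius + 273.15

print(quake_amplitude_ratio(7.0, 4.0))
print(round(quake_energy_ratio(7.0, 4.0)))
# Madrid at 32 °C versus Oslo at 16 °C, compared on the absolute scale.
print(round(celsius_to_kelvin(32.0) / celsius_to_kelvin(16.0), 3))
```

On the absolute scale Madrid is only about 5.5% "hotter", not twice as hot.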
udauda
Thank you so much, Dynamic. I humbly appreciate your great input. Now I see what role dithering plays in digital audio. We should post this kind of information on the HA Wiki for future reference!! biggrin.gif

However I am not so clear about this: how come a CD recording can clip when it has a dynamic range way wider than 96dB? Does a digital recording have a limited amount of volume it can output???

I must beg your pardon for my endless questions regarding clipping/SNR/DR/etc, but some of these concepts just don't seem to fit together in my brain. The learning curve is darn steep... crying.gif

lvqcl
QUOTE(udauda @ Jun 7 2008, 05:18) *

However I am not so clear about this: how come a CD recording can clip when it has a dynamic range way wider than 96dB? Does a digital recording have a limited amount of volume it can output???


Read this: http://en.wikipedia.org/wiki/Loudness_war
QUOTE
However, as the maximum amplitude of a CD is at a fixed level, the overall loudness can only be increased by reducing the dynamic range. This is done by pushing the lower level program material higher while the loudest peak sounds are either destroyed or severely diminished. Certain extreme uses of compression can cause distorting or clipping the waveform of the recording.

udauda
QUOTE
However, as the maximum amplitude of a CD is at a fixed level, the overall loudness can only be increased by reducing the dynamic range. This is done by pushing the lower level program material higher while the loudest peak sounds are either destroyed or severely diminished. Certain extreme uses of compression can cause distorting or clipping the waveform of the recording.

Thank you, lvqcl.
So.. a CD does have a limited amplitude it can output then. What is the exact value of the maximum amplitude of a CD?? So if you dither a clipping sample adequately, can you make it not to clip? (By increasing SNR?)



lvqcl
QUOTE(udauda @ Jun 8 2008, 04:02) *

So.. a CD does have a limited amplitude it can output then. What is the exact value of the maximum amplitude of a CD??


Since CD audio uses 16-bit samples, the maximum amplitude is +32767 or -32768. And that is 0 dBFS by definition.
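
A minimal sketch of what that ceiling means in numbers, assuming the convention that 0 dBFS corresponds to a magnitude of 32768 (some definitions reference a full-scale sine instead); the helper names are illustrative:

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768
FULL_SCALE = 32768  # assumed 0 dBFS reference

def sample_to_dbfs(sample: int) -> float:
    """Level of a single 16-bit sample value relative to full scale."""
    if sample == 0:
        return float("-inf")
    return 20 * math.log10(abs(sample) / FULL_SCALE)

def clamp_to_int16(value: float) -> int:
    """Anything beyond the 16-bit range is flattened to the limit, i.e. clipped."""
    return max(INT16_MIN, min(INT16_MAX, round(value)))

print(sample_to_dbfs(INT16_MIN))           # the ceiling itself
print(round(sample_to_dbfs(16384), 2))     # half of full scale
print(clamp_to_int16(40000.0))             # a value that does not fit
```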

QUOTE(udauda @ Jun 8 2008, 04:02) *

So if you dither a clipping sample adequately, can you make it not to clip? (By increasing SNR?)

No, of course not. Dithering and noise shaping can alter the noise level, not the maximum level. Just look at the waveform in ANY audio editor.
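
A small experiment along those lines, as a sketch only: a sine pushed 3 dB above full scale is quantized to 16 bits with and without dither. The TPDF dither of +/-1 LSB and the test signal are assumptions chosen for illustration:

```python
import math
import random

INT16_MAX = 32767
INT16_MIN = -32768

def quantize(x: float, dither: bool, rng: random.Random) -> int:
    """Round to 16 bits, optionally adding triangular (TPDF) dither, then clamp."""
    if dither:
        x += rng.random() - rng.random()  # triangular distribution over +/-1 LSB
    return max(INT16_MIN, min(INT16_MAX, round(x)))

def clipped_count(samples: list) -> int:
    """Number of samples sitting exactly on the 16-bit limits."""
    return sum(s in (INT16_MAX, INT16_MIN) for s in samples)

rng = random.Random(0)
amplitude = 32768 * 10 ** (3 / 20)  # 3 dB over full scale
overdriven = [amplitude * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(441)]

plain = [quantize(x, False, rng) for x in overdriven]
dithered = [quantize(x, True, rng) for x in overdriven]

print(clipped_count(plain), clipped_count(dithered), max(dithered), min(dithered))
```

The dithered version still has its peaks flattened at the 16-bit limits; dither only changes what happens down at the level of the last bit.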
udauda
I will try to explain how I interpreted all these. If anything is wrong, please correct it:

CD (16bit/44kHz) has a max amplitude. Its dynamic range is originally 96 dB. Dithering/noiseshaping can expand this range further, like up to 138 dB, but dithering/noiseshaping can't expand a CD's physical limitation (96 dB). That is why a CD recording clips.



BTW, I found a previous discussion in HA regarding CD's true dynamic range:

http://www.hydrogenaudio.org/forums/index....showtopic=45165

lvqcl
QUOTE(udauda @ Jun 9 2008, 08:17) *

I will try to explain how I interpreted all these. If anything is wrong, please correct it:

CD (16bit/44kHz) has a max amplitude. Its dynamic range is originally 96 dB. Dithering/noiseshaping can expand this range further, like up to 138 dB, but dithering/noiseshaping can't expand a CD's physical limitation (96 dB). That is why a CD recording clips.


A CD recording clips (as in these pictures) only if it was deliberately mastered that way. Anybody can make a CD that doesn't clip.
And I think it's better to describe the limitation as 16 bits, not 96 dB.
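
Where the 96 dB figure comes from, as a quick sketch: it is just the ratio between the largest and smallest steps of a 16-bit word, expressed in dB. The function name is illustrative, and this is the plain undithered figure only:

```python
import math

def quantization_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (14, 16, 24):
    print(bits, round(quantization_range_db(bits), 2))
```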

By the way: Wikipedia (http://en.wikipedia.org/wiki/Compact_Disc) says that "There was a long debate over whether to use 14 bit (Philips) or 16-bit (Sony) quantization..." huh.gif