

A better way of measuring dynamic range?

I've been worried that most people who complain about the loudness wars are using technically incorrect metrics. More specifically, what is being lost in the loudness wars is not loudness itself but dynamics. Yet everybody keeps using ReplayGain numbers and PCM waveform plots, which have no direct relation to dynamics: they measure loudness or peaks, not their ranges. ReplayGain uses the 95th percentile of loudness; Audacity will fill a signal plot up to the peak of the signal without taking the rest of the samples into account.

I think that having a better way to estimate dynamic range would be useful in a few situations. At the very least, it would provide more accurate information on the victims of the loudness war, and more objective information than subjective evaluations. Listeners who are looking for music of a specific dynamic range (either very high for system testing, or very low for background music) could use the information.

The "canonical" way to estimate dynamic range, as I understand it (and note that I haven't consulted any audio books on this, so I'm not particularly knowledgeable - correct me if I'm wrong), is to compute the RMS peak-to-average ratio. I see several problems with this. The biggest is that the measurement is very dependent on the RMS block length. Much heavily-compressed music will still show a lot of dynamics with a common block length like 50 ms, even though these dynamics may not be audible due to masking. Conversely, using a very long block size (10 seconds or more) will correctly identify compressed music as having extremely little dynamic range, but ignores shorter-term dynamics that may still be audible. Also, most peak-to-average measurements do not use any form of loudness equalization, and while that's obviously not a well-solved problem, one should at least try to take it into account IMHO. Finally, if the loudness of the signal is not normally distributed, then large changes in the distribution below the 50th percentile could compromise the accuracy of any single-number measurement.
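To make the block-length dependence concrete, here is a minimal sketch of the peak-to-average measurement described above (all names are my own; `samples` is assumed to be a mono float signal in [-1, 1]):

```python
import numpy as np

def rms_peak_to_average_db(samples, sample_rate, block_seconds):
    """Peak-to-average ratio of per-block RMS energy, in dB."""
    n = int(sample_rate * block_seconds)
    usable = len(samples) - len(samples) % n
    blocks = samples[:usable].reshape(-1, n)
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    rms = rms[rms > 0]  # ignore blocks of digital silence
    return 20 * np.log10(rms.max() / rms.mean())

# A tone whose amplitude jumps every 100 ms: a 50 ms window "sees" the
# jumps and reports a sizeable ratio, while a 10 s window averages them
# away and reports essentially none.
rate = 44100
t = np.arange(rate * 20) / rate
envelope = np.where((t * 10).astype(int) % 2 == 0, 1.0, 0.1)
signal = envelope * np.sin(2 * np.pi * 440 * t)
print(rms_peak_to_average_db(signal, rate, 0.05))   # several dB
print(rms_peak_to_average_db(signal, rate, 10.0))   # near 0 dB
```

The same signal, measured with the same formula, yields completely different "dynamics" depending only on the block length - which is exactly the objection above.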
 
I think I've come up with a scheme to solve all of these problems, although I haven't worked out all the kinks yet. Basically, I'm using three different block lengths at the same time to compute three different loudness estimates: one for short-term transients, one for time scales on the order of one beat, and one for long-term loudness changes. Right now the block lengths are 0.1 s, 1 s and 10 s respectively, although they could change. For each time scale, I highpass filter at 10 Hz and apply the ITU-R BS.1770 loudness filters (the pre-filter shelf and RLB) to give a rudimentary loudness equalization; the numbers honestly don't change much when I disable those filters, though. Finally, I generate a histogram of RMS energy for each block length and measure the range between the 90th and 10th percentiles. These three numbers describe the dynamic range of the signal over all important time scales.

And if nothing else, the plots obtained through this analysis look a lot prettier and more informative than Audacity plots....
 
Here are some examples. The first plot of each pair is loudness vs. time; the second is the cumulative distribution of loudness - but note that an x-value of 140 corresponds to 0 dB (I haven't fixed the x-axis yet). The white plot is 10 s, the red plot is 1 s and the grey plot is 0.1 s.
 
John Mayer, "Waiting On The World To Change". ReplayGain -7.96 dB. Dynamic range estimated at 7.4 dB (10 s), 11.15 dB (1 s), 21.85 dB (0.1 s).
 
Pierre Boulez and the Chicago Symphony Orchestra, "Ionisation", composed by Varèse. ReplayGain +9.04 dB. Dynamic range estimated at 43.7 dB (10 s), 50.49 dB (1 s), 54.97 dB (0.1 s).
 
Merzbow, "I Lead You Towards Glorious Times". Note that because of the filtering and the intense mastering, the loudness plot goes above 0 dB. ReplayGain -20.64 dB. Dynamic range estimated at 0.78 dB (10 s), 1.15 dB (1 s), 2.05 dB (0.1 s).
 
Note how in the Merzbow, the signal is compressed at such a small time scale that there is not much difference between the three measurements, while in the pop music (John Mayer), the extensive dynamics at short time scales go away at long time scales. The difference in measurements between 10 s and 0.1 s reflects the loudness coherence: music which has very fast loud/soft transitions, but roughly the same loudness level across the entire piece, will show much more dynamic range at the 0.1 s scale than at the 10 s scale.
 
What do all of you think? Is this something worth pursuing further? Or is this just a fishing expedition, and should I do a straight-up implementation of a more common dynamic range measurement instead?

A better way of measuring dynamic range?

Reply #1
Interesting idea.

It might be possible to come up with words to name the measures. Typically in classical music (and much other sheet music, especially piano music) the longer-term dynamics consist of the range including pianissimo, piano, mezzo-piano, mezzo-forte, forte and fortissimo. Crescendo and diminuendo would perhaps tend to be included as medium term dynamic variations (a gradual change over a few seconds). Sforzando would typically be instantaneous, punchy dynamics (and "transients").

I guess one might also consider changes of tempo and note duration to be part of the "dynamics" available for artistic effect as defined by musicians, rather than the purely loudness-based definition of dynamics that we tend to use.

With regard to relating these measures to named characteristics, I have another thought:

It's reasonably common to refer to the "punch" of a recording. While this is closest to your short-term measure, your reading is the difference between the 10th and 90th percentiles. In the case where a song consists of, for example, a quiet intro with light orchestration followed by a much louder ending (I'm thinking of Aretha Franklin's dynamic version of Tracks Of My Tears), I would presume that the 10th percentile would come from the very soft first half of the song, while the 90th percentile would come from the loud last part, so I wouldn't be happy to describe your 0.1 second measure as "punch".

If I were interested in the degree of "transient punch" or "smoothness" in the rhythmic nature of the track, I'd be looking at a different measure from the short-term one you chose. It might take a short instantaneous measure of loudness (over perhaps 25 to 50 ms) and compare the peak and trough, or peak and average, or perhaps various percentiles, measured over averaging times of a few seconds throughout the song. One could report an instantaneous "punch" value that varies throughout the song (and could even be displayed in an audio player, just as a variable bitrate or a rolling spectrogram can be). Or that perceived loudness ratio (in dB), call it "transient punch", could be averaged or ranked in a cumulative distribution over the whole song to give an overall representative figure.
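As one possible reading of that suggestion (names and parameters are my own guesses, not an agreed definition): take 50 ms RMS loudness, find the peak-to-trough spread within each rolling few-second window, and summarize over the track.

```python
import numpy as np

def punch_profile_db(samples, sample_rate,
                     short_seconds=0.05, window_seconds=3.0):
    """Peak-to-trough spread of short-term RMS loudness, one value
    per non-overlapping few-second window, in dB."""
    n = int(sample_rate * short_seconds)
    usable = len(samples) - len(samples) % n
    rms_db = 10 * np.log10(
        np.mean(samples[:usable].reshape(-1, n) ** 2, axis=1) + 1e-12)
    per_window = int(window_seconds / short_seconds)
    return np.array([rms_db[i:i + per_window].max()
                     - rms_db[i:i + per_window].min()
                     for i in range(0, len(rms_db) - per_window + 1,
                                    per_window)])

def overall_punch_db(samples, sample_rate):
    """Single representative figure: the median per-window punch."""
    return float(np.median(punch_profile_db(samples, sample_rate)))
```

Because the spread is taken within each short window rather than over the whole song, a quiet-intro/loud-ending track no longer inflates the figure; only fast local loud/soft transitions register as "punch".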

Other words I can think of for long-term dynamic variations would include "light-and-shade" (or perhaps "contrast", keeping up the visual metaphor), and this could apply to the 10 s averaging time. A seafaring metaphor such as "swell" might be adequate to describe the medium-term (1 s averaging) dynamic variation.
Dynamic – the artist formerly known as DickD

A better way of measuring dynamic range?

Reply #2
Are you doing your programming in LabVIEW?  Why