QUOTE(me)
The dynamic range of a selection of music depends on both how its time-varying loudness is estimated and the timescale over which that loudness is evaluated. I propose a numerical method of estimating dynamic range that addresses both dependencies, using a modified ITU-R BS.1770 loudness filter and three moving windows to estimate loudness across three different timescales. The goal is to measure and compare dynamic range more accurately between different music genres, and between different masterings and processing techniques for the same music.
I've been kicking this around for almost a year, but I finally broke down and wrote the thing for real in an afternoon last November (it's been extensively tuned since then). The recent discussions about dynamic range have forced my hand, because so many important things were touched upon, and really, you can think of pfpf as an extremely elaborate reply to that topic.
Summary of algorithm:
- Apply ITU-R BS.1770 filters to convert amplitude to instantaneous loudness.
- Estimate loudness across three different timescales by computing 10ms ("short term"), 200ms ("medium term") and 3000ms ("long term") windowed RMS power.
- Decouple timescales by scaling 10ms loudness by 200ms loudness, and 200ms loudness by 3000ms loudness.
- Threshold loudness at each timescale to remove silence (optional).
- Compute a histogram for each loudness estimate.
- Dynamic range = range between the 50th and 97.7th percentiles, for each timescale.
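For anyone who wants to poke at the idea, here is a rough Python sketch of the steps above. It is not the actual pfpf code, and several things in it are assumptions made for illustration: the input is assumed to be already filtered (the ITU-R BS.1770 stage is skipped), the windows are plain trailing rectangular windows, the -70 dB gate value and the way it is applied are guesses, and percentiles are read off a sorted list rather than a binned histogram. The function names are hypothetical.

```python
import math

def windowed_power(samples, window):
    """Mean-square power over a trailing window of `window` samples."""
    out, acc = [], 0.0
    for i, x in enumerate(samples):
        acc += x * x
        if i >= window:
            acc -= samples[i - window] ** 2  # drop the sample leaving the window
        out.append(max(acc, 0.0) / min(i + 1, window))
    return out

def to_db(power, floor=1e-12):
    """Power to decibels, clamped so digital silence does not hit log(0)."""
    return 10.0 * math.log10(max(power, floor))

def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def spread(values, lo_pct, hi_pct):
    """Range in dB between two percentiles; 0.0 if everything was gated out."""
    if not values:
        return 0.0
    return percentile(values, hi_pct) - percentile(values, lo_pct)

def dynamic_range(samples, rate, windows_ms=(10, 200, 3000),
                  gate_db=-70.0, lo_pct=50.0, hi_pct=97.7):
    """Per-timescale dynamic range in dB for already-filtered samples."""
    short, medium, long_ = (
        [to_db(p) for p in windowed_power(samples, max(1, rate * ms // 1000))]
        for ms in windows_ms
    )
    # Decouple timescales: a ratio of powers is a difference in dB.
    # Assumed gating: drop frames where either loudness is below the gate.
    short_rel = [s - m for s, m in zip(short, medium)
                 if s > gate_db and m > gate_db]
    medium_rel = [m - l for m, l in zip(medium, long_)
                  if m > gate_db and l > gate_db]
    long_abs = [l for l in long_ if l > gate_db]
    return {
        "short": spread(short_rel, lo_pct, hi_pct),
        "medium": spread(medium_rel, lo_pct, hi_pct),
        "long": spread(long_abs, lo_pct, hi_pct),
    }
```

Calling `dynamic_range(samples, rate)` returns one number per timescale. Pure Python like this is only practical on short clips; it is meant to show the data flow, not to be fast.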
This is a better way to measure dynamic range, for the following reasons:
- It measures dynamic range as a ratio of loudnesses. Peak-to-average cannot claim this (it is fundamentally a comparison of two different units). ReplayGain comparisons cannot claim this.
- It uses a real loudness model (flawed though it is) as the basis of loudness estimation. Waveform comparisons (especially for loudness-war-related discussions) are fundamentally flawed for this reason - what you get out of Audacity has a relatively tenuous connection to real perceived loudness.
- Dynamic range is estimated across three different timescales - 3000ms, 200ms, and 10ms - and the scales are decoupled from one another, so pfpf can tell a quiet passage with a loud transient apart from a loud passage with a sudden pause. The timescales are configurable.
- It uses a percentile approach on a histogram for estimating dynamic range, instead of min/max/avg. This makes the technique much more resilient to differences in mastering and medium; pops and ticks should not affect results, nor should small bits of digital silence, like in greynol's Tool example. (Yes, greynol, you can distinguish ppp from fff now.) The percentiles are configurable.
- Background noise (when no music is playing) can be masked with a fixed threshold, so that silence won't pile up on one side of the histogram and distort the numbers, and the results should be invariant to any extra silence padding before/after the music (this should make CD/vinyl comparisons a lot easier). The threshold is configurable.
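To make the last two points concrete, here is a small Python toy comparing a min/max range against a gated percentile range. Everything in it is made up for illustration: the loudness track is a synthetic ramp, the single 0 dB frame stands in for a vinyl tick, the -120 dB frames stand in for digital silence padding, and the -70 dB gate is an assumed value, not pfpf's default.

```python
import math

def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def percentile_range(loudness_db, gate_db=-70.0, lo_pct=50.0, hi_pct=97.7):
    """Range between two percentiles after dropping frames below the gate."""
    kept = [v for v in loudness_db if v > gate_db]
    return percentile(kept, hi_pct) - percentile(kept, lo_pct)

def min_max_range(loudness_db):
    """Naive range: loudest frame minus quietest frame."""
    return max(loudness_db) - min(loudness_db)

# Synthetic loudness track: 1000 frames ramping from -30 dB up to -10 dB.
music = [-30.0 + 20.0 * i / 999 for i in range(1000)]
# Same music with digital silence before and after (e.g. a CD rip).
padded = [-120.0] * 300 + music + [-120.0] * 300
# Same music with one very loud frame (e.g. a tick on a vinyl rip).
clicked = music + [0.0]

for name, track in (("music", music), ("padded", padded), ("clicked", clicked)):
    print(name, round(min_max_range(track), 2), round(percentile_range(track), 2))
```

The min/max figure jumps around with the padding and the tick, while the gated percentile figure is identical for the padded version and barely moves for the clicked one.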
