DSP Loudness Control

First, the Fletcher-Munson data is not accurate. See the work of S. S. Stevens, much better. Ever notice how a Fletcher-Munson loudness compensation control never sounds right? That's because they got it wrong to begin with.

That's true, but you're putting up a straw man. I haven't said which "loudness curve" I based my reasoning on, either here or in the thread I referenced. In fact, I used ISO 226:2003, which was referenced by someone else early in the thread.
Is Stevens still about? I haven't seen any work from him for many years.

Second, Woodinville is right: it IS a hard problem. It's not a fixed curve, and it is highly dependent on the specific acoustic play level, so it has to be dynamic.

Ah, yes, but that doesn't make it hard to solve, at least approximately. Say you increase the level at 1 kHz by 6 dB. The change required to produce a similar perceived level increase at, say, 20 Hz is about 3 dB. This ratio holds true over a wide range of phons. So all you need is a coupled level and bass equalisation control that, for every 6 dB of level reduction, adds equalisation resulting in 3 dB of boost at 20 Hz relative to 1 kHz. (Every time the midrange level drops by 6 dB, the bass level drops by 3 dB.)
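As a rough illustration of the coupled control, here is a minimal Python sketch, assuming the 2:1 figure above holds exactly (0.5 dB of change at 20 Hz per dB at 1 kHz). The function names are hypothetical, and the sketch only does the dB bookkeeping; it does not design the filter.

```python
def coupled_gains_db(level_change_db: float, bass_ratio: float = 0.5) -> tuple[float, float]:
    """Gains at 1 kHz and 20 Hz for a move of the coupled level control.

    level_change_db: change at 1 kHz (negative when turning down).
    bass_ratio: assumed dB change at 20 Hz per dB change at 1 kHz (0.5 = the 2:1 figure).
    """
    return level_change_db, bass_ratio * level_change_db


def relative_bass_boost_db(level_change_db: float, bass_ratio: float = 0.5) -> float:
    """Boost at 20 Hz relative to 1 kHz that the equalisation section must supply."""
    mid_db, bass_db = coupled_gains_db(level_change_db, bass_ratio)
    return bass_db - mid_db


for change in (-6.0, -12.0, -20.0):
    mid_db, bass_db = coupled_gains_db(change)
    print(f"level {change:+.0f} dB: 1 kHz {mid_db:+.1f} dB, 20 Hz {bass_db:+.1f} dB, "
          f"bass EQ {relative_bass_boost_db(change):+.1f} dB")
```

A 6 dB cut gives -6 dB at 1 kHz, -3 dB at 20 Hz, and therefore 3 dB of relative bass boost, matching the description above.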

Next...why number them...BIG assumption that everybody mixes to a standard level in a standardized monitoring environment. Not in the music industry! Film, yes, but not music. And that -20 dBFS would be nice, but it doesn't happen after mastering, especially pop stuff. Not even close. Pretty much have to ignore dBFS in this case; it's not relevant. System acoustic play level is, though, but only in the context of correcting for differing hearing response at differing levels. You're in no way matching the mix environment; there's just no way to know what it was, and it's not important anyway.

Standardised, or at least similar, monitoring levels are more common than you might think once you move up the ladder a bit. Ask Bob Katz; he could "bore for Africa" on the subject. And the -20 dBFS is relevant for monitoring when mastering; it has no relevance to the final released media level. As for matching the mix environment, try it yourself, assuming you have a competent reproduction chain. For most genres other than the highly artificial (electronica, etc.), there is a definite SPL at which they sound "right". So even though it may not match the mix environment levels, it sounds balanced to you on your system.

No, you can't do it based on a volume control setting. It's been tried by many people for many years, but it doesn't work. The reason is simple: the correction required depends on SPL, which a volume control may influence but doesn't predict, and the volume control is not the only thing that affects it. Feed in a hotter signal and you turn it down, but that changes the compensation inappropriately. There were even attempts to calibrate the compensation by adding another control, but that doesn't work because program dynamics are not fixed. No, the correction must be tied to specific SPL, not a control setting. That's actually where many people trying this messed up.

You need two controls. One to set the initial volume level so that it sounds "right", then the coupled control to change the volume to the setting you want to listen at. In theory you would need to do this for each track, or at least each album, but in practice most sources of a given genre and age have similar levels. If you play old vinyl, you should be familiar with the way that the majority of LPs end up being played within a relatively small arc of the volume control. Ditto, but at a different setting, for old CDs, and again for current "loudness war" CDs. Apple's Sound Check and MP3 ReplayGain standardise the levels even more.

... And finally, it's been done, and done quite well. It's called Audyssey Dynamic Volume and Dynamic EQ. Rather than basing their idea on existing loudness research, they based their algorithm on what was essentially reverse-engineering of human loudness perception. They took lots of data on lots of subjects, with lots of different program material, and the result is pretty darn good. The big advantage is that once an Audyssey system has been calibrated, it knows the exact SPL at every moment regardless of volume control setting or variations in program material, so it can apply the right correction dynamically. Pretty darn smart, those guys.
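For readers wondering what "knows the exact SPL" involves, the basic bookkeeping can be sketched as below. This is not Audyssey's algorithm, which isn't described here; it is a minimal Python sketch assuming a one-time calibration (a test signal of known digital level measured with an SPL meter at the listening position) and a volume control whose gain in dB is known. The function names and the calibration numbers are hypothetical. A real system would also need per-band levels and the speaker and room response.

```python
import math


def rms_db(block: list[float]) -> float:
    """RMS level of a block in dB relative to an RMS of 1.0.

    With this convention a full-scale sine reads about -3 dB; AES17-style dBFS
    reads it as 0 dB. Either works if the calibration uses the same convention.
    """
    mean_square = sum(x * x for x in block) / len(block)
    if mean_square == 0.0:
        return -math.inf
    return 10.0 * math.log10(mean_square)


def calibration_offset_db(measured_spl_db: float, test_signal_db: float) -> float:
    """Digital-to-acoustic offset, measured once with the volume control at 0 dB."""
    return measured_spl_db - test_signal_db


def estimate_spl_db(block: list[float], volume_db: float, offset_db: float) -> float:
    """Broadband SPL estimate at the listening position for the current block."""
    return rms_db(block) + volume_db + offset_db


# Hypothetical calibration: a test signal at -20 dB RMS read 83 dB SPL at 0 dB volume.
offset = calibration_offset_db(83.0, -20.0)

# One second of a 1 kHz sine at amplitude 0.1, played with the volume 10 dB down.
fs = 48000
sine = [0.1 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
print(f"estimated SPL: {estimate_spl_db(sine, volume_db=-10.0, offset_db=offset):.1f} dB")
```

Once the offset is known, the estimate follows the program material and the volume setting together, which is the property described above.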

... and missing the point when it comes to music dynamics. Chris acknowledges that the Audyssey dynamically changes the EQ in response to changing program levels. But this is exactly what you do not want when listening to music. As I said elsewhere:
".... Take Ravel's "Bolero". The double bass initially comes in while the levels are still moderate. The loudness of the bass is chosen to be audible but not overpowering. As the piece progresses and the overall levels rise, the bass level also rises but still in proportion to the rest of the players - if the overall level rises by 10 dB, the bass level rises by somewhat less. The point is that "loudness compensation" is built into music by the composers / musicians / mix engineers, and if you make a static adjustment to the volume, you only need to make a static adjustment to the loudness compensation. The rest is already taken care of in the music. "

And why I think loudness compensation is needed:
"... In my opinion, music is best listened to at the SPL at the listener position that it was created for. (Creation may mean the original performance, or the engineer's creation of a mix of separate components recorded at different times in different acoustics - or no acoustic at all in many cases.) If we normally listened at this SPL, there would be no need for any loudness compensation. But we do like to listen at different levels for several good reasons, and when we do so we no longer hear the intended tonal balance. Many of us like to adjust the tonal balance at our chosen listening level so that it is similar to the perceived tonal balance at the "correct" level. Done properly, we find this adjustment effective and pleasing. It is an effective mitigation of the degradation forced by having to listen at a different level to that which the work was intended for.
..."

And on the topic of tonal balance change with level:
"... In the specific case of loudness compensation, we aren't correcting for human hearing deficiencies. We're compensating for deficiencies in the reproduction environment.

In a "live" situation, if we move away from the source we experience an overall level decrease. In addition, the treble decreases somewhat faster than the midrange, and the bass somewhat less. We perceive this as a natural tonal balance change, which needs no correction.

In a reproduction scenario, if we reduce the volume by a similar amount, the tonal balance does not change. Compared to a distance increase, we have too much treble and not enough bass. We perceive this as unnatural. This is why I believe in leaving the HF compensation alone and just boosting the bass. The natural change in HF sensitivity of the ears, as illustrated by the "loudness curves", takes care of the required additional HF attenuation, so only the bass requires compensation. ..."
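The "boost the bass, leave the treble alone" approach can be sketched with a single low-shelf biquad whose gain follows the volume change. The coefficients below are the low-shelf formulas from Robert Bristow-Johnson's "Audio EQ Cookbook"; the 150 Hz shelf frequency, the 48 kHz sample rate and the 0.5 ratio (the 2:1 figure from earlier in the thread) are assumptions, and a second-order shelf only approximates the shape of the required bass correction.

```python
import math


def low_shelf_coeffs(gain_db: float, f0_hz: float, fs_hz: float, slope: float = 1.0):
    """Low-shelf biquad from the RBJ Audio EQ Cookbook; returns (b, a) with a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((A + 1.0 / A) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(A) * alpha
    b0 = A * ((A + 1.0) - (A - 1.0) * cos_w0 + k)
    b1 = 2.0 * A * ((A - 1.0) - (A + 1.0) * cos_w0)
    b2 = A * ((A + 1.0) - (A - 1.0) * cos_w0 - k)
    a0 = (A + 1.0) + (A - 1.0) * cos_w0 + k
    a1 = -2.0 * ((A - 1.0) + (A + 1.0) * cos_w0)
    a2 = (A + 1.0) + (A - 1.0) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]


def biquad(x: list[float], b: list[float], a: list[float]) -> list[float]:
    """Filter x with one biquad section (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def loudness_shelf(volume_change_db: float, fs_hz: float = 48000.0,
                   shelf_hz: float = 150.0, bass_ratio: float = 0.5):
    """Bass-only compensation: shelf gain proportional to the volume change, treble untouched.

    Turning down (negative change) boosts the low end; turning up cuts it.
    """
    shelf_gain_db = -(1.0 - bass_ratio) * volume_change_db
    return low_shelf_coeffs(shelf_gain_db, shelf_hz, fs_hz)


b, a = loudness_shelf(-20.0)                 # 20 dB below the reference level
dc_gain = sum(b) / sum(a)                    # response at 0 Hz
nyquist_gain = (b[0] - b[1] + b[2]) / (a[0] - a[1] + a[2])
print(f"DC: {20 * math.log10(dc_gain):.1f} dB, Nyquist gain: {nyquist_gain:.6f}")
```

Turning the volume down 20 dB gives a 10 dB low-frequency shelf while the top end stays at unity gain; at the reference volume the shelf collapses to a pass-through.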

I suggest you read the referenced thread in DIYAudio, if you haven't done so already. All of the points you raised were also raised there.

Reply #13 – 2012-03-30 21:05:35

Quote from: Woodinville on 2012-03-30 21:05:35

First, the Fletcher-Munson data is not accurate. See the work of S. S. Stevens, much better. Ever notice how a Fletcher-Munson loudness compensation control never sounds right? That's because they got it wrong to begin with.

Whoa, there. The reason a "loudness control" doesn't work is simple: it is neither time-varying (according to signal and absolute presentation level) nor signal-dependent.

Stevens' curves and Fletcher's curves are not far off, if you remember that one used open ear canals and the other closed ear canals. As a result, the frequency of the ear canal resonance shifts by approximately an octave. No surprise there.

I wouldn't be so quick to dismiss Fletcher, especially since, by any reasonable reading, Stevens is more confirmation than anything else. Claiming "Fletcher got it wrong" is just unjustifiable, and is almost as bad a piece of yellow journalism as the crap put forth in the Spectrum article that asserted Fletcher was trying to figure out how cheaply AT&T could make transmission.

If you build a codec based on Fletcher's results (using modern understanding), you get to AAC, via the original version of AT&T PAC, from before the "trivestiture". That's not conjecture, that's personal experience.

And Splice, please realize that the thing you want to build must be signal dependent, and must be tied to absolute presentation level as a function of frequency. Signal dependency is not an option, it's a requirement.

Reply #14 – 2012-03-31 01:42:22

... And Splice, please realize that the thing you want to build must be signal dependent, and must be tied to absolute presentation level as a function of frequency. Signal dependency is not an option, it's a requirement.

I'm missing some crucial piece of understanding. Please bear with this "bear of very small brain" for a bit...
"Signal dependent" - do you mean in time or frequency?
To me, one implies a dynamic EQ that adjusts itself according to the current level or spectral content of the signal (e.g. an Audyssey processor), the other a static EQ, the curve of which is adjusted according to the auditory system behaviour described by the "equal loudness" curves.

"Tied to absolute presentation level as a function of frequency" - I interpret this as saying that each chosen presentation SPL must have a matching EQ curve. My assertion is that each *change* in presentation SPL requires a fixed *change* in the EQ curve. Almost, but not quite, the same thing.

"Signal dependency being a requirement" - I take that to be the first part of the process. With no EQ, adjust the listening SPL until the source sounds "right" or "natural" or otherwise sounds pleasing. Now use the "loudness" control to make all level adjustments after that.

(If I were to implement this in DSP instead of analog controls, I'd make it more user friendly by unidirectionally coupling the level and loudness controls for the initial adjustment.)

I think part of the understanding problem is the way that the function being compensated for is dynamic - the amount of compensation required changes as the absolute SPL changes, so how can it be compensated for by a statically adjusted EQ? As I tried to explain earlier, the spectral balance of the music is fixed at source - the "Bolero" example - so a fixed compensation is appropriate. You don't have to adjust the bass tone control as a piece of music goes from pianissimo to fortissimo - the musicians have done that for you already. All the "loudness" control does is mimic what happens when you walk from the front of the hall to the back - although a concert hall is a bad example, perhaps more like an open-air concert.

I've been procrastinating because of the difficulty of building such a circuit in the analog domain - not difficult for me, but a disincentive for anyone wanting to try it who doesn't have constructional skills. It occurred to me last night that I should try my hand at coding a foobar plugin implementation. It would make it easy for anyone wanting to try it out.

---------------
Regards,
Don Hills

Reply #15 – 2012-03-31 03:13:51

Perhaps my statement as to Fletcher-Munson getting it wrong was a bit too generalized. Their data was accurate for the conditions in which it was taken, and for the test equipment available in that day. But, since those conditions included pure tones as stimuli presented as a frontal field in an anechoic space, the resulting curves don't represent the actual correction needed for real listening environments. The really unfortunate part of Fletcher-Munson is that the curves became widely adopted, but almost entirely misunderstood. They were applied as complete loudness correction curves, when in fact, they represent human hearing response (in those specific test conditions). Loudness compensation doesn't need to correct for human hearing response, it just needs to correct for the variance in response at differing levels.

Later research using more modern measurement equipment and more appropriate methods yielded better data. Yet even though adopted as a standard, Robinson-Dadson's data (pure tones again, but presented with headphones or random incidence) isn't as pertinent to real listening conditions as is really required, and they freely admit that fact in their paper.

There have been many, historically, who have attempted to characterize human loudness perception, some of them fairly well known (Zwicker, or the Bauer-Torick papers), and what's interesting is that there is reasonable correlation between all of them, particularly in that the ear response is anything but flat even at live music levels of 100 dB or so. The exception, interestingly, is the Fletcher-Munson data, which shows a response at 100 dB that is much flatter than any other curve family. For that reason alone, the F/M data would not be appropriate to apply in a loudness compensation scheme.

To complicate things, in most equal-loudness curve families, even Fletcher-Munson, the high-frequency portions of the curves above 1 kHz are parallel, so no adjustment is required in that range. But designers applying the F/M equal-loudness data to a loudness compensation circuit often used the entire curve! So we had boost at the top and bottom, and of course the wrong amount at the bottom in any case.

Stevens' work included a wide variety of stimulus methods (diffuse field, free field, earphones, etc.) and several subjective quantities as well (annoyance, etc.). One of his test systems extended down to 1 Hz! And while that's not useful for loudness compensation, it's notable, since other research stops at 20 Hz.

At the risk of repetition, it's important to note that the equal-loudness contours found in Stevens, F/M or any other do not represent the actual correction needed, but reflect hearing response. The correction system would actually apply a differential curve, which would in fact have to be dynamically variable, a fact easily seen on any equal-loudness contour family.

So, I'm afraid I'll modify my Fletcher-Munson comment only slightly: Their data is valid for their test conditions, and considering the limitations of test equipment of the day. However, to consider it at all relevant to an actual loudness compensation algorithm would be an error. Perhaps that's more accurate than "they got it wrong", but you see what I mean.

As to the supposed loudness compensation built into music by composers (Ravel et al.), their hypothetical compensation is valid for only one listening position: the conductor's podium. Every other seat in the house will hear something else. However, no seat will have a basic level change anywhere near 20 dB. Yet that's the kind of level shift we see in recorded music played in private listening conditions. With that kind of offset, and looking at any equal-loudness contour family, anyone can see that for this to work it must be dynamic and must operate with knowledge of the actual playback SPL. No fixed modifier would be correct at anything but one specific SPL.

The dual-control loudness compensation idea has been tried (Yamaha in the late 1970s and early 1980s; Apt-Holman in 1978), but has not survived, even though the Apt-Holman implementation actually applied correction based on the Stevens data. The reason is simple: people can't be depended upon to make continual subjective evaluations and apply correction. Two knobs might get you close, but only at one SPL (at least some music still has dynamic range) and one volume setting. The knob would require constant adjustment, something no listener will do.

Bob Katz has made some excellent inroads in studios, but there are decades of music already recorded and released without any of that, and volumes of music are still released today without standardization. The film industry became standardized, at least in the high-fidelity sense, when Dolby Labs became involved. That's been 40 years. No, it's still wild in music to this day, though it's getting better.

I don't know how else to make the point that compensation must be dynamic, but if all of the above doesn't do it, perhaps ask yourself: if it's as simple as a fixed, static correction, why, at this point in history, have we moved completely away from fixed-curve and dual-control systems? Why do the pre-eminent voices in this field all say it has to be dynamic? Must be something they know.

Reply #16 – 2012-03-31 03:40:11

Quote from: splice on 2012-03-31 01:42:22

I'm missing some crucial piece of understanding. Please bear with this "bear of very small brain" for a bit...
"Signal dependent" - do you mean in time or frequency?
To me, one implies a dynamic EQ that adjusts itself according to the current level or spectral content of the signal (e.g. an Audyssey processor), the other a static EQ, the curve of which is adjusted according to the auditory system behaviour described by the "equal loudness" curves.

You need frequency domain equalization (i.e. a filter curve) that varies with the signal (and of course frequency and presentation level), and where the actual gain of the system post-filter is known to a dB or so.
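One way to picture "a filter curve that varies with the signal" is a bass gain recomputed every block from the estimated SPL and smoothed so it doesn't pump audibly. The Python sketch below is hypothetical throughout: it takes per-block SPL estimates as input (from a calibrated chain), uses the 2:1 ratio from earlier in the thread as a stand-in for a proper loudness model, and picks an arbitrary 0.5 s smoothing time constant and 15 dB boost limit.

```python
import math


def dynamic_bass_gain(spl_per_block: list[float], reference_spl_db: float,
                      block_rate_hz: float = 100.0, bass_ratio: float = 0.5,
                      time_constant_s: float = 0.5, max_boost_db: float = 15.0) -> list[float]:
    """Per-block low-frequency boost (dB) driven by the estimated SPL.

    The target boost grows as the estimated SPL falls below the reference SPL at
    which the material sounds balanced, and is smoothed with a one-pole filter.
    """
    coeff = math.exp(-1.0 / (time_constant_s * block_rate_hz))
    gain_db = 0.0
    gains = []
    for spl_db in spl_per_block:
        target_db = (1.0 - bass_ratio) * (reference_spl_db - spl_db)
        target_db = min(max(target_db, 0.0), max_boost_db)  # no cut above reference, capped boost
        gain_db = coeff * gain_db + (1.0 - coeff) * target_db
        gains.append(gain_db)
    return gains


# Hypothetical SPL track: 2 s at the reference level, then 5 s of a passage 20 dB quieter.
spl_track = [85.0] * 200 + [65.0] * 500
gains = dynamic_bass_gain(spl_track, reference_spl_db=85.0)
print(f"boost in loud part: {gains[199]:.2f} dB, end of quiet part: {gains[-1]:.2f} dB")
```

Note that the boost rises during the quiet passage even though the volume control hasn't moved, which is exactly the behaviour objected to earlier in the thread with the "Bolero" argument.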

Reply #17 – 2012-03-31 03:44:24

Quote from: Woodinville on 2012-03-31 03:44:24

Perhaps my statement as to Fletcher-Munson getting it wrong was a bit too generalized. Their data was accurate for the conditions in which it was taken, and for the test equipment available in that day. But, since those conditions included pure tones as stimuli presented as a frontal field in an anechoic space, the resulting curves don't represent the actual correction needed for real listening environments. The really unfortunate part of Fletcher-Munson is that the curves became widely adopted, but almost entirely misunderstood. They were applied as complete loudness correction curves, when in fact, they represent human hearing response (in those specific test conditions). Loudness compensation doesn't need to correct for human hearing response, it just needs to correct for the variance in response at differing levels.

Interestingly, F-M and Stevens disagree on loudness growth at low frequencies, and having built systems using both models for loudness growth, I've been much, much more successful with a variation on F-M than I have with Stevens (annoyingly, that work belongs to a long-former employer, not even a recently former employer, and it hasn't been put to any use at all).

As to the flatness concern, once you realize that the bandwidth of the critical bands emerges as a factor, F-M makes a great deal of sense, actually.

But, as far as loudness ratio, I've had much more success with loudness ratios using a model I can't talk about (snarl, hiss, grumble) very much that are derived from F-M. Fletcher and Munson show more loudness growth at threshold than Stevens, and that's also been my experience.

Reply #18 – 2012-03-31 06:20:39

Interestingly, F-M and Stevens disagree on loudness growth at low frequencies, and having built systems using both models for loudness growth, I've been much, much more successful with a variation on F-M than I have with Stevens (annoyingly, that work belongs to a long-former employer, not even a recently former employer, and it hasn't been put to any use at all).

As to the flatness concern, once you realize that the bandwidth of the critical bands emerges as a factor, F-M makes a great deal of sense, actually.

But, as far as loudness ratio, I've had much more success with loudness ratios using a model I can't talk about (snarl, hiss, grumble) very much that are derived from F-M. Fletcher and Munson show more loudness growth at threshold than Stevens, and that's also been my experience.

Ah, someone with real hands on, now that's a treat!

What would be your comment on why F-M didn't work historically? And why the Stevens-based systems worked markedly better, if much rarer? I'd have some ideas, but I'd rather hear it from someone who made an F-M system actually work.

This would fill in a few holes in what has been a 35-year hot topic for me. F-M always seemed to do too much in the LF in the classic realizations.

Reply #19 – 2012-03-31 07:56:00

(Excuse my trimming of quotes, I'm trying to keep post lengths down. If you think I've trimmed too much and misrepresented your points, please say so.)

... the resulting curves don't represent the actual correction needed for real listening environments. ... Loudness compensation doesn't need to correct for human hearing response, it just needs to correct for the variance in response at differing levels.

That's it exactly. You understand it here, but why do you remain skeptical at the end of your post?

To complicate things, in most equal-loudness curve families, even Fletcher-Munson, the high-frequency portions of the curves above 1 kHz are parallel, so no adjustment is required in that range. But designers applying the F/M equal-loudness data to a loudness compensation circuit often used the entire curve! So we had boost at the top and bottom, and of course the wrong amount at the bottom in any case.

This is where so many seem to get it wrong. As you point out, there's no need to apply an EQ curve that's the inverse of a given "equal loudness" contour of whatever provenance. All that is needed is a correction to compensate for reproducing a source (music) at a different level from the one it was originally performed / mastered for. Take a look at the ISO 226:2003 "equal loudness" curves. As a crude example, imagine you're listening to two tones - 20 Hz and 1 kHz - at the 60 phon level. You perceive them as equally loud, although the 1 kHz tone is at 60 dB SPL and the 20 Hz tone is at 110 dB SPL. Now you turn down the "volume" by 20 dB. This is equivalent to moving the 60 phon curve down to the 40 phon curve. The problem is that they don't match. You've lowered the 1 kHz signal from 60 dB SPL to 40 dB SPL, and you've lowered the 20 Hz signal from 110 dB SPL to 90 dB SPL. But you can see from the curves that a 20 Hz signal should be reproduced at 100 dB SPL to match the 40 dB SPL 1 kHz signal in loudness. In short, if you change the level at 1 kHz by x dB, you have to change the level at 20 Hz by x/2 dB. This is a ratio, not a fixed EQ. It doesn't need an absolute reference level to work.
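The example can be checked against the ISO 226:2003 formula for the equal-loudness contours. The minimal Python sketch below carries the standard's parameters (alpha_f, L_U, T_f) for only 20 Hz and 1 kHz; those two rows are transcribed for illustration and should be checked against the standard's table before use.

```python
import math

# ISO 226:2003 parameters (alpha_f, L_U in dB, T_f in dB) for two of its frequencies.
# Transcribed for illustration; check against the standard's table before relying on them.
ISO226_PARAMS = {
    20.0: (0.532, -31.6, 78.5),
    1000.0: (0.250, 0.0, 2.4),
}


def iso226_spl_db(freq_hz: float, phon: float) -> float:
    """SPL of a pure tone that ISO 226:2003 rates at `phon` phons."""
    alpha_f, l_u, t_f = ISO226_PARAMS[freq_hz]
    a_f = (4.47e-3 * (10.0 ** (0.025 * phon) - 1.15)
           + (0.4 * 10.0 ** ((t_f + l_u) / 10.0 - 9.0)) ** alpha_f)
    return (10.0 / alpha_f) * math.log10(a_f) - l_u + 94.0


for phon in (60.0, 40.0):
    print(f"{phon:.0f} phon: 1 kHz {iso226_spl_db(1000.0, phon):.1f} dB SPL, "
          f"20 Hz {iso226_spl_db(20.0, phon):.1f} dB SPL")

drop_1k = iso226_spl_db(1000.0, 60.0) - iso226_spl_db(1000.0, 40.0)
drop_20 = iso226_spl_db(20.0, 60.0) - iso226_spl_db(20.0, 40.0)
print(f"20 Hz drop / 1 kHz drop = {drop_20 / drop_1k:.2f}")
```

With these values the 60 and 40 phon contours sit at about 109.5 and 99.9 dB SPL at 20 Hz, so the 20 dB drop at 1 kHz corresponds to a drop of roughly 9.7 dB at 20 Hz, a ratio of about 0.48, close to the 2:1 ballpark.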

Stevens' work included a wide variety of stimulus methods (diffuse field, free field, earphones, etc.) and several subjective quantities as well (annoyance, etc.). One of his test systems extended down to 1 Hz! And while that's not useful for loudness compensation, it's notable, since other research stops at 20 Hz.

Other researchers have done work in the 1-20 Hz area recently. I have papers by Yeowart and Evans, and by Møller and Pedersen, but there may well be others.

As to the supposed loudness compensation built into music by composers (Ravel et al.), their hypothetical compensation is valid for only one listening position: the conductor's podium. Every other seat in the house will hear something else. However, no seat will have a basic level change anywhere near 20 dB. Yet that's the kind of level shift we see in recorded music played in private listening conditions. With that kind of offset, and looking at any equal-loudness contour family, anyone can see that for this to work it must be dynamic and must operate with knowledge of the actual playback SPL. No fixed modifier would be correct at anything but one specific SPL.

I'm not proposing a fixed modifier. I'm proposing a fixed ratio (2:1 as a ballpark figure). If the loudness compensation (crudely, the bass level relative to the midrange level) is correctly set at one specific SPL, and the ratio is applied to any level change, then the compensation will also be correct at the new SPL.

The dual-control loudness compensation idea has been tried (Yamaha in the late 1970s and early 1980s; Apt-Holman in 1978), but has not survived, even though the Apt-Holman implementation actually applied correction based on the Stevens data. The reason is simple: people can't be depended upon to make continual subjective evaluations and apply correction. Two knobs might get you close, but only at one SPL (at least some music still has dynamic range) and one volume setting. The knob would require constant adjustment, something no listener will do.

I'm aware of the earlier schemes. They did not accurately couple (or in some cases couple at all) the level and "loudness compensation EQ" controls. I do. "Continual subjective evaluation" is thus not required, and the right amount of correction is applied regardless of the listening level.

Bob Katz has made some excellent inroads in studios, but there are decades of music already recorded and released without any of that, and volumes of music are still released today without standardization.

Most of my listening is to various sub-genres of "rock". My vinyl collection spans some 20 years. Almost all of it plays back within a 20-degree arc of the volume control. My early CDs play back as a group, there's the 90s transition, then most of the last 10 years play back as another group. I accept that other genres may be more varied in their playback levels. My point is that it's not hard to establish the playback level at which the music was intended to sound best, and this level doesn't vary all that much between like-grouped sources.

Quote from: dc2bluelight on 2012-03-31 06:20:39

I don't know how else to make the point that compensation must be dynamic, but if all of the above doesn't do it, perhaps ask yourself: if it's as simple as a fixed, static correction, why, at this point in history, have we moved completely away from fixed-curve and dual-control systems? Why do the pre-eminent voices in this field all say it has to be dynamic? Must be something they know. ...

I remain unconvinced. I'm not proposing a "fixed, static" correction. My proposal is also different than any "dual control" system I have seen, and I have been looking hard. And if by "dynamic" you mean that the EQ adjusts itself based on the (varying) level of the source, then I disagree strongly. That would be equivalent to twiddling the bass tone control to match the loud and quiet parts of the music, and we just don't do that. (Well, I don't, anyway.)

One more time... My system has two knobs. As I originally envisaged it, one knob is more or less "set and forget" for a given genre and input source, especially if the source has Sound Check or ReplayGain. The other knob is the main "volume" control. Adjusting this control also applies the correct amount of "loudness compensation" for that volume. In concept, the bass tone control is ganged to the volume control. Where this differs from other schemes is that the ratio of bass to overall level is fixed, and matches the ratio inherent in the "equal loudness" curves.

An alternative scheme which may be more user friendly is to again have two knobs - one labeled "volume" and one labeled "bass", which actually sets the operating point of the loudness compensation. Adjust the volume control to your desired level, regardless of the original intended playback level, then adjust the bass control to your taste - "not too heavy, not too light". But behind the scenes, the two controls are actually linked, so any subsequent adjustment of the volume control automatically applies the correct level of loudness compensation.
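The alternative scheme can be expressed as a small piece of control logic. A minimal Python sketch, assuming a fixed ratio of 0.5 and treating the bass EQ as a single dB figure at the low end relative to 1 kHz; the class and method names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class LinkedLoudnessControl:
    """Hypothetical 'volume' + 'bass' pair whose bass EQ follows the volume at a fixed ratio."""

    bass_ratio: float = 0.5         # assumed dB change at 20 Hz per dB change at 1 kHz
    volume_db: float = 0.0          # main volume, i.e. gain at 1 kHz
    bass_setting_db: float = 0.0    # bass EQ chosen by ear ("not too heavy, not too light")
    anchor_volume_db: float = 0.0   # volume at which the bass knob was last set

    def set_volume(self, volume_db: float) -> None:
        self.volume_db = volume_db

    def set_bass(self, bass_db: float) -> None:
        """Setting the bass knob fixes the operating point at the current volume."""
        self.bass_setting_db = bass_db
        self.anchor_volume_db = self.volume_db

    def bass_eq_db(self) -> float:
        """Low-frequency EQ relative to 1 kHz, applied behind the scenes."""
        volume_change_db = self.volume_db - self.anchor_volume_db
        return self.bass_setting_db - (1.0 - self.bass_ratio) * volume_change_db


ctl = LinkedLoudnessControl()
ctl.set_volume(-10.0)    # listener picks a level
ctl.set_bass(2.0)        # and tunes the bass by ear at that level
ctl.set_volume(-22.0)    # later turns down another 12 dB
print(ctl.bass_eq_db())  # bass EQ now includes 6 dB of extra compensation
```

Returning the volume to where the bass was set restores the listener's original bass setting, so the bass knob only ever defines the operating point.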

Reply #20 – 2012-03-31 09:25:55

What would be your comment on why F-M didn't work historically? And why the Stevens-based systems worked markedly better, if much rarer? I'd have some ideas, but I'd rather hear it from someone who made an F-M system actually work.

Not having had hands-on experience with other Stevens systems, I suspect it was getting the skirts on the cochlear filters right. The upward spread that reduces the loudness of higher frequencies near the masking level can bite pretty hard if you don't get it right.

But that is a conjecture.
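To give a feel for how asymmetric those skirts are, the sketch below uses the widely cited spreading function of Schroeder, Atal and Hall (1979) on the Zwicker and Terhardt Bark scale. It is a textbook masking model, not the cochlear filter model either poster used; it only shows that spread toward higher frequencies (positive dz) falls off far more slowly, about 10 dB per Bark, than spread toward lower frequencies, about 25 dB per Bark.

```python
import math


def bark(freq_hz: float) -> float:
    """Zwicker & Terhardt (1980) approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def schroeder_spread_db(dz: float) -> float:
    """Schroeder, Atal & Hall (1979) spreading function in dB.

    dz = maskee Bark - masker Bark; positive dz means the maskee lies above the masker.
    """
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


for dz in (-3.0, -1.0, 1.0, 3.0):
    print(f"dz = {dz:+.0f} Bark: spread {schroeder_spread_db(dz):6.1f} dB")

# Critical-band distance from a 100 Hz masker to a 400 Hz probe.
print(f"100 Hz -> 400 Hz: {bark(400.0) - bark(100.0):.2f} Bark")
```

Three Bark above the masker the spread is still around -21 dB, while three Bark below it is already near -51 dB, which is why getting the upper skirt wrong has the larger effect.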

Reply #21 – 2012-03-31 09:27:49

An alternative scheme which may be more user friendly is to again have two knobs - one labeled "volume" and one labeled "bass", which actually sets the operating point of the loudness compensation. Adjust the volume control to your desired level, regardless of the original intended playback level, then adjust the bass control to your taste - "not too heavy, not too light". But behind the scenes, the two controls are actually linked, so any subsequent adjustment of the volume control automatically applies the correct level of loudness compensation.

For a given standard genre, this has a shot at working "ok", I think.

Unusual music will give it fits, though, I suspect.

Reply #22 – 2012-03-31 10:01:41

I remain unconvinced. I'm not proposing a "fixed, static" correction. My proposal is also different than any "dual control" system I have seen, and I have been looking hard. And if by "dynamic" you mean that the EQ adjusts itself based on the (varying) level of the source, then I disagree strongly. That would be equivalent to twiddling the bass tone control to match the loud and quiet parts of the music, and we just don't do that. (Well, I don't, anyway.)

Ok, but that's precisely what is required.

One more time... My system has two knobs. As I originally envisaged it, one knob is more or less "set and forget" for a given genre and input source, especially if the source has Sound Check or ReplayGain. The other knob is the main "volume" control. Adjusting this control also applies the correct amount of "loudness compensation" for that volume. In concept, the bass tone control is ganged to the volume control. Where this differs from other schemes is that the ratio of bass to overall level is fixed, and matches the ratio inherent in the "equal loudness" curves.

Yes, I understand what you are saying, but please understand that this has been done, and it did not succeed because of an error in concept outlined in your previous sentence: "the ratio of bass to overall level is fixed, and matches the ratio inherent in the "equal loudness" curves." The curve families show that the ratio of bass to overall level is not fixed; it's a non-linear relationship. Because it's non-linear, every time you change the overall level, you operate at a point where the rate of change in bass sensitivity is different, and the lower the overall level, the faster the rate of change in bass sensitivity. It's the rate-of-change problem that dictates that the compensation cannot be fixed. It must track the rate of change of the ear's bass sensitivity. The ear/brain system has what is essentially a volume expander that is both frequency- and level-dependent. The expansion ratio depends on the specific SPL as well as the specific frequency of the stimulus. That's why it takes a family of equal-loudness curves to show what's actually going on, and also takes something fairly complex and dynamic to perform the compensation. We're kicking around the details of which curve set to follow in other posts, but they all have this non-linear ratio characteristic.
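Whether the bass-to-midrange ratio drifts enough to matter can be checked numerically. The Python sketch below uses the ISO 226:2003 formula, again with only the 20 Hz and 1 kHz parameter rows (transcribed for illustration, to be checked against the standard), and estimates the local ratio, i.e. the dB change at 20 Hz per dB change at 1 kHz, at several loudness levels within the standard's range.

```python
import math

# ISO 226:2003 parameters (alpha_f, L_U in dB, T_f in dB); illustrative transcription.
ISO226_PARAMS = {20.0: (0.532, -31.6, 78.5), 1000.0: (0.250, 0.0, 2.4)}


def iso226_spl_db(freq_hz: float, phon: float) -> float:
    """SPL of a pure tone that ISO 226:2003 rates at `phon` phons."""
    alpha_f, l_u, t_f = ISO226_PARAMS[freq_hz]
    a_f = (4.47e-3 * (10.0 ** (0.025 * phon) - 1.15)
           + (0.4 * 10.0 ** ((t_f + l_u) / 10.0 - 9.0)) ** alpha_f)
    return (10.0 / alpha_f) * math.log10(a_f) - l_u + 94.0


def local_ratio(phon: float, step: float = 0.5) -> float:
    """dB change at 20 Hz per dB change at 1 kHz around a loudness level (central difference)."""
    d20 = iso226_spl_db(20.0, phon + step) - iso226_spl_db(20.0, phon - step)
    d1k = iso226_spl_db(1000.0, phon + step) - iso226_spl_db(1000.0, phon - step)
    return d20 / d1k


for phon in (20.0, 40.0, 60.0, 80.0):
    print(f"{phon:.0f} phon: {local_ratio(phon):.2f} dB at 20 Hz per dB at 1 kHz")
```

For these two frequencies the model gives roughly 0.55 at 20 phon, 0.49 at 40 phon, 0.48 at 60 phon and 0.47 at 80 phon: the ratio does drift, and it drifts most at low levels. This is a pure-tone model, so it says nothing about the signal dependency raised earlier in the thread; it quantifies only one part of the disagreement.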