Full Version: TMN and NMT in psymodels
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
Garf
Hi all,

Most common descriptions of psymodels I have seen, for example in "Transform Coding of Audio Signals Using Perceptual Noise Criteria" by Johnston or the MPEG standards, use the Tone-Masking-Noise (approx. 18-29dB) and Noise-Masking-Tone (approx. 6dB) thresholds to calculate the required SMR per band.

However, the signal we are trying to mask is the quantization noise. This leads to the question: why do we use a Noise-Masking-Tone measure? We're not trying to mask a tone at all. Specifically if the base signal is noise, it seems improbable for the introduced quantization noise to have a tonelike structure.

It seems that the more natural measure would be NMN thresholds, but I can't find anything about this in the literature. Painter & Spanias give an NMN of 26dB but note that the exact amount depends on the phase relationships between the two signals (which I don't understand, since it's supposed to be noise?).

So, what's the justification for using NMT as a metric of how much quantization noise to introduce in noisy signal sections?
Gabriel
The noise and tone in NMT and TMN are referring to the tones or noise signals in the original signal, prior to lossy requantization. They are different from the introduced (re)quantization noise.

For tonality based models, there is usually no pure NMT and TMN, but rather SomethingMaskingSomething, with values going from NMT to TMN. NMN would be an in-between value (but close to TMN).

The TMN and NMT vocabulary might come from Zwicker, who provided experimental data for both TMN and NMT (i.e. both extreme cases).

Garf
QUOTE
The noise and tone in NMT and TMN are referring to the tones or noise signals in the original signal, prior to lossy requantization. They are different from the introduced (re)quantization noise.


I know what they are supposed to refer to, but that is not what they are being used for, is it? All the psymodels use these calculations to determine the effective amount of noise they can introduce per band. This means that the signal we are trying to mask is the introduced noise, and not something in the original, correct? (It's also the only way things seem to make sense.)

After all, at no point do the psymodels try to determine what the signal is that is being masked. They calculate the tonality of a band, but this is really the tonality of the strongest signal, i.e. the masker, and not the (weaker) maskee.

This is why I think the correct thing to use would be NMN and TMN, and not NMT.

However, I can't reconcile this with the NMN value in Painter & Spanias, and with your statement:

QUOTE
NMN would be an in-between value (but close to TMN).


If we don't care what the nature of the signal is that is being masked, then things still work as long as NMT and NMN are close, but it seems they are not (6dB vs >20dB).
Gabriel
The TMN and NMT "concepts" predate modern lossy coders. T and N refer to the input signal characteristics, not to the quantization noise we are introducing.

You are right, most models only care about the tonality of the masker and not about that of the maskee. In reality, we are using TMSomething and NMSomething.
This could probably be improved, but the biggest influence comes from the masker characteristics. The maskee characteristics seem to have a lower influence on masking.
Stricto sensu, you are right: the terms TMN and NMT are usually used wrongly.
Garf
QUOTE(Khushrenada @ Mar 7 2006, 07:58 PM)
Why not "TMNT"?

sorry...
*



???
Garf
QUOTE(Gabriel @ Mar 7 2006, 06:51 PM)
The maskee characteristics seem to have a lower influence on masking.
*



I think we understand each other, but we're still left with this problem:

1) NMT: 6dB
2) NMN: 26dB

Which contradicts the above. Either the NMN is wrong, and there's no problem, or the NMN is right, and we need to do something to our psymodels :)
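
To put the gap in numbers, here is a back-of-the-envelope sketch. It assumes the two figures are directly comparable signal-to-noise offsets, and it uses the common rule of thumb of about 6.02dB per bit for a uniform quantizer; neither assumption is stated in this thread.

```latex
\Delta = 26\,\mathrm{dB} - 6\,\mathrm{dB} = 20\,\mathrm{dB}
\quad\Rightarrow\quad
\frac{P_{\mathrm{noise,NMN}}}{P_{\mathrm{noise,NMT}}} = 10^{-20/10} = \frac{1}{100},
\qquad
\frac{20\,\mathrm{dB}}{6.02\,\mathrm{dB/bit}} \approx 3.3\ \mathrm{bits}
```

Under those assumptions, a noise-like band would tolerate only a hundredth of the quantization noise power allowed by the 6dB figure, which is roughly 3.3 extra bits per coefficient.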
Gabriel
6dB for NMT is unusually low. In LAME (NSPsytune) it is about 17dB (TMN: 8dB)
Garf
psymodel.h (some recent LAME version)

#if 1
/* AAC values, results in more masking over MP3 values */
# define TMN 18
# define NMT 6
#else
/* MP3 values */
# define TMN 29
# define NMT 6
#endif

mppenc.c (Musepack)

CODE

static const Profile_Setting_t  Profiles [16] = {
   { 0 },
   { 0 },
   { 0 },
   { 0 },
   { 0 },
/*    Short   MinVal  EarModel  Ltq_                min   Ltq_  Band-  tmpMask  CVD_  varLtq    MS   Comb   NS_        Trans */
/*    Thr     Choice  Flag      offset  TMN   NMT   SMR   max   Width  _used    used         channel Penal used  PNS    Det  */
   { 1.e9f,  1,      300,       30,    3.0, -1.0,    0,  106,   4820,   1,      1,    1.,      3,     24,  6,   1.09f, 200 },  // 0: pre-Telephone
   { 1.e9f,  1,      300,       24,    6.0,  0.5,    0,  100,   7570,   1,      1,    1.,      3,     20,  6,   0.77f, 180 },  // 1: pre-Telephone
   { 1.e9f,  1,      400,       18,    9.0,  2.0,    0,   94,  10300,   1,      1,    1.,      4,     18,  6,   0.55f, 160 },  // 2: Telephone
   { 50.0f,  2,      430,       12,   12.0,  3.5,    0,   88,  13090,   1,      1,    1.,      5,     15,  6,   0.39f, 140 },  // 3: Thumb
   { 15.0f,  2,      440,        6,   15.0,  5.0,    0,   82,  15800,   1,      1,    1.,      6,     10,  6,   0.27f, 120 },  // 4: Radio
   {  5.0f,  2,      550,        0,   18.0,  6.5,    1,   76,  19980,   1,      2,    1.,     11,      9,  6,   0.00f, 100 },  // 5: Standard
   {  4.0f,  2,      560,       -6,   21.0,  8.0,    2,   70,  22000,   1,      2,    1.,     12,      7,  6,   0.00f,  80 },  // 6: Xtreme
   {  3.0f,  2,      570,      -12,   24.0,  9.5,    3,   64,  24000,   1,      2,    2.,     13,      5,  6,   0.00f,  60 },  // 7: Insane
   {  2.8f,  2,      580,      -18,   27.0, 11.0,    4,   58,  26000,   1,      2,    4.,     13,      4,  6,   0.00f,  40 },  // 8: BrainDead
   {  2.6f,  2,      590,      -24,   30.0, 12.5,    5,   52,  28000,   1,      2,    8.,     13,      4,  6,   0.00f,  20 },  // 9: post-BrainDead
   {  2.4f,  2,      599,      -30,   33.0, 14.0,    6,   46,  30000,   1,      2,   16.,     15,      2,  6,   0.00f,  10 },  //10: post-BrainDead
};


I think you got em reversed. Can you see my problem?
Gabriel
Oops! Of course I got them reversed.
kwwong
QUOTE(Garf @ Mar 8 2006, 04:44 AM)
psymodel.h (some recent LAME version)

#if 1
    /* AAC values, results in more masking over MP3 values */
# define TMN 18
# define NMT 6
#else
    /* MP3 values */
# define TMN 29
# define NMT 6
#endif


I think the TMN difference between AAC and MP3 is due to the fact that the AAC psymodel handles the binaural masking effect in a much more sophisticated way than the MP3 psymodel.

At frequencies above 10 kHz, the required TMN value is only about 18 dB, whereas at lower frequencies it could be as high as 30 dB.

MP3 just assumes that the TMN value is uniform across all frequency bands, taking the worst case (29 dB).
Gabriel
More precisely kwwong is speaking about the ISO demonstration algorithms, not the formats themselves.
Garf
QUOTE(kwwong @ Mar 9 2006, 04:58 AM)
I think the TMN difference between AAC and MP3 is due to the fact that the AAC psymodel handles the binaural masking effect in a much more sophisticated way than the MP3 psymodel.

At frequencies above 10 kHz, the required TMN value is only about 18 dB, whereas at lower frequencies it could be as high as 30 dB.

MP3 just assumes that the TMN value is uniform across all frequency bands, taking the worst case (29 dB).
*



I'm not so sure this is the reason. The "recommended" values in the standard could just be almost randomly picked like so many other things in the informative part.

Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.

There are other reasons to prefer 18dB in this situation. It's easier to attain at 96-128kbps, meaning that the ISO noise allocation loops system works better.

But I don't care about the exact value of TMN or why it differs in the standard; I'm getting the strong impression that the entire tonality + TMN/NMT thing isn't based on starting with known TMN/NMT and working from there, but just a heuristic that was found to work well, and for which an explanation was produced after it turned out to work well.
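
For reference, the Bark-dependent TMN mentioned earlier in this post is usually quoted from Johnston's paper as an offset of (14.5 + i) dB for tones masking noise and a flat 5.5dB for noise masking tones, blended by a tonality coefficient. The C sketch below assumes that commonly cited form; the function name `johnston_offset_db` and the 1..25 band range are illustrative choices, not taken from the paper.

```c
#include <assert.h>

/* Masking offset in dB for Bark band i (1..25), assuming the commonly
 * quoted form  O_i = alpha * (14.5 + i) + (1 - alpha) * 5.5,
 * where alpha is the tonality coefficient (0 = noise, 1 = tone). */
double johnston_offset_db(int bark_band, double alpha)
{
    assert(bark_band >= 1 && bark_band <= 25);
    assert(alpha >= 0.0 && alpha <= 1.0);

    double tmn_db = 14.5 + (double)bark_band; /* rises with Bark number */
    double nmt_db = 5.5;                      /* flat across all bands */
    return alpha * tmn_db + (1.0 - alpha) * nmt_db;
}
```

For a pure tone this gives 15.5dB at Bark 1 and 39.5dB at Bark 25, which matches the roughly 15dB to 40dB range cited above.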
Gabriel
QUOTE
But I don't care about the exact value of TMN or why it differs in the standard; I'm getting the strong impression that the entire tonality + TMN/NMT thing isn't based on starting with known TMN/NMT and working from there, but just a heuristic that was found to work well, and for which an explanation was produced after it turned out to work well.

I think that you are partially right on this point.
TMN and NMT measurements were known before the work on modern codecs. We had them at least from Zwicker's work, and that was older.
However, taking the tonality of the maskee into consideration would seriously complicate the spreading function of the ISO demonstration algorithms. It is likely that a simplification was made at this step, but TMN and NMT values were still presented as an "official" explanation, even though a little bit of cooking was introduced there.

Wanting to keep trade secrets is also a possibility, although this is purely hypothetical speculation.
kwwong
QUOTE(Garf @ Mar 9 2006, 02:58 AM)
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.


Well, Johnston's paper uses a slightly different spreading function slope implementation than the ISO model.

In the ISO model, both slopes of the spreading function are almost identical and much steeper, whereas Johnston's model uses asymmetrical spreading function slopes. That explains why the TMN implementation of the Johnston model differs from the ISO model.

I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.
Garf
QUOTE(kwwong @ Mar 10 2006, 05:42 AM)
QUOTE(Garf @ Mar 9 2006, 02:58 AM)
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.


Well, Johnston's paper uses a slightly different spreading function slope implementation than the ISO model.

In the ISO model, both slopes of the spreading function are almost identical and much steeper, whereas Johnston's model uses asymmetrical spreading function slopes. That explains why the TMN implementation of the Johnston model differs from the ISO model.


This seems to be correct, i.e. more spreading means lower effective SMR needed in higher frequency parts (for most typical signals). But it's weird to mix intra and inter band masking in such a way to get some BMLD protection in such a highly roundabout manner. (Another heuristic that happens to work?) All in all I have serious doubts about this explanation.

QUOTE
I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.
*



Thing is, the TMN at the lowest level is 15dB, 40dB at the highest level and the spreading function works mostly from low to high frequencies, so the model doesn't produce the wanted effect. Given that later models have BMLD explicitly accounted for, I don't believe this.

Another reason why I don't believe it is that PXFM was a mono codec and the results in the paper are for mono signals :)
Woodinville
QUOTE(Gabriel @ Mar 7 2006, 01:29 AM)
The TMN and NMT vocabulary might come from Zwicker, who provided experimental data for both TMN and NMT (i.e. both extreme cases).
*



Well, I suspect that TMN came from Scharf's work using the Bark scale, where tone masking noise rises with critical band number.

I also suspect that NMT comes from a survey paper by Hellman.

I've heard that NMN is, at lowest, a bit smaller than NMT, but that the correlation of the noise sources makes this a very twitchy subject.

Also, is quantization noise in a coder "noise" or is it not? It's not dithered, we are, after all, trying to get rid of information, aren't we?

(edited to fix confusing Zwicker with Scharf. Oh well.)
Woodinville
QUOTE(Gabriel @ Mar 9 2006, 01:27 AM)
Wanting to keep trade secrets is also a possibility, although this is purely hypothetical speculation.
*



Oh, I'm sure that there was none of that in the MPEG-1 Committee. After all, the filterbank description is perfectly transparent! :rolleyes:
Woodinville
QUOTE(kwwong @ Mar 9 2006, 07:42 PM)
I think Johnston would have already accounted for BMLD in his model, only that the modelling isn't as sophisticated as that of the ISO AAC psychomodel.
*



Personally, I think the limit of "everything tonal" at low frequencies might have been a hack to protect from BMLD problems. I also suspect that there was some resistance to various issues around BMLD in the MPEG-1 timeframe, and that might account for differences.

Finally, which MPEG-1 psych model do you mean? The two are substantially different.

Now, in the AAC model, I dare say that the idea of BMLD was addressed, but the idea of having BMLD-like behavior for signal envelope at higher frequencies seems to have been somewhat neglected.
Woodinville
QUOTE(Garf @ Mar 10 2006, 12:41 AM)
Another reason why I don't believe it is that PXFM was a mono codec and the results in the paper are for mono signals :)
*



Well, actually, PXFM was an M/S coder, I believe.
Woodinville
QUOTE(Garf @ Mar 9 2006, 12:58 AM)
Specifically, the Johnston paper actually has the TMN increase over the Bark range, from 15dB at Bark 1 to 40dB at Bark 25. This is exactly the other way around from what you say. At that point I doubt BMLD was being considered, but adding it still wouldn't produce the shape your explanation produces.
*



Those TMN numbers are given by Scharf, actually, in Das Ohr.

I believe that later work moved to using ERBs rather than the Bark scale, and a fixed 30dB-ish TMN, more like Jont Allen's work. Or at least some people moved in that direction.

All of you appear to be leaving out the issue of ERB vs. Bark Frequency. Also, if one wishes to test this 17dB assertion, one should make an AM and an FM signal in one ERB range, and try 17dB SNR on this signal, eh?

Garf
QUOTE
Personally, I think the limit of "everything tonal" at low frequencies might have been a hack to protect from BMLD problems. I also suspect that there was some resistance to various issues around BMLD in the MPEG-1 timeframe, and that might account for differences.


I remember a paper from Frank Baumgarte that comes just short of calling BMLD "fiction produced due to playing with artificial signals", so I think I can see what you're getting at.

QUOTE(Woodinville @ Mar 14 2006, 11:13 PM)
I believe that later work moved to using ERBs rather than the Bark scale, and a fixed 30dB-ish TMN, more like Jont Allen's work. Or at least some people moved in that direction.


But what about NMT in the ERB scale? Or spreading? Any references for that?

One could recalculate them from the known values in the Bark scale, but that would be working the wrong way around.

QUOTE
All of you appear to be leaving out the issue of ERB vs. Bark Frequency. Also, if one wishes to test this 17dB assertion, one should make an AM and an FM signal in one ERB range, and try 17dB SNR on this signal, eh?
*



I'm not sure what "17dB assertion" you're referring to, or what you're trying to make clear, generally.
Woodinville
QUOTE(Garf @ Mar 20 2006, 01:55 PM)
But what about NMT in the ERB scale? Or spreading? Any references for that?


Well, I've heard that spreading and NMT both are constant in the ERB domain as well. You have to change the spreading values a bit.

QUOTE
One could recalculate them from the known values in the Bark scale, but that would be working the wrong way around.


For signals where there isn't any co-articulation, stick to the usual values, is my guess.

QUOTE
I'm not sure what "17dB assertion" you're referring to, or what you're trying to make clear, generally.
*



17dB was asserted somewhere upstream to be ok for TMN at high frequencies.

Try that sometime for a sine wave :)
Ivan Dimkovic
QUOTE
Try that sometime for a sine wave :)


I did a long time ago - it's a no-go ;)