Full Version: TNS and filterbanks
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
kwwong

A transient, when transformed by the MDCT, results in periodic coefficients in the MDCT domain. As a result, linear prediction across these MDCT coefficients is easily done. That is the foundation of the TNS (Temporal Noise Shaping) tool.

What happens if we use other transforms such as the FFT, or other filterbanks such as PQF, QMF, etc.?

Is the TNS tool specifically designed to work with the MDCT only?

pest
QUOTE

Is the TNS tool specifically designed to work with the MDCT only?


I do not have a specific theoretical background, but judging from some of my tests, subband
prediction also works with other types of filterbanks.
kwwong
QUOTE(pest @ Apr 28 2006, 02:10 AM) *

QUOTE

Is the TNS tool specifically designed to work with the MDCT only?


I do not have a specific theoretical background, but judging from some of my tests, subband
prediction also works with other types of filterbanks.


Can you provide more details on your insights?
pest
QUOTE

Can you provide more details on your insights?


I hope I don't get bashed, as my assumptions and experiments are often a bit sketchy.
Some properties of the original waveform shape also apply in the transform space,
because the subbands are linear in time too, but they provide a more compact (less noisy)
way to represent the different aspects of the signal.
SebastianG
QUOTE(kwwong @ Apr 28 2006, 06:11 AM) *

Is the TNS tool specifically designed to work with the MDCT only?

No, it's also suitable for the DCT (you could minimize mosquito artefacts, a.k.a. the Gibbs effect, in JPEG/MPEG; that would then be called SNS, spatial noise shaping). I think it can be adapted for the FFT, too. TNS isn't really needed for subband filterbanks (like typical PQF ones) because of their already high temporal resolution. Also, since one subband sample usually affects many time samples, there'll be a lot of time aliasing. TNS usage doesn't make much sense here (on such subband filterbanks). DCT and MDCT are suited because of their very high spectral resolution (N bands, with N around 128 or 1024) while each coefficient affects only 2N time samples at most.

Sebi
pest
QUOTE

TNS isn't really needed for subband filterbanks because of their already high temporal resolution.


A wavelet filterbank has a low temporal resolution. Or is this a special case?

SebastianG
QUOTE(pest @ Apr 28 2006, 01:06 PM) *

A wavelet filterbank has a low temporal resolution. Or is this a special case?


Something like TNS requires a uniform filterbank (= equal sampling rate per subband). Wavelets don't fit in here that well.

Sebi
pest
QUOTE

Something like TNS requires a uniform filterbank (= equal sampling rate per subband).


Is there a special reason for this? The only reason I can imagine is that
the TNS filter coefficients are stored per block and not per subband.
SebastianG
TNS filters are applied across subbands on samples of the same time slice (one sample per subband). If you have bands with different sampling rates you simply can't do it.
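
A rough sketch of what "prediction across subbands within one time slice" means, under loud assumptions: a plain unnormalised DCT-II stands in for the MDCT, and an order-2 least-squares predictor stands in for the real TNS filter design (which is not reproduced here). The helper names `dct_ii` and `prediction_gain_db` are made up for this illustration. A block containing a single click should give a large prediction gain across its coefficients, while a stationary noise block should give almost none.

```python
import numpy as np

def dct_ii(block):
    """Unnormalised DCT-II of a 1-D block (a stand-in for the codec's MDCT)."""
    n = len(block)
    k = np.arange(n)[:, None]   # coefficient (frequency) index
    t = np.arange(n)[None, :]   # time index within the block
    return np.cos(np.pi * (2 * t + 1) * k / (2 * n)) @ block

def prediction_gain_db(coeffs, order=2):
    """Least-squares linear prediction ACROSS coefficients; returns the gain in dB."""
    # Each row holds the `order` previous coefficients, most recent first.
    rows = np.array([coeffs[k - order:k][::-1] for k in range(order, len(coeffs))])
    target = coeffs[order:]
    weights, *_ = np.linalg.lstsq(rows, target, rcond=None)
    residual = target - rows @ weights
    # The small constant only guards against division by zero.
    return 10 * np.log10(np.sum(target ** 2) / (np.sum(residual ** 2) + 1e-12))

rng = np.random.default_rng(0)
n = 256
noise_block = rng.standard_normal(n)           # stationary block, flat envelope
click_block = 0.001 * rng.standard_normal(n)   # near silence ...
click_block[200] += 1.0                        # ... plus a single click

gain_noise = prediction_gain_db(dct_ii(noise_block))
gain_click = prediction_gain_db(dct_ii(click_block))
print(f"noise block: {gain_noise:.1f} dB, click block: {gain_click:.1f} dB")
```

The click turns into a cosine running along the coefficient index, which a two-tap predictor follows almost perfectly. With bands at different sampling rates there is no such single vector of "one sample per subband" to run the predictor over.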

Sebi
petracci
QUOTE(kwwong @ Apr 28 2006, 06:11 AM) *

A transient, when transformed by the MDCT, results in periodic coefficients in the MDCT domain. As a result, linear prediction across these MDCT coefficients is easily done. That is the foundation of the TNS (Temporal Noise Shaping) tool.

What happens if we use other transforms such as the FFT, or other filterbanks such as PQF, QMF, etc.?

Is the TNS tool specifically designed to work with the MDCT only?


It will definitely work for DFT (and thus FFT). In fact, the theory behind TNS is based on the relation between frequency-domain autocorrelation and the time-domain Hilbert envelope (similar or dual to the relation between time-domain autocorrelation and power spectral density, a.k.a. the Wiener-Khinchin theorem). Hence, it assumes a DFT. However, a DFT is not as efficient for audio compression as the MDCT, so TNS is applied to MDCT coefficients instead. This has the negative side effect of introducing time-domain aliasing, which is why a short overlap window is typically applied in combination with TNS.
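
A small numerical check of that duality, as a sketch only: it assumes an even-length real test signal (a Gaussian-windowed cosine chosen for illustration) and builds the analytic signal directly with the FFT. The helper names `analytic_signal` and `spectral_autocorrelation` are hypothetical. It verifies that the circular autocorrelation of the analytic signal's spectrum, taken along the frequency index, equals N times the DFT of the squared Hilbert envelope.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real, even-length sequence, built via the DFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0             # keep DC
    weights[n // 2] = 1.0        # keep Nyquist
    weights[1:n // 2] = 2.0      # double positive frequencies, drop negative ones
    return np.fft.ifft(np.fft.fft(x) * weights)

def spectral_autocorrelation(spectrum):
    """Circular autocorrelation along the frequency index."""
    n = len(spectrum)
    return np.array([np.sum(spectrum * np.conj(np.roll(spectrum, m)))
                     for m in range(n)])

n = 256
t = np.arange(n)
# Test signal (an assumption): a short tone burst centred on sample 180.
x = np.exp(-0.5 * ((t - 180) / 4.0) ** 2) * np.cos(2 * np.pi * 0.2 * t)

z = analytic_signal(x)
envelope_sq = np.abs(z) ** 2                      # squared Hilbert envelope
lhs = spectral_autocorrelation(np.fft.fft(z))     # autocorrelation in frequency
rhs = n * np.fft.fft(envelope_sq)                 # DFT of the squared envelope
print(np.allclose(lhs, rhs))
```

So a predictor fitted to the autocorrelation across frequency is modelling the temporal envelope, in the same way that ordinary time-domain LPC models the spectral envelope.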

QUOTE(SebastianG @ Apr 28 2006, 12:45 PM) *

No, it's also suitable for the DCT (you could minimize mosquito artefacts, a.k.a. the Gibbs effect, in JPEG/MPEG; that would then be called SNS, spatial noise shaping).


There's an interesting paper on this by Johnson and some other guy. The TNS algorithm does need several adaptations, though.

QUOTE(SebastianG @ Apr 28 2006, 12:45 PM) *

TNS isn't really needed for subband filterbanks (like typical PQF ones) because of their already high temporal resolution. Also, since one subband sample usually affects many time samples, there'll be a lot of time aliasing. TNS usage doesn't make much sense here (on such subband filterbanks). DCT and MDCT are suited because of their very high spectral resolution (N bands, with N around 128 or 1024) while each coefficient affects only 2N time samples at most.


It will work and/or be useful in a system where a uniform filterbank (such as CMFB like MDCT and PQMF) of high frequency resolution is combined with frequency-domain or inter-channel quantization.

Woodinville
Since all it really depends on is time/frequency duality ...
kwwong
QUOTE(petracci @ Apr 28 2006, 10:07 AM) *

It will definitely work for DFT (and thus FFT).


But surely the transient characteristics of the DFT differ from those of the MDCT?

The coefficients of the DFT power spectrum aren't periodic during a transient, as they are with the MDCT?

Are you referring to the complex DFT coefficients (applying two separate TNS predictions, one to the real and one to the imaginary coefficients) instead of the power spectrum?

QUOTE(petracci @ Apr 28 2006, 10:07 AM) *

QUOTE(SebastianG @ Apr 28 2006, 12:45 PM) *

No, it's also suitable for the DCT (you could minimize mosquito artefacts, a.k.a. the Gibbs effect, in JPEG/MPEG; that would then be called SNS, spatial noise shaping).

There's an interesting paper on this by Johnson and some other guy. The TNS algorithm does need several adaptations, though.

I think SebastianG's insight is interesting, since the DCT is cosine modulated.

In general, I think some prediction gain across transform coefficients is achievable. But the real problem is: what does this prediction gain represent? What is its physical interpretation, given that it could mean different things for different transforms?
kwwong
QUOTE(SebastianG @ Apr 28 2006, 05:45) *

No, it's also suitable for the DCT (you could minimize mosquito artefacts, a.k.a. the Gibbs effect, in JPEG/MPEG; that would then be called SNS, spatial noise shaping).


The SNS tool... is that the tool used in the brand new MPEG-4 Advanced Video Coding (AVC)?
Gabriel
QUOTE(kwwong @ May 15 2006, 05:28) *

The SNS tool... is that the tool used in the brand new MPEG-4 Advanced Video Coding (AVC)?

No
Woodinville
QUOTE(Gabriel @ May 15 2006, 00:22) *

QUOTE(kwwong @ May 15 2006, 05:28) *

The SNS tool... is that the tool used in the brand new MPEG-4 Advanced Video Coding (AVC)?

No


The only thing I've seen on Spatial Noise Shaping was something by Stanley Kuo and somebody else. I've never seen a followup, although it may be an idea owned by a company who has no research in the area any longer.
Gabriel
QUOTE(Woodinville @ May 15 2006, 10:14) *

The only thing I've seen on Spatial Noise Shaping was something by Stanley Kuo and somebody else. I've never seen a followup, although it may be an idea owned by a company who has no research in the area any longer.

I think that it was Stanley Kuo and JJ while at AT&T. Now Stanley is working for Apple on audio, and JJ is working for... well, we know where JJ is working, don't we? However, I don't know what the subject of his work is.
SNS might have been interesting on big transforms, but AVC mainly chose small 4x4 transforms, which are not really suited to such a tool.

kwwong
QUOTE(Gabriel @ May 15 2006, 03:59) *

QUOTE(Woodinville @ May 15 2006, 10:14) *

The only thing I've seen on Spatial Noise Shaping was something by Stanley Kuo and somebody else. I've never seen a followup, although it may be an idea owned by a company who has no research in the area any longer.

I think that it was Stanley Kuo and JJ while at AT&T. Now Stanley is working for Apple on audio, and JJ is working for... well, we know where JJ is working, don't we? However, I don't know what the subject of his work is.
SNS might have been interesting on big transforms, but AVC mainly chose small 4x4 transforms, which are not really suited to such a tool.


But AVC employs adaptive block size switching, that is, 4x4, 8x8 and 16x16 transforms. I remember reading about the SNS tools in the AVC specs.
Gabriel
AVC uses adaptive block sizes (i.e. partitions) for motion compensation: 16x16, 16x8, 8x16, 8x8, and subpartitions (down to 4x4).
For transform sizes it is 4x4 or 8x8 in high profiles.

I agree that with 16x16 transforms something like SNS would be very helpful, but AVC does not use such huge transforms.
SebastianG
QUOTE(Gabriel @ May 16 2006, 09:37) *

For transform sizes it is 4x4 or 8x8 in high profiles.


...whereas 8x8 transform = 4 * (4x4 transform) + 1 * (2x2 transform on the DC coeffs of the previous transform) IIRC. This should limit ringing compared to 8x8 DCT based formats and makes SNS -- as Gabriel pointed out -- impractical / useless.

Seb
Gabriel
QUOTE(SebastianG @ May 17 2006, 13:14) *

...whereas 8x8 transform = 4 * (4x4 transform) + 1 * (2x2 transform on the DC coeffs of the previous transform) IIRC.

Under baseline/main/extended profiles transforms are 4x4, with a further transform on DC coeffs.
Under high profiles, there is an additional 8x8 transform, which really yields 64 frequency coefficients (giving additional low-frequency resolution).
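
For reference, a minimal sketch of the 4x4 case. It assumes the commonly documented H.264/AVC forward integer core transform matrix and leaves out the scaling that the codec folds into quantisation, as well as the extra transform on the DC coefficients. The names `CORE_4X4` and `core_transform_4x4` are made up for this example.

```python
import numpy as np

# Forward 4x4 integer core transform matrix (scaling/quantisation not included).
CORE_4X4 = np.array([[1,  1,  1,  1],
                     [2,  1, -1, -2],
                     [1, -1, -1,  1],
                     [1, -2,  2, -1]])

def core_transform_4x4(block):
    """Separable 2-D transform: rows and columns both use CORE_4X4."""
    return CORE_4X4 @ block @ CORE_4X4.T

flat = np.full((4, 4), 5)               # constant block
edge = np.array([[0, 0, 9, 9]] * 4)     # vertical edge through the block

flat_coeffs = core_transform_4x4(flat)  # only the DC coefficient is non-zero
edge_coeffs = core_transform_4x4(edge)  # energy only in the first row (horizontal frequencies)
print(flat_coeffs)
print(edge_coeffs)
```

Each coefficient here touches only a 4x4 patch of pixels, so quantisation error around an edge cannot spread further than that, which is the argument for SNS bringing little benefit at this transform size.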
kwwong
QUOTE(SebastianG @ May 17 2006, 06:14) *

QUOTE(Gabriel @ May 16 2006, 09:37) *

For transform sizes it is 4x4 or 8x8 in high profiles.


...whereas 8x8 transform = 4 * (4x4 transform) + 1 * (2x2 transform on the DC coeffs of the previous transform) IIRC. This should limit ringing compared to 8x8 DCT based formats and makes SNS -- as Gabriel pointed out -- impractical / useless.

Seb


I don't think so.. Even if the 4x4 transform produces smaller ringing artifacts, you must remember that in video applications, unlike audio, very often, there is a need to "expand" the display area.

For example, in MPEG-1, the typical video frame size is about 352 x 288 pixels, which is then interpolated to a bigger frame size before being sent to the computer screen or the TV screen. As a result, even small ringing artifacts get "amplified" and become visible.
SebastianG
QUOTE(kwwong @ May 23 2006, 05:16) *

Even if the 4x4 transform produces smaller ringing artifacts, you must remember that in video applications, unlike audio, very often, there is a need to "expand" the display area.

Yeah, but there's not much you can do about it.

Sebi