QuickTime 6 AAC Encoder - block switching issue
Ivan Dimkovic
post Jul 17 2002, 18:06
Post #1


Nero MPEG4 developer


Group: Developer
Posts: 1466
Joined: 22-September 01
Member No.: 8



It seems that QuickTime still has block switching issues on some critical signals.

I don't know what is causing this; probably wrong codec parameters, or maybe something else. This is where the QT6 AAC engine differs from the LiquidAudio AAC engine.

- QuickTime AAC @128 LC (main?)

http://www.psytel-research.co.yu/temp/fatb...icktime_128.mp4


- FhG IIS AAC 2.2 @ 128 MAIN

http://www.psytel-research.co.yu/temp/fatb...fhg_2.2_128.mp4


- PsyTEL AAC 2.15 @ 128 LC

http://www.psytel-research.co.yu/temp/fatb..._psytel_128.mp4


The files will be playable with QuickTime Player :)
Phobos
post Jul 17 2002, 18:16
Post #2





Group: Members
Posts: 290
Joined: 5-April 02
From: Guadalajara, Jalisco
Member No.: 1693



Hmm, this just lets the world know that the only AAC encoder worth using is PsyTEL :)
wkw
post Jul 17 2002, 19:41
Post #3





Group: Members
Posts: 85
Joined: 7-June 02
Member No.: 2241



It is hard to tell what the QuickTime codec did just by judging from the sound. Perhaps someone could reverse engineer the QuickTime codec by designing a decoder that extracts the block switching and TNS information from the encoded mp4 clip?

I suspect that block switching is kept to a minimum because the fatboy clip has attacks that are too closely spaced for efficient coding in short blocks. Ideally, short blocks should not be used for fatboy, or else you end up with the entire clip coded in short blocks. Because adjacent "attacks" are so close together, the post-masking effect of the previous attack will mask some of the pre-echo noise of the next attack. Maybe the TNS tool isn't properly implemented, or something is wrong with the settings of the psychoacoustic model?

kww
Ivan Dimkovic
post Jul 18 2002, 05:23
Post #4


Nero MPEG4 developer


Group: Developer
Posts: 1466
Joined: 22-September 01
Member No.: 8



The fatboy clip must be coded in short blocks, because with 1024-point long blocks no TNS could save the pure harmonic sound of the clip.

The problem we are talking about here is that QT6 and LiquidAudio use very similar encoding engines (FhG / AT&T know-how), so there is absolutely no reason to fail on the fatboy clip; even the oldest FhG AAC encoders encoded that clip properly.

The same problem occurs on the castanets clip, too!
wkw
post Jul 18 2002, 12:06
Post #5





Group: Members
Posts: 85
Joined: 7-June 02
Member No.: 2241



I thought the TNS tool was designed to remove the temporal envelope from the spectrum and preserve the harmonic sound. I noticed that the TNS spec allows up to 4 LPC filters. Could it be necessary to use more than one LPC filter?

Actually, to use block switching effectively, in my opinion, they shouldn't have limited the block sizes to 2048 and 256 only. That is too inflexible to handle all the attacks, whose spacing from each other varies considerably. There should be other block sizes as well, such as 512, 1024, etc. For example, MP3, with its shorter long block of 1152, should be able to handle "fatboy" very well by switching to short blocks.
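
For reference, the time span covered by each block size mentioned above can be worked out directly. This is only a quick sketch: the 44.1 kHz sample rate is an assumption (the thread does not state the clip's rate), and 512 and 1024 are the hypothetical extra sizes suggested in this post.

```python
# Duration of one transform block at a given sample rate.
SAMPLE_RATE_HZ = 44100  # assumed; the thread does not give the clip's rate

BLOCK_SIZES = {
    "AAC long": 2048,
    "AAC short": 256,
    "MP3 long": 1152,
    "suggested 1024": 1024,  # hypothetical size proposed in the post
    "suggested 512": 512,    # hypothetical size proposed in the post
}

def window_ms(length, sample_rate=SAMPLE_RATE_HZ):
    """Return the duration of `length` samples in milliseconds."""
    return 1000.0 * length / sample_rate

for name, length in BLOCK_SIZES.items():
    print(f"{name:>15}: {length:5d} samples = {window_ms(length):5.1f} ms")
```

At that rate the two AAC sizes sit roughly a factor of eight apart in time, which is the gap the intermediate sizes would fill.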

The other alternative you could explore is the Gain Control tool. The MPEG committee must have had its reasons for including so many other coding tools in the standard. There is no single coder that can handle every situation; the best coder is just a "compromise" across all situations. Staying in short blocks for the entire clip, as in the "temporal" portion of castanets, just consumes too many bits. I am still investigating the gain-control settings on the "temporal" portion of castanets, and it is premature for me to present anything...

Actually, the post-masking effect of the previous attack can be used to mask some of the pre-echo spread of the next attack. In fact, in our block switching strategy we are only concerned with attacks, because the pre-masking duration is too short to hide the noise. But it is not only attacks that generate noise; the release (where the signal suddenly falls from a high level to a low level) generates noise as well. Because of the long post-masking duration, we don't switch to short blocks during a signal release. But if you analyze the decoded clip of a signal release, you will notice that the noise actually spreads out over the entire frame into the quiet region after the release, yet we can't hear it. So if the next attack occurs very close to this signal release, in theory some or all of its pre-echo noise will be masked!
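
A toy version of the attack-only strategy described above might look like the following. It is not the detector any real encoder uses; the sub-block length, the energy ratio, and the function names are all illustrative assumptions. It flags a frame for short blocks only when the energy jumps upward, and deliberately ignores releases on the assumption that post-masking covers them.

```python
def detect_attacks(samples, sub_len=128, ratio=8.0, floor=1e-9):
    """Return indices of sub-blocks whose energy exceeds `ratio` times the
    energy of the previous sub-block (an attack). Energy drops (releases)
    are ignored on purpose."""
    energies = [
        sum(x * x for x in samples[i:i + sub_len])
        for i in range(0, len(samples) - sub_len + 1, sub_len)
    ]
    attacks = []
    for idx in range(1, len(energies)):
        prev = max(energies[idx - 1], floor)  # avoid dividing by zero
        if energies[idx] / prev > ratio:
            attacks.append(idx)
    return attacks

def choose_window(samples, **kwargs):
    """Pick 'short' if any attack is found in the frame, else 'long'."""
    return "short" if detect_attacks(samples, **kwargs) else "long"

quiet = [0.001] * 1024
loud = [1.0] * 1024
print(choose_window(quiet + loud))  # attack in the middle of the frame
print(choose_window(loud + quiet))  # release only
```

A detector like this says nothing about whether a following attack is already masked by the previous release; that would need a post-masking estimate on top.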

Well, for this fatboy clip, I think it requires further analysis..

KWW
Ivan Dimkovic
post Jul 18 2002, 12:45
Post #6


Nero MPEG4 developer


Group: Developer
Posts: 1466
Joined: 22-September 01
Member No.: 8



wkw,

You made some very interesting points -

Of course, there is no unique way to make a codec transparent; you can do that in many ways, and even change the algorithm.

The post-masking effect on long blocks is interesting to estimate, and to use for a very small pre-echo occurring "under" the post-masking curve. It is effective, and many advanced implementations use it.

TNS helps in encoding fatboy, but not as much as is required; switching to short blocks is absolutely required for this sample.
wkw
post Jul 19 2002, 11:47
Post #7





Group: Members
Posts: 85
Joined: 7-June 02
Member No.: 2241



Dear Ivan,

I know very little about the TNS tool, but I did some analysis of the MDCT spectrum before and after the TNS filter. I noticed that the spectral envelope of the rising edge of an attack on the right-hand side of the windowed time-domain samples is very similar in shape to the time-domain envelope of human speech, which can be easily modeled using standard linear predictive coding techniques. The TNS filter actually removes this envelope, leaving behind the noise energy. At the decoder, the inverse TNS filter reconstructs this temporal envelope and adds it to the noise energy.

However, accurate modelling of this temporal envelope also depends on the quantization of the reflection coefficients of the LPC filters.

Also, I noticed that in standard speech modelling methods, the time-domain samples are windowed by a Hamming function to reduce the "block effect" of the analysis window and to increase the stability of the calculated LPC coefficients. However, in the ISO reference code, no windowing is applied to the MDCT spectrum before the autocorrelation calculation, and yet all the LPC calculations I have encountered were always stable??? From my experience with the Levinson-Durbin algorithm in speech coding, there are certain time-domain frames for which the algorithm can become unstable.

Then there is this quantization problem: the MPEG-4 draft spec states that the reflection coefficients are first scaled and then rounded to the nearest integer, which can be either negative or positive. But in the ISO reference code, they round to the nearest integer by just adding 0.5? Could this be an implementation error in the ISO code?
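
To make the rounding question concrete, here is a small sketch of why "just add 0.5" matters for negative values. It assumes the +0.5 is followed by a truncating integer conversion, as a C-style `(int)` cast would do; the post does not say which conversion the ISO code actually uses, so treat that as an assumption. Both function names are made up for this illustration.

```python
import math

def round_trunc_plus_half(x):
    """Mimic a C-style (int)(x + 0.5): int() truncates toward zero."""
    return int(x + 0.5)

def round_nearest(x):
    """Round to the nearest integer for both signs (halves away from zero)."""
    if x >= 0:
        return math.floor(x + 0.5)
    return -math.floor(-x + 0.5)

for value in (2.7, 2.2, -2.2, -2.7):
    print(value, round_trunc_plus_half(value), round_nearest(value))
# e.g. -2.7 gives -2 with the truncating form, -3 with round-to-nearest
```

If the conversion floors instead of truncating, `floor(x + 0.5)` rounds negative values correctly and the two forms differ only at exact halves, so whether there is an actual error depends on that detail.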

Also, I noticed that after TNS filtering in the encoder, the first few filtered spectral coefficients usually displayed high error. They become spiky, or I could not flatten the first few coefficients.

Also, I do not understand why I could not lower the start of the TNS filtering down to near 0 Hz. It sounded terrible when I filtered from 100 Hz to the upper limit.

Then I am not very sure of the criteria needed to activate TNS filtering. The MPEG-4 spec states that TNS filtering is activated if the LPC gain > 1.4. I did some study on this and found that, sometimes, the TNS just misses detecting the presence of a temporal envelope.
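
A rough sketch of the gain test being discussed, under several assumptions: a plain DCT-IV stands in for the MDCT, the autocorrelation is taken over the whole spectrum without windowing, and "LPC gain" is taken to mean the input energy divided by the final Levinson-Durbin prediction error. Only the 1.4 threshold comes from this post; the function names and test signals are made up for illustration.

```python
import math

TNS_GAIN_THRESHOLD = 1.4  # activation threshold quoted in the post

def dct_iv(x):
    """O(N^2) DCT-IV, used here only as a simple stand-in for the MDCT."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * (k + 0.5)) for t in range(n))
        for k in range(n)
    ]

def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag], no windowing applied."""
    return [
        sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Return (reflection coefficients, prediction gain) from r[0..order]."""
    if r[0] <= 0.0:
        return [], 1.0  # silent input: nothing to predict
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for m in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        refl.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    gain = r[0] / err if err > 0.0 else float("inf")
    return refl, gain

def tns_decision(samples, order=4):
    """Return (reflection coefficients, gain, activate?) for one block."""
    spectrum = dct_iv(samples)
    refl, gain = levinson_durbin(autocorr(spectrum, order), order)
    return refl, gain, gain > TNS_GAIN_THRESHOLD

N = 64
impulse = [1.0 if t == 40 else 0.0 for t in range(N)]                       # sharp attack
tone = [math.cos(math.pi / N * (t + 0.5) * (10 + 0.5)) for t in range(N)]  # steady tone

for name, block in (("impulse", impulse), ("tone", tone)):
    refl, gain, use_tns = tns_decision(block)
    print(name, "gain =", round(gain, 2), "TNS on" if use_tns else "TNS off")
```

In this toy setup the impulse gives a spectrum that is highly predictable across frequency, so the gain clears the threshold, while the bin-centred tone does not. With the autocorrelation method the reflection coefficients stay inside (-1, 1) in exact arithmetic, which may be part of why the calculations always came out stable even without a Hamming window.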

Then I noticed, as you pointed out, that improper activation of TNS filtering actually flattens some of the tone information as well. I noticed this on the clip "spahm", which you can download from the internet. During the long-block mode of spahm, the LPC gain is rather high (> 6.0), which justifies TNS filtering, but I thought TNS filtering was justified only for attacks or near attacks.



KWW
wkw
post Jul 20 2002, 21:31
Post #8





Group: Members
Posts: 85
Joined: 7-June 02
Member No.: 2241



Dear Ivan,

I have tried the "fatboy" clip using the Gain Control tool, and it did not sound very different from the QuickTime-encoded clip. I think the problem lies in the limitations of the psychoacoustic model itself. Somehow, the calculated masking threshold is too high in the low-frequency bands, and switching to short blocks solves the problem.

I found this interesting article on the internet. Look at Advanced Topics for the limitations of the present generation of psychoacoustic models...

http://www.helsinki.fi/~ssyreeni/dsound/dsound-a-03


KWW