Full Version: AAC LC Profile
kennyzero
I am a real newcomer to the field of audio coding. I have read some MPEG-4 AAC standard documents and am still confused.

Could anyone clarify which blocks are in the Low Complexity profile? I mean blocks like the filter bank, TNS, PNS, quantization, M/S stereo, etc.

Thanks very much.
hans-jürgen
QUOTE(kennyzero @ Jan 20 2003 - 04:46 PM)
I am a real newcomer to the field of audio coding. I have read some MPEG-4 AAC standard documents and am still confused.

Which ones did you read then?

QUOTE
Could anyone clarify which blocks are in the Low Complexity profile? I mean blocks like the filter bank, TNS, PNS, quantization, M/S stereo, etc.


Roughly speaking, everything that deals with any kind of prediction (backward, long-term, whatever) is not Low Complexity; it either has a profile of its own (like LTP) or is integrated into the Main profile. So all of the modules or tools you mention above are already part of the LC profile. Perceptual Noise Substitution was introduced after MPEG-2 AAC was finalized, so it is part of MPEG-4 AAC, where it is available in the LC profile too, not only in Main or LTP (LTP itself was also introduced with MPEG-4 AAC).
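
As a rough sketch of the split described above (a simplification for illustration, not a transcription of the standard's tables), the profiles can be modelled as a common tool set plus the prediction tool each one adds. The names `COMMON_TOOLS`, `EXTRA_TOOLS` and `tools_for` are made up for this example, and details such as the SSR profile or per-profile TNS limits are left out.

```python
# Tools named in this thread as part of every AAC profile (simplified).
COMMON_TOOLS = {"filterbank", "quantization", "M/S stereo", "TNS"}

# Prediction tools are what separate the other profiles from LC.
EXTRA_TOOLS = {
    "LC": set(),
    "Main": {"backward prediction"},
    "LTP": {"long term prediction"},
}

def tools_for(profile, mpeg4=True):
    """Return the set of tool names for a profile (hypothetical helper)."""
    if profile == "LTP" and not mpeg4:
        raise ValueError("LTP was introduced with MPEG-4 AAC")
    tools = COMMON_TOOLS | EXTRA_TOOLS[profile]
    if mpeg4:
        # PNS arrived after MPEG-2 AAC was finalized, so it is MPEG-4 only.
        tools = tools | {"PNS"}
    return tools

print(sorted(tools_for("LC")))
print(sorted(tools_for("Main") - tools_for("LC")))
```

The second line printed is the only difference between Main and LC in this model, which is the point of the post: LC is "everything except prediction".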
Ivan Dimkovic
I think the LTP and Main profiles won't find any commercial application, at least not in the consumer range.

Not a single consumer player supports AAC Main (or LTP), and the new MPEG-4 High Efficiency (AAC+) profile is AAC-LC + SBR.
SK1
So AAC audio will never reach its full potential?! This is insane... then why are there so many advanced features that could make it better and better?
The Main profile will not be supported?
If that is the case (I trust your judgment), then AAC is pretty much doomed, at least when it comes to the best potential quality.
Ivan Dimkovic
That is just not true. The difference between the Main and LC profiles, even with the highest-quality encoder, is too small compared to the processing power used, and the new tools in the High Efficiency profile are much better suited to the very low bit rate range, which is where Main or LTP would matter.

Furthermore, none of these matter for VBR encoding, which is what most consumers use. Main or LTP would decrease the bit rate by 0.5 kb/s or even less for most natural signals, so it is nothing special. Only pure tonal signals, which are easily predicted, would benefit from prediction, but then again there is the problem of the complexity of the whole thing.
Ivan Dimkovic
A similar thing happens with MP3 encoders: for example, the MP3 standard allows tools like mixed blocks, and encoders do not use them.

Or should I mention the MPEG-4 video standard? Do you know how many profiles it has, with their corresponding algorithms/tools? Yet somehow only Simple (SP) and Advanced Simple (ASP) are widely used.
SK1
So the quality increase is that slight for so much processing power? (I assume it is mainly long-term prediction that requires a lot of power.) I never thought it was that insignificant... that only pure tonal signals would benefit from the prediction features... this shattered my expectations, Ivan :)
And I didn't know about the mixed blocks feature in the MP3 standard... interesting.
About MPEG-4 video, however: yep, I know about the scary complexity :)
I recently tested an H.264 implementation and, oh my god, it encoded at about 1 frame per second on a 2.53 GHz Pentium! Sheesh...
yanchen
QUOTE(Ivan Dimkovic @ Jan 21 2003 - 03:55 AM)
I think the LTP and Main profiles won't find any commercial application, at least not in the consumer range.

Not a single consumer player supports AAC Main (or LTP), and the new MPEG-4 High Efficiency (AAC+) profile is AAC-LC + SBR.

When it comes to audio encoders, is there any promising commercial opportunity other than ripping?
Even though all encoder usage happens on the PC platform, there seems to be no business for embedded systems.

One application that embeds an MP3 encoder is the digital voice recorder. It uses an MP3 encoder rather than
a speech encoder, I think, simply because MP3 decoders are widely distributed, which is not the case for AAC decoders.

Maybe when the market accepts the "mp4" file format, AAC encoders will get the chance to gain a solid commercial position (if Microsoft still holds on to its WMV/WMA on the PC platform).
Ivan Dimkovic
QUOTE
Maybe when the market accepts the "mp4" file format, AAC encoders will get the chance to gain a solid commercial position (if Microsoft still holds on to its WMV/WMA on the PC platform).


Hmm, the main reasons AAC wasn't popular in the consumer hardware market (portable players, etc.) were a bad pricing policy and a lack of consumer software. But with the adoption of AAC in QuickTime, Nero, etc., and different licensing terms (www.vialicensing.com is now in charge of patent licenses), things will start to change.

Regarding WMA vs. AAC: this will be a very interesting "battle". AAC+ will be a true competitor to WMA and MP3Pro. From the quality perspective AAC+ is the clear winner, but that is not the most important thing for broad acceptance. The old problem is still there: the MPEG-4 Audio patent license is still not complete :( There are many patent holders, and all of them probably have some wishes (this is not like voting; you must be able to satisfy all parties), but we all hope that this will be finished in the coming weeks.

Hopefully this license will cover all MPEG-4 profiles, for all applications (hardware and software).

Microsoft is playing dirty by setting very low prices for hardware implementers ( http://www.osopinion.com/perl/story/20395.html ), since their major income comes from the "consumer" (OS) side.
Gabriel
A lot of features in MP3 are not used by encoders. I think that all features are used by some encoder, but there is no encoder using them all (btw, that is how EncSpot guesses encoders).

But MP3 tools are not so time-consuming. Some AAC tools require much more processing power, so it makes sense not to use them.
Ivan Dimkovic
Yes - exactly my point

Main prediction, for example, would increase the required decoding power by 40%. That is nonsense unless the tool brings a significant improvement in audio quality, which is clearly not the case.

OTOH, for the low bit rate range, a new MPEG-4 profile called 'High Efficiency' is being standardized right now. It will have the SBR tool for ultra-low bit rates, which is much better than PNS combined with those predictions, and it does not consume significant power in either the encoder or the decoder (the only intensive part is the QMF filterbank, which can be written in a very optimized way). It is also backwards compatible, i.e. a decoder without SBR would still be able to play the files (at lower quality), while a decoder without prediction cannot play Main or LTP profile AAC files.
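
A minimal sketch of that compatibility argument, assuming a stream and a decoder can each be described as a set of tool names. The tool labels and the `playback` helper are hypothetical; the only thing taken from the post is that SBR can be ignored by an older decoder while prediction cannot.

```python
# Tools a decoder may skip and still produce (degraded) output.
OPTIONAL_TOOLS = {"SBR"}

def playback(stream_tools, decoder_tools):
    """Classify what a decoder can do with a stream (illustrative only)."""
    missing = set(stream_tools) - set(decoder_tools)
    if not missing:
        return "full quality"
    if missing <= OPTIONAL_TOOLS:
        # Only skippable tools are missing: the core still decodes.
        return "reduced quality"
    return "cannot play"

lc_decoder = {"AAC-LC core"}
he_stream = {"AAC-LC core", "SBR"}
main_stream = {"AAC-LC core", "backward prediction"}

print(playback(he_stream, lc_decoder))
print(playback(main_stream, lc_decoder))
```

The High Efficiency stream degrades gracefully on a plain LC decoder, whereas the Main stream is rejected because prediction is required to reconstruct the signal at all.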
Gabriel
Is TNS part of LC or main?

It seems to me that this tool is (theoretically) useful.
Ivan Dimkovic
These tools can be used in all profiles:

- M/S stereo
- IS stereo
- TNS
- PNS (MPEG-4 only)
- Channel coupling
- DRC (dynamic range control)


TNS is very useful for a certain range of signals (like speech with strong harmonics, e.g. the german_speech MPEG example), but controlling it is a matter of tuning, not science, because the number of bits spent on TNS data can sometimes cancel out the benefits of the tool itself :)
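
To make the bit trade-off concrete, here is a toy decision rule. It is an assumption for illustration, not how any particular encoder tunes TNS: the saving is estimated with the textbook approximation of 0.5 * log2(gain) bits per coefficient, and the function name and parameters are hypothetical.

```python
import math

def tns_worthwhile(prediction_gain, n_coeffs, side_info_bits):
    """Return True if the estimated saving exceeds the TNS side-info cost.

    prediction_gain: ratio of signal energy to residual energy (assumed input).
    n_coeffs:        number of spectral coefficients the filter covers.
    side_info_bits:  bits needed to transmit the TNS filter data.
    """
    if prediction_gain <= 1.0:
        return False  # no gain at all, TNS can only cost bits
    saved_bits = 0.5 * math.log2(prediction_gain) * n_coeffs
    return saved_bits > side_info_bits

print(tns_worthwhile(4.0, 100, 50))
print(tns_worthwhile(4.0, 10, 50))
```

With the same gain, the filter pays off over 100 coefficients but not over 10, which is the "side info can cancel the benefit" effect in miniature.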
hans-jürgen
QUOTE(Ivan Dimkovic @ Jan 20 2003 - 09:38 PM)
That is just not true. The difference between the Main and LC profiles, even with the highest-quality encoder, is too small compared to the processing power used, and the new tools in the High Efficiency profile are much better suited to the very low bit rate range, which is where Main or LTP would matter.

As far as I've understood it so far, SBR also uses prediction and substitution methods for tonal and noise components in the original sound, but of course only in the high-frequency part of the spectrum, in order to replicate it as closely as possible. I once read an explanation of this in some EBU documentation; I can't remember which one it was, probably one referring to DRM...

QUOTE
Only pure tonal signals, which are easily predicted, would benefit from prediction, but then again there is the problem of the complexity of the whole thing.


People seem to forget, or not bother to inform themselves about, the reasons for implementing tools like LTP. Most of the time they were developed for low-bitrate speech coding, where the tonality of the source can be easily estimated and so predicted correctly.
Ivan Dimkovic
QUOTE
As far as I've understood it so far, SBR also uses prediction and substitution methods for tonal and noise components in the original sound, but of course only in the high-frequency part of the spectrum, in order to replicate it as closely as possible. I once read an explanation of this in some EBU documentation; I can't remember which one it was, probably one referring to DRM...


Hmm, it is not using "prediction" in the normal sense, but it does use delta coding in time or frequency to code the spectral envelopes after quantization, which could be called "prediction" in a sense somewhat different from the usual one :)
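
A small sketch of what delta coding in time or in frequency looks like on already-quantized envelope values. The function names are made up, and the sum of absolute deltas is only a stand-in for the real entropy-coding cost, which is an assumption of this example.

```python
def delta_freq(envelope):
    """Code each band relative to the band below it; the first band is sent as-is."""
    return [envelope[0]] + [b - a for a, b in zip(envelope, envelope[1:])]

def delta_time(envelope, previous):
    """Code each band relative to the same band in the previous envelope."""
    return [cur - prev for cur, prev in zip(envelope, previous)]

def undo_delta_freq(deltas):
    """Rebuild the envelope by accumulating the frequency deltas."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

def undo_delta_time(deltas, previous):
    """Rebuild the envelope by adding the time deltas to the previous envelope."""
    return [d + p for d, p in zip(deltas, previous)]

def pick_direction(envelope, previous):
    """Choose the direction with the smaller total delta magnitude (cost proxy)."""
    cost_f = sum(abs(d) for d in delta_freq(envelope))
    cost_t = sum(abs(d) for d in delta_time(envelope, previous))
    return "time" if cost_t < cost_f else "frequency"

previous = [10, 11, 13, 14]
current = [10, 12, 13, 13]
print(delta_freq(current), delta_time(current, previous))
print(pick_direction(current, previous))
```

Both directions are lossless and reversible; the encoder is simply free to pick whichever yields smaller values for a given envelope, which is why this resembles prediction without being prediction of the audio signal itself.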
hans-jürgen
QUOTE(yanchen @ Jan 21 2003 - 06:13 AM)
When it comes to audio encoders, is there any promising commercial opportunity other than ripping?
Even though all encoder usage happens on the PC platform, there seems to be no business for embedded systems.

No, that's not true. In fact there have been embedded hardware MPEG-2 AAC encoders/recorders for some time now, but only from professional studio/broadcasting equipment manufacturers like Philips or Mayah (see their websites for further information).

Furthermore there are already companies like Nokia that implement AAC into their mobile phones and/or audio player extensions. This field of communication will probably benefit from the complete MPEG-4 standard as much as the PC sector or maybe even more, because it covers the whole range from 2 kbps to "transparent" bitrates with specialized codecs for each purpose.
hans-jürgen
QUOTE(Ivan Dimkovic @ Jan 21 2003 - 10:30 AM)
It is also backwards compatible, i.e. a decoder without SBR would still be able to play the files (at lower quality), while a decoder without prediction cannot play Main or LTP profile AAC files.

Which brings me back to one of my favorite questions ;) Are you sure that LTP in PsyTEL or Nero is working at all? I could not hear any difference back then during my low-bitrate tests, and now someone else reports the same at the Audiocoding.com forum. He also mentions that QuickTime 6 can play these "fake" LTP files from Nero, which should not be possible if your comment above is correct. If you're sure it works, please tell me the bitrates/settings/samples I should use in order to hear LTP "in action".

QUOTE
Hmm, it is not using "prediction" in the normal sense, but it does use delta coding in time or frequency to code the spectral envelopes after quantization, which could be called "prediction" in a sense somewhat different from the usual one


OK, I'll look through my PDF files from the EBU if I can find this passage again and post it later.
hans-jürgen
QUOTE(Gabriel @ Jan 21 2003 - 10:44 AM)
Is TNS part of LC or main?

Grrrrrrr... ;)

QUOTE
It seems to me that this tool is (theoretically) useful.


Yes, it is. Do you have access to the c't test files somehow? In my opinion the FhG sample used a lot of TNS, because I think I can hear a related artifact on two occasions in this file. The interesting thing is that I had, and still have, to "switch on my TNS detector" (concentrate on time-related distortions) to hear them. When examining the sample for the c't listening test, I did not notice them at all. So TNS seems to serve people like me well... ;)
hans-jürgen
QUOTE(hans-jürgen @ Jan 21 2003 - 11:54 AM)
OK, I'll look through my PDF files from the EBU if I can find this passage again and post it later.

Here it is (taken from EBU Technical Review "DRM - key technical features" by Jonathan Stott, March 2001, p. 11f):

AAC-SBR
[...]
The SBR technique synthesizes the sounds which fall within the highest frequency octave-and-a-bit. Sounds in this range are usually either:

a.) noise-like (sibilance, percussion instruments such as shakers, brushed cymbals etc.), or
b.) periodic and related to what appears lower in the spectrum (overtones of instruments or voiced sounds).

At the sender, the highest-frequency band of the audio signal is examined to determine the spectral distribution and whether it falls into category (a.) or (b.) above. A small amount of side information is then prepared for transmission to help the decoder. The highest-frequency band is then removed before the remaining main band of the audio signal is passed to the AAC coder, which codes it in the conventional way.

At the receiver, the AAC decoder first decodes the main band of the audio signal. The SBR decoder then adds the synthetic upper band, helped by the instructions sent in the side information. Overtones are derived from the output of the AAC decoder, while noise-like sounds are synthesized using a noise generator with suitable spectral shaping.
-----------------------------------------------------------------------------------------------

So it seems I have mixed up tonal estimation and prediction with "b.) periodic sounds" whose "overtones are derived from the output of the AAC decoder". I think I'll add this explanation to the Audiocoding Wiki page for SBR. ;)

By the way, the whole Technical Review can be read and/or downloaded directly from the EBU, just like all publicly available documents there:

http://www.ebu.ch/trev_286-stott.pdf
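
The receiver side described in the EBU excerpt can be caricatured in a few lines. This is a toy under strong assumptions (plain lists instead of a QMF filterbank, one target level for the whole upper band, a hypothetical function name), meant only to show the two cases (a.) and (b.):

```python
import math
import random

def synthesize_upper_band(low_band, target_rms, noise_like, seed=0):
    """Build a synthetic upper band scaled to the level sent as side information.

    low_band:   samples/coefficients decoded by the core AAC decoder.
    target_rms: desired level of the upper band (from the side information).
    noise_like: True for category (a.), False for category (b.).
    """
    if noise_like:
        # (a.) noise generator, shaped afterwards by the gain below
        rng = random.Random(seed)
        source = [rng.uniform(-1.0, 1.0) for _ in low_band]
    else:
        # (b.) overtones derived from the output of the core decoder
        source = list(low_band)
    rms = math.sqrt(sum(x * x for x in source) / len(source))
    gain = target_rms / rms if rms > 0 else 0.0
    return [gain * x for x in source]

core = [3.0, -4.0]
print(synthesize_upper_band(core, 1.0, noise_like=False))
```

In both cases the upper band is never transmitted as audio; only its level and character travel in the side information, which is why the extra bit rate is small.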
enry2k
About market acceptance of the MPEG-4 format: I get the impression that the stalled LAmpeg4 licensing could hand Microsoft a winning role in controlling the entire consumer electronics market, not just in operating systems but also in DVD players, TV sets, VCRs and portable devices. Allowing all this would work against consumer choice: a comprehensive proprietary standard owned by a single industry giant, rather than a common standard supported by many competitors.
sesshoumaru
Hi... hope you don't mind if I raise a question about TNS in the LC profile again :P

When they say "filter order is limited in LC", what value is used? So far I thought the order depends on the sampling rate (20 for 32 kHz and below, 12 otherwise)... and (I'm not quite sure about this) that the difference between TNS in LC and Main is in the value of TNS_MAX_BANDS.

Do advise me on this ^^