Full Version: AAC's robustness against signal processing
rpp3po
Just look back at how far technology has evolved in the last 30 years, where we are today, and what we would never have thought possible.

What if, maybe in ten years, there were a really smart workgroup developing a whole new way to look at audio data, especially the spatial information it contains? One side effect of their work would be an algorithm X finally capable of creating a perfect "virtual speaker simulation" for matched headphones, with a soundstage only known from good speaker setups.

We have seen a lot of failed attempts in this area and none of them worked without seriously compromising fidelity (e.g. Dolby Headphone). Now assume for a second, they get the job done and it works well. Obviously signal timing would have to be modified. How robust would AAC be against this kind of processing?

I just wondered whether there are any possible future scenarios left that would make keeping a lossless collection worthwhile.
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 13:39) *
What if, maybe in ten years, there were a really smart workgroup developing a whole new way to look at audio data, especially the spatial information it contains? One side effect of their work would be an algorithm X finally capable of creating a perfect "virtual speaker simulation" for matched headphones, with a soundstage only known from good speaker setups.


What, like HRTFs? Anyway, unless you're doing multichannel, I don't think this will be a real issue, since you're never going to find a way to make >2 channels from a stereo stream.
rpp3po
QUOTE (Mike Giacomelli @ Feb 28 2009, 21:10) *
Anyway, unless you're doing multichannel, I don't think this will be a real issue since you're not going to ever find a way to make >2 channels from a stereo stream.


Where would I have suggested that? The problem with headphones is that the sound focusses in the middle of your head. That's far away from the stereo image you can attain in the sweet spot between two speakers.

Well there's binaural recording, but you all know how successful that has been.
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 15:45) *
Where would I have suggested that? The problem with headphones is that the sound focusses in the middle of your head.


I think most people use crossfeed for that problem, and lossy audio shouldn't be an issue with it.
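For readers unfamiliar with the term, the basic idea of crossfeed can be sketched as follows. This is only a minimal illustration, assuming equal-length channels held as plain lists of samples; the `gain` and `delay` values are made up, and real crossfeed implementations typically also low-pass filter the signal that is fed across.

```python
def crossfeed(left, right, gain=0.3, delay=2):
    """Mix a delayed, attenuated copy of each channel into the other.

    gain and delay (in samples) are illustrative assumptions, not values
    taken from any particular crossfeed implementation.
    """
    out_l, out_r = [], []
    for i in range(len(left)):
        # Opposite-channel sample from `delay` samples ago (silence before that).
        fed_r = right[i - delay] if i >= delay else 0.0
        fed_l = left[i - delay] if i >= delay else 0.0
        out_l.append(left[i] + gain * fed_r)
        out_r.append(right[i] + gain * fed_l)
    return out_l, out_r

# A click that exists only in the left channel...
left = [1.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.0]
out_l, out_r = crossfeed(left, right)
print(out_l)
print(out_r)  # ...now also appears, later and quieter, in the right channel
```

Since this is just a fixed mix of the two decoded channels, it changes relative levels between left and right but does not reshape the spectrum of either channel on its own.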
rpp3po
QUOTE (Mike Giacomelli @ Feb 28 2009, 22:25) *
I think most people use crossfeed for that problem, and lossy audio shouldn't be an issue with it.


I have tried crossfeeding. It's a poor compromise. It helps against fatigue, but the stereo image is still far from a good speaker setup.
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 16:43) *
QUOTE (Mike Giacomelli @ Feb 28 2009, 22:25) *
I think most people use crossfeed for that problem, and lossy audio shouldn't be an issue with it.


I have tried crossfeeding. It's a poor compromise. It helps against fatigue, but the stereo image is still far from a good speaker setup.


It sounds like you just don't like headphones.
rpp3po
QUOTE (Mike Giacomelli @ Feb 28 2009, 22:50) *
It sounds like you just don't like headphones.


Could we come back to the original question asked?

I frequently use headphones and enjoy them very much. The stereo image difference between headphones and loudspeakers is a well known fact, not needing further discussion.

As said, in theory it could be possible to eliminate that by time domain signal processing. Maybe someone with greater insight into AAC encoding can comment on possible implications.
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 17:08) *
The stereo image difference between headphones and loudspeakers is a well known fact, not needing further discussion.


I've never heard of it.

QUOTE (rpp3po @ Feb 28 2009, 17:08) *
As said, in theory it could be possible to eliminate that by time domain signal processing.


Isn't this essentially what crossfeed and HRTFs do?

QUOTE (rpp3po @ Feb 28 2009, 17:08) *
Maybe someone with greater insight into AAC encoding can comment on possible implications.


I don't think there are any implications for AAC.
rpp3po
QUOTE (Mike Giacomelli @ Feb 28 2009, 23:39) *
Isn't this essentially what crossfeed and HRTFs do?


I can only repeat myself. The theory is there (basically HRTF), but up to now there is no working implementation for high fidelity music. You either get strong degradation of sound quality (Dolby Headphone) or only a very simple approximation (crossfeed).

QUOTE (Mike Giacomelli @ Feb 28 2009, 23:39) *
I don't think there are any implications for AAC.


And how would you back that up? The needed transfer function modifies the input sample's frequency composition.

1. Content that is perceptually irrelevant in plain stereo could be missing from an AAC track. The transfer function might have pushed it above the threshold of hearing, had it not been eliminated during encoding.

2. Encoding artifacts not noticeable in plain stereo could be pushed above the threshold of hearing by the transfer function.
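The second point can be illustrated with a deliberately crude toy model. This is a sketch under stated assumptions, not a real psychoacoustic model: the fixed masking offset, the band levels and the filter gains below are all hypothetical numbers chosen only to show the mechanism.

```python
# Toy model, not a real psychoacoustic model: all levels are in dB and
# every number here is hypothetical.
MASKING_OFFSET_DB = 20.0  # assumed: noise is masked if it is this far below the masker

def noise_audible(masker_db, noise_db, hearing_threshold_db=0.0):
    """Return True if noise exceeds both the masked and the absolute threshold."""
    masked_threshold = masker_db - MASKING_OFFSET_DB
    return noise_db > max(masked_threshold, hearing_threshold_db)

# Before filtering: a 70 dB masker in band A hides 45 dB coding noise in band B.
masker_db, noise_db = 70.0, 45.0
before = noise_audible(masker_db, noise_db)

# A hypothetical transfer function cuts band A by 12 dB and boosts band B by 6 dB.
gain_a_db, gain_b_db = -12.0, 6.0
after = noise_audible(masker_db + gain_a_db, noise_db + gain_b_db)

print(before, after)
```

In this toy setup the noise is masked before filtering and unmasked afterwards. Whether a realistic HRTF ever shifts levels between neighbouring bands by this much is a separate question that the sketch does not answer.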
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 18:26) *
QUOTE (Mike Giacomelli @ Feb 28 2009, 23:39) *
Isn't this essentially what crossfeed and HRTFs do?


I can only repeat myself. The theory is there (basically HRTF), but up to now there is no working implementation for high fidelity music. You either get strong degradation of sound quality (Dolby Headphone) or only a very simple approximation (crossfeed).


Much more sophisticated systems than that exist, but they require you to measure the HRTF. I think this limitation is fundamental, so I don't think you're going to do much better given these constraints, particularly in the stereo case. Without knowledge of the specific HRTF, I don't think you can do much better than crossfeed.

Also, from what I understand, Dolby Headphone only provides much improvement if you have 5.1. Are you talking about 5.1 or stereo? Perhaps you wouldn't have to repeat yourself if you were more specific about what you're talking about.


rpp3po
OK, maybe I wasn't precise enough from the beginning.

Maybe in 20 years you will just need to photograph your head from a few directions, and one algorithm will deduce a 3D model of your head and ears while a second algorithm deduces a specific HRTF from it. Of course there are other variables than just exterior form, but it would already reach much further than where we are today.

That's all pure speculation and I didn't really want to discuss actual implementations. I only want to know if today's lossy encoded material could be an inferior source for a time in the future when high quality specific HRTFs would be commonly available.

I didn't mean completely 3D or multichannel audio, just better imaging on headphones for material produced for two-speaker setups.
shadowking
It's a matter of quality headroom. Using high bitrates (250..400k) *should* cover this issue.
Mike Giacomelli
QUOTE (rpp3po @ Feb 28 2009, 19:33) *
That's all pure speculation and I didn't really want to discuss actual implementations. I only want to know if today's lossy encoded material could be an inferior source for a time in the future when high quality specific HRTFs would be commonly available.


In principle, it's never a good idea to postprocess lossy audio. In practice people do it all the time (EQ, crossfeed, etc.) and it doesn't seem to matter.

And anyway, if HRTFs ever catch on, I doubt lossy will be a huge issue. They're essentially simulating the diffraction around the head, a process that happens apparently without issue every time you listen to lossy music on speakers. Unless you're also doing some sort of additional "enhancement" or taking serious short cuts, I'm skeptical the process will work much differently between digital and analog.
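One way to make this argument concrete, assuming the HRTF can be modelled as a fixed linear filter (an FIR convolution here): filtering the decoded signal is the same as filtering the original and the coding error separately and adding the results. The impulse response and sample values below are made up for illustration.

```python
def convolve(x, h):
    """Direct-form FIR convolution of sequence x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical values, for illustration only.
original = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
coding_error = [0.01, -0.02, 0.0, 0.015, -0.01, 0.0, 0.02, -0.005]
decoded = [s + e for s, e in zip(original, coding_error)]
hrir = [0.6, 0.3, 0.1]  # made-up head-related impulse response

# Filter the decoded signal directly...
filtered_decoded = convolve(decoded, hrir)
# ...and compare with filtering signal and error separately, then summing.
filtered_sum = [a + b for a, b in zip(convolve(original, hrir),
                                      convolve(coding_error, hrir))]

max_diff = max(abs(a - b) for a, b in zip(filtered_decoded, filtered_sum))
print(max_diff)  # zero up to floating-point rounding
```

Linearity alone only says that the error is filtered the same way as the music. It does not by itself prove that the filtered error stays masked, which is the part that would still need listening tests.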

QUOTE (rpp3po @ Feb 28 2009, 19:33) *
I didn't mean completely 3D or multichannel audio, just better imaging on headphones for material produced for two-speaker setups.


Better how? Effects like the ones you mentioned above generally try to move the perceived location of sound around a person's head, which is very much a 3D effect.
rpp3po
QUOTE (Mike Giacomelli @ Mar 1 2009, 01:57) *
And anyway, if HRTFs ever catch on, I doubt lossy will be a huge issue. They're essentially simulating the diffraction around the head, a process that happens apparently without issue every time you listen to lossy music on speakers. Unless you're also doing some sort of additional "enhancement" or taking serious short cuts, I'm skeptical the process will work much differently between digital and analog.


That does indeed make a lot of sense. I haven't thought about it that way.

QUOTE (Mike Giacomelli @ Mar 1 2009, 01:57) *
Better how? Effects like the ones you mentioned above generally try to move the perceived location of sound around a person's head, which is very much a 3D effect.


I meant 3D sources/recording. That the whole thing is a spatial effect is without question.
Dracaena
QUOTE (rpp3po @ Mar 1 2009, 12:14) *
QUOTE (Mike Giacomelli @ Mar 1 2009, 01:57) *
And anyway, if HRTFs ever catch on, I doubt lossy will be a huge issue. They're essentially simulating the diffraction around the head, a process that happens apparently without issue every time you listen to lossy music on speakers. Unless you're also doing some sort of additional "enhancement" or taking serious short cuts, I'm skeptical the process will work much differently between digital and analog.


That does indeed make a lot of sense. I haven't thought about it that way.

QUOTE (Mike Giacomelli @ Mar 1 2009, 01:57) *
Better how? Effects like the ones you mentioned above generally try to move the perceived location of sound around a person's head, which is very much a 3D effect.


I meant 3D sources/recording. That the whole thing is a spatial effect is without question.

I don't claim to be an expert on these things, and honestly I can't say I fully understand how an HRTF alters a signal. However, this thread caused me to ponder an experiment you might want to try:

1. Figure out a way to record the output of your soundcard/software or whatever it is that is applying the processing, e.g. Dolby Headphone. On the PC I'm using at the moment (with an Audigy 2) this is as simple as selecting "what you hear" as the recording source in Creative's mixer.
2. Play back a lossless file, recording the processed output.
3. Encode your newly recorded file to AAC.
4. ABX this file vs. a normally encoded AAC. Remember that when playing back the specially recorded file, you'll need to have disabled or bypassed any signal processing, or the Dolby Headphone/EQ/whatever will be applied to the special file twice. One way to do this would be to play it back through a program that interfaces directly with the hardware through ASIO or some such, i.e. not DirectSound. Play back the regular AAC as normal, with all the same processing that was enabled when you recorded the special file.
5. Tell us what you find!
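
When reporting results from step 4, the usual way to judge an ABX run is the chance of scoring at least that well by pure guessing. A small sketch of that calculation, assuming independent trials with a 50% guessing rate (the function name is just an illustration, not taken from any ABX tool):

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct answers out of 16 trials.
p = abx_p_value(12, 16)
print(round(p, 4))
```

A small value suggests the two files really were distinguished; a value near 0.5 or above is what guessing would look like.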


Of course there will be other things that create disparity between any headphone soundfield and that of a stereo speaker setup, the most obvious being room acoustics. Your standard room is a rather complex thing, possibly with hundreds of objects with complex shapes, a wide range of surfaces with different acoustic properties, etc.
Dracaena
Well, I just tried, but I can't figure out a way of actually doing it, short of physically plugging a recording device into the headphone jack.
Maybe you'll have more luck.