Stereo-to-Mono downmixing in PS AAC
dand
Hi,
As I understand from the 3GPP specification for PS AAC (HE-AAC v2) encoding, stereo-to-mono downmixing is performed in the frequency domain:
Hybrid Analysis - downmix - Hybrid Synthesis.

Would it be possible to avoid the Hybrid Synthesis and generate the mono signal for the core AAC and SBR encoders in the time domain, simply as
Mono = (Left + Right) / 2 ?

Is there a reason for downmixing in the frequency domain?

Daniel
SebastianG
These filterbanks are linear => analysis(left+right) = analysis(left)+analysis(right).

Since you need the transformed version of L, R (for spatial cue analysis) and M (to be further processed and coded) you could either:
- downmix in the time domain (M=L+R) and transform all three channels
or
- transform L and R and downmix in the frequency domain

The latter approach will save you one transform step (2 channels instead of 3)
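
The linearity argument above can be checked numerically. This is only a minimal sketch: a naive DFT stands in for the analysis filterbank (it is not the hybrid QMF filterbank from the 3GPP spec), and the sample values and variable names are made up for illustration.

```python
import cmath

def dft(x):
    """Naive DFT, used here only as a stand-in for a linear analysis filterbank."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Arbitrary illustrative samples (assumption, not real audio).
left = [0.5, -0.25, 0.75, 0.1, -0.6, 0.3, 0.0, -0.9]
right = [0.2, 0.4, -0.5, 0.8, 0.1, -0.3, 0.6, -0.1]

# Option 1: downmix in the time domain, then run a third transform on M.
mono_time_first = dft([l + r for l, r in zip(left, right)])

# Option 2: transform only L and R, then sum the coefficients.
mono_freq = [a + b for a, b in zip(dft(left), dft(right))]

# Largest deviation between the two results (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(mono_time_first, mono_freq))
print(max_diff)
```

Both routes give the same coefficients for a plain sum, so the choice between them only matters for cost, as long as the downmix really is just L+R.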


Sebi
dand
Sure, it will save me one Analysis step, but at the cost of one Synthesis step (which I don't need if I downmix in the time domain).
SebastianG
Oh wait! I get it now: You're saying that in the encoder they do the following:
- analysis (subband filterbank) on both channels
- downmix to mono
- and synthesize the mono signal via the inverse filterbank INSTEAD OF directly computing L+R in the time domain

Right?

If so, the only reason I can think of is that they don't want M to be simply the sum of L and R but rather a "phase-difference-compensated downmix" to avoid attenuating out-of-phase signal parts (just guesswork on my part, I haven't studied their code).
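
To make the guess concrete, here is one way a phase-compensated downmix of a single complex subband coefficient could look. It is a sketch under my own assumptions, not the method from the 3GPP reference code; `phase_aligned_downmix` is a hypothetical name, and a real encoder would more likely estimate the phase difference per band over a time window rather than per coefficient.

```python
def phase_aligned_downmix(l, r):
    """Rotate r onto the phase of l before averaging (hypothetical helper).

    l, r: complex subband coefficients of the left and right channel.
    """
    cross = l * r.conjugate()
    if cross == 0:
        # One channel is silent: nothing to align, plain average.
        return (l + r) / 2
    # Unit phasor carrying the inter-channel phase difference phi_l - phi_r.
    rotation = cross / abs(cross)
    return (l + r * rotation) / 2

# Fully out-of-phase pair: the plain average cancels, the aligned one does not.
l, r = 1 + 0j, -1 + 0j
plain = (l + r) / 2
aligned = phase_aligned_downmix(l, r)
print(abs(plain), abs(aligned))
```

After the rotation both terms share the phase of `l`, so the magnitude of the result is the average of the two input magnitudes instead of collapsing towards zero.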

Sebi
Ivan Dimkovic
The reason is energy preservation: a simple linear average downmix is not suitable for PS because, as SebastianG mentioned, out-of-phase signals would cancel each other out.
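
The cancellation is easy to reproduce with a test tone. A small sketch, assuming a sine in the left channel and a phase-shifted copy in the right (buffer length, frequency and names are arbitrary choices for illustration):

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

n = 480      # buffer length in samples (assumption)
cycles = 5   # whole sine cycles in the buffer (assumption)
left = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def average_downmix_rms(phase_offset):
    """RMS of (L + R) / 2 when R is L shifted by phase_offset radians."""
    right = [math.sin(2 * math.pi * cycles * t / n + phase_offset)
             for t in range(n)]
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    return rms(mono)

in_phase = average_downmix_rms(0.0)
quadrature = average_downmix_rms(math.pi / 2)
out_of_phase = average_downmix_rms(math.pi)
print(in_phase, quadrature, out_of_phase)
```

Each input channel has the same level in all three cases, yet the level of the averaged mono signal drops as the phase offset grows and vanishes at 180 degrees.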

dand
Thanks for the info.

I just realized this myself after implementing the downmix in the time domain. What I got sounds good, but the output is heavily attenuated! Switching to plan B (here we go again, from the top...)...
Ivan Dimkovic
Basically, the problem of out-of-phase attenuation is even worse for PS, as the M channel is panned again to the stereo field in the decoder - attenuation then tends to be perceived as "Stereo Image Collapse" due to loss of panned signal energy.

A typical problem sample for this is Layla (the first 15 or so seconds): with a linear average downmix, the lead guitar is heavily attenuated and dislocated in the stereo field towards the center. With a proper downmix, the problem does not exist.
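
As a rough illustration of what a "proper" downmix might do, the sketch below rescales the averaged signal in one band so that its energy matches the mean energy of L and R. This is an assumption about the general idea, not the actual 3GPP or any vendor's algorithm; `energy_preserving_downmix` and `max_gain` are hypothetical names, and the gain cap is an arbitrary choice.

```python
import math

def energy_preserving_downmix(l_band, r_band, max_gain=2.0):
    """Average L and R in one band, then rescale to the mean input energy.

    l_band, r_band: lists of complex subband samples for the same band.
    max_gain: cap on the correction gain (hypothetical parameter).
    """
    summed = [(l + r) / 2 for l, r in zip(l_band, r_band)]
    # Target: mean of the left and right band energies.
    target = sum(abs(l) ** 2 + abs(r) ** 2 for l, r in zip(l_band, r_band)) / 2
    actual = sum(abs(m) ** 2 for m in summed)
    if actual > 1e-12:
        gain = min(max_gain, math.sqrt(target / actual))
    else:
        # Complete cancellation: no gain can restore the signal.
        gain = max_gain
    return [m * gain for m in summed]

# Quadrature example: R is L rotated by 90 degrees.
l_band = [1 + 0j] * 4
r_band = [1j] * 4
mono = energy_preserving_downmix(l_band, r_band)
mono_energy = sum(abs(m) ** 2 for m in mono) / len(mono)
print(mono_energy)
```

Note that a gain alone cannot help when L and R cancel completely, which is one reason to combine it with some form of phase alignment.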

I remember this problem from a long time ago, during the early days of PS ;)