Hi,
As I understand from the 3GPP specification for PS AAC (HE-AAC v2) encoding, the stereo-to-mono downmix is performed in the frequency domain:
Hybrid Analysis - downmix - Hybrid Synthesis.
Would it be possible to skip the Hybrid Synthesis and generate the mono signal for the core and SBR AAC encoders in the time domain, simply as
Mono = (Left + Right) / 2 ?
Is there a reason for downmixing in the frequency domain?
Daniel
SebastianG
Apr 25 2006, 06:53
These filterbanks are linear => analysis(left+right) = analysis(left)+analysis(right).
Since you need the transformed version of L, R (for spatial cue analysis) and M (to be further processed and coded) you could either:
- downmix in the time domain (M=L+R) and transform all three channels
or
- transform L and R and downmix in the frequency domain
The latter approach will save you one transform step (2 channels instead of 3)
Sebi
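To illustrate the linearity argument above, here is a minimal sketch. It uses a naive DFT purely as a stand-in for a linear analysis filterbank (the real PS hybrid filterbank is different; the `dft` helper and the test signals are assumptions made up for this example). It only shows that "mix, then transform" and "transform, then mix" give the same result for any linear transform.

```python
import cmath
import math

def dft(x):
    """Naive DFT, used here only as a stand-in for a linear analysis filterbank."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
# Two arbitrary test channels (hypothetical signals, not from any spec).
left = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
right = [0.5 * math.cos(2 * math.pi * 3 * t / n) for t in range(n)]

# Option 1: downmix in the time domain (M = L + R), then transform M.
mix_then_transform = dft([l + r for l, r in zip(left, right)])

# Option 2: transform L and R, then add the coefficients bin by bin.
transform_then_mix = [a + b for a, b in zip(dft(left), dft(right))]

# Largest difference between the two results; only rounding error remains.
max_diff = max(abs(a - b) for a, b in zip(mix_then_transform, transform_then_mix))
print(max_diff < 1e-9)
```

This equivalence only holds for a plain sum. As soon as the downmix applies per-band gains or phase corrections, it is no longer a single linear operation on the time signal.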
QUOTE(SebastianG @ Apr 25 2006, 02:53 PM)

These filterbanks are linear => analysis(left+right) = analysis(left)+analysis(right).
Since you need the transformed version of L, R (for spatial cue analysis) and M (to be further processed and coded) you could either:
- downmix in the time domain (M=L+R) and transform all three channels
or
- transform L and R and downmix in the frequency domain
The latter approach will save you one transform step (2 channels instead of 3)
Sebi
Sure, it will save me one Analysis step, but at the cost of one Synthesis (which I don't have if I downmix in the time domain).
SebastianG
Apr 25 2006, 10:44
Oh wait! I get it now: You're saying that in the encoder they do the following:
- analysis (subband filterbank) on both channels
- downmix to mono
- and synthesize the mono signal via the inverse filterbank INSTEAD OF directly computing L+R in the time domain
Right?
If so, the only reason I can think of is that they don't want M to be simply the sum of L and R, but rather a "phase-difference-compensated downmix" that avoids attenuating out-of-phase signal parts (just my guesswork, I haven't studied their code).
Sebi
Ivan Dimkovic
Apr 25 2006, 11:00
The reason is energy preservation: a simple linear average downmix is not suitable for PS because, as SebastianG mentioned, out-of-phase signals would cancel each other out.
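A rough sketch of the cancellation problem, and of why a per-band correction needs the frequency domain. This is NOT the 3GPP downmix; the per-bin energy compensation, the `MAX_GAIN` cap, the naive `dft` helper and the test tones are all assumptions chosen for illustration. One tone is identical in both channels, the other is 150 degrees out of phase in the right channel.

```python
import cmath
import math

def dft(x):
    """Naive DFT, a stand-in for the analysis filterbank (assumption)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
IN_PHASE_BIN, OUT_OF_PHASE_BIN = 4, 10
PHASE = math.radians(150)   # phase offset of the second tone in the right channel
MAX_GAIN = 8.0              # hypothetical cap so near-silent bins are not blown up

def tone(k, phase=0.0):
    return [math.sin(2 * math.pi * k * t / N + phase) for t in range(N)]

left = [a + b for a, b in zip(tone(IN_PHASE_BIN), tone(OUT_OF_PHASE_BIN))]
right = [a + b for a, b in zip(tone(IN_PHASE_BIN), tone(OUT_OF_PHASE_BIN, PHASE))]
L, R = dft(left), dft(right)

def downmix_bin(l, r):
    """Return (plain average, energy-compensated average) for one bin."""
    m = (l + r) / 2
    target = (abs(l) ** 2 + abs(r) ** 2) / 2   # mean energy of the two channels
    m_energy = abs(m) ** 2
    if m_energy < 1e-12:                       # nothing to scale
        return m, m
    gain = min(math.sqrt(target / m_energy), MAX_GAIN)
    return m, gain * m

def energy_ratio(m, l, r):
    """Downmix energy relative to the mean channel energy in this bin."""
    return abs(m) ** 2 / ((abs(l) ** 2 + abs(r) ** 2) / 2)

results = {}
for k in (IN_PHASE_BIN, OUT_OF_PHASE_BIN):
    plain, compensated = downmix_bin(L[k], R[k])
    results[k] = (energy_ratio(plain, L[k], R[k]),
                  energy_ratio(compensated, L[k], R[k]))
    print(k, results[k])
```

In this toy case the plain average keeps the in-phase tone intact but leaves the out-of-phase tone with only cos²(75°) of its energy, while the per-bin gain brings it back to the mean channel energy without touching the in-phase tone. A single time-domain gain could not treat the two tones differently.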
Thanks for the info.
I just realized this after implementing the downmix in the time domain. What I got is a good-sounding but very much attenuated output! Switching to plan B (back to square one...).
Ivan Dimkovic
Apr 25 2006, 15:48
Basically, the problem of out-of-phase attenuation is even worse for PS, as the M channel is panned back into the stereo field in the decoder. The attenuation then tends to be perceived as "Stereo Image Collapse" due to the loss of panned signal energy.
A typical problem sample for this is Layla (the first 15 or so seconds): with a linear average downmix, the lead guitar is heavily attenuated and dislocated in the stereo field towards the center. With a proper downmix, the problem does not exist.
I remember this problem from a long time ago, during the early days of PS.