Converting stereo PCM to mono
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
Jebus
Hi all,


If I'm trying to downmix a stereo wave file (PCM or float) to a mono one, how exactly is that done? I know very little beyond the basics of how digital audio works, but I do understand how PCM is stored in a wave file, how to parse channels, how to convert between integer and float, etc.

I assume I just add the two channels together and then divide by 2 (which is like attenuating by 3.5dB, right)?


Thanks in advance.
john33
Yep, mono channel = (channel a + channel b) * 0.5
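A minimal C# sketch of that formula, assuming the samples have already been parsed into two equal-length float arrays (the class and method names are illustrative, not from any library):

```csharp
using System;

static class FloatDownmix
{
    // Averages two equal-length float channels: mono = (a + b) * 0.5.
    public static float[] ToMono(float[] left, float[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Channels must have the same length.");

        var mono = new float[left.Length];
        for (int i = 0; i < left.Length; i++)
            mono[i] = (left[i] + right[i]) * 0.5f;
        return mono;
    }
}
```

For example, ToMono(new[] { 1f, -0.5f }, new[] { 0f, -0.5f }) yields { 0.5f, -0.5f }.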
greynol
QUOTE(Jebus @ Apr 15 2008, 18:09)
(which is like attenuating by 3.5dB, right)

6dB
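The 6 dB figure comes from the amplitude-ratio formula 20 * log10(gain): a gain of 0.5 gives 20 * log10(0.5) ≈ -6.02 dB. About -3 dB corresponds to halving *power* (10 * log10(0.5) ≈ -3.01 dB), not amplitude. A small C# check, with hypothetical helper names:

```csharp
using System;

static class Decibels
{
    // Amplitude (sample value) ratio to decibels: 20 * log10(gain).
    public static double AmplitudeToDb(double gain) => 20.0 * Math.Log10(gain);

    // Power ratio to decibels: 10 * log10(ratio).
    public static double PowerToDb(double ratio) => 10.0 * Math.Log10(ratio);
}
```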
DualIP
QUOTE(Jebus @ Apr 16 2008, 03:09)

I assume I just add the two channels together and then divide by 2

When using 16-bit integers, first divide each sample by 2, then add them together! This rules out overflow, which causes very nasty distortion.
When played on a stereo set using both speakers, the resulting mono file will be as loud as the original!
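A sketch of the divide-first approach in C#. Note that C# already performs arithmetic on `short` operands in `int`, so the wraparound described above mainly affects code that adds in true 16-bit registers or stores the raw sum back into a `short`. `>>` on a negative signed value is an arithmetic shift in C#, so it rounds toward negative infinity. Names are illustrative:

```csharp
using System;

static class HalveThenAdd
{
    // Divide each 16-bit sample by two first (arithmetic shift), then add.
    // Each halved value lies in [-16384, 16383], so the sum lies in
    // [-32768, 32766] and always fits back into a short.
    public static short Mix(short left, short right)
    {
        int halfLeft = left >> 1;
        int halfRight = right >> 1;
        return (short)(halfLeft + halfRight);
    }
}
```

Mix(1, 1) returns 0 even though the exact average is 1: each input loses its low bit before the addition.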

Jebus
Thanks guys!

(I've always been terrible at math... I thought 3.5dB was 50%.)

I am very careful about clipping, but I'm doing all the processing with normalized 32-bit float values anyhow, so it isn't an issue.
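With normalized floats the mix itself cannot overflow; clipping only comes back when the result is written out as integers. A sketch of that last step, assuming samples normalized to [-1.0, 1.0] and a full-scale factor of 32767 (an assumed convention; some code scales by 32768 instead):

```csharp
using System;

static class FloatToPcm16
{
    // Assumed convention: full scale 1.0 maps to 32767.
    private const float FullScale = 32767.0f;

    // Clamps a normalized float sample to [-1, 1], scales it, and rounds
    // to the nearest 16-bit integer, so out-of-range values clip instead
    // of wrapping around.
    public static short ToPcm16(float sample)
    {
        float clamped = Math.Max(-1.0f, Math.Min(1.0f, sample));
        return (short)Math.Round(clamped * FullScale);
    }
}
```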

bhoar
QUOTE(DualIP @ Apr 16 2008, 00:48)

QUOTE(Jebus @ Apr 16 2008, 03:09)

I assume I just add the two channels together and then divide by 2

When using 16-bit integers, first divide each sample by 2, then add them together! This rules out overflow, which causes very nasty distortion.


I'm no expert here (not even an amateur), but if you do that, you guarantee the LSB is always zero. That seems like a potential problem area to me.

If forced into 16-bit math when adding two channels, what you probably want to do is add the two together with a 16-bit addition routine that can also alert you of overflow via a carry flag (vs. an error or no notification). Then divide by two (shift right) and apply the carry flag to the MSB if set.

But I'm guessing (again, not an expert here) that even that probably raises some potential noise signature issues in the LSB area - there's probably some sort of dither and/or noise-shaping that should be considered after this step. Or maybe not.

EDIT: of course, above I'm talking of non-negative numbers only. IIRC, audio data is often stored in signed integer format, so the math above needs another level of complexity to ensure the signs are preserved accurately.
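The carry-flag idea amounts to keeping a 17th bit for the sum. Assuming a wider integer type is available (a 32-bit `int`, as suggested later in the thread), that bit comes for free: add in 32 bits, then shift right. Because C#'s `>>` on a signed `int` is an arithmetic shift, the sign of two's-complement samples is preserved without extra handling. A minimal sketch with illustrative names:

```csharp
using System;

static class SumThenHalve
{
    // The int sum has room for the 17th bit that a 16-bit adder would
    // push into its carry/overflow flag. The arithmetic right shift then
    // divides by two, rounding toward negative infinity, and keeps the sign.
    public static short Mix(short left, short right)
    {
        int sum = left + right;    // range [-65536, 65534]
        return (short)(sum >> 1);  // range [-32768, 32767]
    }
}
```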

-brendan
pdq
QUOTE(bhoar @ Apr 17 2008, 12:29)

I'm no expert here (not even an amateur), but if you do that, you guarantee the LSB is always zero. That seems like a potential problem area to me.

If I understand what you are saying, both values get truncated in the LSB (it is effectively forced to zero before halving). The LSB of the result, however, is definitely not always zero, since the sum of the two halved values can be odd.

Mathematically, shifting both values right before adding results in an error of between 0 and 1 LSB, whereas adding the two values and shifting the 17-bit sum to the right results in an error of only 0 or 0.5 LSB.
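These bounds can be checked numerically. The sketch below compares both methods against the exact average (a + b) / 2 over a small, arbitrarily chosen range of sample values and reports the worst-case error in LSBs. Since with flooring shifts the error depends only on the parity of the inputs, a small range is representative. Names are illustrative:

```csharp
using System;

static class RoundingError
{
    // Worst-case error, in LSBs, of each method relative to the exact
    // average (a + b) / 2, over all pairs with a and b in [min, max].
    public static (double shiftFirst, double sumFirst) MaxErrors(int min, int max)
    {
        double worstShiftFirst = 0, worstSumFirst = 0;
        for (int a = min; a <= max; a++)
        {
            for (int b = min; b <= max; b++)
            {
                double exact = (a + b) / 2.0;
                int shiftFirst = (a >> 1) + (b >> 1); // halve, then add
                int sumFirst = (a + b) >> 1;          // add, then halve
                worstShiftFirst = Math.Max(worstShiftFirst, Math.Abs(exact - shiftFirst));
                worstSumFirst = Math.Max(worstSumFirst, Math.Abs(exact - sumFirst));
            }
        }
        return (worstShiftFirst, worstSumFirst);
    }
}
```

MaxErrors(-256, 255) returns (1.0, 0.5), matching the 1 LSB and 0.5 LSB bounds above.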
Jebus
QUOTE(bhoar @ Apr 17 2008, 10:29)

If forced into 16-bit math when adding two channels, what you probably want to do is add the two together with a 16-bit addition routine that can also alert you of overflow via a carry flag (vs. an error or no notification). Then divide by two (shift right) and apply the carry flag to the MSB if set.

Programmatically speaking, it isn't really an issue as long as you use 32-bit intermediates.

in C#:
// int intermediate: the sum cannot overflow, and half of it always fits in a short
short monoChannel = (short)Math.Round(((int)leftChannel + (int)rightChannel) / 2F);
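Applied to a whole buffer, that one-liner might look like the sketch below, assuming the 16-bit samples are interleaved left/right as in a stereo WAVE data chunk (the class and method names are hypothetical). `Math.Round` defaults to round-half-to-even in .NET, and because the average always lies within [-32768, 32767], the cast back to `short` cannot overflow:

```csharp
using System;

static class InterleavedDownmix
{
    // Converts interleaved 16-bit stereo (L, R, L, R, ...) to mono by
    // averaging each pair in 32-bit arithmetic and rounding to nearest
    // (ties to even, the Math.Round default).
    public static short[] ToMono(short[] interleaved)
    {
        if (interleaved.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of samples.");

        var mono = new short[interleaved.Length / 2];
        for (int i = 0; i < mono.Length; i++)
        {
            int left = interleaved[2 * i];
            int right = interleaved[2 * i + 1];
            mono[i] = (short)Math.Round((left + right) / 2.0);
        }
        return mono;
    }
}
```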

Also, in WAVE files, all sample values are signed unless the bit depth is 8 bits or less, in which case they are unsigned. The above should work either way.
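For 8-bit WAVE data the samples are unsigned bytes with silence at 128. Averaging still works directly on the stored values because both channels carry the same offset: ((l + 128) + (r + 128)) / 2 = (l + r) / 2 + 128. A sketch using `Math.Round` as in the line above (names are illustrative):

```csharp
using System;

static class Unsigned8Downmix
{
    // 8-bit WAVE samples are unsigned and centered on 128. Because both
    // channels share the +128 offset, averaging the stored bytes yields a
    // correctly offset mono byte; the result always stays within [0, 255].
    public static byte Mix(byte left, byte right)
    {
        return (byte)Math.Round((left + right) / 2.0);
    }
}
```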