Typically, downmixing from stereophonic (2-channel) to monophonic (1-channel) sound is achieved by adding the sample values from left and right channels and dividing by 2.
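For reference, here is the conventional downmix as a minimal Python sketch. It assumes the samples are plain numbers in two equal-length lists; `average_downmix` is a hypothetical name, not taken from any particular encoder.

```python
def average_downmix(left, right):
    """Conventional stereo-to-mono downmix: mono[i] = (left[i] + right[i]) / 2."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    # Dividing by 2 (rather than just summing) keeps the result within the
    # original sample range, so in-range inputs cannot clip.
    return [(l + r) / 2 for l, r in zip(left, right)]


# In-phase content survives; content that is exactly out of phase vanishes.
print(average_downmix([0.5, -0.25], [0.5, -0.25]))   # [0.5, -0.25]
print(average_downmix([0.5, -0.25], [-0.5, 0.25]))   # [0.0, 0.0]
```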
Usually this works pretty well. Part of the reason is that most music intended for hi-fi playback has been professionally recorded using microphone techniques that keep the left and right channels largely in phase. If the channels weren't in phase, the music wouldn't play well on old mono radios (or on AM radio or TV). Likewise, studio recordings assembled from individually miked instruments and vocalists, then placed in a stereo soundstage, are usually mastered to ensure mono compatibility.
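One rough way to quantify "largely in phase" is the normalized correlation between the channels, which is broadly what a stereo correlation meter displays. The Python sketch below is a simplified whole-signal version (the function name is hypothetical, and real meters typically compute this over short, moving windows). Values near +1 suggest the plain average is safe, while values near -1 warn of heavy cancellation.

```python
import math


def channel_correlation(left, right):
    """Normalized correlation between two channels, in the range [-1, 1].

    +1: channels have the same shape (plain averaging loses nothing),
     0: channels are uncorrelated,
    -1: one channel is the inverse of the other (averaging cancels completely).
    """
    sum_lr = sum(l * r for l, r in zip(left, right))
    sum_ll = sum(l * l for l in left)
    sum_rr = sum(r * r for r in right)
    denom = math.sqrt(sum_ll * sum_rr)
    # Silence in either channel has no meaningful correlation; report 0.
    return sum_lr / denom if denom > 0 else 0.0


n = 480
sine = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
cosine = [math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
print(round(channel_correlation(sine, sine), 6))                 # 1.0
print(round(channel_correlation(sine, [-s for s in sine]), 6))   # -1.0
print(channel_correlation(sine, cosine))                         # close to 0 (quarter-cycle offset)
```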
However, if there are phase differences between the channels, particularly frequency-dependent ones (a simple time delay, for example, produces a phase difference proportional to frequency), averaging them can create a comb-filter effect, or more generally an arbitrary filtering effect in which parts of the sound go missing: some frequencies add constructively, others cancel out completely, and many fall somewhere in between. I believe this is particularly true of artificial stereo effects.
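To put numbers on the time-delay case: averaging a signal with a copy of itself delayed by τ seconds has the transfer function (1 + e^(-j2πfτ))/2, whose magnitude is |cos(πfτ)|. Frequencies (2k+1)/(2τ) therefore cancel completely, while multiples of 1/τ pass at full level. The Python sketch below checks this both from the formula and by direct simulation. The 1 ms delay and 48 kHz sample rate are illustrative assumptions, and the function names are hypothetical.

```python
import math


def averaged_delay_gain(freq_hz, delay_s):
    """Gain of (x(t) + x(t - delay)) / 2 for a sinusoid at freq_hz.

    Transfer function (1 + e^(-j*2*pi*f*delay)) / 2 has magnitude
    |cos(pi * f * delay)|: 1 at multiples of 1/delay, 0 halfway between.
    """
    return abs(math.cos(math.pi * freq_hz * delay_s))


def notch_frequencies(delay_s, max_hz):
    """Frequencies up to max_hz that cancel completely: f = (2k + 1) / (2 * delay)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches


def simulated_peak(freq_hz, delay_samples, sample_rate, n=4800):
    """Peak level of the averaged downmix when the right channel lags the left."""
    x = [math.sin(2 * math.pi * freq_hz * t / sample_rate)
         for t in range(n + delay_samples)]
    left = x[delay_samples:]   # x(t)
    right = x[:n]              # x(t - delay)
    return max(abs((l + r) / 2) for l, r in zip(left, right))


# A 1 ms inter-channel delay (48 samples at 48 kHz):
print(notch_frequencies(0.001, 3000))    # nulls at (approximately) 500, 1500, 2500 Hz
print(simulated_peak(500, 48, 48000))    # close to 0: cancelled
print(simulated_peak(1000, 48, 48000))   # close to 1: reinforced
```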
Mostly, simple downmixing works. Sometimes, when it doesn't, simply choosing the left or the right channel is good enough. I seem to recall that, before the days of NICAM stereo broadcasts, TV viewers watching the video for "Kiss" by Tom Jones/Art of Noise were given the left channel only. This removed much of the sound in certain sections of the song, leaving practically nothing but electronic drum sounds. This is one occasion when it didn't work (although it added an interesting dynamic to the music).
Some samples would never be quite right with either method.
What I've never found is a robust "secure mode" for downmixing to mono that sounds the way the human ear would expect. All encoders, even multichannel-capable ones like Ogg Vorbis, seem to simply take the average of the sample values.
I've tried Googling, but found nothing. Has anyone heard of such a tool?
I'd assume there might be a few approaches.
A mathematical way would be to compute the frequency spectrum (for example, the magnitude or power spectrum) of each channel over short frames or blocks, then add or average the amplitudes of corresponding components, giving each channel's contribution the same phase so that the two cannot cancel. For projects like LAME, this sounds like the sort of thing that could be done during MP3 encoding, just after the MDCT, rather than downmixing first. Adjusting the phase relationship may, however, induce clipping on decode.
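One possible reading of that idea is sketched below in Python. It is an illustration under stated assumptions, not LAME's or any encoder's method. It uses a plain DFT on Hann-windowed frames with 50% overlap rather than the MDCT. For each frequency bin, it takes the average of the two magnitudes and the phase of the ordinary sum L + R, falling back to the left channel's phase where that sum is zero. The function names and the 64-sample frame are hypothetical choices. The O(N²) DFT is only suitable for experimenting on short clips; real use would want longer frames and an FFT (e.g. `numpy.fft.rfft`).

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def phase_aligned_downmix(left, right, frame=64, eps=1e-12):
    """Hypothetical downmix: per bin, average |L| and |R| and give the result
    the phase of L + R, so opposite-phase content cannot cancel."""
    hop = frame // 2
    # Periodic Hann window: copies overlapped at hop = frame/2 sum to exactly 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame) for t in range(frame)]
    n = len(left)
    # Pad both ends so every input sample is covered by two overlapping frames.
    tail = [0.0] * (hop + (-n) % hop)
    lpad = [0.0] * hop + list(left) + tail
    rpad = [0.0] * hop + list(right) + tail
    out = [0.0] * len(lpad)
    for start in range(0, len(lpad) - frame + 1, hop):
        lspec = dft([lpad[start + t] * window[t] for t in range(frame)])
        rspec = dft([rpad[start + t] * window[t] for t in range(frame)])
        mixed = []
        for lk, rk in zip(lspec, rspec):
            magnitude = (abs(lk) + abs(rk)) / 2
            total = lk + rk
            # Where L + R cancels to zero, borrow the left channel's phase.
            phase = cmath.phase(total) if abs(total) > eps else cmath.phase(lk)
            mixed.append(cmath.rect(magnitude, phase))
        # Overlap-add; the window was applied on analysis only.
        for t, value in enumerate(idft(mixed)):
            out[start + t] += value
    return out[hop:hop + n]


n = 200
left = [math.sin(2 * math.pi * 3 * t / 64) for t in range(n)]
right = [-s for s in left]                        # fully out of phase
naive = [(l + r) / 2 for l, r in zip(left, right)]
aligned = phase_aligned_downmix(left, right)
print(max(abs(v) for v in naive))                 # 0.0: the plain average cancels
print(max(abs(a - l) for a, l in zip(aligned, left)))  # tiny: the signal survives
```

Using the phase of L + R keeps the result identical to the plain average when the channels already match, and only departs from it where they disagree. This per-bin magnitude averaging may well introduce audible smearing between frames. As noted above, it can also raise peaks above those of the plain average, so some headroom or limiting may be needed.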
Another approach might involve some form of psychoacoustics that I haven't thought of. Any ideas?
Regards,
Dick Darlington