Sorry for blowing your cover - those time cops will be after you!
BTW, can I ask why you do an FFT in your plugin? Is it simply to measure the approximate effective scaling of the impulse response and automatically adjust the gain?
I agree that cross-mixing is bloat for most purposes (e.g. simple equalization), though it provides better representation of reflections for room simulation (particularly binaural room simulation for headphones).
The other application of cross-mixing convolution I forgot to mention is speaker simulation on headphones. If you like the sound of your home HiFi setup in its sweet spot, you could record its impulse response on each channel in stereo using, for example, binaural headphones (like those on Jim Bamford's Binaural Field Recordings site). By crossmix-convolving that with your music, you could, for example, use FB2K and foo_clienc to output MP3 or MP2 files to play on a portable with earphones, and they would sound just like your favourite system (apart from any non-linear effects). Without crossmixing impulse responses, you don't get the crossfeed effect.
I know most people don't have binaural microphones or dummy heads to do this measurement (or the deconvolution to create a binaural effect over loudspeakers) but a few people with the right equipment could create impulse responses for a variety of pleasant-sounding loudspeaker set-ups to act as loudspeaker simulation for headphones that could be even more natural (and doubtless more coloured!) than many crossfeed plugins.
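To make the crossmixing idea concrete, here's a rough Python/NumPy sketch. It assumes you have four measured impulse responses of equal length, which I've given the made-up names h_ll, h_lr, h_rl and h_rr (left speaker to left ear, left speaker to right ear, and so on). It isn't how any particular plugin does it, just the basic arithmetic:

```python
import numpy as np

def crossmix_convolve(left, right, h_ll, h_lr, h_rl, h_rr):
    """Convolve a stereo signal with four impulse responses.

    h_xy is the (hypothetical) response from speaker x to ear y.
    All four responses are assumed to have the same length, and
    left/right are assumed to have the same length as each other.
    """
    # Each ear hears both speakers: the direct path plus the crossfeed path
    out_l = np.convolve(left, h_ll) + np.convolve(right, h_rl)
    out_r = np.convolve(left, h_lr) + np.convolve(right, h_rr)
    return out_l, out_r

# Toy data: a single click in the left channel only
left = np.array([1.0, 0.0, 0.0, 0.0])
right = np.zeros(4)
h_ll = np.array([1.0, 0.5])   # direct paths
h_rr = np.array([1.0, 0.5])
h_lr = np.array([0.0, 0.2])   # crossfeed paths: delayed and quieter
h_rl = np.array([0.0, 0.2])

out_l, out_r = crossmix_convolve(left, right, h_ll, h_lr, h_rl, h_rr)
```

With only the two direct responses (h_lr and h_rl set to zero) the click would never reach the right ear, which is the missing crossfeed I mentioned.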
A few links relating to this are included on:
http://www.geocities.com/kangimp/
One worth looking for is Angelo Farina and his Aurora and Ramsete software. You may need to use Google or www.archive.org to download the cached versions of pages, because I'm having trouble reading them direct. This package includes some methods regarding the calculation of inverse impulse responses (i.e. for deconvolution of linear effects) and he also has numerous scientific papers on these matters. There are various methods suggested which have different qualities.
Getting a bit technical, division works perfectly in the Fourier domain to deconvolve in the time domain (albeit that highly attenuated frequencies when amplified back to the original level tend to be very noisy - it's division by a small number represented in a fixed-point scale). The same happens in image deblurring with 2D Fourier Transforms, where high spatial frequencies tend to become noisy in the sharpened image.
However, simply dividing by the convolution function in the time domain doesn't do the same thing. I think that's equivalent to convolution in the frequency domain (or perhaps convolution by the complex-conjugate - my memory is a bit fuzzy). I can't imagine it sounding pretty, especially with all those divide-by-zeroes!
In the notation below, * represents convolution and x is multiplication:
m(t) = g(t) * h(t)
M(f) = G(f) x H(f)
If you want to recover the original signal, g(t) (which can be represented by its full Fourier Transform, G(f)), i.e. the sound before it was convolved by the impulse response of your system and listening room, you can take the Fourier Transform of the output, M(f), and divide it by the F.T. of the impulse response, H(f), to obtain G(f). This is the same as multiplying it by 1/H(f), so we have:
G(f) = M(f) x (1/H(f))
so we can also say
g(t) = m(t) * inverseFFT(1/H(f))
So if you know h(t), you can derive the deconvolution function by taking its Fourier Transform, H(f), inverting it (watch out for divide-by-zero!) and taking the inverse Fourier Transform of that. You might find that this contains some complex components (I haven't thought it through to work that out), and near-infinite components (divide by zero for frequencies that were fully attenuated in the original).
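Here's a small Python/NumPy sketch of that recipe, using a made-up three-tap impulse response and random test data. The function name inverse_filter and the 1e-12 threshold are my own choices for illustration. Everything is done as circular convolution with an FFT long enough that nothing wraps around:

```python
import numpy as np

def inverse_filter(h, n_fft):
    """Time-domain deconvolution filter, inverseFFT(1/H(f))."""
    H = np.fft.fft(h, n_fft)
    # Arbitrary guard: refuse to divide by (near-)zero bins
    if np.any(np.abs(H) < 1e-12):
        raise ZeroDivisionError("H(f) is (near) zero in at least one bin")
    return np.fft.ifft(1.0 / H)

rng = np.random.default_rng(0)
g = rng.standard_normal(64)        # stand-in for the original signal g(t)
h = np.array([1.0, 0.5, 0.25])     # made-up impulse response h(t)
n = 128                            # >= len(g) + len(h) - 1, so no wrap-around

# m(t) = g(t) * h(t), computed as M(f) = G(f) x H(f)
m = np.fft.ifft(np.fft.fft(g, n) * np.fft.fft(h, n)).real

# g(t) = m(t) * inverseFFT(1/H(f))
h_inv = inverse_filter(h, n)
g_rec = np.fft.ifft(np.fft.fft(m) * np.fft.fft(h_inv)).real[:len(g)]

# How large is the imaginary part of the inverse filter?
max_imag = np.max(np.abs(h_inv.imag))
```

On the question of complex components: when h(t) is real, H(f) is conjugate-symmetric and so is 1/H(f), so the inverse filter should be real apart from rounding error, and max_imag lets you check that for this example. The near-infinite components are a separate problem, which this sketch simply refuses to handle.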
For example, your DAC should contain a brickwall filter to remove frequencies between about 20 and 22.05 kHz, which the pure deconvolution function would try to correct for, leading to masses of HF noise (because of the near-zero values of H(f) there). It's not desirable or practical to overcome this filter, so I'd modify the deconvolution filter to act only on audible frequencies, up to around 20 kHz, while not attempting to boost frequencies too near the Nyquist. This could be done in the Fourier domain, e.g. by setting zeroes instead of calculating 1/H(f) for frequency bins above +20 kHz (and those below -20 kHz in negative frequency) or it could be done by post-filtering the deconvolution function with a suitable low-pass.
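The "set zeroes in the Fourier domain" option might look something like this in Python/NumPy. The function name, the 44.1 kHz sample rate, the 20 kHz cutoff and the magnitude floor are all assumptions of mine, and a hard cutoff like this will ring in the time domain, so treat it as a sketch of the idea rather than a finished filter design:

```python
import numpy as np

def band_limited_inverse(h, n_fft, fs=44100.0, f_max=20000.0, floor=1e-6):
    """Inverse filter that leaves bins beyond +/- f_max (or near zero) at zero."""
    H = np.fft.fft(h, n_fft)
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)   # signed bin frequencies in Hz
    # Invert only audible bins whose magnitude is safely above zero
    keep = (np.abs(freqs) <= f_max) & (np.abs(H) > floor)
    H_inv = np.zeros(n_fft, dtype=complex)
    H_inv[keep] = 1.0 / H[keep]
    # The mask is symmetric in +/- frequency, so for real h the result is real
    return np.fft.ifft(H_inv).real, H_inv, freqs

# Made-up response with an exact zero at Nyquist (a two-sample average)
h = np.array([0.5, 0.5])
h_inv, H_inv, freqs = band_limited_inverse(h, 1024)

# In the pass band, H(f) x H_inv(f) should be 1; above it, H_inv(f) stays 0
product = np.fft.fft(h, 1024) * H_inv
passband = np.abs(freqs) <= 20000.0
```

A plain 1/H(f) would divide by zero at the Nyquist bin for this h(t); here that bin is just left at zero.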
There are a number of technical difficulties in these sorts of techniques and various solutions.
Impulses, being very brief, are rather low energy, and this energy is spread by the true impulse response so that the measured response is prone to being noisy (causing problems with accurate deconvolution). One method of improving the signal-to-noise ratio is to average many impulse responses, which reduces the measurement error of the average.
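A quick simulation of the averaging idea in Python/NumPy, with a made-up decaying impulse response and independent Gaussian noise added to each "measurement" (both are assumptions for illustration, as is the helper name measure). With uncorrelated noise the RMS error of the average should fall roughly as the square root of the number of measurements:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up "true" impulse response: an exponentially decaying tail
true_h = np.exp(-np.arange(256) / 40.0)

def measure(noise_std=0.1):
    """One simulated noisy measurement of the impulse response."""
    return true_h + rng.normal(0.0, noise_std, true_h.shape)

def rms(x):
    return np.sqrt(np.mean(x ** 2))

n_avg = 100
single = measure()
averaged = np.mean([measure() for _ in range(n_avg)], axis=0)

err_single = rms(single - true_h)
err_averaged = rms(averaged - true_h)
print(f"RMS error, one shot: {err_single:.4f}")
print(f"RMS error, {n_avg} averaged: {err_averaged:.4f}")
```

This only helps with noise that differs from shot to shot; anything that repeats identically in every measurement (hum locked to the trigger, say) averages straight through.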
For simple frequency equalization, tone sweeps (chirps) can work and can contain more energy per frequency than a single Dirac delta function pulse.
From a prior life when I was heavily into research about clipped digital correlation of noisy analogue signals, I also remember techniques like chirp-Z are possible for some similar analysis (equivalent to correlation) and might be adaptable.
One nice technique, also used in radar and time-domain reflectometry, is the use of pseudorandom noise, which has a white spectrum, just like a delta function, but much higher average energy. You can either cross-correlate the received signal with the known pseudorandom input signal and determine the time signature or you can take the Fourier transform of the correlation function to determine the power spectrum. I suspect a similar technique could be adapted to deriving impulse responses and inverse convolution functions from relatively noisy measurements. A correlation function is closely related to an impulse response. Similar ideas are used in spread-spectrum communications techniques like Code Division Multiple Access (CDMA).
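As a rough illustration of the cross-correlation approach, here's a Python/NumPy sketch. I've assumed a random +/-1 sequence as the test signal (not a proper maximum-length sequence), played periodically so that circular convolution applies, through a made-up four-tap system with some added measurement noise:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2 ** 16

# Pseudorandom +/-1 excitation: roughly white, with energy in every sample
x = rng.choice([-1.0, 1.0], size=n)

# Made-up system response, standing in for the one we want to measure
h_true = np.array([1.0, 0.6, -0.3, 0.1])

# Received signal: x circularly convolved with h_true, plus measurement noise
X = np.fft.fft(x)
y = np.fft.ifft(X * np.fft.fft(h_true, n)).real
y += rng.normal(0.0, 0.1, n)

# Circular cross-correlation of y with the known x, computed via the FFT.
# cross_spectrum is the Fourier transform of the correlation function.
cross_spectrum = np.fft.fft(y) * np.conj(X)
r = np.fft.ifft(cross_spectrum).real / n

# Because x is nearly uncorrelated with shifted copies of itself,
# the first few lags of r approximate the impulse response
h_est = r[:len(h_true)]
print(h_est)
```

The estimate isn't exact, because a random sequence only has an approximately delta-like autocorrelation; longer sequences (or proper maximum-length sequences) should tighten it up.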