Topic: Audio mixing: channels stored interleaved vs. not (Read 7592 times)

Audio mixing: channels stored interleaved vs. not

When downmixing non-interleaved multichannel audio, does it make sense to interleave it first, or just process it as it is?

If processed as is, should it jump between channels sample by sample, or handle a buffer-full of each channel at a time?


Reply #1
I'm not sure I understand what you want to achieve. Just making it work, or designing a computationally efficient algorithm, e.g. one that minimizes random reads from disk? A downmix of e.g. 5.1 to stereo, or a multi-track studio recording? Using existing software or writing your own? A software environment such as Audacity, or a bunch of files, one per channel per track?

Dynamic – the artist formerly known as DickD


Reply #2
I mean efficiency-wise for an algorithm (x87 and later SSE) for data in memory. Downmix an arbitrary number of input channels: multiply/add input channels to output channels according to a mix matrix. Another point, relevant to SSE, is that the data is only 4-byte aligned.
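For reference, the process-as-is variant of that matrix downmix is straightforward in plain scalar C. This is only a sketch of what's being described, no SSE; the function name and calling convention are illustrative, not from any particular codebase:

```c
#include <stddef.h>

/* Downmix planar (non-interleaved) channels through a mix matrix:
 * out[o][i] = sum over c of matrix[o * n_in + c] * in[c][i].
 * "Planar" means one separate buffer per channel. */
static void downmix_planar(float **in, size_t n_in,
                           float **out, size_t n_out,
                           const float *matrix, size_t n_frames)
{
    for (size_t o = 0; o < n_out; o++) {
        for (size_t i = 0; i < n_frames; i++)
            out[o][i] = 0.0f;                 /* clear the output channel */
        for (size_t c = 0; c < n_in; c++) {
            float g = matrix[o * n_in + c];
            if (g == 0.0f)
                continue;                     /* skip silent matrix entries */
            for (size_t i = 0; i < n_frames; i++)
                out[o][i] += g * in[c][i];    /* multiply/add one channel */
        }
    }
}
```

The inner loops are sequential per channel, which is also the access pattern an SSE version of this path would want.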


Reply #3
Especially with it being in fast memory rather than on disk, I'd have thought there's not a lot of difference between

INTERLEAVE ALL THE AUDIO from 6 channels - THEN DOWNMIX from one interleave into, say 2 channels, one sample at a time

and

DOWNMIX from 6 independent channels into 2 channels one sample at a time.

To me, the latter seems easier if it's in memory.


Maybe someone who has done this already can make a recommendation, or you can hack together some quick code each way and simply test it for speed. You don't need to implement anything complicated like the actual mixdown, or decide whether or not to dither the mixed stream.

E.g. populate your memory with 6 streams of random numbers, then just read in the data and do something simple like copying each sample value to a variable, discarding it, and moving on to the next channel / sample. You could even test it only on a bunch of data of fixed size and use a for loop to dispense with checking for end of stream: simply create a few megabytes of data on 6 channels in memory at the start.

Put the whole lot in a for... next loop and repeat enough times for decent timing accuracy and you'll know which is faster using your hardware and your compiler.
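A minimal sketch of that harness in C (sizes and repeat counts are arbitrary, `bench_planar_read` is a made-up name, and the inner sum is a stand-in for the real mix kernel, not an actual mixdown):

```c
#include <stdlib.h>
#include <time.h>

#define N_FRAMES 65536   /* fixed-size data, no end-of-stream checks */
#define N_REPS   100     /* repeat for decent timing accuracy */

static float bufs[6][N_FRAMES];
static volatile float sink;  /* keeps the compiler from deleting the loop */

/* Time N_REPS passes over 6 planar channels; returns elapsed seconds. */
double bench_planar_read(void)
{
    for (int c = 0; c < 6; c++)
        for (int i = 0; i < N_FRAMES; i++)
            bufs[c][i] = (float)rand() / (float)RAND_MAX;

    clock_t t0 = clock();
    for (int r = 0; r < N_REPS; r++)
        for (int i = 0; i < N_FRAMES; i++) {
            float s = 0.0f;
            for (int c = 0; c < 6; c++)
                s += bufs[c][i];   /* stand-in for the real mix kernel */
            sink = s;
        }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Writing a second variant with the loop order swapped (or with an interleave pass first) and comparing the two returned times gives the comparison being suggested.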

Then implement the faster method properly for your requirements.
Dynamic – the artist formerly known as DickD


Reply #4
I do plan to do some benchmarking eventually, but I was hoping someone had already explored it. Besides differences in results on different CPUs and with different channel counts, I expect small things I may not think of could tip the scales.


Reply #5
Well, in the interleave-first case, you do: 1. read samples from various places, 2. store samples that belong together in a new place, 3. read samples from the new place, 4. multiply/add, 5. store result.
In the process-as-is case, you do: 1. read samples from various places, 4. multiply/add, 5. store result.
Hmm … Will adding additional steps (2 and 3) make a program run faster, especially if they involve memory access where the data does not fit into the cache? I very much doubt it, but perhaps I am missing something.


Reply #6
I don't know. Interleave-first can at least read sequentially (one pass per channel), and the final interleaved data will be 16-byte aligned.
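That one-pass-per-channel interleave might look like this (a scalar sketch; the name and signature are illustrative). Note the reads are sequential per channel, but the writes are strided:

```c
#include <stddef.h>

/* Interleave planar channels into one buffer, one sequential read
 * pass per channel: frame i of channel c lands at dst[i * n_ch + c]. */
static void interleave(float **src, size_t n_ch,
                       float *dst, size_t n_frames)
{
    for (size_t c = 0; c < n_ch; c++)
        for (size_t i = 0; i < n_frames; i++)
            dst[i * n_ch + c] = src[c][i];   /* strided write */
}
```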


Reply #7
"arbitrary number of input channels" means that SSE code will not be efficient for interleaved data.


Reply #8
Why not? Just doing 4 at once. The end could have special handling.


Reply #9
So interleaved audio is "4 samples from channel #1, then 4 samples from channel #2, ..."?


Reply #10
No, it's sample 0 for channels 0..N, sample 1, etc. But why not process 4 channels at a time?


Reply #11
Quote
No, it's sample 0 for channels 0..N, sample 1, etc. But why not process 4 channels at a time?


For a fixed number of channels, it's probably not too much different either way, at least assuming you're on an SSE flavor that can do some kind of scatter/gather loads (IIRC newer flavors have this). On systems without this (e.g. older ARM without NEON), interleaved is likely to be slower due to load/store throughput and register space.

For a variable number of channels though, that could be tough even on modern CPUs.  You might have to special case each number to get efficient processing.


Reply #12
Quote
For a fixed number of channels, it's probably not too much different either way, at least assuming you're on an SSE flavor that can do some kind of scatter/gather loads
I don't think any SSE does scatter/gather, but anyway I'm only looking at SSE1.

Quote
On systems without this (e.g. older ARM without NEON), interleaved is likely to be slower due load/store throughput
You mean the initial copy-to-interleaved?

Quote
and register space.
What do you mean?


Reply #13
You can simulate scatter/gather (for de-interleaving) with either shufps or unpckhps/unpcklps, but it's only really efficient for 2 or 4 channels.  It'd be far better to just use separate per-channel (non-interleaved) buffers, perform your processing on them and then interleave only when necessary.  Interleaved audio is a relic from WAV and CDDA and is generally an inefficient way of dealing with multichannel audio data.
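For the 4-channel case, the unpckhps/unpcklps approach amounts to a 4x4 transpose, which SSE1's `<xmmintrin.h>` already packages as the `_MM_TRANSPOSE4_PS` macro. A sketch (unaligned load/store variants are used here so the example doesn't depend on buffer alignment; aligned `_mm_load_ps`/`_mm_store_ps` would be the goal in practice):

```c
#include <xmmintrin.h>

/* De-interleave 4 frames x 4 channels: src holds interleaved samples
 * (frame i, channel c at src[i*4 + c]); each chN receives 4 consecutive
 * samples of one channel. Sketch for exactly 4 channels only. */
static void deinterleave4x4(const float *src, float *ch0, float *ch1,
                            float *ch2, float *ch3)
{
    __m128 r0 = _mm_loadu_ps(src + 0);   /* frame 0: c0 c1 c2 c3 */
    __m128 r1 = _mm_loadu_ps(src + 4);   /* frame 1 */
    __m128 r2 = _mm_loadu_ps(src + 8);   /* frame 2 */
    __m128 r3 = _mm_loadu_ps(src + 12);  /* frame 3 */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);   /* rows become channels */
    _mm_storeu_ps(ch0, r0);
    _mm_storeu_ps(ch1, r1);
    _mm_storeu_ps(ch2, r2);
    _mm_storeu_ps(ch3, r3);
}
```

For 2 channels half the shuffles suffice, and for anything else the shuffle patterns stop lining up, which is the "only really efficient for 2 or 4 channels" point above.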


Reply #14
Quote
and register space.
What do you mean?


If you run low on register space, you will likely find that interleaved is much harder to do efficiently. For example, in benski's example of using shufps in SSE to do gather, you cannot simply load one 128-bit register of 4 consecutive singles, since (with two interleaved channels) that gives you only 64 bits' worth of each channel. This means that if you want 4 consecutive values per channel (say, to implement an FIR filter) you'll have to load two 128-bit values at once and thus need 2 registers. This may (or may not, depending on what you are doing) cause you to run out of registers and have to resort to a less efficient algorithm.


Reply #15
Thanks.


Reply #16
It probably doesn't matter what you do as long as it's reasonably efficient. Often it's better to work with non-interleaved data, though, to keep things simpler: handling mixing of 1, 2, 4, 6, or however many channels ends up with many permutations of algorithms for each channel case. It would probably be worth trying to get the data SSE-aligned if possible, or to write algorithms that handle the bulk of the data aligned and handle the starts and ends one sample at a time if need be. Later versions of SSE can handle unaligned load/store, but then you get into architecture-specific penalties that can sometimes cost more than the actual mixing operations.
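A scalar sketch of that head/body/tail split (`mix_into` is a hypothetical name; the group-of-4 loop marks where an aligned SSE kernel would go):

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] += gain * src[i] for n samples. Scalar head until dst is
 * 16-byte aligned, then groups of 4 (the SSE body would go there),
 * then a scalar tail. Note: if src's alignment differs from dst's,
 * only one of the two can actually be aligned in the body. */
static void mix_into(float *dst, const float *src, float gain, size_t n)
{
    size_t i = 0;
    /* head: one sample at a time until dst + i is 16-byte aligned */
    while (i < n && ((uintptr_t)(dst + i) & 15) != 0) {
        dst[i] += gain * src[i];
        i++;
    }
    /* body: aligned groups of 4 (replace with SSE load/mul/add/store) */
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] += gain * src[i + 0];
        dst[i + 1] += gain * src[i + 1];
        dst[i + 2] += gain * src[i + 2];
        dst[i + 3] += gain * src[i + 3];
    }
    /* tail: leftover samples */
    for (; i < n; i++)
        dst[i] += gain * src[i];
}
```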

It's really only worth going to the trouble of operating on interleaved data if you need to run more expensive DSP on your data that can't be vector processed. E.g. IIR filters.

I would say though that mixing audio data is about the cheapest DSP operation you could ever do. Even on SSE1 era CPUs mixing a few data streams together is hardly going to show up as CPU use. So I'm not sure you want to worry about it all that much unless there is some reason to.


Reply #17
Yeah, special-handling the starts/ends then doing the middle aligned is probably the most elegant way around the non-alignment.

Some mixing might not be a problem by itself, but it wouldn't hurt to optimize a little, since I want to be able to decode and mix a few tens of channels on Pentium 3s (with some other things going on as well).