Multithreaded Replay Gain
Hydrogenaudio Forums > Hydrogenaudio Forum > General Audio
skamp
When people ask for multithreaded encoding support, they are usually answered that they should encode multiple files at once, and that the increased complexity of multithreading wouldn't be worth it. While that point is debatable, the suggested work-around works well enough when dealing with single-track files, which happens to be the most popular form of encoded audio.

There's one operation that's still desperately single-threaded though: Replay Gain computation in album mode, which requires decoding all files sequentially. As with encoding, that could fairly easily be worked around with third-party scripts/applications, if only the utilities (metaflac, wvgain, vorbisgain…) would provide an option for generating some kind of output or log file containing all the elements required for album gain computation. A script would then be able to run RG utilities on multiple files concurrently, gather the necessary data they output, and run the utility once more in a final pass to compute the album gain from that data only.

The details of the implementations are to be determined, but at least existing RG utilities would remain singlethreaded. Computing the album gain with one CPU core after encoding an album at 2, 3, 4 times the speed is rather frustrating…
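
A minimal sketch of the workflow described above, in Python. It rests on assumptions: `analyze_track` and `combine` are hypothetical stand-ins, because none of the utilities named above currently offers a per-track data output, and the "data" here is a dummy value rather than anything an RG tool produces. Only the shape of the script is illustrated: concurrent per-file passes, then one final pass over the gathered data.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def analyze_track(path):
    """Hypothetical per-track step.

    A real script would launch an RG utility on one file and parse the
    per-track data it printed. No such option exists in the tools named
    above, so this stand-in derives a dummy block count from the file name.
    """
    return {"path": path, "blocks": len(path)}


def combine(per_track):
    """Hypothetical final pass: uses only the gathered data, no decoding."""
    return sum(t["blocks"] for t in per_track)


def scan_album(paths, workers=None):
    # Default to one job per CPU core, as when encoding files in parallel.
    workers = workers or os.cpu_count() or 1
    # Threads suffice here: each worker would mostly wait on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order jobs finish in.
        per_track = list(pool.map(analyze_track, paths))
    return per_track, combine(per_track)


per_track, album_total = scan_album(["01.flac", "02.flac", "03.flac"])
print(album_total)
```

The existing utilities would stay single-threaded in this arrangement; all the concurrency lives in the wrapper script.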
tuffy
I don't think it's possible to apply album gain in a multithreaded fashion, short of adjusting the algorithm. The slowest part of the calculation, in my experience, is applying the equal loudness IIR filters and those can't be broken apart into smaller chunks. For example, figuring out filtered sample X not only requires unfiltered sample Y, but also the value of filtered sample X - 1 (and Y - 1, and so on). Building data cumulatively like that means we can't send pieces of the file onto different CPUs to work independently on the problem.
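
The dependency described here can be shown with a small sketch. The filter below is a generic direct-form IIR with made-up coefficients, not the actual equal loudness filter, and it uses the conventional naming (`x` for unfiltered input, `y` for filtered output), which is the reverse of the X/Y naming in the paragraph above. It shows that filtering a stream in two independent halves does not give the same result as filtering it in one pass, because the second half starts without the filter's history.

```python
def iir_filter(x, b, a):
    """Direct-form IIR filter, assuming a[0] == 1.

    y[n] = sum(b[k] * x[n-k]) - sum(a[k] * y[n-k] for k >= 1)
    Each output sample depends on earlier *output* samples.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# Illustrative coefficients only: y[n] = x[n] + 0.5 * y[n-1]
b, a = [1.0], [1.0, -0.5]
signal = [1.0, 0.0, 0.0, 0.0]

whole = iir_filter(signal, b, a)
halves = iir_filter(signal[:2], b, a) + iir_filter(signal[2:], b, a)

print(whole)
print(halves)
```

The two printed lists differ in the second half. This constrains splitting a single stream into chunks; whether separate tracks can be processed independently is a different question.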

I'm no math expert, though, so perhaps someone smarter than I can think of a way to pull it off.
Yirkha
The built-in ReplayGain scanner in the current foobar2000 release runs in multiple threads, scanning $number_of_processors tracks at once. The only "single-threaded" operation is the tag writing afterwards.
skamp
While I don't fully understand the algorithm (far from it), I do think I understand the sequential aspect of it (please correct me if I'm wrong). But it seems to me it's mostly an accumulation of data from each track (or rather, each chunk of audio), that is operated on in the final stage to compute the album gain. My suggestion is to make each track's relevant data available externally, so that it can be generated in a parallel fashion; the final step would then only need to order the gathered data and compute the result. If there's a fundamental flaw in my assessment, please let me know.
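
A sketch of what that final stage could look like, under stated assumptions. The Replay Gain proposal measures loudness over short blocks and takes a high percentile (the 95th) of those values; the album value uses the blocks of all tracks together. Assuming each track's "relevant data" is a histogram of block loudness, the merge is a plain sum. The integer dB bins, the function names and the sample counts below are all hypothetical, and the conversion from this level to a gain value (a fixed reference offset) is left out.

```python
from collections import Counter


def level_at_top_percent(hist, top_percent=5):
    """Walk down from the loudest bin until `top_percent` of blocks are passed.

    `hist` maps a loudness bin (hypothetical integer dB value) to the number
    of analysis blocks that fell into it.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    # Blocks to pass from the loud end, rounded up, using integer math.
    remaining = (total * top_percent + 99) // 100
    for level in sorted(hist, reverse=True):
        remaining -= hist[level]
        if remaining <= 0:
            return level
    return min(hist)


def merge(histograms):
    """Sum per-track histograms into one album histogram."""
    merged = Counter()
    for h in histograms:
        merged.update(h)
    return merged


# Made-up per-track data, as if gathered from concurrent per-file runs.
track_a = Counter({60: 19, 75: 1})
track_b = Counter({50: 19, 80: 1})

print(level_at_top_percent(track_a))
print(level_at_top_percent(track_b))
print(level_at_top_percent(merge([track_a, track_b])))
```

If the per-track data really is a histogram, summing is commutative, so the gathered data would not even need to be put back in track order.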
cabbagerat
QUOTE(skamp @ Mar 28 2008, 07:46) *

While I don't fully understand the algorithm (far from it), I do think I understand the sequential aspect of it (please correct me if I'm wrong). But it seems to me it's mostly an accumulation of data from each track (or rather, each chunk of audio), that is operated on in the final stage to compute the album gain. My suggestion is to make each track's relevant data available externally, so that it can be generated in a parallel fashion; the final step would then only need to order the gathered data and compute the result. If there's a fundamental flaw in my assessment, please let me know.
There certainly doesn't seem to be any fundamental problem doing this with one thread per file. It might not be any faster, however, as the bottleneck could be at the hard drive, not the CPU. It would be interesting for somebody to code it up.
Borisz
QUOTE
the bottleneck could be at the hard drive, not the CPU.

Word. I can only encode to FLAC at some 200x speed because of hard drive speed limitations. Replaygain, which is less cpu intensive and requires half as much hard drive activity (only reading, no writing), can get up to 420x on my overclocked e6550 (for an album encoded in libflac 1.2.1 20070917). This is using foobar2000 with multithreading enabled, the replaygain scan used up 100% of my cpu according to task manager so both cores were used.

Now to translate those numbers into real world scenarios: That 420x speed replaygain scan completed in under ten seconds for a 67 minute album.

Read that again: ten seconds.

Ten.

Seconds.

I think that's fast enough. Don't you?
skamp
QUOTE(Borisz @ Mar 29 2008, 00:18) *
Replaygain [...] can get up to 420x on my overclocked e6550

Not everybody overclocks their CPUs.

QUOTE(Borisz @ Mar 29 2008, 00:18) *
This is using foobar2000 with multithreading enabled, the replaygain scan used up 100% of my cpu according to task manager so both cores were used.

Which suggests that Foobar's implementation of Replay Gain is indeed multithreaded. My post concerns developers who don't want to add multithreading to their apps.

QUOTE(Borisz @ Mar 29 2008, 00:18) *
That 420x speed replaygain scan completed in under ten seconds for a 67 minute album. Read that again: ten seconds. [...] I think that's fast enough. Don't you?

It takes 26 seconds for a 53-minute FLAC album (32 seconds for the same album in Ogg Vorbis) on my quad-core Phenom 9600 (2.3 GHz) with only one core used. Is that fast enough? Sure. A lot of things are fast enough. Does it stop the industry from making faster hardware, and consumers from buying it? It's fast enough, so, what, we stop trying to improve things? Did Josh stop making FLAC faster? Did David with WavPack? Did Thomas with TAK?

Again, I'm not even trying to push for actual multithreading, merely a workaround.
Mike Giacomelli
QUOTE(skamp @ Mar 28 2008, 18:42) *

Which suggests that Foobar's implementation of Replay Gain is indeed multithreaded. My post concerns developers who don't want to add multithreading to their apps.


Did you read the second reply to your thread? It is multithreaded, and it works for all formats that support replaygain. Just use it already.
Lyx
QUOTE(skamp @ Mar 29 2008, 00:42) *

Is that fast enough? Sure. A lot of things are fast enough. Does it stop the industry from making faster hardware, and consumers from buying it? It's fast enough, so, what, we stop trying to improve things?

Ah, senseless and meaningless "improvements" - more for the sake of more, not for it being useful. A development-tactic typically used by people who are obsessed or uninnovative.

Let me show you the stupidity of this think-mode with a practical analogy: Can humans perceive samplerates above 44 kHz? No. Does that stop the industry from increasing samplerates in consumer tech and media? No. BUT consumers pay for added research and increased resource usage - they also pay for upgrading existing tech. AND it serves as a great excuse for NOT innovating with truly useful new features. So to summarize: the result is increased cost, increased resource usage, lack of useful innovation - and the perpetrators of this idiocy staying in business. Where are the advantages?

QUOTE
Did Josh stop making FLAC faster? Did David with WavPack? Did Thomas with TAK?

Compressors still have room for useful speed increases - replaygain in turn is already at the edge of being trivial. Usefulness of speed increases is not linear: If an app starts up in 9 instead of 10 seconds, then that's more useful than an app starting up in 1 second instead of 2. This is because application usage is a machine-user interaction... not just the machine is doing something, but also the user... and there is a limit to how fast the user can accomplish tasks - once you approach that limit, the bottleneck is no longer the machine, but the user.

- Lyx

P.S.: Also: Decompression speed is very important for hardware support in portable players. Part of the reason for FLAC gaining so much hardware support, is that its decompression is fast enough for portables, regardless of encoding-mode.
Jebus
Hey, cool thread! Guess what I was working on today...

Omni Encoder 2.0 (that thing I keep delaying) will have threaded everything, including replaygain analysis. It's very smart, too: it could be scanning 4 files at a time for album analysis, then as each one finishes, switches to encoding 2 of them with 2 different codecs. The # of concurrent "jobs" is configurable but defaults to your # of CPU cores.
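
The scheduling idea above can be sketched with a single worker pool. This is not Omni Encoder's actual design (which is not shown in this thread), just a minimal Python illustration under assumptions: `scan` and the encoder callables are hypothetical placeholders, and the pool size defaults to the number of CPU cores. As each scan finishes, one encode job per codec is queued for that track on the same pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pipeline(tracks, scan, encoders, jobs=None):
    """Scan tracks concurrently; queue encodes as each scan completes."""
    # Configurable job count, defaulting to the number of CPU cores.
    jobs = jobs or os.cpu_count() or 1
    results = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        scans = {pool.submit(scan, t): t for t in tracks}
        encodes = []
        for done in as_completed(scans):
            track = scans[done]
            gain = done.result()
            # The slot freed by a finished scan is reused for encoding.
            for encode in encoders:
                encodes.append(pool.submit(encode, track, gain))
        for done in as_completed(encodes):
            results.append(done.result())
    return sorted(results)


# Hypothetical stand-ins for a scanner and two codecs.
def fake_scan(track):
    return len(track)


def fake_flac(track, gain):
    return (track, "flac", gain)


def fake_mp3(track, gain):
    return (track, "mp3", gain)


print(run_pipeline(["a", "bb"], fake_scan, [fake_flac, fake_mp3]))
```

In CPython, threads like these only help when each job waits on an external encoder or scanner process; CPU-bound work written in Python itself would need a process pool instead.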

I think this makes the most sense from a development perspective, vs "optimizing" Lame (for example). Lame is complex enough; the threading should occur at a higher level. Who cares if it takes 30 seconds vs 15 seconds to encode a single file.
skamp
QUOTE(Mike Giacomelli @ Mar 29 2008, 00:51) *
Did you read the second reply to your thread? It is multithreaded, and it works for all formats that support replaygain. Just use it already.

Ahem, I don't use Foobar, I don't use Windows, and I can't easily run Wine. Maybe everybody should run Foobar. And Windows. Uh?

QUOTE(Lyx @ Mar 29 2008, 01:10) *
Compressors still have room for useful speed increases

Do they? By using all four cores of my CPU, I encode a 74-minute album with FLAC in 9 seconds at the default encoding setting, 30 seconds at the highest setting (album gain computation, being limited to one core, takes 35 seconds). Wouldn't you say that's fast enough?
Sarcasm aside, don't you think it's rather absurd that Replay Gain takes longer to compute than the encoding itself at the highest compression setting, nearly 4 times longer with the default setting, on a mid-range CPU?
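
The figures above can be put side by side with a little arithmetic. The timings (74 minutes of audio, 9 s and 30 s for encoding on four cores, 35 s for album gain on one core) come from the post; the "ideal" four-core estimate is an assumption of perfect linear scaling, which disk speed and uneven track lengths would reduce in practice.

```python
def speed_multiple(album_minutes, seconds):
    """How many times faster than real time a job ran."""
    return album_minutes * 60 / seconds


def ideal_parallel_seconds(serial_seconds, cores):
    """Best-case time if the work split evenly across cores (an assumption)."""
    return serial_seconds / cores


album_minutes = 74
encode_default = 9   # seconds, four cores
encode_highest = 30  # seconds, four cores
album_gain = 35      # seconds, one core

print(round(speed_multiple(album_minutes, encode_default)))   # encoding, default setting
print(round(speed_multiple(album_minutes, album_gain)))       # album gain, one core
print(round(album_gain / encode_default, 2))                  # the "nearly 4 times" ratio
print(ideal_parallel_seconds(album_gain, 4))                  # best case on four cores
```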

QUOTE(Lyx @ Mar 29 2008, 01:10) *
Also: Decompression speed is very important for hardware support in portable players. Part of the reason for FLAC gaining so much hardware support, is that its decompression is fast enough for portables, regardless of encoding-mode.

So, we've determined that FLAC is fast enough for encoding and fast enough for decoding. It's also been established in many other threads that no competing lossless codec will ever be able to make a dent in FLAC's market share unless they offer something like 10% more compression while remaining as fast (at least for decoding). What are we to conclude? David, Thomas, Ghido, stop working on your codecs, it's a waste of time! Sigh.

QUOTE(Jebus @ Mar 29 2008, 03:58) *
Lame is complex enough; the threading should occur at a higher level.

That seems to be the consensus indeed, and I'm not qualified to go against it. Which is why I'm trying to think of compromises, instead of either giving it up altogether or bitching about it. That said, I can't help but notice that video codecs such as x264 are multithreaded, and I doubt they're much simpler than audio codecs…

QUOTE(Jebus @ Mar 29 2008, 03:58) *
Who cares if it takes 30 seconds vs 15 seconds to encode a single file.

A few people I guess, otherwise there wouldn't be so many threads and posts on this forum alone about multithreaded codecs; nobody would use the multithreaded versions of oggenc and LAME; nobody would download CPU-specific optimized builds; nobody would develop them in the first place.

Guys, while I appreciate the fact that I get a reaction (for once), you're not helping. If there are multithreaded RG applications on Windows already, that's great, just not for me. I'm thinking of other platforms and existing tools in particular.
Jebus
QUOTE(skamp @ Mar 28 2008, 21:41) *


Guys, while I appreciate the fact that I get a reaction (for once), you're not helping. If there are multithreaded RG applications on Windows already, that's great, just not for me. I'm thinking of other platforms and existing tools in particular.




Did you mention that you don't use Windows in your first post? I didn't notice... well, I'm mostly caught up coding right now in C#, but I do plan on doing some Mac stuff later (bought a MacBook recently and love it). So maybe down the road... it wouldn't be hard at all, really. The replaygain library just needs to be scripted, really.
Borisz
QUOTE(skamp @ Mar 29 2008, 00:42) *

QUOTE(Borisz @ Mar 29 2008, 00:18) *
Replaygain [...] can get up to 420x on my overclocked e6550

Not everybody overclocks their CPUs.

Ignoring the fact that there's no point in NOT overclocking a Core 2 Duo (these monsters were meant for overclocking), even without overclocking I'd get speeds in the 300-400x range. I didn't overclock that much, I only raised the RAM speed to 400 from 333 (since I have DDR2-800). This increases the CPU speed from 2.33 GHz to 2.8 GHz, pretty awesome for stock voltage and stock cooler.
slks
I get around 300x doing album scans with foobar, on a non-OC'd Athlon 64 FX-62. They don't even sell FX-62s anymore; in 2008 I'd consider it a midrange CPU.

It takes around 15 seconds to scan a whole album. When ripping a CD takes 20-30 minutes, the extra 15 seconds doesn't bother me at all.
Mike Giacomelli
QUOTE(skamp @ Mar 28 2008, 22:41) *

Sarcasm aside, don't you think it's rather absurd that Replay Gain takes longer to compute than the encoding itself at the highest compression setting, nearly 4 times longer with the default setting, on a mid-range CPU?


Replaygain processing time is more or less just the decode time for a codec. If replaygain really is slower than encoding, it just means the codec takes longer to decode than encode.

QUOTE(skamp @ Mar 28 2008, 22:41) *

Guys, while I appreciate the fact that I get a reaction (for once), you're not helping.


Neither are you. Helping involves doing things, not complaining to people who clearly don't care.

QUOTE(skamp @ Mar 28 2008, 22:41) *

If there are multithreaded RG applications on Windows already, that's great, just not for me. I'm thinking of other platforms and existing tools in particular.


Which makes your decision to post here all the more baffling. Why not go ask whatever developer writes your software of choice to improve their threading, or better yet do it yourself? Complaining on some random forum is just dumb. What are the odds that your software of choice is written by one of the couple dozen random people who are reading this? And if they are, what are the odds they read to the bottom of this thread where you finally say what you want?
Teknojnky
Apparently this won't help the OP since he isn't using Windows, but dBpoweramp 'reference' supports multi-threaded replay gain scanning. However, the free/powerpack versions are crippled in this respect and won't support multi-CPU encoding.
skamp
QUOTE(Mike Giacomelli @ Mar 29 2008, 22:15) *
Replaygain processing time is more or less just the decode time for a codec. If replaygain really is slower than encoding, it just means the codec takes longer to decode than encode.

You missed the part where 4 cores were used for encoding and only one for Replay Gain processing, because files can be encoded in parallel even with singlethreaded codecs, while their RG implementations can't use more than one core, hence the big difference.

QUOTE(Mike Giacomelli @ Mar 29 2008, 22:15) *
Neither are you. Helping involves doing things, not complaining to people who clearly don't care.

I would, if stock RG tools would provide me with the means to compute RG in parallel with just a little more output. My skills are limited to scripting, there's nothing I can do about multithreading in C/C++/whatever apps. Btw, this thread wasn't meant for you. If you don't care, why are you replying?

QUOTE(Mike Giacomelli @ Mar 29 2008, 22:15) *
Why not go ask whatever developer writes your software of choice to improve their threading, or better yet do it yourself? Complaining on some random forum is just dumb. What are the odds that your software of choice is written by one of the couple dozen random people who are reading this?

I believe the people who wrote metaflac, wvgain, vorbisgain, mp3gain, wavegain, replaygain are active HA members.

So far I'm not the one bitching, you're the one with the usual jackass attitude of telling people to "just use" X software. This thread is a freakin' feature request. But some people always mistake those for bitching or whining. :-|