Full Version: Using GPU when convolving
Henrik
After digging into the theory of convolution, I understand that long-tap FIR filters require a lot of CPU power, and that there is a tradeoff between CPU load and latency.
I then stumbled across this site, FIR on GPU. Their conclusion is that a modern GPU such as a GeForce 6600 outperforms an SSE-enabled Pentium 4 HT @ 3.2 GHz when it comes to long-tap FIRs.
A fb2k convolving plugin using the GPU would be nice, but hey, I don't even know if Windows would allow it. But if it does, I wouldn't mind a black screen when convolving...
foosion
Why would you get a black screen? You sure don't get one when running a 3D application in windowed mode. ;)
Garf
Unfortunately, the paper is garbage, because they seem to have used the most stupid algorithm possible for their benchmarks (but of course, the most stupid one happened to have a good speedup on the GPU).

It is pointless to do this on the GPU.
Shade[ST]
QUOTE(Garf @ Mar 8 2006, 09:48 PM)
Unfortunately, the paper is garbage, because they seem to have used the most stupid algorithm possible for their benchmarks (but of course, the most stupid one happened to have a good speedup on the GPU).

It is pointless to do this on the GPU.
*


How is the algorithm stupid? Couldn't the additional power of the graphics card benefit the computer anyways? What about the memory bandwidth? Some cards use DDR-2, don't they?
Gabriel
GPUs are fast at massively parallel computations, "massively" being several thousand similar computations.
A big drawback is the long time needed to transfer data to the graphics card.

This means that right now they are efficient only for very large data sets with identical computations.
Henrik
QUOTE
Unfortunately, the paper is garbage, because they seem to have used the most stupid algorithm possible for their benchmarks (but of course, the most stupid one happened to have a good speedup on the GPU).

I see your point, Garf. If I understand things correctly, they're using linear(?) convolution (element-wise multiplication of vectors) which is exponentially slower than FFT methods on long-tap filters, right?
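To make the comparison concrete, here is a minimal sketch of the two approaches in plain Python. It is not the code from the paper; the function names `direct_convolve`, `fft` and `fft_convolve` are made up for illustration, and the FFT is a textbook recursive radix-2 version rather than anything tuned.

```python
import cmath

def direct_convolve(x, h):
    """Time-domain (direct) linear convolution: about len(x) * len(h) multiply-adds."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two; the inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(x, h):
    """Same linear convolution via zero-padded FFTs: on the order of n * log2(n) work."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:          # pad to a power of two so circular wrap-around cannot occur
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    Y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in Y[:m]]   # apply the 1/n scaling of the inverse here
```

Both functions return the same samples up to rounding error. The cost of the direct form grows with the product of signal length and tap count, while the FFT form grows roughly as n log n, so the gap is large for long filters but polynomial rather than exponential.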

Here is an example of FFT calculation using GPU.

Couldn't circular buffer/algorithm methods be used to minimize bandwidth? After all, if the input is audio samples and a constant FIR, there really is no point shuffling coefficients back and forth.
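One common way to get this effect is overlap-add block convolution, which is a different technique from a literal circular buffer: the filter is transformed once, and only audio blocks are processed afterwards. Below is a rough CPU-only sketch in plain Python. The class name `OverlapAddFilter` and the helper `_fft` are hypothetical, and the idea that a GPU version would upload the filter spectrum once and then stream only audio blocks is an assumption, not something taken from the linked pages.

```python
import cmath

def _fft(a, sign=-1):
    """Recursive radix-2 FFT (sign=-1) or unscaled inverse (sign=+1); power-of-two length."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], sign)
    odd = _fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

class OverlapAddFilter:
    """Streams fixed-size audio blocks through a constant FIR filter."""

    def __init__(self, h, block_size):
        self.block = block_size
        n = 1
        while n < block_size + len(h) - 1:
            n *= 2
        self.n = n
        # The filter spectrum is computed once and reused for every block.
        self.H = _fft(list(h) + [0.0] * (n - len(h)))
        # Output samples that spill past the current block are carried over here.
        self.tail = [0.0] * (n - block_size)

    def process(self, block):
        """Filter exactly block_size input samples and return block_size output samples."""
        if len(block) != self.block:
            raise ValueError("block must contain exactly block_size samples")
        X = _fft(list(block) + [0.0] * (self.n - self.block))
        Y = _fft([a * b for a, b in zip(X, self.H)], sign=1)
        y = [v.real / self.n for v in Y]
        for i, t in enumerate(self.tail):   # add the overlap from earlier blocks
            y[i] += t
        self.tail = y[self.block:]
        return y[:self.block]
```

Usage would look like `f = OverlapAddFilter(h, 4096)` followed by `out = f.process(block)` for each incoming block. Larger blocks mean fewer transforms per second but more latency, which is the CPU/latency tradeoff from the first post.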
Garf
QUOTE(Henrik @ Mar 10 2006, 05:54 AM)
QUOTE
Unfortunately, the paper is garbage, because they seem to have used the most stupid algorithm possible for their benchmarks (but of course, the most stupid one happened to have a good speedup on the GPU).

I see your point, Garf. If I understand things correctly, they're using linear(?) convolution (element-wise multiplication of vectors) which is exponentially slower than FFT methods on long-tap filters, right?

Here is an example of FFT calculation using GPU.
*



Yes, exactly. The FFT is much more irregular and complex and hence much harder to implement on a GPU quickly. If you check their results, the GPU is outperformed by the CPU by a factor of 5. Oops, that's not so promising anymore.

Of course, as GPUs get faster and better at complex algorithms, this may become worthwhile.

But wanting to do convolutions on the GPU because it's fast at doing FIR filters... now that's just stupid.
Googer
QUOTE(Garf @ Mar 10 2006, 05:12 AM)
Yes, exactly. The FFT is much more irregular and complex and hence much harder to implement on a GPU quickly. If you check their results, the GPU is outperformed by the CPU by a factor of 5. Oops, that's not so promising anymore.

Of course, as GPUs get faster and better at complex algorithms, this may become worthwhile.

I don't know whether enough time has passed since that paper for what you say to be true, but keep in mind that the paper is from 2003 and the comparison is between a GeForce FX 5800 Ultra and a 1.7 GHz Xeon. Granted, today's CPUs are a good deal faster than that (I'll be relatively generous and guess 5x faster once architectural improvements such as faster FSB and memory, plus new SIMD instructions, are factored in), but today's video cards vastly outperform the dustbuster, which is known to be a particular dog in many shader operations unless you run at reduced precision. :P
Hanky
GPUFFTW
QUOTE
GPUFFTW is a fast FFT library designed to exploit the computational performance and memory bandwidth on GPUs. Our library exploits the data parallelism available on current GPUs and pipelines the computation to the different stages of the graphics processor. Moreover, our library uses an efficient tiling strategy to further improve the memory performance of our algorithm. GPUFFTW can efficiently handle large real and complex 1-D arrays at 32-bit floating point precision on commodity GPUs. Furthermore, our FFT algorithm achieves comparable precision to the IEEE 32-bit FFT algorithms on CPUs even on large 1-D arrays. The library supports both Windows and Linux platforms.


A benchmark on their site shows that a single NVidia GeForce 7900GTX outperforms a dual Opteron 280 workstation (4 cores @ 2.4 GHz) by a factor of 5.