FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA" |
Sep 26 2009, 13:31
Post
#76
|
|
![]() Group: Members Posts: 651 Joined: 10-January 06 From: Zagreb Member No.: 27018 |
CUDA:
CODE
D:\temp_2>CUETools.FlaCuda.exe -8 "Coldplay - Left Right Left Right Left.wav"
CUETools.FlaCuda, Copyright (C) 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename : Coldplay - Left Right Left Right Left.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:39:54.7470000
Results : 157,73x; 285525875 bytes in 00:00:15.1828684 seconds;

FLAC took one minute and 12 seconds!

CODE
D:\temp_2>flac -8 "Coldplay - Left Right Left Right Left.wav"
flac 1.2.1, Copyright (C) 2000,2001,2002,2003,2004,2005,2006,2007 Josh Coalson
flac comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions. Type `flac' for details.
Coldplay - Left Right Left Right Left.wav: wrote 285951666 bytes, ratio=0,677

Mighty Impressive! I think I will use FlaCuda for FLAC encoding.
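For readers wondering where the 157,73x figure comes from: it appears to be the audio duration divided by the encoding time. The sketch below redoes that arithmetic with the numbers from the console output above. The 72-second flac time is the poster's own report, and the function name `realtime_factor` is made up for this illustration.

```python
# Figures copied from the console output in the post above.
AUDIO_SECONDS = 39 * 60 + 54.747   # File Info: 00:39:54.7470000
FLACUDA_SECONDS = 15.1828684       # Results: ... in 00:00:15.1828684 seconds
FLAC_SECONDS = 72.0                # "one minute and 12 seconds", as reported by the poster


def realtime_factor(audio_seconds: float, encode_seconds: float) -> float:
    """Seconds of audio encoded per second of wall-clock time."""
    return audio_seconds / encode_seconds


flacuda_x = realtime_factor(AUDIO_SECONDS, FLACUDA_SECONDS)
flac_x = realtime_factor(AUDIO_SECONDS, FLAC_SECONDS)
speedup = FLAC_SECONDS / FLACUDA_SECONDS

print(f"FlaCuda -8: {flacuda_x:.2f}x realtime")
print(f"flac -8:    {flac_x:.2f}x realtime")
print(f"FlaCuda vs flac: {speedup:.2f}x faster")
```

Under that reading, the FlaCuda figure matches the tool's own output, and the gap to flac -8 on this machine works out to a little under 5x.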
|
|
|
Sep 26 2009, 18:57
Post
#77
|
|
![]() Group: Members (Donating) Posts: 3474 Joined: 7-November 01 From: Strasbourg (France) Member No.: 420 |
I'm far from such ridiculous performance with my little fanless 9400GT, so further improvements are still welcome.
CODE
        0.4      0.6
-0   102.29   145.79   +43%
-2    91.93   127.71   +39%
-4    62.25    54.83   -12%
-6    42.82    47.34   +11%
-8    26.29    36.49   +39%
-10   11.75    15.22   +30%

Speed is clearly better, except at the -4 compression level, where flacuda 0.4 is faster but compresses less (see the table below). The file is a full 54-minute album in .wav (PCM) format.

That's really impressive. Congratulations!

Could someone tell me if it's normal that CUETools.FlaCuda.exe reaches a 50% load on my Core2Duo E6300/9400GT whatever the compression level I choose? Shouldn't the CPU stay inactive during the encoding process? I'm using fb2k, so I checked in the foobar2000 options whether I did something wrong (like an active DSP…), but apparently that isn't the case. Is it the same with a stronger GPU?

APPENDIX: full table
CODE
flacuda version 0.4          size in KB
GPU -0     102.29x    265.028
GPU -2      91.93x    263.071
GPU -4      62.25x    262.059
GPU -6      42.82x    261.771
GPU -8      26.29x    261.579
GPU -10     11.75x    261.254
GPU -11      7.75x    261.137

flacuda version 0.6
GPU -0     145.79x    265.543
GPU -1     131.13x    263.687
GPU -2     127.71x    262.563
GPU -3     126.77x    262.335
GPU -4      54.83x    261.881
GPU -6      47.34x    261.712
GPU -8      36.49x    261.578
GPU -10     15.22x    261.253
GPU -11      9.92x    261.137

flac.exe version 1.2.1
CPU -0     122.84x    275.077
CPU -5      74.47x    263.170
CPU -8      26.88x    262.408
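The percentage column in the first table can be reproduced from the two speed columns. This is a small sketch of that arithmetic, using only the numbers posted above; the helper name `percent_change` is invented for the example.

```python
# Speeds (x realtime) copied from the first table in the post above:
# compression level -> (flacuda 0.4, flacuda 0.6)
SPEEDS = {
    "-0": (102.29, 145.79),
    "-2": (91.93, 127.71),
    "-4": (62.25, 54.83),
    "-6": (42.82, 47.34),
    "-8": (26.29, 36.49),
    "-10": (11.75, 15.22),
}


def percent_change(old: float, new: float) -> int:
    """Relative change from old to new, rounded to a whole percent."""
    return round((new / old - 1.0) * 100.0)


changes = {level: percent_change(old, new) for level, (old, new) in SPEEDS.items()}

for level, change in changes.items():
    print(f"{level:>3}: {change:+d}%")
```

The results agree with the posted column, including the regression at -4.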
|
|
|
Sep 26 2009, 18:58
Post
#78
|
|
![]() Group: Members (Donating) Posts: 478 Joined: 22-November 01 From: United Kingdom Member No.: 519 |
Holy crap! Nice work, Greg!
FlaCuda -4 on my 8800GT is now pretty much as fast as FLAC -6 on 3 of my Q6600 cores, with slightly better compression. The strange slowdown I was getting when using multiple converter threads no longer seems to be an issue; in fact, it speeds up from one thread up to three, after which it seems to slow down slightly.
|
|
|
Sep 26 2009, 19:23
Post
#79
|
|
![]() Group: Developer Posts: 653 Joined: 2-October 08 From: Ottawa Member No.: 59035 |
QUOTE
Could someone tell me if it's normal that CUETools.FlaCuda.exe reaches a 50% load on my Core2Duo E6300/9400GT whatever the compression level I choose? Shouldn't the CPU stay inactive during the encoding process? I'm using fb2k, so I checked in the foobar2000 options whether I did something wrong (like an active DSP…), but apparently that isn't the case. Is it the same with a stronger GPU?

Yep... Maybe NVIDIA will fix this at some point, but currently the function call that's waiting for the GPU to finish work is wasting 100% of one CPU, obviously just spinning in a loop and constantly checking if the GPU is ready. There are options in CUDA which control the waiting mode, but the one which was supposed to make a process sleep and wait for results doesn't seem to be working on Windows Vista; I suppose it's only implemented on Linux, where the CUDA driver is more advanced.

This post has been edited by Gregory S. Chudov: Sep 26 2009, 19:25
-------------------- CUETools 2.1.4
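To illustrate the two waiting modes described in the post above, here is a CPU-only analogy in Python. It is not CUDA code and not how FlaCuda is implemented: a sleeping thread stands in for the GPU, and all function names are invented for the sketch. In CUDA itself the choice is, as far as I know, made through scheduling flags (for example `cudaDeviceScheduleSpin` versus `cudaDeviceScheduleBlockingSync` in the runtime API of current toolkits). One fully busy core on a dual-core CPU shows up as roughly 50% total load, which fits the question being answered.

```python
import threading
import time


def fake_gpu_job(done: threading.Event, seconds: float) -> None:
    """Stand-in for GPU work: sleep, then signal completion."""
    time.sleep(seconds)
    done.set()


def spin_wait(done: threading.Event) -> int:
    """Busy-poll until the job is done; returns how many times we checked."""
    polls = 0
    while not done.is_set():
        polls += 1  # each pass burns CPU time doing nothing useful
    return polls


def blocking_wait(done: threading.Event) -> int:
    """Sleep inside the OS until signalled; counts as a single check."""
    done.wait()  # the waiting thread is descheduled until set() is called
    return 1


def run(wait_fn, seconds: float = 0.05):
    """Run one fake job; return (checks, CPU seconds used by this process)."""
    done = threading.Event()
    worker = threading.Thread(target=fake_gpu_job, args=(done, seconds))
    cpu_before = time.process_time()
    worker.start()
    checks = wait_fn(done)
    worker.join()
    return checks, time.process_time() - cpu_before


spin_checks, spin_cpu = run(spin_wait)
block_checks, block_cpu = run(blocking_wait)
print(f"spin:     {spin_checks} checks, {spin_cpu:.3f} s CPU")
print(f"blocking: {block_checks} check, {block_cpu:.3f} s CPU")
```

The spinning version typically reports a very large number of checks and CPU time close to the job's duration, while the blocking version uses almost none. Exact numbers vary from machine to machine.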
|
|
|
|
Sep 26 2009, 19:36
Post
#80
|
|
![]() Group: Members (Donating) Posts: 3474 Joined: 7-November 01 From: Strasbourg (France) Member No.: 420 |
Thank you for the explanation, Gregory. I hope it'll be fixed by Nvidia soon.
This post has been edited by guruboolez: Sep 26 2009, 21:00 |
|
|
|
Sep 26 2009, 20:15
Post
#81
|
|
![]() Group: Members Posts: 2296 Joined: 18-May 03 From: Denmark Member No.: 6695 |
It crashes on my GeForce 9300, which is supposed to support CUDA.
Works fine on my Quadro 2700M though.
-------------------- Can't wait for a HD-AAC encoder :P
|
|
|
|
Sep 26 2009, 21:20
Post
#82
|
|
![]() Group: Developer Posts: 653 Joined: 2-October 08 From: Ottawa Member No.: 59035 |
Previous versions too?
-------------------- CUETools 2.1.4
|
|
|
|
Sep 27 2009, 01:37
Post
#83
|
|
![]() Group: Members Posts: 2296 Joined: 18-May 03 From: Denmark Member No.: 6695 |
FlaCUDA03-05:
CODE
Error : Exception of type 'GASS.CUDA.CUDAException' was thrown.

FlaCUDA06 shows the source file's info, then crashes in Windows and finally returns this in the console:
CODE
Unhandled Exception: ErrorNotInitialized

Can you recommend anything stable I can use to test/verify that CUDA is working correctly on my GPU?
-------------------- Can't wait for a HD-AAC encoder :P
|
|
|
|
Sep 27 2009, 06:35
Post
#84
|
|
![]() Group: Members Posts: 128 Joined: 9-August 06 Member No.: 33830 |
QUOTE
Yep... Maybe NVIDIA will fix this at some point, but currently the function call that's waiting for the GPU to finish work is wasting 100% of one CPU, obviously just spinning in a loop and constantly checking if the GPU is ready. There are options in CUDA which control the waiting mode, but the one which was supposed to make a process sleep and wait for results doesn't seem to be working on Windows Vista; I suppose it's only implemented on Linux, where the CUDA driver is more advanced.

Now that you mention it... I had a look at the processes, and foobar eats up ~20% and FlaCuda ~38% while converting. It would be interesting to compare the energy consumption of the CPU and GPU implementations, but I don't have the proper instruments now...
|
|
|
Oct 1 2009, 01:01
Post
#85
|
|
![]() Group: Members Posts: 847 Joined: 7-October 01 Member No.: 235 |
One small question about FlaCuda's default blocksize. If I remember right, flac used a blocksize of 4608 at -8 and changed to 4096 because of small average compression advantages. My own limited tests with FlaCuda also show a tiny advantage for a blocksize of 4096 on a selection of mixed kinds of music.
|
|
|
|
Oct 1 2009, 07:17
Post
#86
|
|
![]() Group: Developer Posts: 653 Joined: 2-October 08 From: Ottawa Member No.: 59035 |
Smaller blocks make it slower, but have their advantages. I will try to reduce performance penalty for smaller blocks if i can.
-------------------- CUETools 2.1.4
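To put the two block sizes from this exchange in perspective, the sketch below works out how long each block lasts and how many blocks an album needs. The thread does not say why smaller blocks are slower; a fixed per-block cost on the GPU is only one plausible reading. The 54-minute length is borrowed from the album mentioned earlier in the thread and assumed to be exact, and `block_stats` is a name made up for the example.

```python
import math

SAMPLE_RATE = 44_100                    # Hz, CD audio as in the files tested in this thread
ALBUM_SAMPLES = 54 * 60 * SAMPLE_RATE   # assumption: exactly 54 minutes of audio


def block_stats(block_size: int):
    """Return (block duration in ms, number of blocks needed for the album)."""
    duration_ms = 1000.0 * block_size / SAMPLE_RATE
    blocks = math.ceil(ALBUM_SAMPLES / block_size)
    return duration_ms, blocks


for size in (4608, 4096):
    ms, blocks = block_stats(size)
    print(f"blocksize {size}: {ms:.1f} ms per block, {blocks} blocks")
```

Going from 4608 to 4096 means 12.5% more blocks for the same audio, so any fixed per-block cost would grow by about that much.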
|
|
|
|
Oct 1 2009, 15:40
Post
#87
|
|
|
Group: Members Posts: 221 Joined: 12-January 03 From: Kowloon, Hong Kong Member No.: 4533 |
Sorry for breaking in ...
Is anyone planning a foobar2000 Flac-CUDA plugin? It would make conversion a lot easier.
-------------------- Hong Kong - International Joke Center (after 1997-06-30)
|
|
|
|
Oct 1 2009, 15:48
Post
#88
|
|
![]() Group: Members Posts: 2296 Joined: 18-May 03 From: Denmark Member No.: 6695 |
QUOTE
Is anyone planning a foobar2000 Flac-CUDA plugin?

No need for a plugin. foobar2000 relies on command-line encoders, and you can set up any command-line encoder as you wish, including FlaCUDA. However, if you have a multicore CPU, you might need to set the Thread Count to 1 under Advanced, since this encoder is not CPU-dependent. Note that this affects all encoders (including CPU-dependent ones). It would be nice if this setting were user-definable for each encoder.
-------------------- Can't wait for a HD-AAC encoder :P
|
|
|
|
Oct 1 2009, 20:08
Post
#89
|
|
![]() Group: Members Posts: 651 Joined: 10-January 06 From: Zagreb Member No.: 27018 |
That was exactly what I was thinking - because it is needed, too, when using iTunes AAC encoder through foobar2000.
|
|
|
|
Oct 2 2009, 08:49
Post
#90
|
|
|
Group: Members Posts: 221 Joined: 12-January 03 From: Kowloon, Hong Kong Member No.: 4533 |
QUOTE
Is anyone planning a foobar2000 Flac-CUDA plugin?

QUOTE
No need for a plugin. foobar2000 relies on command-line encoders, and you can set up any command-line encoder as you wish, including FlaCUDA. However, if you have a multicore CPU, you might need to set the Thread Count to 1 under Advanced, since this encoder is not CPU-dependent. Note that this affects all encoders (including CPU-dependent ones). It would be nice if this setting were user-definable for each encoder.

Thanks. Is there an example of setting up FlaCUDA in foobar2000? I'm not good at command-line settings.
-------------------- Hong Kong - International Joke Center (after 1997-06-30)
|
|
|
|
Oct 2 2009, 17:42
Post
#91
|
|
|
Group: Members Posts: 20 Joined: 17-February 09 Member No.: 67079 |
johnsonlam, see screenshot
|
|
|
|
Oct 4 2009, 01:14
Post
#92
|
|
![]() Group: Members Posts: 172 Joined: 18-March 05 From: Wichita, KS Member No.: 20701 |
Sempron 3400+ (Socket 754) o/c'ed to 2500 MHz, GeForce GTX 260 Core 216 at standard frequencies:
Album is Paul Simon's Graceland, converted to a single 456,539,708 byte WAV.
Flac 1.2.1b (RareWare's ICL compile) -8 : 01:06.812, 250,990,457 bytes.
FlaCuda06 -8 : 00:26.015, 250,421,454 bytes.
2.57x faster, sounds good to me!
-------------------- My photo gallery: http://www.flickr.com/photos/inghramjp
|
|
|
|
Oct 4 2009, 02:08
Post
#93
|
|
|
Group: Members Posts: 410 Joined: 9-August 07 From: Los Angeles Member No.: 46048 |
Can't wait to see how Fermi does with FLACuda. Very exciting stuff to think about, that's for certain.
|
|
|
|
Oct 5 2009, 02:56
Post
#94
|
|
|
Group: Members Posts: 52 Joined: 10-December 06 Member No.: 38550 |
QUOTE
Can't wait to see how Fermi does with FLACuda. Very exciting stuff to think about, that's for certain.

A DirectX11 version would be better as it could run on ATI hardware too.
|
|
|
Oct 5 2009, 17:27
Post
#95
|
|
|
Group: Members Posts: 221 Joined: 12-January 03 From: Kowloon, Hong Kong Member No.: 4533 |
-------------------- Hong Kong - International Joke Center (after 1997-06-30)
|
|
|
|
Oct 5 2009, 20:06
Post
#96
|
|
![]() Server Admin Group: Admin Posts: 4810 Joined: 24-September 01 Member No.: 13 |
QUOTE
Can't wait to see how Fermi does with FLACuda. Very exciting stuff to think about, that's for certain.

QUOTE
A DirectX11 version would be better as it could run on ATI hardware too.

Or OpenCL...
|
|
|
Oct 5 2009, 20:44
Post
#97
|
|
|
Group: Members Posts: 954 Joined: 6-September 04 Member No.: 16817 |
|
|
|
|
Oct 7 2009, 12:59
Post
#98
|
|
![]() Group: Members Posts: 128 Joined: 9-August 06 Member No.: 33830 |
QUOTE
Or OpenCL...

Haven't Nvidia just released their OpenCL drivers? Wish AMD would do the same!
That would be the best - there's no hope for a DX11 compute shader on XP (ok, I know, I know, in a few years no one will still be using XP...)
AMD/CUDA is the same
|
|
|
Oct 8 2009, 09:44
Post
#99
|
|
|
Group: Members Posts: 329 Joined: 30-September 05 From: London, Europe Member No.: 24805 |
QUOTE
Or OpenCL...

QUOTE
Haven't Nvidia just released their OpenCL drivers? Wish AMD would do the same!

Apparently AMD/ATI is not sitting on its hands: ATI Jumps on OpenCL Bandwagon by Releasing OpenCL Drivers.

OpenCL would be a lot better than DirectX support, as it means it won't be limited to just Windows. Apparently Windows 7 comes standard with OpenCL compatible NVIDIA drivers, which is promising...

A more in depth look at OpenCL:
OpenCL: To GPGPU and Beyond
AMD: An Introduction to OpenCL

This post has been edited by Maurits: Oct 8 2009, 10:14
|
|
|
Oct 19 2009, 12:17
Post
#100
|
|
|
Group: Members Posts: 329 Joined: 30-September 05 From: London, Europe Member No.: 24805 |
QUOTE
AMD Leads Industry as First Chip Supplier to Offer OpenCL™ Development Kit that Supports GPUs and x86 CPUs

ATI Stream SDK v2.0 beta with support for OpenCL 1.0 allows developers to program complete AMD platforms for breakthrough performance on processing-intensive applications

October 19, 2009 12:01 AM Eastern Daylight Time

SUNNYVALE, Calif.--(EON: Enhanced Online News)--AMD (NYSE: AMD) announced availability of a key piece of its strategy to help improve the end-user's compute experience by leveraging the combined power of AMD graphics processors (GPUs) and AMD multi-core x86 processors through software. With the beta release of its ATI Stream Software Development Kit (SDK) v2.0, featuring OpenCL 1.0 support, AMD provides a free set of tools software developers can use to create applications that are accelerated by AMD GPUs and AMD multi-core x86 CPUs working together. The ATI Stream SDK v2.0 is certified compliant with OpenCL 1.0 by the Khronos Working Group.

* The ATI Stream SDK v2.0 beta is available for download today and can be accessed here.

Good news...
|
|
|