FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA" |
Jan 29 2010, 17:53
Post
#151
|
|
![]() Group: Members Posts: 2296 Joined: 18-May 03 From: Denmark Member No.: 6695 |
QUOTE according to nVidia's driver page, you are not using the latest drivers .. try these:
Not sure they will work. Geforce 9300 isn't listed under supported products. I've been struggling previously to get newer releases to work on this. Note that while it's called Geforce 9300, it's a mainboard chipset based on an nForce 730i chip - not to be mixed up with the Geforce 9300 GS chip, which is entirely different.
http://www.nvidia.com/object/win7_winvista...96.21_whql.html
QUOTE as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory
I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one
QUOTE but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores
Well it IS sold as having CUDA capability, so it should work no matter how well it performs, and I'll be glad if I could just get FlaCUDA up and running. It compresses better than native FLAC, so if it's just able to compress my lossless music even further I'm happy
This post has been edited by odyssey: Jan 29 2010, 17:55
-------------------- Can't wait for a HD-AAC encoder :P
|
|
|
|
Jan 29 2010, 19:11
Post
#152
|
|
![]() Group: Members Posts: 128 Joined: 9-August 06 Member No.: 33830 |
Having a mere 16 cores is not that bad - my 8600GT has only 32 and it's faster in FLAC encoding (I've installed new drivers a few weeks ago and tried the actual flaCUDA) than 2 stock flac encoders running in parallel on my 3.33GHz conroe core2duo - so if these 16 cores have the same clock rate (which I'm not sure about at all...) it can still be faster than a single threaded software encoder on virtually any non-overclocked CPU.
It's almost scary how well these low level GPUs stand against much higher class CPUs of their own age |
|
|
|
Jan 29 2010, 20:37
Post
#153
|
|
![]() Group: Members Posts: 2296 Joined: 18-May 03 From: Denmark Member No.: 6695 |
Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP.
So thanks MS, I can't use CUDA programs using RDP!
-------------------- Can't wait for a HD-AAC encoder :P
|
|
|
|
Feb 1 2010, 09:43
Post
#154
|
|
|
Group: Members Posts: 114 Joined: 31-May 07 Member No.: 43892 |
QUOTE according to nVidia's driver page, you are not using the latest drivers .. try these:
QUOTE Not sure they will work. Geforce 9300 isn't listed under supported products. I've been struggling previously to get newer releases to work on this. Note that while it's called Geforce 9300, it's a mainboard chipset based on an nForce 730i chip - not to be mixed up with the Geforce 9300 GS chip, which is entirely different.
http://www.nvidia.com/object/win7_winvista...96.21_whql.html
look again ...
QUOTE GeForce 9 series: 9500 GS, 9600 GT, 9200, 9800 GX2, 9500 GT, 9600 GS, 9300, 9800 GT, 9400 GT, 9300 GS, 9400, 9600 GSO, 9300 GE, 9800 GTX/GTX+
QUOTE as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory
QUOTE I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one
you could try and run GPU-z for getting those details, as well as information about which APIs are supported by your card
http://www.techpowerup.com/gpuz/
QUOTE but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores
QUOTE Well it IS sold as having CUDA capability, so it should work no matter how well it performs, and I'll be glad if I could just get FlaCUDA up and running. It compresses better than native FLAC, so if it's just able to compress my lossless music even further I'm happy
fair enough ...
QUOTE Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP. So thanks MS, I can't use CUDA programs using RDP!
now that's a major bummer ... how about using e.g. TightVNC for your remote activities ?
http://www.tightvnc.com/
Cheers, Maggi |
|
|
|
Apr 24 2010, 09:44
Post
#155
|
|
|
Group: Members Posts: 1 Joined: 24-April 10 Member No.: 80115 |
I'm getting "Error : Exception of type 'GASS.CUDA.CUDAException' was thrown."
CODE
CUETools.FlaCuda.exe -11 Priceless.wav
FlaCuda#.91, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename : Priceless.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:04:07.6270000
Error : Exception of type 'GASS.CUDA.CUDAException' was thrown.
I ran the deviceQuery.rar and got this
CODE
CUDA Device Query (Driver API) statically linked version
There is 1 device supporting CUDA
Device 0: "GeForce GTX 480"
CUDA Driver Version: 3.0
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1576468480 bytes
Number of multiprocessors: 15
Number of cores: 120
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.81 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
OS: Windows 7 x64
GPU: GeForce GTX 480
Graphics Driver: 197.55 (8.17.11.9755) |
|
|
|
Apr 24 2010, 09:54
Post
#156
|
|
![]() Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035 |
Wow. Congrats on getting a GTX 480
I think I'll have to wait for the release of the GTX 460, because the GTX 480/470 are a bit over my budget.
-------------------- CUETools 2.1.4
|
|
|
|
Apr 24 2010, 20:11
Post
#157
|
|
|
Group: Members Posts: 1 Joined: 23-December 09 Member No.: 76270 |
Hi. Did some tests on Bel Canto's CD comparing FLAC 1.2.1 to your newest version, 0.91.
HW: Intel Core 2 Quad Q9550, 8GB RAM, Nvidia GTX260
Driver: 197.45 @ Win7x64, Intel X-25 SSD
BelCanto.wav File Info : 44100kHz; 2 channel; 16 bit; 00:47:44.2000000
Results:
FLAC 1.2.1
Mode -3 : Belcanto.wav: wrote 293901039 bytes, ratio=0,582 ,15,2Sec
Mode -6 : Belcanto.wav: wrote 284872007 bytes, ratio=0,564 ,20,4Sec
Mode -8 : Belcanto.wav: wrote 283904326 bytes, ratio=0,562 ,72.4Sec
FlaCuda#.91
Mode -3 : 495,34x; 284585708 bytes in 00:00:05.7823308 seconds;
Mode -6 : 504,41x; 283252159 bytes in 00:00:05.6783248 seconds;
Mode -8 : 418,60x; 283217473 bytes in 00:00:06.8423914 seconds;
CPU Options:
c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 2 ..\BelCanto.wav
Results : 433,81x; 283217473 bytes in 00:00:06.6023776 seconds;
c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 3 ..\BelCanto.wav
Results : 392,28x; 283217473 bytes in 00:00:07.3014177 seconds;
c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 4 ..\BelCanto.wav
Results : 406,71x; 283217473 bytes in 00:00:07.0424028 seconds;
Every other time I get a:
Error : Exception of type 'GASS.CUDA.CUDAException' was thrown.
Unhandled Exception: ErrorLaunchTimeout
Description: Stopped working
Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: cuetools.flacuda.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: 4b49fea7
Problem Signature 04: CUDA.NET
Problem Signature 05: 2.3.7.0
Problem Signature 06: 4ae56b31
Problem Signature 07: 345
Problem Signature 08: 22
Problem Signature 09: GASS.CUDA.CUDAException
OS Version: 6.1.7600.2.0.0.256.1
Locale ID: 1044
Besides the crash, I must say, IMPRESSIVE
This post has been edited by dragmore: Apr 24 2010, 20:13 |
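For readers who want to check the figures in the post above, here is a minimal Python sketch of the arithmetic. It assumes the "x" value is audio length divided by encoding time, and that the ratio is encoded size divided by raw PCM size with the WAV header ignored; neither definition is confirmed by the encoders' documentation here, and the function names are made up for illustration.

```python
def to_seconds(hms: str) -> float:
    """Convert 'HH:MM:SS.fraction' (the time format shown in the post) to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(audio_length: str, encode_time: str) -> float:
    """Seconds of audio encoded per second of wall-clock time (assumed definition)."""
    return to_seconds(audio_length) / to_seconds(encode_time)


def compression_ratio(encoded_bytes: int, audio_length: str,
                      sample_rate: int = 44100, channels: int = 2,
                      bits_per_sample: int = 16) -> float:
    """Encoded size over raw PCM size; the WAV header is ignored (an assumption)."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return encoded_bytes / (to_seconds(audio_length) * bytes_per_second)


if __name__ == "__main__":
    length = "00:47:44.2000000"  # BelCanto.wav, from the post
    print(f"{realtime_factor(length, '00:00:05.7823308'):.2f}x")  # FlaCuda mode -3
    print(f"{compression_ratio(293901039, length):.3f}")          # FLAC mode -3
```

Under those assumptions the two printed values come out at about 495x and 0.582, in line with the numbers reported in the post.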
|
|
|
Apr 25 2010, 19:00
Post
#158
|
|
|
Group: Members Posts: 170 Joined: 23-August 06 Member No.: 34375 |
Wow, FLAC -8 works at ~50x on my laptop, FlaCuda -11 does ~150x, very impressive.
Is FlaCuda with the "--verify" switch considered to be safe for archive use? I understand that software can never be guaranteed to be error free and I don't ask for it, I just wonder if you consider your code (with the verify option) robust enough to be an alternative to the official FLAC. As far as I understand you use the parallel processors on the GPU to find the best "next step" (don't know what it's called in FLAC terminology) and then execute it on the CPU. Is this approach limited to FLAC or can similar computations for other audio/video formats use it? |
|
|
|
Apr 26 2010, 11:24
Post
#159
|
|
![]() Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035 |
QUOTE Unhandled Exception: ErrorLaunchTimeout
Exactly how often does it happen? Is there any pattern to this? Does it look like the screenshot in this article: http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx ? Anybody else having those problems?
QUOTE Is FlaCuda with the "--verify" switch considered to be safe for archive use?
Yes. --verify guarantees that the produced file can be decoded, at least with the CUETools.Flake decoder, and that its audio content is identical to the source. In theory, it cannot give a 100% guarantee that the produced file can be decoded with the reference FLAC decoder, because --verify uses a different decoder, but so far nobody has reported any such problems.
QUOTE As far as I understand you use the parallel processors on the GPU to find the best "next step" (don't know what it's called in FLAC terminology) and then execute it on the CPU.
More or less. Recent versions can do almost everything on the GPU, and the latest version does this by default (can be disabled with the --slow-gpu option). The CPU only does some sanity checks, formats the resulting data as a FLAC bitstream and writes it to file.
QUOTE Is this approach limited to FLAC or can similar computations for other audio/video formats use it?
Effective parallel processing is possible only if the format is suitable for it. For example, ALAC uses adaptive compression, which makes it very inconvenient for parallel processing. Maybe FLAC isn't the only codec which can benefit from GPU encoding, but for most codecs the task will be much harder and the speed won't be that impressive. Most of the GPU code in FlaCuda is very specific to FLAC. As for video, there are several GPU encoders for the H.264 video codec, most if not all of them proprietary.
-------------------- CUETools 2.1.4
|
|
|
|
May 4 2010, 00:23
Post
#160
|
|
|
Group: Members Posts: 395 Joined: 17-September 02 From: Hell Member No.: 3380 |
Getting a crash when I try to convert individual wave files using FlaCuda 091. The error codes are nearly the same as mentioned in an earlier post. Interestingly, FlaCuda does not crash if converting a wavpack image file with embedded cue to flac image with embedded cue.
Windows error report below.
Problem signature:
Problem Event Name: CLR20r3
Problem Signature 01: cuetools.flacuda.exe
Problem Signature 02: 1.0.0.0
Problem Signature 03: 4b49fea7
Problem Signature 04: mscorlib
Problem Signature 05: 2.0.0.0
Problem Signature 06: 4a27471d
Problem Signature 07: 349e
Problem Signature 08: 1c5
Problem Signature 09: System.IO.IOException
OS Version: 6.1.7600.2.0.0.256.48
Locale ID: 1033
This was using foobar2000. I also grabbed that error code:
Conversion failed: The encoder has terminated prematurely with code -532459699 (0xE0434F4D); please re-check parameters
Commandline parameters are set to: -8 - -o %d
-------------------- Looking for a digital idiot? Look no further.
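A note on the exit code quoted above: the negative number and the hex value are the same 32-bit pattern, and 0xE0434F4D is the code that older .NET runtimes use when a process dies from an unhandled managed exception, which fits the System.IO.IOException in the report. The short Python sketch below only shows the conversion; the helper name is made up for illustration, and it says nothing about what actually caused the IOException.

```python
def as_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


exit_code = -532459699                  # value reported by foobar2000 in the post
unsigned = as_unsigned32(exit_code)
print(hex(unsigned))                    # same bit pattern as 0xE0434F4D

# The low three bytes are ASCII characters; the high byte 0xE0 marks the
# code as a software-raised exception rather than a normal exit status.
tag = unsigned.to_bytes(4, "big")[1:].decode("ascii")
print(tag)
```

So the code only says "a .NET exception was not handled"; the exception type in the Windows report is the more useful clue.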
|
|
|
|
May 29 2010, 07:26
Post
#161
|
|
|
Group: Members Posts: 12 Joined: 29-May 10 Member No.: 80969 |
Hell FlaCuda091 is ultra fast. Using a NVidia GT8800 with foobar v1.0.3 and "-8 - -o %d --verify" parameters. Only thing is that my HDD is limiting the encoding speed. A 46min Wav file took less than 10sec. Detailed results coming up soon.
Thanks for that encoder. I hope that FlaCuda's output will be compatible with all other software players, devices and decoders.
This post has been edited by modernartistry: May 29 2010, 07:28 |
|
|
|
May 29 2010, 19:54
Post
#162
|
|
![]() Group: Members Posts: 90 Joined: 22-August 07 Member No.: 46407 |
nVidia 8800 GT with Intel Q6600 Quad core.
Foobar 1.03 transcoding FLAC to FLAC (Pink Floyd Final Cut [13 tracks])
CODE
*** FLAC 1.2.1 @ 4 threads ***
level 8
Total encoding time: 0:22.277, 124.90x realtime
*** FlaCuda 0.91 ***
-8 - -o %d --verify
Total encoding time: 1:01.184, 45.47x realtime
-8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:50.529, 55.06x realtime
-8 --cpu-threads 3 - -o %d --verify
Total encoding time: 0:42.807, 65.00x realtime
-8 --cpu-threads 3 - -o %d
Total encoding time: 0:42.011, 66.23x realtime
-8 --cpu-threads 4 - -o %d --verify
Total encoding time: 0:42.027, 66.20x realtime
-8 --cpu-threads 4 - -o %d
Total encoding time: 0:41.356, 67.28x realtime
-8 --slow-gpu --cpu-threads 4 - -o %d --verify
Total encoding time: 0:37.939, 73.34x realtime
CPU usage with FlaCuda never peaks above 25% per core. Seems for a practical scenario with a quad core CPU it doesn't compete. |
|
|
|
Jun 4 2010, 06:58
Post
#163
|
|
![]() Group: Developer Posts: 191 Joined: 8-July 03 Member No.: 7653 |
I'm amused by flacuda's speed.... I can't think of too much use for 800x realtime flac encoding, but I thought I'd throw out something that I'm too lazy to implement that flacuda's speed would make almost reasonable:
_Optimal_ block size selection. Flac lets you change the frame size on the fly. Truly optimal selection across all supported sizes would be a bit insane, but globally optimal selection on a subset of sizes is not too terrible.
Let's consider all powers of two from 64 to 32768; there are ten sizes. At every 64 sample offset through the file, encode all ten sizes, and store the resulting sizes. Making the hand-wavy assumption that the computation per sample is constant, this will be 1023x slower than normal.
Take the sizes and construct a directed graph with a vertex at every 64th sample and 10 edges leaving each vertex, connecting it to the vertex for the sample 64, 128, 256, etc. away. Assign the coding cost for the block at each of the sizes to each of the edges. Now run the Dijkstra shortest path algorithm from the first to last or last to first vertex. The result will be the globally optimal frame size selection given the available block sizes. Either re-encode or, if you wasted a lot of ram saving the results of the first pass, reassemble the final stream.
Limiting yourself to powers of two in the flac subset over the range 64-4096 would be 127x the number of processed samples, 32-4096 would be 255x. The cuda implementation might be able to maintain almost decent speeds while doing this extra work. ;)
This isn't limited to power of two sizes, but you probably want to arrange it so that your smallest size is a common factor of all the sizes you use.
This post has been edited by NullC: Jun 4 2010, 07:12 |
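The graph search described in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FlaCuda code: `cost(offset, size)` is a hypothetical callback standing in for "encode the block starting at `offset` with `size` samples and return its encoded size", and the file length is assumed to be a multiple of the smallest block size. Because every edge points forward in the file, a plain left-to-right dynamic program is used here in place of Dijkstra; on such a graph it finds the same shortest path.

```python
from typing import Callable, List, Sequence, Tuple


def optimal_block_sizes(num_samples: int, sizes: Sequence[int],
                        cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Return (total cost, list of block sizes) for the cheapest tiling of the file."""
    step = min(sizes)
    if num_samples % step or any(s % step for s in sizes):
        raise ValueError("all sizes and the file length must be multiples of the smallest size")
    n = num_samples // step                 # one vertex every `step` samples
    best = [float("inf")] * (n + 1)         # best[v]: cheapest cost to reach vertex v
    last = [0] * (n + 1)                    # last[v]: size of the block that ends at v
    best[0] = 0.0
    for v in range(n):                      # vertices in file order
        for s in sizes:                     # one outgoing edge per block size
            w = v + s // step
            if w > n:
                continue                    # block would run past the end of the file
            candidate = best[v] + cost(v * step, s)
            if candidate < best[w]:
                best[w], last[w] = candidate, s
    blocks: List[int] = []
    v = n
    while v > 0:                            # walk the chosen edges backwards
        blocks.append(last[v])
        v -= last[v] // step
    blocks.reverse()
    return best[n], blocks


if __name__ == "__main__":
    # Toy cost (hypothetical): fixed header per block plus one unit per sample,
    # tripled when a block straddles a pretend transient at sample 192.
    def toy_cost(offset: int, size: int) -> float:
        straddles = offset < 192 < offset + size
        return 10 + size * (3 if straddles else 1)

    print(optimal_block_sizes(384, [64, 128, 256], toy_cost))
```

In a real encoder the expensive part is evaluating `cost` for every offset and size, which is the 1023x (or 127x) extra work the post estimates; the search itself is cheap by comparison.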
|
|
|
Jul 29 2010, 19:36
Post
#164
|
|
|
Group: Members Posts: 1 Joined: 29-July 10 Member No.: 82629 |
QUOTE Is there anybody here who knows the math behind Cholesky decomposition used in ffmpeg as an alternative method of LPC coefficients search? This method is too slow for CPU, but i thought i'd give it a shot on GPU. The problem is, GPU doesn't do double precision very well.
Gregory, maybe you can find background info here: http://www.cise.ufl.edu/research/sparse/ch...OLMOD/Cholesky/ |
|
|
|
Jul 30 2010, 02:24
Post
#165
|
|
|
Group: Members Posts: 18 Joined: 24-December 02 Member No.: 4222 |
Man, that's some fast encoding! Nice work!
Is there any chance that tag writing will be added to the binary, so that it can be used with EAC? |
|
|
|
Jul 30 2010, 13:12
Post
#166
|
|
![]() Group: Members Posts: 840 Joined: 7-October 01 Member No.: 235 |
QUOTE Man, that's some fast encoding! Nice work! Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?
You already can with metaflac. I gave an example here: flacuda.exe & metaflac.exe in EAC |
|
|
|
Jul 30 2010, 17:31
Post
#167
|
|
|
Group: Members Posts: 118 Joined: 9-July 10 Member No.: 82156 |
Has anyone tested this when running it multiple times in parallel? Being able to encode several files at once with single-threaded encoders is pretty amazing, so I wondered whether one instance of this uses up the whole GPU or whether several could also be run in parallel. I don't run into a hard drive bottleneck as easily as most, as I use a RAID 0 configuration of high-end desktop hard drives.
|
|
|
|
Jul 30 2010, 19:58
Post
#168
|
|
![]() Group: Members Posts: 128 Joined: 9-August 06 Member No.: 33830 |
My 8600GT got used fully by one instance of the CUDA encoder; more threads gave no advantage. On top of that, it can become I/O limited very quickly (I tried it with the source on a different HDD than the target).
|
|
|
|
Jul 30 2010, 20:46
Post
#169
|
|
|
Group: Members Posts: 18 Joined: 24-December 02 Member No.: 4222 |
QUOTE Man, that's some fast encoding! Nice work! Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?
QUOTE You already can with metaflac. I gave an example here: flacuda.exe & metaflac.exe in EAC
Yep, you did. |
|
|
|
Aug 3 2010, 01:39
Post
#170
|
|
|
Group: Members Posts: 12 Joined: 29-May 10 Member No.: 80969 |
Is there any further development on FlaCuda? Is the current version still 0.91?
Another test: FlaCuda 091 + Foobar 1.03
Music: Edenbridge - Solitaire / Symphonic Metal album in a wav file, length 57:25min
Hardware: Intel Dual Core E8400 / Nvidia GT8800 (with newer 92b core)
FLAC 1.2.1 level 8 (2 threads)
Total encoding time: 1:13.711, 46.74x realtime
FlaCuda -8 - -o %d --verify
Total encoding time: 0:31.216, 110.37x realtime
FlaCuda -8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:24.492, 140.67x realtime
FlaCuda is a good choice for a dual-core system. As Bad Monkey wrote above, a quad core may be faster than the GPU.
This post has been edited by modernartistry: Aug 3 2010, 02:17 |
|
|
|
Aug 3 2010, 07:11
Post
#171
|
|
![]() Group: Members Posts: 90 Joined: 22-August 07 Member No.: 46407 |
I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?
|
|
|
|
Aug 3 2010, 20:10
Post
#172
|
|
|
Group: Members Posts: 12 Joined: 29-May 10 Member No.: 80969 |
QUOTE I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?
Hm. As I wrote, I have the newer version of the 8800GT that came out in February 2008 with 512MB RAM instead of 378MB. This GPU core (G92) was the fastest out there, followed a year later by the G200b core, which is nearly the same. Might it be that you have an older model or older drivers? I used the same settings as you. Maybe your hard drive is too slow?
My GPU card spec:
512 MByte GDDR3
65 nm
Stream processors: 112
RAM bandwidth: 256-bit
Core frequency: 600 MHz
Shader frequency: 1500 MHz
RAM frequency: 900 MHz
This post has been edited by modernartistry: Aug 3 2010, 20:21 |
|
|
|
Aug 4 2010, 07:10
Post
#173
|
|
![]() Group: Members Posts: 90 Joined: 22-August 07 Member No.: 46407 |
Yeah I have 512 MB but core clock is only 450 MHz / VRAM 700 MHz. Okay.
Am going to upgrade to a GTX 460 sometime soon. So that'll be interesting. Haha. |
|
|
|
Aug 4 2010, 07:23
Post
#174
|
|
![]() Group: Members Posts: 128 Joined: 9-August 06 Member No.: 33830 |
There must be some other limit, because this 70x matches my results with a 8600GT and an early version of flaCUDA. Any 8800GT should be much faster than it.
HDD speed, perhaps? Those are mechanical and thus seriously limited when they have to read/write several streams at once (they have to move their heads back and forth). Whenever I tested any encoder I used a different HDD for the destination and did not use more than 2 threads, ever (it wouldn't even benefit my core2duo, to begin with). I'm planning on getting an SSD in a few months (for system and some temp area) so I'll test 2-thread encoding again.
edit: I forgot that I'm planning on replacing my vcard with a Radeon too. Well, so much for CUDA...
This post has been edited by alvaro84: Aug 4 2010, 07:27 |
|
|
|
Aug 4 2010, 07:58
Post
#175
|
|
![]() Group: Members Posts: 90 Joined: 22-August 07 Member No.: 46407 |
If there is another limit clearly it would have to be something not shared with the CPU [turning in faster results @ 125x], which is obviously not the case with a HDD restriction. In any case the FLAC result above is only 260 MB.
|
|
|
|