FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
Jul 13 2008, 05:34
Post
#1
Group: Members Posts: 176 Joined: 20-January 03 From: A Tropical Isle Member No.: 4640
With my recent purchase of a 9000 series nVidia graphics card, I started wondering: has anyone investigated whether nVidia's CUDA could be useful for lossless compression? I'm not even remotely close to being a programmer, so I haven't a clue how the code works, but it seems like CUDA is valuable for encoding/decoding. I know nVidia is already holding a contest to speed up LAME (which ends in about 2 weeks), so perhaps it could be used to speed up lossless compressors too? The fastest modes of several codecs are already blazing fast, approaching the limits of hard drives, but perhaps the high-compression modes could be sped up through CUDA. Maybe, if the speed-up is enough, developers could even implement more ways to gain compression while still maintaining good encoding rates. It would be pretty cool if compression levels like La's best could be done at 50x or something.
Anyway, my curiosity is large, so I just thought I'd ask. :)
Jul 13 2008, 09:29
Post
#2
Group: Members Posts: 452 Joined: 31-May 04 From: Czech Rep. Member No.: 14430
I apologize for being completely incorrect.
This post has been edited by Martel: Jul 13 2008, 10:53
-------------------- HD 238 Sansa Clip+ Vorbis q6; HD 380 Xonar DX FB2k FLAC
Jul 13 2008, 10:00
Post
#3
Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13
Quote: "If I'm not mistaken, lossless coding usually employs dictionary methods (like LZW/LZMA) which generate a lot of random access and branching operations."
Not at all! Most lossless audio compressors use large predictive LPC filters. This would be an operation well suited to a GPU, if it weren't for a small detail: because of the need to be LOSSLESS, the operations are often integer, not floating point. It would also be possible to do it in floating point, but then the operations, rounding, and precision need to be PRECISELY defined. Exactly what GPUs don't have. Despite all the hype, there aren't that many things GPUs are actually good at.
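To make the integer point above concrete, here is a minimal sketch of integer-only linear prediction, assuming a made-up signal and hypothetical quantized coefficients (none of these values come from the thread or from any real encoder). The decoder repeats exactly the same integer arithmetic as the encoder, so the round trip is bit-exact. The last line shows why floating point needs precisely defined evaluation order: the same three numbers summed in a different order give a different result.

```python
def lpc_residual(samples, coeffs, shift):
    """Encoder side: residual[n] = x[n] - (sum(c[k] * x[n-1-k]) >> shift), integers only."""
    order = len(coeffs)
    residual = list(samples[:order])  # warm-up samples are stored verbatim
    for n in range(order, len(samples)):
        acc = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs))
        residual.append(samples[n] - (acc >> shift))
    return residual


def lpc_restore(residual, coeffs, shift):
    """Decoder side: rebuild the samples with exactly the same integer prediction."""
    order = len(coeffs)
    samples = list(residual[:order])
    for n in range(order, len(residual)):
        acc = sum(c * samples[n - 1 - k] for k, c in enumerate(coeffs))
        samples.append(residual[n] + (acc >> shift))
    return samples


# Hypothetical input: a short made-up signal and order-2 coefficients
# (7 and -3 with shift=2 stand in for roughly 1.75 and -0.75).
signal = [0, 3, 7, 12, 14, 13, 9, 2, -4, -9, -11, -8]
coeffs = [7, -3]
shift = 2

residual = lpc_residual(signal, coeffs, shift)
restored = lpc_restore(residual, coeffs, shift)
lossless = restored == signal

# Floating point: the grouping of additions changes the result.
float_order_matters = (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
```

If the prediction were computed in floating point, encoder and decoder would both have to reproduce the same rounding at every step, which is the requirement described in the post.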
Jul 14 2008, 03:52
Post
#4
Group: Members Posts: 176 Joined: 20-January 03 From: A Tropical Isle Member No.: 4640
Ah, I see now. Thanks very much for the response, Garf.
Sep 10 2009, 03:27
Post
#5
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
Here is good news.
An alpha version of a FLAC encoder for GPU. I only tested it on a GTS 250, so I'm eager to hear from people with other cards. Like all my applications, this requires the .NET Framework. And this time, of course, a CUDA-enabled graphics card. Source code is on SourceForge, as usual.
UPD1: A slightly more optimized version, re-tuned to less paranoid compression levels.
UPD2: Added pipe encoding for use with fb2k (encoder parameters: -5 - -o %d).
UPD3: Fixed the seeking problem with pipe encoding in fb2k; sped up lower compression levels.
UPD4: General speed improvement.
UPD5: wasted_bits/lossyWav support.
UPD6: Final optimizations.
UPD7: Rice partitioning on GPU (--gpu-only), multi-core CPU utilization support (--cpu-threads #).
UPD8: Default compression level changed to -7, rice partitioning on GPU on by default, memory/IO optimizations.
UPD9: Bugfix release; UPD91: fb2k pipe input fix.
* Download: FlaCuda091.rar ( 97.7K ), number of downloads: 1372
* Old version: FlaCuda06.rar ( 84.9K ), number of downloads: 817
This post has been edited by Gregory S. Chudov: Jan 10 2010, 17:30
-------------------- CUETools 2.1.4
Sep 10 2009, 05:58
Post
#6
Group: Members Posts: 71 Joined: 8-July 08 Member No.: 55505
Sounds awesome. Care to elaborate on the performance for those of us without a CUDA-capable card?
Sep 10 2009, 06:05
Post
#7
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
Less impressive than I had hoped, but this is only an initial version, and GPUs get faster every day.
On my GTS 250 it's approximately as fast as my C# encoder (which is fast, by the way). FlaCuda -4 achieves the same compression ratio as reference flac -8 (version 1.2.1 on a Core 2 Duo @ 3GHz) at approximately two to three times the speed. FlaCuda -8 is as slow as flac -8, but gives an extra 0.5% of compression ratio. It would be nice if someone could thoroughly compare them on different hardware and post the results here.
This post has been edited by Gregory S. Chudov: Sep 10 2009, 06:17
-------------------- CUETools 2.1.4
Sep 10 2009, 09:24
Post
#8
Group: Members Posts: 51 Joined: 30-May 09 From: Germany Member No.: 70242
No love for ati? *sniff*
Sep 10 2009, 09:49
Post
#9
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
There is love, but there's no implementation ^^
But I guess someone else can do it, now that we have a proof of concept.
-------------------- CUETools 2.1.4
Sep 10 2009, 16:30
Post
#10
Group: Developer (Donating) Posts: 2040 Joined: 19-October 01 From: Finland Member No.: 322
I ran some tests with my Core i7 940 (stock speed) and GeForce GTX 285. Original wav file was 237368588 bytes in size. Not too impressive results:
FLAC -5 : Elapsed Time : 00:00:08.268 (181929373 bytes)
FLAC -8 : Elapsed Time : 00:00:30.560 (181788832 bytes)
FlaCuda -4 : Elapsed Time : 00:00:09.204 (181892106 bytes)
FlaCuda -5 : Elapsed Time : 00:00:10.904 (181763725 bytes)
FlaCuda -8 : Elapsed Time : 00:00:12.370 (181676614 bytes)
FlaCuda -11: Elapsed Time : 00:00:23.883 (181734405 bytes)
Sep 10 2009, 20:34
Post
#11
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
Thank you!
-------------------- CUETools 2.1.4
Sep 10 2009, 20:44
Post
#12
Group: Members Posts: 410 Joined: 9-August 07 From: Los Angeles Member No.: 46048
I'm anxious to see how this would perform on the next generation of NVIDIA hardware (GT300), which is supposedly significantly faster in general computational performance than the previous architecture (G200).
Very exciting -- thank you!
Sep 10 2009, 21:20
Post
#13
Group: Members Posts: 120 Joined: 13-September 08 From: Louisville, KY Member No.: 58234
Quote: "No love for ati? *sniff*"
Maybe the "Evergreen" release coming in a bit will improve things (I hope). As far as this goes, I would be interested in lossy GPU encoding, and that might work a bit better given the inaccurate floating-point calculations.
ATI Stream support
This post has been edited by thundat00th: Sep 10 2009, 21:21
-------------------- My $.02, may not be in the right currency
Sep 10 2009, 22:42
Post
#14
Group: Members Posts: 648 Joined: 10-January 06 From: Zagreb Member No.: 27018
Here are my test results:
Klaus Schulze - Dreams Deluxe Edition, size 797 MB
Core2Duo 8200, Geforce 9600GT with passive cooling
Encoding with FLAC 1.2.1 (version from SourceForge) on the command line, -6: 38 seconds
And this...
PS D:\temp_2> .\CUETools.FlaCuda.exe -6 '.\Klaus Schulze - Dreams Deluxe Edition.wav'
CUETools.FlaCuda, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename : .\Klaus Schulze - Dreams Deluxe Edition.wav
File Info : 44100kHz; 2 channel; 16 bit; 01:19:00.8800000
Results : 61,11x; 499280528 bytes in 00:01:17.5764372 seconds;
Windows 7 32 bit.
Well... not that impressive
(edit) wrote 10 seconds too much for flac encode...
This post has been edited by hlloyge: Sep 10 2009, 22:43
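The "x" figure in the FlaCuda output appears to be the audio duration divided by the encoding time. That reading is inferred from the numbers posted in this thread, not from anything stated about the encoder. A small sketch of the arithmetic follows; the helper names are made up for illustration.

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS.fraction' string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(audio_duration, encode_time):
    """How many seconds of audio were encoded per second of wall-clock time."""
    return hms_to_seconds(audio_duration) / hms_to_seconds(encode_time)


# Durations and encoding times as printed in the FlaCuda console output above.
factor = realtime_factor("01:19:00.8800000", "00:01:17.5764372")
print(f"{factor:.2f}x")
```

With the values above this comes out at about 61.11, in line with the "61,11x" the encoder printed.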
Sep 10 2009, 23:33
Post
#15
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
What was the file size for flac -6? We should compare the speed at the same compression ratio, i.e. the same output file size, not at the same compression level, because -6 for flac gives much lower compression than -6 for FlaCuda. Please try to compare FlaCuda -5 vs flac -8, and compare both execution times and file sizes.
Here's a graph I made of Case's results: [image: graph of Case's results]
This shows a 3x speedup for flac -8 compression.
This post has been edited by Gregory S. Chudov: Sep 10 2009, 23:40
-------------------- CUETools 2.1.4
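The comparison suggested here (same output size rather than same level number) can be sketched with Case's figures from earlier in the thread. The function name and the rule used, "the fastest FlaCuda setting whose output is no larger than the reference", are one assumed reading of the advice, not the author's own method.

```python
# (elapsed seconds, output bytes) taken from Case's post.
results = {
    "FLAC -5": (8.268, 181929373),
    "FLAC -8": (30.560, 181788832),
    "FlaCuda -4": (9.204, 181892106),
    "FlaCuda -5": (10.904, 181763725),
    "FlaCuda -8": (12.370, 181676614),
    "FlaCuda -11": (23.883, 181734405),
}


def fastest_match(results, reference, prefix):
    """Fastest setting starting with `prefix` whose output is no larger than the reference."""
    ref_time, ref_size = results[reference]
    candidates = [
        (elapsed, name)
        for name, (elapsed, size) in results.items()
        if name.startswith(prefix) and size <= ref_size
    ]
    if not candidates:
        return None
    elapsed, name = min(candidates)
    return name, ref_time / elapsed


match = fastest_match(results, "FLAC -8", "FlaCuda")
if match is not None:
    name, speedup = match
    print(f"{name} matches FLAC -8 output size at {speedup:.2f}x the speed")
```

On this data the rule picks FlaCuda -5, at a little under three times the speed of FLAC -8, which fits the roughly 3x figure quoted for the graph.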
Sep 11 2009, 00:33
Post
#16
Group: Members Posts: 840 Joined: 7-October 01 Member No.: 235
Not too shabby. Tried it on a C2D@3600+GTX260
Dream Theater, Awake
Original 793.976.444 Bytes
Flac 1.21 -8 568.604.561 Bytes ~94 sec. encoding time
Flaccuda -8 567.956.198 Bytes ~53 sec.
I don't have a recent Flake version at hand, so I don't know how much comes from CUDA alone.
Edit: Flaccuda -6 568.280.716 Bytes ~48 sec.
This post has been edited by Wombat: Sep 11 2009, 00:50
Sep 11 2009, 02:21
Post
#17
Group: Members Posts: 217 Joined: 11-May 03 From: China Member No.: 6546
This is on a 9500 GT
FlaCuda
Filename : Clocks.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:05:07.4670000
Results : 43.10x; 35657424 bytes in 00:00:07.1331000 seconds;
Flac 1.2.1
Clocks.wav: wrote 35796074 bytes, ratio=0.660
2.91 seconds
Both were just run as <executable> Clocks.wav
Sep 11 2009, 03:24
Post
#18
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
Quote: "Flac 1.2.1 Clocks.wav: wrote 35796074 bytes, ratio=0.660 2.91 seconds"
That file is a bit too small for comparison. And it's better to compare against flac -8. The default flac compression level is very fast; I don't think FlaCuda can beat it, at least not yet. FlaCuda focuses on higher compression.
-------------------- CUETools 2.1.4
Sep 11 2009, 08:28
Post
#19
Group: Members Posts: 14 Joined: 19-November 08 Member No.: 62733
GPU audio encoding will be useful when OpenCL gets adopted by both ATI and Nvidia; for now it is just a "proof of concept".
Sep 11 2009, 19:52
Post
#20
Group: Members Posts: 648 Joined: 10-January 06 From: Zagreb Member No.: 27018
Here I am again, this time with more detail:
Flac 1.2.1 vs Cuda 01
File: album.wav 643566044
Windows 7, C2Q9400 @ 2.66 GHz, Geforce 9500 GS
flac -8: wrote 405957413 bytes, ratio=0,631 in 99 seconds
cuda -8: 34,98x; 405731414 bytes in 00:01:44.2910429 seconds;
Is there a multicore flac encoder?
Sep 12 2009, 02:45
Post
#21
Group: Developer Posts: 165 Joined: 3-June 06 From: Raleigh, NC Member No.: 31393
Quote: "Is there multicore flac encoder?"
http://softlab-pro-web.technion.ac.il/Proj.../downloads.html
I haven't tested this personally or done anything about trying to adapt the code for inclusion in Flake.
Sep 12 2009, 06:10
Post
#22
Group: Members Posts: 176 Joined: 20-January 03 From: A Tropical Isle Member No.: 4640
Hey, wow. This topic of mine was bumped, and with proof of concept software to boot. Thank you, Gregory!
Here are my results to add to the data (I used flac 1.2.1 -8 and Flacuda01 -8 as suggested):
CPU: Athlon X2 @ 2.35 GHz
GPU: 9600 GSO @ 600 MHz
File 1: 656647868 bytes
Flac: 466183490 in 148 seconds
cuda: 465898530 in 65 seconds
File 2: 654389948 bytes
Flac: 362792762 in 145 seconds
cuda: 360670158 in 63 seconds
More than 2x faster and better compression too. That's pretty impressive.
Sep 12 2009, 09:40
Post
#23
Group: Members Posts: 498 Joined: 2-October 01 Member No.: 168
Well, I believe that even a small gain is always welcome.
I'm not a developer, so I dunno if it's possible, but what about a liboil-like library for GPGPU encoding, so *any* codec could benefit from GPU computations?
Sep 12 2009, 10:32
Post
#24
Group: Members Posts: 648 Joined: 10-January 06 From: Zagreb Member No.: 27018
Again: C2D8200, Geforce 9600GT
album.wav to flac -8
original: 578046380
flac: 344489508 in 80 seconds
cuda: 344226134 bytes in 00:00:52.8150209 seconds
Nice.
Sep 12 2009, 11:33
Post
#25
Group: Developer Posts: 648 Joined: 2-October 08 From: Ottawa Member No.: 59035
Quote: "I'm not a developer, so I dunno if possible, but: what about a liboil-like library but for GPGPU encodings, so *any* codec could benefit from GPU computations ?"
Not sure. The code I wrote is quite codec-specific. The catch is the relatively slow connection between CPU and GPU. I had to implement practically the whole FLAC algorithm on the device, so that I don't have to transfer intermediate values between host and GPU, only the final result. FLAC turned out to be very convenient for the GPU. Probably the most convenient. One look at e.g. the ALAC algorithm was enough to understand that it can never get the same benefit.
Quote: "original: 578046380 flac: 344489508 in 80 seconds cuda: 344226134 bytes in 00:00:52.8150209 seconds Nice."
Thank you. And how about FlaCuda -5? It should provide enough compression to beat flac -8.
This post has been edited by Gregory S. Chudov: Sep 12 2009, 11:34
-------------------- CUETools 2.1.4