FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda), Formerly "lossless codecs and CUDA"
RendoR
post Oct 20 2009, 03:25
Post #101





Group: Members
Posts: 17
Joined: 16-October 09
Member No.: 74074



I was reading that you could take 2-3 of those Nvidia Tesla cards and build a supercomputer that would rival a million-dollar rack system. You would use a server-type motherboard with something like 64 gigabytes of RAM and 2 Intel Xeon processors. The throughput is just ridiculous, look at these specs! I realize it's a 5k PC we're building, but damn! That's what I call getting it done!

This post has been edited by RendoR: Oct 20 2009, 03:30
saratoga
post Oct 20 2009, 04:18
Post #102





Group: Members
Posts: 4723
Joined: 2-September 02
Member No.: 3264



QUOTE (RendoR @ Oct 19 2009, 22:25) *
I was reading that you could take 2-3 of those Nvidia Tesla cards and build a supercomputer that would rival a million-dollar rack system. You would use a server-type motherboard with something like 64 gigabytes of RAM and 2 Intel Xeon processors. The throughput is just ridiculous, look at these specs! I realize it's a 5k PC we're building, but damn! That's what I call getting it done!


If you just want to do audio, the GTX from Newegg is ~$220 and a little faster than that model. I've got one in my PC; they're pretty nice.
Chinch
post Dec 2 2009, 08:35
Post #103





Group: Members
Posts: 90
Joined: 22-July 09
Member No.: 71664



Just a benchmarking result for you, according to foobar2000:

Input file length: 52:53.493 (139951056 samples), PCM WAV 44.1 kHz 16-bit

Original FLAC encoder (compression -8) = 1 minute, 07 seconds / Output file: 357MB (374 398 783 bytes) @ 944 kbps / Length: 52:53.493 (139951056 samples)
flaCUDA 0.6 (compression -11) = 58 seconds / Output file: 355MB (373 035 151 bytes) @ 940 kbps / Length: 52:53.493 (139951056 samples)
flaCUDA 0.6 (compression -8) = 16 seconds / Output file: 356MB (373 983 388 bytes) @ 943 kbps / Length: 52:53.493 (139951056 samples)
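
The kbps figures above can be cross-checked from the file sizes and the sample count. The following is a minimal sketch, assuming 1 kbit = 1000 bits and the 44.1 kHz sample rate stated in the post; the function name is made up for illustration and is not part of foobar2000 or flaCUDA.

```python
SAMPLE_RATE_HZ = 44_100  # from the post: PCM WAV 44.1 kHz


def average_bitrate_kbps(file_bytes: int, total_samples: int,
                         sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Average bitrate of an encoded file in kbit/s (1 kbit = 1000 bits)."""
    duration_s = total_samples / sample_rate_hz
    return file_bytes * 8 / duration_s / 1000


TOTAL_SAMPLES = 139_951_056  # sample count reported for the input file

# Output sizes in bytes, copied from the three benchmark lines.
output_sizes = {
    "flac -8": 374_398_783,
    "flaCUDA 0.6 -11": 373_035_151,
    "flaCUDA 0.6 -8": 373_983_388,
}

for name, size in output_sizes.items():
    print(f"{name}: {average_bitrate_kbps(size, TOTAL_SAMPLES):.1f} kbps")
```

Rounded to whole numbers, the three values should agree with the 944, 940 and 943 kbps that foobar2000 reported.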

All MD5s match.

Video card is an nVidia GeForce 9600 GSO. Hope that helps you out.

Note: The ENCODER field still says "FlaCuda#0.5", even though I downloaded FlaCuda06.rar from the website... so either you forgot to change the version number, or 0.5 is really in the 0.6 rar file...


This post has been edited by Chinch: Dec 2 2009, 08:38
MachineHead
post Dec 9 2009, 07:05
Post #104





Group: Members
Posts: 403
Joined: 17-September 02
From: Hell
Member No.: 3380



Interesting tool here. Currently my drives seem to be holding back the true potential of what speeds this will do for me.

A little tidbit from console: Total encoding time: 0:09.236, 394.12x realtime

Files uncompressed: 612MB (642 120 224 bytes)

Files compressed: 358MB (375 779 070 bytes)

Parameters: -8 - -o %d
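
For readers wondering how the "x realtime" figure relates to the file size, here is a rough sketch. It assumes the source is CD audio (44.1 kHz, stereo, 16-bit), which the post does not state, and it ignores WAV headers; the helper names are hypothetical.

```python
# Assumption: CD audio, i.e. 44 100 samples/s * 2 channels * 2 bytes per sample.
BYTES_PER_SECOND = 44_100 * 2 * 2


def parse_elapsed(text: str) -> float:
    """Convert an 'M:SS.mmm' console time such as '0:09.236' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def realtime_factor(pcm_bytes: int, elapsed_s: float) -> float:
    """Seconds of audio encoded per second of wall-clock time."""
    return pcm_bytes / BYTES_PER_SECOND / elapsed_s


def read_throughput_mb_s(pcm_bytes: int, elapsed_s: float) -> float:
    """Input data consumed per second, in MB/s (1 MB = 10**6 bytes)."""
    return pcm_bytes / elapsed_s / 1e6


elapsed = parse_elapsed("0:09.236")
print(f"{realtime_factor(642_120_224, elapsed):.2f}x realtime")
print(f"{read_throughput_mb_s(642_120_224, elapsed):.1f} MB/s read")
```

Under those assumptions the speed factor comes out very close to the reported 394.12x, and the encoder is consuming input at a rate in the range of a single hard drive's sequential read speed, which fits the remark about the drives holding things back.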


--------------------
Looking for a digital idiot? Look no further.
Unkosibomvu
post Dec 17 2009, 09:21
Post #105





Group: Members
Posts: 6
Joined: 31-May 09
Member No.: 70271



Apologies if this is not the right thread, but I couldn't find any that looked like a better place. I would like to request support for multichannel (at least up to 8) in flacuda. It doesn't seem to be there now - I tried feeding 0.6 a 7.1 multichannel flac for re-compression and it threw an exception saying that the input was an invalid flac file.

Multichannel support is important to me because I rip all of my blurays and re-encode the lossless audio tracks (truehd/mlp, dts master audio and raw lpcm) to flac. The flac encoding accounts for at least half of the wall-clock time of the rip (typically 15+ minutes just for the audio re-encoding). Occasionally I'll rip a DVD-A with 5.1 audio to flac for easy playback in foobar too and that can also take roughly the same amount of time as a bluray.

I haven't looked at the source for flacuda, but I was wondering if doing multichannel might be able to get a super-linear speed-up due to the parallel nature of gpus - similar to the way that Kaspersky is doing massively parallel pattern matching on the gpu to speed up their anti-virus scanning.
Wombat
post Dec 19 2009, 00:17
Post #106





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



Since i have read more than once that some people don't trust GPU encoding because of errors that can creep in, i tried to force Flacuda to produce some.
My 24/7 overclock on my 260 is 648/1100 at lowered voltage. With it i have never gotten an error since i started using Flacuda.
So i tried 684/1161; this is where gaming may hang or crash to desktop after a while at this voltage. I encoded ~20 albums without a problem.
Then i tried a surely unstable overclock of 725/1242, but only ran 5 albums in a row because i don't want to fry anything. All 5 albums encoded without a problem.

I have the impression that using CUDA isn't really stressing my card in any way.

Now i have 2 questions. How secure is the verify implementation? Are these fears of a higher chance of errors in the data justified in any way for Flacuda?
Gregory S. Chudo...
post Dec 20 2009, 09:49
Post #107





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (Unkosibomvu @ Dec 17 2009, 11:21) *
Multichannel support is important to me because I rip all of my blurays and re-encode the lossless audio tracks

I'm afraid those multichannel tracks also have higher bit depth, most probably 24 bits per sample instead of CD's 16 bits per sample.
That might be a problem, because those require 64-bit integer arithmetic at some point, which current GPUs aren't very good at.
I'm not saying it can't be done, but the encoder might be very slow.
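
To illustrate why higher bit depth pushes the arithmetic past 32 bits, here is a back-of-the-envelope sketch. It assumes the prediction step sums `order` products of a signed sample and a signed quantized coefficient; the 12-bit coefficient precision and order 8 are illustrative values chosen for this example, not FlaCuda's actual settings.

```python
def accumulator_bits(sample_bits: int, coeff_bits: int, order: int) -> int:
    """Conservative count of signed bits needed for a sum of `order` products.

    Each product multiplies a signed sample by a signed coefficient, so its
    magnitude is at most 2**(sample_bits - 1) * 2**(coeff_bits - 1).
    """
    max_sample = 1 << (sample_bits - 1)
    max_coeff = 1 << (coeff_bits - 1)
    worst_case = order * max_sample * max_coeff
    return worst_case.bit_length() + 1  # +1 for the sign bit


def fits_in_int32(sample_bits: int, coeff_bits: int, order: int) -> bool:
    """True if the worst-case sum is guaranteed to fit a 32-bit signed integer."""
    return accumulator_bits(sample_bits, coeff_bits, order) <= 32


for bits in (16, 24):
    needed = accumulator_bits(bits, coeff_bits=12, order=8)
    print(f"{bits}-bit samples: {needed} bits, fits int32: {fits_in_int32(bits, 12, 8)}")
```

With these example values, 16-bit samples stay within a 32-bit accumulator while 24-bit samples do not, which matches the concern raised above.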


--------------------
CUETools 2.1.4
Gregory S. Chudo...
post Dec 20 2009, 09:57
Post #108





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



QUOTE (Wombat @ Dec 19 2009, 02:17) *
Now i have 2 questions. How secure is the verify implementation? Are these fears of a higher chance of errors in the data justified in any way for Flacuda?

Verify is secure. It decodes each frame (on CPU) and compares each sample with original, so we can be sure that the result can be decoded into original at least using this decoder.

The fears of GPU errors aren't really justified in general, and are completely unjustified for Flacuda. It uses GPU to calculate best encoding options for each frame, but generates the output on CPU, so even if GPU would produce an error, this would only result in slightly lower compression.
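
The argument above can be illustrated with a toy model. This is only a sketch, not FlaCuda's code: it uses simple fixed predictors of order 0 to 2, treats the "GPU" as nothing more than the source of the `order` parameter, and uses the sum of absolute residuals as a crude stand-in for compressed size. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[int, List[int], List[int]]  # (order, warm-up samples, residual)


def predict(history: Sequence[int], order: int) -> int:
    """Fixed polynomial predictor of the given order (toy version)."""
    if order == 0:
        return 0
    if order == 1:
        return history[-1]
    if order == 2:
        return 2 * history[-1] - history[-2]
    raise ValueError("toy example supports orders 0..2 only")


def encode_frame(samples: Sequence[int], order: int) -> Frame:
    """CPU side: store warm-up samples plus the exact prediction residual."""
    residual = [samples[n] - predict(samples[:n], order)
                for n in range(order, len(samples))]
    return order, list(samples[:order]), residual


def decode_frame(frame: Frame) -> List[int]:
    """Rebuild the samples by re-running the same predictor."""
    order, warmup, residual = frame
    out = list(warmup)
    for r in residual:
        out.append(predict(out, order) + r)
    return out


def verify(samples: Sequence[int], frame: Frame) -> bool:
    """Decode the frame and compare every sample with the original."""
    return decode_frame(frame) == list(samples)


def cost(frame: Frame) -> int:
    """Crude size proxy: larger residuals need more bits."""
    return sum(abs(r) for r in frame[2])


ramp = [0, 10, 20, 30, 40, 50]
for order in (0, 1, 2):  # pretend each value came back from the GPU
    frame = encode_frame(ramp, order)
    print(order, verify(ramp, frame), cost(frame))
```

Whatever order is chosen, the frame decodes back to the original samples; a poor choice only makes the residual, and hence the file, larger.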


--------------------
CUETools 2.1.4
ChronoSphere
post Dec 20 2009, 13:13
Post #109





Group: Members
Posts: 399
Joined: 11-March 07
Member No.: 41384



I do compare the original wav to the one decoded from the flac file just to be sure, and so far i have had no checksum mismatch between the files.
My laptop does become really laggy (visual lag of the Aero UI) while encoding with flaCUDA, but at least i'm now compressing in about half the time for the same file size :D
Thank you for writing this tool.

2 things that i miss are the --cuesheet and the --image parameters. I have a batch file that embeds both the cue and the cover into the flac automatically, but now i have to do it manually... could you please add this? (and possibly other general option switches like replaygain that the reference encoder has?)

edit:

A 59 min, 16-bit, 44.1 kHz wav file:
  • flac 1.2.1b -8 --verify
    • Size: 434MB
    • Encoding time: 2:21
  • flaCUDA 0.6 default compression --verify
    • Size: 434MB
    • Encoding time: 1:13
  • flaCUDA 0.6 -11 --verify
    • Size: 432MB
    • Encoding time: 4:46


The 2 MB difference isn't worth double the encoding time for me, so i kept the default level. Oh, and i encoded into a RAM disk, not onto the HDD.
That's on a 8600m GT & C2D T7250 CPU (2GHz)

This post has been edited by ChronoSphere: Dec 20 2009, 13:24
Wombat
post Dec 21 2009, 00:55
Post #110





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



QUOTE (Gregory S. Chudov @ Dec 20 2009, 09:57) *
Verify is secure. It decodes each frame (on CPU) and compares each sample with original, so we can be sure that the result can be decoded into original at least using this decoder.

The fears of GPU errors aren't really justified in general, and are completely unjustified for Flacuda. It uses GPU to calculate best encoding options for each frame, but generates the output on CPU, so even if GPU would produce an error, this would only result in slightly lower compression.


Many thanks for clarifying this!
Gregory S. Chudo...
post Dec 24 2009, 17:38
Post #111





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



I noticed that in some cases CPU is still a bottleneck for FLACuda, so i'm experimenting with utilizing multicore processors.
Here is an experimental alpha version: FlaCuda07(08). I strongly recommend not to use it to encode anything valuable.
Experimental features are activated using two new command line parameters:
"--gpu-only" tries to utilize the GPU even for tasks that may be better suited to the CPU. Use it if you have a fast GPU and/or a slow CPU. Note that it also provides a very slightly better compression ratio.
"--cpu-threads N" tries to utilize N additional CPU cores.
I also somewhat retuned compression levels. -8 is still the maximum compression level compatible with flac subset (used by some hardware implementations). It is however quite impractical now, -7 is much faster and provides almost identical compression.

On my Core2 Duo + GTX 250 it works best with "--gpu-only --cpu-threads 1". Quadcore CPUs in theory might benefit from "--cpu-threads 3" or "--gpu-only --cpu-threads 2". Please, test it.
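
For anyone who wants to test the combinations systematically, a small driver script along these lines may help. It is only a sketch: the executable name `FlaCuda.exe` and the file names are placeholders, the argument order (options, input, `-o`, output) is taken from the "-8 - -o %d" parameters posted earlier in the thread, and only switches mentioned in this thread are used.

```python
import subprocess
import time
from typing import List, Optional


def build_command(exe: str, infile: str, outfile: str, level: int = 8,
                  gpu_only: bool = False,
                  cpu_threads: Optional[int] = None) -> List[str]:
    """Assemble an encoder command line from switches named in this thread."""
    cmd = [exe, f"-{level}"]
    if gpu_only:
        cmd.append("--gpu-only")
    if cpu_threads is not None:
        cmd += ["--cpu-threads", str(cpu_threads)]
    cmd += [infile, "-o", outfile]
    return cmd


def timed_run(cmd: List[str]) -> float:
    """Run one encode and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


# Print the test matrix without running anything; "FlaCuda.exe", "in.wav"
# and "out.flac" are hypothetical names for illustration.
for gpu_only in (False, True):
    for threads in (0, 1, 2, 3):
        print(" ".join(build_command("FlaCuda.exe", "in.wav", "out.flac",
                                     level=8, gpu_only=gpu_only,
                                     cpu_threads=threads)))
```

Passing each printed command to `timed_run` on a machine with the encoder installed would give comparable timings; encoding to a RAM disk or a second drive keeps disk speed from dominating the result.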

This post has been edited by Gregory S. Chudov: Dec 31 2009, 00:25


--------------------
CUETools 2.1.4
Case
post Dec 24 2009, 19:44
Post #112





Group: Developer (Donating)
Posts: 2137
Joined: 19-October 01
From: Finland
Member No.: 322



I got a nice speed boost from --gpu-only option even though my CPU is quite fast (Core i7 940). CPU thread count only affected results very little and with different compression modes the winner changed. The changes were too small to capture in my traditional graph.
Attached Image
Gregory S. Chudo...
post Dec 24 2009, 19:46
Post #113





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Thank you!


--------------------
CUETools 2.1.4
Wombat
post Dec 25 2009, 00:21
Post #114





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



Thanks, Mr. Chudov, for improving Flacuda again. --gpu-only --cpu-threads 1 also seems to utilize my C2D/GTX260 system best. With --cpu-threads 2, encoding seems to get tiny hiccups. Compression at -8 got a bit better again compared to 0.6.
Edit: i saw that block size 4096 is the default now; that of course may be the source of the small benefit.

Encoding speed is impressive!

This post has been edited by Wombat: Dec 25 2009, 00:30
glebe
post Dec 25 2009, 11:04
Post #115





Group: Members
Posts: 20
Joined: 17-February 09
Member No.: 67079



Hi!

Thank you, Gregory, for your great tools!

My setup: Q6600 @ 2.4GHz, 8800GT (660/1675/1900, FW 191.07), foobar2000 as front-end for FlaCUDA.

Verification was done after encoding using "bitcompare tracks" function, verification time is not included in results. All that I can say about verification is that there were no errors at all.

Size of the source WAV-file is 847,696,124 bytes. Encoding was done to another physical hard drive.

Here are results.

First, threading does not affect compression, files produced with a single thread and with multiple threads are bit-identical.

CODE
FlaCUDA 0.6
-7     0:33    545,049,178 bytes
-8     0:34    544,573,469 bytes
-11    1:02    540,785,523 bytes

FlaCUDA 0.7 --cpu-threads 3
-7     0:16    544,274,937 bytes
-8     0:16    544,238,955 bytes
-11    0:24    540,874,729 bytes

FlaCUDA 0.7 --cpu-threads 2
-7     0:16    544,274,937 bytes
-8     0:16    544,238,955 bytes
-11    0:24    540,874,729 bytes

FlaCUDA 0.7 --cpu-threads 1
-11    0:40    540,874,729 bytes

FlaCUDA 0.7 --cpu-threads 0 (i.e. single thread like 0.6)
-8     0:35    544,238,955 bytes
-11    0:53    540,874,729 bytes


Summary on threading tests:
1. In 0.7 version compression of -7 and -8 modes was slightly increased at the expense of speed, compression of -11 mode was decreased with a little speed up.
2. On quad-core Q6600 4 threads are useless, total 3 threads are enough.
3. -7 and -8 modes are practically equal.
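
The threading summary can be put into numbers with a few lines of Python. This sketch uses only the -11 timings from the table above (the one level measured at every thread count) and treats `--cpu-threads 0` as the baseline; the function names are illustrative.

```python
from typing import Dict


def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' time from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedups(times: Dict[int, str]) -> Dict[int, float]:
    """Speed-up of each --cpu-threads setting relative to --cpu-threads 0."""
    base = to_seconds(times[0])
    return {threads: base / to_seconds(t) for threads, t in times.items()}


# FlaCUDA 0.7, level -11, keyed by the --cpu-threads value.
level_11_times = {0: "0:53", 1: "0:40", 2: "0:24", 3: "0:24"}

for threads, factor in sorted(speedups(level_11_times).items()):
    print(f"--cpu-threads {threads}: {factor:.2f}x")
```

The factor stops growing between 2 and 3 additional threads, which is the same observation as point 2 of the summary.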


CODE
FlaCUDA 0.7 --gpu-only --cpu-threads 3
-7     0:17    544,234,754 bytes
-8     0:18    544,198,701 bytes
-11    0:26    540,846,838 bytes

FlaCUDA 0.7 --gpu-only --cpu-threads 2
-7     0:16    544,234,754 bytes
-8     0:17    544,198,701 bytes
-11    0:26    540,846,838 bytes

FlaCUDA 0.7 --gpu-only --cpu-threads 0
-7     0:22    544,234,754 bytes
-8     0:22    544,198,701 bytes
-11    0:37    540,846,838 bytes


Summary on GPU tests:
1. Compression in all modes is better a bit than in non-GPU-only modes.
2. 4 threads on Q6600 are useless, 3 threads are the best.
3. "GPU-only + 3 threads" is slightly slower than "3 threads".


Final words: encoding an 800+ MB file in just 16 (sixteen) seconds looks VERY impressive. That is about 50 MB/s, so the speed may be limited by the hard drive. Nice work! :)

This post has been edited by glebe: Dec 25 2009, 11:13
Wombat
post Dec 27 2009, 15:10
Post #116





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



In earlier testing i found that in some rare cases some music compressed worse with Flacuda than with flac. With Flacuda 0.7 these all compress better than with flac now. Some optimizations must be working.
jpl73
post Dec 28 2009, 17:00
Post #117





Group: Members
Posts: 5
Joined: 28-December 09
Member No.: 76433



I have been lurking on HA for several years, but with the release of Flacuda 0.7, I just had to register to say AMAZING!!!!

I could not get v0.6 to run stably on my non-OC'd 8400GS for long periods of time. 1-2 albums would go fine, but more than 10-15 would always fail. I think the problem was heat related. Using --gpu-only and --cpu-threads 1, I was able to get 40 albums to convert bit-perfect with foobar. Additionally, the performance (at level -11) went from 6.5x real time in v0.6 to 20x real time in v0.7. After remounting the heat sink with Arctic Silver 5, I was able to OC from 459/400/918 to 550/475/1500 MHz (core/mem/shaders) and now it runs at over 30x real time.

THANK YOU!

This program is incredible. Excellent work, and thank you for the Christmas present!
XAVeRY
post Dec 29 2009, 20:00
Post #118





Group: Members
Posts: 4
Joined: 10-August 09
From: Wroclaw, Poland
Member No.: 72200



Hello.

Even though I'm a newcomer here, it was very difficult for me to simply ignore such a topic. I have performed some of my own tests, based on three different recordings which - I believe - represent very different music types so that we can see how the encoder behaves when fed with various musical styles. I've used Morrissey's "Live at Earls Court", My Bloody Valentine's "Loveless" and David Bowie's "Low" for my tests. All these recordings were ripped from original CDs by EAC in secure mode as single-file WAV CD images.

I've encoded the files with FlaCuda 0.7 with two different sets of switches, FlaCuda 0.6, a 64-bit compile of FLAC v1.2.1 (from here, the exe itself says that it's flac 1.2.0, but the encoder tag is 1.2.1, so I think it's 1.2.1) and the ordinary 32-bit FLAC v1.2.1. I know that the two last ones are probably a bit offtopic, but I've been looking for a chance to try out the 64-bit binary I've found sometime ago.

My setup isn't very impressive, it's a laptop with Core 2 Duo T7300, 2GB of RAM, and GeForce 8600M GT with 256MB of RAM. The input files were stored on the laptop's internal drive, encoder's output files were directed to an external hard drive connected via eSATA. I would've used a ramdisk if only I could find a ramdisk driver for the 64-bit version of Windows 7 Professional, which I have installed.

Without further ado, here are the results :
CODE
Morrissey - Live at Earls Court (1h14m10s = 4450s) :

WAV size = 785 024 732 bytes

FlaCUDA 0.7 --gpu-only --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 1m11s, 62.21x | 580 785 189 |
|  9  | 0m45s, 99.16x | 580 734 747 |
|  10 | 1m10s, 63.93x | 580 665 650 |
|  11 | 1m38s, 45.43x | 580 641 565 |
\-----------------------------------/

FlaCUDA 0.7 --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 1m05s, 68.92x | 580 820 235 |
|  9  | 0m36s, 122.38x| 580 769 799 |
|  10 | 1m01s, 73.30x | 580 700 243 |
|  11 | 1m29s, 50.14x | 580 676 064 |
\-----------------------------------/

FlaCUDA 0.6 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 1m31s, 48.87x | 581 189 153 |
|  9  | 2m07s, 34.84x | 581 016 298 |
|  10 | 3m38s, 20.38x | 580 924 773 |
|  11 | 5m35s, 13.27x | 580 903 271 |
\-----------------------------------/

FLAC v1.2.1, 64-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m57s, 78.07x | 582 393 173 |
|  6  | 0m59s, 75.42x | 582 374 477 |
|  7  | 1m55s, 38.70x | 582 063 693 |
|  8  | 2m32s, 29.28x | 581 599 608 |
\-----------------------------------/

FLAC v1.2.1, 32-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m45s, 98.89x | 582 393 173 |
|  6  | 0m48s, 92.71x | 582 374 477 |
|  7  | 1m53s, 39.38x | 582 063 693 |
|  8  | 2m36s, 28.53x | 581 599 493 |
\-----------------------------------/

~~~~~~~~~~

My Bloody Valentine - Loveless (48m36s = 2916s) :

WAV size = 514 398 908 bytes

FlaCUDA 0.7 --gpu-only --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 0m47s, 61.53x | 313 373 855 |
|  9  | 0m41s, 70.49x | 311 032 216 |
|  10 | 1m09s, 42.55x | 310 584 559 |
|  11 | 1m34s, 30.96x | 310 500 491 |
\-----------------------------------/

FlaCUDA 0.7 --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 0m44s, 65.96x | 313 396 279 |
|  9  | 0m34s, 85.32x | 311 043 003 |
|  10 | 1m02s, 47.24x | 310 595 777 |
|  11 | 1m27s, 33.55x | 310 512 544 |
\-----------------------------------/

FlaCUDA 0.6 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 1m00s, 48.84x | 313 436 464 |
|  9  | 1m24s, 34.70x | 312 379 410 |
|  10 | 2m23s, 20.36x | 310 902 529 |
|  11 | 3m40s, 13.27x | 310 269 981 |
\-----------------------------------/

FLAC v1.2.1, 64-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m36s, 81.00x | 316 076 796 |
|  6  | 0m38s, 76.74x | 316 074 660 |
|  7  | 1m14s, 39.41x | 315 819 946 |
|  8  | 1m42s, 28.59x | 314 357 377 |
\-----------------------------------/

FLAC v1.2.1, 32-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m28s, 104.14x| 316 076 796 |
|  6  | 0m30s, 97.20x | 316 074 660 |
|  7  | 1m09s, 42.26x | 315 819 946 |
|  8  | 1m37s, 30.06x | 314 357 051 |
\-----------------------------------/

~~~~~~~~~~

David Bowie - Low (50m30s = 3030s)

WAV size = 534 492 044 bytes

FlaCUDA 0.7 --gpu-only --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 0m49s, 62.25x | 306 400 307 |
|  9  | 0m30s, 100.43x| 306 223 383 |
|  10 | 0m46s, 65.57x | 306 190 297 |
|  11 | 1m05s, 46.62x | 306 176 876 |
\-----------------------------------/

FlaCUDA 0.7 --cpu-threads 1 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 0m44s, 68.56x | 306 424 702 |
|  9  | 0m24s, 124.51x| 306 245 100 |
|  10 | 0m40s, 75.40x | 306 211 961 |
|  11 | 0m59s, 51.53x | 306 198 420 |
\-----------------------------------/

FlaCUDA 0.6 :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  8  | 1m02s, 48.95x | 306 476 119 |
|  9  | 1m27s, 34.85x | 306 334 437 |
|  10 | 2m29s, 20.36x | 306 256 934 |
|  11 | 3m48s, 13.27x | 306 158 122 |
\-----------------------------------/

FLAC v1.2.1, 64-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m36s, 84.17x | 307 282 187 |
|  6  | 0m36s, 84.17x | 307 277 929 |
|  7  | 1m13s, 41.51x | 306 994 345 |
|  8  | 1m42s, 29.71x | 306 797 897 |
\-----------------------------------/

FLAC v1.2.1, 32-bit binary :

/-----------------------------------\
| Lvl |      Time     |   Filesize  |
|-----------------------------------|
|  5  | 0m29s, 104.48x| 307 282 187 |
|  6  | 0m30s, 101.00x| 307 277 929 |
|  7  | 1m12s, 42.08x | 306 994 345 |
|  8  | 1m38s, 30.92x | 306 797 499 |
\-----------------------------------/

They seem very interesting to me. I'm the most surprised (and let down) by the inferior performance of the 64-bit binary in comparison to the 32-bit one. FlaCuda performed extremely well, I couldn't believe my eyes when I saw how fast the conversion was going with the newest version and the -9 switch. As expected, enabling --gpu-only slightly reduces the filesize at the cost of slightly longer encode time.
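
To put the file sizes in perspective, the following sketch computes how much smaller the best FlaCuda 0.7 result (--gpu-only, level 11) is than the 32-bit FLAC v1.2.1 result at level 8, using the byte counts from the tables above. The choice of these two rows as the comparison is mine, made only for illustration.

```python
def percent_smaller(reference_bytes: int, candidate_bytes: int) -> float:
    """How much smaller the candidate is than the reference, in percent."""
    return (reference_bytes - candidate_bytes) / reference_bytes * 100


# (FLAC v1.2.1 32-bit, level 8; FlaCuda 0.7 --gpu-only, level 11), in bytes.
albums = {
    "Morrissey - Live at Earls Court": (581_599_493, 580_641_565),
    "My Bloody Valentine - Loveless": (314_357_051, 310_500_491),
    "David Bowie - Low": (306_797_499, 306_176_876),
}

for title, (flac_bytes, flacuda_bytes) in albums.items():
    saved = percent_smaller(flac_bytes, flacuda_bytes)
    print(f"{title}: {saved:.2f}% smaller")
```

On these three albums the difference ranges from a fraction of a percent to a little over one percent, so the practical gain from FlaCuda lies mainly in encoding speed rather than in file size.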

I'm taken aback by all the fabulous work you've done, Mr. Chudov, and hope to see new versions soon. By the way, how long do you intend to keep this project in alpha stage?

This post has been edited by XAVeRY: Dec 29 2009, 20:02


--------------------
some girls are bigger than others
Wombat
post Dec 30 2009, 01:10
Post #119





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



Earlier in this thread i tried this one with Flacuda 0.4

Dream Theater, Awake

Original 793.976.444 Bytes
FLAC 1.2.1 -8 568.604.561 Bytes ~94 sec. encoding time
Flacuda -8 567.956.198 Bytes ~53 sec.


Now with Flacuda 0.7 it really seems like my HD speed has become the limiting factor. Insane! :)
Flacuda 0.7 -8 567.754.207 Bytes ~13.4 sec. ~337x

It makes me wonder whether people who worked on the FLAC codec before, or even Mr. Coalson himself, once had ideas for improving compression but never dug deeper because of the insane computing power they would need. I bet Mr. Chudov has already done some of that under the hood, or at least loses some sleep over it.
Now the time has come ;)
And imagine when Fermi hits the road, or similar code works under OpenCL on recent DX11 ATI cards...
Gregory S. Chudo...
post Dec 30 2009, 02:08
Post #120





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



Thanks to all for the kind words and detailed test results, especially for the test results :)

QUOTE (XAVeRY @ Dec 29 2009, 22:00) *
I'm the most surprised (and let down) by the inferior performance of the 64-bit binary in comparison to the 32-bit one.

That's to be expected. A 64-bit compile by itself doesn't normally make code faster. In some applications you can gain some speed by rewriting parts of the code, but more often you lose some speed. In this case, the 64-bit compile most probably has SSE optimizations disabled, because the SSE assembler code has to be rewritten for 64-bit mode. The increased number of registers in 64-bit mode allowed the compiler to make up for it and almost reach the speed of the SSE code. Modern compilers are that good.

QUOTE (XAVeRY @ Dec 29 2009, 22:00) *
how long do you intend to keep this project in alpha stage?

At least until i can test it using the flac test suite by Josh, and make sure it runs ok on the next generation of GPUs (Fermi).

Ideally, i would like to see it incorporated into mainstream flac in some form.

QUOTE (Wombat @ Dec 30 2009, 03:10) *
It makes me wonder whether people who worked on the FLAC codec before, or even Mr. Coalson himself, once had ideas for improving compression but never dug deeper because of the insane computing power they would need. I bet Mr. Chudov has already done some of that under the hood, or at least loses some sleep over it.

I did for some time, but now i'm quite sure that we have reached the limit of flac format. There's no room for further compression improvement without a new one.


--------------------
CUETools 2.1.4
Wombat
post Dec 30 2009, 02:21
Post #121





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



QUOTE (Gregory S. Chudov @ Dec 30 2009, 02:08) *
QUOTE (Wombat @ Dec 30 2009, 03:10) *
It makes me wonder whether people who worked on the FLAC codec before, or even Mr. Coalson himself, once had ideas for improving compression but never dug deeper because of the insane computing power they would need. I bet Mr. Chudov has already done some of that under the hood, or at least loses some sleep over it.

I did for some time, but now i'm quite sure that we have reached the limit of flac format. There's no room for further compression improvement without a new one.


I was under the impression that there still is some room. At least i seem to remember that Mr. Beck, the TAK developer, mentioned somewhere that he has some ideas for improving FLAC's compression. That may of course have been with some changes to the codec's structure in mind.

If we have reached the end, this of course has at least one positive side: i never have to re-encode again :)
Fandango
post Dec 30 2009, 02:51
Post #122





Group: Members
Posts: 1546
Joined: 13-August 03
Member No.: 8353



Gregory, have you decided yet if you want to give OpenCL a try soon or do you rather want to improve the existing flaCUDA for the time being?
Gregory S. Chudo...
post Dec 30 2009, 03:06
Post #123





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



It's a bit early for OpenCL. According to NVIDIA, Fermi will be their first OpenCL-optimized architecture. The only upside to OpenCL is that such code would be easier to modify to work with AMD GPUs. That would require an AMD GPU, and i don't have one yet. I would also probably have to upgrade my computer to get a second PCIe slot, and i'm not even sure that i can have two different sets of GPUs/drivers/SDKs running on one computer.


--------------------
CUETools 2.1.4
Wombat
post Dec 31 2009, 02:44
Post #124





Group: Members
Posts: 951
Joined: 7-October 01
Member No.: 235



I noticed you updated the CUETools homepage and are offering FlaCuda 0.8 :)

One more question:
QUOTE (Gregory S. Chudov @ Dec 30 2009, 02:08) *
QUOTE (XAVeRY @ Dec 29 2009, 22:00) *
how long do you intend to keep this project in alpha stage?

At least until i can test it using the flac test suite by Josh, and make sure it runs ok on the next generation of GPUs (Fermi).


What is that "Test Suite" and why isn't it an easy thing to test? Is it a personal collection of files Mr. Coalson has in his "lab"?
Gregory S. Chudo...
post Dec 31 2009, 03:25
Post #125





Group: Developer
Posts: 683
Joined: 2-October 08
From: Ottawa
Member No.: 59035



0.8 is basically a re-branded 0.7 with the default compression mode changed from -5 to -7 and the default mode set to --gpu-only, to provide better results for the casual user who doesn't want to bother with command line switches :) It doesn't deserve separate testing.

The flac test suite is available in flac's sources, but has to be adapted for FlaCuda. This should be easy; it was done once with flake. I'm just very lazy and haven't found the time to do it.


--------------------
CUETools 2.1.4
