TAK 2.0 Development |
Nov 11 2009, 13:38
Post
#26
|
|
Group: Members Posts: 338 Joined: 4-June 02 Member No.: 2220 |
QUOTE And no, that's not the results of the dedicated LossyWav codec i intend to develop later.
Pardon the double-post, but I somehow overlooked this part of the OP. Thomas, can you elaborate on this?
--------------------
"Something bothering you, Mister Spock?"
|
|
|
|
Nov 11 2009, 15:51
Post
#27
|
|
|
Group: Members Posts: 13 Joined: 3-November 06 Member No.: 37122 |
QUOTE Yes, the "focus has shifted from the higher to the lower presets". This has been happening since the first presentation of YALAC (as TAK was named before its first stable release). I have always been affected by user comments at hydrogen. Most users (who posted) wanted it to be as fast as possible. A significant number of posters (not necessarily users...) kept emphasizing that TAK decodes more slowly than FLAC (especially when using foobar). Maybe I have taken this too seriously... Possibly a lot of users indeed would welcome a bit stronger compression in exchange for a bit slower decoding. Possibly I should create a poll?

I would say the strongest side of TAK is efficiency: its size/speed ratio. It's great to have a codec that has presets with encoding/decoding speed comparable to FLAC and presets with compression levels comparable to APE High. So I think you shouldn't have a hard time choosing "speed or compression". Lower presets are for better speed, higher ones for better compression. |
|
|
|
Nov 12 2009, 14:46
Post
#28
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE Possibly a lot of users indeed would welcome a bit stronger compression in exchange for a bit slower decoding. Possibly i should create a poll?

QUOTE Why not? I predict the people who use -p3 and up would be in favor of better compression, whereas -p2 users want the fastest encode/decode performance. I am not sure how much die-hard users who want the best compression would care about slower decompression speed. As far as -p3 settings and higher go, I think the poll might as well ask if decode speed is a priority (i.e. DAW and HTPC don't need to be battery-saving).

QUOTE I would say, the strongest side of TAK is efficiency - size/speed rate. It's great to have a codec that has presets with encoding/decoding speed comparable to FLAC and presets with compression levels comparable to APE High. So I think you shouldn't have hard time choosing "speed or compression". Lower presets are for better speed, higher ones - for better compression

I think you both are right: Let's regard -p0 to -p2 as the speed settings, focusing on maximum decoding speed, and -p3/-p4 as the power settings, which sacrifice some more decoding speed for slightly better compression, even if the ratio may get somewhat insane efficiency-wise.

I have already raised the maximum predictor count to 256 and will post some results soon.

QUOTE And no, that's not the results of the dedicated LossyWav codec i intend to develop later. Pardon the double-post, but I somehow overlooked this part of the OP. Thomas, can you elaborate about this?

I am quite confident that I can modify the codec to achieve at least 1 percent better compression for LossyWav files. Furthermore, it may perform significantly better when compressing files with sections where LossyWav hasn't removed any bits and the efficiency falls behind that of a pure lossless encode. Earlier I wanted to integrate those modifications into the TAK 2.0 codec, but now I plan to create a dedicated codec (the file format supports up to 64 different codecs). That's a bit easier to do and doesn't stall the 2.0 release.

Thomas |
|
|
|
Nov 13 2009, 08:56
Post
#29
|
|
Group: Members Posts: 82 Joined: 14-July 02 From: Lommel (Belgium) Member No.: 2593 |
Thomas,
I'm curious about your views on GPGPU and how it could be beneficial for TAK. Do you have any plans to look at this? Mind you, I only have a general understanding of how this stuff works, but it seems to be getting more interesting now that things are becoming more standardized. The following thread also covers this and even has a proof-of-concept FLAC implementation based on CUDA. |
|
|
|
Nov 13 2009, 22:43
Post
#30
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE I have alreday raised the maximum predictor count to 256 and will post some results soon.

Here they are:

CODE
AMD Sempron 2.2 GHz

Preset  Compression            Enco-Speed               Deco-Speed
        1.1.2   2.0    Win     1.1.2   2.0    Win       1.1.2   2.0     Win
-p3     56.52   56.32  0.20    55.97   55.53   -0.79%   190.88  210.42   10.24%
-p4     56.16   55.96  0.20    32.07   25.10  -21.73%   166.89  148.77  -10.86%
-p4m    56.07   55.80  0.27    17.81    9.34  -47.56%

Intel Pentium Dual Core 2 GHz

Preset  Compression            Enco-Speed               Deco-Speed
        1.1.2   2.0    Win     1.1.2   2.0    Win       1.1.2   2.0     Win
-p3     56.52   56.32  0.20    64.64   63.45   -1.84%   214.36  234.40    9.35%
-p4     56.16   55.96  0.20    36.90   28.64  -22.38%   193.97  178.50   -7.98%
-p4m    56.07   55.80  0.27    21.07   10.38  -50.74%

Compression is the compressed size in percent of the original file size; Enco- and Deco-Speed are expressed as multiples of real time. -p3 is now using 96 instead of 80 predictors, -p4 256 instead of 160.

Some results for other sample rates and bit depths:

CODE
Preset  Compression
        1.1.2   2.0 Eval  Win

24 bit / 44 khz
-p4m    56.78   56.67     0.11

24 bit / 96 khz
-p4m    50.69   50.61     0.08

24 bit / 192 khz
-p4m    44.86   43.76     1.10

8 bit
-p4m    38.23   37.08     1.15

Decoding of -p4 is still about 150 times faster than realtime, which should be acceptable.
This post has been edited by TBeck: Nov 13 2009, 22:45 |
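As a reading aid for the speed columns: a speed given as a multiple of real time converts to wall-clock time by dividing the audio duration by that multiple. The sketch below is only an illustration; the 60-minute album length and the helper name are assumptions and not part of TAK.

```python
def processing_seconds(audio_seconds: float, realtime_multiple: float) -> float:
    """Wall-clock time needed to process audio at a given multiple of real time."""
    if realtime_multiple <= 0:
        raise ValueError("realtime_multiple must be positive")
    return audio_seconds / realtime_multiple


# Hypothetical example: a 60-minute album (the duration is an assumption).
ALBUM_SECONDS = 60 * 60

# Speeds taken from the TAK 2.0 columns of the AMD Sempron results in the post.
decode_p4 = processing_seconds(ALBUM_SECONDS, 148.77)   # -p4 decoding
encode_p4m = processing_seconds(ALBUM_SECONDS, 9.34)    # -p4m encoding

print(f"-p4 decode:  {decode_p4:.1f} s")
print(f"-p4m encode: {encode_p4m / 60:.1f} min")
```

Seen this way, even the slowest decoding figure in the table amounts to well under a minute per hour of audio, while the -p4m encoding cost is measured in minutes.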
|
|
|
Nov 14 2009, 01:06
Post
#31
|
|
Group: Members Posts: 110 Joined: 27-January 03 From: Perth, AU Member No.: 4755 |
CODE
Intel Pentium Dual Core 2 GHz

Preset  Compression            Enco-Speed
        1.1.2   2.0    Win     1.1.2   2.0    Win
-p4     56.16   55.96  0.20    36.90   28.64  -22.38%
-p4m    56.07   55.80  0.27    21.07   10.38  -50.74%

That's quite an impressive improvement actually, when you compare the old -p4m to the new -p4. The compression ratios of both presets are about the same (0.11 difference), but encoding speed improves by 35.93%.
This post has been edited by zombiewerewolf: Nov 14 2009, 01:07 |
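The two figures in this post can be reproduced from the quoted Intel Pentium numbers. The sketch assumes the compression difference is a plain subtraction of the two percentages and the speed gain is the relative change from the old -p4m to the new -p4; the function name is illustrative.

```python
def relative_change_percent(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Intel Pentium Dual Core figures quoted in the post.
old_p4m_compression, old_p4m_enc_speed = 56.07, 21.07   # TAK 1.1.2, -p4m
new_p4_compression, new_p4_enc_speed = 55.96, 28.64     # TAK 2.0, -p4

compression_gap = old_p4m_compression - new_p4_compression
speed_gain = relative_change_percent(old_p4m_enc_speed, new_p4_enc_speed)

print(f"compression difference: {compression_gap:.2f} percentage points")
print(f"encoding speed change:  {speed_gain:+.2f}%")
```

The same helper applied within one preset, for example `relative_change_percent(36.90, 28.64)`, gives the "Win" value listed for -p4 encoding speed.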
|
|
|
Nov 14 2009, 13:13
Post
#32
|
|
Group: Members Posts: 47 Joined: 21-February 07 From: Graz, Austria Member No.: 40824 |
QUOTE Compression ratio from both presets are about the same (0.11 difference) but encoding speed improves 35.93%.
Edit: Now I get it, you compared old p4m with the new p4. Sorry, my bad...
This post has been edited by yerma: Nov 14 2009, 13:15 |
|
|
|
Nov 16 2009, 00:45
Post
#33
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE I'm curious on your views of GPGPU and how it could be beneficial for TAK. Do you have any plans to look at this? Mind you I only have a general understanding of how this stuff works but it seems to be getting more interesting now things are going to get more standardized. The following thread also handles this and even has a proof of concept FLAC implementation based on CUDA.

I am always interested in new opportunities to optimize TAK, and GPGPU is no exception. It's definitely possible to utilize GPGPU for encoding, but I have no practical experience yet, and it will take some time until I try it.

The most important reason: I am very concerned about the reliability of the encoder. I am worried about failures of the GPU memory resulting in unrecognized encoder errors. Unless I find a trustworthy study which shows that GPU memory isn't failing more often than system memory, or unless more GPUs come with ECC for their memory, I really don't want to take the risk.

Currently I would prefer to add multicore support to the encoder. It's easier to do, will probably result in a larger speed gain (especially on systems with slow GPUs), and I feel safer regarding the reliability.
This post has been edited by TBeck: Nov 16 2009, 00:45 |
|
|
|
Nov 16 2009, 08:15
Post
#34
|
|
Group: Members Posts: 82 Joined: 14-July 02 From: Lommel (Belgium) Member No.: 2593 |
QUOTE ... I am very concerned about the reliability of the encoder. I am worried about failures of the GPU-memory resulting in unrecognized encoder errors. Unless i find a trustable study which shows that GPU-memory isn't failing more often than system memory or unless more GPU's come with ECC for their memory, i really don't want to take a risk. ...

Sounds perfectly reasonable to me, and it might be the safest approach for new tech, although I think nVidia is actually going to include ECC in their new Fermi architecture. I don't know whether it will also be made available in normal consumer products, as market segmentation sometimes dictates otherwise. Thanks for the answer, and nice to hear that you're looking into multicore CPU support.
This post has been edited by Bylie: Nov 16 2009, 08:16 |
|
|
|
Nov 16 2009, 08:29
Post
#35
|
|
|
Group: Members Posts: 556 Joined: 4-May 04 From: Paris, France Member No.: 13875 |
QUOTE Currently i would prefer to add multicore support to the encoder. It's easier to do, will probably result in a larger speed gain (especially on systems with slow GPU's) and i am feeling more safe regarding the reliability.

As I've had to deal with bit-flip errors with RAM myself, I understand your concern. I also recall reading an article recently about how GPU RAM errors weren't considered critical since one wrong pixel in a video game doesn't matter much. Here's the thing with lossless codecs though: you can always compare the encode to the source. So, I'm not too worried.

As for multi-core CPU vs. GPU: think of laptops, HTPCs and generally lower-end computers. They're more likely to have only one or two CPU cores, and a GPU. My new laptop is something of a special case, but a GPU-enabled encoder would run much faster on it: it has an Intel Atom N270 CPU (one core, two threads) and an NVIDIA Ion GPU (GeForce 9400M). The latter's benefit becomes clear when playing high-definition videos, where CPU usage stays below 20% (most of the time, below 10%).

Now, the question is: what machines would benefit most from encoding speed improvements on TAK, which is already quite fast? In my case, my Atom/Ion netbook would certainly benefit more from a GPU-enabled encoder than my quad-core AMD Phenom PC would benefit from a multi-threaded implementation, especially since I can already encode multiple files in parallel. |
|
|
|
Nov 16 2009, 15:17
Post
#36
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE I am very concerned about the reliability of the encoder. I am worried about failures of the GPU-memory resulting in unrecognized encoder errors. Unless i find a trustable study which shows that GPU-memory isn't failing more often than system memory or unless more GPU's come with ECC for their memory, i really don't want to take a risk.

A quick search with Google revealed this interesting page: MemtestG80: A Memory and Logic Tester for NVIDIA CUDA-enabled GPUs. It contains a link to a PDF of the study "Hard Data on Soft Errors: A Large-Scale Assessment of Real-World Error Rates in GPGPU". I have had no time to really read or even critically rate the article, but here is an excerpt from the conclusions:

QUOTE "We have presented the first large-scale study of error rates in GPGPU hardware, conducted over more than 20,000 GPUs on the Folding@home distributed computing network. Our control experiments on consumer-grade and dedicated-GPGPU hardware in a controlled environment found no errors. However, our large-scale experimental results show that approximately two-thirds of tested cards exhibited a pattern-sensitive susceptibility to soft errors in GPU memory or logic, confirming concerns about the reliability of the installed base of GPUs for GPGPU computation. We have further demonstrated that this nonzero error rate cannot be adequately explained by overclocking or time of day of execution (a proxy for ambient temperature). However, it appears to correlate strongly with GPU architecture, with boards based on the newer GT200 GPU having much lower error rates than those based on the older G80/G92 design. While we cannot rule out user error, misconfiguration on the part of Folding@home donors, or environmental effects as the cause behind nonzero error rates, our results strongly suggest that GPGPU is susceptible to soft errors under normal conditions on non-negligible timescales."

QUOTE As I've had to deal with bit-flip errors with RAM myself, I understand your concern. I also recall reading an article recently about how GPU RAM errors weren't considered critical since one wrong pixel in a video game doesn't matter much. Here's the thing with lossless codecs though: you can always compare the encode to the source. So, I'm not too worried.

Yes, this seems to be the way to go. I forgot about TAK's verify option, which decodes each frame after encoding and compares the output with the original. Since decoding is so fast, this could be performed by the CPU without sacrificing too much encoding speed.

QUOTE Now, the question is: what machines would benefit most from encoding speed improvements on TAK, which is already quite fast? In my case, my Atom/Ion netbook would certainly benefit more from a GPU-enabled encoder than my quad-core AMD Phenom PC would benefit from a multi-threaded implementation, especially since I can already encode multiple files in parallel.

I agree, this is the question... The answer may require a lot of testing of real implementations. Currently I don't want to put too much effort into it, but I will keep an eye on evaluations of GPU implementations of algorithms similar to those TAK uses. |
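The verify idea discussed in this post (decode each frame right after encoding and compare it with the source) can be sketched as follows. This is only an illustration: zlib stands in for the real codec, and the frame size and function names are assumptions, not TAK's actual interface.

```python
import zlib

FRAME_BYTES = 4096  # assumed frame size, chosen only for the example


def encode_frame(pcm: bytes) -> bytes:
    """Stand-in lossless encoder (zlib, not TAK)."""
    return zlib.compress(pcm, 9)


def decode_frame(packed: bytes) -> bytes:
    """Stand-in decoder matching encode_frame."""
    return zlib.decompress(packed)


def encode_with_verify(pcm: bytes, encode=encode_frame, decode=decode_frame) -> list[bytes]:
    """Encode frame by frame; decode each result and compare it with the source."""
    frames = []
    for offset in range(0, len(pcm), FRAME_BYTES):
        source = pcm[offset:offset + FRAME_BYTES]
        packed = encode(source)
        if decode(packed) != source:
            raise RuntimeError(f"verify failed for frame at byte {offset}")
        frames.append(packed)
    return frames
```

Because the check runs per frame, a fault is reported at the frame where it occurs, and the extra cost is one decode per encode, which is small when decoding is much faster than encoding.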
|
|
|
Nov 16 2009, 15:35
Post
#37
|
|
|
Group: Members Posts: 19 Joined: 6-April 09 Member No.: 68706 |
QUOTE I'm curious on your views of GPGPU and how it could be beneficial for TAK. Do you have any plans to look at this? Mind you I only have a general understanding of how this stuff works but it seems to be getting more interesting now things are going to get more standardized. The following thread also handles this and even has a proof of concept FLAC implementation based on CUDA.

QUOTE I am always interested into new opportunities to optimize TAK and GPGPU is no exception. It's definitely possible to utilize GPGPU for encoding, but i have no practical experience yet. And it will take some time until i will try it. The most important reason: I am very concerned about the reliability of the encoder. I am worried about failures of the GPU-memory resulting in unrecognized encoder errors. Unless i find a trustable study which shows that GPU-memory isn't failing more often than system memory or unless more GPU's come with ECC for their memory, i really don't want to take a risk. Currently i would prefer to add multicore support to the encoder. It's easier to do, will probably result in a larger speed gain (especially on systems with slow GPU's) and i am feeling more safe regarding the reliability.

I'm glad that there are more people like me who are concerned about GPU encoding correctness, and it's great that you're one of them. I've seen a study somewhere, done in collaboration with NVIDIA, which showed that GPU memory errors were indeed an issue, as well as some silicon issues: IIRC, GPUs with a few damaged processing units were passing validation.

But there's a simple solution: encode on the GPU, move the result to main memory, and then use the CPU to verify it. Retry (on the CPU?) in case of a failure. That should guarantee correctness with overhead low enough to still provide a great speed boost.

Verifying that the parameters were chosen optimally would be infeasible, which would hurt the compression ratio somewhat, but I think the speed improvement is well worth it, especially since there are reliable GPUs from NVIDIA on the way.
This post has been edited by _mē_: Nov 16 2009, 15:37 |
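The scheme proposed in this post (encode on the GPU, verify on the CPU, retry on the CPU on failure) might look roughly like this. Everything here is a stand-in: zlib plays the codec, and `flaky_fast_encode` only simulates a fast path that can corrupt one bit; no real GPU API is used, and all names are hypothetical.

```python
import zlib


def cpu_encode(frame: bytes) -> bytes:
    """Trusted reference encoder (zlib stands in for the real codec)."""
    return zlib.compress(frame)


def cpu_decode(packed: bytes) -> bytes:
    """Trusted reference decoder."""
    return zlib.decompress(packed)


def flaky_fast_encode(frame: bytes, corrupt: bool) -> bytes:
    """Simulated GPU path: CPU output, optionally with one flipped bit."""
    packed = bytearray(cpu_encode(frame))
    if corrupt:
        # Flip a bit in the trailing checksum so the simulated fault is deterministic.
        packed[-1] ^= 0x01
    return bytes(packed)


def verified(frame: bytes, packed: bytes) -> bool:
    """CPU-side check: the packed frame must decode back to the source."""
    try:
        return cpu_decode(packed) == frame
    except zlib.error:
        return False  # corruption made the stream undecodable


def encode_frame_safely(frame: bytes, corrupt: bool = False) -> tuple[bytes, bool]:
    """Return (packed frame, whether the CPU fallback was needed)."""
    packed = flaky_fast_encode(frame, corrupt)
    if verified(frame, packed):
        return packed, False
    return cpu_encode(frame), True
```

The check only establishes that the output decodes back to the source; as the post notes, it says nothing about whether the fast path picked the best encoding parameters.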
|
|
|
Yesterday, 08:58
Post
#38
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
Now I am preparing a first release... I will probably call it a beta.

I am looking at code I wrote weeks or months ago to find errors. Today I caught one. Fortunately so: even my quite exhaustive script-based test set possibly wouldn't have caught it, because it depended on very rare conditions, mathematically possible but extremely rare in practice.

Additionally, I am feeding the encoder with random data to test the decoder against encoder features that will be implemented later. I want to be sure that the TAK 2.0 decoder will decode files created by later, more sophisticated encoders without any problems.

Some bad news for power-hungry users: for now I have again reduced the maximum predictor count from 256 to 160. But the decoder will be laid out to support up to 320 predictors; this way I am able to add a really insane preset -p5 later (if I want), and the files will be decodable by the V 2.0 decoder.

I hope this wasn't boring. I will release a first version as soon as I feel confident enough about the reliability of the new codec.

Thomas |
|
|
|
Yesterday, 16:37
Post
#39
|
|
|
Group: Members Posts: 714 Joined: 22-October 01 From: the Netherlands Member No.: 335 |
|
|
|
|
|
Lo-Fi Version | Time is now: 21st November 2009 - 21:35 |