aoTuV 5.7/bs1 x64/x86 Compiles for AMD chips |
Jan 6 2010, 01:41
Post
#1
|
|
|
Group: Members Posts: 75 Joined: 11-November 08 Member No.: 62144 |
http://www.agner.org/optimize/blog/read.php?i=49
As this article explains, Intel's CPU dispatcher automatically generates multiple optimized code paths when compiling. Any CPU the dispatcher detects as not being Intel is sent down a FAR slower code path. This means that, most likely, if this guide is followed (http://www.agner.org/optimize/#manual_cpp), the compiled software would end up being a lot faster on non-Intel CPUs such as AMD and VIA. I was hoping somebody with the skill (john33, if he's got the time) could compile a non-CPU-biased version so we can see how this extremely biased dispatcher is affecting non-Intel users. BTW, in PCMark the difference on a VIA Nano was a 47.4% performance boost from making the program see it as an Intel CPU rather than a VIA one. That's HUGE... |
|
|
|
Jan 6 2010, 02:17
Post
#2
|
|
![]() Group: Developer Posts: 2986 Joined: 2-December 07 Member No.: 49183 |
Obviously you can use the generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php
You can also test the aoTuV 5.7 P4 compile vs. P3 vs. generic... Usually the ICC compile is faster than the generic (MSVC) one, not only on Intel but on AMD processors too (usually, but not always). |
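For anyone who wants to run that comparison, here is a minimal timing sketch in Python. It is only an illustration under stated assumptions: `best_time` and `compare` are hypothetical helper names, and the command used in the demo is a placeholder, not a real encoder build. It runs each command a few times and keeps the fastest wall-clock time, which reduces noise from disk caching and background tasks.

```python
import subprocess
import sys
import time
from typing import Dict, List, Sequence


def best_time(cmd: Sequence[str], runs: int = 3) -> float:
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken build is not timed.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(builds: Dict[str, List[str]], runs: int = 3) -> Dict[str, float]:
    """Time every named build and return {name: best time in seconds}."""
    return {name: best_time(cmd, runs) for name, cmd in builds.items()}


if __name__ == "__main__":
    # Placeholder command: replace each entry with the path to one encoder
    # build (generic, P3, P4) followed by the same input file and settings.
    placeholder = [sys.executable, "-c", "pass"]
    results = compare({"placeholder build": placeholder}, runs=2)
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds:.3f} s")
```

Use the same input file and the same encoder settings for every build, otherwise the numbers are not comparable.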
|
|
|
Jan 6 2010, 02:54
Post
#3
|
|
|
Group: Members Posts: 75 Joined: 11-November 08 Member No.: 62144 |
QUOTE
Obviously you can use generic aoTuV 5.7 compile from http://www.rarewares.org/ogg-oggenc.php You can also test aoTuV 5.7 P4 compile vs. P3 vs. generic... Usually ICC compile is faster than generic (MSVC) not only on Intel but on AMD processors too (usually but not always).
Yes, but my point is that Intel's compiler sends AMD chips, and any other non-Intel chip, down a less-than-optimal code path. IF you can make the software THINK you're using a "GenuineIntel" CPU, you get the optimal path; otherwise, Intel's CPU dispatcher chooses a less-than-optimal path (often still faster than what other compilers produce, but still slower than it should be).
This is just a request to give it a shot and see if it makes a difference, until Intel puts out an unbiased version of their compiler. (They already signed an agreement with AMD to do this for AMD chips, but that isn't going to help VIA if they don't stop using CPUID strings to choose the code path and start querying the CPU for supported features instead.)
Ars showed a 47.4% boost on the VIA Nano by faking the CPUID string as Intel, and a smaller boost (I think it was about 10%) by identifying the CPU as AMD...
QUOTE
My my. Swap CentaurHauls for AuthenticAMD, and Nano's performance magically jumps about 10 percent. Swap for GenuineIntel, and memory performance goes up no less than 47.4 percent. This is not a test error or random occurrence; I benchmarked each CPUID multiple times across multiple reboots on completely clean Windows XP installations. The gains themselves are not confined to a small group of tests within the memory subsystem evaluation, but stretch across the entire series of read/write tests. Only the memory latency results remain unchanged between the two CPUIDs.
http://arstechnica.com/hardware/reviews/20...no-review.ars/6
If the link won't work, you can use Google cache to view it (it's been loading VERY slowly the last few days).
This is what I'm talking about: huge differences from changing the CPUID string.
This post has been edited by AshenTech: Jan 6 2010, 03:08 |
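To make the difference between the two dispatch strategies concrete, here is a small Python model. It is a sketch only, not Intel's actual dispatcher: the function names, the three code paths, and the feature names are all assumptions made for illustration, and the feature set passed in is an example rather than a claim about any real chip. It contrasts choosing a path from the vendor string with choosing it from the features the CPU reports.

```python
from typing import Callable, FrozenSet


# Hypothetical code paths; each returns its own name so the choice is visible.
def generic_path() -> str:
    return "generic"


def sse2_path() -> str:
    return "sse2"


def sse3_path() -> str:
    return "sse3"


def dispatch_by_features(features: FrozenSet[str]) -> Callable[[], str]:
    """Pick the best path the CPU says it supports, whoever made the CPU."""
    if "sse3" in features:
        return sse3_path
    if "sse2" in features:
        return sse2_path
    return generic_path


def dispatch_by_vendor(vendor: str, features: FrozenSet[str]) -> Callable[[], str]:
    """Model of vendor-string dispatch: fast paths are considered only when
    the vendor string is GenuineIntel; everything else gets the generic path."""
    if vendor != "GenuineIntel":
        return generic_path
    return dispatch_by_features(features)


if __name__ == "__main__":
    example_features = frozenset({"sse2", "sse3"})  # illustrative feature set
    for vendor in ("GenuineIntel", "AuthenticAMD", "CentaurHauls"):
        by_vendor = dispatch_by_vendor(vendor, example_features)()
        by_features = dispatch_by_features(example_features)()
        print(f"{vendor}: vendor dispatch -> {by_vendor}, "
              f"feature dispatch -> {by_features}")
```

In this model the same feature set gets the fast path under feature-based dispatch regardless of vendor, while vendor-based dispatch only hands it out when the string reads GenuineIntel. That is the behavior the spoofed-CPUID test in the Ars quote is probing.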
|
|
|
Jan 28 2010, 17:52
Post
#4
|
|
|
Group: Members Posts: 14 Joined: 18-March 08 Member No.: 52124 |
And why not manually optimize the time-critical parts in assembly, just like the ffmpeg and x264 developers have done?
|
|
|
|
Feb 13 2010, 03:43
Post
#5
|
|
![]() Group: FB2K Moderator Posts: 2359 Joined: 30-November 07 Member No.: 49158 |
Because unless you can find a nice mathematical or programming trick for doing things differently, the extra work is rarely worth it with today's optimizing compilers.
-------------------- Full-quoting makes you scroll past the same junk over and over.
|
|
|
|
Mar 7 2010, 14:59
Post
#6
|
|
![]() Group: Members Posts: 74 Joined: 10-December 09 From: italy Member No.: 75798 |
Another interesting solution could be to adopt Orc*, which - according to the newest Schrödinger release - seems to optimize code very much:
QUOTE
we’ve switched over to using Orc instead of liboil for signal processing code. Dirac is a very configurable format, and normally would require thousands of lines of assembly code - Orc generates this at runtime from simple rules. (Hey, it was easier to write Orc than write all that assembly!)
I'm not a developer (nor a binary builder), so I simply don't know if it's applicable to Vorbis (and, why not, Theora) too. BTW, I hope this inspires someone...
* Note: ORC means Oil Runtime Compiler, not Open Research Compiler...
This post has been edited by forart.eu: Mar 7 2010, 15:27 |
|
|
|
Mar 30 2010, 08:07
Post
#7
|
|
|
Group: Members Posts: 13 Joined: 8-October 09 Member No.: 73798 |
"Runtime Compiler" really should give your answer away
This post has been edited by X-Fi6: Mar 30 2010, 08:09 -------------------- Mixing audio perfectly doesn't take more than onboard.
|
|
|
|