aoTuVbeta6.02 |
May 5 2011, 11:06
Post
#26
|
|
![]() xcLame and OggDropXPd Developer Group: Developer Posts: 3706 Joined: 30-September 01 From: Bracknell, UK Member No.: 111 |
QUOTE In the meantime you can test my compile. It doesn't have a built-in FLAC reader and resampler but since you use oggenc2 as encoding backend for foobar2000 they are useless anyway.
What compiler optimisations are you using?
-------------------- John
---------------------------------------------------------------- My compiles and utilities are at http://www.rarewares.org/ |
|
|
|
May 5 2011, 11:49
Post
#27
|
|
|
Group: Members Posts: 339 Joined: 24-November 08 Member No.: 63072 |
I have compared the speed of my test oggenc compile posted earlier with the Rarewares one, and they are on par. I used Intel C++ with maximum optimizations. I'd be curious too about the LancerMod compile parameters; maybe reduced floating-point precision? But still, I don't expect so much speedup from only that.
This post has been edited by Anakunda: May 5 2011, 11:51 |
|
|
|
May 5 2011, 13:02
Post
#28
|
|
![]() Group: Members Posts: 373 Joined: 4-October 08 From: Ukraine Member No.: 59301 |
Quick comparison (single thread):
john33 (x64) - 45.15x realtime
lvqcl (x64, SSE3) - 60.50x realtime
Windows 7 x64, Intel Core i3 530 |
|
|
|
May 5 2011, 17:59
Post
#29
|
|
![]() Group: Developer Posts: 2980 Joined: 2-December 07 Member No.: 49183 |
Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
My tests (Core2 Q9300 @2.5 GHz):
CODE
venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x (almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x
64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.

QUOTE (john33) What compiler optimisations are you using?
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options? |
|
|
|
May 5 2011, 18:45
Post
#30
|
|
![]() xcLame and OggDropXPd Developer Group: Developer Posts: 3706 Joined: 30-September 01 From: Bracknell, UK Member No.: 111 |
QUOTE (lvqcl) Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
... I didn't realise that you had ported some of the Lancer mods.

QUOTE (lvqcl) Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3. Options: Whole program optimization = Yes C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2 Since your compiles are a bit faster (well, less than 1%, but anyway) may I ask about your compiler options?
Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)

I've not tried fast=2, does that win you anything?
The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3
This post has been edited by john33: May 5 2011, 18:49
-------------------- John
---------------------------------------------------------------- My compiles and utilities are at http://www.rarewares.org/ |
|
|
|
May 5 2011, 19:38
Post
#31
|
|
![]() Group: Developer Posts: 2980 Joined: 2-December 07 Member No.: 49183 |
QUOTE (john33) I've not tried fast=2, does that win you anything?
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again? |
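Whether a ~0.3% difference is "within statistical error" can be checked by timing each build several times and comparing the gain against the run-to-run spread. The sketch below is only an illustration: the function name `relative_gain` and the timings are hypothetical, not measurements from this thread.

```python
import statistics

def relative_gain(times_a, times_b):
    """Return (gain, noise) for two lists of wall-clock encode times.

    gain  : relative reduction of the mean time of B versus A
    noise : the larger of the two relative standard deviations
    """
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    gain = (mean_a - mean_b) / mean_a
    noise = max(statistics.stdev(times_a) / mean_a,
                statistics.stdev(times_b) / mean_b)
    return gain, noise

# Hypothetical timings in seconds (assumed values, NOT from this thread).
fp_fast = [100.0, 100.6, 99.5, 100.3]
fp_fast2 = [99.8, 100.1, 99.4, 99.9]

gain, noise = relative_gain(fp_fast, fp_fast2)
print(f"gain {gain:.2%}, run-to-run noise {noise:.2%}")
print("within noise" if abs(gain) <= noise else "larger than noise")
```

With only a handful of runs this is a rough check rather than a proper significance test, but it is enough to tell a sub-1% gain apart from a 25% one.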
|
|
|
May 5 2011, 19:48
Post
#32
|
|
![]() xcLame and OggDropXPd Developer Group: Developer Posts: 3706 Joined: 30-September 01 From: Bracknell, UK Member No.: 111 |
QUOTE (john33) I've not tried fast=2, does that win you anything?
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
... I'll give it a try.
-------------------- John
---------------------------------------------------------------- My compiles and utilities are at http://www.rarewares.org/ |
|
|
|
May 5 2011, 20:43
Post
#33
|
|
|
Group: Members Posts: 231 Joined: 6-April 09 Member No.: 68706 |
QUOTE I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
But how about sound quality? Is it affected? You know, 0.3% ain't much.
This post has been edited by _mē_: May 5 2011, 20:43 |
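One way to see whether a compiler flag changes the encoder's output at all is to decode the files from both builds and compare them sample by sample. This says nothing about audible quality; it only shows whether the results differ and by how much. A minimal sketch, assuming both files are already decoded into equal-length interleaved sample lists (the function name `compare_samples` and the data are made up for illustration):

```python
def compare_samples(a, b, sample_rate, channels=2):
    """Compare two interleaved sample sequences of equal length.

    Returns (differing_samples, first_diff_seconds, peak_diff, peak_seconds).
    The time fields are None when the sequences are identical.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    diff_count = 0
    first_diff = None
    peak = 0.0
    peak_at = None
    for i, (x, y) in enumerate(zip(a, b)):
        d = abs(x - y)
        if d > 0:
            diff_count += 1
            t = (i // channels) / sample_rate  # frame index -> seconds
            if first_diff is None:
                first_diff = t
            if d > peak:
                peak, peak_at = d, t
    return diff_count, first_diff, peak, peak_at

# Tiny made-up stereo signal at 2 frames per second, for illustration only.
build_a = [0.0, 0.0, 0.5, 0.5, 0.25, 0.25, 0.0, 0.0]
build_b = [0.0, 0.0, 0.5, 0.5, 0.25, 0.5, 0.0, 0.125]

result = compare_samples(build_a, build_b, sample_rate=2)
print(result)
```

A result with zero differing samples means the two builds decode to identical data; anything else calls for a listening test before drawing conclusions about quality.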
|
|
|
May 6 2011, 12:22
Post
#34
|
|
![]() Group: Members Posts: 512 Joined: 4-June 02 Member No.: 2220 |
QUOTE (lvqcl) Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again?
I am sorry to say that I have not tried compiling these encoders before. But... I can concur with some of your other benches:
QUOTE My tests (Core2 Q9300 @2.5 GHz): CODE venc: 20.9x realtime ... My compiles of oggenc2 without code from Lancer: 32-bit: 34.2x
If this is in regard to the ICL "bias" against AMD, I'm not 100% sure that is the case. Would it be worth asking john33 to attempt MSVC compiles that use SSE/2? I thought the generic compile only ended at ASM (just a half-wit suggestion).
edit: lvqcl - just realized it is a patch, not a compiler thing; will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions.
edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD by about 15-20 percent at the default Vorbis -q 3 setting.
This post has been edited by Destroid: May 6 2011, 12:51
-------------------- "Something bothering you, Mister Spock?"
|
|
|
|
May 6 2011, 21:10
Post
#35
|
|
![]() Group: Members Posts: 512 Joined: 4-June 02 Member No.: 2220 |
Back with a new batch of test results. Same commentary track as previous test in this thread but at -q3 (still overkill bitrate). Threw in blacksword lancer, which I included only as a perspective on optimizations.
CODE
Oggenc 2.83 aoTuv 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s

The ICCPATCH really does have quite an impact on this particular AMD processor running Rarewares P4 compile.
-------------------- "Something bothering you, Mister Spock?"
|
|
|
|
May 7 2011, 07:03
Post
#36
|
|
![]() Group: Members Posts: 512 Joined: 4-June 02 Member No.: 2220 |
I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the last Blacksword compile (one whole week newer). I was also curious to test the LAME compiles from Rarewares with ICCpatch. Here are the results:
CODE
using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes

encoder & version (all run at -q2)             time    rate     filesize
_____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuv 5 Lancer 20061110 SSE2       2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 w/ICCpatch   3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch    4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                                6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4               7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch

version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
-------------------- "Something bothering you, Mister Spock?"
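The "rate" column can be roughly reproduced from the WAV size and the encode time. The sketch below assumes the whole file is PCM data (the small WAV header is ignored) and uses the whole-second times from the table, so it matches the listed rates only approximately; the function names are illustrative, not from any encoder.

```python
def wav_duration_seconds(data_bytes, sample_rate=48000, channels=2, bits=16):
    """Playing time of uncompressed PCM of the given size."""
    bytes_per_second = sample_rate * channels * bits // 8
    return data_bytes / bytes_per_second

def realtime_factor(data_bytes, encode_seconds):
    """How many seconds of audio are encoded per second of wall-clock time."""
    return wav_duration_seconds(data_bytes) / encode_seconds

WAV_BYTES = 1_025_507_372  # test file size from the post (header included)

duration = wav_duration_seconds(WAV_BYTES)
rate = realtime_factor(WAV_BYTES, 3 * 60 + 27)  # lvqcl SSE2 row: 3m 27s

print(f"{duration:.1f} s of audio, about {rate:.2f}x realtime")
```

The small gap between a value computed this way and the listed one comes from the times in the table being rounded to whole seconds.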
|
|
|
|
May 7 2011, 10:09
Post
#37
|
|
![]() Group: Developer Posts: 2980 Joined: 2-December 07 Member No.: 49183 |
CODE
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code. 25.679/20.078 = 1.28, as expected.

CODE
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s

IMHO using /arch:.... option in addition to (or instead of) /Qax... should increase encoding speed on non-Intel processors. |
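The ratio quoted above, and the effect of ICCpatch on the P4 compile, follow directly from the realtime rates Destroid posted at -q3. A small sketch of the arithmetic (the helper name `speedup` is just for illustration):

```python
def speedup(rate_new, rate_base):
    """Ratio of two realtime encoding rates."""
    return rate_new / rate_base

# Rates taken from the -q3 results quoted in this thread.
lancer_vs_c = speedup(25.679, 20.078)   # lvqcl SSE2 vs. john33 P4 w/ICCPATCH
patch_effect = speedup(20.078, 12.335)  # john33 P4 patched vs. unpatched

print(f"Lancer code: {lancer_vs_c:.2f}x ({lancer_vs_c - 1:.0%} faster)")
print(f"ICCPATCH:    {patch_effect:.2f}x ({patch_effect - 1:.0%} faster)")
```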
|
|
|