aoTuVbeta6.02

Reply #25 – 2011-05-05 11:06:53

Quote from: lvqcl on 2011-05-01 22:05:36

In the meantime you can test my compile.
It doesn't have a built-in FLAC reader or resampler, but since you use oggenc2 as the encoding backend for foobar2000, they would be useless anyway.

What compiler optimisations are you using?

Reply #26 – 2011-05-05 11:49:46

I have compared the speed of my test oggenc compile posted earlier with the Rarewares one, and the speeds are on par. I used Intel C++ with maximum optimizations. I'd be curious about the LancerMod compile parameters too; maybe reduced floating-point precision? Still, I wouldn't expect that much speedup from that alone.

Reply #27 – 2011-05-05 13:02:06

Quick comparison (single thread):

john33 (x64) - 45.15x realtime

lvqcl (x64, SSE3) - 60.50x realtime

Windows 7 x64, Intel Core i3 530

Reply #28 – 2011-05-05 17:59:51

Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.

My tests (Core2 Q9300 @2.5 GHz):

Code:

venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x

64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.
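
As a rough cross-check of the "25...30%" figure, the relative gains can be computed directly from the numbers listed above. This is only a minimal Python sketch: the dictionary layout and the function name are illustrative assumptions, and only the speed figures come from the post.

```python
# Encoding speeds (x realtime) copied from the results above.
baseline = {"32-bit": 34.2, "x64": 36.8}  # oggenc2 without Lancer code

# Each Lancer build is paired with the baseline of the same bitness.
lancer = {
    "32-bit SSE": ("32-bit", 38.1),
    "32-bit SSE2": ("32-bit", 46.1),
    "32-bit SSE3": ("32-bit", 46.0),
    "64-bit SSE2": ("x64", 47.8),
    "64-bit SSE3": ("x64", 48.9),
}


def gain_percent(new, old):
    """Relative speed increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


gains = {
    name: gain_percent(speed, baseline[base])
    for name, (base, speed) in lancer.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
```

On these figures the SSE2 and SSE3 builds come out at roughly 30 to 35% over the builds without Lancer code, while the SSE-only build gains about 11%.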

Quote from: john33 on 2011-05-05 11:06:53

What compiler optimisations are you using?

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

Reply #29 – 2011-05-05 18:45:40

Quote from: lvqcl on 2011-05-05 17:59:51

Some SSE optimizations from Lancer code (for aoTuV 5) are still applicable for aoTuV 6. But the speed increase is 25...30% only.
...

I didn't realise that you had ported some of the Lancer mods.

Quote from: lvqcl on 2011-05-05 17:59:51

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?

The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3

Reply #30 – 2011-05-05 19:38:03

Quote

I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
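
One way to judge whether a difference of about 0.3% is within run-to-run noise is to repeat each measurement several times and compare the gap between the means with the spread of the runs. The following is a minimal Python sketch of that idea; the function name is illustrative and the timing values are made-up placeholders, not measurements from this thread.

```python
import statistics


def within_noise(runs_a, runs_b):
    """Compare two sets of timings.

    Returns (relative difference of the means, True if the gap is smaller
    than twice the combined standard error of the two means).
    """
    mean_a = statistics.mean(runs_a)
    mean_b = statistics.mean(runs_b)
    # Standard error of each mean; needs at least two runs per build.
    se_a = statistics.stdev(runs_a) / len(runs_a) ** 0.5
    se_b = statistics.stdev(runs_b) / len(runs_b) ** 0.5
    combined = (se_a ** 2 + se_b ** 2) ** 0.5
    diff = mean_b - mean_a
    return diff / mean_a, abs(diff) < 2 * combined


# Hypothetical encode times in seconds for the same file, five runs per build.
fp_fast = [100.2, 99.8, 100.5, 99.9, 100.1]
fp_fast2 = [99.9, 99.6, 100.3, 99.5, 99.7]

rel, noise = within_noise(fp_fast, fp_fast2)
print(f"relative difference: {rel:+.2%}, within noise: {noise}")
```

The factor of two is only a rule of thumb; with few runs a proper significance test would be more careful.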

Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again?

Reply #31 – 2011-05-05 19:48:15

Quote from: lvqcl on 2011-05-05 19:38:03

Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...

I'll give it a try.

Reply #32 – 2011-05-05 20:43:37

Quote

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

But how about sound quality? Is it affected? You know, 0.3% ain't much.

Reply #33 – 2011-05-06 12:22:22

Quote from: lvqcl on 2011-05-05 19:38:03

Destroid: can you patch oggenc2 from Rarewares with iccpatch utility (several are mentioned on this page) and test again?

I am sorry to say that I have not tried compiling these encoders before.
But... I can confirm some of your other benchmarks:

Quote

My tests (Core2 Q9300 @2.5 GHz):
Code:
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x

As for the ICL "bias" against AMD, I'm not 100% sure that is the case here.

Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile only went as far as ASM (just a half-witted suggestion).

edit: lvqcl, I just realized it is a patch, not a compiler thing; I will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions; after all, this is an early, dilapidated Athlon64 processor :\

edit2: a quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD by about 15-20 percent at the default Vorbis -q 3 setting.

Reply #34 – 2011-05-06 21:10:15

Back with a new batch of test results. Same commentary track as previous test in this thread but at -q3 (still overkill bitrate). Threw in blacksword lancer, which I included only as a perspective on optimizations.

Code:

Oggenc 2.83 aoTuv 5 Lancer 20061103 SSE2       31.956x   89.0 kb/s
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              25.679x   89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH    20.078x   89.9 kb/s
Venc aoTuV 6.03                                13.381x   89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4               12.335x   89.9 kb/s

ICCPATCH really does have quite an impact on this particular AMD processor when running the Rarewares P4 compile.

Reply #35 – 2011-05-07 07:03:13

I ran the Vorbis tests again, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the last Blacksword compile (one whole week newer). I was also curious to test the LAME compiles from Rarewares with ICCpatch. Here are the results:

Code:

using test WAV 16bit, 48KHz, 2ch, 1,025,507,372 bytes

encoder & version  (all run at -q2)            time    rate     filesize
_____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuv 5 Lancer 20061110 SSE2       2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuv 6.03 lvqcl SSE2 w/ICCpatch   3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch    4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                                6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4               7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch


version (all run at -V6)      time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
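
The "rate" column can be reproduced from the test file size and the wall-clock time. A minimal Python sketch follows, assuming the file is plain PCM and ignoring the small WAV header; the function names are illustrative, while the file size, format and times are taken from the tables above.

```python
def wav_duration_seconds(data_bytes, sample_rate=48000, channels=2, bits=16):
    """Playing time of uncompressed PCM audio, ignoring the WAV header."""
    bytes_per_second = sample_rate * channels * bits // 8
    return data_bytes / bytes_per_second


def realtime_rate(data_bytes, minutes, seconds):
    """How many times faster than playback the encode ran."""
    return wav_duration_seconds(data_bytes) / (minutes * 60 + seconds)


WAV_BYTES = 1_025_507_372  # test WAV: 16-bit, 48 kHz, 2 channels

print(f"duration: {wav_duration_seconds(WAV_BYTES):.0f} s")
print(f"john33 P4, 7m 10s: {realtime_rate(WAV_BYTES, 7, 10):.2f}x")
print(f"john33 P4 w/ICCpatch, 4m 30s: {realtime_rate(WAV_BYTES, 4, 30):.2f}x")
```

Because the listed times are rounded to whole seconds, the computed rates only match the table to within that rounding.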

Reply #36 – 2011-05-07 10:09:14

Quote from: Destroid on 2011-05-06 21:10:15

Code:

Oggenc 2.87 aoTuv 6.03 lvqcl SSE2              25.679x   89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH    20.078x   89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code. 25.679/20.078 = 1.28, as expected.

Quote from: Destroid on 2011-05-06 21:10:15

Code:

OggEnc 2.87 aoTuV 6.03 john33 P4               12.335x   89.9 kb/s

IMHO using /arch:.... option in addition to (or instead of) /Qax... should increase encoding speed on non-Intel processors.
