Topic: aoTuVbeta6.02

aoTuVbeta6.02

Reply #25
Quote
In the meantime you can test my compile.
It doesn't have the built-in FLAC reader and resampler, but since you use oggenc2 as the encoding backend for foobar2000, they are useless anyway.

What compiler optimisations are you using?

aoTuVbeta6.02

Reply #26
I have compared the speeds of my test oggenc compile posted earlier with Rarewares' and they are on par; I used Intel C++ with maximum optimizations. I'd be curious too about the LancerMod compile parameters. Maybe reduced floating-point precision? But I still don't expect that much of a speedup from that alone.

aoTuVbeta6.02

Reply #27
Quick comparison (single thread):

john33 (x64) -  45.15x realtime

lvqcl (x64, SSE3) - 60.50x realtime

Windows 7 x64, Intel Core i3 530
🇺🇦 Glory to Ukraine!

aoTuVbeta6.02

Reply #28
Some SSE optimizations from the Lancer code (for aoTuV 5) are still applicable to aoTuV 6, but the speed increase is only 25-30%.

My tests (Core2 Q9300 @2.5 GHz):
Code:
venc: 20.9x realtime

Rarewares compiles:
generic: 21.2x
P4: 34.5x
x64: 37.1x

My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x
x64: 36.8x
(almost the same as oggenc2 from Rarewares)

My compiles of oggenc2 with code from Lancer (these were uploaded):
32-bit SSE: 38.1x
32-bit SSE2: 46.1x
32-bit SSE3: 46.0x

64-bit SSE2: 47.8x
64-bit SSE3: 48.9x

I DIDN'T test these compiles on AMD processors.


Quote
What compiler optimisations are you using?

Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

aoTuVbeta6.02

Reply #29
Quote
Some SSE optimizations from the Lancer code (for aoTuV 5) are still applicable to aoTuV 6, but the speed increase is only 25-30%.
...

I didn't realise that you had ported some of the Lancer mods.
Quote
Compiler: MSVS 2010 SP1 + Intel Composer XE 2011 upd3.
Options:
Whole program optimization = Yes
C/C++ optimization: /O3 /Ob2 /Oi /Ot /Qipo
Code Generation: /GF /MT /GS- /arch:SSE3 /fp:fast=2

Since your compiles are a bit faster (well, less than 1%, but anyway), may I ask about your compiler options?

Compiler: MSVS 2008 + Intel Compiler 11.1.067.
Options:
Whole program optimization = No
C/C++ optimisation: /O3 /Ob2 /Oi /Ot /Og /Qip /Qfp-speculation:fast
Code Generation: /GF /EHsc /MT /GS /QaxSSSE3 /fp:fast
(That's for x64)
I've not tried fast=2, does that win you anything?

The P4 compile is the same except: /arch:IA32 /QaxSSE2 in place of /QaxSSSE3

 

aoTuVbeta6.02

Reply #30
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
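Whether a ~0.3% gain is real or noise depends on the run-to-run spread. A minimal Python sketch, assuming each build is encoded several times and the "x realtime" figure is recorded for every run (the function names are hypothetical), compares the difference of the means against its standard error:

```python
import math
import statistics


def gain_and_noise(baseline_runs, candidate_runs):
    """Relative speed gain of candidate over baseline, plus the standard
    error of that difference, both as fractions of the baseline mean.

    Each argument is a list of "x realtime" figures from repeated encodes
    of the same file (at least two runs per build).
    """
    base = statistics.mean(baseline_runs)
    cand = statistics.mean(candidate_runs)
    # Standard error of the difference between two independent means.
    se = math.sqrt(statistics.variance(baseline_runs) / len(baseline_runs)
                   + statistics.variance(candidate_runs) / len(candidate_runs))
    return (cand - base) / base, se / base


def within_noise(baseline_runs, candidate_runs, k=2.0):
    """Rough rule of thumb: treat the gain as noise if it is no larger
    than k standard errors."""
    gain, se = gain_and_noise(baseline_runs, candidate_runs)
    return abs(gain) <= k * se
```

The k=2 threshold is only a rule of thumb, not a formal significance test; more runs per build shrink the standard error and make small gains easier to confirm.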

Destroid: can you patch oggenc2 from Rarewares with the iccpatch utility (several are mentioned on this page) and test again?

aoTuVbeta6.02

Reply #31
Quote
I've not tried fast=2, does that win you anything?

I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).
...

I'll give it a try.

aoTuVbeta6.02

Reply #32
Quote
I just tested and it turns out that /fp:fast=2 is ~0.3% faster (IMHO it is within statistical error).

But how about sound quality? Is it affected? You know, 0.3% ain't much.
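The question is fair because relaxed floating-point modes such as /fp:fast let the compiler reorder arithmetic, and floating-point addition is not associative, so such a build need not be bit-identical to a stricter one. A tiny Python illustration of the general effect (nothing here is specific to the Vorbis encoder):

```python
# Floating-point addition is not associative: the same three terms,
# grouped differently (as a compiler may do under a relaxed model such
# as /fp:fast), can round to different results.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)               # 0.6000000000000001
print(right)              # 0.6
print(left == right)      # False
print(abs(left - right))  # 1.1102230246251565e-16, one unit in the last place
```

A single rounding difference like this is far below the resolution of 16-bit audio; whether such differences add up to anything audible in a real encoder can only be checked with a bit-compare or a listening test.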

aoTuVbeta6.02

Reply #33
Quote
Destroid: can you patch oggenc2 from Rarewares with the iccpatch utility (several are mentioned on this page) and test again?
I'm sorry to say that I have not tried compiling these encoders before.
But I can confirm some of your other benchmarks:
Quote
My tests (Core2 Q9300 @2.5 GHz):
Code:
venc: 20.9x realtime
...
My compiles of oggenc2 without code from Lancer:
32-bit: 34.2x

As for the ICL "bias" against AMD, I'm not 100% sure that this is the case.

Would it be worth asking john33 to attempt MSVC compiles that use SSE/SSE2? I thought the generic compile stopped at ASM (just a half-wit suggestion).

edit: lvqcl, I just realized it is a patch, not a compiler thing; I will report back later. Also, I seem to recall something about 'early' SSE2 vs. 'true' SSE2 instructions; after all, this is an early, dilapidated Athlon64 processor :\

edit2: A quick test of iccpatch definitely improved the Rarewares P4 compile on my AMD, by about 15-20 percent at the default Vorbis setting of -q 3.
"Something bothering you, Mister Spock?"

aoTuVbeta6.02

Reply #34
I'm back with a new batch of test results: the same commentary track as the previous test in this thread, but at -q3 (still an overkill bitrate). I threw in the Blacksword Lancer build, included only to put the optimizations in perspective.
Code:
Encoder                                      Speed    Bitrate
Oggenc 2.83 aoTuV 5 Lancer 20061103 SSE2     31.956x  89.0 kb/s
Oggenc 2.87 aoTuV 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s
Venc aoTuV 6.03                              13.381x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s
ICCPATCH really does have quite an impact on this particular AMD processor running the Rarewares P4 compile.

aoTuVbeta6.02

Reply #35
I re-ran the Vorbis tests, this time at -q2. I tested the effect of ICCpatch on lvqcl's compile and switched to the latest Blacksword compile (one whole week newer). I was also curious to test the Rarewares LAME compiles with ICCpatch. Here are the results:

Code:
using test WAV: 16-bit, 48 kHz, 2 ch, 1,025,507,372 bytes

encoder & version (all run at -q2)            time    rate     filesize
____________________________________________  ______  _______  ________________
Oggenc 2.83 aoTuV 5 Lancer 20061110 SSE2      2m 57s  30.196x  52,521,704 bytes
Oggenc 2.87 aoTuV 6.03 lvqcl SSE2             3m 27s  25.801x  51,621,665 bytes
Oggenc 2.87 aoTuV 6.03 lvqcl SSE2 w/ICCpatch  3m 34s  24.959x  51,621,665 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCpatch   4m 30s  19.782x  51,621,285 bytes
Venc aoTuV 6.03                               6m 22s  13.978x  51,621,326 bytes
OggEnc 2.87 aoTuV 6.03 john33 P4              7m 10s  12.421x  51,621,530 bytes

Foobar2000 bit-compare tracks:
OGG files of lvqcl patched vs. unpatched = No differences in decoded data found
OGG files of john33 patched vs. unpatched = Differences found: 47294972 sample(s), starting at 3.2973333 second(s), peak: 0.0511622 at 4980.8489065 second(s), 2ch


version (all run at -V6)     time    rate     filesize
___________________________  ______  _______  ________________
LAME 3.98.4                  4m 52s  18.256x  60,421,848 bytes
LAME 3.98.4 (ICCpatch)       4m 46s  18.673x  60,421,848 bytes
LAME 3.99 beta 0             6m 29s  13.706x  59,409,552 bytes
LAME 3.99 beta 0 (ICCpatch)  4m 36s  19.306x  59,409,552 bytes

Foobar2000 bit-compare tracks:
MP3 files of 3.98.4 patched vs. unpatched = No differences in decoded data found
MP3 files of 3.99 beta 0 patched vs. unpatched = No differences in decoded data found
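The "rate" column is the audio duration divided by the encoding time. A short Python sketch, assuming a plain 44-byte WAV header (the actual header size is not stated) and the 16-bit, 48 kHz, stereo format given above, reproduces two of the Vorbis rows; the function names are hypothetical:

```python
def wav_duration_seconds(file_bytes, sample_rate=48000, bits=16, channels=2,
                         header_bytes=44):
    """Audio length of a PCM WAV file, assuming a plain 44-byte header."""
    bytes_per_second = sample_rate * (bits // 8) * channels  # 192,000 here
    return (file_bytes - header_bytes) / bytes_per_second


def realtime_factor(audio_seconds, minutes, seconds):
    """Speed in "x realtime": audio duration divided by encoding time."""
    return audio_seconds / (minutes * 60 + seconds)


duration = wav_duration_seconds(1_025_507_372)  # about 5341 s (~89 min)
for name, m, s in [("lvqcl SSE2", 3, 27), ("john33 P4", 7, 10)]:
    print(f"{name}: {realtime_factor(duration, m, s):.2f}x")
```

This prints 25.80x and 12.42x, matching the table's 25.801x and 12.421x to within the rounding of the times to whole seconds.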
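The foobar2000 bit-compare above reports how many samples differ, where the first difference starts, and the peak difference. A rough Python sketch of that kind of comparison, assuming both files have already been decoded to equal-length lists of per-channel float frames (this is not foobar2000's implementation, and the function name is hypothetical):

```python
def bit_compare(a, b, sample_rate):
    """Compare two decoded streams frame by frame.

    a, b: equal-length lists of frames; each frame is a tuple of per-channel
    float samples. Returns None when every sample matches, otherwise a dict
    with the number of differing channel samples, the time of the first
    difference, and the peak absolute difference with its time (seconds).
    """
    if len(a) != len(b):
        raise ValueError("decoded streams have different lengths")
    differing = 0
    first_frame = None
    peak, peak_frame = 0.0, None
    for i, (frame_a, frame_b) in enumerate(zip(a, b)):
        if len(frame_a) != len(frame_b):
            raise ValueError("channel counts differ")
        for sa, sb in zip(frame_a, frame_b):
            diff = abs(sa - sb)
            if diff == 0.0:
                continue  # identical sample
            differing += 1
            if first_frame is None:
                first_frame = i
            if diff > peak:
                peak, peak_frame = diff, i
    if differing == 0:
        return None
    return {
        "samples": differing,
        "start_s": first_frame / sample_rate,
        "peak": peak,
        "peak_s": peak_frame / sample_rate,
    }
```

Whether foobar2000 counts per-channel samples or whole frames is not stated in the thread; this sketch counts per-channel samples.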

aoTuVbeta6.02

Reply #36
Code:
Oggenc 2.87 aoTuV 6.03 lvqcl SSE2            25.679x  89.9 kb/s
OggEnc 2.87 aoTuV 6.03 john33 P4 w/ICCPATCH  20.078x  89.9 kb/s

As I said, my compiles (with some optimizations from Lancer) are 25...30% faster than pure C code.  25.679/20.078 = 1.28, as expected.
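The arithmetic behind "25...30% faster": figures in x realtime are rates, so their ratio is the speedup, and the saving in encoding time is smaller than the percentage gain in rate. A small Python sketch (helper names are hypothetical):

```python
def speedup(fast_rate, slow_rate):
    """Ratio of two encoding speeds given in x realtime."""
    return fast_rate / slow_rate


def percent_faster(fast_rate, slow_rate):
    return (speedup(fast_rate, slow_rate) - 1.0) * 100.0


def percent_time_saved(fast_rate, slow_rate):
    # Encoding time is inversely proportional to the rate.
    return (1.0 - slow_rate / fast_rate) * 100.0


print(f"{speedup(25.679, 20.078):.2f}")                         # 1.28
print(f"{percent_faster(25.679, 20.078):.0f}% faster")          # 28% faster
print(f"{percent_time_saved(25.679, 20.078):.0f}% less time")   # 22% less time
```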

Code:
OggEnc 2.87 aoTuV 6.03 john33 P4             12.335x  89.9 kb/s

IMHO, using an /arch:.... option in addition to (or instead of) /Qax... should increase encoding speed on non-Intel processors.
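One plausible reading of the numbers above, following the commonly described behavior of the Intel compiler's CPU dispatcher of that era: code built with /Qax<ext> carries an extra optimized path that runs only when a runtime check finds a GenuineIntel CPU supporting <ext>, while every other CPU runs the baseline path chosen by /arch. iccpatch-style tools are generally said to neutralize the vendor part of that check. The Python sketch below is a simplified model under those assumptions, not Intel's dispatcher; the function name and the Athlon64 feature set are hypothetical. It uses the P4 build's flags from Reply #29 (/arch:IA32 /QaxSSE2).

```python
def dispatched_path(vendor, cpu_features, qax_target, arch_baseline,
                    vendor_check_patched=False):
    """Simplified model of a /Qax-style CPU dispatcher (assumption, not Intel's code).

    The optimized qax_target path runs only on CPUs that pass the vendor
    check and support that extension; every other CPU gets the baseline
    path selected by /arch.
    """
    passes_vendor_check = vendor == "GenuineIntel" or vendor_check_patched
    if passes_vendor_check and qax_target in cpu_features:
        return qax_target
    return arch_baseline


athlon64 = {"SSE", "SSE2"}  # hypothetical feature set for an early Athlon64

# P4-style build: /arch:IA32 /QaxSSE2
print(dispatched_path("AuthenticAMD", athlon64, "SSE2", "IA32"))   # IA32
print(dispatched_path("AuthenticAMD", athlon64, "SSE2", "IA32",
                      vendor_check_patched=True))                   # SSE2
# Hypothetical build with an SSE2 baseline: /arch:SSE2 /QaxSSSE3
print(dispatched_path("AuthenticAMD", athlon64, "SSSE3", "SSE2"))  # SSE2
```

In this model, raising the /arch baseline helps non-Intel CPUs without any patching, which is the point of the suggestion above.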