Full Version: Ogg Vorbis optimized for speed
Hydrogenaudio Forums > Lossy Audio Compression > Ogg Vorbis > Ogg Vorbis - Tech
rt87
Lancer 20060317 is released!

Translated by excite.co.jp, fine-tuned by me ;)
--
2006/03/15 Lancer 20060317

Fixed the hang that occurred when the DownMix function of oggdropXPd was used.
Added a new oggpack_write on the vorbis side to improve DLL performance.
Updated the SSE optimization code of _ve_amp.
Removed an unnecessary SSE optimization from floor1.c.
Added code to inspect_error that works around a GCC bug.
Applied loop unrolling and register renaming to the code related to mdct_forward,
bark_noise_hybridmp, and fft.
The loop split points of bark_noise_hybridmp are now calculated in advance.
VEG
Lancer 20060331 is released!
esa372
QUOTE(VEG @ Mar 31 2006, 07:14 AM)
Thank you!
:D
yong
Lancer 20060506
Now supports SSE3 and multithreading too. 8-)
Tiis
QUOTE(yong @ May 7 2006, 16:01) *

Nice! I love Lancer's Vorbis tunings :) SSE2 and MT for me.
toot
Some encode times with an AMD X2 4400+. MT is very nice :)

oggenc283_sse3mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 28.0s
Rate: 51.8139
Average bitrate: 192.8 kb/s

oggenc283_sse2mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 32.4s
Rate: 49.3090
Average bitrate: 192.8 kb/s

oggenc283_sse3_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 00.8s
Rate: 37.7426
Average bitrate: 192.8 kb/s

oggenc283_sse2_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.9s
Rate: 37.3896
Average bitrate: 192.8 kb/s

oggenc283_sse_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.4s
Rate: 37.5532
Average bitrate: 192.8 kb/s
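
For anyone who wants the MT gain as a single number, here is a minimal sketch that divides the multithreaded rate by the single-threaded rate for each SSE level. The rates are copied from the runs above; the dictionary keys and the function name `mt_speedup` are just labels made up for this example. On these figures the MT builds come out somewhere between 1.3x and 1.4x the single-threaded rate.

```python
# Rates (x realtime) copied from the oggenc2 runs listed above.
rates = {
    "sse3mt": 51.8139,
    "sse2mt": 49.3090,
    "sse3": 37.7426,
    "sse2": 37.3896,
    "sse": 37.5532,
}


def mt_speedup(rates, level):
    """Ratio of the multithreaded rate to the single-threaded rate for one SSE level."""
    return rates[level + "mt"] / rates[level]


for level in ("sse3", "sse2"):
    print(f"{level}: MT is {mt_speedup(rates, level):.2f}x the single-threaded rate")
```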
pepoluan
Uhhh... so which one is which?

I have an AthlonXP 2400+ (IIRC Barton core). Which one should I get?

Sorry, my mind is swimming a bit at the moment...

ilikedirtthe2nd
QUOTE(pepoluan @ May 8 2006, 11:23) *

SSE Version
rt87
Lancer's DLL has crashed my Winamp+Oddcast3 since the 200603010 build.
It still hasn't been fixed. :(
foxyshadis
Hmm, I guess this is still based off the old code and not Aoyumi's recent tunings? Multithreading is so cool, the problem is that it takes longer to read off the hard drive (or decode a flac) than to encode now. XD
jetpower
From the site:
QUOTE
Based on aotuv-b4.51_20051117

4.51beta is the latest version from Aoyumi.
http://www.geocities.jp/aoyoume/aotuv/index.html
No need to worry :)
VEG
Lancer 20060512 (only MT) Released!
Home page
Cartman_Sr
Hey thanks for that link. I tried going there last night but couldn't figure out a single thing :) I'm just getting into ogg vorbis now...

But I just tried those (oggenc2) and I got a Fatal error: This program is not designed to run on this machine. I have a P4, 2.1 (Dell), with Windows XP sp2. Are these new builds meant for AMD processors only? :(

edit: Oh wait a minute, I was trying the SSE3 version, give me a few minutes to try the SSE2 version. BTW, what is SSE2/3 anyway? I think I'm in way over my head here.

edit 2: The SSE2 one does work. Ok, I'm a hoser :D
pepoluan
QUOTE(Cartman_Sr @ May 14 2006, 01:02) *

:D Hey, I got confused too for a moment (see my post up there). I should've checked Wikipedia first... it lists processors with SSE, SSE2, and SSE3.

What are these SSE-thingies? In a nutshell, they are special instructions that enable CPUs to perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.

Soooo... I put in a (very very very) simplified guide on which version of Lancer you should use, in the Lancer page of HA Wiki.

Cartman_Sr
Hey thanks for adding that wiki page, makes way more sense now! But does the type of processor you have (32 bit vs. 64 bit) factor into it in any way? I know I have a 32 bit processor (does that sound right?), and a single core computer. The SSE2MT version does work on my computer.
pepoluan
QUOTE(Cartman_Sr @ May 14 2006, 03:45) *
Um, the bit-ness of your processor is not strictly related to SSEx. For instance, compare Intel: the P4 is 32-bit, yet it supports SSE2 instructions. AMD did not get the opportunity to embed SSE2 instructions into their 32-bit line, and opted to add SSE2 to their 64-bit line.

So whether your processor supports a certain version of SSEx depends more on its release date than its bit-ness.

Edit: Updated the wiki page above slightly to explain the (theoretical) benefit of MT versions.
HotshotGG
QUOTE
What are these SSE-thingies? In a nutshell, they are special instructions that enable CPUs to perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.


Yeah, that really needs to be clarified for a lot of folks. I gathered some information about them and rewrote that section in the wiki. It does help in the long run though.
pepoluan
QUOTE(HotshotGG @ May 16 2006, 04:45) *
The most sure-fire way to know which SSEx version your processor supports is to download all 5 Lancer OggEnc2 encoders and run them one by one. If your processor does not support the SSEx, OggEnc2 will exit gracefully, informing you so.

I've added this to the Lancer wiki page. Hope it helps.

Edit: stupid typo. Note to self: don't type something long while holding a lighted cigarette.
Mr_Rabid_Teddybear
There are also Windows programs that easily tell you the processor instructions for your system; wcpuid and cpu-z

Does anyone know of programs for Linux and Mac with a similar function?
toot
QUOTE(Mr_Rabid_Teddybear @ May 18 2006, 13:56) *

cat /proc/cpuinfo will work on Linux at least.. and maybe on Mac too since the newer OSs are Unix based I think..
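
A rough sketch of reading that output programmatically, assuming an x86 Linux box where /proc/cpuinfo has a "flags" line. Two things to keep in mind: Linux lists SSE3 under the flag name "pni", not "sse3", and as far as I know Mac OS X has no /proc/cpuinfo at all (sysctl is the usual route there). The function name `highest_sse_level` is made up for this example.

```python
def highest_sse_level(cpuinfo_text):
    """Return 'sse3', 'sse2', 'sse' or None, based on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            # Linux reports SSE3 as "pni" (Prescott New Instructions).
            if "pni" in flags:
                return "sse3"
            if "sse2" in flags:
                return "sse2"
            if "sse" in flags:
                return "sse"
            return None
    # No "flags" line found (not x86, or not Linux-style output).
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(highest_sse_level(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

The result only says which instruction set the CPU reports; whether the MT build helps depends on the number of cores, which this sketch does not look at.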
pepoluan
Info about wcpuid and cpu-z is now part of the Lancer page. I'm not into Linux or any Unix, so please complement the info there if need be. Thanx.
tgb
I have an Athlon 64 X2 3800.
This new sse3 version is the cat's meow for fully utilizing both cores.
In comparison, I did find a strange phenomenon though.
Encoding a whole album of 10 songs with the sse3mt version
took 10 seconds longer than starting two instances of the 2006/03/31 oggdrop
and dropping 5 songs on each simultaneously. Can anyone else reproduce this?
Is the threading overhead higher in the sse3mt version?
Not that I'm complaining mind you. The convenience factor is great with the sse3mt version!
Excellent work. The encoding speed is typically over 50x regardless!
My idea of a "killer app" here!

minor edit for grammar.
toot
QUOTE(tgb @ May 25 2006, 06:16) *

It's normal that using a multi-threaded app over two cores won't give you a 100% boost over using one core.. It's usually something like 70% faster. In fact, a mere 10 seconds difference for a whole album is very good!
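
To put a number on that, here is a small sketch using Amdahl's law (my choice of model, not something taken from the Lancer code): speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that runs in parallel and n is the number of cores. Solving for p with the 70% figure above (1.7x on 2 cores) suggests a bit over 80% of the encode would have to be parallel. The function names are made up for this example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_for(speedup, cores):
    """Invert Amdahl's law: which parallel fraction yields this speedup?"""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


p = parallel_fraction_for(1.7, 2)
print(f"parallel fraction needed for 1.7x on 2 cores: {p:.2f}")
print(f"check: speedup with that fraction = {amdahl_speedup(p, 2):.2f}x")
```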
MedO
Of course double speed is seldom possible using multithreading since many problems can't be parallelised (or whatever it's called) completely. But in the case of encoding a batch of files with OggdropXpd, wouldn't it make more sense then to run one normal encoding thread per core, because that's 100% parallelised? Of course, when only one file is encoded, the multithreaded version is (if above post is correct) only slightly slower, but I think if it gives a speed advantage (which is what Lancer builds are all about I believe) one could implement this parallel encoding into the frontend.
Hope I'm making sense, I need some sleep...

MedO
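
A minimal sketch of that idea as a script, not how OggdropXPd or Lancer actually work: run one single-threaded encoder process per file, with at most one running per core. The executable name "oggenc2" and the "-q 5" setting are assumptions here, so adjust them to whatever build you have. The worker threads only wait on the child processes, so the encoding itself runs in parallel.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def encode_with_oggenc(wav_path):
    """Run one encoder process; 'oggenc2' and '-q 5' are assumed, not verified."""
    return subprocess.run(["oggenc2", "-q", "5", wav_path]).returncode


def encode_batch(wav_paths, encode_one=encode_with_oggenc, workers=None):
    """Encode files concurrently, at most `workers` at a time (default: one per core)."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input paths.
        return list(pool.map(encode_one, wav_paths))
```

Calling `encode_batch(["01.wav", "02.wav"])` would return the exit codes in input order. Passing a different `encode_one` makes it easy to try the scheduling without an encoder installed.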
PatchWorKs
Here we go: Lancer 20060529 Release

Changelog (by babelfish):

- Fixed a problem in the decoding section.
iGold
Is there a chance that a version in the near future will work correctly when compiled with GCC (preferably the 4.x branch)?
The latest version that works correctly when compiled with gcc 3.3.6 is 20051121 (tested on an Athlon XP 2200+ with SSE-only support in Ubuntu Linux). All versions after that one give a different bitrate in the generated .ogg (compared to the 'standard', aoTuV b4.51).
Ogg Vorbis is the standard lossy codec in the Linux world, and an SSE(2,3)-optimized version for Linux would be good support for the community.
MedO
If the bitrate difference is only slight, this is probably normal. IIRC, even the P3-Optimised version of the original AoTuV-encoder gives slightly different results than the generic build.
iGold
The difference is significant, about 2-3 kbps for -q 3 (on my test.wav it gives 112.3 kbps instead of the 115.3 kbps of aoTuV 4.51).
I've tried compiling aoTuV with gcc 3.3/3.4/4.0, with and without compiler SSE optimization (-march=athlon-xp -mfpmath=sse), and the bitrate differed only in the hundredths. So this is definitely a bug.
pepoluan
Remember that standard aoTuV uses the FPU and Lancer uses SSE. They use different bit lengths to represent real numbers, which may cause different compression.

If in doubt, ABX.
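
To illustrate the precision point with a toy example (nothing to do with the actual encoder code): the sketch below adds the same value 1000 times, once keeping the running total in 64-bit and once rounding it to 32-bit after every add. Python's 64-bit float stands in for the wider FPU format here, which is an assumption for the demo, since the x87 FPU can keep even more bits internally. The helper names are made up.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value, count, single_precision):
    """Add `value` to a running total `count` times."""
    total = 0.0
    for _ in range(count):
        total += value
        if single_precision:
            # Round after every add, as 32-bit arithmetic would.
            total = to_float32(total)
    return total


step = to_float32(0.1)
wide = accumulate(step, 1000, single_precision=False)
narrow = accumulate(step, 1000, single_precision=True)
print(wide, narrow, wide - narrow)
```

The two totals differ slightly. Differences of that size would not by themselves explain a 3 kbps gap, which is a separate question from whether rounding can change the output at all.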
iGold
The precompiled oggenc2/oggDropXPd builds for Windows give the same bitrate as unoptimized aoTuV, but they were compiled with MSVC or similar, not with GCC. I think the new versions of Lancer are simply untested with GCC.
haregoo
Lancer 20060616 released.

Edit: Fixed bug in decoding(SSE2)
VEG
Translated by google:
2006/06/16 Lancer 20060616
Replaced the CPU detection code in the DLL with a version for AMD CPUs
Optimized vorbis_oggpack_look with inline assembly
Added SSE3 optimization to _mm_add_horz*
Added an SSE-optimized oggdec
Fixed a problem in the SSE2 optimization of ov_read_float2pcm
SSE-optimized the decoding section of oggdropXPd
Shared the optimization profile between single-threaded and multithreaded operation
sony666
Didn't use Vorbis for some time but now I needed an encoder for some previews and tried the Lancer (2006 06 16th) one.

The speed is just sick, thanks to all involved in that :)
Works great on my Athlon XP, normal SSE version.
pepoluan
LOL yeah, I still get the warm-fuzzy-feeling every time I encode using Lancer :D
eloj
New release out today.

CODE
Changes:
* inline assembly replaces as much as possible to intrinsic
* abolish original memory transfer code in block.c
* bitreverse use looking up table
* fix speed down vorbis_book_decodevv_add's regression in lancer
  20060529
* remove optimization prevention code in vorbis_book_decodevv_add
* pre-calculate tables for triggers in mdct
* simplifying a code in which high frequency removed by mdct_backward
* add decode only funcs: mdct_butterflies_backward,
  dct_butterfly_first_backward
* improve SSE optimization: bark_noise_hybridmp
* add SSE optimization: render_line, vorbis_noise_normalize,
  _vp_noise_normalize
* add SSE3 optimization: mdct_bitreverse
* add pre-calculation code: seed_loop, max_seeds
* optimize: seed_chase
* add SORT16 to psy.c
* auto loop unrolling: SORT8, SORT32 in psy.c
* use lddqu in non SSE environment for unaligned memory load
* improve loop condition code in inline assembly code
* add t option for oggdec benchmarks (without outputting file)
(courtesy of pub at cyanet.jp)

Good to see the asm being replaced with intrinsics.
HotshotGG
QUOTE
* add SSE optimization: render_line, vorbis_noise_normalize,
_vp_noise_normalize


SSE optimizations to the noise normalization code, eh? That's interesting. Must be very fast ;)
PatchWorKs
Hope this guy will work on Theora or Dirac in the future!
I also hope to see the SSE/SSE2/SSE3 builds merged into one that auto-selects the optimizations on the fly (like FLAC...)
haregoo
Lancer 20060722 is temporarily unavailable due to a memory issue (unconfirmed).
rudefyet
What kind of memory issue? I've been using 20060722 with no problems.
Josef K.
QUOTE(rudefyet @ Aug 2 2006, 09:53) *

It's impossible to download from the page.
If someone could post Lancer 20060722 release (at least "oggenc2.83"), that would be great. Or just a link for dl, of course.
haregoo
Lancer 20060802 released.

This is a bug-fix release.
20060722 had a memory leak, according to the author.
rt87
And the crash issue of Lancer DLL is going to be fixed.
eloj
Experimental Lancer 20060806 is up.

Altavista says:
"Being heap memory access error occurs with vorbis_oggpack_write it abolishes, the optimization module in oggpack_write movement
oggpack_look SSE optimization of optimization
_ve_amp cash control processing of modification
accumulate_fit being imperfect with correction
MDCT-RELATED cash control rearranging unnecessary zero data exception processing the SSE optimization description section of deletion _encodepart with correction inspect_error"

Err.. right.
skelly831
Wow! Optimization of optimization, that's gotta be fast!
MedO
QUOTE(skelly831 @ Aug 5 2006, 20:09) *

None of the many "recent" optimizations provided a major speedup for me. Maybe it's 20x encoding with an old lancer and 22x with the sse2-optimized version. I have a Celeron M 1400Mhz, so the MT speedups don't help here. Still, the speed is great. What are your experiences/speeds/setups? Just curious...
Mo0zOoH
QUOTE(MedO @ Aug 5 2006, 22:27) *

What are your experiences/speeds/setups? Just curious...

I'm experiencing a speedup of about 1.7x with Lancer 20060806 compared to the latest OggEnc from rarewares (average speed is around 29x vs 17x, depending significantly on the sound material).

EDIT: My system is WinXP SP2 on an Athlon 64 3400+ (Venice).
PrakashP
If anybody has a Core 2 Duo, I would be interested in how fast the SSE2 version is, as this new CPU doesn't break SSE2 instructions up into 2 parts and thus can compute them directly.
HbG
Athlon XP 3200+ (2.2GHz), SSE:

Lancer 20060616 - 37.85x
Lancer 20060805 - 38.56x

So it's about 1.9% faster, nice but hardly significant.

I also recall that a previous version of Lancer was about .5x faster than 06/16. Like Linux kernels, not every next version is faster on every system. :)

The SSE3 multithreading version should fly on Core2. I wouldn't be surprised if it encodes over 100x.
eloj
QUOTE(MedO @ Aug 5 2006, 19:27) *

What are your experiences/speeds/setups? Just curious...

Machine: AMD Athlon64 X2 3800+ (2GHz) @ 2.4GHz
Track: Machinae Supremacy - Elite.wav (4m 24.0s)
Options: -q 5

OggEnc (vorbis-tools Rev.10381): 13.9284x (19s)
OggEnc v2.83 (Lancer [20060805](SSE3MT) based on aoTuV b4b): 57.605572x (4.594s)
R.A.F.
It seems that there was something wrong with the latest Lancer version for Vorbis (Lancer 20060802, based on aotuv-b4.51_20051117), because all files were removed from the download server. See his page for the current state.