Lancer 20060317 is released!
Translated by excite.co.jp, fine tuned by me

--
2006/03/15 Lancer 20060317
Fixed a hang that occurred when using the DownMix function of oggdropXPd.
Added a new oggpack_write on the vorbis side to improve DLL performance.
Updated the SSE optimization code of _ve_amp.
Removed unnecessary SSE optimization code in floor1.c.
Added code to inspect_error that works around a GCC bug.
Applied loop unrolling and register renaming to the code related to mdct_forward, bark_noise_hybridmp, and fft.
The loop split points of bark_noise_hybridmp are now calculated in advance.
esa372
Mar 31 2006, 10:42
QUOTE(VEG @ Mar 31 2006, 07:14 AM)
Thank you!
Lancer 20060506 now supports SSE3 and multithreading too.
QUOTE(yong @ May 7 2006, 16:01)

Lancer 20060506 now supports SSE3 and multithreading too.

Nice! I love Lancer's Vorbis Tunings

SSE2 and MT for me.
Some encode times with an AMD X2 4400+.. MT is very nice

oggenc283_sse3mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 28.0s
Rate: 51.8139
Average bitrate: 192.8 kb/s

oggenc283_sse2mt_lancer20060506
File length: 75m 58.0s
Elapsed time: 1m 32.4s
Rate: 49.3090
Average bitrate: 192.8 kb/s

oggenc283_sse3_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 00.8s
Rate: 37.7426
Average bitrate: 192.8 kb/s

oggenc283_sse2_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.9s
Rate: 37.3896
Average bitrate: 192.8 kb/s

oggenc283_sse_lancer20060506
File length: 75m 58.0s
Elapsed time: 2m 01.4s
Rate: 37.5532
Average bitrate: 192.8 kb/s
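The Rate figure in these reports is the encoding speed as a multiple of real time, that is, file length divided by elapsed time. A minimal Python sketch, assuming durations are printed in the "75m 58.0s" form used above, recomputes the rates and the gain from multithreading. The helper names (parse_duration, rate, mt_gain) are made up for illustration, and because the elapsed times are rounded to a tenth of a second the recomputed rates only approximately match the reported ones.

```python
import re


def parse_duration(text):
    """Convert a string such as '75m 58.0s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s+(\d+(?:\.\d+)?)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


FILE_LENGTH = parse_duration("75m 58.0s")

# Elapsed times copied from the post above.
ELAPSED = {
    "sse3mt": "1m 28.0s",
    "sse2mt": "1m 32.4s",
    "sse3": "2m 00.8s",
    "sse2": "2m 01.9s",
    "sse": "2m 01.4s",
}


def rate(build):
    """Encoding speed as a multiple of real time."""
    return FILE_LENGTH / parse_duration(ELAPSED[build])


def mt_gain(single, multi):
    """How many times faster the multithreaded build finished."""
    return parse_duration(ELAPSED[single]) / parse_duration(ELAPSED[multi])


if __name__ == "__main__":
    for build in ELAPSED:
        print(f"{build:7s} {rate(build):6.2f}x realtime")
    print(f"SSE3 MT vs SSE3: {mt_gain('sse3', 'sse3mt'):.2f}x")
    print(f"SSE2 MT vs SSE2: {mt_gain('sse2', 'sse2mt'):.2f}x")
```

On these numbers the MT builds come out roughly 1.3x to 1.4x faster than their single-threaded counterparts on the dual-core X2, while SSE, SSE2 and SSE3 are within about one percent of each other.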
pepoluan
May 8 2006, 05:23
Uhhh... so which one is which?
I have an AthlonXP 2400+ (IIRC Barton core). Which one should I get?
Sorry my mind is a bit swimming at the moment...
ilikedirtthe2nd
May 8 2006, 06:06
QUOTE(pepoluan @ May 8 2006, 11:23)

Uhhh... so which one is which?
I have an AthlonXP 2400+ (IIRC Barton core). Which one should I get?
Sorry my mind is a bit swimming at the moment...

SSE Version
Lancer's DLL has crashed my Winamp+Oddcast3 since the 200603010 build.
It is still not fixed.
foxyshadis
May 8 2006, 18:41
Hmm, I guess this is still based off the old code and not Aoyumi's recent tunings? Multithreading is so cool, the problem is that it takes longer to read off the hard drive (or decode a flac) than to encode now. XD
jetpower
May 8 2006, 19:05
From the site:
QUOTE
Based on aotuv-b4.51_20051117
4.51beta is the latest version from Aoyumi.
http://www.geocities.jp/aoyoume/aotuv/index.html
No need to worry
Lancer 20060512 (only MT) Released!
Home page
Cartman_Sr
May 13 2006, 12:02
Hey thanks for that link. I tried going there last night but couldn't figure out a single thing

I'm just getting into ogg vorbis now...
But I just tried those (oggenc2) and I got a Fatal error: This program is not designed to run on this machine. I have a P4, 2.1 (Dell), with Windows XP sp2. Are these new builds meant for AMD processors only?

edit: Oh wait a minute, I was trying the SSE3 version, give me a few minutes to try the SSE2 version. BTW, what is SSE2/3 anyway? I think I'm in way over my head here.
edit 2: The SSE2 one does work. Ok, I'm a hoser
pepoluan
May 13 2006, 12:49
QUOTE(Cartman_Sr @ May 14 2006, 01:02)

Hey thanks for that link. I tried going there last night but couldn't figure out a single thing

I'm just getting into ogg vorbis now...
But I just tried those (oggenc2) and I got a Fatal error: This program is not designed to run on this machine. I have a P4, 2.1 (Dell), with Windows XP sp2. Are these new builds meant for AMD processors only?

edit: Oh wait a minute, I was trying the SSE3 version, give me a few minutes to try the SSE2 version. BTW, what is SSE2/3 anyway? I think I'm in way over my head here.
edit 2: The SSE2 one does work. Ok, I'm a hoser


Hey, I got confused too for a moment (see my post up there). I should've checked Wikipedia first... it lists processors with SSE, SSE2, and SSE3.
What are these SSE-thingies? In a nutshell, they are special instructions that let CPUs perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.
Soooo... I put a (very very very) simplified guide on which version of Lancer you should use in the Lancer page of HA Wiki.
Cartman_Sr
May 13 2006, 14:45
Hey thanks for adding that wiki page, makes way more sense now! But does the type of processor you have (32 bit vs. 64 bit) factor into it in any way? I know I have a 32 bit processor (does that sound right?), and a single core computer. The SSE2MT version does work on my computer.
pepoluan
May 15 2006, 11:47
QUOTE(Cartman_Sr @ May 14 2006, 03:45)

Hey thanks for adding that wiki page, makes way more sense now! But does the type of processor you have (32 bit vs. 64 bit) factor into it in any way? I know I have a 32 bit processor (does that sound right?), and a single core computer. The SSE2MT version does work on my computer.
Um, the bit-ness of your processor is not strictly related to SSEx. For instance, compare Intel: the P4 is 32-bit, yet it supports SSE2 instructions. AMD did not get the opportunity to embed SSE2 instructions into their 32-bit line, and opted to add SSE2 to their 64-bit line.
So whether your processor supports a certain version of SSEx depends more on its release date than its bit-ness.
Edit: Updated the wiki page above slightly to explain the (theoretical) benefit of MT versions.
HotshotGG
May 15 2006, 15:45
QUOTE
What are these SSE-thingies? In a nutshell, they are special instructions that let CPUs perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.
Yeah, that really needs to be clarified for a lot of folks. I gathered some information about them and rewrote that section in the wiki. It does help in the long run though.
pepoluan
May 16 2006, 00:58
QUOTE(HotshotGG @ May 16 2006, 04:45)

QUOTE
What are these SSE-thingies? In a nutshell, they are special instructions that let CPUs perform exotic calculations faster. SSE2 adds some instructions to SSE. SSE3 adds more instructions to SSE2. Of course there is CPU architecture evolution too, but let's KISS.
Yeah, that really needs to be clarified for a lot of folks. I gathered some information about them and rewrote that section in the wiki. It does help in the long run though.
The most sure-fire way to know which SSEx version your processor supports is to download all 5 Lancer OggEnc2 encoders and run them one by one. If your processor does not support the SSEx, OggEnc2 will exit gracefully, informing you so.
I've added this to the Lancer wiki page. Hope it helps.
Edit: stupid typo. Note to self: don't type something long while holding a lighted cigarette.
Mr_Rabid_Teddybear
May 18 2006, 06:56
There are also Windows programs that easily tell you the processor instructions for your system:
wcpuid and cpu-z.
Do people know of programs for Linux and Mac with a similar function?
QUOTE(Mr_Rabid_Teddybear @ May 18 2006, 13:56)

There are also Windows programs that easily tell you the processor instructions for your system:
wcpuid and cpu-z.
Do people know of programs for Linux and Mac with a similar function?
cat /proc/cpuinfo will work on Linux at least.. and maybe on Mac too since the newer OSs are Unix based I think..
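On Linux the check can be scripted. The sketch below assumes an x86 machine whose /proc/cpuinfo contains a "flags" line; note that the kernel lists SSE3 under the name "pni". The returned labels simply mirror the build variants discussed in this thread, and cpu_flags and pick_build are made-up helpers, not part of Lancer. macOS has no /proc/cpuinfo, so this approach does not carry over to the Mac.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_build(flags):
    """Return the highest SSE level the flags allow, or None."""
    if "pni" in flags:  # Linux reports SSE3 as 'pni'
        return "SSE3"
    if "sse2" in flags:
        return "SSE2"
    if "sse" in flags:
        return "SSE"
    return None


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(pick_build(cpu_flags(handle.read())))
```

Whether an MT build is worth using depends on the number of cores rather than on these flags, so that choice is left out of the sketch.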
pepoluan
May 18 2006, 11:46
Info about wcpuid and cpu-z is now part of the Lancer page. I'm not into Linux or any Unix, so please complement the info there if need be. Thanx.
I have an Athlon 64 x2 3800.
This new sse3 version is the cat's meow for fully utilizing both cores.
On comparison, I did find a strange phenomenon though.
Encode a whole album with 10 songs and the total time for the sse3mt version
took 10 seconds longer than if I boot two 2006/03/31 versions and oggdrop
5 songs in each simultaneously. Can anyone else reproduce this?
Is the threading overhead higher in the sse3mt version?
Not that I'm complaining mind you. The convenience factor is great with the sse3mt version!
Excellent work. The encoding speed is typically over 50x regardless!
My idea of a "killer app" here!
minor edit for grammar.
QUOTE(tgb @ May 25 2006, 06:16)

I have an Athlon 64 x2 3800.
This new sse3 version is the cat's meow for fully utilizing both cores.
On comparison, I did find a strange phenomenon though.
Encode a whole album with 10 songs and the total time for the sse3mt version
took 10 seconds longer than if I boot two 2006/03/31 versions and oggdrop
5 songs in each simultaneously. Can anyone else reproduce this?
Is the threading overhead higher in the sse3mt version?
Not that I'm complaining mind you. The convenience factor is great with the sse3mt version!
Excellent work. The encoding speed is typically over 50x regardless!
My idea of a "killer app" here!
minor edit for grammar.
It's normal that using a multi-threaded app over two cores won't give you a 100% boost over using one core.. It's usually something like 70% faster. In fact, a mere 10 seconds difference for a whole album is very good!
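One common way to reason about this is Amdahl's law, which the post does not name: if a fraction p of the work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p / n). The small sketch below (function names are illustrative) works out which parallel fraction the "70% faster on two cores" rule of thumb would correspond to. It is a back-of-the-envelope model, not a measurement of Lancer.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Best-case speedup when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_fraction_for(speedup, cores):
    """Invert Amdahl's law: which fraction p yields this speedup?"""
    return (1.0 - 1.0 / speedup) * cores / (cores - 1)


if __name__ == "__main__":
    p = parallel_fraction_for(1.7, 2)
    print(f"parallel fraction for 1.7x on 2 cores: {p:.3f}")
    print(f"same fraction on 4 cores: {amdahl_speedup(p, 4):.2f}x")
```

Under this model a 1.7x gain on two cores means a little over 80% of the work runs in parallel, and the remaining serial part caps what extra cores can add.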
Of course double speed is seldom possible using multithreading since many problems can't be parallelised (or whatever it's called) completely. But in the case of encoding a batch of files with OggdropXpd, wouldn't it make more sense then to run one normal encoding thread per core, because that's 100% parallelised? Of course, when only one file is encoded, the multithreaded version is (if above post is correct) only slightly slower, but I think if it gives a speed advantage (which is what Lancer builds are all about I believe) one could implement this parallel encoding into the frontend.
Hope I'm making sense, I need some sleep...
MedO
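The batch idea above, one ordinary single-threaded encode per core, can be sketched with a worker pool. This only illustrates the scheduling and is not how OggdropXPd works: encode_batch is a hypothetical helper, and the command line in encode_with_oggenc (an executable called oggenc2 taking -q and a WAV path) is an assumption that should be adapted to the encoder actually installed.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def encode_with_oggenc(wav_path, encoder="oggenc2", quality="5"):
    """Run one single-threaded encode; the command line is an assumption."""
    subprocess.run([encoder, "-q", quality, wav_path], check=True)
    return wav_path


def encode_batch(wav_paths, encode_one=encode_with_oggenc, workers=None):
    """Encode files concurrently, one job per core by default."""
    workers = workers or os.cpu_count() or 1
    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_one, wav_paths))
```

With ten songs and two cores this keeps both cores busy with independent encodes, which is the fully parallel case described above; a multithreaded encoder remains the better fit when there is only a single file to process.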
PatchWorKs
May 29 2006, 09:04
Here we go:
Lancer 20060529 Release
Changelog (by babelfish):
- Fixed a problem in the decoding section.
Is there a chance that a version in the near future will work correctly when compiled with GCC (preferably the 4.x branches)?
The latest version that works correctly when compiled with gcc 3.3.6 is 20051121 (tested on an Athlon XP 2200+ with SSE-only support in Ubuntu Linux). All versions after this one give a different bitrate in the generated .ogg (compared to the 'standard', aoTuV b4.51).
Ogg Vorbis is the standard lossy codec in the Linux world, and an SSE(2,3)-optimized version for Linux would be good support for the community.
If the bitrate difference is only slight, this is probably normal. IIRC, even the P3-Optimised version of the original AoTuV-encoder gives slightly different results than the generic build.
The difference is significant, about 2-3 kbps for -q 3 (on my test.wav it gives 112.3 kbps instead of 115.3 kbps in aoTuV 4.51).
I've tried compiling aoTuV with gcc 3.3/3.4/4.0, with and without compiler SSE optimization (-march=athlon-xp -mfpmath=sse), and the bitrate differed only in the hundredths. So this is definitely a bug.
pepoluan
May 30 2006, 07:02
Remember that standard aoTuV uses the FPU and Lancer uses SSE. They have different bit-length to represent real nums, and may thus cause different compression.
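The effect of working precision is easy to demonstrate in isolation. The sketch below is not Lancer or aoTuV code. It accumulates the same value once with every intermediate result rounded to a 32-bit float (the width of the packed single-precision values SSE works on) and once in Python's 64-bit doubles (standing in here for the wider intermediates an x87 FPU can keep), and shows that the totals differ. Whether rounding differences of this kind explain a bitrate gap of 3 kbps is a separate question.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value, count, single_precision):
    """Add value to a running total count times."""
    if single_precision:
        value = to_float32(value)
    total = 0.0
    for _ in range(count):
        total += value
        if single_precision:
            # Emulate a 32-bit register by rounding after every add.
            total = to_float32(total)
    return total


if __name__ == "__main__":
    narrow = accumulate(0.1, 1000, single_precision=True)
    wide = accumulate(0.1, 1000, single_precision=False)
    print(narrow, wide, narrow - wide)
```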
If in doubt, ABX.
The precompiled oggenc2/oggDropXPd builds for Windows give the same bitrate as unoptimized aoTuV, but they were compiled with MSVC or similar, not GCC. I think GCC is simply untested with the new versions of Lancer.
haregoo
Jun 15 2006, 09:47
Lancer 20060616 released.
Edit: Fixed bug in decoding(SSE2)
Translated by google:
2006/06/16 Lancer 20060616
Replaced the CPU detection processing of the DLL file with one for AMD CPUs
Optimized vorbis_oggpack_look with inline assembler
Added SSE3 optimization to _mm_add_horz*
Added an SSE-optimized oggdec
Fixed a problem in the SSE2 optimization of ov_read_float2pcm
SSE optimization of the decoding section of oggdropXPd
Optimization profile for multithread operation for single thread and joint ownership conversion
sony666
Jul 14 2006, 04:20
Didn't use Vorbis for some time but now I needed an encoder for some previews and tried the Lancer (2006 06 16th) one.
The speed is just sick, thanks to all involved in that

Works great on my Athlon XP, normal SSE version.
pepoluan
Jul 14 2006, 12:14
LOL yeah, I still get the warm-fuzzy-feeling every time I encode using Lancer
New release out today.
CODE
Changes:
* replace inline assembly with intrinsics as much as possible
* abolish original memory transfer code in block.c
* bitreverse use looking up table
* fix speed regression of vorbis_book_decodevv_add introduced in lancer
20060529
* remove optimization prevention code in vorbis_book_decodevv_add
* pre-calculate tables for triggers in mdct
* simplifying a code in which high frequency removed by mdct_backward
* add decode only funcs: mdct_butterflies_backward,
dct_butterfly_first_backward
* improve SSE optimization: bark_noise_hybridmp
* add SSE optimization: render_line, vorbis_noise_normalize,
_vp_noise_normalize
* add SSE3 optimization: mdct_bitreverse
* add pre-calculation code: seed_loop, max_seeds
* optimize: seed_chase
* add SORT16 to psy.c
* auto loop unrolling: SORT8, SORT32 in psy.c
* use lddqu in non SSE environment for unaligned memory load
* improve loop condition code in inline assembly code
* add t option for oggdec benchmarks (without outputting file)
(courtesy of pub at cyanet.jp)
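As an illustration of the "bitreverse use looking up table" item, here is the general technique in a few lines of Python. This is not Lancer's code (the changelog does not show it, and the real thing would be C with SSE); the table size and the names BYTE_REVERSE and reverse_bits32 are assumptions chosen for the example. The idea is to reverse each byte once up front, then reverse a 32-bit word with four table lookups instead of a 32-step loop.

```python
def build_byte_reverse_table():
    """Precompute the bit-reversed value of every possible byte."""
    table = []
    for value in range(256):
        reversed_value = 0
        for bit in range(8):
            if value & (1 << bit):
                reversed_value |= 1 << (7 - bit)
        table.append(reversed_value)
    return table


BYTE_REVERSE = build_byte_reverse_table()


def reverse_bits32(x):
    """Reverse the bit order of a 32-bit word using the byte table."""
    return (
        BYTE_REVERSE[x & 0xFF] << 24
        | BYTE_REVERSE[(x >> 8) & 0xFF] << 16
        | BYTE_REVERSE[(x >> 16) & 0xFF] << 8
        | BYTE_REVERSE[(x >> 24) & 0xFF]
    )
```

The same trade of a little memory for repeated computation is what the "pre-calculate" items in the list describe.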
Good to see the asm being replaced with intrinsics.
HotshotGG
Jul 22 2006, 12:57
QUOTE
* add SSE optimization: render_line, vorbis_noise_normalize,
_vp_noise_normalize
SSE optimizations to the noise normalization code ey? that's interesting. Must be very fast
PatchWorKs
Jul 24 2006, 14:44
Hope this guy will work on theora or dirac in the future!
I also hope to see the SSE/SSE2/SSE3 builds merged together, auto-selecting the optimizations on the fly (like FLAC...)
haregoo
Aug 1 2006, 15:09
Lancer 20060722 is temporarily unavailable due to a memory issue (unconfirmed).
rudefyet
Aug 2 2006, 01:53
What kind of memory issue? I've been using 20060722 with no problems.
Josef K.
Aug 2 2006, 04:59
QUOTE(rudefyet @ Aug 2 2006, 09:53)

What kind of memory issue? I've been using 20060722 with no problems.
It's impossible to download from the page.
If someone could post Lancer 20060722 release (at least "oggenc2.83"), that would be great. Or just a link for dl, of course.
haregoo
Aug 2 2006, 05:09
Lancer 20060802 released.
This is a bug-fix release.
20060722 had a memory leak according to the author.
And the crash issue of Lancer DLL is going to be fixed.
Experimental Lancer 20060806 is up.
Altavista says:
"Being heap memory access error occurs with vorbis_oggpack_write it abolishes, the optimization module in oggpack_write movement
oggpack_look SSE optimization of optimization
_ve_amp cash control processing of modification
accumulate_fit being imperfect with correction
MDCT-RELATED cash control rearranging unnecessary zero data exception processing the SSE optimization description section of deletion _encodepart with correction inspect_error"
Err.. right.
skelly831
Aug 5 2006, 12:09
Wow! Optimization of optimization, that's gotta be fast!
QUOTE(skelly831 @ Aug 5 2006, 20:09)

Wow! Optimization of optimization, that's gotta be fast!
None of the many "recent" optimizations provided a major speedup for me. Maybe it's 20x encoding with an old lancer and 22x with the sse2-optimized version. I have a Celeron M 1400Mhz, so the MT speedups don't help here. Still, the speed is great. What are your experiences/speeds/setups? Just curious...
Mo0zOoH
Aug 5 2006, 14:31
QUOTE(MedO @ Aug 5 2006, 22:27)

What are your experiences/speeds/setups? Just curious...
I'm experiencing a speedup of about 1.7x with Lancer 20060806 compared to the latest OggEnc from rarewares (average speed is something around 29x vs 17x, depending significantly on the sound material).
EDIT: My system is WinXP SP2 on an Athlon 64 3400+ (Venice).
PrakashP
Aug 5 2006, 16:05
If anybody has a Core 2 Duo, I would be interested in how fast the SSE2 version is, as this new CPU doesn't break up SSE2 instructions into 2 parts and thus can compute them directly.
Athlon XP 3200+ (2.2GHz), SSE:
Lancer 20060616 - 37.85x
Lancer 20060805 - 38.56x
So it's about 1.9% faster, nice but hardly significant.
I also recall that a previous version of lancer was about .5x faster than 06/16. Like linux kernels, not every next version is faster on every system.

The SSE3 multithreading version should fly on Core2. I wouldn't be surprised if it encodes over 100x.
QUOTE(MedO @ Aug 5 2006, 19:27)

What are your experiences/speeds/setups? Just curious...
Machine: AMD Athlon64 X2 3800+ (2GHz) @ 2.4GHz
Track: Machinae Supremacy - Elite.wav (4m 24.0s)
Options: -q 5
OggEnc (vorbis-tools Rev.10381):
13.9284x (19s)
OggEnc v2.83 (Lancer [20060805](SSE3MT) based on aoTuV b4b):
57.605572x (4.594s)
It seems that there was something wrong with the latest Lancer version for Vorbis (Lancer 20060802, based on aotuv-b4.51_20051117), because all files were taken down from the download server. See his page.