Ogg Vorbis optimized for speed, ca. 1.5x faster than the original 1.1 version
Post #1 by nyaochi, Nov 4 2004, 20:11
Group: Members | Posts: 169 | Joined: 30-September 01 | From: Tokyo, Japan | Member No.: 99
Some Japanese developers are working on speeding up libvorbis with SSE. Blacksword (or 637) has launched an Ogg Vorbis acceleration project (in Japanese only) and releases an oggenc binary and a libvorbis patch based on libvorbis 1.1. The optimization includes SSE implementations of the FFT, MDCT, windowing, channel coupling, sorting, the psychoacoustic model, floor/residue encoding, and so on. On my computer (Pentium IV 2.4 GHz), an ICL 8.1-compiled oggenc binary of the optimized version (Archer Beta03) encodes at 23.4x, while the one without the optimization (ICL 8.1-compiled, but without the SSE patches) runs at 15.5x. This optimization therefore achieves a speed gain of ca. 1.5x (23.4 / 15.5 ≈ 1.51).
Unlike GoGo-no-coder, this is not a fork: he releases a patch against the libvorbis source code that does not change any algorithms or data structures at all. This makes it easy to keep up with the latest official libvorbis, but it limits the room for optimization to some degree; in fact, the author says in readme.txt that there is little room left for further optimization. So I think it's time for a quality evaluation, even though the optimization is still in development. Several bugs were found and fixed during the last week, and bitrates are now quite similar to those of the reference encoder at all quality settings. If you find any bugs or quality regressions compared with the official 1.1 release, please let us know.

Contributors:
- Blacksword (or 637)'s SSE optimization (Japanese only): a number of functions in libvorbis are vectorized to take advantage of the SSE instruction set, and OptSort and wuvorbis are included as well. For the complete list of optimized functions, see readme.txt (in Japanese, but you can easily find the list) included with the binary.
- Manuke's OptSort: an optimization of qsort, which consumes 20% of the encoding time, based on the assumption that the _vp_quantize_couple_sort and _vp_noise_normalize_sort functions in psy.c call qsort with 8 or 32 elements. This speeds up the whole encoding process by 10%.
- W.Dee's wuvorbisfile (Japanese only?): wuvorbis.dll is a fast Ogg Vorbis decoder with SSE and 3DNow! support and is part of the KiriKiri software (useful for developing multimedia content or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) and 1.5x-1.9x faster (3DNow!) than the official libvorbis.

Happy encoding!
Post #2 by eloj, Mar 18 2005, 14:41
Group: Members | Posts: 71 | Joined: 24-March 02 | Member No.: 1614
Archer RC2 is out.
Post #3 by Josef K., Mar 18 2005, 15:32
Group: Members | Posts: 111 | Joined: 25-November 04 | From: village | Member No.: 18344
QUOTE (eloj @ Mar 18 2005, 03:41 PM)
Archer RC2 is out.

Regrettably, I detected exactly the same problem here (with the same sample) as with RC1 (the RC1 bug report can be found here).
--------------------
Is there a difference between yes and no?
Post #4 by rjamorim, Mar 18 2005, 16:36
Rarewares admin | Group: Members | Posts: 7515 | Joined: 30-September 01 | From: Brazil | Member No.: 81
QUOTE (Josef K. @ Mar 18 2005, 11:32 AM)
Regrettably, I detected exactly the same problem here (with the same sample) as with RC1 (the RC1 bug report can be found here).

What's the point of posting the report on a forum the developer probably doesn't read? If I were you, I would send him an e-mail and hope that he speaks at least some English.
--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
Post #5 by eloj, Mar 18 2005, 16:42
Group: Members | Posts: 71 | Joined: 24-March 02 | Member No.: 1614
QUOTE (rjamorim @ Mar 18 2005, 04:36 PM)
If I were you, I would send him an e-mail and hope that he speaks at least some English.

I just sent him an e-mail referencing this thread and that specific post. It would probably help if someone could supply a test file.

Edit: I finally found a file of mine that crashes the encoder: track 14 of Pain of Salvation's album Be. Let's see... it crashes at around 65.3% complete.

Edit 2: Very tricky to pin down. Of the quality settings I tried, this track only crashes at -q 3.

Edit 3: Okay, running it under a debugger:
"oggenc_archer.exe The instruction at 0x0042D568 referenced memory at 0xBF4EB730. The memory could not be read."

CODE
.text:0042D512 cvtss2si ecx, [eax+edi*4+0Ch]
.text:0042D518 cvtss2si ebx, [eax+edi*4+8]
.text:0042D51E cvtss2si esi, [eax+edi*4+4]
.text:0042D524 cvtss2si eax, [eax+edi*4]
.text:0042D529 mov edi, [esp+50h+var_20]
.text:0042D52D add ecx, edi
.text:0042D52F add ebx, edi
.text:0042D531 add esi, edi
.text:0042D533 add edi, eax
.text:0042D535 mov eax, [esp+50h+var_18]
.text:0042D539 imul eax, [edx+8]
.text:0042D53D mov [esp+50h+var_20], edi
.text:0042D541 mov edi, [esp+50h+var_14]
.text:0042D545 add eax, [edi+ecx*4]
.text:0042D548 imul eax, [edx+8]
.text:0042D54C add eax, [edi+ebx*4]
.text:0042D54F imul eax, [edx+8]
.text:0042D553 add eax, [edi+esi*4]
.text:0042D556 imul eax, [edx+8]
.text:0042D55A mov edx, [esp+50h+var_20]
.text:0042D55E add eax, [edi+edx*4]
.text:0042D561 mov edx, [esp+50h+var_10]
.text:0042D565 mov edx, [edx+8]
.text:0042D568 cmp dword ptr [edx+eax*4], 0 <--------------
.text:0042D56C jle loc_42D6FE
.text:0042D572 mov edx, [ebp+arg_C]
.text:0042D575 mov ecx, [edx+10h]

It's a function that starts at 0x42D2FC and takes four parameters. It seems to be called directly from only one place, but its address is taken twice, so it could be called through a function pointer too? I'm not familiar enough with the code to identify it any further, and I don't think I even have the tools to build the source.

This post has been edited by eloj: Mar 18 2005, 17:32
nyaochi Ogg Vorbis optimized for speed Nov 4 2004, 20:11
dev0 fefe was working on a (apparently buggy) SSE optim... Nov 4 2004, 20:37
ilikedirtthe2nd I archived almost 100% (rather 85%, actually ) sp... Nov 4 2004, 23:04
TedFromAccounting Wow Now that is FAST. My results were similar to... Nov 5 2004, 01:22
nyaochi QUOTE (dev0 @ Nov 5 2004, 04:37 AM)fefe was w... Nov 5 2004, 03:14
Josef K. QUOTE (JensRex @ Nov 8 2004, 03:20 PM)I'd... Feb 23 2005, 20:29
QuantumKnot Whoa, it's really fast. On my P4 2.4 GHz: I... Nov 6 2004, 02:05
Bonzi Pretty nice speedup here too: oggenc from rareware... Nov 6 2004, 08:02
Music Mixer Hello! Well, I have got an older machine (p3 ... Nov 6 2004, 08:10
Sebastian Mares According to my tests... ICL 8.1 Standard: CODE F... Nov 6 2004, 10:18
esa372 I got a good increase, too... SSE2 CODE Fi... Nov 6 2004, 16:10
ilikedirtthe2nd QUOTE (esa372 @ Nov 6 2004, 03:10 PM)But I ca... Nov 6 2004, 16:24
dev0 QUOTE (ilikedirtthe2nd @ Nov 6 2004, 04:24 PM... Nov 6 2004, 16:46
john33 QUOTE (dev0 @ Nov 6 2004, 03:46 PM)The standa... Nov 6 2004, 16:52
esa372 QUOTE (ilikedirtthe2nd @ Nov 6 2004, 08:24 AM... Nov 6 2004, 17:01
ilikedirtthe2nd QUOTE It's a compile-time option AFAIK. That ... Nov 6 2004, 17:19
esa372 QUOTE (ilikedirtthe2nd @ Nov 6 2004, 09:19 AM... Nov 6 2004, 17:25
nyaochi QUOTE (Music Mixer @ Nov 6 2004, 04:10 PM)Hav... Nov 6 2004, 18:20
Sebastian Mares QUOTE (esa372 @ Nov 6 2004, 04:10 PM)I got a ... Nov 6 2004, 19:05
esa372 QUOTE (Sebastian Mares @ Nov 6 2004, 11:05 AM... Nov 6 2004, 19:13
kjoonlee OK, here are some partial translations: OggEnc_SS... Nov 6 2004, 19:17
nyaochi QUOTE (kjoonlee @ Nov 7 2004, 03:17 AM)OK, he... Nov 6 2004, 20:31
QuantumKnot IIRC, SSE2 is optimised for double point precision... Nov 7 2004, 02:55
Benjamin Lebsanft Tested on my AMD64 3400+, 1GB RAM. ICL 8.1: File ... Nov 7 2004, 09:04
john33 As QK says, there's very little use of double ... Nov 7 2004, 10:41
Sebastian Mares QUOTE (john33 @ Nov 7 2004, 10:41 AM)As QK sa... Nov 7 2004, 11:51
nyaochi QUOTE (QuantumKnot @ Nov 7 2004, 10:55 AM)IIR... Nov 7 2004, 10:54
Poromenos OK, for the newb with no ability for critical thin... Nov 8 2004, 11:31
QuantumKnot QUOTE (Poromenos @ Nov 8 2004, 08:31 PM)OK, f... Nov 8 2004, 11:56
Sebastian Mares I see no speed gain when compared to the Pentium 4... Nov 8 2004, 13:26
JensRex I'd be more interested in decoder speedups - e... Nov 8 2004, 14:20
Gecko Here's a late reply. I tested on two titles an... Nov 11 2004, 22:49
[solid] how should i apply the patch? i get all hunks fail... Nov 12 2004, 01:05
ak I remeber trying to apply it, there were bunch of ... Nov 12 2004, 10:36
[solid] QUOTE (ak @ Nov 12 2004, 10:36 AM)For 1.1.0 r... Nov 12 2004, 10:58
nyaochi QUOTE (Sebastian Mares @ Nov 8 2004, 09:26 PM... Nov 12 2004, 21:44
Sebastian Mares QUOTE (nyaochi @ Nov 12 2004, 09:44 PM)QUOTE ... Nov 17 2004, 21:50
Benjamin Lebsanft on the first run i got 38.2381x, on the second run... Nov 12 2004, 22:15
jg123 It looks like the resample option is broken? I get... Nov 15 2004, 17:53
kuniklo Does anyone have the sse optimizations in the form... Nov 15 2004, 18:15
Bogalvator The patch is the first file on the project web pag... Nov 15 2004, 19:02
maacruz QUOTE (Bogalvator @ Nov 15 2004, 08:02 PM)The... Nov 16 2004, 18:21
nyaochi QUOTE (maacruz @ Nov 17 2004, 02:21 AM)It doe... Nov 17 2004, 09:18
nyaochi QUOTE (jg123 @ Nov 16 2004, 01:53 AM)It looks... Nov 17 2004, 16:30
maacruz QUOTE (nyaochi @ Nov 17 2004, 05:30 PM)QUOTE ... Nov 17 2004, 18:59
Benjamin Lebsanft Could anybody please provide a linux binary. As my... Nov 17 2004, 18:05
nyaochi QUOTE (maacruz @ Nov 18 2004, 02:59 AM)Hi nya... Nov 17 2004, 20:53
vearutop does anyone have binary aotuvb3 oggenc w/ sse patc... Dec 9 2004, 05:27
skamp QUOTE (vearutop @ Dec 9 2004, 05:27 AM)does a... Dec 11 2004, 06:48
vearutop thank you. Do you have one for Windows? Dec 15 2004, 05:15
QuantumKnot QUOTE (vearutop @ Dec 15 2004, 02:15 PM)thank... Dec 15 2004, 05:22
vearutop thanks. Something was wrong with my eyes... I visite... Dec 15 2004, 05:32
vearutop strange thing... I compressed a track via standard a... Dec 15 2004, 05:45
rjamorim QUOTE (vearutop @ Dec 15 2004, 01:45 AM)i com... Dec 15 2004, 13:51
bluesky I used the build from this url. Here's my res... Feb 6 2005, 03:05
nyaochi QUOTE (bluesky @ Feb 6 2005, 11:05 AM)Ideas? ... Feb 6 2005, 11:02
QuantumKnot Seems like a significant difference. Which specif... Feb 6 2005, 03:11
bluesky My mistake... correct data: CODE Done encoding fil... Feb 6 2005, 19:41
Toe Has any testing been done on these builds with reg... Feb 6 2005, 22:00
DarkAvenger BTW, GCC 4.0 alpha snapshot from yesterday compile... Feb 21 2005, 12:46
Emanuel Do I dare asking John33 for an english OggdropXPd ... Feb 21 2005, 14:01
rjamorim QUOTE (Emanuel @ Feb 21 2005, 11:01 AM)Do I d... Feb 21 2005, 14:25
john33 QUOTE (Emanuel @ Feb 21 2005, 01:01 PM)Do I d... Feb 21 2005, 14:35
Emanuel QUOTE (rjamorim @ Feb 21 2005, 02:25 PM)I won... Feb 21 2005, 15:02
miscellanea QUOTE (Josef K. @ Feb 24 2005, 04:29 AM)OK, m... Mar 12 2005, 11:41
eloj Archer Release-Candidate 1 is out. Mar 12 2005, 13:13
Josef K. QUOTE (eloj @ Mar 12 2005, 02:13 PM)Archer Re... Mar 12 2005, 20:29
miscellanea Thanks. Now is the time to test again. Mar 12 2005, 13:17
rutra80 I also have a WAV which fails to encode with RC1 (... Mar 12 2005, 22:31
Zoom I can confirm the bug here too:
CODEOpening with ... Mar 12 2005, 23:22
Josef K. QUOTE (Zoom @ Mar 13 2005, 12:22 AM)20 second... Mar 12 2005, 23:32
Josef K. QUOTE (rjamorim @ Mar 18 2005, 05:36 PM)What... Mar 19 2005, 01:01
eloj Alright, the author got back to me. I'm going ... Mar 18 2005, 20:29
Josef K. QUOTE (eloj @ Mar 18 2005, 09:29 PM)Edit 2: G... Mar 19 2005, 01:26
DreamTactix291 Archer RC3 is out. Mar 19 2005, 06:53
eloj F:\wav\archer>oggenc_archer -v OggEnc... Mar 19 2005, 11:30
rutra80 Well, bad news I think - my WAV still doesn't ... Mar 19 2005, 12:51
eloj I can confirm that 32KHz files don't work at a... Mar 19 2005, 13:52
eloj ... and RC4 is out. Mar 19 2005, 16:14
rutra80 Seems to work fine now Mar 19 2005, 20:17
rt87 Bump for new version of Lancer 2005028 Release (Ba... May 28 2005, 07:45
rudefyet oh great... you made me wet my pants again. EDIT: ... May 28 2005, 08:00
ilikedirtthe2nd Speed increased slightly on my system (AMD XP 1800... May 28 2005, 13:25
de Mon QUOTE (ilikedirtthe2nd @ May 28 2005, 04:25 A... May 28 2005, 21:31
Josef K. QUOTE (de Mon @ May 28 2005, 10:31 PM)QUOTE (... May 28 2005, 23:10
rutra80 QUOTE (de Mon @ May 28 2005, 10:31 PM)are the... May 29 2005, 02:48
Bonzi QUOTE (rutra80 @ May 28 2005, 05:48 PM)QUOTE ... May 29 2005, 03:10
eloj Run with the input file disk-cache hot. Archer -q... May 28 2005, 14:20
Latexxx The next generation consoles won't be pushing ... May 28 2005, 14:37
rutra80 QUOTE (rudefyet @ May 28 2005, 09:00 AM)EDIT:... May 29 2005, 05:02
rudefyet the bitrates are identical
but the resulting file... May 29 2005, 05:04
sh1leshk4 Is different vendor strings may be the cause of it... May 29 2005, 07:42
rutra80 QUOTE (sh1leshk4 @ May 29 2005, 08:42 AM)Is d... May 29 2005, 09:18
Gecko But the differences are only sporadic. If you do a... May 29 2005, 10:18
Vax the size of aoTuV pre-beta4 [20050412] is 1.36 MB ... Jun 7 2005, 21:14
rutra80 Lancer is probably packed with UPX or something an... Jun 7 2005, 23:27