Ogg Vorbis optimized for speed, ca. 1.5x faster than the original 1.1 version
Nov 4 2004, 20:11
#1
Group: Members; Posts: 169; Joined: 30-September 01; From: Tokyo, Japan; Member No.: 99
Some Japanese developers are working on speeding up libvorbis with SSE. Blacksword (or 637) launched an Ogg Vorbis acceleration project (in Japanese only) and releases an oggenc binary and a libvorbis patch based on libvorbis 1.1. The optimization includes SSE implementations of the FFT, MDCT, windowing, channel coupling, sorting, the psychoacoustic model, floor/residue encoding, and so on. On my computer (Pentium IV, 2.4 GHz), an ICL 8.1-compiled oggenc binary of the optimized version (Archer Beta03) encodes at 23.4x, while one without the optimization (ICL 8.1-compiled, but without the SSE patches) encodes at 15.5x. Hence, this optimization achieves a speed gain of about 1.5x.
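The speed figures in this thread are realtime factors: seconds of audio encoded per second of wall-clock time (oggenc prints this as "Rate"). The "ca. 1.5x" gain is just the ratio of two such factors. A minimal sketch of that arithmetic follows. The function names are illustrative and do not come from oggenc.

```c
#include <assert.h>

/* Realtime factor: seconds of audio processed per second of wall-clock time
   (what oggenc reports as "Rate" at the end of an encode). */
static double realtime_factor(double audio_seconds, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return audio_seconds / elapsed_seconds;
}

/* Speedup of one build over another, both measured as realtime factors
   on the same input file and the same machine. */
static double speedup(double optimized_rate, double baseline_rate)
{
    assert(baseline_rate > 0.0);
    return optimized_rate / baseline_rate;
}
```

For example, speedup(23.4, 15.5) is about 1.51. Recomputing Rate from the printed lines in the outputs posted later in the thread gives slightly different values (298 s / 18 s is about 16.56, against a printed 16.5778). The likely reason is that the printed elapsed time is rounded.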
Unlike GoGo-no-coder, this is not a fork. He releases a patch against the libvorbis source that does not change the algorithms or data structures at all. This makes it easy to keep up with the latest official libvorbis, but it limits the room for optimization to some degree. In fact, the author says in readme.txt that there is little room left for further optimization. So I think it is time for quality evaluation, even though this optimization is still in development. Several bugs were found and fixed over the last week, and bitrates are now very close to those of the reference encoder at all quality values. If you find any bugs or quality regressions compared with the official 1.1 release, please tell us.

Contributors are:
- Blacksword (or 637)'s SSE optimization (Japanese only): a number of functions in libvorbis are vectorized to take advantage of the SSE instruction set, and the patch also incorporates OptSort and wuvorbis. For the complete list of optimized functions, see readme.txt included with the binary (it is in Japanese, but the list is easy to find).
- Manuke's OptSort: an optimization of qsort, which consumes 20% of encoding time. It relies on the assumption that the _vp_quantize_couple_sort and _vp_noise_normalize_sort functions in psy.c call qsort with 8 or 32 elements. This speeds up the whole encoding process by 10%.
- W.Dee's wuvorbisfile (Japanese only?): wuvorbis.dll is a fast Ogg Vorbis decoder with SSE and 3DNow! support. It is part of the KiriKiri software (useful for developing multimedia content or adventure games). wuvorbis.dll decodes 1.4x-1.8x faster (SSE) and 1.5x-1.9x faster (3DNow!) than the official libvorbis.

Happy encoding!
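OptSort's actual code isn't shown here. The idea, as described, is that qsort's generality is wasted when the element count is always small and known, because qsort makes an indirect comparator call for every comparison. Below is a minimal sketch of one way to exploit that. It assumes the sort orders pointers into a float array by descending value; this is an assumption about what psy.c needs, not a description of OptSort. The helper name small_sort_desc is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sort n pointers into a float array so that the
   pointed-to values are in descending order. Insertion sort is O(n^2) in the
   worst case, but it has no function-pointer calls and very little overhead,
   which can pay off when n is always small (8 or 32 in the post). */
static void small_sort_desc(float **p, size_t n)
{
    assert(n <= 32); /* assumption: only used for the small partition sizes */
    for (size_t i = 1; i < n; i++) {
        float *key = p[i];
        size_t j = i;
        /* Shift entries with smaller values one slot to the right. */
        while (j > 0 && *p[j - 1] < *key) {
            p[j] = p[j - 1];
            j--;
        }
        p[j] = key;
    }
}
```

Unrolled code or a sorting network specialized for exactly 8 or 32 elements are other options. Insertion sort just keeps the sketch short.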
Nov 4 2004, 20:37
#2
Group: Developer; Posts: 1679; Joined: 23-December 01; From: Germany; Member No.: 731
fefe was working on a (apparently buggy) SSE optimization of libvorbis too.
Do the optimizations affect only encoding, or decoding as well?
--------------------
"To understand me, you'll have to swallow a world." Or maybe your words.
Nov 4 2004, 23:04
#3
Group: Members; Posts: 470; Joined: 26-October 01; From: Germany; Member No.: 352
I achieved almost 100% (rather 85%, actually):
ICL 8.1: 9.8x
Optimized: 18.0x
Pretty good.
This post has been edited by ilikedirtthe2nd: Nov 4 2004, 23:06
Nov 5 2004, 01:22
#4
Group: Members; Posts: 92; Joined: 11-March 04; From: The Forest; Member No.: 12650
Wow
Nov 5 2004, 03:14
#5
Group: Members; Posts: 169; Joined: 30-September 01; From: Tokyo, Japan; Member No.: 99
QUOTE (dev0 @ Nov 5 2004, 04:37 AM) fefe was working on a (apparently buggy) SSE optimization of libvorbis too. Do the optimizations only effect encoding or decoding as well?
Oh, I didn't know about fefe's optimization. IMHO this optimization affects both encoding and decoding, although an optimized oggdec has not been tested or released yet. Several decoding functions (e.g., vorbis_synthesis_blockin, mapping0_inverse, mdct_backward) are optimized too.
Nov 6 2004, 02:05
#6
Group: Developer; Posts: 1245; Joined: 16-December 02; From: Australia; Member No.: 4097
Whoa, it's really fast
On my P4 2.4 GHz:
ICL-compiled oggenc from rarewares: 13.2x
SSE-optimised oggenc: 20.5x
This post has been edited by QuantumKnot: Nov 6 2004, 02:11
Nov 6 2004, 08:02
#7
A/V Moderator; Group: Members; Posts: 278; Joined: 22-February 03; Member No.: 5132
Pretty nice speedup here too:
oggenc from rarewares: 10.4x
SSE optimized: 15.3x
Nov 6 2004, 08:10
#8
Group: Members; Posts: 4; Joined: 31-October 04; Member No.: 17931
Hello!
Well, I have an older machine (P3 700) and got a speedup from 4.4x to 9.3x realtime. Have you guys tested the SSE2-optimized build at http://homepage3.nifty.com/blacksword/ ? I wonder how big the speedup with this build is on P4 and AMD64 CPUs.
This post has been edited by Music Mixer: Nov 6 2004, 08:12
Nov 6 2004, 10:18
#9
Group: Members; Posts: 3620; Joined: 14-May 03; From: Bad Herrenalb; Member No.: 6613
According to my tests...
ICL 8.1 Standard:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s

ICL 8.1 Pentium 4:
File length: 4m 58,0s
Elapsed time: 0m 17,0s
Rate: 17,5529
Average bitrate: 236,7 kb/s

SSE:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s

SSE2:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s

Tested with "Toto - Africa" on a Pentium 4 3.2 GHz with 512 MB RAM, running Windows XP Professional Service Pack 1.
--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
Nov 6 2004, 16:10
#10
Group: Members (Donating); Posts: 429; Joined: 5-September 04; From: Los Angeles; Member No.: 16796
I got a good increase, too...
SSE2:
File length: 5m 23.0s
Elapsed time: 0m 12.0s
Rate: 26.9556
Average bitrate: 175.3 kb/s

ICL 8.1:
File length: 5m 23.0s
Elapsed time: 0m 19.0s
Rate: 17.0246
Average bitrate: 175.3 kb/s

But I can't seem to get it to work on FLAC files...
ERROR: Input file "01.flac" is not a supported format.
Am I missing something??

Thanks,
~esa

:edit: typo
This post has been edited by esa372: Nov 6 2004, 19:15
--------------------
Clowns love haircuts; so should Lee Marvin's valet.
Nov 6 2004, 16:24
#11
Group: Members; Posts: 470; Joined: 26-October 01; From: Germany; Member No.: 352
QUOTE (esa372 @ Nov 6 2004, 03:10 PM) But I can't seem to get it to work on FLAC files... ERROR: Input file "01.flac" is not a supported format. Am I missing something??
Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does.
Regards;
ilikedirt
Nov 6 2004, 16:46
#12
Group: Developer; Posts: 1679; Joined: 23-December 01; From: Germany; Member No.: 731
QUOTE (ilikedirtthe2nd @ Nov 6 2004, 04:24 PM)
QUOTE (esa372 @ Nov 6 2004, 03:10 PM) But I can't seem to get it to work on FLAC files... ERROR: Input file "01.flac" is not a supported format. Am I missing something??
Standard oggenc doesn't input lossless files directly. Only Oggenc2.3 from rarewares does. Regards; ilikedirt
The standard oggenc supports FLAC input perfectly. It's a compile-time option AFAIK.
Nov 6 2004, 16:52
#13
xcLame and OggDropXPd Developer; Group: Developer; Posts: 3706; Joined: 30-September 01; From: Bracknell, UK; Member No.: 111
QUOTE (dev0 @ Nov 6 2004, 03:46 PM)
It sure is.
--------------------
John
----------------------------------------------------------------
My compiles and utilities are at http://www.rarewares.org/
Nov 6 2004, 17:01
#14
Group: Members (Donating); Posts: 429; Joined: 5-September 04; From: Los Angeles; Member No.: 16796
QUOTE (ilikedirtthe2nd @ Nov 6 2004, 08:24 AM) Standard oggenc doesn't input lossless files directly.
QUOTE (dev0 @ Nov 6 2004, 08:46 AM) The standard oggenc supports FLAC input perfectly.
Well, I can't say that the issue is any clearer for me now...
Nov 6 2004, 17:19
#15
Group: Members; Posts: 470; Joined: 26-October 01; From: Germany; Member No.: 352
QUOTE It's a compile-time option AFAIK.
That means oggenc is able to read FLAC if this option is enabled when compiling. So, in general, it is able to read FLAC, but this particular build is not.
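Here is an illustration of what "a compile-time option" means in practice. The set of input formats a binary accepts is fixed by the preprocessor when the binary is built, so two oggenc binaries built from the same source can behave differently. The sketch below is minimal. The macro HAVE_FLAC_INPUT and the format table are hypothetical and are not oggenc's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format table: the "flac" entry only exists in builds
   compiled with HAVE_FLAC_INPUT defined (e.g. via -DHAVE_FLAC_INPUT). */
static const char *const supported_formats[] = {
    "wav",
    "aiff",
#ifdef HAVE_FLAC_INPUT
    "flac",
#endif
};

/* Returns 1 if this particular build can read the given format, else 0. */
static int format_supported(const char *ext)
{
    assert(ext != NULL);
    size_t n = sizeof supported_formats / sizeof supported_formats[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(ext, supported_formats[i]) == 0)
            return 1;
    return 0;
}
```

A build made without the macro never contains the FLAC entry, so it rejects .flac input with a "not a supported format" style error, even though the same source could read FLAC.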
Nov 6 2004, 17:25
#16
Group: Members (Donating); Posts: 429; Joined: 5-September 04; From: Los Angeles; Member No.: 16796
QUOTE (ilikedirtthe2nd @ Nov 6 2004, 09:19 AM) ...oggenc is able to input flac, if this is enabled when compiling. So: generally it is able to read flac, but this version is not.
Ah... thank you for the clarification!
~esa
Nov 6 2004, 18:20
#17
Group: Members; Posts: 169; Joined: 30-September 01; From: Tokyo, Japan; Member No.: 99
QUOTE (Music Mixer @ Nov 6 2004, 04:10 PM) Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/ ? I wonder how big the speedup with this build is for p 4 and amd 64 cpus.
I could not find any speed difference between the SSE and SSE2 versions on my Pentium IV machine. Does anybody get a speed increase? The author wants to know the effect so he can decide whether to continue the SSE2 version.
QUOTE (Sebastian Mares @ Nov 6 2004, 06:18 PM) According to my tests...
ICL 8.1 Standard:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
SSE:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
Are the SSE and SSE2 binaries your own builds? If so, don't forget to define the symbol __SSE__ when compiling to activate the optimization.
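The sketch below shows how a symbol like __SSE__ typically gates an SSE code path. It uses window application, one of the functions listed in the first post, as the example. This is not the patch's code, and apply_window is a hypothetical name. GCC and Clang predefine __SSE__ whenever SSE code generation is enabled (e.g. with -msse, and by default on x86-64). With compilers that don't predefine it, it has to be defined by hand, as described above. If the symbol is missing, only the scalar loop is compiled, so the binary still works but runs at unoptimized speed.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Multiply a block of samples by a window, element by element.
   The SSE loop is compiled only when __SSE__ is defined. */
static void apply_window(float *out, const float *in, const float *win, size_t n)
{
    assert(out != NULL && in != NULL && win != NULL);
    size_t i = 0;
#ifdef __SSE__
    /* Four floats per 128-bit register; unaligned loads keep the sketch simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);
        __m128 w = _mm_loadu_ps(win + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(x, w));
    }
#endif
    /* Scalar tail, or the whole loop when SSE is not enabled. */
    for (; i < n; i++)
        out[i] = in[i] * win[i];
}
```

Both paths compute the same single-precision products. That is why a build missing the symbol gives identical output (and identical bitrates) and differs only in speed.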
Nov 6 2004, 19:05
#18
Group: Members; Posts: 3620; Joined: 14-May 03; From: Bad Herrenalb; Member No.: 6613
QUOTE (esa372 @ Nov 6 2004, 04:10 PM) I got a good increase, too...
ILC 8.1
File length: 5m 23.0s
Elapsed time: 0m 12.0s
Rate: 26.9556
Average bitrate: 175.3 kb/s
SSE2
File length: 5m 23.0s
Elapsed time: 0m 19.0s
Rate: 17.0246
Average bitrate: 175.3 kb/s
But I can't seem to get it to work on FLAC files...
ERROR: Input file "01.flac" is not a supported format.
Am I missing something??
Thanks, ~esa

Huh? The ICL 8.1 compile is faster.

QUOTE (nyaochi @ Nov 6 2004, 06:20 PM)
QUOTE (Music Mixer @ Nov 6 2004, 04:10 PM) Have you guys tested the SSE2 optimized build at http://homepage3.nifty.com/blacksword/ ? I wonder how big the speedup with this build is for p 4 and amd 64 cpus.
I could not find speed difference between SSE and SSE2 versions on my Pentium IV machine. Is there anybody who gets speed increase? The author wants to know the effect to determine whether if he should continue SSE2 version or not.
QUOTE (Sebastian Mares @ Nov 6 2004, 06:18 PM) According to my tests...
ICL 8.1 Standard:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
SSE:
File length: 4m 58,0s
Elapsed time: 0m 18,0s
Rate: 16,5778
Average bitrate: 236,7 kb/s
Are SSE and SSE2 binaries your own builds? If so, don't forget to define a symbol __SSE__ to activate the optimization when compiling.

Nope, they're not my own compiles.
Nov 6 2004, 19:13
#19
Group: Members (Donating); Posts: 429; Joined: 5-September 04; From: Los Angeles; Member No.: 16796
QUOTE (Sebastian Mares @ Nov 6 2004, 11:05 AM) Huh? The ICL 8.1 compile is faster.
Whoops! No, that's a typo... I'll edit it immediately...
Nov 6 2004, 19:17
#20
Group: Members; Posts: 2525; Joined: 25-July 02; From: South Korea; Member No.: 2782
OK, here are some partial translations:
OggEnc_SSE_20041101ArcherB03.zip
- Changes regarding/surrounding comments
- Improved low-bitrate quality
Current problems are:
This post has been edited by kjoonlee: Nov 6 2004, 19:18
--------------------
http://blacksun.ivyro.net/vorbis/vorbisfaq.htm
Nov 6 2004, 20:31
#21
Group: Members; Posts: 169; Joined: 30-September 01; From: Tokyo, Japan; Member No.: 99
QUOTE (kjoonlee @ Nov 7 2004, 03:17 AM) OK, here are some partial translations: OggEnc_SSE_20041101ArcherB03.zip Changes regarding/surrounding comments Improved low-bitrate quality Current problems are:
Thanks for the translation. I think all of the current problems listed above are fixed in Archer B03; they existed in Archer B02.
Nov 7 2004, 02:55
#22
Group: Developer; Posts: 1245; Joined: 16-December 02; From: Australia; Member No.: 4097
IIRC, SSE2 is optimised for double-precision floating point, so maybe there isn't much difference from SSE, since libvorbis doesn't use many doubles?
Nov 7 2004, 09:04
#23
Group: Members; Posts: 761; Joined: 29-September 01; Member No.: 40
Tested on my AMD64 3400+, 1GB RAM
ICL 8.1:
File length: 4m 27.0s
Elapsed time: 0m 14.0s
Rate: 19.1190
Average bitrate: 132.9 kb/s

ICL 8.1 (John33):
File length: 4m 27.0s
Elapsed time: 0m 11.0s
Rate: 24.3333
Average bitrate: 132.9 kb/s

SSE/SSE2 optimized:
File length: 4m 27.0s
Elapsed time: 0m 08.0s
Rate: 33.4583
Average bitrate: 132.9 kb/s

SSE2 optimization doesn't change encoding speed.
This post has been edited by Benjamin Lebsanft: Nov 7 2004, 09:21
Nov 7 2004, 10:41
#24
xcLame and OggDropXPd Developer; Group: Developer; Posts: 3706; Joined: 30-September 01; From: Bracknell, UK; Member No.: 111
As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.
Nov 7 2004, 10:54
#25
Group: Members; Posts: 169; Joined: 30-September 01; From: Tokyo, Japan; Member No.: 99
QUOTE (QuantumKnot @ Nov 7 2004, 10:55 AM) IIRC, SSE2 is optimised for double point precision so maybe there isn't that much difference with SSE since libvorbis doesn't use many of them?
QUOTE (john33 @ Nov 7 2004, 06:41 PM) As QK says, there's very little use of double precision in libvorbis, so the use of SSE2 optimisation is virtually a waste of effort.
Actually, he expects SSE2 to give better quality (or speed) in float-to-integer conversion and vice versa, but at the same time he doubts the effect. I'll pass these results on to him.
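Some context on the conversion point: SSE2 is not only about doubles. It also adds packed integer operations and an instruction that converts four floats to four 32-bit integers at once (cvtps2dq, exposed as `_mm_cvtps_epi32`). Plain SSE only offers scalar conversion or a two-element conversion into MMX registers. Below is a minimal sketch of the kind of float-to-integer step where that applies: converting float samples, already scaled to the 16-bit range, into int16_t. float_to_s16 is a hypothetical name, not the author's code, and finite input is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Convert float samples (already scaled to the 16-bit range) to int16_t,
   clamping out-of-range values. Assumes finite input. */
static void float_to_s16(int16_t *out, const float *in, size_t n)
{
    assert(out != NULL && in != NULL);
    size_t i = 0;
#ifdef __SSE2__
    const __m128 lo = _mm_set1_ps(-32768.0f);
    const __m128 hi = _mm_set1_ps(32767.0f);
    for (; i + 8 <= n; i += 8) {
        /* Clamp first: cvtps2dq yields 0x80000000 for values outside int32 range. */
        __m128 x0 = _mm_min_ps(_mm_max_ps(_mm_loadu_ps(in + i), lo), hi);
        __m128 x1 = _mm_min_ps(_mm_max_ps(_mm_loadu_ps(in + i + 4), lo), hi);
        /* cvtps2dq: 4 floats -> 4 int32, rounding per MXCSR (nearest-even by default). */
        __m128i a = _mm_cvtps_epi32(x0);
        __m128i b = _mm_cvtps_epi32(x1);
        /* packssdw: narrow 8 int32 to 8 int16 with signed saturation. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_packs_epi32(a, b));
    }
#endif
    /* Scalar path for the remainder, or for everything without SSE2. */
    for (; i < n; i++) {
        float x = in[i];
        if (x > 32767.0f) x = 32767.0f;
        if (x < -32768.0f) x = -32768.0f;
        /* Simple round-half-away-from-zero; results can differ from the
           SSE2 path at or very near .5 ties. */
        out[i] = (int16_t)(x >= 0.0f ? x + 0.5f : x - 0.5f);
    }
}
```

The sketch only shows where SSE2 instructions would come in. Whether this kind of change is measurably faster in oggenc is exactly what the benchmarks in this thread call into question.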