Full Version: LAME version Optimized for AMD CPU's?
deeswift
Hello,

I was wondering if there's a version of LAME optimized for AMD processors. I am using v3.90.3 Modified (why, of course!) and I noticed versions optimized for Intel processors (yuk) but never heard of one for us AMD enthusiasts. My CPU is an Athlon 64 3200+, if that makes any difference.
kungfujoe
QUOTE(deeswift @ Jul 23 2004, 07:52 PM)
I was wondering if there's a version of LAME optimized for AMD processors. I am using v3.90.3 Modified (why, of course!) and I noticed versions optimized for Intel processors (yuk) but never heard of one for us AMD enthusiasts. My CPU is an Athlon 64 3200+, if that makes any difference.
*


Ditto. I just upgraded from 3.93.1 to 3.96, only to find that the speed has dropped to about 1/3 of what it used to be (same settings in RazorLAME)! If this is in the name of better quality, I'll take it, but I'd also really like to be able to take advantage of AMD's special instruction sets to speed things up.
dreamliner77
3.96 should be faster. Unless you are using the -q0 switch. See:
http://www.hydrogenaudio.org/forums/index....showtopic=24247
kungfujoe
QUOTE(dreamliner77 @ Jul 23 2004, 11:15 PM)
3.96 should be faster.  Unless you are using the -q0 switch.  See:
http://www.hydrogenaudio.org/forums/index....showtopic=24247
*
Funny, that link goes to a thread that I just started. :D

Point taken about the q0 switch (though it should offer ever so slightly better encodes), but it'd still be nice to see an AMD optimized build; that is, if it would offer any significant performance boost.
metaller
Forget about significant performance boost.

The optimization technique is generally the same for all CPUs; the only AMD-specific instruction set is 3DNow!, but both Intel and AMD CPUs have the SSE instruction set, which provides similar functionality. Modern compilers can optimize quite well for those. I tried to manually optimize one function from the inner loop (which takes most of the CPU time) with SSE, but it wasn't any faster than the C code compiled with Intel C 5.0.
damaki
There are only major speed differences between a really AMD optimized build and an Intel C++ compile. I was using the 3.96 intel compile from rarewares and with mine it's about 3 times faster.
My flags are :
CODE
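# GCC flags targeting the Athlon Thunderbird (-march=athlon-tbird)
export CFLAGS="-O3 -march=athlon-tbird \
-funroll-loops -fexpensive-optimizations"
# Quote the expansion so the multi-word value is passed on intact
export CXXFLAGS="$CFLAGS"
export FFLAGS="$CFLAGS"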
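For anyone who wants to check a claim like "3 times faster" on their own machine, here is a rough sketch of a timing comparison. The helper names `elapsed_s` and `speedup`, the binary names and the file names are all made up for illustration; the resolution is whole seconds, so use a test file long enough that the encode takes a while.

```shell
# Hypothetical helper: run a command, discard its output, and print the
# wall-clock time it took in whole seconds.
elapsed_s() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical helper: ratio of a slower timing to a faster one.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        if (fast == 0) { print "n/a"; exit }   # too quick to measure
        printf "%.2f\n", slow / fast
    }'
}

# Example usage (binary and file names are placeholders):
#   t_icc=$(elapsed_s ./lame-icc --preset standard test.wav out-icc.mp3)
#   t_gcc=$(elapsed_s ./lame-gcc --preset standard test.wav out-gcc.mp3)
#   speedup "$t_icc" "$t_gcc"
```

Both builds should be run with identical switches on the same input, otherwise the ratio says more about the settings than about the compiler.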
john33
There are run time selected processor optimisations in assembly language built in to LAME to take care of MMX, SSE, SSE2 and 3DNow!. Normally, and assuming a compile that includes the nasm routines, the addition of processor specific optimisations in the compile make little, or no, difference.
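As a rough illustration of what "run time selected" depends on, the sketch below lists which of those instruction sets the CPU advertises. It assumes a Linux x86 system where `/proc/cpuinfo` has a `flags` line; the helper name `cpu_has` is made up, and this is not how LAME itself does its detection, just a quick way to see what it would have to choose from.

```shell
# Hypothetical helper: print "yes" if the given flag appears on the first
# "flags" line of a Linux-style cpuinfo file, otherwise "no".
cpu_has() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    # Split the flags line into one word per line, then match the whole word
    # so that "sse" does not match "sse2".
    if grep -m1 '^flags' "$file" 2>/dev/null | tr ' \t' '\n\n' | grep -qx "$flag"; then
        echo yes
    else
        echo no
    fi
}

# Report the instruction sets mentioned in this thread.
for f in mmx sse sse2 3dnow; do
    printf '%-6s %s\n' "$f" "$(cpu_has "$f")"
done
```

On a system without `/proc/cpuinfo`, or with a differently formatted one, every line will simply read "no".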
damaki
QUOTE(john33 @ Jul 24 2004, 10:50 AM)
There are run time selected processor optimisations in assembly language built in to LAME to take care of MMX, SSE, SSE2 and 3DNow!. Normally, and assuming a compile that includes the nasm routines, the addition of processor specific optimisations in the compile make little, or no, difference.
*


Yeah, I know; the 3DNow! optimisation is reported even in the ICC build, but I still get astonishing speed differences.
plonk420
looks like mmx had the most influence o_O

also, some strange results, but i didn't feel like duplicating tests...

all used: 8.0644x RT
no sse: 8.0997x
no mmx: 7.3787x
no 3dnow: 8.0420x
no mmx/sse/3dnow: 7.3861x

done on an athy 2400 with APS...
metaller
QUOTE
There are only major speed differences between a really AMD optimized build and an Intel C++ compile. I was using the 3.96 intel compile from rarewares and with mine it's about 3 times faster.

3 times faster is a very suspicious result. It may be that one of the LAME builds you are comparing was built with a misconfigured compiler. Or maybe gcc simply optimizes better than Intel C, but in that case it would optimize for Intel CPUs too, and then your results aren't really processor specific.

BTW, I once compiled a LAME build that was about 3 times faster than another build of the same version, compiled from the same sources but with a different compiler version. Later I found out that the slower build had been compiled with MSVC6 but with headers and libraries from MSVC7.1, and that caused the large performance degradation.

QUOTE
looks like mmx had the most influence o_O

also, some strange results, but i didn't feel like duplicating tests...

all used: 8.0644x RT
no sse: 8.0997x
no mmx: 7.3787x
no 3dnow: 8.0420x
no mmx/sse/3dnow: 7.3861x

I see nothing too strange here.
First, SSE optimization is not really used in LAME. It does not even get compiled in (well, at least in official source releases). Even if you change the makefile to include SSE optimization, keep in mind that the LAME developers do NOT recommend using it.

Second, FFT calculation (which is what is actually optimized for 3DNow! or SSE) is not the most time-consuming part of the LAME encoder, while MMX is used for choosing the best Huffman tables in the inner loop, which has a greater speed impact.
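To put the figures quoted above in relative terms, here is a small sketch that computes each result's change against the "all used" baseline. The helper name `pct_change` is made up for illustration; the numbers are the ones posted in the thread.

```shell
# Hypothetical helper: percent change of a result relative to a baseline.
pct_change() {
    awk -v base="$1" -v val="$2" 'BEGIN { printf "%+.2f%%\n", (val / base - 1) * 100 }'
}

base=8.0644                      # all used
pct_change "$base" 8.0997        # no sse            -> +0.44%
pct_change "$base" 7.3787        # no mmx            -> -8.50%
pct_change "$base" 8.0420        # no 3dnow          -> -0.28%
pct_change "$base" 7.3861        # no mmx/sse/3dnow  -> -8.41%
```

Disabling MMX costs about 8.5%, while the SSE and 3DNow! differences are well under 1%, which is small enough to be run-to-run noise given that the tests were not repeated.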