QUOTE
There are only major speed differences between a really AMD optimized build and an Intel C++ compile. I was using the 3.96 intel compile from rarewares and with mine it's about 3 times faster.
A 3x speedup is a very suspicious result. It may be that one of the lame builds you are comparing was built with a misconfigured compiler. Or maybe gcc simply optimizes better than Intel C++, but in that case it would optimize better for Intel CPUs too, and then your results aren't really processor-specific.
BTW, I once compiled a lame build that was about 3 times faster than the same version, compiled from the same sources, but with a different compiler version. Later I found out that the slower build was compiled with MSVC6 but with headers and libraries from MSVC7.1, and that caused the large performance degradation.
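If you want to rule out a broken build before blaming the CPU, it helps to time both binaries on the same input, several times each. Here is a minimal sketch in Python; the helper names and the binary and file names in the usage note are hypothetical, not anything shipped with lame.

```python
import subprocess
import sys
import time


def best_time(cmd, runs=3):
    """Return the fastest wall-clock time (in seconds) over several runs of cmd.

    Taking the minimum reduces the influence of disk caching and
    background load on the comparison.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


def speed_ratio(cmd_a, cmd_b, runs=3):
    """Return how many times faster cmd_a is than cmd_b (above 1 means a wins)."""
    return best_time(cmd_b, runs) / best_time(cmd_a, runs)
```

For example, `speed_ratio(["./lame-gcc", "test.wav", "a.mp3"], ["./lame-icl", "test.wav", "b.mp3"])` would compare two builds on the same WAV file (both paths are made up here). If the ratio is anywhere near 3, I would check the compiler setup first.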
QUOTE
looks like mmx had the most influence o_O
also, some strange results, but i didn't feel like duplicating tests...
all used: 8.0644x RT
no sse: 8.0997x
no mmx: 7.3787x
no 3dnow: 8.0420x
no mmx/sse/3dnow: 7.3861x
I see nothing too strange here.
First, SSE optimization is not really used in lame. It does not even get compiled in (well, at least in official source releases). Even if you change the makefile to include SSE optimization, keep in mind that the lame developers do NOT recommend using it.
Second, FFT calculation (which is what is actually optimized for 3DNow or SSE) is not the most time-consuming part of the lame encoder, while MMX is used for choosing the best Huffman tables in the inner loop, which has a greater speed impact.
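To put the quoted numbers in perspective, here is a quick calculation of how much each setting changes the speed relative to the "all used" run. It is only arithmetic on the figures posted above; the variable and function names are my own.

```python
# Encoding speeds (multiples of real time) taken from the quoted results.
ALL_USED = 8.0644
results = {
    "no sse": 8.0997,
    "no mmx": 7.3787,
    "no 3dnow": 8.0420,
    "no mmx/sse/3dnow": 7.3861,
}


def relative_change(baseline, value):
    """Percentage change of value relative to baseline (negative means slower)."""
    return (value - baseline) / baseline * 100.0


changes = {name: relative_change(ALL_USED, speed) for name, speed in results.items()}

for name, pct in changes.items():
    print(f"{name:>18}: {pct:+.2f}%")
```

Disabling MMX costs roughly 8.5%, while the SSE and 3DNow runs stay within about half a percent of the baseline, which is small enough to be measurement noise in a single untimed-twice run.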