Mike Giacomelli
Sep 26 2003, 18:27
Check this
http://www.anandtech.com/cpu/showdoc.html?i=1884&p=17
QUOTE
The performance improvement here is astounding - in 64-bit mode the Athlon 64 FX managed to finish the encode 34% quicker than in 32-bit mode, if these results are any hint of what could be in store for Windows users, there's a lot of promise behind the Athlon 64...assuming we get software support in time.
Now I realize this is GCC and not ICC, which is often used with LAME, but these results still seem amazing. Is it that LAME uses 64-bit ints, or is it the extra GPRs? Either way, I would not have expected LAME to get such a huge boost.
From my work in compilers, it would not be at all surprising if much or all of that gain is due only to the additional registers. I honestly am not that knowledgeable about LAME's use of integer or FPU ops, but the new floating-point instructions (normal register-based instead of stack-based) could also be a cause.
NatGun
Sep 26 2003, 19:48
In 32-bit emulation, the A64 uses the other 32 bits to increase commands per clock cycle. Basically, it can do more in the same amount of time. Rather ingenious, really.
Actually, that's not really how it works. In theory a compiler could do a "poor man's" MMX with the wider registers, but it wouldn't work very well under most circumstances.
Also, the Athlon 64 does not run 32-bit code in emulation any more than our 32-bit x86 processors run 16-bit code in emulation. In any case, the test in question was running native 64-bit code, and that is what came out so much faster than the 32-bit equivalent.
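To make the "poor man's" MMX point concrete, here is a rough sketch in C of what packing two 32-bit values into one 64-bit register would look like. The helper names (pack2, add2_naive, add2_swar) are made up for illustration and are not from LAME or any compiler. The naive version shows the problem: a carry out of the low half leaks into the high half. The lane-safe version works around that with extra masking, which is a large part of why the trick rarely pays off.

```c
#include <stdint.h>

/* Hypothetical helper: pack two 32-bit lanes into one 64-bit word,
   with `hi` in the upper half and `lo` in the lower half. */
static uint64_t pack2(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}

/* Naive packed add: a single 64-bit add. A carry out of the low
   lane spills into the high lane and corrupts it. */
static uint64_t add2_naive(uint64_t a, uint64_t b) {
    return a + b;
}

/* Lane-safe packed add: add the low 31 bits of each lane so no carry
   can cross a lane boundary, then restore each lane's top bit with
   XOR. Each lane wraps around independently, like a 32-bit add. */
static uint64_t add2_swar(uint64_t a, uint64_t b) {
    const uint64_t H = 0x8000000080000000ULL; /* top bit of each lane */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}
```

For instance, adding the packed pairs (1, 0xFFFFFFFF) and (1, 1) with add2_naive gives a high lane of 3 instead of 2, while add2_swar gives the expected (2, 0). The safe version costs several extra operations per add, so the gain over two plain 32-bit adds is doubtful at best.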
Mike Giacomelli
Sep 27 2003, 06:05
QUOTE (NatGun @ Sep 26 2003, 10:48 AM)
In 32-bit emulation, the A64 uses the other 32 bits to increase commands per clock cycle. Basically, it can do more in the same amount of time. Rather ingenious, really.
What are you talking about?