ICC build performance on Intel Atom

2008-09-12 00:16:03

I've been compiling codecs with Intel's compiler (latest version, 10.1.018) under Linux on my netbook, which has an Intel Atom N270 processor running at 1.6GHz. So-called reviews on the interwebs have dismissed it as a (dog) "slow" processor when it comes to audio encoding, but I found that using optimized codecs yields pretty good encoding speeds (18x with the multi-threaded, SSE3-enabled Vorbis encoder, 42x with Flake…). I also read a blog post about using ICC on the netbook I own (Acer Aspire One), so I decided to give it a try, using "-xL -O3" as CFLAGS (neither -ipo nor -fast seemed to work).

I got slight improvements with Flake, LAME, and to a lesser extent, FLAC, and none with Monkey's Audio. But WavPack takes the cake: it takes over 23% less time (55.2s vs. 71.9s) than the GCC 4.3.2 optimized build (-march=native) in the Phoronix benchmark (-hhx2 on a 7m44s WAV file from Nine Inch Nails). Without -x2 (just -hh), it runs 11% faster. That's still quite an improvement. I also tried -hx4, -hx3 and -hx1, which yielded a 23%, 24% and 22% improvement, respectively. I assume MMX intrinsics play a big role here, with the -x parameter. Makes me wonder if LAME and Monkey's Audio would run faster if they used intrinsics instead of hand-coded ASM…
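For anyone comparing their own builds, the 55.2s vs. 71.9s WavPack figures can be read two ways: as time saved, or as a gain in throughput. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of any benchmark tool.

```python
def time_reduction_pct(baseline_s, new_s):
    """Percentage of wall-clock time saved relative to the baseline run."""
    return (baseline_s - new_s) / baseline_s * 100.0


def throughput_gain_pct(baseline_s, new_s):
    """Percentage increase in work done per second relative to the baseline run."""
    return (baseline_s / new_s - 1.0) * 100.0


def realtime_factor(audio_s, encode_s):
    """How many seconds of audio are encoded per second of wall-clock time."""
    return audio_s / encode_s


# Timings quoted in the post: GCC 4.3.2 build vs. ICC build, WavPack -hhx2.
gcc_s, icc_s = 71.9, 55.2
audio_s = 7 * 60 + 44  # the 7m44s test file, in seconds

print(f"time saved:      {time_reduction_pct(gcc_s, icc_s):.1f}%")
print(f"throughput gain: {throughput_gain_pct(gcc_s, icc_s):.1f}%")
print(f"GCC build: {realtime_factor(audio_s, gcc_s):.1f}x realtime")
print(f"ICC build: {realtime_factor(audio_s, icc_s):.1f}x realtime")
```

With these inputs the time saved comes out at about 23.2%, which matches the "over 23%" figure, while the same pair of timings corresponds to roughly 30% more throughput. Which number is meant matters when comparing against results reported as "Nx" encoding speed.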

For some reason I didn't manage to build the shared libraries though: a process called "mcpcom" seemed to hang with 100% CPU usage during the compilation.

Reply #1 – 2008-09-12 11:18:14

Quote from: skamp on 2008-09-12 00:16:03

Makes me wonder if LAME and Monkey's Audio would run faster if they used intrinsics instead of hand-coded ASM…

I tried it once. I used to think that nothing could beat hand-optimized assembly (optimised to use as few instructions as possible, which would supposedly take fewer CPU cycles; apparently this is not the case on superscalar architectures). Well, C++ with intrinsics and the VC++ compiler quickly changed my opinion of my assembly skills.
I tried programming a filtering algorithm entirely in hand-written MMX/SSE assembly vs. using C++ with intrinsics. I was quite surprised to see the C++ version run faster. When I looked at the assembly the compiler generated from the C++ code, I saw it was unrolling loops, precalculating offsets and doing all sorts of things I had no idea about (such as heavy use of the LEA instruction). That was the day I gave up on manual assembly (well, not entirely).

Reply #2 – 2008-09-12 19:37:40

I'm wondering how much the speed will go up if/when Atom-specific optimizations are built into gcc, so that the in-order architecture is fully taken into account.

Reply #3 – 2008-09-13 00:02:09

Quote from: MedO on 2008-09-12 19:37:40

I'm wondering how much the speed will go up if/when Atom-specific optimizations are built into gcc, so that the in-order architecture is fully taken into account.

Lots, probably to within a few percent of the Intel compiler. Scheduling for the specific processor's pipeline matters for in-order designs in ways that it doesn't for aggressive out-of-order designs.

On the point of inline assembler and even intrinsics: most of the time they actually slow things down, because they get in the way of the compiler's optimisation passes and register allocation. For example, even under the latest Microsoft compiler, the optimiser basically just gives up as soon as you use any SSE intrinsics. This will change when their grand rewrite, codenamed Phoenix, ships, but when that will be is not certain yet.

The ARM C compiler takes a different tack: it supports "embedded assembler", where the compiler is allowed to do register allocation and optimisation passes over the assembler you write. This means that you don't necessarily get exactly the instructions that you wrote. However, because the compiler's register allocator and optimisation framework aren't being clobbered, you generally do get a speedup if you can do better than the compiler on the actual implementation of your algorithm.

Reply #4 – 2008-12-31 15:54:09

Sorry to bump an older thread, but I've just bought a Samsung NC10 netbook which uses a 1.6GHz Intel Atom chip (like most netbooks).

I've been using the latest generic AoTuV Ogg Vorbis encoder (b5.61) found at the RareWares site, but I only get 5x-6x encoding speed. Can anyone share an Atom-optimized Vorbis encoder? I'm not really tech-savvy enough to compile my own.

Thanks!

Reply #5 – 2008-12-31 16:31:26

http://homepage3.nifty.com/blacksword/ogge...cer20061110.zip

Reply #6 – 2008-12-31 17:59:56

Thanks for that, but the latest Lancer encoders are quite old now. Is there a way to compile a newer version? The latest AoTuV encoder is very good.
