Full Version: P4 Win32 Compile of Flac 1.1.3
Hydrogenaudio Forums > Lossless Audio Compression > FLAC
gharris999
I downloaded trialware versions of MS Visual Studio 2005 and the Intel C++ Compiler v. 9.1 and attempted to compile and link some Flac 1.1.3 Win32 binaries optimized for Pentium 4 Processors. I'm certainly no expert when it comes to picking compile time optimizations, but, through trial and error, I came up with a binary that is faster than the "stock" flac 1.1.3 from sourceforge.

I'll keep the binaries posted at http://home.earthlink.net/~gharris999/flac-1.1.3-IC9.zip for the next couple of days.

My benchmark test for comparing the different versions consisted of encoding an EAC produced whole-cd image wav file at -5, adding replay-gain tags, followed by having metaflac import a cuesheet and a picture jpg to the flac file.

Here are the results, in seconds:

CODE

On a Pentium 4 3.06GHz Hyper-Threading system with a 533 MHz bus speed:

Flac and MetaFlac 1.1.3 stock =        134.125
Flac and MetaFlac 1.1.3 IC9 =           87.78
Flac and MetaFlac 1.1.3 john33 =       135.827 (MSVC6 compile from rarewares.org)

On a Core2 Duo CPU T7200 @ 2.00GHz with a 667 MHz bus speed:

Flac and MetaFlac 1.1.3 stock =         94.78
Flac and MetaFlac 1.1.3 IC9 =           68.547
Flac and MetaFlac 1.1.3 john33 =        95.780

The IC9 binaries seem to be about 28% to 35% faster than the stock version (measured as the reduction in elapsed time). All three versions of the binaries produced identical flac files, and the wavs decoded from those flacs were identical to the original wav. This is hardly exhaustive testing, so I suggest you do some serious tire-kicking before you use these IC9 binaries for "real" encoding.

Please post to this thread any evidence that these IC9 binaries are in any way broken in comparison to the stock 1.1.3 windows binaries.
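
For anyone who wants to repeat the "identical output" check described above, here is a minimal sketch in Python. It is not the method used in this post; it simply assumes that comparing SHA-256 digests of the output files is an acceptable test of identity. The function names and the demo file names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def all_identical(paths):
    """Return True if every file in `paths` has the same contents."""
    return len({file_digest(p) for p in paths}) <= 1


if __name__ == "__main__":
    # Demo with throwaway files; real use would pass the .flac files
    # produced by each build (names are up to you).
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.bin")
        b = os.path.join(d, "b.bin")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"same bytes")
        print(all_identical([a, b]))
```

The same function can be pointed at the decoded wavs and the original wav to confirm the round trip.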

PS: this IC9 compile uses the source files from CVS from a couple of days ago. I believe that this includes Josh's fix for the comma decimal locale bug.

edit: spelling
Emanuel
Thank you, I will certainly test this.
pepoluan
Will it run on AthlonXP 2400+ and/or Athlon64 ?
gharris999
QUOTE(pepoluan @ Dec 11 2006, 07:17) *

Will it run on AthlonXP 2400+ and/or Athlon64 ?

I doubt it, but there is no harm in trying. The Intel linker links in a dll which will give you an error message and abort the program if the CPU is incompatible.
pepoluan
QUOTE(gharris999 @ Dec 11 2006, 23:09) *
QUOTE(pepoluan @ Dec 11 2006, 07:17) *
Will it run on AthlonXP 2400+ and/or Athlon64 ?
I doubt it, but there is no harm in trying. The Intel linker links in a dll which will give you an error message and abort the program if the CPU is incompatible.
Okay I'll try it and post the results :)

Edit: Every single binary I execute as administrator returns:
CODE
The system cannot execute the specified program.
gharris999
Sorry. I forgot to include that Intel dll (libmmd.dll) in the post. I've fixed that now. Redownload and try again.
pepoluan
QUOTE(gharris999 @ Dec 12 2006, 00:02) *
Sorry. I forgot to include that Intel dll (libmmd.dll) in the post. I've fixed that now. Redownload and try again.
:P Bummer. No prob, will download again :)

pepoluan
Okay, downloaded the latest zip (it has the .dll). Ran it. No good. Same error.
gharris999
Well, perhaps AMD has some compiler tools that could be used to make a version optimized for those CPUs.
Klyith
a) It might be adding SSE3 instructions in the binary, which would need a fairly recent AMD processor to run (revision E Athlon X2). But AFAIK the error message in this case is "Fatal Error : This program was not built to run on the processor in your system". This is what I get when I run gharris999's flac encode (mine does SSE1&2).

b) Intel's compiler will definitely make AMD-compatible binaries, assuming the AMD processor supports all of the used x86 extensions.

c) AMD doesn't make any compilers of their own, though they have partnered with other companies to add optimizations, better support, etc to compilers.

d) If you have an AMD processor you're probably better off with a MS compile. There were some verified instances of Intel compilers "sandbagging" on AMD, though the main example was the Fortran compiler. Anyways, here is a good example. Interestingly, the new Core2Duo also works better with MS.

e) Why does everyone on HA call the Intel C Compiler "ICL" instead of "ICC" like everyone else?
john33
QUOTE(Klyith @ Dec 13 2006, 23:51) *

.......
e) Why does everyone on HA call the Intel C Compiler "ICL" instead of "ICC" like everyone else?

Because the Intel compiler executable is called 'icl.exe'. Good enough reason?!? :P
gharris999
QUOTE(Klyith @ Dec 13 2006, 15:51) *

a) It might be adding SSE3 instructions in the binary, which would need a fairly recent AMD processor to run

I think the optimization options I chose for that compile only require SSE2. I'll be happy to fiddle with some of the options and see if I can make a binary that will run on an AMD cpu. Unfortunately, I don't have an AMD machine to test with, so I'll just post a url to the binary here and someone else can test.
mat128
QUOTE(gharris999 @ Dec 11 2006, 03:26) *

<snip>


Hey, I was going to test your flac build on a high-end Prescott (3.6 GHz) but could not find any option to output the time taken to encode... How did you get it? I can't find anything similar to "time" under Linux.

Edit: Just tried on my AMD computer... It wouldn't run; it's an old Athlon XP with only SSE1 extensions.
And here's the error message the command prompt gave me:
CODE
C:\Documents and Settings\Mat128>"C:\Documents and Settings\Mat128\Desktop\flac-
1.1.3-IC9\flac-1.1.3-IC9\bin\flac.exe" "C:\Documents and Settings\Mat128\Desktop
\whats.wav"
The system cannot execute the specified program.
guruboolez
Tested this compile on my new Core2Duo E6300 ('Allendale').
I used two different files of same length and encoded both with foobar2000 in order to use both cores of my CPU. I used -8 -V setting and tested it with three compiles:
- the official one from the SourceForge site
- case's one, which fixes the apodization bug on systems with a comma-decimal locale (mine is French and uses a comma instead of a dot)
- the IC9 one

Results:
CODE

--------------|-------------|
encoder       |  enc. rate  |
--------------|-------------|
sourceforge   |    x23      |
case          |    x15      |
IC9           |    x40      |
--------------|-------------|


Your IC9 compile is really great. On my computer the encoder is nearly 75% faster than sourceforge's one and about 167% faster (2.67 times the speed) than case's one, which I have been using these last days. The good point is that I don't have to correct the command line to benefit from the new apodization function. Really nice! Thank you!

IMO this encoder or a similar one should be uploaded on an official place like sourceforge and be maintained when needed.

Edit : Conroe -> Allendale
Wombat
QUOTE(guruboolez @ Dec 14 2006, 11:45) *

Tested this compile on my new Core2Duo E6300 ('Conroe').

Pretender! ;)

It would be nice if someone with some IC knowledge could dig deeper. I had many so-called P4 compiles that had no problem at all with my A64 San Diego core.

Thanks in advance!
pepoluan
QUOTE(gharris999 @ Dec 14 2006, 13:28) *
QUOTE(Klyith @ Dec 13 2006, 15:51) *
a) It might be adding SSE3 instructions in the binary, which would need a fairly recent AMD processor to run
I think the optimization options I chose for that compile only require SSE2. I'll be happy to fiddle with some of the options and see if I can make a binary that will run on an AMD cpu. Unfortunately, I don't have an AMD machine to test with, so I'll just post a url to the binary here and someone else can test.

Some additional info: The errors I posted above happened when I tested on AMD Athlon64 3000+ (Orleans). CPU-Z says SSE3 is supported.

If you need help testing on AMD machines, I'll volunteer. I have Athlon64 and AthlonXP Barton.

Edit: PRating & Core name
Synthetic Soul
Won't run on my AMD Athlon 64 3000+ (Venice).

QUOTE
This program was not built to run on the processor in your system.
john33
Version 9 of the Intel compiler, unlike previous versions, can build versions that will only run on a real Intel processor and not another processor type even though it also supports the Intel microcode. This is another reason why I've been a little hesitant in offering some of these optimised compiles lately. When I have the time, I'll look into this more closely and see what can be done. Blacksword (of Lancer fame) did provide me with some code that is used in his Lancer builds for runtime processor optimisations, but I've not seen much difference in the speed of the compiles. Another thing I need to look at more closely. ;)

Just while we're talking hardware :P, I can manage to test many setups as I use a PentiumD 940 (3.2 Presler) for development, an E6400 (Allendale) modestly clocked to 2.4 for video work and testing, a Pentium 4 2.8 (Northwood M0/SL6Z5) as an HTPC and also for testing. Additionally, I have access to an Athlon64 FX57 system for testing, plus a PIII866 for further testing. All of these are running XP Pro SP2 except the FX57 which is running XP Home.
mat128
QUOTE(guruboolez @ Dec 14 2006, 04:45) *

<snip>


How are you getting the encoding rate?

Thanks.
guruboolez
foobar2000 reports it accurately.
Klyith
QUOTE(guruboolez @ Dec 14 2006, 04:45) *
Tested this compile on my new Core2Duo E6300 ('Allendale').
<snip>

Wow, that's quite significant. Interesting that the default build would be so unoptimized. There are chunks of assembler in there that I would have thought were doing all the heavy lifting already.

QUOTE
IMO this encoder or a similar one should be uploaded on an official place like sourceforge and be maintained when needed.
Not this one, seeing as it doesn't work with AMD processors at all. A universal ICC compile would be ok. We should also test all three sets of SSE extensions -- if using only SSE1 gets 99% of the performance gain, that should be used for maximum compatibility.

...
As soon as I have time I will try to put the source into MS VC2005 and do a compile. If that works I can produce both SSE1 and SSE1+2 builds and we can see what makes the most difference. Though at the moment I can't even get the nightly tarball; the SF.net cvs server is fubar. I don't want to waste time with the standard 1.1.3 when it will still have the comma bug.
jcoalson
there are hand-written asm routines on both the encoding and decoding side that use SSE, but the binaries I build do not have them turned on because they will crash if the OS doesn't support SSE, and it's a pain to check that portably at runtime.

you can build them with that if you define FLAC__SSE_OS, e.g. add
CODE
/D FLAC__SSE_OS
to the right parts of src/libFLAC/libFLAC_*.dsp

Josh
gharris999
OK. Check out http://home.earthlink.net/~gharris999/flac-1.1.3-IC9alt.zip I removed the "Require Intel Processor Extensions: Intel Pentium 4 and compatible Intel processors (/Qxn)" option, so maybe these binaries will work on AMD processors. Also, I've included my test timing script in the zip file for those who are interested in how to measure elapsed time in a batch file.


PS: also, I think the binaries in the 1st post need msvcr80.dll, which is included in the 2nd zip. This may explain why some folks couldn't execute the binaries.
Klyith
QUOTE(gharris999 @ Dec 14 2006, 17:37) *

OK. Check out http://home.earthlink.net/~gharris999/flac-1.1.3-IC9alt.zip I removed the "Require Intel Processor Extensions: Intel Pentium 4 and compatible Intel processors (/Qxn)" option, so maybe these binaries will work on AMD processors. Also, I've included my test timing script in the zip file for those who are interested in how to measure elapsed time in a batch file.
This one works but offers no speed improvements for me. Encodes at the same ~26x as the official compile.

QUOTE
PS: also, I think the binaries in the 1st post need msvcr80.dll, which is included in the 2nd zip. This may explain why some folks couldn't execute the binaries.
Doesn't help either. Your first build must have the intel-only block in it.

I tried using this interesting code that is supposed to remove the cpu check, but it didn't work. I'm not sure if things have changed since 2004 in the Intel compiler or what. Anyways gharris999, could you throw the source tarball that you have been working from on your webhost? I'd like to give it a shot but the sourceforge server is still down...
gharris999
QUOTE(Klyith @ Dec 14 2006, 16:47) *

Anyways gharris999, could you throw the source tarball that you have been working from on your webhost? I'd like to give it a shot but the sourceforge server is still down...

I really haven't touched anything in the code, aside from commenting out a couple of #defines. I just tried getting the source via CVS and got the whole package in under a minute:

cvs -d:pserver:anonymous@flac.cvs.sourceforge.net:/cvsroot/flac login

cvs -z3 -d:pserver:anonymous@flac.cvs.sourceforge.net:/cvsroot/flac co -P flac

gharris999
As per Josh above, I added the /D FLAC__SSE_OS define to the compile of libFLAC_static and libFLAC_dynamic and relinked flac.exe and metaflac.exe. It seems to knock a few more seconds off encode times.

See http://home.earthlink.net/~gharris999/flac-1.1.3-IC9sse.zip

Klyith
QUOTE
I just tried getting the source via CVS and got the whole package in under a minute:
Ok, that worked. I wonder why the nightly tarball link was totally unresponsive.

QUOTE
As per Josh above, I added the /D FLAC__SSE_OS define to the compile of libFLAC_static and libFLAC_dynamic and relinked flac.exe and metaflac.exe. It seems to knock a few more seconds off encode times.
Huh, it was slower for me. With all default settings, standard was 60x, icl9alt 61x, and icl9sse 58x.

...
Well, I just spent the morning trying to get a working compile out of MSVC 2005. I can't get it to work, it always dies in a hail of unresolved externals in the linking stage. They're all flac named functions so I don't think the problem is missing libraries or whatever. Ugh, this is bringing back all the bad memories of why I decided not to pursue compsci in college, and why I only work with scripted languages these days.
gharris999
QUOTE
Well, I just spent the morning trying to get a working compile out of MSVC 2005. I can't get it to work, it always dies in a hail of unresolved externals in the linking stage. They're all flac named functions so I don't think the problem is missing libraries or whatever. Ugh, this is bringing back all the bad memories of why I decided not to pursue compsci in college, and why I only work with scripted languages these days.


To get it to compile and link, I had to first make sure I had the latest version of nasm installed to the VS bin directory, and then copy the correct custom command lines and output dirs into the "Custom Build Step" fields for the property pages for the asm files in libFLAC_static and libFLAC_dynamic. The correct entries are:

CODE

custom build step for .nasm files:

Command Line: nasmw.exe -f win32 -d OBJ_FORMAT_win32 -i ia32/ ia32/cpu_asm.nasm -o ia32/cpu_asm.obj
Outputs: ia32/cpu_asm.obj

Command Line: nasmw.exe -f win32 -d OBJ_FORMAT_win32 -i ia32/ ia32/fixed_asm.nasm -o ia32/fixed_asm.obj
Outputs: ia32/fixed_asm.obj

Command Line: nasmw.exe -f win32 -d OBJ_FORMAT_win32 -i ia32/ ia32/lpc_asm.nasm -o ia32/lpc_asm.obj
Outputs: ia32/lpc_asm.obj


I then had to explicitly compile the asm files one at a time.

Then, make sure you have ogg_static.lib in /obj/release/lib. You'll need to download the libogg-1.1.3 project and build that too to get the library.

Then, back in the flac project, try commenting out #if _MSC_VER <= 1200 and the corresponding #endif in decode.c, encode.c, metadata_iterators.c, and stream_decoder.c.

Also, comment out the #include <mathf.h> in fast_float_math_hack.h
Also, comment out the #include <float.h> in replaygain_synthesis.c

This all was stuff I had to comment out to get the Intel compiler to work. You might not have to comment out all of those for the MSVC2005 compiler.

Then try building some of the smaller libraries first: utf8_static, replaygain_*, getopt_static, etc.

Then try building libFLAC_static & libFLAC_dynamic. Finally, build grabbag_static, flac & metaflac.

Basically, that's how I did it. That, and making myself ignore the hail of compiler warnings. Essentially, the MSVC2005 project converter does a less than perfect job converting the MSVC6 project file. Dependencies are left broken, etc.

Edit: mistakes
indybrett
For some reason, this build of FLAC doesn't work for me when used with EAC (.95 prebeta3). I'm using the exact same command line as I did with Josh's official build. The compression window is never displayed, and I'm left with just WAV files. I can't see what the exact error is.

-T "artist=%a" -T "title=%t" -T "album=%g" -T "date=%y" -T "tracknumber=%n" -T "genre=%m" -6 %s
gharris999
QUOTE(indybrett @ Dec 17 2006, 10:13) *

For some reason, this build of FLAC doesn't work for me when used with EAC (.95 prebeta3). I'm using the exact same command line as I did with Josh's official build. The compression window is never displayed, and I'm left with just WAV files. I can't see what the exact error is.

-T "artist=%a" -T "title=%t" -T "album=%g" -T "date=%y" -T "tracknumber=%n" -T "genre=%m" -6 %s

I don't use flac directly from EAC...I always extract to a whole-CD wav + cue and then use a separate batch file to encode, so I'm not sure I can help you. I will say, however, that you should check to make sure that you have libmmd.dll and msvcr80.dll either in a directory in your PATH or in the same directory as flac.exe.
bing
Hello, I would like to report my results.

Before I do that, I have one question - guruboolez exactly how do you get foobar to report the encoding factor? It's reported in real-time in the converter dialog box but disappears once the task is done.

I have an overclocked 2.89ghz P4 (single core, no HT, 644mhz bus speed) with 1gb single-channel DDR RAM running Windows XP SP2. Real-time firewall, virus scanner, spyware scanner, and IM client were running in background.

I used foobar 0.9.4.2 with the following compiles of flac, 1.13b, IC9, and IC9-SSE.

I first ripped The Fray album, How To Save A Life (12 tracks, 46:10) to WAV on my hard drive, and then used the foobar Converter with the following string, -s -8 -V - -o %d. No Replaygain was calculated or applied.

Version, converting time, x real-time (my calculation), % faster than stock
1.13b, 4:54, 9.4x
IC9, 2:56, 15.7x, 40% faster than 1.13b
IC9-SSE, 2:45, 16.8x, 44% faster than 1.13b
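
The figures in the list above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming "x real-time" means album length divided by encoding time and "% faster" means the reduction in elapsed time relative to 1.13b; the function names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '46:10' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def realtime_factor(audio_s, encode_s):
    """How many seconds of audio are encoded per second of wall time."""
    return audio_s / encode_s


def percent_less_time(baseline_s, new_s):
    """Reduction in elapsed time relative to the baseline, in percent."""
    return (baseline_s - new_s) / baseline_s * 100


if __name__ == "__main__":
    album = to_seconds("46:10")
    runs = {"1.13b": "4:54", "IC9": "2:56", "IC9-SSE": "2:45"}
    baseline = to_seconds(runs["1.13b"])
    for name, elapsed in runs.items():
        t = to_seconds(elapsed)
        print(f"{name}: {realtime_factor(album, t):.1f}x, "
              f"{percent_less_time(baseline, t):.0f}% less time")
```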

It confirms what most other people are posting - the IC9 and IC9-SSE compiles are faster for the P4 and later chips, and it is consistent with gharris999's benchmarks (his 3.06ghz P4 IC9 was 35% faster than 1.13b).

What's interesting to me is that gharris999 reports that there is only a 28% improvement with the IC9 compile for the Core 2 Duo over stock - does this represent Intel compiler immaturity or that the Core microarchitecture runs existing code more efficiently? Hopefully the former and we'll see a steady improvement in future compilers.

Happy holidays to all.
guruboolez
Look in the menu bar: View > Console.
The speed computation appears once the whole task is finished.
Jebus
QUOTE(jcoalson @ Dec 14 2006, 15:07) *

there are hand-written asm routines on both the encoding and decoding side that use SSE, but the binaries I build do not have them turned on because they will crash if the OS doesn't support SSE, and it's a pain to check that portably at runtime.

you can build them with that if you define FLAC__SSE_OS, e.g. add
CODE
/D FLAC__SSE_OS
to the right parts of src/libFLAC/libFLAC_*.dsp

Josh


This post seems to have been ignored... anyone (John33?) able to do a compile with the SSE assembly activated?

It seems silly to play with compiler flags when the hand-rolled assembly functions are simply being skipped.
john33
QUOTE(Jebus @ Dec 21 2006, 18:44) *

This post seems to have been ignored... anyone (John33?) able to do a compile with the SSE assembly activated?

It seems silly to play with compiler flags when the hand-rolled assembly functions are simply being skipped.

In a way, I suppose I have kind of ignored it. However, so far, and I'll admit not having spent a large amount of time on it, I've tried a number of different compiler optimisations, and included Josh's compiler switch, and on my PentiumD 940 and E6400 systems not one optimisation effort has made even the smallest difference to the encoding times I see!! When I have the time I'll look at it more closely, but I'm not having much luck at the moment. I should add that the OP's compiles also run no faster here than my various attempts. I haven't given up on it but I'm not full of inspiration just now!!
jcoalson
there are two things that could be happening

1) the sse functions are not being triggered by FLAC__SSE_OS because of compiler or runtime settings (e.g. the cpu detection code in libFLAC may not be working). this will take some stepping through the code in the debugger to find out

2) the speed gain as a % of runtime will not be as significant for -8 as for -5 as only the autocorrelation routine has a specific SSE version right now. autocorrelation is less of a % of runtime in -8

Josh
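
Point 2 is essentially Amdahl's law: speeding up one routine helps the whole run only in proportion to the share of runtime that routine takes. A small Python sketch of that reasoning follows. The fractions (40% at -5, 15% at -8) and the 2x routine speedup are invented purely for illustration; they are not measurements of libFLAC.

```python
def overall_speedup(fraction, routine_speedup):
    """Amdahl's law: whole-program speedup when a routine that takes
    `fraction` of the runtime becomes `routine_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / routine_speedup)


if __name__ == "__main__":
    # Hypothetical shares of runtime spent in autocorrelation.
    for label, fraction in (("-5", 0.40), ("-8", 0.15)):
        gain = overall_speedup(fraction, 2.0)
        print(f"{label}: overall {gain:.2f}x faster")
```

With these made-up numbers the same SSE routine gives a clearly larger overall gain at -5 than at -8, which matches the direction of the argument above.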
gharris999
QUOTE(Jebus @ Dec 21 2006, 10:44) *

This post seems to have been ignored... anyone (John33?) able to do a compile with the SSE assembly activated?

It seems silly to play with compiler flags when the hand-rolled assembly functions are simply being skipped.

I didn't ignore that. See this post: http://www.hydrogenaudio.org/forums/index....0917&st=25#
Jebus
QUOTE(gharris999 @ Dec 21 2006, 13:10) *


Ah, missed that. Thanks!
gharris999
QUOTE(bing @ Dec 21 2006, 08:45) *

What's interesting to me is that gharris999 reports that there is only a 28% improvement with the IC9 compile for the Core 2 Duo over stock - does this represent Intel compiler immaturity or that the Core microarchitecture runs existing code more efficiently?

I think you are right about that. I'm seeing the stock flac 1.1.3 binary run 93% faster on the 2 ghz Core 2 Duo vs. on a 2.4 ghz Pentium 4.

I'm also seeing real differences in terms of the % speed gain of my IC9sse compile vs. stock 1.1.3 on the three different machines with which I can test. On my oldest machine, "Papa Bear", a 3 ghz P4 circa 2002 with an 850 chipset and rambus (remember them?) memory, the IC9sse compile is 63% faster than the stock 1.1.3 binary from source forge. [B is % faster than A == ((A/B)-1)*100] On my newest machine, "Baby Bear", a Thinkpad with a Core 2 Duo, the IC9sse compile shows the lowest % speed increase over the stock binary: 43%.
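
Note that two different conventions for "% faster" are in use in this thread, and they give different numbers for the same timings. The Python sketch below shows both: the bracketed formula above, and the reduction in elapsed time, which appears to be what the 28% to 35% figures in the first post correspond to. Function names are hypothetical; the timings are taken from the first post.

```python
def percent_faster(a, b):
    """B is this much faster than A: ((A / B) - 1) * 100."""
    return (a / b - 1) * 100


def percent_less_time(a, b):
    """B takes this much less time than A: (1 - B / A) * 100."""
    return (1 - b / a) * 100


if __name__ == "__main__":
    timings = {
        "Pentium 4 3.06GHz": (134.125, 87.78),
        "Core2 Duo T7200": (94.78, 68.547),
    }
    for cpu, (stock, ic9) in timings.items():
        print(f"{cpu}: {percent_faster(stock, ic9):.1f}% faster, "
              f"{percent_less_time(stock, ic9):.1f}% less time")
```

When comparing results from different posters it helps to check which of the two is meant.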

What is interesting to me, though, is my "Mama Bear" system: while its CPU is clocked slower, it is a newer CPU with a faster bus and faster memory. Yet the "Papa Bear" system executes both the stock binary and the IC9sse binary faster than the raw difference in CPU GHz would suggest.

Clearly, these IC9 compiles seem to like some Intel processors better than others. I can't speak to the results on AMD processors as I don't have one to test with.

Here is the test script I'm using:
CODE

@echo off
rem Test file to test various FLAC & Metaflac routines..
del elapsed.txt
del time.txt
del *.flac

copy cpuinfo.txt elapsed.txt
echo %DATE% %TIME% >>elapsed.txt
echo %LOGONSERVER% %PROCESSOR_IDENTIFIER% >>elapsed.txt

echo FLAC and Metaflac 1.1.3 stock
stopwatch start > time.txt
flac_113.exe  -s -5 --replay-gain  --padding=98304 "blip.wav" -o "blip_113.flac"
metaflac_113.exe --no-utf8-convert --set-tag-from-file="CUESHEET=blip.cue" "--import-cuesheet-from=blip.cue" "blip_113.flac"
metaflac_113.exe --import-picture-from="|image/jpeg|||blip.jpg" "blip_113.flac"
echo Flac and MetaFlac 1.1.3 stock = >>elapsed.txt
stopwatch stop < time.txt >>elapsed.txt


echo FLAC and Metaflac 1.1.3 IC9sse
stopwatch start > time.txt
flac_113_IC9sse.exe -s -5 --replay-gain  --padding=98304 "blip.wav" -o "blip_113_IC9sse.flac"
metaflac_113_IC9sse.exe --no-utf8-convert --set-tag-from-file="CUESHEET=blip.cue" "--import-cuesheet-from=blip.cue" "blip_113_IC9sse.flac"
metaflac_113_IC9sse.exe --import-picture-from="|image/jpeg|||blip.jpg" "blip_113_IC9sse.flac"
echo Flac and MetaFlac 1.1.3 IC9sse =  >>elapsed.txt
stopwatch stop < time.txt >>elapsed.txt

:end
pause
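
For those without the stopwatch utility used in the batch file, the same kind of elapsed-time measurement can be sketched in Python. This is an alternative under stated assumptions, not the script used for the results here: it assumes a Python interpreter is available, and the demo command is just a placeholder.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises if the command fails
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. For a real test,
    # substitute the encoder call from the batch file, for example
    # ["flac_113.exe", "-s", "-5", "--replay-gain", "blip.wav",
    #  "-o", "blip_113.flac"].
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"elapsed = {elapsed:.3f} s")
```

Timing the flac and metaflac steps together, as the batch file does, would just mean summing the durations of the individual calls.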


..and here are the results from my three machines:
CODE

System: PapaBear, circa 2002 Asus P4T533-C
850 chipset, memory controller: 82850/82850E, I/O: 8280 1BA (ICH2)
Intel(R) Processor Identification Utility
Version: 3.2.20061121
Number of processors in system: 1
Current processor: #1
Cores per processor: 1
Processor Name: Intel(R) Pentium(R) 4 CPU 3.06GHz
Type: 0
Family: F
Model: 2
Stepping: 7
Revision: 37
L1 Trace Cache: 12 Kµops
L1 Data Cache: 8 KB
L2 Cache: 512 KB
Packaging: FC-PGA2
EIST: No
MMX(TM): Yes
SIMD: Yes
SIMD2: Yes
SIMD3: No
Enhanced Halt State: No
Execute Disable Bit: No
Hyper-Threading Technology: Yes
Intel(R) Extended Memory 64 Technology: No
Intel(R) Virtualization Technology: No
Expected Processor Frequency: 3.06 GHz
Reported Processor Frequency: 3.06 GHz
Expected System Bus Frequency: 533 MHz
Reported System Bus Frequency: 533 MHz
*************************************************************
Thu 12/21/2006 14:12:00.48
\\OVERLOOK x86 Family 15 Model 2 Stepping 7, GenuineIntel
Flac and MetaFlac 1.1.3 stock =     136.562
Flac and MetaFlac 1.1.3 IC9sse =       83.672

=============================================================
System: MamaBear, circa 2004, ASUS P4C800E-DX
875 chipset, memory controller: 82875P, I/O: 8280 1EB/ER (ICH5/ICH5R)
Intel(R) Processor Identification Utility
Version: 3.2.20061121
Number of processors in system: 1
Current processor: #1
Cores per processor: 1
Processor Name: Intel(R) Pentium(R) 4 CPU 2.40C GHz
Type: 0
Family: F
Model: 2
Stepping: 9
Revision: 2E
L1 Trace Cache: 12 Kµops
L1 Data Cache: 8 KB
L2 Cache: 512 KB
Packaging: FC-PGA2
EIST: No
MMX(TM): Yes
SIMD: Yes
SIMD2: Yes
SIMD3: No
Enhanced Halt State: No
Execute Disable Bit: No
Hyper-Threading Technology: Yes
Intel(R) Extended Memory 64 Technology: No
Intel(R) Virtualization Technology: No
Expected Processor Frequency: 2.40 GHz
Reported Processor Frequency: 2.40 GHz
Expected System Bus Frequency: 800 MHz
Reported System Bus Frequency: 800 MHz
*************************************************************
Thu 12/21/2006 13:23:34.60
\\MEDIASRV x86 Family 15 Model 2 Stepping 9, GenuineIntel
Flac and MetaFlac 1.1.3 stock =     181.156
Flac and MetaFlac 1.1.3 IC9sse =      122.688

=============================================================
System: BabyBear, Lenovo ThinkPad T60 2613-HNU
945PM Express Chipset, memory controller: 82945PM, I/O: 82801GBM/GHM
Intel(R) Processor Identification Utility
Version: 3.2.20061121
Number of processors in system: 1
Current processor: #1
Cores per processor: 2
Processor Name: Intel(R) Core(TM)2 Duo CPU T7200 @ 2.00GHz
Type: 0
Family: 6
Model: F
Stepping: 6
Revision: 48
L1 Instruction Cache: 2 x 32 KB
L1 Data Cache: 2 x 32 KB
L2 Cache: 4 MB
Packaging: µFCPGA/µFCBGA
EIST: Yes
MMX(TM): Yes
SIMD: Yes
SIMD2: Yes
SIMD3: Yes
Enhanced Halt State: No
Execute Disable Bit: Yes
Hyper-Threading Technology: No
Intel(R) Extended Memory 64 Technology: Yes
Intel(R) Virtualization Technology: Yes
Expected Processor Frequency: 2.0 GHz
Reported Processor Frequency: 2.0 GHz
Expected System Bus Frequency: 667 MHz
Reported System Bus Frequency: 667 MHz
*************************************************************
Thu 12/21/2006 14:16:36.70
\\WGHT60 x86 Family 6 Model 15 Stepping 6, GenuineIntel
Flac and MetaFlac 1.1.3 stock =     93.969
Flac and MetaFlac 1.1.3 IC9sse =      65.843
=============================================================

edit: spelling
Khaine
Have you tried compiling with gcc ?

I would be interested to see how it compares to icc
yong
A simple FLAC encoding speed test on my P4 2.4GHz :
FLAC binary from gharris999, and GCC compiled FLAC(cvs) with asm, sse enabled.
Too bad I forgot which GCC version I used to compile FLAC :P

This log was copied from fb2k 0.9.4.2
CODE
[13:50:28] CLI encoder: C:\CODEC\flac-icl.exe
[13:50:28] Destination file: D:\Documents and Settings\Chen Yong\Desktop\M07.flac
[13:50:28] "C:\CODEC\flac-icl.exe" -8 - -o "M07.flac"
[13:50:28] directory: D:\Documents and Settings\Chen Yong\Desktop\
[13:50:37] Total encoding time: 0:09.766, 11.28x realtime

[13:51:07] CLI encoder: C:\CODEC\flac-gcc.exe
[13:51:07] Destination file: D:\Documents and Settings\Chen Yong\Desktop\M07.flac
[13:51:07] "C:\CODEC\flac-gcc.exe" -8 - -o "M07.flac"
[13:51:07] directory: D:\Documents and Settings\Chen Yong\Desktop\
[13:51:27] Total encoding time: 0:19.969, 5.51x realtime

[13:55:30] CLI encoder: C:\CODEC\flac-icl.exe
[13:55:30] Destination file: D:\Documents and Settings\Chen Yong\Desktop\M07.flac
[13:55:30] "C:\CODEC\flac-icl.exe" -5 - -o "M07.flac"
[13:55:30] directory: D:\Documents and Settings\Chen Yong\Desktop\
[13:55:33] Total encoding time: 0:03.125, 35.25x realtime

[13:55:52] CLI encoder: C:\CODEC\flac-gcc.exe
[13:55:52] Destination file: D:\Documents and Settings\Chen Yong\Desktop\M07.flac
[13:55:52] "C:\CODEC\flac-gcc.exe" -5 - -o "M07.flac"
[13:55:52] directory: D:\Documents and Settings\Chen Yong\Desktop\
[13:55:56] Total encoding time: 0:03.203, 34.40x realtime


The gcc-compiled FLAC (asm, sse) at -8 takes about twice as long to encode as the FLAC icl build from gharris999 :o
http://www.geocities.com/y0ngc/flac-gcc.zip Uploaded gcc 3.4.5(Mingw special) compiled FLAC for those who are interested to test it.
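
To compare runs without reading the log by eye, the "Total encoding time" lines can be parsed with a short Python sketch. It assumes the time is always printed as minutes:seconds.milliseconds, as in the log above; the pattern and function name are made up for illustration and are not part of foobar2000.

```python
import re

# The two -8 results copied from the log above.
LOG = """\
[13:50:37] Total encoding time: 0:09.766, 11.28x realtime
[13:51:27] Total encoding time: 0:19.969, 5.51x realtime
"""

PATTERN = re.compile(
    r"Total encoding time: (\d+):(\d+\.\d+), ([\d.]+)x realtime"
)


def parse_times(log_text):
    """Return a list of (seconds, realtime_factor) tuples from a log."""
    results = []
    for m in PATTERN.finditer(log_text):
        seconds = int(m.group(1)) * 60 + float(m.group(2))
        results.append((seconds, float(m.group(3))))
    return results


if __name__ == "__main__":
    (icl_s, _), (gcc_s, _) = parse_times(LOG)
    print(f"gcc takes {gcc_s / icl_s:.2f}x as long as icl at -8")
```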

edit: Uploaded GCC4.2.0 compiled FLAC with following compiler options:
configure --enable-sse --disable-shared CFLAGS="-O3 -march=i686 -mtune=i686 -msse -ffast-math -s"
http://geocities.com/y0ngc/flac-gcc4.zip
Khaine
I don't know if -ffast-math is a good idea, from the gcc manual:

QUOTE

-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only and
-fno-signaling-nans.

This option causes the preprocessor macro __FAST_MATH__ to be defined.

This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.


Also, -march implies -mtune, so the -mtune is redundant.
Jebus
I'd use:

CFLAGS="-O3 -march=pentium3 -mfpmath=sse -ffast-math -fomit-frame-pointer -s"

As mentioned, mtune is superfluous.

Try with and without -ffast-math... it COULD be dangerous, but normally it's pretty safe to use.

-fomit-frame-pointer breaks debugging, so it's disabled for x86 by default. Might as well turn it on though since we don't care about debugging, and the extra register could speed things up.

Since we're turning on sse anyhow, might as well build for the pentium 3 (lowest common SSE denominator).

Without -mfpmath=sse, the 387 floating point unit is used for all FP math, even with -msse (which is included when using -march=pentium3, by the way).

It is also possible that -O3 is slowing things down vs. -O2. It's probably okay, but you might want to test both.