FLAC, SSE3 support?
Hydrogenaudio Forums > Lossless Audio Compression > FLAC
agentk7
QUOTE(Synthetic Soul @ Jan 18 2007, 06:21)

Here are my results for my AMD Athlon XP 2400+ (CPU-Z report):

CODE
          |               Official                  |               IC9sseW
Setting   | Filesize       Comp %    Enc     Dec    | Filesize       Comp %    Enc     Dec
==========+=========================================+=====================================
0         | 1506304162    70.744%    95x    108x    | 1506304162    70.744%    90x    126x
5         | 1403710148    65.926%    40x     97x    | 1403845807    65.932%    37x    112x
6         | 1402766383    65.881%    36x     97x    | 1402903529    65.888%    37x    111x
7         | 1400824078    65.790%    12x     97x    | 1400902731    65.794%    12x    112x
8         | 1397210593    65.621%    10x     96x    | 1397336348    65.626%     9x    111x
8 -Ax2    | 1395443983    65.538%     6x     95x    | 1395572750    65.544%     5x    108x

As you can see, encoding speeds appear to be slightly impaired, but decoding speeds are reasonably improved. As noted previously, the IC9sseW compile produces slightly larger files, although curiously not with -0 (presumably this is because it does not use the same filters as the others).


Hi SS,

I thought the idea of using the IC9sseW was to test the SSE2 optimizations. Unfortunately, Athlon XP processors do not have SSE2, just SSE.

Athlon64 processors have supported SSE2 from the start. AMD later introduced SSE3 with its revision E chips for Sockets 754/939/940. All Socket 939 X2 dual-core and Socket AM2 chips support SSE3.

If you're not sure which variant of Athlon64 you have, just use a utility like CPU-Z, which will show exactly what the processor supports.
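CPU-Z is a Windows tool. On Linux the same information is exposed in /proc/cpuinfo, so as a rough equivalent, here is a minimal Python sketch (assuming a Linux system; the function names are made up for illustration) that reports SSE, SSE2 and SSE3 support. Note that the kernel lists SSE3 under its original name, `pni`.

```python
import os


def simd_support(flags_line: str) -> dict[str, bool]:
    """Report SSE/SSE2/SSE3 support from one /proc/cpuinfo "flags" line.

    Linux lists SSE3 as "pni" (Prescott New Instructions), not "sse3".
    """
    _, _, value = flags_line.partition(":")
    flags = set(value.split())
    return {"SSE": "sse" in flags, "SSE2": "sse2" in flags, "SSE3": "pni" in flags}


def first_flags_line(path: str = "/proc/cpuinfo") -> str:
    """Return the first "flags" line (there is one per logical CPU)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return line
    return ""


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        print(simd_support(first_flags_line()))
    else:
        print("no /proc/cpuinfo here; use CPU-Z or a similar tool instead")
```

On an Athlon XP this should report SSE only; an Athlon64 revision E or an X2 should report all three.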
Martin H
@agentk7

Exactly, mate :)

I forgot to ask Synthetic Soul which AMD CPU he had when I suggested which compiles to test.


CU, Martin.
Synthetic Soul
QUOTE(Martin H @ Jan 15 2007, 16:54)
From the "Quick-Reference Guide to Optimization with Intel® Compilers" :
http://cache-www.intel.com/cd/00/00/22/23/222300_222300.pdf
...
W – Generate SSE2 and SSE instructions and optimize for the Intel Pentium4
processor, Intel Xeon processor with SSE2, and other compatible processors that
include SSE2 and SSE such as AMD* processors.
It seems that the switch does provide some benefit for Athlon XP processors/SSE. I didn't need this quote to make that statement; my results bear this out. Surely the fact that the IC9sseW compile decompresses 15% faster than the stock build is worth noting?
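A quick way to check the "15% faster" figure is to recompute the per-setting decoding gain from the x-realtime columns of the Athlon XP table. A minimal Python sketch; the numbers are copied from that table, and the simple mean across settings is only an illustrative summary, not how the figure was originally derived:

```python
# Decoding speeds (x realtime) copied from the Athlon XP 2400+ table quoted above.
official_dec = {"-0": 108, "-5": 97, "-6": 97, "-7": 97, "-8": 96, "-8 -Ax2": 95}
ic9ssew_dec = {"-0": 126, "-5": 112, "-6": 111, "-7": 112, "-8": 111, "-8 -Ax2": 108}


def speed_gain(baseline: float, candidate: float) -> float:
    """Percentage gain of candidate over baseline; both are speeds (higher is faster)."""
    return (candidate / baseline - 1.0) * 100.0


gains = {s: speed_gain(official_dec[s], ic9ssew_dec[s]) for s in official_dec}
for setting, gain in gains.items():
    print(f"{setting:8} {gain:+5.1f}%")

mean_gain = sum(gains.values()) / len(gains)
print(f"{'mean':8} {mean_gain:+5.1f}%")
```

Every setting lands between roughly 14% and 17%, so "15%" is a fair one-number summary for this machine.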

QUOTE(agentk7 @ Jan 19 2007, 17:30)
I thought the idea of using the IC9sseW was to test the SSE2 optimizations. Unfortunately, Athlon XP processors do not have SSE2, just SSE.
I must admit that I am quite confused by the whole affair (although I knew from a previous thread that my XP did not support SSE2), but as I understand it, this thread has actually become a testbed to decide the best compile for every type of processor. It would be nice to have test data varied and comprehensive enough that we could categorically state which build should be used on which PC. Considering that decompression speed is high on the list of FLAC's pros, I would have thought that even we lowly Athlon XP owners ;) would be pleased to have the option of a 15% increase.

You guys may only be interested in SSE2 and SSE3 improvements, but XP owners couldn't give a fetid dingo's kidney about that!

QUOTE(agentk7 @ Jan 19 2007, 17:30)
If you're not sure which variant of Athlon64 you have, just use a utility like CPU-Z, which will show exactly what the processor supports.
You will notice that I link to a CPU-Z report in the post you have quoted. I uploaded CPU-Z reports for my PC at home, personal laptop, and PC at work a little while back, explicitly for such circumstances.

You will see from the report that my laptop does support SSE2 and SSE3. I would never use it to encode important files, but I will try to run my tests on it this weekend to get some more (relevant?) data for this thread.

The level of testing is a bit disappointing thus far. Considering the number of FLAC users, and all those members raving about the speed etc., it never fails to amaze me how few are actually prepared to spend a little time helping the authors improve their work...
Jebus
QUOTE(Synthetic Soul @ Jan 19 2007, 12:50)

The level of testing is a bit disappointing thus far. Considering the number of FLAC users, and all those members raving about the speed etc., it never fails to amaze me how few are actually prepared to spend a little time helping the authors improve their work...


I've been busy bug-squashing Omni Encoder in my free time, but I'd be happy to run a large batch test this weekend.

Athlon64 X2 (has SSE3)
Martin H
QUOTE(Synthetic Soul @ Jan 19 2007, 20:50)

It seems that the switch does provide some benefit for Athlon XP processors/SSE. I didn't need this quote to make that statement; my results bear this out. Surely the fact that the IC9sseW compile decompresses 15% faster than the stock build is worth noting?

Hi Synthetic Soul :)

According to the docs, AMD CPUs that don't support SSE2 would be using the generic code path and not the SSE/SSE2-optimized one. However, the compile you tested also enables some hand-written SSE assembly in the FLAC code, which would benefit AMD SSE CPUs. Also, I must admit that I went a little fast over your test results, as I was in a hurry at the time, and I missed the part about you getting better decoding speeds. Sorry :)
QUOTE

You guys may only be interested in SSE2 and SSE3 improvements, but XP owners couldn't give a fetid dingo's kidney about that!

Dear Synthetic Soul, I am very interested in having optimized SSE compiles of FLAC to help out SSE-only owners, and my comment was not meant like that at all :) Two days ago I studied the docs to find out which ICL 9.1 flag would benefit SSE AMDs most (/QxK), and I was about to ask gharris999 to please consider making one. But I decided not to, just before posting the message actually, since I had already pushed my luck getting him to make two extra compiles besides his original ones, and I thought it would be better if another user suggested it :)

CU, Martin.
Synthetic Soul
Thanks for your response Martin.

QUOTE(Martin H @ Jan 19 2007, 22:00)
However, the compile you tested also enables some hand-written SSE assembly in the FLAC code, which would benefit AMD SSE CPUs.
Yes, I did wonder about that. Either way, the improvement is there :) (Well, it would be nice to see some other people's decoding tests to confirm...)

QUOTE(Martin H @ Jan 19 2007, 22:00)
Two days ago I studied the docs to find out which ICL 9.1 flag would benefit SSE AMDs most (/QxK), and I was about to ask gharris999 to please consider making one. But I decided not to, just before posting the message actually, since I had already pushed my luck getting him to make two extra compiles besides his original ones, and I thought it would be better if another user suggested it :)
Well, I think you just have ;) but if not, I'll request it :) I'll gladly re-test it on my PC.

As you know, I'm not a FLAC user, but I really do appreciate your input into this thread, and gharris999's hard work in compiling the source and starting all this with his P4 compiles. It's active members like yourselves who improve things for everyone. I just like seeing the whole scene go from strength to strength; personally I'm really looking forward to David's forthcoming processor optimisations to WavPack.

Cheers.
clb3092
QUOTE(Synthetic Soul @ Jan 19 2007, 11:50)

The level of testing is a bit disappointing thus far. Considering the number of FLAC users, and all those members raving about the speed etc., it never fails to amaze me how few are actually prepared to spend a little time helping the authors improve their work...


You made me feel guilty, so I ran the tests. I did things a little differently. First, I used a very large WAV file for input (687,477,884 bytes). Second, I ran the test of the stock flac binary against the SSE3 and SSE3SSE binaries (the latter being the version with "Also added /D "FLAC__SSE_OS" as per Josh Coalson"). I'm more interested in SSE3 because I have an Intel SSE3-capable CPU.

My basic CPU-Z specs can be obtained at http://valid.x86-secret.com/show_oc.php?id=158682

In case you're curious what the heck sort of WAV file is that big, has a cue sheet, and album art, you can get the details at http://www.archive.org/details/pf2005-11-26_dsbd
I just uncompressed the entire first set into a single wav file with cuesheet using foobar. The album art I downloaded from http://www.phillesh.net/philzonepages/frie...f/download.html

The results:
Flac and MetaFlac 1.1.3 stock: 203.172
Flac and MetaFlac 1.1.3 IC9sse3: 159.125
Flac and MetaFlac 1.1.3 IC9sse3sse: 142.625

Damn! I'd say Josh was correct!

clb3092

****By the way, this test was done at compression level -5
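Because these are run times rather than speeds, the percentages flip depending on which way you compute them: a 30% cut in time is a bigger than 30% speedup. A minimal Python sketch using the three timings above (assuming they are wall-clock seconds, which the post doesn't state):

```python
# Times reported above for the 687,477,884-byte WAV at -5 (assumed to be seconds).
stock_time = 203.172
build_times = {"IC9sse3": 159.125, "IC9sse3sse": 142.625}

speedups = {}
for name, t in build_times.items():
    speedups[name] = stock_time / t            # >1.0 means faster than stock
    time_saved = (1.0 - t / stock_time) * 100  # share of the stock run time saved
    print(f"{name:11} {speedups[name]:.2f}x faster, {time_saved:.1f}% less time")
```

By this measure the IC9sse3sse build cuts the run time by roughly 30%, which corresponds to roughly 1.42x the stock speed.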
Jebus
Okay, ran 3 wav files (by TV on the Radio, if you care) on my Athlon64 X2 3800+ (@ 2500MHz, overclocked). This chip has SSE, SSE2 and SSE3 instruction sets.

Results for standard settings (no options):
CODE

Flac stock:      10.687
Flac IC9sseW:    11.422

That's right; the "optimized" build was a bit slower :(

Results at high compression (-8):
CODE

Flac stock -8:   61.797
Flac IC9sseW -8: 58.094

A bit of an improvement here, but nothing to write home about.

How about a 64-bit build? The extra registers available in x86-64 mode might make more of a difference. Also, omitting the frame pointer might help.
Synthetic Soul
QUOTE(clb3092 @ Jan 20 2007, 06:21)
You made me feel guilty so I ran the tests.
...
Damn! I'd say Josh was correct!
Good. :P

Hopefully, judging from the exclamation above, you found the results interesting and beneficial.

I wonder what the outcome of this testing will be. I would hope to see various binaries made available either on the official site or perhaps on Rarewares, if John would like to take over the mantle of making alternative compilations (as gharris999 can't live in the past forever... ;) ).

QUOTE(Jebus @ Jan 20 2007, 07:05)
That's right; the "optimized" build was a bit slower :(
....
How about a 64-bit build? The extra registers available in x86-64 mode might make more of a difference.
If there are potential speed improvements then it would be interesting. As Martin posted earlier, though, I'm a little wary of overloading gharris999 with requests. :/

Can someone please try some decoding tests?

NB: If anyone would like to test but doesn't know how, I believe some scripts were made available at the beginning of this thread. I use my own scripts, discussed in this thread.
Martin H
@Synthetic Soul

Cheers, mate
QUOTE(Synthetic Soul @ Jan 20 2007, 08:10)

Can someone please try some decoding tests?

Intel Celeron 1.7 GHz (identical to a P4 except for having only half the L2 cache).

Encoding Test :
------------------

Image.wav : 656 MB

Encoding : Default (-5) and -A "tukey(0,5)" on stock compile.

Stock :
--------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 15.752 = 00:00:15.752 = 8%
User Time = 154.792 = 00:02:34.792 = 83%
Process Time = 170.545 = 00:02:50.545 = 91%
Global Time = 186.228 = 00:03:06.228 = 100%

IC9sse :
----------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 15.672 = 00:00:15.672 = 9%
User Time = 134.763 = 00:02:14.763 = 79%
Process Time = 150.436 = 00:02:30.436 = 88%
Global Time = 169.744 = 00:02:49.744 = 100%

IC9sseW :
------------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 16.814 = 00:00:16.814 = 9%
User Time = 135.564 = 00:02:15.564 = 78%
Process Time = 152.379 = 00:02:32.379 = 88%
Global Time = 172.057 = 00:02:52.057 = 100%

---------------------------------------------------------------------------------------------------------------------

Decoding Test:
-----------------

Image.flac : 336 MB

Stock :
--------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 17.344 = 00:00:17.344 = 17%
User Time = 57.122 = 00:00:57.122 = 58%
Process Time = 74.467 = 00:01:14.467 = 76%
Global Time = 97.270 = 00:01:37.270 = 100%

IC9sse :
----------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 17.605 = 00:00:17.605 = 21%
User Time = 34.820 = 00:00:34.820 = 43%
Process Time = 52.425 = 00:00:52.425 = 64%
Global Time = 80.826 = 00:01:20.826 = 100%

IC9sseW :
------------

Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 17.415 = 00:00:17.415 = 21%
User Time = 35.350 = 00:00:35.350 = 44%
Process Time = 52.765 = 00:00:52.765 = 65%
Global Time = 80.326 = 00:01:20.326 = 100%



CU, Martin.
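Timer 3.01 reports CPU time (Kernel/User/Process) separately from wall-clock time (Global). A small Python sketch that extracts those values from logs like the ones above (the line format is inferred from these logs, not from Timer's documentation) and compares the stock and IC9sse decoding runs:

```python
import re

# One "<Kind> Time = <seconds> = <hh:mm:ss.mmm> = <percent>%" line per measurement.
TIMER_LINE = re.compile(r"^\s*(Kernel|User|Process|Global) Time\s*=\s*([0-9.]+)", re.MULTILINE)


def parse_timer(output: str) -> dict[str, float]:
    """Extract the seconds value for each Timer 3.01 line found in output."""
    return {kind: float(seconds) for kind, seconds in TIMER_LINE.findall(output)}


# Decoding logs copied from the post above (Celeron 1.7 GHz, Image.flac).
stock_decode = """\
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 17.344 = 00:00:17.344 = 17%
User Time = 57.122 = 00:00:57.122 = 58%
Process Time = 74.467 = 00:01:14.467 = 76%
Global Time = 97.270 = 00:01:37.270 = 100%
"""

ic9sse_decode = """\
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10

Kernel Time = 17.605 = 00:00:17.605 = 21%
User Time = 34.820 = 00:00:34.820 = 43%
Process Time = 52.425 = 00:00:52.425 = 64%
Global Time = 80.826 = 00:01:20.826 = 100%
"""

stock = parse_timer(stock_decode)
ic9sse = parse_timer(ic9sse_decode)
for kind in ("User", "Global"):
    print(f"{kind:6} time: {stock[kind] / ic9sse[kind]:.2f}x faster with IC9sse")
```

For these two runs the User-time gain is much larger than the Global-time gain, while Kernel time barely moves, which suggests that a good part of the wall-clock time is spent outside the decoder itself (for example, waiting on disk I/O).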
Jebus
Decoding test on Athlon 64 X2 (with SSE, SSE2 and SSE3):

CODE

Flac stock:     15.360
Flac IC9sseW:   14.500


So, a little improvement here. This was a single -8 compressed file, 25 minutes long.
Synthetic Soul
Results for my laptop (AMD Turion 64 Mobile ML-36 supporting SSE2 and SSE3):

CODE
          |         Official         |         IC9sseW
Setting   |  Comp %     Enc     Dec  |  Comp %     Enc     Dec
==========+==========================+==========================
-0        | 64.251%    104x    114x  | 64.251%    105x    155x
-5        | 60.120%     61x    111x  | 60.128%     55x    144x
-8        | 59.805%     12x    109x  | 59.817%     12x    147x
-8 -Ax2   | 59.723%      7x    112x  | 59.736%      7x    144x

As before I am seeing quite an improvement in decoding speed. Encoding speed is the same, or a little worse for -5.
drbeachboy
Intel Pentium 4, Code Name: Willamette, Specification: Intel® Pentium® 4 CPU 1300MHz, Instruction sets: MMX, SSE, SSE2

Encoding: Using only FLAC (No Metaflac)
FLAC 1.1.3 Stock
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 4.135 = 00:00:04.135 = 1%
User Time = 212.265 = 00:03:32.265 = 73%
Process Time = 216.401 = 00:03:36.401 = 74%
Global Time = 288.825 = 00:04:48.825 = 100%

FLAC 1.1.3 IC9sse
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 3.865 = 00:00:03.865 = 2%
User Time = 123.016 = 00:02:03.016 = 74%
Process Time = 126.882 = 00:02:06.882 = 76%
Global Time = 165.748 = 00:02:45.748 = 100%

FLAC 1.1.3 IC9sseW
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 3.625 = 00:00:03.625 = 2%
User Time = 122.946 = 00:02:02.946 = 73%
Process Time = 126.572 = 00:02:06.572 = 75%
Global Time = 167.681 = 00:02:47.681 = 100%

Decoding:
FLAC 1.1.3 Stock
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 3.014 = 00:00:03.014 = 4%
User Time = 45.966 = 00:00:45.966 = 64%
Process Time = 48.980 = 00:00:48.980 = 68%
Global Time = 71.453 = 00:01:11.453 = 100%

FLAC 1.1.3 IC9sse
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 2.994 = 00:00:02.994 = 6%
User Time = 23.383 = 00:00:23.383 = 48%
Process Time = 26.377 = 00:00:26.377 = 54%
Global Time = 48.640 = 00:00:48.640 = 100%

FLAC 1.1.3 IC9sseW
Timer 3.01 Copyright © 2002-2003 Igor Pavlov 2003-07-10
Kernel Time = 2.894 = 00:00:02.894 = 5%
User Time = 23.363 = 00:00:23.363 = 45%
Process Time = 26.257 = 00:00:26.257 = 50%
Global Time = 51.875 = 00:00:51.875 = 100%

From the results of my test, the overall winner for my computer is FLAC 1.1.3 IC9sse.

Edit: All using -5 WAV 424,148 FLAC 247,154
Synthetic Soul
It's good to see from Martin's and drbeachboy's results that "sseW" is as good as "sse" on Pentium 4s. This means that an sseW compile could be made that is optimised for both P4 and AMD.

However, the benefits to AMD users appear a little disappointing, and unclear, at the moment. I appear to have seen improved decoding speed (probably due to Josh's built-in SSE optimisations), but Jebus saw no improvement at all.
Jebus
QUOTE(Synthetic Soul @ Jan 20 2007, 15:29)

It's good to see from Martin and drbeachboy's results that "sseW" is as good as "sse" on the Pentium 4's. This means that a sseW compile could be made that is optimised for both P4 and AMD.

However, the benefits to AMD users appear a little disappointing, and confused at the moment. I appear to have seen improved decoding speed (probably due to Josh's in-built SSE optimisations), but Jebus saw no improvement at all.


Nah, read my decoding post. I saw an improvement there, but we both saw a speed-DOWN in -5 encoding for some reason. It's not a huge difference, so I wouldn't mind terribly if this were made the official build for P3+ computers.
gib
After seeing other folks' results I went ahead and ran another test. This time, however, I used the -8 setting rather than the default -5 that I used in my previous tests. I also ran a decoding test. Again, my CPU is an Athlon64 3400+ (supports SSE, SSE2, SSE3). The WAV used was a CD image of approximately 770MB, the largest one I have.

CODE

                   Enc Time (s)   Dec Time (s)
flac 1.1.3 official: 324.921        51.860
flac 1.1.3 IC9sseW:  324.547        50.063


So that's further corroboration that on AMD processors, even though encoding with -5 is noticeably slower with the IC9sseW build, when you bump up to -8 the IC9sseW build improves to the point of being equal to or faster than the official build. Interesting.

Also very interesting is that I am not seeing the substantial increase in decoding speed that Synthetic Soul has seen. Though the IC9sseW build is a bit faster, the difference was small, very much like what Jebus saw in his test.

Edit: Fixed code box
agentk7
QUOTE(gib @ Jan 20 2007, 23:46)

So that's further corroboration that on AMD processors, even though encoding with -5 is noticeably slower with the IC9sseW build, when you bump up to -8 the IC9sseW build improves to the point of being equal to or faster than the official build. Interesting.

Also very interesting is that I am not seeing the substantial increase in decoding speed that Synthetic Soul has seen. Though the IC9sseW build is a bit faster, the difference was small, very much like what Jebus saw in his test.

Edit: Fixed code box


I'm seeing the same kind of results with my Athlon64.
Synthetic Soul
QUOTE(Jebus @ Jan 21 2007, 00:41)
Nah, read my decoding post. I saw an improvement there, but we both saw a speed-DOWN in -5 encoding for some reason. It's not a huge difference, so I wouldn't mind terribly if this were made the official build for P3+ computers.
I made the speed 94% of the original, which I was considering negligible. However, in truth, it's better than a kick in the teeth, I guess :)

QUOTE(gib @ Jan 21 2007, 03:46)
Also very interesting is that I am not seeing the substantial increase in decoding speed that Synthetic Soul has seen. Though the IC9sseW build is a bit faster, the difference was small, very much like what Jebus saw in his test.
Yes, the difference concerns me.

I tested on my PC with my TAK corpus. I tested on my laptop with what I call my FLAC corpus, used in testing the apodisation windows later used in 1.1.3, which consists of 28 full track files. All timings were recorded using TIMER.EXE, via my usual testing scripts.

I will test today with an image file, to see what difference that makes.
Synthetic Soul
OK, results for some images (585MB and 722MB).

CODE
00.wav (00.wav - 13.wav concatenated)

     |      Original        |      IC9sseW
     |     Enc       Dec    |      Enc       Dec
=====+======================+===================
-0   |  32.625    29.734    |   32.656    21.078
-5   |  55.296    31.796    |   57.937    23.578
-8   | 299.171    30.937    |  295.812    23.562

01.wav (14.wav - 27.wav concatenated)

     |      Original        |      IC9sseW
     |     Enc       Dec    |      Enc       Dec
=====+======================+===================
-0   |  38.875    35.156    |   41.421    26.796
-5   |  68.000    36.187    |   72.343    28.781
-8   | 370.171    38.500    |  364.453    28.578

I'm still seeing a decoding speed of approximately 130% of the stock FLAC exe.
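To summarise the two image tables in one number, the per-row decoding speed ratios (original time divided by IC9sseW time) can be averaged. A geometric mean is the usual choice for averaging ratios; this Python sketch uses the decoding times from the tables above, and the choice of summary statistic is illustrative, not the poster's:

```python
from math import prod

# (original decode time, IC9sseW decode time), copied from the two tables above.
decode_times = {
    "00.wav -0": (29.734, 21.078),
    "00.wav -5": (31.796, 23.578),
    "00.wav -8": (30.937, 23.562),
    "01.wav -0": (35.156, 26.796),
    "01.wav -5": (36.187, 28.781),
    "01.wav -8": (38.500, 28.578),
}


def relative_speed(t_base: float, t_new: float) -> float:
    """Speed of the new build relative to the base build (1.30 = 130% of stock)."""
    return t_base / t_new


ratios = [relative_speed(base, new) for base, new in decode_times.values()]
geo_mean = prod(ratios) ** (1 / len(ratios))
for label, r in zip(decode_times, ratios):
    print(f"{label}: {r:.0%}")
print(f"geometric mean: {geo_mean:.0%} of stock decoding speed")
```

The individual rows range from about 126% to 141%, consistent with the "approximately 130%" summary.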
PatchWorKs
Well, the real problem (in my opinion) is that different builds are not the solution for a significant speed increase. Look at Lancer: it's not just a compile with different parameters, but a REWRITE of the code that optimizes the use of SSEx instructions.

Just my 2 cents.
Martin H
QUOTE(PatchWorKs @ Jan 21 2007, 13:37)

Well, the real problem (in my opinion) is that different builds are not the solution for a significant speed increase. Look at Lancer: it's not just a compile with different parameters, but a REWRITE of the code that optimizes the use of SSEx instructions.

Of course, hand-written SSE/SSE2 assembly gives much better performance than compiler-generated SSE/SSE2-optimized code; that is hardly rocket science, you know. But as long as no one with the required knowledge and motivation comes along and volunteers to write hand-written SSE/SSE2-optimized assembly routines to replace the standard C/C++ routines in the FLAC sources (besides the hand-written assembly already present), compiler-generated SSE/SSE2-optimized code is indeed better than nothing! Also, some pretty impressive gains in both encoding and decoding speed have already been achieved compared to the stock compile.
gib
QUOTE(Synthetic Soul @ Jan 20 2007, 23:34)

I'm still seeing a decoding speed of approximately 130% of the stock FLAC exe.

I went and tried some more decoding tests but could not duplicate the decoding speeds you are seeing - at first. Looking at the bottom numbers in your table, decoding a 722MB file in 27 seconds would require writing to the hard drive at about 27MB/s. Of course, the drive needs to be reading the file at the same time as well, so the hard drive is really being worked hard. That made me wonder if perhaps the hard drive in my computer was limiting decoding speed. I tried some tests on my other drive, but saw no change (not surprising considering the drive). Then I decided to try having the source flac on one drive and decode the output to the other drive, thus splitting up the work somewhat. That made all the difference. I ran the decoding test 6 times in a row for the official flac, then 6 times in a row with the IC9sseW build. The results consistently showed the IC9sseW build to be faster for decoding:
CODE
             Ave. Time (s)   x realtime
flac original: 27.724          ~155
flac IC9sseW:  23.068          ~186    

The flac file used was a CD image that decompressed to 722MB, just like your test. I didn't quite see the 130% of stock speed you saw, but I did get 120% consistently.
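The "x realtime" column can be reproduced from the file size: CD audio is 44,100 samples/s × 2 channels × 2 bytes = 176,400 bytes per second. A Python sketch, assuming "722MB" means 722 MiB (as Windows reports sizes) and ignoring the small WAV header:

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2  # 44.1 kHz, stereo, 16-bit PCM


def realtime_factor(wav_bytes: int, seconds: float) -> float:
    """How many seconds of audio were decoded per second of wall-clock time."""
    audio_seconds = wav_bytes / CD_BYTES_PER_SECOND
    return audio_seconds / seconds


wav_bytes = 722 * 1024 * 1024  # assumption: "722MB" is 722 MiB
average_times = {"flac original": 27.724, "flac IC9sseW": 23.068}

for name, t in average_times.items():
    print(f"{name:14} ~{realtime_factor(wav_bytes, t):.0f}x realtime")

relative = average_times["flac original"] / average_times["flac IC9sseW"]
print(f"IC9sseW decodes at {relative:.0%} of the original build's speed")
```

With the MiB reading, the two averages come out at about 155x and 186x, matching the table, and the ratio of the two is the 120% quoted in the post.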

I think the mystery is solved. I also think that's the end of my tests, at least until another, different optimized build is made :)
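gib's back-of-the-envelope figure (a 722MB file written in about 27 seconds, roughly 27MB/s) covers only the writes. When the source FLAC and the output WAV sit on the same drive, that drive also has to read the compressed file. A small Python sketch of the combined load; the 60% compression ratio is a hypothetical round number, not a measurement from this thread:

```python
def drive_load(decoded_mb: float, compressed_mb: float, seconds: float) -> tuple[float, float]:
    """Return (write MB/s, combined read + write MB/s) when one drive does both jobs."""
    write_rate = decoded_mb / seconds
    read_rate = compressed_mb / seconds
    return write_rate, write_rate + read_rate


ASSUMED_RATIO = 0.6  # hypothetical: FLAC file taken as ~60% of the WAV size
write_rate, total_rate = drive_load(722, 722 * ASSUMED_RATIO, 27)
print(f"write only: {write_rate:.1f} MB/s, read + write: {total_rate:.1f} MB/s")
```

On a mechanical drive, interleaving reads and writes also costs seek time, so the real burden is worse than this sum suggests, which fits with splitting source and destination across two drives making "all the difference".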
jcoalson
I'm thinking if it's possible to arrive at a binary that is usable by anyone and is at least not significantly slower at any task for anyone than my usual MSVC build, but hopefully faster for a significant number of people. such a binary could be used in the official flac release. I've gone back over the whole thread but I still have some questions.

when icl generates cpu-specific code, does it gate it with processor and OS checks so that you won't get an exception running on a different chip? to serve in the release I would need a single binary that would not ever crash, it might at worst default to a code path that is not as fast.

does the icl compile always require an extra dll to be distributed or can it be linked statically?

Josh
Jebus
QUOTE(jcoalson @ Feb 12 2007, 15:14)

I'm thinking if it's possible to arrive at a binary that is usable by anyone and is at least not significantly slower at any task for anyone than my usual MSVC build, but hopefully faster for a significant number of people. such a binary could be used in the official flac release. I've gone back over the whole thread but I still have some questions.

when icl generates cpu-specific code, does it gate it with processor and OS checks so that you won't get an exception running on a different chip? to serve in the release I would need a single binary that would not ever crash, it might at worst default to a code path that is not as fast.

does the icl compile always require an extra dll to be distributed or can it be linked statically?

Josh


I expect it only includes the SSE path, if not because of the compiler, then because your assembly instructions are enabled. So you'd still need a Pentium 3 or higher to run it. Not sure if that switch builds multiple code paths though... I'm a GCC guy.

I'm sure the additional DLL can be statically linked in. I think I read somewhere that he's using an evaluation version of ICL, though.
3ngel
Unfortunately the link from gharris999 is dead.
Can anyone point me to where I can download it?
Thank you very much.
yong
http://www.mytempdir.com/1215982 ;)
3ngel
Thank you so much, yong :)
Martin H
QUOTE(jcoalson @ Feb 12 2007, 23:14)

when icl generates cpu-specific code, does it gate it with processor and OS checks so that you won't get an exception running on a different chip?

Yes, if you use /Qax(x) to build multiple code paths, i.e. optimized and generic. If you use /Qx(x), only the optimized code path is built, and hence the executable will error out on unsupported systems.
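The multi-code-path behaviour described for /Qax (optimized code gated by a CPU check, with a generic path as the fallback) is exactly the "never crash, at worst run slower" property asked for. A minimal Python sketch of that selection logic, assuming the CPU and OS checks have already produced a set of feature names; the function names are hypothetical, and this only illustrates the pattern, not how ICL or libFLAC implement it internally:

```python
from typing import Callable, Sequence

Impl = Callable[[Sequence[int]], list[int]]


def residual_generic(samples: Sequence[int]) -> list[int]:
    """Portable path: first-order difference (stands in for plain C code)."""
    return [b - a for a, b in zip(samples, samples[1:])]


def residual_sse2(samples: Sequence[int]) -> list[int]:
    """Hypothetical optimized path; it must return exactly what the generic path returns."""
    return [samples[i + 1] - samples[i] for i in range(len(samples) - 1)]


# Ordered fastest first: (features the path needs, implementation).
CANDIDATES: list[tuple[frozenset[str], Impl]] = [
    (frozenset({"sse", "sse2"}), residual_sse2),
    (frozenset(), residual_generic),  # needs nothing, so selection can never fail
]


def select_impl(cpu_features: set[str]) -> Impl:
    """Pick the first implementation whose required features are all present."""
    for required, impl in CANDIDATES:
        if required <= cpu_features:
            return impl
    raise RuntimeError("unreachable: the generic path has no requirements")


if __name__ == "__main__":
    for features in ({"sse"}, {"sse", "sse2", "pni"}):
        print(sorted(features), "->", select_impl(features).__name__)
```

Keeping the generic path last with an empty requirement set is what guarantees a usable result on any CPU, and requiring identical output from every path is what makes comparing an optimized build against the stock one meaningful.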
QUOTE

does the icl compile always require an extra dll to be distributed or can it be linked statically?

It can be either skipped or linked statically:

Skip the "libmmd" dependency:

icl -c -MD t.cpp

xilink /nodefaultlib:libmmd.lib t.obj

(This will link msvcrt.lib instead.)

Or

Link "libmmd" statically:

icl -MT t.cpp

This will link libmmds.lib, the static version of libmmd.dll.

Btw, I'm no ICL expert; I just found the above info on the web ;)

CU, Martin.
jcoalson
side question: I've been messing around with VC++2005 Express. no matter what compiler options I try, the fastest binary I can get is at least 10-20% slower than the one I make with MSVC6. is this a known deficiency in the newer compiler?

Josh
gharris999
QUOTE(jcoalson @ Feb 20 2007, 18:34)

side question: I've been messing around with VC++2005 Express. no matter what compiler options I try, the fastest binary I can get is at least 10-20% slower than the one I make with MSVC6. is this a known deficiency in the newer compiler?

Josh

I can't speak to VC++2005 express. I'm using the "trialware" version of MSVS2005 and I'm just keeping my system clock set to January 2nd so it doesn't expire.

I just checked out your CVS and I'm trying a few different optimizations in VS2005 now.

Using your flac.sln solution file and with no changes, I'm getting a flac.exe and metaflac.exe binary that is about 20% faster for encoding than your 2/13/07 flac-1.1.4-win binaries. Decoding seems to be about the same.

My first attempt at throwing some compiler optimizations at the project resulted in slower binaries and a slightly larger flac file.

I hope to spend most of tomorrow trying out different optimizations in VS2005 and IC9 and I'll post the results and the binaries so folks can test for themselves.
manoa
VS6 builds will always be faster, especially when combined with ICC10. I have tested this on several other software projects, such as Blackbox for Windows: the added bloat in VS7 and VS8 and the additional library dependencies (msvcr71/80.dll, msvcp71/80.dll, etc.), whether compiled statically or not, cause these performance lags. VS6+ICC10 is the only solution for maximum performance, and apparently, as I now discover, not only for FLAC. I never tried VS7/8 because of the impossible-to-get-rid-of dependencies. ICC10 has no dependencies at all, with one exception: compiling with multi-threading support requires one Intel library, libguide40.dll, which is well worth it for the performance that can be achieved.

Does anyone have FLAC 1.2.1 sources compatible with VS6 + ICL10/ICL9, or SSE/SSE2/SSE3 builds (with or without compilation output and compiler config files)? I would like to continue working on it, if possible, for version 1.2.1b. Btw, http://www.mytempdir.com/1215982 does not work.
Mike Giacomelli
QUOTE(jcoalson @ Feb 20 2007, 20:34)

side question, I've been messing around with VC++2005 express. no matter what compiler options I try, I fastest binary I can get is at least 10-20% slower than the one I make with MSVC6. is this a known deficiency in the newer compiler?

Josh


That's pretty surprising. MSVC6 is limited to optimizing for the P3. I'd be surprised if it could do as well as newer compilers that actually know about SSE2 and modern cores.