
Question about Oggenc compiles

Hi. Just a quick question about the P4 compiled versus the P3/AMD compiled versions on rarewares. Is it safe to run the P4 versions on any SSE2-enabled CPU? Specifically, since AMD CPUs have had SSE2 support from the Athlon64 series onward, would it be safe to use the P4 compiles with these CPUs?

Thanks.


Question about Oggenc compiles

Reply #1
No replies, let me add some more info.

I just tested the P4 compile on my Athlon64 and it seems to run fine; it encodes approx 25% faster than the P3/AMD version and the resultant mp3 seems ok. So I guess that answers my question: it appears to be no problem to use the "P4" compiled lame version on Athlon CPUs that support SSE2. Can anyone else confirm this is the case?

Thanks

Question about Oggenc compiles

Reply #2
Quote
No replies, let me add some more info.

I just tested the P4 compile on my Athlon64 and it seems to run fine; it encodes approx 25% faster than the P3/AMD version and the resultant mp3 seems ok. So I guess that answers my question: it appears to be no problem to use the "P4" compiled lame version on Athlon CPUs that support SSE2. Can anyone else confirm this is the case?

Thanks

Sorry, I didn't see this before! Yes, these will run on the Athlon64s, single and multi-core, and anything else that supports SSE2. I don't have such a system now, but not so long ago I used them as the basis for development systems.

Question about Oggenc compiles

Reply #3
Thanks for the info John.

Actually, the reason I asked this is because the encodes from the P4 and the P3/AMD versions departed a lot further from bit-identical than I expected (when encoded from the same wav file and then decoded back to wav and compared). Naturally I didn't expect the encoded-decoded wavs to be bit-identical to the original wav file, but I kind of expected the wavs from the P3 and P4 versions to be close to bit-identical with each other.

This is why I at first thought that the P4 compile might be having some kind of numerical accuracy issue on the Athlon, though I've since discounted that by encoding on an old P4 1500MHz I've got lying around here. The Athlon64 and the P4 CPUs both produced bit-identical encodes when running the same P4-optimized compiles (of the latest AoTuV b5.7 oggenc). So no problem there.

Back to the bit discrepancies between the P4, P3, and generic compiled versions. I expected maybe one or two LSBs of difference due to rounding errors etc., but I found maximum point-wise deviations (in a 3 minute track) of over 200 LSBs between the P4 and generic derived wav files (and about 100 LSBs between the P3 and generic derived wav files). I wasn't expecting this.

Admittedly I don't think I can hear a difference between any of the files and I'm certainly not making any claims of audible difference, but does this level of bitwise deviation seem correct?

BTW, lots of the sample points are identical, but there are plenty of bursts where they deviate by more than just a couple of bits. The RMS deviation (over the whole track) was only about 1 LSB between the P3 and generic versions and about 2.5 LSBs for the P4 versus the generic versions, though.

Does anyone have any ideas on this? Especially in relation to the fairly large point-wise discrepancies.

Question about Oggenc compiles

Reply #4
That's due to the architecture differences between the P3 and P4... if you compile this code:

using System;

class Program
{
    static void Main(string[] args)
    {
        double inf = 0.1 / 0.0;               // division by zero gives infinity
        int test2 = (int)(inf * 0.0F);        // infinity * 0 is NaN; the int cast of NaN is undefined
        Console.WriteLine(test2.ToString());
    }
}

using a P3 processor, the binary executable will produce "0", but if you compile it on a P4, the binary will produce the value -2147483648.

Of course, we're speaking about a division by zero, so the result should be undefined (well... in terms of processors it's not really "exact", but that's a topic for another day), but it's just an example...

Other known mathematical "deviations" are anything divided by 10 and some floating point decimals... e.g. 5.76 becomes 5.7600000034 on a P4 and compatibles...

This is mostly a result of optimizations done by the compiler, and of the fact that some operations can't produce exact results when expressed in a base-2 (binary) number system. Usually the trade here is speed vs accuracy. The P3 uses Intel's old architecture, which was mostly exact but slow. The P4 and onward use a very different architecture... faster, but more prone to errors in floating point calculations... and the work of verifying the numbers produced is left to the programmer.

So you can't really expect that data produced by software optimized for one architecture (P3 & co.) using floating point calculations will be equal to data produced by software optimized for the other (P4 & co.)...
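To make that concrete, here is a minimal C sketch of my own (not taken from any encoder) showing how the very same expression can come out differently when the compiler keeps intermediates in the x87 FPU's 80-bit registers (typical of older, P3-style builds) versus computing in plain 64-bit SSE2 doubles (typical of P4-style builds):

#include <stdio.h>

int main(void) {
    volatile double a = 1e16;   /* volatile keeps the compiler from folding this at compile time */
    volatile double b = 1.0;
    double r = (a + b) - a;     /* 80-bit x87 intermediates can keep b; 64-bit doubles round it away */
    printf("%g\n", r);          /* typically 1 with x87 extended precision, 0 (or 2) with SSE2 */
    return 0;
}

Which answer you get depends on compiler flags as much as on the CPU, which is exactly why two differently optimized builds of the same encoder drift apart bit-wise.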

Hope it helps... and sorry about my English... not my native language.

Regards

J.

Question about Oggenc compiles

Reply #5
Quote
Other known mathematical "deviations" are anything divided by 10 and some floating point decimals... e.g. 5.76 becomes 5.7600000034 on a P4 and compatibles...


Yes, I understand that the different architectures and optimized execution units (3DNow!, SSE, SSE2, etc.) give numerical calculations of differing precision, and that in any case floating point computations give results that are only accurate to a certain number of significant digits. But look at the example above: 5.7600000034 versus 5.76 is only in error by about 6E-8% (6 times 10^-8 percent). This is an error level corresponding to about 2E-5 (0.00002) of one least significant bit in a wav file. Even the accumulation of many thousands of such rounding errors would be lucky to produce even a one LSB error in the wav file!
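Just to show my working, here's a quick C sketch (my own arithmetic, assuming 16-bit samples, i.e. 32768 LSBs of full scale) of the estimate above:

#include <stdio.h>
#include <math.h>

int main(void) {
    double exact = 5.76, rounded = 5.7600000034;   /* the figures quoted above */
    double rel = fabs(rounded - exact) / exact;    /* ~5.9e-10, i.e. ~6e-8 percent */
    double lsb_fraction = rel * 32768.0;           /* scale to a full-scale 16-bit sample */
    printf("relative error = %.2g (%.2g%%)\n", rel, rel * 100.0);
    printf("equivalent to %.2g of one LSB at full scale\n", lsb_fraction);
    return 0;
}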

Like I said, I wouldn't have been the least bit surprised if the two decoded wav files differed by one or two LSBs; however, I am really surprised that they differed by several hundred LSBs in some places.


Question about Oggenc compiles

Reply #6
Quote
however, I am really surprised that they differed by several hundred LSBs in some places.


You can see the same behavior for different LAME compiles (e.g. ICL vs MSVC).

Question about Oggenc compiles

Reply #7
Quote
however, I am really surprised that they differed by several hundred LSBs in some places.


You can see the same behavior for different LAME compiles (e.g. ICL vs MSVC).


Good idea lvqcl, I'll give that a try when I get time (probably not until tomorrow now).

Question about Oggenc compiles

Reply #8
Quote
Other known mathematical "deviations" are anything divided by 10 and some floating point decimals... e.g. 5.76 becomes 5.7600000034 on a P4 and compatibles...


Yes, I understand that the different architectures and optimized execution units (3DNow!, SSE, SSE2, etc.) give numerical calculations of differing precision, and that in any case floating point computations give results that are only accurate to a certain number of significant digits. But look at the example above: 5.7600000034 versus 5.76 is only in error by about 6E-8% (6 times 10^-8 percent). This is an error level corresponding to about 2E-5 (0.00002) of one least significant bit in a wav file. Even the accumulation of many thousands of such rounding errors would be lucky to produce even a one LSB error in the wav file!

Like I said, I wouldn't have been the least bit surprised if the two decoded wav files differed by one or two LSBs; however, I am really surprised that they differed by several hundred LSBs in some places.


I haven't seen the sources, but since they're dealing with lossy sound compression, they are probably using trigonometric functions at some point... combine trigonometric math with the floating point errors due to the optimizations done by the processor and the optimizations done by the compiler, and you **will** have errors of some orders of magnitude (though in the sound context they can just be discarded... IMHO). Scientific software, and software that uses precision-critical math operations, uses software algorithms based on the "safe" math functions found in the processors (think CAD, for example). But, as they deal with complex calculations in software, they are slower.

Personally I don't care much about it in the case of lossy codecs, since they throw out data anyway... But where we CAN start to worry is when someone uses an over-optimized lossless codec... I haven't seen any such case... but one never knows...

About the errors... try dividing 9 by 10 on current processors... the results can be interesting...
Decimal -> binary
0.9    ->  0.9
And again:
0.09  ->  0.089999996

Of course this is just a two-level division... The errors are all cumulative... even if you round this, the error will accumulate (rounding the second result using hardware will result in 0.8... glorious, eh??), and in the case of complex operations (e.g. sin(x) or cos(x)) things can get really screwed up.

My last example:

#include <stdio.h>

int main(int argc, char *argv[]) {
    double x = 0.49999999999999994;   /* the largest double strictly below 0.5 */
    int    i = (int)(x + 0.5);        /* the sum rounds up to exactly 1.0 before the cast */
    printf("%0.17f  %d\n", x, i);
    return 0;
}

Adding 0.5 and casting like this is a common way to round a floating-point number to an integer. Unfortunately, in this particular case the sum x + 0.5 is rounded up during the addition, so it ends up with the value 1 instead of the expected 0. And we're dealing with just additions here...
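And to show how such errors pile up over many operations, here's a small sketch of my own (not related to any encoder): summing 0.1 ten million times in single precision drifts a long way from the exact total of 1,000,000, while double precision stays much closer.

#include <stdio.h>

int main(void) {
    float  fs = 0.0f;
    double ds = 0.0;
    for (int i = 0; i < 10000000; i++) {
        fs += 0.1f;    /* each addition rounds; the error keeps accumulating */
        ds += 0.1;
    }
    printf("float  sum: %f\n", fs);   /* far from 1000000 */
    printf("double sum: %f\n", ds);   /* very close to 1000000 */
    return 0;
}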

What's the point?... well, depending on what compiler you use, what kind of optimizations, and what processor you are working on, you might get pretty different results... especially in the binary floating point area... where results may vary...

Regards.

J.

Question about Oggenc compiles

Reply #9
john33 =)

And can you make a Generic build + optimization for dual-core processors?
Is that really possible?

Question about Oggenc compiles

Reply #10
Quote
john33 =)

And can you make a Generic build + optimization for dual-core processors?
Is that really possible?

I'm not completely clear what you are asking here, but since the standard libraries/applications are not multithreaded, I don't really see the point in producing compiles that supposedly take advantage of multi-core processors - unless, of course, I'm missing something which, as we all know, is entirely feasible.

Question about Oggenc compiles

Reply #11
I am referring to which instruction sets (SSE, SSE2, SSE3, SSE4.1) produce the errors.
Maybe go the other way?
Multithreading =)

Question about Oggenc compiles

Reply #12
Quote
I am referring to which instruction sets (SSE, SSE2, SSE3, SSE4.1) produce the errors.
Maybe go the other way?
Multithreading =)

Actually, we don't even know which file is in error, or even if "error" is the appropriate description of what's going on. The thing is that we don't have any reference file that we can say is the correct one. We only know that the files from different compilers (or the same compiler with different optimizations) give bitwise different output; we don't know which (if any) could be considered the "correct" file.

Also, I did the test suggested previously, and I found similar bitwise error levels between mp3 files generated from the IC10 and VC6(IC9) compiles.

BTW, could someone else please verify either of these bitwise error measurements (that is, between the various oggenc compiles or between the two mentioned LAME 3.98.2 compiles), just to make sure I'm not doing something incorrectly here?

Question about Oggenc compiles

Reply #13
Here's an update on this saga. The song I tested before was one of the quietest in my collection (RG about +10dB), so I just repeated the test with a more typical track (approx four minutes, RG = -8.31dB, peak = 0.98877), and the errors are even bigger, much bigger.

1. Comparing OggEnc 2.85 AoTuv 5.7, generic versus P4 compiles.

Max left channel deviation = 3267 LSBs
Max right channel deviation = 3677 LSBs

2. Comparing LAME 3.98.2, IC10 versus VC6(IC9) compiles.

Max left channel deviation = 3257 LSBs
Max right channel deviation = 3950 LSBs

These are some pretty big discrepancies between the different compiles.

Question about Oggenc compiles

Reply #14
IIRC, LAME has outer and inner iteration loops to compute values for MP3 encoding, so even a small discrepancy can lead to much larger differences in the resulting values.
Quote
The outer iteration loop controls the masking conditions of all scalefactorbands. It computes the best scalefac and global gain. This module calls the inner iteration loop.
inner_loop starts with the initial quantization step computed above and slowly increases until the bits < huff_bits.
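To see why iteration loops amplify things, here's a toy sketch of my own (it is not LAME's actual code): a quantizer that keeps coarsening its step until a cost fits a budget may need a different number of iterations for two almost-identical inputs, so the final results jump apart even though the inputs differ only in the eighth decimal place.

#include <stdio.h>
#include <math.h>

/* toy "inner loop": coarsen the quantizer step until the cost fits the budget */
static int quantize_steps(double energy, double budget) {
    int step = 0;
    while (energy / pow(2.0, step) > budget)
        step++;
    return step;
}

int main(void) {
    double budget = 1.0;
    double a = 15.9999999;   /* two inputs that differ only slightly */
    double b = 16.0000001;
    printf("steps(a) = %d, steps(b) = %d\n",
           quantize_steps(a, budget), quantize_steps(b, budget));
    return 0;
}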

Question about Oggenc compiles

Reply #15
After reading through the previous posts, I don't see a clear consensus on which oggenc build is most appropriate. Is the only difference between the P4 build and the other builds the use of SSE2?

If I have a modern Core 2 Duo, should I prefer the P4 build, which uses SSE2, or the P3 build? (The C2D is much more architecturally similar to the P3 than it is to the P4.)

Thanks

Question about Oggenc compiles

Reply #16
Yeah, I'm still not sure what to make of it. I wish someone would try to repeat my results to make sure it's not something that I'm doing wrong in comparing the files. BTW, what I did was convert the same lossless file to ogg (twice, using the two differently optimized versions) and then convert them both back to wav. I then loaded the two wav files into a MATLAB clone (Octave) and compared them.
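For anyone who wants to repeat it without Octave, here is a rough C sketch of the same comparison (not the code I actually used; it assumes both decoded files are 16-bit PCM WAVs with a plain 44-byte header, and the file names are only placeholders):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    /* placeholder names for the two decoded files */
    FILE *fa = fopen("decoded_generic.wav", "rb");
    FILE *fb = fopen("decoded_p4.wav", "rb");
    if (!fa || !fb) { fprintf(stderr, "cannot open input files\n"); return 1; }

    fseek(fa, 44, SEEK_SET);   /* skip the canonical 44-byte WAV header */
    fseek(fb, 44, SEEK_SET);

    short sa, sb;
    long n = 0, peak = 0;
    double sumsq = 0.0;
    while (fread(&sa, sizeof sa, 1, fa) == 1 && fread(&sb, sizeof sb, 1, fb) == 1) {
        long d = labs((long)sa - (long)sb);   /* per-sample deviation in LSBs */
        if (d > peak) peak = d;
        sumsq += (double)d * d;
        n++;
    }
    printf("samples: %ld  peak: %ld LSBs  RMS: %.2f LSBs\n",
           n, peak, n ? sqrt(sumsq / n) : 0.0);

    fclose(fa);
    fclose(fb);
    return 0;
}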

It's been a while so I just tried the procedure again. It seems to happen on any file that I test so there's no need to choose any special track or any particular type of music.

Here are the results for the track I tested just now (the figures being the average of the left and right channel errors).


Comparison of OggEnc 2.85 AoTuv5.7 generic versus P4 compiles.

Track Length : 3:47

RMS wav difference  : 50 LSBs
Peak wav difference : 4100 LSBs

Distribution of errors:

LSBs : Percentage of samples whose error exceeds this amount
10    : 2.8%
20    : 2.3%
50    : 1.65%
100  : 1.14%
200  : 0.66%
500  : 0.21%
1000  : 0.053%
2000  : 0.005%
3000  : 0.0005%
5000  : 0