lossless codec testing, how do we know XYZ is lossless?
TBeck
post Nov 20 2006, 21:01
Post #26


TAK Developer


Group: Developer
Posts: 887
Joined: 1-April 06
Member No.: 29051



QUOTE (greynol @ Nov 20 2006, 21:57) *
QUOTE (Jan S. @ Nov 20 2006, 11:39) *
Then again isn't all of this a non-issue if people just use the -V switch?
Certainly for flac, but what about other codecs?

It's also supported by TAK... :)

hmpf... at least in the GUI version. I forgot to implement a switch in the command line version...
HyperDrive
post Nov 20 2006, 21:09
Post #27





Group: Members
Posts: 9
Joined: 1-December 02
Member No.: 3945



QUOTE (Jan S. @ Nov 20 2006, 11:39) *
1. You mathematically analyse the algorithms and mathematically work out if it will be lossless for all possible samples.
The problem with this is however that the encoder/decoder is not a closed system, thus you cannot possibly account for external variables like FPU and CPU. So even though your algorithms might be perfect you cannot be sure your output will be. Hence this type of test will be pointless if the goal is absolute perfection.

Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would attest). FPU/CPU are (by definition) fully deterministic state machines, which means your first argument is also invalid. <_<
rjamorim
post Nov 20 2006, 21:19
Post #28


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (greynol @ Nov 20 2006, 16:57) *
Certainly for flac, but what about other codecs?


Get the developers of these codecs to implement similar functionality. Hurray!

QUOTE (HyperDrive @ Nov 20 2006, 17:09) *
Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would attest). FPU/CPU are (by definition) fully deterministic state machines, which means your first argument is also invalid. <_<


Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
greynol
post Nov 20 2006, 21:35
Post #29





Group: Super Moderator
Posts: 4791
Joined: 1-April 04
Member No.: 13167



QUOTE (rjamorim @ Nov 20 2006, 12:19) *
QUOTE (greynol @ Nov 20 2006, 16:57) *
Certainly for flac, but what about other codecs?

Get the developers of these codecs to implement similar functionality. Hurray!

Amen!
jcoalson
post Nov 20 2006, 22:12
Post #30


FLAC Developer


Group: Developer
Posts: 1487
Joined: 27-February 02
Member No.: 1408



QUOTE (Jan S. @ Nov 20 2006, 14:39) *
I discussed some of the problems last night with Roberto and we came up with some viewpoints on this.
There seems to be two theoretical ways to go about this:

1. You mathematically analyse the algorithms and mathematically work out if it will be lossless for all possible samples.
The problem with this is however that the encoder/decoder is not a closed system, thus you cannot possibly account for external variables like FPU and CPU. So even though your algorithms might be perfect you cannot be sure your output will be. Hence this type of test will be pointless if the goal is absolute perfection.

verifying algorithmic correctness is theoretically possible but can be very difficult. there is more danger in the implementation, which is what I mean by codec. I wasn't challenging the losslessness of formats themselves although I guess it could be possible.

QUOTE (Jan S. @ Nov 20 2006, 14:39) *
2. You run as much data as possible through the codec to establish a decent confidence level (if you run random (but non-repeated) blocks through the codec, couldn't this actually be calculated?).
This should be quite an easy task if the author provides a way to do this automatically.

not sure what you mean by random, but I described above why noise may not be sufficient (e.g. to excite overflow problems in a filter).

QUOTE (Jan S. @ Nov 20 2006, 14:39) *
Then again isn't all of this a non-issue if people just use the -V switch?

not totally; I described above the kinds of errors that self checking cannot catch.

but anyway, even that would be a good addition to a general lossless comparison.

Josh
rjamorim
post Nov 20 2006, 22:23
Post #31


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (jcoalson @ Nov 20 2006, 18:12) *
but anyway, even that would be a good addition to a general lossless comparison.


I agree. Do people know which lossless codecs besides FLAC and TAK have a -v switch or something similar?

Edit: I see wvunpack, ttaenc and mac.exe have -v too. ofr.exe has --verify. LPAC has -c. Any other takers?

This post has been edited by rjamorim: Nov 20 2006, 22:33


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
HbG
post Nov 21 2006, 00:00
Post #32





Group: Members
Posts: 289
Joined: 12-May 03
From: The Hague
Member No.: 6555



Is there an application that can compare the audio data in two .wav files and accepts piped input? I've searched for such a simple program but couldn't find one. Or some tool to test a flac against a wav.
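
For illustration only, here is a minimal sketch of what such a tool could look like, assuming Python and its standard wave module. The names read_audio and same_audio and the script name wavcmp.py are made up for this example, and it only handles the plain PCM WAV files that the wave module can parse. It compares the sample format and the raw audio data and ignores everything else in the header, and "-" stands for piped input.

```python
import io
import sys
import wave

def read_audio(source):
    """Return ((channels, sample width, rate), raw PCM bytes).

    source is a path, a binary file object, or "-" for standard input.
    """
    if source == "-":
        # Buffer the pipe in memory so the wave module gets a seekable stream.
        source = io.BytesIO(sys.stdin.buffer.read())
    with wave.open(source, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        return fmt, w.readframes(w.getnframes())

def same_audio(a, b):
    """True if both inputs hold identical PCM data in the same sample format."""
    return read_audio(a) == read_audio(b)

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: flac -d -c song.flac | python wavcmp.py - song.wav
    sys.exit(0 if same_audio(sys.argv[1], sys.argv[2]) else 1)
```

The exit code is 0 when the audio matches and 1 otherwise, so a ripping script could check it after every encode.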

I've used foobar's bitcompare with flake for a while and didn't find any differences. I'm using REACT for my ripping now; it'd be nice if I could automate testing with it by using the program described above, and it'd be even nicer if many others would do the same.

By the way, flake deserves to be on rarewares!


--------------------
Veni Vidi Vorbis.
foosion
post Nov 21 2006, 00:16
Post #33





Group: FB2K Moderator (Donating)
Posts: 3809
Joined: 24-February 03
Member No.: 5153



QUOTE (rjamorim @ Nov 20 2006, 21:19) *
Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?
There are two things you can do in this case:
  • Verify a piece of software given the execution model in the language specification of its implementation language. Then as a separate task, you can verify that the compiler and the runtime environment actually implement the semantics given in the language specification. This would be the usual way to verify the correctness of software.
  • Include specific errors as possibilities in the execution model you use to verify the software. This can be useful, if you are specifically interested in the fault tolerance of some system.
Actually, there is a third choice: You can take the position that outside events can cause arbitrary errors in the machine used to execute some software, and that you can never ever be sure anything works as intended. In that case you probably should spend your time on something more useful instead of worrying about the correctness of software.


--------------------
http://foosion.foobar2000.org/ - my components for foobar2000
TBeck
post Nov 21 2006, 00:47
Post #34


TAK Developer


Group: Developer
Posts: 887
Joined: 1-April 06
Member No.: 29051



QUOTE (rjamorim @ Nov 20 2006, 23:23) *
QUOTE (jcoalson @ Nov 20 2006, 18:12) *
but anyway, even that would be a good addition to a general lossless comparison.


I agree. Do people know which lossless codecs besides FLAC and TAK have a -v switch or something similar?

Edit: I see wvunpack, ttaenc and mac.exe have -v too. ofr.exe has --verify. LPAC has -c. Any other takers?

I am not sure whether those functions perform the same actions:

Possibility 1:

A verify or test function performed when decoding a compressed file: Look for decoding errors (invalid input for the decoder), calculate the checksum of the decoded data and compare it with the stored (by the encoder) checksum of the original data. This operation can be performed without the original uncompressed data available.

Possibility 2:

A verify function performed when encoding: Immediately decode the just encoded data and compare the whole data (not only its CRC!) with the original data. This operation needs the original uncompressed data.

Obviously 2 should be able to detect any error, because each byte of the decoded data is being compared with the original data. 1 has to rely on the error detection strength of the checksum, which isn't perfect.
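
To make the distinction concrete, here is a small sketch in Python. It is not how any of the codecs mentioned here work: zlib stands in for the codec, the file layout (a CRC-32 followed by the payload) is invented, and buggy_encode is a contrived encoder that damages the audio before it computes the checksum. It only shows what each kind of check has to work with.

```python
import zlib

def encode(pcm: bytes) -> bytes:
    """Stand-in codec: CRC-32 of the input, then the compressed payload."""
    return zlib.crc32(pcm).to_bytes(4, "big") + zlib.compress(pcm)

def buggy_encode(pcm: bytes) -> bytes:
    """Contrived bug: the last byte is lost before checksumming and packing."""
    return encode(pcm[:-1])

def decode(blob: bytes) -> bytes:
    return zlib.decompress(blob[4:])

def verify_on_decode(blob: bytes) -> bool:
    """Possibility 1: needs only the compressed file and its stored checksum."""
    try:
        return zlib.crc32(decode(blob)) == int.from_bytes(blob[:4], "big")
    except zlib.error:
        return False

def verify_on_encode(pcm: bytes, blob: bytes) -> bool:
    """Possibility 2: needs the original data and compares every byte."""
    try:
        return decode(blob) == pcm
    except zlib.error:
        return False

pcm = bytes(range(256)) * 4
good, bad = encode(pcm), buggy_encode(pcm)
print(verify_on_decode(good), verify_on_encode(pcm, good))  # both checks pass
print(verify_on_decode(bad), verify_on_encode(pcm, bad))    # only check 2 fails
```

In this contrived case check 1 passes on the damaged file because the stored checksum was computed over data that was already wrong, while check 2 notices because it still has the real original to compare against.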

Possibly I am a bit anal, but this may also be true to some degree for this thread...

Edit: And possibly I am too ignorant and all the options of the different compressors you have listed are performing 2).

This post has been edited by TBeck: Nov 21 2006, 00:54
Mark0
post Nov 21 2006, 00:50
Post #35





Group: Members
Posts: 66
Joined: 15-August 02
Member No.: 3068



I just want to say that, if someone comes up with a corpus of various audio files for this kind of lossless testing, I will gladly offer space and a tracker to host a .torrent with it, if needed.

Bye!

This post has been edited by Mark0: Nov 21 2006, 00:51


--------------------
RIFFStrip - http://mark0.net/soft-riffstrip-e.html
jcoalson
post Nov 21 2006, 02:36
Post #36


FLAC Developer


Group: Developer
Posts: 1487
Joined: 27-February 02
Member No.: 1408



QUOTE (TBeck @ Nov 20 2006, 18:47) *
Obviously 2 should be able to detect any error, because each byte of the decoded data is being compared with the original data. 1 has to rely on the error detection strength of the checksum, which isn't perfect.

if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.

Josh
cabbagerat
post Nov 21 2006, 07:11
Post #37





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



How about a distributed computing initiative? A small program could be developed which would run on somebody's PC (overnight), uncompress (to a temporary file) all their sound files, FLAC them, unFLAC them and compare the output. You could then get a bunch of HA members to run the program - giving an extremely large set of tests for every new FLAC version in a very short time.

It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.

QUOTE (HyperDrive @ Nov 20 2006, 12:09) *
Pure and utter nonsense. Computer programming is an exact science (it's not too terribly hard to mathematically prove an algorithm's correctness, as Knuth and Dijkstra would attest). FPU/CPU are (by definition) fully deterministic state machines, which means your first argument is also invalid. <_<
It's not hard to prove an algorithm's correctness if that algorithm is a member of the small subset of extremely simple algorithms. Many real-world algorithms defy closed-form analysis. And of course you know that the general case is undecidable.


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
HyperDrive
post Nov 21 2006, 10:06
Post #38





Group: Members
Posts: 9
Joined: 1-December 02
Member No.: 3945



QUOTE (rjamorim @ Nov 20 2006, 12:19) *
Lay off "The Art of Computer Programming" for a while, brainiac. CPUs/FPUs have bugs, you know?

Agreed. But if the underlying hardware doesn't work for the required operations, you have bigger problems than lossless audio compression and encoding. Besides, (good) compilers work around buggy instructions. If after the encoding/decoding process the output stream bitwise equals the input, I'd say they're identical...

QUOTE (cabbagerat @ Nov 20 2006, 22:11) *
It's not hard to prove an algorithm's correctness if that algorithm is a member of the small subset of extremely simple algorithms. Many real-world algorithms defy closed-form analysis. And of course you know that the general case is undecidable.

Also agreed, to a certain extent. However, even if not proven correct, FLAC and Monkey's Audio, for example, have been around for a while and should be well tested at this point. Assuming the algorithms are correct, I believe the remaining potential problems could be classified as paranoia. :)
cabbagerat
post Nov 21 2006, 10:41
Post #39





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



QUOTE (HyperDrive @ Nov 21 2006, 01:06) *
Agreed. But if the underlying hardware doesn't work for the required operations, you have bigger problems than lossless audio compression and encoding. Besides, (good) compilers work around buggy instructions.
Not if the CPU is broken. Here are some graphs taken from three consecutive runs of an Octave program on a Duron 800 CPU which worked absolutely fine in OpenOffice, Firefox and Thunderbird.

http://www.brooker.co.za/brokenpc/broken1.png
http://www.brooker.co.za/brokenpc/broken2.png
http://www.brooker.co.za/brokenpc/broken3.png

And the output of the identical program on the same PC with the CPU swapped out for another Duron 800.
http://www.brooker.co.za/brokenpc/fixed.png

Which is what it is supposed to look like. Before you say that it's an Octave bug, I got the same problems with any double precision calculation performed on this CPU in both Linux and Windows. This sort of thing could really wreck a weekend of CD archiving.


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
Synthetic Soul
post Nov 21 2006, 11:14
Post #40





Group: Super Moderator
Posts: 4732
Joined: 12-August 04
From: Exeter, UK
Member No.: 16217



QUOTE (TBeck @ Nov 20 2006, 23:47) *
Edit: And possibly I am too ignorant and all the options of the different compressors you have listed are performing 2).
I know that WavPack, True Audio, OptimFROG and Monkey's Audio are using the first method you list (verifying the compressed data). I suspect this is the standard verification method.

QUOTE (jcoalson @ Nov 21 2006, 01:36) *
if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.
I think Thomas was simply making a distinction between the two main approaches for the benefit of others.

QUOTE (cabbagerat @ Nov 21 2006, 06:11) *
It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.
I don't think Josh started this to get more testing on FLAC. I would like to think that he is trying to improve all lossless codecs, and therefore any testing should not be FLAC-specific - we need a system that can be applied to any lossless codec, irrespective of whether they can validate while encoding.

QUOTE (HbG @ Nov 20 2006, 23:00) *
Is there an application that can compare the audio data in two .wav files and accepts piped input? I've searched for such a simple program but couldn't find one.
I was wondering about Mark0's RIFFStrip in conjunction with MD5 hashes of the resulting file...

It seems it would be very useful if one of our clever developers could create a specific app to help us with this task though. Or two: one to create random noise WAVE files and one to bit-compare WAVE audio data, possibly verifying valid headers also...
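
As a rough idea of the first of those two apps, here is a sketch assuming Python's standard wave and random modules. The function name write_noise_wav and its parameters are invented for this example. It writes seeded 16-bit white noise, so a file that trips up a codec can be regenerated from its seed alone. Keep in mind the caveat raised earlier in the thread that noise by itself may not provoke every kind of failure.

```python
import io
import random
import wave

def write_noise_wav(target, seconds=1.0, rate=44100, channels=2, seed=0):
    """Write 16-bit PCM white noise to a path or binary file object.

    Returns the raw PCM bytes that were written.
    """
    rng = random.Random(seed)  # seeded so the same file can be rebuilt later
    nbytes = int(seconds * rate) * channels * 2
    pcm = rng.getrandbits(8 * nbytes).to_bytes(nbytes, "little")
    with wave.open(target, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)      # 2 bytes per sample = 16 bit
        w.setframerate(rate)
        w.writeframes(pcm)
    return pcm

# Example: half a second of stereo noise written into memory.
buf = io.BytesIO()
pcm = write_noise_wav(buf, seconds=0.5, seed=42)
```

Passing a file name instead of the BytesIO object writes the same data to disk.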

This post has been edited by Synthetic Soul: Nov 21 2006, 11:28
HyperDrive
post Nov 21 2006, 13:13
Post #41





Group: Members
Posts: 9
Joined: 1-December 02
Member No.: 3945



QUOTE (cabbagerat @ Nov 21 2006, 01:41) *
Not if the CPU is broken. Here are some graphs taken from three consecutive runs of an Octave program on a Duron 800 CPU which worked absolutely fine in OpenOffice, Firefox and Thunderbird.

http://www.brooker.co.za/brokenpc/broken1.png
http://www.brooker.co.za/brokenpc/broken2.png
http://www.brooker.co.za/brokenpc/broken3.png

And the output of the identical program on the same PC with the CPU swapped out for another Duron 800.
http://www.brooker.co.za/brokenpc/fixed.png

Which is what it is supposed to look like. Before you say that it's an Octave bug, I got the same problems with any double precision calculation performed on this CPU in both Linux and Windows. This sort of thing could really wreck a weekend of CD archiving.

Interesting, indeed. And that was precisely my point: If the underlying hardware is broken, you have bigger problems. Hardware bugs are (mostly) documented and/or fixed/worked around in later revisions, but in your case you had a broken FPU.
However (correct me if I'm wrong), in order to dump an audio CD, you should only need integer instructions (you're basically moving data around), which means you wouldn't end up with a corrupt .wav file. If the lossless compression process required floating-point, the resulting decompressed file would most likely differ from the original, exposing the FPU malfunction.
SebastianG
post Nov 21 2006, 13:39
Post #42





Group: Members
Posts: 1219
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



It's not only malfunction but also different floating-point formats (each encoding represents a different subset of ℝ) and different algorithms for + - * / (which may round differently).
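
A small illustration of the format part, sketched in Python: the same accumulation is done once in double precision and once with every intermediate result rounded to single precision. The helper f32 and the function accumulate are made up for this example, and single precision is emulated through the struct module rather than taken from real float hardware.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(step: float, n: int, single: bool) -> float:
    """Add step to a running total n times, optionally rounding to single each time."""
    total = 0.0
    for _ in range(n):
        total = total + step
        if single:
            total = f32(total)
    return total

as_single = accumulate(f32(0.1), 10, single=True)
as_double = accumulate(0.1, 10, single=False)

# The totals land on opposite sides of 1.0, so truncating to an integer
# sample value gives 1 in single precision and 0 in double precision.
print(as_single, int(as_single))
print(as_double, int(as_double))
```

An encoder and a decoder that disagreed like this on a single predicted sample would no longer reconstruct the same data.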
cabbagerat
post Nov 21 2006, 14:53
Post #43





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



QUOTE (HyperDrive @ Nov 21 2006, 04:13) *
However (correct me if I'm wrong), in order to dump an audio CD, you should only need integer instructions (you're basically moving data around), which means you wouldn't end up with a corrupt .wav file. If the lossless compression process required floating-point, the resulting decompressed file would most likely differ from the original, exposing the FPU malfunction.
If your lossless codec does not verify during encoding then problems like this one can make a mockery of even the best algorithm. I'm not saying they are common, just that there is a need for complete verification (encode-decode-compare or equivalent) even with programs that are known to work. That isn't a perfect solution as it requires more cycles and can't work with on-the-fly encoding (unless something like an MD5 was taken beforehand), but it is the only one that is truly foolproof.
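
The MD5 variant could look roughly like this, sketched in Python with zlib standing in for the codec (encode_stream and verify_later are invented names). The hash is updated as the chunks stream past, so the original never has to be kept around, and the decoded result is checked against it afterwards.

```python
import hashlib
import zlib

def encode_stream(chunks):
    """Compress chunks as they arrive; return (compressed bytes, MD5 of the input)."""
    md5 = hashlib.md5()
    comp = zlib.compressobj()
    out = bytearray()
    for chunk in chunks:
        md5.update(chunk)            # hash the data before it is gone for good
        out += comp.compress(chunk)
    out += comp.flush()
    return bytes(out), md5.hexdigest()

def verify_later(blob, expected_md5):
    """Decode the whole stream and compare its MD5 with the one taken while encoding."""
    return hashlib.md5(zlib.decompress(blob)).hexdigest() == expected_md5

chunks = [bytes([i]) * 1000 for i in range(50)]
blob, digest = encode_stream(iter(chunks))
print(verify_later(blob, digest))
```

This only protects against errors that happen after the point where the hash is taken, and it relies on the strength of the hash rather than on a byte-by-byte comparison.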

Whether foolproof is necessary is a harder question :)

This post has been edited by cabbagerat: Nov 21 2006, 14:58


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
TBeck
post Nov 21 2006, 23:57
Post #44


TAK Developer


Group: Developer
Posts: 887
Joined: 1-April 06
Member No.: 29051



QUOTE (Synthetic Soul @ Nov 21 2006, 12:14) *
QUOTE (jcoalson @ Nov 21 2006, 01:36) *
if you're talking about the codec's own verify function, again I say that this would not have caught the error I found with flake.
I think Thomas was simply making a distinction between the two main approaches for the benefit of others.

Thanks! That's exactly what I wanted to do.

If a new feature should be introduced into the comparison table, it would be nice to have an exact definition.

To Josh: I am well aware of the limits of internal verification functions... But I agree, I should be more exact.

QUOTE (Synthetic Soul @ Nov 21 2006, 12:14) *
QUOTE (TBeck @ Nov 20 2006, 23:47) *
Edit: And possibly I am too ignorant and all the options of the different compressors you have listed are performing 2).
I know that WavPack, True Audio, OptimFROG and Monkey's Audio are using the first method you list (verifying the compressed data). I suspect this is the standard verification method.

Then my distinction should make sense.

QUOTE (Synthetic Soul @ Nov 21 2006, 12:14) *
QUOTE (cabbagerat @ Nov 21 2006, 06:11) *
It sounds like an odd idea - and it's by no means a proof - but will be a nice demonstration that FLAC works as claimed. I'd guess most FLAC users (including me) wouldn't mind donating a few cycles.
I don't think Josh started this to get more testing on FLAC. I would like to think that he is trying to improve all lossless codecs, and therefore any testing should not be FLAC-specific - we need a system that can be applied to any lossless codec, irrespective of whether they can validate while encoding.
...
It seems it would be very useful if one of our clever developers could create a specific app to help us with this task though. Or two: one to create random noise WAVE files and one to bit-compare WAVE audio data, possibly verifying valid headers also...


This could be an interesting project. Some more evaluation and thinking is probably needed on whether systematic testing of critical conditions/files is really significantly better than testing a large pool of random files. Possibly my initial enthusiasm for the benefits of testing generally critical files was not justified.