DAMAGE - File damaging tool, Useful to test the error resistance of lossless codecs |
Jul 5 2006, 21:29
Post
#1
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
Well, I know that there is an Upload section. But first I would like to ask if anyone would find this little tool useful. It's one of several tools I have written to test Yalac and other lossless audio codecs.
The documentation is in HTML, but I don't know how to insert it here, so I am pasting a plain text copy. It may look a bit strange.

Overview

I wrote Damage to test the error recognition and recovery abilities of my new lossless audio compressor YALAC. Damage generates a copy of the user-selected files, appends the extension '.err' and then damages the copies. You can define damage (bit) patterns and the frequency of the damage. A list of the changes to the data is written to a protocol file.

Command line options

Help screen:

CODE
    DAMAGE files [-e -f -s -w]

    files         specify file or directory (Dir\*.ext) to be processed
    -e x1 x2...   specify up to 5 errors xn of type i or r:
                  i 3     = (i)nvert a sequence of 3 bits
                  r 10010 = (r)eplace a bit sequence with bits 10010
                            (msb left, up to 40 bits)
                  The default definition is:
                  i 1 i 2 i 3 i 36 r 000000000000000000000000000000000000
    -f r x        relative frequency of damage as errors per MByte.
                  Maximum: 128  Default: 1
    -f a x        absolute frequency of damage as errors per file.
                  Up to 1024, but will also be limited to relative maximum.
    -s x          no damage until file position x in bytes. Default: 4096
    -w            wait for enter key when finished

files

Specify a single file or use wildcards. Examples:

    d:\VocComp_Data\Sample.wav   Damage file "Sample.wav" in directory "d:\VocComp_Data".
    d:\VocComp_Data\*.wav        Damage any file with the extension ".wav" in directory "d:\VocComp_Data".
    *.*                          Damage any file in the current directory.

Damage creates files with the same name as the source, but with the extension '.err' appended:

    Sample.wav -> Sample.wav.err

Existing files will always be overwritten without any warning!

-e x1 x2...

Specify up to 5 errors xn of type i or r:

    i 3       (I)nvert a sequence of 3 bits. All bits are flipped. Valid range: 1 to 40 bits.
    r 10010   (R)eplace a bit sequence with bits 10010. The leading (left) bit is the most significant. Valid range: 1 to 40 bits.

The default definition is:

    i 1 i 2 i 3 i 36 r 000000000000000000000000000000000000

-f

Specify the frequency of the errors. The error patterns (see -e) will be randomly repeated if necessary.

Specify a relative frequency as errors per MByte:

    -f r 8    Generates 8 errors per MByte. Maximum: 128.

Or specify the absolute frequency of damage as errors per file:

    -f a 25   Generates 25 errors per file. If the file is small, the count will be limited to the relative maximum of 128 per MByte.

In both cases the error count is limited to 1024 errors per file. The default setting is: -f r 1.

-s x

No damage until file position x in bytes. Default: 4096. Useful if you don't want to damage a file header.

Protocol file

Damage generates a protocol file "Damage_Result.txt" in the source file directory. It contains a detailed list of all changes performed on the files. Example:

CODE
    D:\VocComp\Tools\ATrain.yaa
    No  Position            BitOfs  BitNum  Original            New value
     1   142614  00022D16   0       36      B1 DA 8A 49 DD      00 00 00 00 D0
     2   444685  0006C90D   0       36      6F 17 E4 F2 14      90 E8 1B 0D 1B
     3   771488  000BC5A0   5        1      95                  B5
     4  1046975  000FF9BF   5        3      9A                  7A
     5  1264657  00134C11   1        2      78                  7E
     6  1454473  00163189   5       36      7A F5 C2 36 2A 36   1A 00 00 00 00 36

The file name is followed by a list of the applied errors, one per line.

    Position               File position of the first affected byte, first in decimal, then in hexadecimal representation.
    BitOfs                 Position (0-7) of the first affected bit in the byte specified by the file position.
    BitNum                 Count of affected bits.
    Original - New value   Comparison of the original and the new values after the damage. Both in hexadecimal notation, least significant (lowest address) byte left.
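The protocol example above is enough to sketch how the two error types could act on a byte buffer. The following is only an illustration, not Damage's actual source: the function names are hypothetical, and the bit order (bit 0 is the least significant bit of a byte, runs continue into the next higher address) is inferred from the example rows. For the replace type, the mapping of the pattern's rightmost character to the first affected bit is an assumption; the all-zero default pattern does not reveal it.

```python
def invert_bits(data: bytearray, pos: int, bit_ofs: int, bit_num: int) -> None:
    """Flip bit_num consecutive bits, starting at bit bit_ofs of byte pos."""
    for i in range(bit_num):
        bit = bit_ofs + i
        data[pos + bit // 8] ^= 1 << (bit % 8)


def replace_bits(data: bytearray, pos: int, bit_ofs: int, pattern: str) -> None:
    """Overwrite consecutive bits with the given pattern of '0'/'1' characters.

    Assumption: the rightmost character (least significant bit of the
    pattern) lands on the first affected bit.
    """
    for i, ch in enumerate(reversed(pattern)):
        bit = bit_ofs + i
        idx = pos + bit // 8
        mask = 1 << (bit % 8)
        if ch == "1":
            data[idx] |= mask
        else:
            data[idx] &= ~mask & 0xFF


# Reproduce row 2 of the protocol example (invert 36 bits from bit 0).
buf = bytearray.fromhex("6F17E4F214")
invert_bits(buf, 0, 0, 36)
print(buf.hex(" ").upper())  # 90 E8 1B 0D 1B

# Reproduce row 6 (replace 36 bits with zeros, starting at bit 5).
buf = bytearray.fromhex("7AF5C2362A36")
replace_bits(buf, 0, 5, "0" * 36)
print(buf.hex(" ").upper())  # 1A 00 00 00 00 36
```

Both printed results match the "New value" column of the example, which supports the inferred bit order.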
|
|
|
Jul 5 2006, 21:43
Post
#2
|
|
Group: Members Posts: 1182 Joined: 19-May 05 From: Montreal, Canada Member No.: 22144 |
This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe the Mersenne Twister could help you. Also, it would be nice to have random bits instead of just inverted ones or ones replaced with known bits.
The compare feature is nice, though. So is the copy feature, but it should be toggleable. Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (e.g. at most 3 bits changed every 30 bytes).
Please tell me if this is clear, or if I am asking too much. You're my favourite developer.
Peace, Tristan.
|
|
|
Jul 5 2006, 22:17
Post
#3
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE (Jul 5 2006, 22:43): This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe using the Mersenne Twister could help you; Also, it would be nice having random bits instead of just inversed ones, or replaced with known bits.

There is some randomness in the positions. The frequency specification defines intervals, for instance one error per MB. The first error will be inserted between 0 and 1.5 MB, the next between the end of the previous one and 2.5 MB, and so on.
My first implementation used random bit patterns, but that was not optimal for my purposes. I like to have control over the conditions. Otherwise the results would not be easy to interpret. And it seems to make even more sense to use controlled conditions for comparisons between compressors. You can evaluate how resistant they are to, for instance, 1 bit or 2 bit errors. But if you apply random errors and get different results for two compressors, you will not know whether this is caused by the difference between the compressors or by the difference in the test data caused by the randomness.

QUOTE (Jul 5 2006, 22:43): Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (eg, 3 bits changed every 30 bytes, maximum)

That's indeed useful!

QUOTE (Jul 5 2006, 22:43): You're my favourite developper

You know, this one is especially for you...
|
|
|
Jul 6 2006, 06:58
Post
#4
|
|
|
FLAC Developer Group: Developer Posts: 1487 Joined: 27-February 02 Member No.: 1408 |
Neat program.

QUOTE: My first implemementation has used random bit patterns. But that was not optimal for my purposes. I like to have control over the conditions. Otherwise the results would not be easy to interpret. And it seems to make even more sense to use controled conditions for comparisons between compressor. You can evaluate, how resistent they for instance are to 1 bit or 2 bit errors, but if you apply random errors and get different results for two compressors, you will not know, if this is beeing caused by the difference of the compressors or by the diffence of the test data caused by the randomness.

This can mostly be solved by using a pseudo-random generator and exposing the seed as a command-line option.
Josh
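A seeded generator makes the "random" positions repeatable, so two codecs can be damaged at exactly the same offsets. The sketch below combines that idea with the interval scheme described earlier in the thread (one error per interval, each placed between the end of the previous error and 1.5 intervals further on). It is an illustration only: `error_positions` and its parameters are hypothetical names, not options of Damage. Python's `random.Random` happens to be a Mersenne Twister.

```python
import random


def error_positions(file_size: int, interval: int,
                    start: int = 4096, seed: int = 0) -> list[int]:
    """Pick one byte position per interval, reproducibly for a given seed.

    Error k is drawn between the end of the previous error and
    (k + 1.5) * interval, clamped to the file size. Nothing is placed
    before `start` (compare the -s option).
    """
    rng = random.Random(seed)  # same seed -> same sequence of positions
    positions = []
    prev_end = start
    for k in range(file_size // interval):
        upper = min(int((k + 1.5) * interval), file_size)
        if prev_end >= upper:
            break
        pos = rng.randrange(prev_end, upper)
        positions.append(pos)
        prev_end = pos + 1
    return positions


if __name__ == "__main__":
    mb = 1 << 20
    print(error_positions(8 * mb, mb, seed=42))
    # Running it again with seed=42 prints the identical list.
```

With the seed recorded next to the test results, a differing outcome between two compressors can no longer be blamed on differing damage positions.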
|
|
|
Jul 6 2006, 08:06
Post
#5
|
|
Group: Members Posts: 1219 Joined: 20-March 04 From: Göttingen (DE) Member No.: 12875 |
I just want to say that real errors are usually "bursts" (a group of consecutive bytes that are totally wrong, i.e. a whole sector).
This could be modeled via a two-state system. Depending on the state, the current bit will either be kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or change to the other, based on a randomly chosen number and a threshold. Obviously the model's parameters are the two thresholds (one for each state). The initial state should be the "keep-original-data" state. I just realized that this is actually a simulation of a Markov process.
This post has been edited by SebastianG: Jul 6 2006, 08:16
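The two-state model described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from Damage: the function name and the parameters `p_enter` and `p_leave` (the two thresholds, expressed as probabilities) are hypothetical, and a seed is added so a run can be repeated.

```python
import random


def burst_damage(data: bytes, p_enter: float, p_leave: float,
                 seed: int = 0) -> bytes:
    """Damage data with a two-state Markov model, bit by bit.

    State "keep":   the bit is left untouched.
    State "damage": the bit is replaced by a randomly chosen one.
    After each bit the state flips with probability p_enter
    (keep -> damage) or p_leave (damage -> keep).
    """
    rng = random.Random(seed)
    out = bytearray(data)
    damaging = False  # initial state: keep original data
    for bit in range(len(out) * 8):
        if damaging:
            mask = 1 << (bit % 8)
            if rng.getrandbits(1):
                out[bit // 8] |= mask
            else:
                out[bit // 8] &= ~mask & 0xFF
        threshold = p_leave if damaging else p_enter
        if rng.random() < threshold:
            damaging = not damaging
    return bytes(out)


if __name__ == "__main__":
    original = bytes(64)
    damaged = burst_damage(original, p_enter=0.01, p_leave=0.05, seed=1)
    print(damaged.hex())
```

A small `p_enter` with a small `p_leave` gives rare but long bursts; a larger `p_leave` shortens them. Since a replaced bit equals the original half of the time, only about half the bits inside a burst actually change.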
|
|
|
Jul 6 2006, 08:29
Post
#6
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE: neat program.

Yes, nothing special. But it's possibly easier than using a hex editor for the intended purpose.

QUOTE: this can mostly be solved by using a pseudo-random generator and exposing the seed as a command-line option.

Good idea! I will add this as an option.
|
|
|
Jul 6 2006, 09:31
Post
#7
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE: I just want to say that real errors are usually "bursts". (a group of consecutive bytes that are totally wrong -- ie a whole sector).

Good point! Possibly I will add an option to generate such errors. I know that the current implementation is very limited. It would be nice to be able to specify some model of the expected errors: error types, distribution of the error types, distribution of distances between errors and possibly more. But this is currently beyond the scope of my quick and dirty tool.

QUOTE: This could be modeled via a two-state system. Depending on the state the current bit will either be kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or change to the other based on a randomly chosen number and a threshold. Obviously the model's parameter are the two thresholds (one for each state). The initial state should be the "keep-original-data-state". I just realized that this is actually a simulation of a Markov process.

That's interesting. Honestly, I know nearly nothing about Markov processes, but this might be a good starting point for me if I want to improve this tiny tool.
|
|
|
Jul 6 2006, 16:10
Post
#8
|
|
Group: Members Posts: 1018 Joined: 27-September 03 From: Cape Town Member No.: 9042 |
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.
-------------------- Simulate your radar: http://www.brooker.co.za/fers/
|
|
|
|
Jul 6 2006, 18:17
Post
#9
|
|
|
TAK Developer Group: Developer Posts: 887 Joined: 1-April 06 Member No.: 29051 |
QUOTE: While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

I don't see a need for random patterns. The default test set already provides some randomness:
- The test patterns are applied to random positions within the file.
- The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, which should be random to some degree.

I am interested in 2 test cases:
1) If the data is protected by a CRC, which bit patterns will stay undetected? To simplify it a bit: CRCs should be able to detect any 1 bit error and any (single) 2 bit error if the data size isn't too big. Bursts should be detected up to the bit count of the CRC (again simplified). Damage's default test set damages 1, 2, 3 and 36 bits. With 36 bits there is already a small chance to fall through the CRC-32.
2) If the data is not protected, what happens to the decoder? In this case even a single bit error in the right place can get the decoder into trouble.
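The CRC case can be tried out with a small experiment. The sketch below is not related to how Yalac or any other codec stores its checksums; it simply uses `zlib.crc32` from the Python standard library as a stand-in CRC-32, flips runs of consecutive bits at random positions, and counts how often the checksum fails to change. The function names are hypothetical.

```python
import random
import zlib


def flip_run(data: bytes, first_bit: int, bit_num: int) -> bytes:
    """Return a copy of data with bit_num consecutive bits inverted."""
    out = bytearray(data)
    for bit in range(first_bit, first_bit + bit_num):
        out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def undetected(data: bytes, bit_num: int, trials: int, seed: int = 0) -> int:
    """Count trials in which an inverted run leaves the CRC-32 unchanged."""
    rng = random.Random(seed)
    reference = zlib.crc32(data)
    total_bits = len(data) * 8
    missed = 0
    for _ in range(trials):
        first = rng.randrange(total_bits - bit_num + 1)
        if zlib.crc32(flip_run(data, first, bit_num)) == reference:
            missed += 1
    return missed


if __name__ == "__main__":
    payload = bytes(range(256)) * 16
    for n in (1, 2, 3, 32):
        print(n, "bit run, undetected:", undetected(payload, n, trials=1000))
```

For runs of up to 32 bits the count is always zero, since a CRC-32 detects every single burst no longer than its own width. Longer or data-dependent damage, such as replacing 36 bits with a fixed pattern, carries no such guarantee.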
|
|
|
Jul 6 2006, 18:31
Post
#10
|
|
Group: Members Posts: 1018 Joined: 27-September 03 From: Cape Town Member No.: 9042 |
QUOTE: While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

QUOTE: I don't see a need for random patterns. The default test set allready provides some randomness: - The test patterns are beeing applied to random positions within the file. - The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, that should be random to some degree.

These two types of errors will provide decent coverage for testing commonly used ECC schemes, ranging from the simple (CRC) to more complex things like VRS and Turbo codes. I don't know if any user file formats actually use these more advanced schemes, but it would still be a cool feature, IMHO.
Anyways, it looks like a cool piece of software and is a great idea. Good work.
-------------------- Simulate your radar: http://www.brooker.co.za/fers/
|
|
|
|