
DAMAGE - File damaging tool

Well, I know that there is an Upload section, but I would first like to ask whether anyone would find this little tool useful. It's one of several tools I have written to test Yalac and other lossless audio codecs.

The documentation is in HTML, but I don't know how to insert it here, so I have pasted a plain-text copy. It may look a bit strange.

Overview

I wrote Damage to test the error recognition and recovery abilities of my new lossless audio compressor YALAC.

Damage generates a copy of user-selected files, applies the extension '.err' and then damages the copies. You can define damage (bit) patterns and the frequency of the damage. A list of the changes made to the data is written to a protocol file.
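Roughly, the processing of one file looks like this Python sketch (not the actual code; all names are made up, and the real tool applies the error patterns and frequencies described below rather than single bit flips):

Code:
import random
import shutil

def damage_file(path, errors_per_mb=1, skip_bytes=4096):
    # Copy the source to '<name>.err'; an existing copy is overwritten.
    out_path = path + ".err"
    shutil.copyfile(path, out_path)
    with open(out_path, "rb") as f:
        data = bytearray(f.read())
    # About errors_per_mb errors per MByte (assumes the file is larger than skip_bytes).
    count = max(1, int(len(data) * errors_per_mb / 2**20))
    lines = [path]
    for no in range(1, count + 1):
        pos = random.randrange(skip_bytes, len(data))   # leave the header alone
        bit = random.randrange(8)
        old = data[pos]
        data[pos] ^= 1 << bit                           # here: invert a single bit
        lines.append("%4d %10d %08X %d 1 %02X %02X" % (no, pos, pos, bit, old, data[pos]))
    with open(out_path, "wb") as f:
        f.write(data)
    with open("Damage_Result.txt", "a") as f:           # the protocol file
        f.write("\n".join(lines) + "\n")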

Command line options

Help screen
Code:
DAMAGE files [-e -f -s -w]

files       specify file or directory (Dir\*.ext) to be processed
-e x1 x2... specify up to 5 errors xn of type i or r:
              i 3     = (i)nvert a sequence of 3 bits
              r 10010 = (r)eplace a bit sequence with bits 10010
                        (msb left, up to 40 bits)
            The default definition is:
              i 1 i 2 i 3 i 36 r 000000000000000000000000000000000000
-f r x      relative frequency of damage as errors per MByte.
              Maximum: 128 Default: 1
-f a x      absolute frequency of damage as errors per file.
              Up to 1024, but will also be limited to relative maximum.
-s x        no damage until file position x in bytes. Default: 4096
-w          wait for enter key when finished

files

Specify a single file or use wildcards.

Examples:

d:\VocComp_Data\Sample.wav

Damage file "Sample.wav" in directory "d:\VocComp_Data".

d:\VocComp_Data\*.wav

Damage any file with the extension ".wav" in directory "d:\VocComp_Data".

*.*

Damage any file ".wav" in the current directory.

Damage creates files with the same name as the source, but with the extension '.err':

    Sample.wav -> Sample.wav.err

Existing files will always be overwritten without any warning!

-e x1 x2...

Specify up to 5 errors xn of type i or r:

i 3

(I)nvert a sequence of 3 bits. All bits in the sequence are flipped. Valid range: 1 to 40 bits.

r 10010

(R)eplace a bit sequence with the bits 10010. The leading (leftmost) bit is the most significant. Valid range: 1 to 40 bits.

The default definition is:

i 1
i 2
i 3
i 36
r 000000000000000000000000000000000000
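In code, the two error types amount to something like this (a Python sketch, not the actual implementation; bit 0 is taken as the least significant bit of a byte, which matches the protocol examples further below, while the mapping of the replace pattern onto bit positions is an assumption):

Code:
def invert_bits(data, byte_pos, bit_ofs, count):
    # i n: flip 'count' consecutive bits, starting at bit 'bit_ofs'
    # (0 = least significant) of data[byte_pos].
    for i in range(count):
        p, b = divmod(bit_ofs + i, 8)
        data[byte_pos + p] ^= 1 << b

def replace_bits(data, byte_pos, bit_ofs, pattern):
    # r xxxxx: overwrite consecutive bits with the pattern bits (msb left).
    # Assumption: the rightmost pattern bit lands on the lowest bit position.
    for i, ch in enumerate(reversed(pattern)):
        p, b = divmod(bit_ofs + i, 8)
        if ch == "1":
            data[byte_pos + p] |= 1 << b
        else:
            data[byte_pos + p] &= ~(1 << b) & 0xFF

buf = bytearray([0x95])
invert_bits(buf, 0, 5, 1)   # i 1 at bit offset 5: 0x95 -> 0xB5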

-f

Specify the frequency of the errors. The error patterns (see -e) will be randomly repeated if necessary.

Specify a relative frequency as errors per MByte:

-f r 8

Generates 8 errors per MByte. Maximum: 128.

Or specify the absolute frequency of damage as errors per file:

-f a 25

Generates 25 errors per file. The count will be limited to the relative maximum of 128 errors per MByte if the file is small.

In both cases the error count is limited to 1024 errors per file.

The default setting is: -f r 1.
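The way these limits combine can be summarized like this (only a sketch of my reading of the rules above; the names are made up):

Code:
def error_count(file_size, mode, value):
    # mode "r": 'value' errors per MByte, at most 128 per MByte.
    # mode "a": 'value' errors per file, still capped by 128 per MByte.
    # In both cases never more than 1024 errors per file.
    mbytes = file_size / 2**20
    relative_max = int(128 * mbytes)
    if mode == "r":
        count = int(min(value, 128) * mbytes)
    else:
        count = min(value, relative_max)
    return min(count, 1024)

print(error_count(5 * 2**20, "r", 8))    # 40 errors in a 5 MByte file
print(error_count(64 * 1024, "a", 25))   # absolute count limited for a small file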

-s x

No damage until file position x in bytes. Default: 4096.

Useful if you don't want to damage a file header.

Protocol file

Damage generates a protocol file "Damage_Result.txt" in the source file directory. It contains a detailed list of any changes performed on the files.

Example:
Code:
D:\VocComp\Tools\ATrain.yaa

No   Position              BitOfs BitNum Original           New value
   1      142614  00022D16      0     36  B1 DA 8A 49 DD     00 00 00 00 D0  
   2      444685  0006C90D      0     36  6F 17 E4 F2 14     90 E8 1B 0D 1B  
   3      771488  000BC5A0      5      1  95                 B5              
   4     1046975  000FF9BF      5      3  9A                 7A              
   5     1264657  00134C11      1      2  78                 7E              
   6     1454473  00163189      5     36  7A F5 C2 36 2A 36  1A 00 00 00 00 36

The file name is followed by a list of the applied errors, one per line.

Position

File position of the first affected byte in bytes. First in decimal, then in hexadecimal representation.

BitOfs

Position (0-7) of the first affected bit in the byte specified by file position.

BitNum

Count of affected bits.

Original - New value

Comparison of the original and the new values after the damage. Both in hexadecimal notation, least significant (lowest address) byte left.
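For illustration, a protocol entry can be reconstructed from an absolute bit position and the affected bytes roughly like this (a sketch; the exact column widths differ from the real protocol):

Code:
def protocol_line(no, bit_pos, bit_num, original, new):
    # Split the absolute bit position into byte position and bit offset,
    # then print old/new bytes with the lowest address on the left.
    byte_pos, bit_ofs = divmod(bit_pos, 8)
    hex_str = lambda bs: " ".join("%02X" % b for b in bs)
    return "%4d %11d  %08X %6d %6d  %-18s %-18s" % (
        no, byte_pos, byte_pos, bit_ofs, bit_num, hex_str(original), hex_str(new))

# Entry no. 3 of the example above: one bit at byte 771488, bit offset 5.
print(protocol_line(3, 771488 * 8 + 5, 1, bytes([0x95]), bytes([0xB5])))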

DAMAGE - File damaging tool

Reply #1
This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe using the Mersenne Twister could help you. Also, it would be nice to have random bits instead of just inverted ones, or ones replaced with known bits.

Compare feature is nice, though.  So is the copy feature, but it should be toggleable.

Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (e.g., 3 bits changed every 30 bytes, maximum).

Please tell me if this is clear, or if I am asking too much.

You're my favourite developer

Peace,
Tristan.

DAMAGE - File damaging tool

Reply #2
Quote
This program is interesting, but it would be nice if you could set a damage frequency and generate damage randomly. Maybe using the Mersenne Twister could help you. Also, it would be nice to have random bits instead of just inverted ones, or ones replaced with known bits.

There is some randomness in the positions. The frequency specification defines intervals, for instance 1 MB per error. The first error will be inserted between 0 and 1.5 MB, the next between the end of the previous one and 2.5 MB, and so on.
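In code the idea is roughly this (a Python sketch, not the real source; the interval is the inverse of the -f r frequency, e.g. 1 MByte for -f r 1):

Code:
import random

def error_positions(file_size, interval, start=4096):
    # Error k is placed somewhere between the end of the previous error
    # and (k + 1.5) * interval: random positions, but on average
    # about one error per 'interval' bytes.
    positions = []
    prev_end = start
    n_errors = int(file_size / interval)
    for k in range(n_errors):
        upper = min(int((k + 1.5) * interval), file_size)
        if prev_end >= upper:
            break
        pos = random.randrange(prev_end, upper)
        positions.append(pos)
        prev_end = pos + 1
    return positions

print(error_positions(4 * 2**20, 2**20))   # about 4 positions in a 4 MByte file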

My first implementation used random bit patterns, but that was not optimal for my purposes. I like to have control over the conditions; otherwise the results would not be easy to interpret. And it seems to make even more sense to use controlled conditions for comparisons between compressors. You can evaluate how resistant they are, for instance, to 1 bit or 2 bit errors; but if you apply random errors and get different results for two compressors, you will not know if this is caused by the difference between the compressors or by the difference in the test data caused by the randomness.

Quote
Also, you should be able to limit damage to a specific zone (in bytes?) in the file, or to limit the damage per zone (e.g., 3 bits changed every 30 bytes, maximum).

That's indeed useful!

Quote
You're my favourite developer

You know, this one is especially for you...

DAMAGE - File damaging tool

Reply #3
Neat program.

Quote
My first implementation used random bit patterns, but that was not optimal for my purposes. I like to have control over the conditions; otherwise the results would not be easy to interpret. And it seems to make even more sense to use controlled conditions for comparisons between compressors. You can evaluate how resistant they are, for instance, to 1 bit or 2 bit errors; but if you apply random errors and get different results for two compressors, you will not know if this is caused by the difference between the compressors or by the difference in the test data caused by the randomness.


This can mostly be solved by using a pseudo-random generator and exposing the seed as a command-line option.
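For example (a Python sketch with a hypothetical --seed switch; Damage itself has no such option yet):

Code:
import argparse
import random

parser = argparse.ArgumentParser()
parser.add_argument("--seed", type=int, default=None,
                    help="seed for the error positions; same seed = same damage")
args = parser.parse_args()

# Use one generator for every random decision, so a run can be reproduced
# exactly by passing the same seed again.
rng = random.Random(args.seed)
print(rng.randrange(4096, 2**20))   # e.g. the first error position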

Josh

 

DAMAGE - File damaging tool

Reply #4
I just want to say that real errors are usually "bursts" (a group of consecutive bytes that are totally wrong, i.e. a whole sector).

This could be modeled via a two-state system. Depending on the state, the current bit is either kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or switch to the other, based on a randomly chosen number and a threshold. Obviously the model's parameters are the two thresholds (one for each state). The initial state should be the "keep-original-data" state.

I just realized that this is actually a simulation of a Markov process.
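A minimal Python sketch of what I mean (the parameter names and example values are just placeholders; you would tune the two thresholds to the burst lengths you want to simulate):

Code:
import random

def burst_damage(data, p_to_bad=1e-6, p_to_good=0.05, seed=None):
    # Two-state model: in the "good" state every bit is kept, in the "bad"
    # state every bit is replaced by a random one (i.e. flipped with
    # probability 0.5). After each bit the state may switch:
    #   good -> bad  with probability p_to_bad   (errors are rare)
    #   bad  -> good with probability p_to_good  (a burst ends after ~1/p_to_good bits)
    rng = random.Random(seed)
    out = bytearray(data)
    bad = False                      # start in the keep-original-data state
    for i in range(len(out) * 8):
        if bad and rng.random() < 0.5:
            out[i // 8] ^= 1 << (i % 8)
        if bad:
            bad = rng.random() >= p_to_good
        else:
            bad = rng.random() < p_to_bad
    return bytes(out)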


DAMAGE - File damaging tool

Reply #6
Quote
I just want to say that real errors are usually "bursts" (a group of consecutive bytes that are totally wrong, i.e. a whole sector).

Good point! Possibly I will add an option to generate such errors.

I know that the current implementation is very limited. It would be nice to be able to specify some model of the expected errors: error types, distribution of the error types, distribution of distances between errors and possibly more. But this is currently beyond the scope of my quick and dirty tool.

Quote
This could be modeled via a two-state system. Depending on the state, the current bit is either kept or replaced by a randomly chosen one. After processing a bit you either stay in the same state or switch to the other, based on a randomly chosen number and a threshold. Obviously the model's parameters are the two thresholds (one for each state). The initial state should be the "keep-original-data" state.

I just realized that this is actually a simulation of a Markov process.

That's interesting. Honestly, I know next to nothing about Markov processes, but this might be a good starting point for me if I should want to optimize this tiny tool.

DAMAGE - File damaging tool

Reply #7
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

DAMAGE - File damaging tool

Reply #8
Quote
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

I don't see a need for random patterns. The default test set already provides some randomness:

- The test patterns are applied to random positions within the file.
- The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance, all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, which should be random to some degree.

I am interested in 2 test cases:

1) If the data is protected by a CRC, which bit patterns will stay undetected.

To simplify it a bit: CRCs should be able to detect any 1 bit error and any (single) 2 bit error if the data size isn't too big. Bursts should be detected up to the bit count of the CRC (again simplified). Damage's default test set damages 1, 2, 3 and 36 bits: already a small chance to fall through the CRC-32 (see the little check sketched after point 2 below).

2) If the data is not protected, what happens to the decoder.

In this case even a single bit error in the right place can get the decoder into trouble.
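Here is the small check mentioned under point 1, sketched in Python with zlib's CRC-32 (not YALAC's actual protection code):

Code:
import random
import zlib

def crc_detects(data, byte_pos, bit_ofs, count):
    # True if the CRC-32 over the buffer changes when 'count' consecutive
    # bits starting at (byte_pos, bit_ofs) are inverted.
    damaged = bytearray(data)
    for i in range(count):
        p, b = divmod(bit_ofs + i, 8)
        damaged[byte_pos + p] ^= 1 << b
    return zlib.crc32(data) != zlib.crc32(bytes(damaged))

data = bytes(random.randrange(256) for _ in range(4096))
# A single burst of up to 32 bits must always change the CRC-32;
# a 36 bit error has a tiny (about 2^-32) chance of slipping through.
print(crc_detects(data, 100, 3, 36))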

DAMAGE - File damaging tool

Reply #9

Quote
While I think a complete error simulation would be overkill, support for both random bit errors and burst errors would be useful. This is because error correction schemes respond differently to these two different classes of errors and may perform very well on one and really badly on the other.

I don't see a need for random patterns. The default test set already provides some randomness:

- The test patterns are applied to random positions within the file.
- The inversion of existing bits at random positions creates different patterns according to the variations of the original bits. Obvious exception: if the data bits don't vary (for instance, all zero), the inversion brings no variation. But this is very unlikely to happen with compressed data, which should be random to some degree.
That's more or less what I meant by "random bit errors" - the sort of errors that noise introduces on some kinds of channel. Maybe what I should have said is "errors where every bit in the file has the same finite probability of being flipped", with the second kind being "errors where the probability that a bit is flipped depends on recent bit flips". The first kind you have already implemented, and some people have already suggested the second kind.

These two types of errors will provide decent coverage for testing commonly used ECC schemes, ranging from the simple (CRC) to more complex things like VRS and Turbo codes. I don't know if any user file formats actually use these more advanced schemes, but it would still be a cool feature, IMHO.
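For completeness, the first kind in a few lines of Python (just a sketch; the two-state model sketched earlier in the thread covers the second kind):

Code:
import random

def flip_bits_uniform(data, p=1e-6, seed=None):
    # Every bit of the file is flipped independently with the same
    # probability p -- no bursts, no memory of previous flips.
    rng = random.Random(seed)
    out = bytearray(data)
    for i in range(len(out) * 8):
        if rng.random() < p:
            out[i // 8] ^= 1 << (i % 8)
    return bytes(out)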

Anyways, it looks like a cool piece of software and is a great idea. Good work.