Attempt to improve lossless comparison tests
Reply #5 – 2005-01-07 16:56:01
> music/sound/noise is dynamic and the resulting data will obviously change as the input changes... so the goal of benchmark-style testing is to include an exhaustive amount of data so as to make the resulting deviance minimal or unimportant. within the context of benchmark testing, codecs must have a resulting value - i think you are confusing percentage (compression ratio) with a static value rather than a value generated using dynamic input. (post #263068)

My argument is that by compressing a hundred 10-second samples you get a selection of music with lower deviance than by compressing five full songs. If we somehow had access to every hip hop song ever made and found that most compress to between 50% and 56%, a comparison test that gave the single value of 53% would be a good one in my opinion. If a different test using a small sample gave 55%, that isn't such a good representation of the genre.

> also, why test a codec's ability to compress random/frankenstein noise samples when that codec will not be used for that purpose? (post #263068)

'Frankenstein samples' is a good way to describe them, but the aim is to make ones that aren't just random noise; they should represent the actual music people compress very well.

> the munger is definitely a useful tool. but even random selection from a corpus will still reflect the overall corpus bias. the only way around this is to have as large and varied a corpus as possible. but then you will only get an "average" number. the measures that will be more relevant to people will still be genre-based and format-based ... it is among formats and genres that the codecs tend to differ more. (post #263195)

I would like to see a number of corpora reflecting the most common genres, to produce the kind of test bawjaws mentions. Perhaps collaborating with a number of people to build these from several thousand songs would help create 'well rounded' files to base the test on.

> As for faults in your reasoning/testing, the only thing I could find that definitely needs a bit of looking into is the uncertainty in the correction factor. Ideally it should be minimised as much as possible, perhaps by choosing longer sections, by modifying the way you join sections, or even changing the way a file is munged. ... Theoretically it might even be possible to simply generate a short piece of sound that just has properties similar to the original, but isn't even composed of actual sections from the original (based on frequency data for example). (post #263323)

Crossfading samples is something I wanted to try next; it might help to ease the codec into the next sample and reduce the error. One thing I have already tried is to select all the random positions first, then sort them into ascending order in time (see the sketch below). If the collection is ordered by genre and then by artist, this places all the chunks from the same artist together, and if there are two chunks from a given song, they end up next to each other. This only has a noticeable effect with short chunk lengths, when there are two or more chunks per song on average.

I have my doubts about constructing artificial data to test codecs on: it stops being a 'real world' test if you don't even base the material on real music. Aside from that issue, I'd hate to think how complex it would be to create a song with representative patterns of all music in it!
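To make the chunk-selection idea concrete, here is a minimal Python sketch. It is my own illustration, not the program's actual code: the function names, the flat list of sample positions, and the linear crossfade are all assumptions.

[code]
import random

# Hypothetical sketch of the munging step described above; the function
# names and data layout are my own, not the actual program's.

def pick_chunk_positions(total_samples, chunk_len, n_chunks, seed=None):
    """Pick n_chunks random chunk start positions, then sort ascending.

    With a corpus ordered genre -> artist -> song, sorting keeps chunks
    from the same artist (or song) adjacent in the munged file.
    """
    rng = random.Random(seed)
    positions = [rng.randrange(total_samples - chunk_len)
                 for _ in range(n_chunks)]
    positions.sort()  # ascending order in corpus time
    return positions

def crossfade_join(a, b, fade_len):
    """Join chunk b onto chunk a with a linear crossfade, easing the
    codec into the next sample instead of a hard cut."""
    assert 0 < fade_len <= min(len(a), len(b))
    out = list(a[:-fade_len])
    for i in range(fade_len):
        t = i / fade_len
        out.append(a[len(a) - fade_len + i] * (1.0 - t) + b[i] * t)
    out.extend(b[fade_len:])
    return out
[/code]

Chunks cut at the sorted positions would then be joined pairwise with crossfade_join before the result is fed to each codec.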
> I also have a question about your first graphs (showing the results for selecting tracks and albums). If I understand correctly you compressed quite a few tracks and then calculated what different combinations of those tracks would yield as a result. If this is the case I'd imagine that quite a lot of combinations would be possible (hundreds), probably all with slightly varying data sizes. So I suspect you grouped them together somehow, but my question is how did you group them and what was your reason for doing it that way? (Or am I overlooking something?) I'm not very good at statistics, but I imagine it could influence the results if, for example, groups with large data sizes had more members. (post #263323)

I think if you have a particular distribution of numbers (the compression ratios of different songs), you should get the same value for the deviation no matter what sample size you use. With a bigger sample you get a better estimate of this deviation, but the number shouldn't rise or fall systematically as the sample size increases - just fluctuate about a central point. Can anyone confirm or correct this? (A quick sanity check is sketched in the first code block below.)

The program creates a list of 500 compression ratios (if I have 500 songs), so the sample size for '1 track' combinations is 500. It then pairs them up into combinations of 2 tracks, and it can enumerate every possible pair, so every possible data size and compression ratio is found. Once you get to combinations of 3 tracks there are over 20 million possibilities (C(500,3) is about 20.7 million), so it becomes far too slow to calculate every choice. If the number of combinations is going to be over a million, the program doesn't look at every one, but picks combinations at random the way a lottery machine picks a combination of 6 numbered balls: it just picks a set of track id numbers, with no knowledge of the data sizes in a particular combination (see the second sketch below). I think this ends up being fair, as it picks enough combinations (500'000) to remove the effect of some combinations having an unusually small or large data size. I hope that explains its operation better to you.
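On the deviation question, a quick simulation supports the claim. This is purely illustrative: the distribution and its parameters are made up, not data from the test.

[code]
import random
import statistics

# Illustrative check: the estimated standard deviation of a fixed
# distribution should fluctuate around the true value as the sample
# grows, not drift systematically. All numbers here are made up.
random.seed(1)
population = [random.gauss(0.53, 0.02) for _ in range(100000)]  # fake ratios

for n in (10, 100, 1000, 10000):
    sample = random.sample(population, n)
    print(n, round(statistics.stdev(sample), 4))
[/code]

Each printed value should sit close to 0.02; the smaller samples are merely noisier, they don't trend up or down.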
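And here is a sketch of the combination-sampling logic just described. The helper names, size arrays, and the exact cutoff handling are my own assumptions; only the "enumerate every combination when feasible, otherwise draw random track ids like a lottery" behaviour comes from the description above.

[code]
import itertools
import math
import random

def combo_ratio(combo, orig_sizes, comp_sizes):
    """Compression ratio of one combination: total compressed size
    divided by total original size."""
    return (sum(comp_sizes[i] for i in combo)
            / sum(orig_sizes[i] for i in combo))

def combination_ratios(orig_sizes, comp_sizes, k, limit=500000, seed=0):
    """Ratios for k-track combinations: exhaustive when the count is
    manageable, otherwise `limit` lottery-style draws of k distinct ids."""
    n = len(orig_sizes)
    if math.comb(n, k) <= limit:
        return [combo_ratio(c, orig_sizes, comp_sizes)
                for c in itertools.combinations(range(n), k)]
    rng = random.Random(seed)
    return [combo_ratio(rng.sample(range(n), k), orig_sizes, comp_sizes)
            for _ in range(limit)]
[/code]

Because the random path only ever draws track ids, combinations with unusually large or small data sizes are no more likely to be picked than any other, which is the fairness argument made above.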