I will just say one more thing before I wave my white flag and/or get banned for being too stubborn, whichever comes first.

Please, spend some time on my little rant, one last time.
Another 'hypothetical situation' for you:
Suppose we have a test between 2 encoders, both VBR, and only 2 samples:
- Bachpsichord, total playtime say 10 seconds.
- Hardrock, also 10 seconds playtime.
Now we run the encoders with quality settings such that they average 128kbit, or to put it differently, such that in the end the file sizes add up to exactly the same total: 327680 bytes (20 seconds at 128kbit, counting 1 kbit as 1024 bits). Both encoders have used the same number of bytes to 'play with'.
Now. The results: *drumroll*
Encoder A:
- Bachpsichord, encoded at 192kbit average.
- Hardrock, encoded at 64kbit average
Encoder B:
- Bachpsichord, encoded at 128kbit average.
- Hardrock, encoded at 128kbit average.
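To double-check that both encoders really spent the same budget, here is a quick sketch of the arithmetic in Python. It assumes 1 kbit = 1024 bits (that is what makes the total come out at 327680 bytes); the function and variable names are just made up for the illustration.

```python
# Assumption: 1 kbit = 1024 bits, which is what makes
# 128 kbit/s over 20 seconds equal 327680 bytes.
BITS_PER_KBIT = 1024


def encoded_bytes(avg_kbit_per_s, seconds):
    """Size in bytes of a track encoded at the given average bitrate."""
    return avg_kbit_per_s * BITS_PER_KBIT * seconds // 8


def total_bytes(encoder):
    """Sum of the file sizes of all samples for one encoder."""
    return sum(encoded_bytes(rate, secs) for rate, secs in encoder.values())


# (average bitrate in kbit/s, playtime in seconds) per sample, from the example
encoder_a = {"Bachpsichord": (192, 10), "Hardrock": (64, 10)}
encoder_b = {"Bachpsichord": (128, 10), "Hardrock": (128, 10)}

print("Encoder A:", total_bytes(encoder_a), "bytes")
print("Encoder B:", total_bytes(encoder_b), "bytes")
```

Both totals come out the same, even though the two encoders divided the bytes between the samples very differently.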
Now we run the listening tests, and suppose it turns out that on each track, Bachpsichord and Hardrock alike, the Encoder A and Encoder B versions get the same score (they don't sound the same, they just sound equally 'pleasing' when you consider the overall score). Or, if you wish, more skewed: say that Encoder A gets a slightly higher score than B on Bachpsichord but a slightly lower score than B on Hardrock, and the average score is the same.
And now my point is:
The conclusion of this test, the way everybody here supports it, would be: "Well, that's that, Encoder A and Encoder B are equal for 128kbit." And I agree! It is fair.
But.
If you weigh the averages, you can say MORE. There is more information in the results! You can then say:
- Encoder A probably has an inferior coding method for Bachpsichord, and needs more bits to reach its target quality.
- Encoder A probably has a superior coding method for Hardrock, and needed fewer bits to reach its target quality.
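A rough sketch of what that comparison could look like in Python. This is only an illustration, not a worked-out weighing scheme: the bitrates are the ones from the example, but the score of 4.0 is a made-up placeholder, and the `tolerance` for calling two scores 'the same' is an arbitrary assumption.

```python
def bit_efficiency(results_a, results_b, tolerance=0.25):
    """Per sample, report which encoder reached a comparable score with fewer bits.

    results_*: dict mapping sample name -> (average kbit/s, listening score).
    tolerance: hypothetical threshold for treating two scores as 'the same'.
    """
    verdicts = {}
    for sample in results_a:
        rate_a, score_a = results_a[sample]
        rate_b, score_b = results_b[sample]
        if abs(score_a - score_b) > tolerance:
            # Quality differs too much for a bits-only comparison to mean anything
            verdicts[sample] = "scores differ, compare quality first"
        elif rate_a < rate_b:
            verdicts[sample] = "A needed fewer bits"
        elif rate_b < rate_a:
            verdicts[sample] = "B needed fewer bits"
        else:
            verdicts[sample] = "no difference"
    return verdicts


# Bitrates are from the example; the score 4.0 is a made-up placeholder.
encoder_a = {"Bachpsichord": (192, 4.0), "Hardrock": (64, 4.0)}
encoder_b = {"Bachpsichord": (128, 4.0), "Hardrock": (128, 4.0)}

for sample, verdict in bit_efficiency(encoder_a, encoder_b).items():
    print(sample, "->", verdict)
```

With equal scores, the only thing left to compare per sample is how many bits each encoder had to spend, which is exactly the extra information the plain average throws away.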
And this is useful information, wouldn't you say? Isn't this what this test is all about? Finding out as much as we can from the listening tests we do, trying to compare the various files? Wouldn't it be nice to be able to say "Well, for classical music, mpc scored highest, while for hardrock ogg vorbis and AAC are the preferred choices"? (Just making something up here.)
Plus, of course my example is highly stylized. With a lot more encoders, a lot more samples, file sizes and so on, it is not inconceivable that interesting extra results can come to light. Again, I'm not saying this weighing scheme is rock solid; like the test itself, it gives some extra data to think about. I really don't see why you would want to ignore this.
As a last comparison, if you are choosing which car to buy, and a test gives Car A and Car B *exactly* the same score and same price, then you don't just pick one blindfolded. No. You go and check what exact features both cars have that you consider important. One car will go faster while the other is safer. A USEFUL car test won't just say "Car A and B are equal", no, a useful test will say "Car A and B are equal, Car A is faster and more dangerous, Car B is slower and safer". Pick what you prefer.