Of course, it all depends on the purpose of the test...
And I read Dibrom's argument about showing and hiding a codec's flaws...
BUT...
IF, on a wider sample, mp3pro and Vorbis get the same average bitrate of 64kbps, while in the test range there is this difference...
You could say this test is showing Vorbis's flaw in not allocating enough bits for this test range, but you could just as well say that it is showing mp3pro's flaw in allocating too many bits for this test range! (And if mp3pro *had* to bloat in order to maintain quality, that's just another kind of flaw, so that argument doesn't really work.) It's unfair to penalize Vorbis for the former but reward mp3pro for the latter.
And the sample set would be biased in favour of mp3pro, by choosing samples it bloats on, and against Vorbis, by choosing samples it shrinks on. Moreover, going on this section's assumption that their average bitrates over a wide selection are the same, the test will not be 'realistic': in general people won't be getting a lower bitrate with Vorbis, so they won't generally be running into these shrinking problems either!
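To put some numbers on the point above, here is a rough sketch in Python. The bitrates are completely made up for illustration (they are not measurements of mp3pro or Vorbis), and `codec_a`, `codec_b` and `average_bitrates` are just names I picked. It only shows how two codecs can average the same 64kbps over a wide selection and still look mismatched on a hand-picked subset:

```python
from statistics import mean

# Hypothetical per-track bitrates in kbps. These numbers are invented for
# illustration and are not measurements of any real encoder.
wide_sample = {
    "codec_a": [70, 72, 58, 56, 64, 64],
    "codec_b": [58, 56, 70, 72, 64, 64],
}


def average_bitrates(bitrates, indices):
    """Return the mean bitrate per codec over the tracks at `indices`."""
    return {
        codec: mean(rates[i] for i in indices)
        for codec, rates in bitrates.items()
    }


all_tracks = range(6)
test_subset = [0, 1]  # tracks where codec_a bloats and codec_b shrinks

overall = average_bitrates(wide_sample, all_tracks)
subset = average_bitrates(wide_sample, test_subset)

print("whole collection:", overall)
print("test subset:     ", subset)
```

Over the whole collection both codecs come out even, but on the two chosen tracks one gets 14kbps more than the other. Pick tracks 2 and 3 instead and the gap flips the other way, which is exactly why the choice of samples matters.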
If you really want to test the weaknesses of mp3pro and Vorbis in bloating and shrinking, you should include in your sample set equal numbers of bloating and shrinking files for both mp3pro and Vorbis. Then you can see whether mp3pro suffers more from its shrinking than Vorbis does and benefits more from its bloating than Vorbis does, or whether it is the other way round. I.e. who is more justified in their bloating and shrinking?
But obviously this is not a realistic alternative. The time it would take to find such a sample set would be enormous, and the set would be too large for a practical test.
If it turns out that even over a wide selection mp3pro generally gets more bitrate than Vorbis, the answer is even clearer: relative to Vorbis, mp3pro is cheating across the board! (EDIT: or maybe not, after reading rjamorim's post. But I believe the rest still applies if you accept that the codecs are generally matched at 64kbps.)
Even if you think that bloating is always justified while shrinking is always a flaw (to justify Dibrom's argument?), the test is still biased against Vorbis for choosing samples IT shrinks on instead of samples that some other codec shrinks on!
If the goal were to see which codec gives the most bang for the same buck, the approach to take is even clearer (see my last post).
So, IMHO it is a better idea to standardize the bitrate in the test, put a note in the results saying that you had to adjust Vorbis up and/or mp3pro down, etc., and let people draw their own conclusions. After reading the test results people might even start to use q0.2 instead of q0, *making* the results more applicable to real life!
The test is all over and I don't even use any of these codecs, so I don't know why I'm taking it so seriously.

But I'm very convinced that I am right