I have some fears.
1/ A public listening test seems to be a lot of work: choosing samples, finding good presets, building the config txt files, uploading them, collecting information, motivating people before the end of the test, fending off criticism after the test... I fear that nobody will agree to conduct this kind of test (I would rather see you, Nyaochi or Aoyumi spend your limited time on improving the vorbis codec).
2/ A listening test with vorbis 1.01 as the expected "loser" is certainly difficult for most people. But we need significant results. In the end, we need to see clear differences (even if the differences are small). Otherwise, the test is useless. I fear that few people will be able to help here... Willing people may be tired, with MP3@128, then AAC@128, and soon multi@128. Add a vorbis@128 in the same month, and I fear that people are going to be mad.
I suggest the following. If only a few people can send useful results, I don't think it's really necessary for someone to prepare sample archives, abc/hr material, etc... all the painful work. The challengers do not come from rival vendors (a "good" one and an "evil" one), nor are they tied to different ideologies ("drmed crap format" against "paradisiacal free format"). I don't think people are going to spend their time cheating in order to favour a specific encoder: all the codecs seem to be neutral.
Therefore, I suggest that people prepare their own test material themselves. All we need is to specify the conditions:
- samples
- challengers and settings
- if necessary, some rules for when listeners run into problems (e.g. if you fail the ABX test, you must put the slider on 5.0, or possibly on 4.9, rather than keeping the initial score of 3.7... this seems logical to me, and it will avoid some of the epistemological problems encountered by Roberto). Another rule: test all the samples, not only three or four.
I expect that 6 or 7 trained people will participate in this test: Nyaochi, Aoyumi, QuantumKnot, me. Maybe [proxima] or some other good listeners interested in vorbis, or simply interested in quality. Of course, it's not fully scientific. Just something pragmatic. We need good leads on the quality of the current vorbis encoders before choosing one for the next multiformat test. The results should be made public, of course, and the "winner" should be considered the next recommended version for daily encoding.
What do you think?