QUOTE(Dibrom @ Sep 18 2005, 03:03 PM)
What is an "untrained listener"? Is this someone who hasn't spent much time listening to lossy audio codecs? Or is this someone who doesn't listen to music much in general? Or is this someone who ... ?
I can think of quite a few people who, in other contexts, people would consider "trained" (musicians for example, or stereotypical "audiophiles"), but where such a label doesn't necessarily translate well into this domain.
The problem with that right away is that as soon as you label such a group and use them to make a statement regarding quality derived from the results of their listening test, you're going to have some group complaining about the representativeness of it all. Ultimately, it'd take a lot of effort to make the arbitrary distinctions necessary to setup the test, and you'd only end up with questionable results. It's simply not worthwhile.
An untrained listener is exactly what the adjective implies: somebody with no experience in detecting encoder artifacts. While I can't offer any evidence that such listeners are fungible (i.e., that musicians, regular listeners, and audiophiles would make equally fine untrained listeners), I see no evidence to the contrary either. If such a listener is fungible, then I would argue that the power of a test involving them is going to be OK - not good, not great, but acceptable for the target audience.
In other words, your point about representativeness is cogent, but to the best of my knowledge, not actually validated. In this respect the situation differs from transparent encoding tests only in degree. Before HA and before ff123 and before the whole listening test era, was it obvious that there is a well-defined boundary of transparency for properly designed encoders? (Actually that isn't a very rhetorical question, as I don't know the answer; if it was obvious, then this comparison isn't that valid.)
QUOTE
QUOTE
Along with this we might want to consider relaxing the ABX protocol a bit, because the focus here is on casual listening.
Why?
Again, you have the same sort of problem. One person's "casual listening" is going to be completely different from another's. What you need to do to get a representative result is control this situation as much as possible, and that's what ABX provides for us.
But let's say that somehow, even given these problems, you find a correct way to carry out such a test and to rely on such relaxed conditions. Why would any developer in their right mind want to waste tuning for such results? It's a waste of effort because, in the end, people are going to ultimately compare their efforts based on some sort of benchmark, and "casual listening" is hardly a good metric to use for that. And furthermore, the results themselves are questionable because they lack a level of objectivity that is necessary to really nail down problems in quality and fix them. If you relax ABX and tune for the "average listener," you'll spend all your time chasing phantoms.
I agree that tuning for this sort of thing is mostly useless. That is, when you tune an encoder, it would be far more effective to tune based on transparency and on 1-5 ranking results from trained ears than on casual listening. For transparency, there is a well-defined and psychoacoustically sound boundary to tune for. For casual listening there isn't: you'd have some people who can't hear anything and some who are just naturals at telling differences, which makes drawing that sort of boundary impossible.
If I were to take this ad hoc idea a bit further, I would argue that this could be worked around by determining the result of such a test statistically, from the distribution of listeners rather than from the ABX results themselves. That is, given 20 or so listeners, the final "tuning" would be the one that yields "casual listening transparency", whatever that is, for, say, 70% of the listeners (I pulled that number out of a hat).
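To make the idea a little more concrete, here is a rough Python sketch of how such a per-listener tally might work. Everything in it is an assumption on my part: the function names, the 0.05 significance cutoff, and the decision to count a listener as "transparent" whenever they fail to ABX at that cutoff (which is not proof that they can't hear a difference, only that the test didn't show it). The 70% target is the same out-of-a-hat number as above.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the probability of getting at least
    `correct` answers right out of `trials` by pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def transparent_fraction(results, alpha=0.05):
    """Fraction of listeners who did NOT reach significance.

    `results` is a list of (correct, trials) pairs, one per listener.
    Treating "not significant" as "transparent for this listener" is an
    assumption of this sketch, not an established convention.
    """
    not_heard = sum(1 for correct, trials in results
                    if abx_p_value(correct, trials) > alpha)
    return not_heard / len(results)


def lowest_acceptable_setting(results_by_setting, target=0.70, alpha=0.05):
    """Return the first setting whose transparent fraction meets `target`.

    `results_by_setting` maps a hypothetical setting label to that
    setting's per-listener results, ordered from lowest to highest
    quality. Returns None if no setting qualifies.
    """
    for setting, results in results_by_setting.items():
        if transparent_fraction(results, alpha) >= target:
            return setting
    return None
```

With 16 trials per listener, 12 correct falls just under the 0.05 cutoff and 11 does not, so a listener scoring 11/16 would be counted as not hearing a difference. Note that this only moves the arbitrariness around: the cutoff, the trial count, and the target percentage all still have to be picked by somebody.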
QUOTE
No, the way to do it is the same way I listed earlier. Then, if you need lower quality, you provide some sort of smooth quality scaling. Most codecs do that these days anyway (LAME with -V, Vorbis and MPC with --q). From here, a particular individual can determine what meets their own "casual" needs through a few simple listening tests.
In all reality these are probably sufficient for most people. The only realistic thing that a casual listening tuning would turn into is a note saying "start at -Vsomething for background music, car music, casual listening, workout music, etc.", and that is so close to the current recommendations that it's probably not worth pursuing.
QUOTE
QUOTE
128kbps encoder tests are somewhere around what is requested here,
And how do you know this?
I admit I don't; I was just guessing. I think I may have confused "casual listening" with "listening at acceptable levels of distortion", which is what the 128 tests have been doing.
This thread is getting a bit out of hand, and the issue itself is kind of moot for me because I would never use such a "casual listening" encode, and Dibrom has made enough good points about the efficacy of all of it, so I'll bow out at this post.