QUOTE (krabapple @ May 5 2009, 12:36)

A conference paper by Zielinski fleshed out the ideas in the AES presentation, and is available online:
http://www.surrey.ac.uk/soundrec/ias/papers/Zielinski.pdf
The work was also published in JAES last year, as
S. Zielinski, F. Rumsey, and S. Bech. On Some Biases Encountered in Modern Audio Quality Listening Tests - A Review. J. Audio Eng. Soc. Vol. 56, No. 6, pp. 427-451 (June 2008).
An interesting and, it seems, insightful paper. This jumped out at me:
"It was shown that hedonic judgments (related to pleasantness) may introduce more bias to the results of audio quality listening tests than sensory judgments. Consequently, hedonic judgments should be avoided in audio listening tests if possible. For instance, the participants could be asked to evaluate sound character or audio fidelity (trueness with respect to a reference) rather than how much they like, dislike, prefer or desire certain audio stimuli."
Also:
"One may argue that the two currently most popular methods for evaluation of audio quality [8], [9] are free from the aforementioned biases, as they use an emotion-free definition of audio quality which is substantially different from the definitions quoted above. According to both standards, the basic audio quality is defined as a single, global attribute used to judge any and all detected differences between the reference and the object. This definition does not make any references to the “satisfaction”, “adequacy” or “desired nature” of a sound but to the perceptual “difference” between the audio reference and the object under evaluation. Since the perceptual “difference” can be considered as an emotion-free attribute, one could conclude that in these two standardised methods there is no place for any hedonic judgments. However, a close examination of the grading scales used in these standard techniques reveals that this conclusion is flawed. According to the ITU-R BS. 1116 recommendation, a 5-point impairment scale should be used in listening tests involving small audio quality impairments [8]. It can be seen in Fig. 2 that the two ends of the scale do not contain bipolar labels, as the top end of the scale is concerned with imperceptibility of impairments whereas the middle and bottom parts of the scale are used to represent different levels of annoyance. In other words, this scale can be described as a “hybrid”, combining two different perceptual constructs at two ends of the scale; perceivability at the top and annoyance at the bottom. Since the “annoyance” construct is directly related to disliking, it can be inferred that the middle and bottom part of the scale will involve a substantial proportion of hedonic judgments. Hence, all the biases discussed in the previous section can potentially affect the results obtained using the ITU-R BS. 1116 recommended method."
There is another kind of bias in hedonic judgements, and it is indirect. Let's say that I was comparing a 2-channel system to a 7.1-channel system. If the program material exploits the 7.1 system, then telling which system is which is pretty trivial. How do we keep people from giving responses that are biased by this obvious identification? Isn't a blind test where the systems' identities are obvious subject to some of the same, or at least similar, biases as a sighted evaluation?