QUOTE(Kees de Visser @ Apr 18 2006, 02:00 PM)

"this kind of test have already been done"
I know, and the results indicate that high(er) bitrates have no (or very little) benefit. This can mean that audible differences are negligible (or non-existent), or that the test was flawed. If it turns out that we can't design a better test, it's useless to start one.
I think that by using dynamic recordings in dedicated listening rooms, you may be able to show the benefit of bit depths greater than 16 bits.
You may also try to generate artificial signals that would show the theoretical audibility of a given parameter, even if you fail to find a musical recording that suffers from this parameter. For example, I recently tried to ABX a phase shift at 30 Hz. I chose a recording with 30 Hz notes with sharp attacks. I failed. But I was told that the ideal signal for this test is a low-frequency sawtooth signal.
The main problem is that strong high frequencies can damage tweeters.
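As an illustration of such an artificial signal, here is a minimal Python sketch of a 30 Hz sawtooth and a phase-shifted copy. It rests on assumptions that are mine, not part of the test I described: the "phase shift" is modelled as a constant offset added to every harmonic, the function names are hypothetical, and the sawtooth is built from only 20 harmonics so that nothing above 600 Hz reaches the tweeters.

```python
import math

def sawtooth(freq, phase, seconds=0.1, rate=48000, harmonics=20):
    """Band-limited sawtooth: sum of sine harmonics, each offset by `phase` radians.

    Hypothetical helper: the constant per-harmonic offset is only one possible
    reading of "phase shift".
    """
    n = round(seconds * rate)
    samples = []
    for i in range(n):
        t = i / rate
        s = sum((-1) ** (k + 1) * math.sin(2 * math.pi * k * freq * t + phase) / k
                for k in range(1, harmonics + 1))
        samples.append(2 / math.pi * s)
    return samples

def rms(samples):
    """Root mean square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

reference = sawtooth(30.0, 0.0)          # version A
shifted = sawtooth(30.0, math.pi / 2)    # version B: same harmonics, shifted phase

# Same level, different waveform: only the phase relationship has changed.
print(rms(reference), rms(shifted))
print(max(abs(a - b) for a, b in zip(reference, shifted)))
```

Both versions carry the same energy in each harmonic, so any difference heard between them could only come from the phase relationship.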
QUOTE(Kees de Visser @ Apr 18 2006, 02:00 PM)

"only Oohashi claims to have got significant results"
I've been thinking about adding EEG (or even MEG) measurements but in practice this will probably be too complicated (MEG would even be impossible to install in a studio). Have EEG results, besides Oohashi's, ever been used to prove audibility of stimuli?
I don't know, but I was not talking about the EEG results, which are also questionable because the EEG excitation started one minute after the stimulus was presented, and ceased long after the stimulus had been removed.
I was talking about the subjective appreciation of the sound by the listeners. Table 2 gives an impressive set of significant p values associated with direct listening tests, not with EEG. However, there is not a word about the way they were computed.
If you want to try the same experiment, that is, asking people whether what they hear sounds "harsh, dynamic etc." instead of asking them to identify X, then we would have to set up a mathematical model in order to get the statistical significance of the answers.
QUOTE(Kees de Visser @ Apr 18 2006, 02:00 PM)

"The classical statistic evaluation (ABX) doesn't seem to fit this kind of tests."
I'm glad to find this open-minded attitude on this forum and I'm very interested to hear more about your proposed evaluation method.
For this kind of test, where we want to see if a difference can be heard by at least some people under some conditions, I suggest a protocol divided into three parts:
In part 1, the listeners, who are supposed to be familiar with the kind of difference being tested, are allowed to play with the system. They must find the hardware and the musical samples on which the difference is easiest to spot. This phase goes on until they think that the difference is obvious enough for a blind test to succeed easily.
In part 2, some fake blind tests are done. This is the training. Listeners try to recognize the difference under the real test conditions. They can compare ABX with other methods. They can choose what seems to be the best delay between the trials. This part ends when the listeners, or at least some of them, consistently get statistically good results. Remember that this is only training. These results won't be taken into account in the final conclusion, no matter what happens.
In part 3, the real test is done, according to the protocol chosen in part 2. If the number of trials was decided in advance, listeners are told their score after each trial. If they begin to make mistakes, they can interrupt part 3 in order to undergo some more training, or stop for a while. In part 3, they are allowed to give null answers when they are not sure. In ABX, it would be a three-choice test: "X is A", "X is B", or "I'm not completely sure".
Only the "X is A" or "X is B" answers are recorded. Part 3 goes on until the required number of answers of this kind has been collected.
The advantage of dividing the test into three parts is that it dismisses the usual arguments raised against blind tests:
The listeners are deaf: dismissed by part 1
The system is not good enough for the difference to be heard: dismissed by part 1
ABX listening doesn't allow this kind of difference to be spotted: dismissed by part 2
A decision process cannot account for the unconscious influences at work: dismissed by part 2
If one listener decides to do an ABX test in 8 trials, here is an example of part 3:
Trial 1 : X is A : right
Trial 2 : X is A : right
Trial 3 : X is A : right
Trial 4 : I'm not completely sure
Trial 5 : I'm not completely sure
Pause
Trial 6 : X is A : wrong
Training
Trial 7 : X is B : right
Trial 8 : X is A : right
Trial 9 : I'm not completely sure
Pause
Trial 10 : I'm not completely sure
Trial 11 : X is A : wrong
This is the second error, so the test has failed. Otherwise, it would have gone on until one more "X is A" or "X is B" answer had been given, which would have made a total of 8 answers of this kind.
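The bookkeeping of this example can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and the pass criterion (at most one error in 8 recorded answers, that is 7/8) is inferred from the fact that the second error ends the test.

```python
from math import comb

def binomial_tail(correct, trials, p=0.5):
    """Probability of at least `correct` right answers in `trials` pure guesses."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def score_session(answers, trials=8, max_errors=1):
    """Score part 3. True = right, False = wrong, None = "I'm not completely sure"."""
    right = wrong = 0
    for answer in answers:
        if answer is None:
            continue  # null answers are not recorded
        if answer:
            right += 1
        else:
            wrong += 1
        if wrong > max_errors:
            return "failed", right, wrong   # too many errors, stop here
        if right + wrong == trials:
            return "passed", right, wrong   # enough recorded answers
    return "incomplete", right, wrong

# Trials 1 to 11 of the example above (pauses and training are not answers).
session = [True, True, True, None, None, False, True, True, None, None, False]
print(score_session(session))   # ('failed', 5, 2)
print(binomial_tail(7, 8))      # 0.03515625
```

With this criterion, a listener who is only guessing reaches 7/8 or better about 3.5% of the time.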
If more than one listener is taking part, the required number of right answers must be worked out mathematically. We must compute the probability for one listener to fail his own ABX test by chance, then raise it to the power N, where N is the number of listeners. This gives the probability that everyone fails. The complementary event is that at least one listener has succeeded.
This is our final statistical result : the probability that among all the listeners, one of them at least gets by chance the same or more than the highest individual score recorded.
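As a rough sketch of this computation in Python, assuming that every listener gives the same number of recorded answers and that their guesses are independent (the function names are hypothetical):

```python
from math import comb

def p_single(best_score, trials):
    """Chance that one guessing listener gets at least best_score out of trials."""
    return sum(comb(trials, k) for k in range(best_score, trials + 1)) / 2**trials

def p_group(best_score, trials, n_listeners):
    """Chance that at least one of n guessing listeners reaches best_score."""
    everyone_fails = (1 - p_single(best_score, trials)) ** n_listeners
    return 1 - everyone_fails

# Highest individual score 7/8, for groups of different sizes.
for n in (1, 5, 10):
    print(n, round(p_group(7, 8, n), 3))
```

For instance, a best score of 7/8 among ten listeners would be reached by pure guessing roughly 30% of the time, which is why a larger group needs more trials per listener.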
All the listeners can take the test together, if they want. Uncontrolled influences between them can only decrease the probability of this event, and thus increase the statistical significance of the result.
Advantages of this kind of statistical evaluation over a classical one :
-Listeners who cannot hear the difference don't prevent listeners who can hear it from demonstrating that the difference is audible
-Listeners can communicate and help each other during the test. They don't need to take it one by one.
Drawback :
-More trials are needed in order to reach an acceptable level of confidence.
It is very probable, in the case of a difference that cannot be heard at all, that the test doesn't get past part 2. The listeners must then explain why the differences heard in part 1 have vanished in part 2, and possibly go back to part 1 in order to find a better way to pass part 2. It's up to them. They are the ones hearing a difference, so they are the ones who can tell how the test must be done.
This protocol was discussed here, in French, during the setup of the interconnect blind listening test:
http://www.homecinema-fr.com/forum/viewtop...r=asc&start=195