Today I wanted to show a friend the forum where I spend so much time online, and to explain the scientific methodology on which its discussions are based. I went to the FAQ to read 2Bdecided's excellent post about objectivism. After reading it, I realized that it is very theoretical and doesn't explain in practice how problems should be submitted and analyzed.
So I went to rule 8, telling myself that I should link the terms of service before that post in the FAQ. But after reading rule 8 and its comment, I could clearly see that my friend was still light years away from understanding the procedures required for submitting problems on HydrogenAudio.
What surprised me most was that nowhere in the terms of service, nor in 2Bdecided's comments, is the requirement for blind testing stated!
Here are the factual statements about testing procedures :
Rule 8 :
"expected to be supported by the author"
"supply supportive information "
"to provide either a test sample, ABX testing results, or ideally both"
2BDecided's message :
"subjective opinions should be backed up by rigorous tests"
"making claims without challenging them"
"We don't let people claim that X is better than Y, when it isn't. We don't let people claim that Z has magical properties. We do testing, and we try to move forward"
"following the rules of the forum"
"Whether we accept unsubstantiated claims is not up for debate - we do not"
"the importance of evidence, proof, and blind testing against feelings and opinions,"
The only place where the word "blind" appears is at the end of 2BDecided's post, in a side note: "If you have any good objectivist/subjectivist links, links showing the importance of evidence, proof, and blind testing against feelings and opinions, or the opposite side of the argument, feel free to post them."
Rule 8 also mentions "ABX testing results", but without any link or explanation about what it is.
Therefore I think that rule 8 of the terms of service should be rewritten, with more emphasis on blind testing and statistical analysis, and maybe a sticky could be written to explain the procedure briefly.
Here are some quotes we can start from, adding an appropriate introduction and conclusion.
Canar :
ABX testing is one of very few, if not the only way to receive statistically valid data about whether or not there is a difference between two test setups. ABX testing can prove to a certain level of confidence that one can distinguish between two setups. Again, it can also notify you that you cannot perceive a difference between the two. It is not proof of quality, it is proof of equal or disparate quality.
FF123 :
The PC ABX program works as follows: The known reference is always available as stimulus "B". The known object (test sample) is also always available as stimulus "A". Either the reference or the object is randomly assigned to "X", depending on the trial. The subject decides whether "X" corresponds to the reference "B" or to the test sample "A".
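To make the procedure FF123 describes concrete, here is a minimal Python sketch of an ABX session. It is only an illustration, not the code of the PC ABX program: the names `run_abx_session`, `perfect_listener` and `make_guesser` are made up for this post, and a "listener" is reduced to a function that receives the hidden identity of X and returns an answer.

```python
import random

def run_abx_session(n_trials, listener, rng):
    """Run n_trials ABX trials and return how many answers were correct."""
    correct = 0
    for _ in range(n_trials):
        # On each trial, X is secretly either the test sample "A"
        # or the reference "B", chosen at random.
        x = rng.choice(["A", "B"])
        # The listener says which of the two known stimuli X corresponds to.
        answer = listener(x)
        if answer == x:
            correct += 1
    return correct

def perfect_listener(x):
    """Hypothetical listener who always hears the difference."""
    return x

def make_guesser(rng):
    """Hypothetical listener who hears nothing and answers at random."""
    return lambda x: rng.choice(["A", "B"])

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
print("perfect listener:", run_abx_session(16, perfect_listener, rng), "/ 16")
print("pure guesser:    ", run_abx_session(16, make_guesser(rng), rng), "/ 16")
```

The point of the random assignment is that the listener's score can only rise above chance if something is actually audible.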
Pio2001 :
What is required is that the test is double-blind and has a statistical confidence over 95 %.
EricS :
The percentage means that there is only a 5% chance that someone who guesses wildly could have gotten the same result. Then we can be pretty confident that you actually heard what you say you did and that you can actually repeat it in the future if needed.
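What EricS describes can be checked by brute force. The sketch below, in Python, simulates a large number of listeners who only guess and counts how many of them reach a given score by luck. The function name `simulate_guessers` and the choice of a 7 out of 8 score are assumptions made for the illustration.

```python
import random

def simulate_guessers(n_trials, min_correct, n_listeners, seed=0):
    """Fraction of pure guessers scoring at least min_correct out of n_trials."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_listeners):
        # Each trial is a coin flip: a guesser is right half of the time.
        score = sum(rng.random() < 0.5 for _ in range(n_trials))
        if score >= min_correct:
            lucky += 1
    return lucky / n_listeners

fraction = simulate_guessers(n_trials=8, min_correct=7, n_listeners=100_000)
print(f"guessers reaching 7/8 or better: {fraction:.3f}")
```

The exact value for 7 out of 8 is 9/256, about 3.5%, so the simulated fraction should land close to that: fewer than 5 guessers in 100 would get such a score by chance.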
KikeG :
To achieve 95% confidence (a 5% probability of having guessed), one must get at least one of the following scores, provided that the number of trials is fixed before the test begins:
5/5
6/6
7/7
7/8
8/9
9/10
9/11
10/12
10/13
11/14
12/15
12/16
If the listener performs the test several times, then strictly speaking all the trials must be summed and the corresponding confidence level calculated. But then a 95% confidence level may not be enough; 99% is desirable.
Personally, I always go for 99% confidence (1% of guessing).
There's an Excel table with confidence levels (or rather p-values, i.e. the probability of having guessed) for any number of trials up to 100 at http://www.kikeg.arrakis.es/winabx/bino_dist.zip
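For those who would rather compute the numbers than look them up, here is a small Python sketch of the calculation, assuming the usual one-sided binomial test with a 50% chance of guessing each trial correctly. The function names are made up for this post, and the two sessions in the example are invented scores, not real results. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb

def guess_probability(correct, trials):
    """Chance that pure guessing gives at least `correct` right answers in `trials`."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

def pooled_guess_probability(sessions):
    """Sum all trials of several sessions, then compute one overall probability."""
    total_correct = sum(correct for correct, _ in sessions)
    total_trials = sum(trials for _, trials in sessions)
    return guess_probability(total_correct, total_trials)

# A few single-session scores.
for correct, trials in [(5, 5), (7, 8), (9, 10)]:
    p = guess_probability(correct, trials)
    print(f"{correct}/{trials}: p = {p:.4f}")

# Two invented sessions by the same listener: 7/8 and 6/8, pooled as 13/16.
p = pooled_guess_probability([(7, 8), (6, 8)])
print(f"pooled: p = {p:.4f}, meets 95%: {p <= 0.05}, meets 99%: {p <= 0.01}")
```

In this example the pooled score of 13/16 passes the 95% threshold but falls just short of 99%, which shows how much stricter the 99% requirement is.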