QUOTE("JC")
If Arny really believes that the successful-detection probability changes
from trial to trial, then he has a difficult task explaining why he thinks
it is appropriate to use a binomial distribution when analyzing the
results of an ABX test.
Recall that the binomial(n,p) distribution is the distribution of the
number of successes in a fixed number n of independent and
identically-distributed Bernoulli (binary) trials with constant success
probability p on each trial.
Actually, despite Arny's blunder, a properly-run ABX experiment does
satisfy the conditions to justify using a binomial distribution.
First, observe that for an ABX trial X_i the probability of success is p_i >= 0.5. It follows that for X = sum{X_i}, E[X] >= 0.5*n. Note that the boundary case is attained only when all p_i = 0.5. So we can set the null hypothesis H0: E[X] = 0.5*n. For a given test of n trials with outcome s "successes", we calculate Pr{ X >= s | H0 }, which is a binomial(n, 0.5) tail because every p_i = 0.5 under H0. If this probability is smaller than a chosen value alpha, we reject H0. It follows that we must accept the alternative hypothesis E[X] > 0.5*n, so at least one p_i > 0.5. The ABX test "proves" that the subject was not guessing at least once. This proof saved Arny's ass.
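For anyone who wants to see the calculation of Pr{ X >= s | H0 } spelled out, here is a minimal sketch in Python. The function names and the example figures (16 trials, 12 correct, alpha = 0.05) are my own illustrative choices, not part of the argument above.

```python
from math import comb


def binom_upper_tail(n, p, s):
    """Return Pr{X >= s} for X ~ binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


def abx_p_value(n, s):
    """Return Pr{X >= s | H0}, where H0 means every p_i = 0.5."""
    return binom_upper_tail(n, 0.5, s)


# Hypothetical run: 16 trials, 12 correct identifications, alpha = 0.05.
n, s, alpha = 16, 12, 0.05
p_value = abx_p_value(n, s)
print(f"Pr{{X >= {s} | H0}} = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

With these figures the tail probability is 2517/65536, roughly 0.038, so H0 would be rejected at alpha = 0.05.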
The more interesting question is related to the power of ABX tests with unequal probabilities. Given a critical value c, determined by finding the minimum c such that Pr{ X >= c | H0 } < alpha, we want to find beta = Pr{ X < c | Ha }, which yields the power as 1-beta. Ha only tells us that at least one p_i > 0.5. So, how come we can use the binomial for Ha? It turns out that even if the probabilities are unequal, the binomial gives a uniform upper bound on the lower-tail probability of the sum of the X_i, thanks to one of Hoeffding's theorems. What does the theorem say? Amongst other things:
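Given a set of n independent Poisson trials X_i [called heterogeneous Bernoulli trials by some], each taking value one with probability p_i, and X=sum{X_i} with expectation E[X]=np (so p is the average of the p_i), the probability Pr{X<=c} <= binomial(n,p,c) for any 0<=c<=np-1, where binomial(n,p,c) denotes the binomial(n,p) cumulative probability at c.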
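The inequality can be checked numerically for any particular set of p_i. The sketch below is only an illustration, not a proof: it builds the exact distribution of X by adding one trial at a time and compares its lower tail with the binomial one for every integer c in the stated range. The function names and the example probabilities are hypothetical.

```python
from math import comb, floor


def poisson_binomial_pmf(ps):
    """PMF of X = sum of independent Bernoulli(p_i), built one trial at a time."""
    pmf = [1.0]  # zero trials: X = 0 with certainty
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf


def binom_cdf(n, p, c):
    """Return Pr{X <= c} for X ~ binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def hoeffding_bound_holds(ps, tol=1e-12):
    """Check Pr{X <= c} <= binomial(n, p, c) for every integer 0 <= c <= np - 1."""
    n = len(ps)
    p_bar = sum(ps) / n                 # average success probability
    pmf = poisson_binomial_pmf(ps)
    c_max = floor(n * p_bar - 1)        # upper end of the theorem's range
    return all(
        sum(pmf[: c + 1]) <= binom_cdf(n, p_bar, c) + tol
        for c in range(0, c_max + 1)
    )


# Hypothetical subject: guessing on 8 trials, hearing well on the other 8.
ps = [0.5] * 8 + [0.9] * 8
print(hoeffding_bound_holds(ps))
```

The small tolerance only absorbs floating-point rounding in the equal-probability case, where both sides coincide.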
I'm not going to prove Hoeffding's theorem here. Interested parties may want to read Hoeffding's seminal paper [1], where he gives a stronger result with respect to a class of functions of the X_i (not just the sum). For an easier-to-understand proof of the simple statement above, see the paper by H.A. David [2].
To state the theorem in simple terms, the case where all p_i=p is the limiting case: every other set of p_i with the same average gives an even smaller probability Pr{X<=c}.
Observe that an ABX trial is in fact a Poisson trial. We do not even need to make use of p_i >= 0.5. So, the binomial model is a correct model for Ha, in the sense that it is the worst-case model. Power calculated under this model is a conservative estimate (lower bound) for the real power under unequal probabilities, provided the cutoff lies in the range covered by the theorem.
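A rough sketch of the conservative power calculation follows. It assumes the convention that H0 is rejected when X >= c, so that beta = Pr{ X <= c-1 | Ha } and the theorem applies when c-1 <= np-1, i.e. c <= np. The function names and the example numbers are mine.

```python
from math import comb


def binom_upper_tail(n, p, s):
    """Return Pr{X >= s} for X ~ binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


def critical_value(n, alpha):
    """Smallest c with Pr{X >= c | H0} < alpha, where H0 is binomial(n, 0.5)."""
    for c in range(n + 1):
        if binom_upper_tail(n, 0.5, c) < alpha:
            return c
    return n + 1  # no rejection region exists at this alpha


def conservative_power(n, alpha, p_bar):
    """Binomial power at average probability p_bar, plus a range check.

    Returns (c, power, in_range). The power is a guaranteed lower bound
    for unequal p_i only when in_range is True (c <= n * p_bar).
    """
    c = critical_value(n, alpha)
    power = binom_upper_tail(n, p_bar, c)
    in_range = c <= n * p_bar
    return c, power, in_range


# Hypothetical design: 16 trials, alpha = 0.05, average p_i of 0.8.
c, power, in_range = conservative_power(16, 0.05, 0.8)
print(c, round(power, 3), in_range)
```

The range check matters: with 16 trials and an average probability of 0.7 the critical value exceeds np, and the theorem as stated no longer guarantees that the binomial figure is a lower bound.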
As a corollary, the two-kinds-of-listeners model proposed by Burstein is a particular case of the average-listener-probability model. In other words, when you apply the binomial to calculate the power with the group probability of hearing pg = (1-pd)*p0 + pd*p1, you are in fact stating that the average listener probability is pg, regardless of how the individual probabilities are distributed in the population. This is a much stronger statement about the power of such a test. I'm not sure whether Burstein realizes this in his paper(s); I don't have access to them.
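To make the point concrete, here is a small sketch showing two quite different panels that share the same average probability. I am assuming pd is the proportion of listeners who can hear the difference, p0 the success probability of the others, and p1 that of the discriminators; the panel sizes and values are invented for illustration.

```python
def group_probability(pd, p0, p1):
    """Group probability pg = (1 - pd) * p0 + pd * p1."""
    return (1 - pd) * p0 + pd * p1


def mean(ps):
    """Average success probability over a list of per-trial probabilities."""
    return sum(ps) / len(ps)


# Assumed values: a quarter of the listeners are discriminators.
pd, p0, p1 = 0.25, 0.5, 0.9
pg = group_probability(pd, p0, p1)

# Two hypothetical 16-listener panels, one trial each.
two_kinds = [p0] * 12 + [p1] * 4                            # Burstein-style split
spread = [0.5] * 4 + [0.55] * 4 + [0.65] * 4 + [0.7] * 4    # gradual abilities

print(round(pg, 3), round(mean(two_kinds), 3), round(mean(spread), 3))
```

Both panels have the same average, so the binomial calculation with pg treats them identically, which is exactly the sense in which the two-kinds model is just one instance of the average-probability model.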
As a further note, Meilgaard et al. also fail to mention this. Their entire book seems to be an attempt at a "field manual for the dummy sensory analyst", so the lack of insight is not surprising.
References:
[1] Hoeffding, W., On the distribution of the number of successes in independent trials, The Annals of Mathematical Statistics, Vol. 27, No. 3. (Sep., 1956), pp. 713-721.
[2] David, H. A., A Conservative Property of Binomial Tests (in Notes), The Annals of Mathematical Statistics, Vol. 31, No. 4. (Dec., 1960), pp. 1205-1207.
