ABX trials with unequal probabilities
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
gaboo
Here is a post on rec.audio.pro by someone who apparently has a Ph.D. in statistics:

QUOTE("JC")
If Arny really believes that the successful-detection probability changes
from trial to trial, then he has a difficult task explaining why he thinks
it is appropriate to use a binomial distribution when analyzing the
results of an ABX test.

Recall that the binomial(n,p) distribution is the distribution of the
number of successes in a fixed number n of independent and
identically-distributed Bernoulli (binary) trials with constant success
probability p on each trial.

Actually, despite Arny's blunder, a properly-run ABX experiment does
satisfy the conditions to justify using a binomial distribution.


First, observe that for an ABX trial X_i the probability of success is p_i >= 0.5. It follows that for X = sum{X_i}, E[X] >= 0.5*n, with equality exactly when all p_i = 0.5. So we can set the null hypothesis H0: E[X] = 0.5*n. Under H0 every p_i equals 0.5, so X is exactly binomial(n, 0.5). For a given test of n trials with s "successes", we calculate Pr{ X >= s | H0 }. If this probability is smaller than a chosen value alpha, we reject H0 and must accept the alternative hypothesis E[X] > 0.5*n, so at least one p_i > 0.5. In other words, the ABX test "proves" that the subject was not guessing on at least one trial. This proof saved Arny's ass ;)
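
As a sketch of the calculation just described (the function name `abx_p_value` is ours, not taken from any ABX tool), the exact one-sided p-value under H0 is a plain binomial tail with p = 0.5, which integer arithmetic handles exactly:

```python
from math import comb


def abx_p_value(n: int, s: int) -> float:
    """Pr{X >= s | H0} for n ABX trials with s correct answers.

    Under H0 every p_i = 0.5, so X ~ binomial(n, 0.5) and the tail is
    (number of outcome sequences with at least s successes) / 2**n.
    """
    return sum(comb(n, k) for k in range(s, n + 1)) / 2**n


p = abx_p_value(16, 12)
print(f"12/16 correct: p = {p:.4f}")  # 2517/65536
print("reject H0 at alpha = 0.05:", p < 0.05)
```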

The more interesting question is the power of ABX tests with unequal probabilities. Let the critical value c be the largest score that does not lead to rejection, i.e. the maximum c such that Pr{ X >= c | H0 } >= alpha, so that we reject H0 when X > c. We then want beta = Pr{ X <= c | Ha }, which gives the power as 1-beta. But Ha only tells us that at least one p_i > 0.5, so how can we use the binomial for Ha? It turns out that even if the probabilities are unequal, the binomial with the same mean gives an upper bound on the lower-tail probability Pr{ X <= c } of the sum, thanks to one of Hoeffding's theorems. What does the theorem say? Amongst other things:

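Given a set of n independent Poisson trials X_i (also called heterogeneous Bernoulli trials), each taking the value one with probability p_i, and X = sum{X_i} with expectation E[X] = np (so p is the average of the p_i), the probability Pr{X <= c} <= binomial(n,p,c) for any integer 0 <= c <= np-1, where binomial(n,p,c) is the binomial CDF, i.e. Pr{Y <= c} for Y ~ binomial(n,p).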

I'm not going to prove Hoeffding's theorem here. Interested readers may want to read Hoeffding's seminal paper [1], where he gives a stronger result with respect to a class of functions of the X_i (not just the sum). For an easier-to-follow proof of the simple statement above, see the paper by H. A. David [2].

To state the theorem in simple terms: for a fixed mean p, the case where all p_i = p is the limiting (worst) case; any other set of p_i with the same mean makes Pr{X <= c} even smaller.
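
A quick numerical check of the statement (not a proof), assuming a hypothetical listener whose 16 per-trial probabilities are four at 0.5, four at 0.75 and eight at 0.875 (mean p = 0.75). The exact distribution of the sum of unequal trials is built by convolving the trials one at a time and then compared with binomial(16, 0.75) for every c <= np - 1:

```python
import math
from math import comb


def poisson_binomial_pmf(ps):
    """Exact pmf of X = sum of independent Bernoulli(p_i), built one trial at a time."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, w in enumerate(pmf):
            nxt[k] += w * (1 - p)  # this trial fails: count stays at k
            nxt[k + 1] += w * p    # this trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def binom_cdf(n, p, c):
    """Pr{Y <= c} for Y ~ binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def compare_with_binomial(ps):
    """Return (c, Pr{X <= c} for unequal p_i, binomial(n, p, c)) for 0 <= c <= np - 1."""
    n = len(ps)
    p = sum(ps) / n
    pmf = poisson_binomial_pmf(ps)
    c_max = math.floor(n * p - 1 + 1e-9)  # small tolerance guards against rounding in sum(ps)
    return [(c, sum(pmf[: c + 1]), binom_cdf(n, p, c)) for c in range(c_max + 1)]


# Hypothetical listener: unequal probabilities with mean 0.75
ps = [0.5] * 4 + [0.75] * 4 + [0.875] * 8
for c, unequal, equal in compare_with_binomial(ps):
    print(f"c={c:2d}  unequal={unequal:.5f}  binomial={equal:.5f}")
```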

Observe that an ABX trial is in fact a Poisson trial. We do not even need the assumption p_i >= 0.5. So the binomial model is a correct model for Ha, in the sense that it is the worst-case model. Power calculated under this model is a conservative estimate (lower bound) of the real power under unequal probabilities, as long as the critical value satisfies c <= np-1.
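
Putting the pieces together, here is a minimal sketch of the conservative power calculation: find the critical value c exactly under H0, check the condition c <= np - 1, and report 1 - binomial(n,p,c) as a lower bound on power for any listener whose average success probability is p. The function names, and the choice to return None when the condition fails, are ours:

```python
from math import comb


def critical_value(n, alpha):
    """Largest c with Pr{X >= c | H0} >= alpha; H0 is rejected when X > c."""
    total = 2**n
    tail = 0  # number of outcome sequences with X >= k, accumulated from the top
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / total >= alpha:
            return k
    return 0


def binom_cdf(n, p, c):
    """Pr{Y <= c} for Y ~ binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def power_lower_bound(n, p, alpha=0.05):
    """Conservative power 1 - binomial(n, p, c); only valid when c <= np - 1."""
    c = critical_value(n, alpha)
    if c > n * p - 1:
        return None  # the simple Hoeffding bound does not apply here
    return 1 - binom_cdf(n, p, c)


print(critical_value(16, 0.05))    # 11: reject H0 with 12 or more correct
print(power_lower_bound(16, 0.8))  # about 0.798
print(power_lower_bound(16, 0.7))  # None, since c = 11 > 16*0.7 - 1
```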

As a corollary, the two-kinds-of-listeners model proposed by Burstein is a special case of the average-listener-probability model. In other words, when you use the binomial to calculate power with the group probability of hearing pg = (1-pd)*p0 + pd*p1 (where pd is the proportion of listeners who can hear the difference, p1 their probability of a correct answer, and p0 that of the others), you are in fact only stating that the average listener probability is pg, regardless of how the individual probabilities are distributed in the population. This is a much stronger statement about the power of such a test. I'm not sure whether Burstein realizes this in his paper(s); I don't have access to them.
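
To illustrate the corollary, assume a hypothetical 20-trial test in which 12 trials come from listeners who always hear the difference (p1 = 1.0) and 8 from listeners who guess (p0 = 0.5), so pd = 0.6 and pg = 0.4*0.5 + 0.6*1.0 = 0.8. The critical value at alpha = 0.05 is c = 14, which satisfies c <= n*pg - 1 = 15. The sketch compares the exact power of this particular mixture with the binomial power computed from pg alone; the split and the numbers are ours, chosen only for illustration:

```python
from math import comb


def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def critical_value(n, alpha):
    """Largest c with Pr{X >= c | H0} >= alpha, X ~ binomial(n, 0.5)."""
    total, tail = 2**n, 0
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / total >= alpha:
            return k
    return 0


def mixture_power(n_disc, p1, n_guess, p0, alpha=0.05):
    """Exact power when n_disc trials succeed with p1 and n_guess trials with p0."""
    c = critical_value(n_disc + n_guess, alpha)
    power = 0.0
    for a in range(n_disc + 1):       # successes among the discriminating trials
        for b in range(n_guess + 1):  # successes among the guessing trials
            if a + b > c:             # total score rejects H0
                power += binom_pmf(n_disc, p1, a) * binom_pmf(n_guess, p0, b)
    return power


def binomial_power(n, pg, alpha=0.05):
    """Power computed as if every trial had success probability pg."""
    c = critical_value(n, alpha)
    return sum(binom_pmf(n, pg, k) for k in range(c + 1, n + 1))


n_disc, n_guess, p1, p0 = 12, 8, 1.0, 0.5
pd = n_disc / (n_disc + n_guess)
pg = (1 - pd) * p0 + pd * p1
print(pg, mixture_power(n_disc, p1, n_guess, p0), binomial_power(20, pg))
```

The binomial figure computed from pg alone comes out lower than the exact power of this mixture, which is the conservative direction described above.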

As a further note, Meilgaard et al. also fail to mention this. Their entire book reads like an attempt at a "field manual for the dummy sensory analyst", so the lack of insight is not surprising.

References:
[1] Hoeffding, W., "On the distribution of the number of successes in independent trials," The Annals of Mathematical Statistics, Vol. 27, No. 3 (Sep. 1956), pp. 713-721.
[2] David, H. A., "A conservative property of binomial tests" (Notes), The Annals of Mathematical Statistics, Vol. 31, No. 4 (Dec. 1960), pp. 1205-1207.
ff123
QUOTE(gaboo @ Nov 24 2004, 04:01 AM)
Here is a post on rec.audio.pro by someone who apparently has a Ph.D. in statistics:

The things you find when combing the Usenet archives, eh?

This reminds me that I lashed out at Arny for not fixing his PC-ABX program. He doesn't even use the binomial distribution to calculate his results; he uses a chi-square approximation (he says it's not important). And when asked about his lack of attention to type II errors, both on his web page and in the PC-ABX program, he says that statistical hair-splitters should worry about improving the sensitivity of their systems instead.
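
To see how much an approximation can matter at typical ABX sizes, here is a sketch comparing the exact one-sided binomial p-value with a Pearson chi-square test against a 50/50 split (1 degree of freedom). We don't know which chi-square variant PC-ABX actually computes (one- or two-sided, with or without continuity correction), so treat `chi_square_p` as a hypothetical stand-in:

```python
from math import comb, erfc, sqrt


def exact_p(n, s):
    """Exact one-sided binomial p-value Pr{X >= s}, X ~ binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2**n


def chi_square_p(n, s):
    """Pearson chi-square p-value for s correct out of n vs. an expected n/2.

    With 1 degree of freedom the chi-square survival function is
    erfc(sqrt(x / 2)); this test is inherently two-sided.
    """
    e = n / 2
    stat = (s - e) ** 2 / e + ((n - s) - e) ** 2 / e
    return erfc(sqrt(stat / 2))


for n, s in [(10, 9), (16, 12), (20, 15)]:
    print(f"{s}/{n}: exact one-sided {exact_p(n, s):.4f}, chi-square {chi_square_p(n, s):.4f}")
```

For 12 of 16, the exact one-sided value is 2517/65536 (about 0.038), while the chi-square statistic is 4 and its p-value is erfc(sqrt(2)) (about 0.046).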

I didn't like these answers, so I changed ABC/HR, but I've been just as dilatory as Arny in distributing the program properly. Anyway, I've put a link to the 1.1 beta2 version of ABC/HR on my main page:

http://ff123.net/abchr/abchr.html

ff123
gaboo
The problem of calculating power under unequal probabilities is a bit more complicated if np - 1 < c < np, which the theorem above does not cover. For that case Hoeffding gives an upper bound on beta, which we can use to get a lower bound on power. However, it does not have a closed form. I posted a graph in the Uploads area.
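
The R code mentioned here was not posted in the thread, so as a rough stand-in, this is a brute-force Python sketch of the worst-case beta for the case np - 1 < c < np. It rests on an assumption worth checking against [1]: that the maximum of Pr{X <= c} over all independent trials with mean p is attained when the p_i take at most three distinct values, only one of which lies strictly between 0 and 1. Under that assumption it is enough to enumerate r trials at 0, s trials at 1, and the remaining m trials at a common q chosen to keep the mean at p. Like the argument above, it drops the p_i >= 0.5 restriction, which can only make the result more pessimistic. One minus the returned value is then a lower bound on power:

```python
from math import comb


def binom_cdf(m, q, k):
    """Pr{Y <= k} for Y ~ binomial(m, q); 0 when k < 0."""
    if k < 0:
        return 0.0
    return sum(comb(m, j) * q**j * (1 - q) ** (m - j) for j in range(min(k, m) + 1))


def worst_case_beta(n, p, c):
    """Largest Pr{X <= c} over n independent trials whose success probabilities
    average p, searching only three-valued configurations (r trials at 0,
    s trials at 1, m trials at a common q). ASSUMES the three-value reduction."""
    target = n * p                   # required expected number of successes
    worst = 0.0
    for r in range(n + 1):           # trials with p_i = 0
        for s in range(n - r + 1):   # trials with p_i = 1
            m = n - r - s            # trials sharing the middle value q
            if m == 0:
                if abs(s - target) > 1e-9:
                    continue         # mean constraint cannot be met
                cand = 1.0 if s <= c else 0.0
            else:
                q = (target - s) / m
                if q < -1e-12 or q > 1 + 1e-12:
                    continue         # q outside [0, 1]: infeasible
                q = min(max(q, 0.0), 1.0)
                # X = s + binomial(m, q), so X <= c  <=>  binomial(m, q) <= c - s
                cand = binom_cdf(m, q, c - s)
            worst = max(worst, cand)
    return worst


n, c = 16, 11              # c = 11 is the alpha = 0.05 critical value for n = 16
for p in (0.8, 0.72):      # 0.72 puts c strictly between np - 1 and np
    print(f"p={p}: binomial beta {binom_cdf(n, p, c):.4f}, "
          f"worst-case beta {worst_case_beta(n, p, c):.4f}")
```

When c <= np - 1 the search should simply return the binomial value, as the theorem predicts; in the gap between np - 1 and np it can exceed it. If the three-value assumption turns out to be wrong, the search is only a heuristic.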

If there's interest in this I'll post the code; it's in R.

Link to graph
krabapple
Arny seems a good guy, but he's been battered by, and been battering back at, hateful malignant creatures like Middius for so long that he shoots from the hip. It's an occupational hazard for long-time denizens of unmoderated Usenet groups.
audioflex
QUOTE(gaboo @ Nov 24 2004, 05:01 AM)
Here is a post on rec.audio.pro by someone who apparently has a Ph.D. in statistics:
*


I hate math :)
Woodinville
QUOTE(krabapple @ Nov 30 2004, 01:45 PM)
Arny seems a good guy, but he's been battered by, and been battering back at, hateful malignant creatures like Middius for so long that he shoots from the hip.  It's an occupational hazard for long-time denizens of unmoderated Usenet groups.
*


I suspect it's why many informed people have left.

Regarding the original statement: if the randomization of X in an ABX test is competent (and I must note that it's not hard to make it so), then a purely random answer has a 0.5 chance of being correct, and a straight binomial is entirely appropriate.
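
A small Monte Carlo sketch of this point, assuming X is drawn as A or B with a fair coin on every trial. Whatever strategy a listener who cannot hear the difference uses (answering at random, or always answering "A"), each answer is right with probability 0.5, so the rate of 12-or-more-out-of-16 results should land near the exact binomial value 2517/65536 (about 0.038). The two strategy functions are hypothetical examples:

```python
import random
from math import comb


def false_positive_rate(strategy, n=16, s=12, runs=10000, seed=1):
    """Fraction of simulated n-trial ABX runs with at least s correct answers,
    for a listener who cannot hear the difference and answers via strategy(rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        correct = 0
        for _ in range(n):
            x = rng.choice("AB")             # competent randomization of X
            correct += strategy(rng) == x    # True counts as 1
        hits += correct >= s
    return hits / runs


def random_guess(rng):
    return rng.choice("AB")


def always_a(rng):
    return "A"


exact = sum(comb(16, k) for k in range(12, 17)) / 2**16
print(f"exact           {exact:.4f}")
print(f"random guessing {false_positive_rate(random_guess):.4f}")
print(f"always 'A'      {false_positive_rate(always_a):.4f}")
```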

The only question I see is how many trials are required to establish a confidence interval for a "not quite chance" listener response.
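
One way to make this concrete, reading the question as a power calculation rather than a confidence-interval width (our interpretation): for a hypothetical listener who is right with probability p on every trial, search for the smallest number of trials at which an exact alpha = 0.05 ABX test reaches a target power. The 0.8 target and the p values are illustrative choices, not values from the thread:

```python
from math import comb


def critical_score(n, alpha):
    """Smallest score s with Pr{X >= s | H0} < alpha, X ~ binomial(n, 0.5)."""
    total, tail = 2**n, 0
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail / total >= alpha:
            return k + 1
    return 0


def power(n, p, alpha=0.05):
    """Pr{X >= critical score} when every trial succeeds with probability p."""
    s = critical_score(n, alpha)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s, n + 1))


def trials_needed(p, alpha=0.05, target=0.8, n_max=500):
    """Smallest n (up to n_max) whose power reaches target, else None."""
    for n in range(1, n_max + 1):
        if power(n, p, alpha) >= target:
            return n
    return None


for p in (0.6, 0.7, 0.8):
    print(p, trials_needed(p))
```

Because the binomial is discrete, power is not monotone in n (it moves in a sawtooth), so a slightly larger n can have lower power than the one returned.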

While I do not personally expect that to be a problem for most listeners, it is the only serious question that can be asked, and it's hardly a fatal question, only an informative one.