Blind test challenge

Reply #50 – 2003-03-04 21:24:14

I verified Continuum's formulas by simulation.

I hadn't thought to simulate a huge ABX test which just continues until it finally passes what is basically a "no difference" situation, but I have little doubt that's what will eventually happen.
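For anyone who wants to try this kind of simulation themselves, here is a rough sketch (not the code actually used in this thread). It assumes a one-sided binomial p-value, a 0.05 threshold checked after every trial, and a cap on the number of trials; the function names, the seed and the run count are made up for illustration.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def passes_at(correct, trials, alpha=0.05):
    """True if the one-sided binomial p-value for correct/trials is <= alpha."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail <= alpha * 2**trials


def guesser_passes(max_trials, rng, alpha=0.05):
    """Simulate one pure guesser who stops as soon as the test shows a pass."""
    correct = 0
    for trials in range(1, max_trials + 1):
        correct += rng.random() < 0.5  # each guess is right half the time
        if passes_at(correct, trials, alpha):
            return True
    return False


def estimate_false_pass_rate(max_trials=30, runs=20000, seed=1):
    """Fraction of simulated guessers who pass within max_trials trials."""
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    hits = sum(guesser_passes(max_trials, rng) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    print(estimate_false_pass_rate())
```

Raising `max_trials` shows how the false-pass rate keeps creeping upward the longer a guesser is allowed to continue.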

ff123

Reply #51 – 2003-03-05 10:28:42

Regarding the sequential ABX test problems mentioned: I would be more comfortable with something like always seeing the "classic" p-value, but having to reach a different value to pass the test, depending on the number of trials performed. Since I'm not very good at statistics: would it be possible to calculate the p-values (or something similar) that you need to reach, depending on the number of trials performed?

Reply #52 – 2003-03-05 15:22:23

Quote

Regarding the sequential ABX test problems mentioned: I would be more comfortable with something like always seeing the "classic" p-value, but having to reach a different value to pass the test, depending on the number of trials performed. Since I'm not very good at statistics: would it be possible to calculate the p-values (or something similar) that you need to reach, depending on the number of trials performed?

If the number of trials is fixed before the test starts and enforced by the tool, then the classic p-value calculation works as is.
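For reference, a minimal sketch of that classic calculation, assuming the usual one-sided binomial test where each trial is a 50/50 guess under "no audible difference". The function name is only for illustration and is not taken from any ABX tool.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials`
    by pure guessing (each trial a fair coin flip)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count all outcomes with `correct` or more right answers.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2**trials


if __name__ == "__main__":
    for correct, trials in [(5, 5), (7, 8), (12, 16)]:
        print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

This number is only valid as a pass criterion when the trial count was decided in advance and the listener could not stop early on a lucky streak.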

Otherwise, I think we decided it's best to use a "profile" because one rapidly loses the ability to get a significant result if allowed to see the results and to stop at any time.

ff123

Reply #53 – 2003-03-05 16:32:31

I know, but I would prefer an alternative method, like the one I suggested, that does not impose a fixed number of trials.

Reply #54 – 2003-03-05 16:46:08

Quote

I know, but I would prefer an alternative method, like the one I suggested, that does not impose a fixed number of trials.

Yes, there is a method. I think I posted a graph of it once here, but I must have deleted it earlier. Here it is again:

A description of the formulas to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the number of trials doesn't have to be fixed beforehand.

ff123

Reply #55 – 2003-03-05 17:05:53

Quote

Quote
E.g. You want to reach 95%-confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win-conditions:
5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So, the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

Are you sure? It's counterintuitive to me (as are many statistics, but anyway)

Well, I have no proof of the limit of 1 at hand, but it is suggested by empirical results.

Quote

It's P(5/5) + P(7/8 or 8/8 and not 5/5) + P(9/11 or 10/11 or 11/11 and not 5/5 or not 7/8 or not 8/8) + ...

Yes, but this makes no difference: after fewer than 5 of 5, a result of 8/8 is impossible.

Quote

The chances are interdependent, failure on the first influences success on the second one and so on.

Yes, but they are all disjoint. They just represent the list of winning conditions.

Quote

A silly test is to write a simulation that keeps guessing in ABX, if you are right it has to pass eventually.

Who knows how long it takes?

The probability of passing a 0.95 test within no more than 500 trials is ~30%. (My computer already had to work half a minute to calculate this, and it gets far worse for more trials.)
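A sum like the one quoted above can also be computed exactly by tracking, trial by trial, the probability of each score among guessers who have not passed yet. The following is only a sketch of that idea, not the calculation actually used for the figures in this thread. It assumes a one-sided binomial test with the pass check applied after every trial; the function names are invented for illustration.

```python
from fractions import Fraction
from math import comb


def min_correct_to_pass(n, alpha=Fraction(1, 20)):
    """Smallest k with P(at least k of n correct by guessing) <= alpha,
    or None if even n/n is not significant."""
    tail = 0          # number of outcomes with k or more correct answers
    k_pass = None
    for k in range(n, -1, -1):
        tail += comb(n, k)
        if tail > alpha * 2**n:
            break
        k_pass = k
    return k_pass


def pass_probability_by_guessing(max_trials, alpha=Fraction(1, 20)):
    """Probability that a pure guesser reaches a 'pass' at some point
    within max_trials, when allowed to stop after any trial."""
    alive = [1.0]     # alive[k]: prob. of k correct so far without a pass
    passed = 0.0
    for n in range(1, max_trials + 1):
        nxt = [0.0] * (n + 1)
        for k, p in enumerate(alive):
            nxt[k] += p / 2        # wrong guess
            nxt[k + 1] += p / 2    # right guess
        k_pass = min_correct_to_pass(n, alpha)
        if k_pass is not None:
            for k in range(k_pass, n + 1):
                passed += nxt[k]   # these guessers stop here and "pass"
                nxt[k] = 0.0
        alive = nxt
    return passed


if __name__ == "__main__":
    for limit in (5, 8, 11, 30):
        print(limit, pass_probability_by_guessing(limit))
```

The work grows roughly with the square of the trial limit, so a limit of a few hundred trials is quick on a current machine.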

Reply #56 – 2003-03-05 17:14:18

Quote

A description of the formulas to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the number of trials doesn't have to be fixed beforehand.

IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they could be only approximations.

But it definitely would be possible to construct a test with infinite length. The only problem is, that the test would be really hard at later stages. (The above sum must converge to e.g. 0.05, so each following term has to be smaller and smaller.)

Reply #57 – 2003-03-05 18:10:27

Quote

IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they could be only approximations.

But it definitely would be possible to construct a test with infinite length. The only problem is, that the test would be really hard at later stages. (The above sum must converge to e.g. 0.05, so each following term has to be smaller and smaller.)

I believe that method was a Bayesian approach, and the path we chose was frequentist. So we always needed to know the upper limit of trials involved.

Reply #58 – 2003-03-05 18:34:13

Quote

So we always needed to know the upper limit of trials involved.

Not really. We constructed a sum as above, so that the terms were reasonably large (not too small). For the 28-profile this was: 0.015625 + 0.013916016 + 0.007171631 + 0.00677526 + 0.005667329 = 0.049155235.

But there is no theoretical bound for this summation. We could search for a small enough value on-the-fly. We only have to ensure that the sum stays below 0.05.

E.g. we could construct something like p/2 + p/4 + p/8 + ... (p=0.05).
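To spell out why such a construction never exceeds the budget (just the arithmetic; numbering the stopping points by i is my own notation):

```latex
\sum_{i=1}^{m} \frac{p}{2^{i}} = p\left(1 - 2^{-m}\right) < p,
\qquad
\sum_{i=1}^{\infty} \frac{p}{2^{i}} = p = 0.05
```

So the i-th stopping point may contribute at most p/2^i to the chance of passing by guessing (0.025, 0.0125, 0.00625, ...), which is exactly why such a test gets harder and harder at later stages.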

Reply #59 – 2003-03-05 19:54:14

I think I was concerned with an upper bound because I wanted to keep the difficulty of passing about the same at each stopping point. This is in contrast to a test which gets progressively harder to pass at each of the stopping points.

Hmm. I wonder if I should just get off my butt and implement the thing. Currently none of the programs (including abchr) does it quite correctly when they display interim results.

ff123

Reply #60 – 2003-03-05 20:28:27

Quote

Yes, there is a method. I think I posted a graph of it once here, but I must have deleted it earlier. Here it is again:

I'll take a look at THE thread again, focusing on the posts related to this. I'd also prefer a method like this because of its simplicity for a tester who has no experience with ABX or statistics, and because it makes the test easier to administer, easier to implement from an interface design point of view, and easier for the tester to understand.

Reply #61 – 2003-03-07 23:56:53

Having read through the 'Statistics(...)' thread, I have to mention that I only use a few types of test:
7-8 tries (short), 14 or 16 tries (long). If I need to do a long test after a short one to be sure,
I double the number of tries rather than just repeating it. I'm not aiming for a certain probability.
I can state that I'm sure if I achieve > 6/7 or 7/8 and know the type of artifact.
(Otherwise I do an additional long test.)

Quote

Example: The probability of passing a "traditional" 0.95-test by guessing when one's allowed to stop at every point up to 30 is 0.129! (you can test this with my Excel-sheet from above)

That's the point... The only proper ABX has either 'hard' stop points or no stop points.
I'm using the latter.
