Blind test challenge, Find the SB64 recording !
ff123
post Mar 4 2003, 22:24
Post #51


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I verified Continuum's formulas by simulation.

I hadn't thought to simulate a huge ABX test that simply continues until it finally passes in what is basically a "no difference" situation, but I have little doubt that is what would eventually happen.

ff123
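The "keep guessing until it passes" idea can be tried with a short Monte Carlo run. This is only a sketch, not the simulation ff123 used: the function names, the 0.05 cutoff, and the rule "stop as soon as the classic one-sided p-value is at or below the cutoff" are assumptions made for illustration.

```python
import random
from math import comb


def pass_thresholds(max_trials, alpha=0.05):
    """thresholds[n] = fewest correct answers out of n trials whose classic
    one-sided p-value is <= alpha (n + 1 when no score can pass)."""
    thresholds = [1]  # n = 0: nothing to pass yet
    for n in range(1, max_trials + 1):
        need, tail = n + 1, 0
        for k in range(n, -1, -1):
            tail += comb(n, k)           # outcomes with at least k correct
            if tail / 2**n > alpha:
                break
            need = k
        thresholds.append(need)
    return thresholds


def guesser_passes(thresholds, rng):
    """One guessing listener who checks the score after every trial and
    stops at the first 'significant' result."""
    correct = 0
    for n in range(1, len(thresholds)):
        correct += rng.random() < 0.5    # coin-flip answer
        if correct >= thresholds[n]:
            return True
    return False


def pass_rate(max_trials, runs=2000, alpha=0.05, seed=0):
    """Fraction of simulated guessers who pass within max_trials."""
    rng = random.Random(seed)
    thresholds = pass_thresholds(max_trials, alpha)
    return sum(guesser_passes(thresholds, rng) for _ in range(runs)) / runs
```

Calling `pass_rate(n)` for growing `n` shows how the false-pass rate behaves as the test is allowed to run longer.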
KikeG
post Mar 5 2003, 11:28
Post #52


WinABX developer


Group: Developer
Posts: 1578
Joined: 1-October 01
Member No.: 137



Regarding the sequential ABX test problems mentioned, I would be more comfortable with something like always showing the "classic" p-value, but requiring a different value to be reached for the test to pass with confidence, depending on the number of trials performed. Since I'm not very good at statistics, would it be possible to calculate the p-values (or something similar) that need to be reached, depending on the number of trials performed?

This post has been edited by KikeG: Mar 5 2003, 11:33
ff123
post Mar 5 2003, 16:22
Post #53


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (KikeG @ Mar 5 2003 - 02:28 AM)
Regarding the sequential ABX test problems mentioned, I would be more comfortable with something like always showing the "classic" p-value, but requiring a different value to be reached for the test to pass with confidence, depending on the number of trials performed. Since I'm not very good at statistics, would it be possible to calculate the p-values (or something similar) that need to be reached, depending on the number of trials performed?

If the number of trials is fixed before the test starts and enforced by the tool, then the classic p-value calculation works as is.

Otherwise, I think we decided it's best to use a "profile" because one rapidly loses the ability to get a significant result if allowed to see the results and to stop at any time.

ff123
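For reference, the "classic" p-value for a fixed-length ABX test is the one-sided binomial tail: the chance of getting at least the observed number of correct answers by guessing. A minimal sketch, with a hypothetical function name:

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials`
    ABX trials when every answer is a 50/50 guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2**trials
```

For example, `abx_p_value(7, 8)` is 9/256, about 0.035, so 7/8 passes at the 0.05 level, but only if the 8 trials were decided on beforehand.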
KikeG
post Mar 5 2003, 17:32
Post #54


WinABX developer


Group: Developer
Posts: 1578
Joined: 1-October 01
Member No.: 137



I know, but I would prefer an alternative method like the one I suggested, which would not require a fixed number of trials.
ff123
post Mar 5 2003, 17:46
Post #55


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (KikeG @ Mar 5 2003 - 08:32 AM)
I know, but I would prefer an alternative method like the one I suggested, which would not require a fixed number of trials.

Yes, there is a method. I think I posted a graph of it once here, but I must have deleted it earlier. Here it is again:



A description of the formulas used to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the number of trials doesn't have to be fixed beforehand.

ff123
Continuum
post Mar 5 2003, 18:05
Post #56





Group: Members
Posts: 473
Joined: 7-June 02
Member No.: 2244



QUOTE (Garf @ Mar 4 2003 - 08:53 PM)
QUOTE (Continuum @ Mar 3 2003 - 05:16 PM)
E.g. you want to reach 95% confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win conditions:
5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

Are you sure? It's counterintuitive to me (as is much of statistics, but anyway).

Well, I have no proof at hand that the limit is 1, but it is suggested by empirical results.

QUOTE
It's P(5/5) + P(7/8 or 8/8 and not 5/5) + P(9/11 or 10/11 or 11/11 and not 5/5 or not 7/8 or not 8/8) + ...

Yes, but this makes no difference: after fewer than 5 correct out of the first 5, a result of 8/8 is impossible.

QUOTE
The chances are interdependent, failure on the first influences success on the second one and so on.

Yes, but they are all disjoint events. They just represent the list of winning conditions.

QUOTE
A silly test is to write a simulation that keeps guessing in ABX, if you are right it has to pass eventually.

Who knows how long it takes?

The probability of passing a 0.95 test within no more than 500 trials is ~30%. (My computer already had to work for half a minute to calculate this, and it gets far worse for more trials.)
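The sum P(5/5) + P(7/8 and not 5/5) + ... can be evaluated exactly by tracking, trial by trial, the probability of each score among listeners who have not yet passed. The sketch below is one way to do that and is not Continuum's Excel sheet or program; the function names are made up, and it assumes the tester may stop after any trial as soon as the classic one-sided p-value is at or below `alpha`.

```python
from math import comb


def min_correct_to_pass(n, alpha=0.05):
    """Smallest score out of n with classic p-value <= alpha, or None."""
    need, tail = None, 0
    for k in range(n, -1, -1):
        tail += comb(n, k)               # outcomes with at least k correct
        if tail / 2**n > alpha:
            break
        need = k
    return need


def pass_probability(max_trials, alpha=0.05):
    """Chance that a pure guesser reaches a 'pass' at some trial
    up to and including max_trials."""
    alive = [1.0]   # alive[k]: prob. of k correct so far, not yet passed
    passed = 0.0
    for n in range(1, max_trials + 1):
        nxt = [0.0] * (n + 1)
        for k, p in enumerate(alive):
            nxt[k] += p / 2              # wrong answer
            nxt[k + 1] += p / 2          # right answer
        need = min_correct_to_pass(n, alpha)
        if need is not None:
            passed += sum(nxt[need:])    # these listeners stop here
            for k in range(need, n + 1):
                nxt[k] = 0.0
        alive = nxt
    return passed
```

With these definitions, `pass_probability(5)` is exactly 1/32 (only 5/5 can pass that early) and `pass_probability(8)` is 13/256, already slightly above 0.05. Larger limits such as 30 or 500 can be passed in to compare against the figures quoted in this thread.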

This post has been edited by Continuum: Mar 5 2003, 18:14
Continuum
post Mar 5 2003, 18:14
Post #57





Group: Members
Posts: 473
Joined: 7-June 02
Member No.: 2244



QUOTE (ff123 @ Mar 5 2003 - 05:46 PM)
A description of the formulas used to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the number of trials doesn't have to be fixed beforehand.

IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they may only be approximations.

But it would definitely be possible to construct a test of infinite length. The only problem is that the test would be really hard at later stages. (The above sum must converge to, e.g., 0.05, so each following term has to be smaller and smaller.)
ff123
post Mar 5 2003, 19:10
Post #58


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (Continuum @ Mar 5 2003 - 09:14 AM)
IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they may only be approximations.

But it would definitely be possible to construct a test of infinite length. The only problem is that the test would be really hard at later stages. (The above sum must converge to, e.g., 0.05, so each following term has to be smaller and smaller.)

I believe that method was a Bayesian approach, and the path we chose was frequentist. So we always needed to know the upper limit of trials involved.
Continuum
post Mar 5 2003, 19:34
Post #59





Group: Members
Posts: 473
Joined: 7-June 02
Member No.: 2244



QUOTE (ff123 @ Mar 5 2003 - 07:10 PM)
So we always needed to know the upper limit of trials involved.

Not really. We constructed a sum as above, so that the terms were reasonably large (not too small). For the 28-profile this was: 0.015625 + 0.013916016 + 0.007171631 + 0.00677526 + 0.005667329 = 0.049155235.

But there is no theoretical bound for this summation. We could search for a small enough value on the fly. We only have to ensure that the sum stays below 0.05.

E.g. we could construct something like p/2 + p/4 + p/8 + ... (p=0.05).
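One way to read the p/2 + p/4 + p/8 + ... idea: let the cumulative chance of a false pass after the j-th stopping point be at most p(1 - 1/2^j), which never reaches p no matter how long the test runs. The sketch below picks the pass score at each stopping point on the fly under that budget. It is a hypothetical illustration, not the profile method from the statistics thread; the stopping points every `step` trials, the function name, and the rule of carrying unspent budget forward are all assumptions.

```python
def geometric_spending_plan(stages, step=8, alpha=0.05):
    """Return (trials, score needed or None, cumulative false-pass prob.)
    for each stopping point of an open-ended ABX test."""
    alive = [1.0]   # alive[k]: prob. of k correct so far, not yet passed
    spent = 0.0     # false-pass probability used up so far
    plan = []
    for n in range(1, stages * step + 1):
        nxt = [0.0] * (n + 1)
        for k, p in enumerate(alive):
            nxt[k] += p / 2              # wrong guess
            nxt[k + 1] += p / 2          # right guess
        alive = nxt
        if n % step:
            continue                     # not a stopping point
        stage = n // step
        budget = alpha * (1 - 0.5**stage)   # p/2 + p/4 + ... + p/2^stage
        need, tail = n + 1, 0.0
        for k in range(n, -1, -1):       # lower the bar while budget allows
            if spent + tail + alive[k] > budget:
                break
            tail += alive[k]
            need = k
        spent += tail
        for k in range(need, n + 1):
            alive[k] = 0.0               # these listeners pass and stop
        plan.append((n, need if need <= n else None, spent))
    return plan
```

Printing `geometric_spending_plan(6)` lists the score required after 8, 16, 24, ... trials; the first stopping point requires 8/8. The required scores show how the test gets harder at later stages, which is the trade-off raised above.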
ff123
post Mar 5 2003, 20:54
Post #60


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



I think I was concerned with an upper bound because I wanted to keep the difficulty of passing about the same at each stopping point. This is in contrast to a test which gets progressively harder to pass at each of the stopping points.

Hmm. I wonder if I should just get off my butt and implement the thing. Currently none of the programs (including abchr) does it quite correctly when they display interim results.

ff123
KikeG
post Mar 5 2003, 21:28
Post #61


WinABX developer


Group: Developer
Posts: 1578
Joined: 1-October 01
Member No.: 137



QUOTE (ff123 @ Mar 5 2003 - 05:46 PM)
Yes, there is a method.  I think I posted a graph of it once here, but I must have deleted it earlier.  Here it is again:

I'll take a look at THE thread again, focusing on the posts related to this. I'd also prefer a method like this because of its simplicity for a tester who has no experience with ABX or statistics, and because it makes the test easier to administer, easier to implement from an interface design point of view, and easier for the tester to understand.
Bedeox
post Mar 8 2003, 00:56
Post #62





Group: Members
Posts: 246
Joined: 20-December 02
From: Quite quiet place in Poland
Member No.: 4181



Having read through the 'Statistics(...)' thread, I have to mention that I only use a few types of test:
7-8 tries (short), 14 or 16 tries (long). If I need to do a long test after a short one to be sure,
I double the number of tries rather than just repeating it. I'm not aiming for a particular probability.
I can state that I'm sure if I achieve > 6/7 or 7/8 and know the type of artifact.
(Otherwise I do an additional long test.)

QUOTE
Example: The probability of passing a "traditional" 0.95-test by guessing when one is allowed to stop at every point up to 30 is 0.129! (You can test this with my Excel sheet from above.)

That's the point... The only proper ABX has either 'hard' stop points or no stop points.
I'm using the latter.


--------------------
I've changed only because of myself.
Remember, when you quote me, you're quoting AstralStorm.
(read: this account is dead)