Blind test challenge, Find the SB64 recording!

Mar 4 2003, 22:24
Post #51
ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12
I verified Continuum's formulas by simulation.
I hadn't thought to simulate a huge ABX test that simply continues until it finally passes in what is basically a "no difference" situation, but I have little doubt that's what would eventually happen.
ff123

Mar 5 2003, 11:28
Post #52
WinABX developer Group: Developer Posts: 1578 Joined: 1-October 01 Member No.: 137
Regarding the sequential ABX test problems mentioned: I would be more comfortable with something like always seeing your "classic" p-value, but needing to reach a different value to pass the test with confidence, depending on the number of trials performed. Since I'm not very good at statistics, would it be possible to calculate the p-values (or something similar) that you need to achieve, depending on the number of trials performed?
This post has been edited by KikeG: Mar 5 2003, 11:33

Mar 5 2003, 16:22
Post #53
ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12
QUOTE (KikeG @ Mar 5 2003 - 02:28 AM)
Regarding the sequential ABX test problems mentioned, for me it would be more comfortable something like always knowing your "classic" p value, but needed to reach a different value to achieve test pass confidence, depending on the number of trials performed. Since I'm not very good at statistics, would it be possible to calculate the needed p-values or something similar that you need to achieve, depending on the no. of trials performed?

If the number of trials is fixed before the test starts and enforced by the tool, then classic p-value calculations will work as is. Otherwise, I think we decided it's best to use a "profile", because one rapidly loses the ability to get a significant result if allowed to see the results and to stop at any time.
ff123
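For reference, the "classic" p-value for a fixed-length ABX test can be sketched as a one-sided binomial tail. This is only a minimal illustration, assuming each trial is a 50/50 guess under the null hypothesis; the function name `abx_p_value` is made up for this example and is not taken from any of the tools discussed in the thread.

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by pure guessing.

    Only valid as a significance level when `trials` was fixed in advance.
    """
    # Count the outcomes with `correct` or more right answers ...
    tail = sum(comb(trials, j) for j in range(correct, trials + 1))
    # ... out of 2**trials equally likely guess sequences.
    return tail / 2 ** trials
```

For example, `abx_p_value(7, 8)` is 9/256, about 0.035, and `abx_p_value(5, 5)` is 1/32, about 0.031.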

Mar 5 2003, 17:32
Post #54
WinABX developer Group: Developer Posts: 1578 Joined: 1-October 01 Member No.: 137
I know, but I would prefer an alternative method, like the one I suggested, that does not require a fixed number of trials.

Mar 5 2003, 17:46
Post #55
ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12
QUOTE (KikeG @ Mar 5 2003 - 08:32 AM)
I know, but I would rather prefer an alternative method as the one I suggested that would not impose a fixed number of trials required.

Yes, there is a method. I think I posted a graph of it once here, but I must have deleted it earlier. Here it is again:
[graph not preserved]
A description of the formulas used to derive the two lines is in that monster statistics thread. I think we discarded this method because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the number of trials doesn't have to be fixed beforehand.
ff123

Mar 5 2003, 18:05
Post #56
Group: Members Posts: 473 Joined: 7-June 02 Member No.: 2244
QUOTE (Garf @ Mar 4 2003 - 08:53 PM)
QUOTE (Continuum @ Mar 3 2003 - 05:16 PM)
E.g. you want to reach 95% confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win conditions: 5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

Are you sure? It's counterintuitive to me (as are many statistics, but anyway).

Well, I have no proof of the limit of 1 at hand, but it is suggested by empirical results.

QUOTE
It's P(5/5) + P(7/8 or 8/8 and not 5/5) + P(9/11 or 10/11 or 11/11 and not 5/5 or not 7/8 or not 8/8) + ...

Yes, but this makes no difference: after fewer than 5 of 5, a result of 8/8 is impossible.

QUOTE
The chances are interdependent, failure on the first influences success on the second one and so on.

Yes, but the events are all disjoint. They just represent the list of winning conditions.

QUOTE
A silly test is to write a simulation that keeps guessing in ABX, if you are right it has to pass eventually.

Who knows how long it takes? The probability of passing a 0.95 test within 500 trials is ~30%. (My computer had to work for half a minute to calculate even this, and it becomes far worse for more trials.)

This post has been edited by Continuum: Mar 5 2003, 18:14
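The sum of disjoint win conditions above can be evaluated exactly with a small dynamic program. The following is only a sketch under stated assumptions, not the Excel sheet or the program mentioned in the thread: every trial is a fair 50/50 guess, the tester may stop after any trial, and the test counts as passed the first time the classic one-sided p-value is at most 0.05. The names `min_correct_to_pass` and `pass_prob_by_guessing` are hypothetical.

```python
from math import comb

ALPHA_NUM, ALPHA_DEN = 1, 20  # alpha = 0.05, kept as an exact fraction

def min_correct_to_pass(n):
    """Smallest score k with P(at least k of n by guessing) <= alpha.

    Returns n + 1 when even a perfect score is not significant (n < 5).
    """
    tail = 0
    for k in range(n, -1, -1):
        tail += comb(n, k)  # number of guess sequences with >= k correct
        if tail * ALPHA_DEN > ALPHA_NUM * 2 ** n:
            return k + 1
    return 0

def pass_prob_by_guessing(max_trials):
    """Probability that a pure guesser passes at some point up to max_trials."""
    alive = {0: 1.0}  # score -> probability of holding it without having passed
    passed = 0.0
    for n in range(1, max_trials + 1):
        threshold = min_correct_to_pass(n)
        nxt = {}
        for k, p in alive.items():
            for k2 in (k, k + 1):  # wrong guess, right guess
                if k2 >= threshold:
                    passed += 0.5 * p  # first time the win condition is met
                else:
                    nxt[k2] = nxt.get(k2, 0.0) + 0.5 * p
        alive = nxt
    return passed
```

With these assumptions, `pass_prob_by_guessing(5)` is 1/32 and `pass_prob_by_guessing(8)` is 1/32 + 5/256 = 13/256, about 0.0508, so the nominal 0.05 level is already exceeded once 7/8 is allowed as a second stopping point. Larger values of `max_trials` can be tried against the figures quoted in the thread.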

Mar 5 2003, 18:14
Post #57
Group: Members Posts: 473 Joined: 7-June 02 Member No.: 2244
QUOTE (ff123 @ Mar 5 2003 - 05:46 PM)
A description of the formulas to derive the two lines are in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity and the trials don't have to be fixed beforehand.

IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they may be only approximations. But it would definitely be possible to construct a test of unlimited length. The only problem is that the test would be really hard to pass at later stages. (The above sum must converge to e.g. 0.05, so each successive term has to get smaller and smaller.)

Mar 5 2003, 19:10
Post #58
ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12
QUOTE (Continuum @ Mar 5 2003 - 09:14 AM)
IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect that they could be only approximations. But it definitely would be possible to construct a test with infinite length. The only problem is, that the test would be really hard at later stages. (The above sum must converge to e.g. 0.05, so each following term has to be smaller and smaller.)

I believe that method was a Bayesian approach, and the path we chose was frequentist. So we always needed to know the upper limit of trials involved.

Mar 5 2003, 19:34
Post #59
Group: Members Posts: 473 Joined: 7-June 02 Member No.: 2244
QUOTE (ff123 @ Mar 5 2003 - 07:10 PM)
So we always needed to know the upper limit of trials involved.

Not really. We constructed a sum as above, so that the terms were reasonably large (not too small). For the 28-profile this was:
0.015625 + 0.013916016 + 0.007171631 + 0.00677526 + 0.005667329 = 0.049155235.
But there is no theoretical bound for this summation. We could search for a small enough value on the fly. We only have to ensure that the sum stays below 0.05. E.g. we could construct something like p/2 + p/4 + p/8 + ... (p = 0.05).
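The p/2 + p/4 + p/8 + ... idea can be sketched as follows. This is a hypothetical illustration, not the construction used for the 28-profile: the stopping points (`checkpoints`) are an arbitrary choice made for this example, each stopping point i gets a budget of p/2^i, and the pass score there is lowered only as far as the probability of first passing at that point (under guessing) still fits inside the budget. The function name `geometric_spending_thresholds` is invented here.

```python
def geometric_spending_thresholds(checkpoints, alpha=0.05):
    """Return (trial count, score needed, probability spent) per stopping point.

    The i-th stopping point may spend at most alpha / 2**i, so the total
    chance of passing by guessing stays below alpha however many are added.
    """
    alive = {0: 1.0}  # score -> probability of holding it without having passed
    n = 0
    plan = []
    for i, stop in enumerate(checkpoints, start=1):
        budget = alpha / 2 ** i
        # Advance the guesser's score distribution to this stopping point.
        while n < stop:
            nxt = {}
            for k, p in alive.items():
                nxt[k] = nxt.get(k, 0.0) + 0.5 * p
                nxt[k + 1] = nxt.get(k + 1, 0.0) + 0.5 * p
            alive = nxt
            n += 1
        # Lower the required score from the top while the budget allows it.
        spent = 0.0
        threshold = stop + 1  # unreachable score: no pass possible here
        for k in sorted(alive, reverse=True):
            if spent + alive[k] > budget:
                break
            spent += alive[k]
            threshold = k
        # Scores at or above the threshold have passed and leave the pool.
        alive = {k: p for k, p in alive.items() if k < threshold}
        plan.append((stop, threshold, spent))
    return plan
```

Calling `geometric_spending_thresholds([8, 16, 24, 32])`, for instance, requires 8/8 at the first stopping point, because 7/8 would spend more than the 0.025 allotted there. This illustrates the drawback raised in the thread: the budgets shrink quickly, so later stopping points become progressively harder to pass.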

Mar 5 2003, 20:54
Post #60
ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12
I think I was concerned with an upper bound because I wanted to keep the difficulty of passing about the same at each stopping point. This is in contrast to a test that gets progressively harder to pass at each successive stopping point.
Hmm. I wonder if I should just get off my butt and implement the thing. Currently none of the programs (including abchr) handles this quite correctly when displaying interim results.
ff123

Mar 5 2003, 21:28
Post #61
WinABX developer Group: Developer Posts: 1578 Joined: 1-October 01 Member No.: 137
QUOTE (ff123 @ Mar 5 2003 - 05:46 PM)
Yes, there is a method. I think I posted a graph of it once here, but I must have deleted it earlier. Here it is again:

I'll take a look at THE thread again, focusing on the posts related to this. I'd also prefer a method like this because of its simplicity for a tester who has no experience with ABX or statistics, and because it makes the test easier to administer, easier to implement from an interface-design point of view, and easier for the tester to understand.

Mar 8 2003, 00:56
Post #62
Group: Members Posts: 246 Joined: 20-December 02 From: Quite quiet place in Poland Member No.: 4181
Having read through the 'Statistics(...)' thread, I have to mention that I only use a few types of test:
7-8 tries (short), 14 or 16 tries (long). If I need to run a long test after a short one to be sure, I double the number of tries rather than just repeating it. I'm not expecting a certain probability. I can state that I'm sure if I achieve > 6/7 or 7/8 and know the type of artifact. (Otherwise I do an additional long test.)

QUOTE
Example: The probability to pass an "traditional" 0.95-test by guessing when one's allowed to stop at every point up to 30 is 0.129! (you can test this with my Excel-sheet from above)

That's the point... The only proper ABX has either 'hard' stop points or no stop points. I'm using the latter.

--------------------
I've changed only because of myself.
Remember, when you quote me, you're quoting AstralStorm. (read: this account is dead)