Topic: About hide results option, in foobar ABX component.

About hide results option, in foobar ABX component.

Hi,
In another thread, I did an ABX test, which produced the following log:

Quote
foo_abx 1.3.4 report
foobar2000 v0.9.6.8
2009/10/15 18:19:47

File A: C:\Users\mehdi\Desktop\carpenter\Carpenter_128kps_cbr.mp3
File B: C:\Users\mehdi\Desktop\carpenter\Carpenter_FLAC.flac

18:19:47 : Test started.
18:38:54 : 00/01  100.0%
18:39:06 : Trial reset.
18:39:48 : 01/01  50.0%
18:40:44 : 01/02  75.0%
18:43:37 : Trial reset.
18:55:29 : 01/01  50.0%
18:56:41 : 02/02  25.0%
18:58:29 : 03/03  12.5%
19:00:07 : 03/04  31.3%
19:06:17 : Trial reset.
19:11:58 : 01/01  50.0%
19:13:08 : 02/02  25.0%
19:14:10 : 03/03  12.5%
19:14:46 : 04/04  6.3%
19:16:17 : 05/05  3.1%
19:17:16 : 05/06  10.9%
19:21:55 : 06/07  6.3%
19:22:29 : 07/08  3.5%
19:23:15 : 08/09  2.0%
19:24:21 : 09/10  1.1%
19:26:07 : 10/11  0.6%
19:27:00 : Test finished.

----------
Total: 14/18 (1.5%)


In this test I kept resetting the results until I found a section where I could "successfully" pass the ABX test.
I was a bit annoyed by all the resets that appeared in the log, but decided not to crop it.
Then people responded that this is not a proper way to do an ABX test, and that the "hide results" option should have been checked.
In fact, they even think that this option should be checked by default in the foobar ABX component.

I personally don't see why.
What's the point of continuing an ABX test if you are uncertain that you can spot a difference?
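
For readers wondering where the percentages in the log above come from: they appear consistent with a one-sided binomial tail, i.e. the chance of scoring at least that well by pure guessing. The sketch below is an assumption about the calculation, not something taken from the foo_abx source, and the function name `abx_p_value` is made up for illustration.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a coin flip (hypothetical helper, not foo_abx code)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# A few (correct, trials) scores copied from the log above.
for correct, trials in [(0, 1), (1, 2), (5, 6), (10, 11), (14, 18)]:
    print(f"{correct:02d}/{trials:02d}  {100 * abx_p_value(correct, trials):.1f}%")
```

These five scores should come out at 100.0%, 75.0%, 10.9%, 0.6% and 1.5%, which matches the log, including the 14/18 total that sums all four runs between resets.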


Reply #1
I'll reproduce what I posted a few minutes ago in the thread extrabigmehdi refers to in his post above.

... Like extrabigmehdi, I find that fatigue sets in quickly.  (Fatigue could be due to desensitization of the nerve receptors in the ear, desensitization of the brain's processing of the nervous impulses from the ear, or both.)

For me, sometimes the fatigue is obvious - samples that were initially distinguishable seem so much the same that I lose any confidence in giving an answer.  (In such a situation I would not need to see progressive results to tell that to continue giving answers without taking a break, or without changing the segment of the test files I was listening to, would be futile.)

But sometimes the fatigue is not obvious. This is most likely when the samples are extremely similar.  The intense concentration required can result in a false sensation of hearing a difference. In such circumstances, ability to see progressive results provides assurance that persisting with the ABX session is worthwhile.

Yes, there is potential impairment of the statistical significance of results if progressive results can be seen ... but without the availability of progressive results, the ABX process can prove not only laborious but fruitless.  I know that if I spent 45 minutes listening intently and providing 16 answers I would be extremely disappointed to find that the first 4 answers were correct, and the next 12 were no better than random guesses!  An overall result of 10 out of 16 correct.  But had I been able to see progressive results, then on getting the 5th answer wrong, I would have been alerted to the need to shift the test selection to another segment of the files under comparison.  If the new selection proved to be more revealing of discrepancies for my hearing, and not subject to fatigue, then my overall ABX result might have been 15 correct out of 16, instead of 10 out of 16.  Another approach would be to restart testing using the more revealing and less fatiguing segment.  However, I find that with many test files that people ask to have ABX'd, there is no segment that is revealing and not fatiguing for my hearing.



Reply #2
For both Extrabigmehdi and MLXXX, I want you both to understand that my intention with this post is education.  It is not meant in any disrespectful manner.

First, a very important link.  This is findable through the TOS as well, specifically #8:
http://www.hydrogenaudio.org/forums/index....showtopic=16295

Pio has done a wonderful job explaining ABX testing in detail here.  With the detailed explanation covered, I am going to approach this in a more conversational manner.

Here is my summary of what the result of an ABX test means.  An ABX test, properly conducted, either provides or fails to provide evidence to support the statement "A difference exists between these two things that can be detected".  A test that fails to provide evidence is every bit as valid and useful as a test that provides evidence of a difference.

Given this understanding, I am going to ask a somewhat loaded question.  Why do you feel it is appropriate to throw out results that you don't like?


Reply #3
@Tahnru
Quote
Why do you feel it is appropriate to throw out results that you don't like?

If I were to redo the test, I would ABX only the section that worked best for me.
That's no different.


Reply #4
Yes, that's another way to skew the strength of the results.  Why would such a thing be appropriate?

[Edit]  I thought of another question, that may help demonstrate what I'm driving at.

A quote of the relevant section of Pio's work:
Quote
3. The p values given in the table linked above are valid only if the two following conditions are fulfilled:
- The listener must not know his results before the end of the test, except if the number of trials is decided before the test.
...otherwise, the listener would just have to look at his score after every answer, and decide to stop the test when, by chance, the p value goes low enough for him.
- The test is run for the first time. And if it is not the case, all previous results must be summed up in order to get the result.


New question.  Is Pio wrong on rule #3?
[/edit]
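
The first condition in Pio's rule #3 quoted above can be checked with a small simulation. This is only a sketch under stated assumptions: the listener is purely guessing, the session is capped at 16 trials, "low enough" means a p value below 5%, and all function names are made up for illustration.

```python
import random
from math import comb

def p_value(correct, trials):
    """One-sided binomial tail: chance of at least `correct` hits by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def guesser_passes(max_trials, alpha, peek, rng):
    """Simulate one session by a listener who cannot hear any difference."""
    correct = 0
    for n in range(1, max_trials + 1):
        correct += rng.random() < 0.5          # each answer is a coin flip
        if peek and p_value(correct, n) < alpha:
            return True                        # stop as soon as the score looks good
    return p_value(correct, max_trials) < alpha

def false_positive_rate(peek, sessions=10000, max_trials=16, alpha=0.05, seed=1):
    """Fraction of simulated guessing sessions that end in a 'pass'."""
    rng = random.Random(seed)
    passes = sum(guesser_passes(max_trials, alpha, peek, rng) for _ in range(sessions))
    return passes / sessions

print("fixed 16 trials, results hidden:", false_positive_rate(peek=False))
print("stop whenever p drops below 5%: ", false_positive_rate(peek=True))
```

Under these assumptions the second rate should come out noticeably higher than the first, because every session that passes at trial 16 also passes with peeking, and peeking adds all the sessions that merely dipped below 5% along the way. The exact figures depend on the cap and threshold chosen.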


Reply #5
Tahnru, before you edited your post, I was going to post this as food for thought:
[blockquote]
Of course you can train yourself as many times as you wish, provided that you firmly decide beforehand that it will be a training session. If you get 50/50 during a training and then can't reproduce this result, too bad for you. The results of the training sessions must be thrown away whatever they are, and the results of the real test must be kept whatever they are.
[/blockquote]

I think that there is a serious risk of skewing the results if a person repeats their testing in dozens of sessions of up to 16 trials and then keeps the results of the best session and presents those as the "official result".

But I think that use of a progressive result to warn the user that they are wasting their time on that occasion, and should rest their ears or choose another test segment, is of practical use, and need not skew the results seriously.  [I do not intend to try to provide a quantitative mathematical explanation of the extent of the skewing.  Perhaps someone interested in that aspect could offer something.]
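
One piece of the skew can be quantified exactly: the case described above where many sessions are run and only the best one is reported. The sketch below assumes a pure guesser, independent sessions of 16 trials each, and a 5% threshold for a "pass"; the function names are made up for illustration. It does not cover the subtler case of using progressive results to switch segments mid-session.

```python
from math import comb

def p_value(correct, trials):
    """One-sided binomial tail: chance of at least `correct` hits by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def chance_best_session_passes(sessions, trials=16, alpha=0.05):
    """Probability that a pure guesser has at least one 'passing' session.

    Assumes `trials` is large enough for a passing score to exist
    (5 or more trials when alpha is 0.05).
    """
    # Smallest score whose p value falls below alpha for this session length.
    needed = next(c for c in range(trials + 1) if p_value(c, trials) < alpha)
    per_session = p_value(needed, trials)
    return 1 - (1 - per_session) ** sessions

for k in (1, 2, 5, 12, 24):
    print(k, "sessions:", round(chance_best_session_passes(k), 3))
```

With these assumptions, a guesser who runs two dozen sessions is more likely than not to have at least one that "passes", which is why the best session alone cannot be presented as the official result.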

I've noticed that in some threads on HA, ABXing has been requested and yet few people have posted results to the thread.  I think that a methodology that saves time and facilitates arriving at a result may be more practical than a methodology in which a person devotes 45 minutes only to find that, for the segment they selected that day, their answers were no more accurate than guesses.

In other words 10 slightly skewed results may be more useful than say 2 "pure" results.  If I were developing a new codec I'd use the slightly skewed results of users to make faster progress in optimising the codec, rather than oblige users to take 45 minutes each time they tested a revised version of my codec.

Cheers,
MLXXX


Reply #6
Quote
But I think that use of a progressive result to warn the user that they are wasting their time on that occasion


I'm afraid I am probably mis-interpreting you here.  But the first thing that springs to my mind when I read this is that you think there is no use in a negative result.

Please help me understand this better.  Or explain how a user can be wasting time with a properly set up test.


Reply #7
I apologize that I didn't follow Pio's sticky to the letter, and I doubt that everyone does. It is quite a long piece of literature to assimilate (at least for someone whose mother tongue is not English).
Never mind. This is the first time I've posted an ABX log here, and now that I realize how complicated it can be, I won't try to justify this ABX log.





Reply #8
Quote
I'm afraid I am probably mis-interpreting you here.  But the first thing that springs to my mind when I read this is that you think there is no use in a negative result.

Perhaps the following hypothetical  example will explain "where I am coming from".

[blockquote]Let us assume a research laboratory pays Solomon €30 an hour to provide ABX results for codec Z [relative to lossless encoding] at the following ten bitrates:

48, 64, 72, 88, 96, 128, 144, 164, 176, 196 kbps.

He is required to spend up to one hour on each bitrate, and in that hour stop when he has provided 20 considered responses.  He is allowed a maximum of 6 listening attempts per requested response but then must provide a response even if unsure.

Let us assume that Solomon can easily distinguish codec Z at 48 and 64 kbps; finds 72 kbps and 88 kbps challenging; is at the limit of his ability at 96 kbps [borderline ability to discriminate, depends on physical and mental state]; and has no capacity to discriminate at 128kbps and above.

It could be expected that Solomon would reach his quota of 20 very easily for the tests at 48kbps and 64kbps and be very confident; have to put in a bit of effort for 72kbps and a lot of effort for 88kbps and be somewhat confident; be in a state of stress when attempting the 96kbps test and be not sure whether he was imagining things; and for 128kbps, 144kbps, 164kbps, 176kbps and 196kbps be in a similar state of stress and lack of confidence as for 96kbps.

___________

Alternatively, Solomon could be permitted to do some quick preliminary testing where he could see his own results.  Perhaps he could be instructed to cease testing a bitrate (in the first instance) if he gets 5 out of 5 correct (this has a guessing probability of 1/(2x2x2x2x2), or 1/32, or about 3%) and proceed to the next higher bitrate.  However, he could be instructed that if he cannot get 5 out of 5 correct at the next higher bitrate, he must redo the lower bitrate to 20 answers.  In the end he might end up doing only two 20 answer tests:
* a 20 answer test at 88kbps for which he might obtain 20 out of 20 correct;
* a 20 answer test at 96kbps for which he might obtain 15 out of 20 correct.

He would not attempt a fully fledged test at 128kbps or above.  In fact he would not even attempt a 5 out of 5 test above 144kbps.

The use of feedback of results has improved the efficiency of the testing.

______________

Solomon's result of 15 out of 20 is not all that persuasive.  The research laboratory staff could ask Solomon to drop in on another day, when he was "feeling fresh".  He could be allowed a quick warmup, or "training session".  If his results were poor in the training session he could be told not to waste his time that day on a formal test, but try yet another day.  He might be paid €15 for his trouble. On that further day, if his warmup result at 96kbps was good (say 5 out of 5) he could be asked to proceed to a formal 20 trial test.  He might get 17 out of 20 correct. The conclusion reached would be that it appeared Solomon's hearing discrimination at 96kbps for codec Z was unreliable but existed.  To encode music [similar to the test sample music] transparently for a person with hearing like Solomon's, 128kbps or above would be recommended.[/blockquote]
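
The scores in the Solomon example above can be put through the same guessing-probability calculation. This is a minimal sketch; the pooled 32/40 row is an added illustration that applies Pio's "all previous results must be summed up" rule to the two formal tests, and it is not part of the original example.

```python
from math import comb

def p_value(correct, trials):
    """One-sided binomial tail: chance of at least `correct` hits by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Scores from the hypothetical example; the last row is an assumed pooling
# of the two formal 20-trial tests at 96kbps.
results = {
    "5/5 screening": (5, 5),
    "15/20 first formal test": (15, 20),
    "17/20 second formal test": (17, 20),
    "32/40 both formal tests pooled": (15 + 17, 20 + 20),
}
for label, (correct, trials) in results.items():
    print(f"{label}: p = {p_value(correct, trials):.5f}")
```

The 5 out of 5 screen is exactly 1/32, as stated in the example; 15/20 works out to about 2.1% and 17/20 to about 0.13%. The pooled figure comes out smaller than either formal test alone, although the warmup-gated second session is itself a form of selection that this simple sum does not account for.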


Reply #9
MLXXX, Thank you for providing feedback on your thoughts.  It's a great help in understanding. 

Before I go back and respond to your last post (I have several points of criticism that you may or may not find helpful), let's first try to steer back to the topic.

As for why I want the checkbox checked by default: as Extrabigmehdi and other newer users have demonstrated, correctly performing an ABX test involves several steps that are not immediately obvious.  Having this box enabled by default would at least require the user to take some small action before peeking at the results.  Users who aren't familiar enough to want intermediate results won't get pulled off course by seeing 00/03 and by the attitude that they must "pass" an ABX test.

For any kind of test I can think of, within my own work or in other disciplines, it's bad procedure to interpret results before the test is completed.

The suggestion isn't to remove the ability to see intermediate results (although this would be my personal choice), but to check the box by default.





Reply #10
Quote
I apologize that I didn't follow Pio's sticky to the letter, and I doubt that everyone does. It is quite a long piece of literature to assimilate (at least for someone whose mother tongue is not English).
Never mind. This is the first time I've posted an ABX log here, and now that I realize how complicated it can be, I won't try to justify this ABX log.


Extrabigmehdi, as I said before, it was my intention to thank you for providing an example of what I feel is a common pitfall for those new to ABX testing.  Since I'm pretty much the only one currently on the check-the-box-by-default side of the discussion in this thread, I'm pretty sure this is directed at me.  Again, I mean nothing personal by this.  I'd be happy to address anything further, if need be, by PM.


Reply #11
Quote
For any kind of test I can think of, within my own work or in other disciplines, it's bad procedure to interpret results before the test is completed.

The suggestion isn't to remove the ability to see intermediate results (although this would be my personal choice), but to check the box by default.

Only have a few seconds to post at the moment:

For many feats of human performance, e.g. weight lifting or running, results are observable to the subject, although it is possible to lift unmarked weights or to run a marathon untimed and solo.

Perhaps the ABX hide results check box could be checked by default and labelled "training mode - see progressive results".  Or unchecked by default.