Topic: Changes to come in ABX module of ABC/hr (Read 17027 times)
Changes to come in ABX module of ABC/hr

I plan to add a training mode to the ABX module.  This mode would display the number of correct trials and the number of total trials performed in training mode, but no p-value.  The listener is allowed to clear this running display at any time.  Results in training mode will not be written to file.

The normal ABX mode will no longer display a running tally of correct trials; however, it will continue to display the number of trials performed.  The listener will be asked to enter a fixed number of trials to perform.  The ABX will terminate after those trials have been completed, and the tally and p-value will then be displayed and written to file.  If the specified number of trials is not completed, the results will not be displayed or written to file.  The listener can reset at any time before the fixed trials have been completed.  After they have been completed, though, no further ABX testing will be possible except in training mode.
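For illustration, the normal-mode rules above could be sketched roughly like this. This is not the ABC/hr source: the class and method names are made up for the example, and the p-value is assumed to be the usual one-sided binomial tail for a listener guessing at 50%.

```python
from math import comb


def p_value(correct, total):
    """Chance of getting at least `correct` of `total` trials right by guessing."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


class FixedAbxSession:
    """Hypothetical model of a fixed-length ABX run with a hidden tally."""

    def __init__(self, planned_trials):
        self.planned = planned_trials
        self.correct = 0
        self.done = 0
        self.finished = False

    def answer(self, is_correct):
        """Record one trial; only the trial count is reported back."""
        if self.finished:
            raise RuntimeError("test finished; only training mode remains")
        self.done += 1
        self.correct += int(is_correct)
        if self.done == self.planned:
            self.finished = True
        return self.done

    def reset(self):
        """Allowed only before the planned trials are completed."""
        if self.finished:
            raise RuntimeError("cannot reset a finished test")
        self.correct = 0
        self.done = 0

    def results(self):
        """Tally and p-value, revealed only once the run is complete."""
        if not self.finished:
            return None
        return self.correct, self.planned, p_value(self.correct, self.planned)
```

The tally stays private until `results()` is called on a finished session, which is what keeps the number of trials fixed from a statistical point of view.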

If the ABX results achieve a p-value <= 0.05, and a sample is being compared against the original, then the hidden reference for that sample will be disabled in the ABC/hr module.

This is the simplest solution to the sequential test problem.  Thanks to John Corbett in rec.audio.opinion for the suggestion.

ff123

P.S.  I have been looking at a simpler library for XML than expat/arabica, and it is located here:

http://www.fxtech.com/xmlio/

But I haven't implemented anything yet.  I think I will put this on hold until I can fix ABX, which has been less than satisfactory IMO for a long time.

Reply #1
Here is the screen I plan to fill out:

http://ff123.net/export/newabx.png

I'm thinking that the listener will be allowed to change the total number of trials in normal mode in the middle of the test.  It is still "fixed" as long as the results are not revealed.  Once the results are revealed, though, the test is terminated.  So I need to add a "Finish test" button.

ff123

Edit:  I think I'll keep the p-value in training mode, but make it clear that it's a "projected" p-value, or something like that.

Reply #2
Quote
though there will be no further ABX testing possible except in training mode.

I don't know how limiting this is in the real world. Probably not at all.

But this assumes that a negative ABX result means the tester cannot hear a difference. This may be true for the specific situation, but may be untrue an hour later or after having eaten supper. This is disregarded if you deny further ABXing.

Many people already complain about the pressure in ABX testing (often just as an excuse, granted), but if there are both training mode and normal mode, additional pressure is put on the tester to make each round count in normal mode.

Anyway, I suspect that during regular listening tests, people will seldom bother to try to re-ABX a sample so it's not really a big limitation.

I found tigre's idea about "abx stop points" really interesting. Has it been investigated if this would be statistically sound? Maybe that would be a cool idea to implement.

Reply #3
I like the simplicity of the normal mode/training mode vs. stop points.

Here is an updated screen:

http://ff123.net/export/newabx1.png

It has entries for type II error and effect size, which for the near future will just be disabled, since I don't know enough about the theory to put those in yet.  The type I error is the normal 0.05 value we're accustomed to using.

Before the results are finalized, the type I error can be changed, but it is fixed afterwards.  Default will be 0.05.

ff123

Reply #4
I'm far from being an ABX aficionado; however, I really like where you are going.
I believe the training exercise will be an immense help.

You've piqued my interest; I'll have to go back and read up to re-familiarize myself with ABC/HR.

Thank you for your work ff123, I thoroughly enjoy your site and recommend it often.

Bye, tec

Reply #5
I think this is a very good idea. It solves all the statistical problems with sequential ABX results, yet stays very simple and easily understandable for everyone (which, in my opinion, wasn't the case for the "stop point" approach). And with the possibility of training as much as you like, and resetting the "normal" ABX if you're in doubt whether you got a good start (as long as you haven't reached the trial limit yet), I think there isn't too much pressure on the listener.

I have just implemented this in ABC/HR for Java, except for the disabling of wrong sliders in the ABC/HR part. I'll post an updated version once it's done and tested.

Reply #6
Latest screen:

http://ff123.net/export/newabx2.png

Eventually, the program would suggest the number of trials to perform given the alpha, beta, and theta.  For the near term it will suggest the minimum number of trials to perform given the type I error specified.
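As a rough illustration of the near-term suggestion: if "minimum number of trials" is taken to mean the shortest run in which a perfect score can reach the specified type I error (an assumption on my part, as is the function name), the calculation is just a matter of halving until the chance of guessing everything right drops to alpha.

```python
def min_trials(alpha):
    """Smallest n for which n correct out of n has a guessing probability <= alpha."""
    n = 1
    while 0.5 ** n > alpha:
        n += 1
    return n


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(alpha, min_trials(alpha))
```

Under that reading, alpha = 0.05 needs at least 5 trials (5/5 has a guessing probability of 1/32) and alpha = 0.01 needs at least 7.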

ff123

Reply #7
Can I suggest something.

Move the stop button further down. The buttons people most want to use during an ABX trial or training are:


A, B, X, radio buttons and next.

Stop is only used once.


Group the most used buttons together.


That is, if I read your layout right.

Reply #8
Quote
Can I suggest something.

Move the stop button further down. The buttons people most want to use during an ABX trial or training are:


A, B, X, radio buttons and next.

Stop is only used once.


Group the most used buttons together.


That is, if I read your layout right.

Actually, I click stop all the time.  So for me, the layout works for the mouse clickers among us.  For the keyboard types, the shortcuts should work ok.

ff123

Reply #9
Ah, ok. I misunderstood the meaning of the buttons. Stop is only "playback stop" not "stop the current test run", right?

Reply #10
Quote
Ah, ok. I misunderstood the meaning of the buttons. Stop is only "playback stop" not "stop the current test run", right?

Correct.

Reply #11
I've been playing with the statistics of ABX and now understand it well enough to make some decisions.

Since ABC/hr is mostly used for perceptual codec testing, I think I will make some presets, the one immediately below being the most important:

defect description: "moderate"
critical alpha = 0.05
critical beta = 0.2
theta = 0.9 to 0.95 (listener can hear a difference 80 to 90% of the time)
suggested correct/total trials:  7/8

defect description: "subtle"
critical alpha = 0.05
critical beta = 0.2
theta = 0.8 (listener can hear a difference 60% of the time)
suggested correct/total trials:  13/18

defect description: "obvious"
critical alpha = 0.05
critical beta = 0.2
theta = 0.995 (listener can hear a difference 99% of the time)
suggested correct/total trials:  5/5

Type II errors are not so much of a concern for codec comparisons.  I am trying to detect differences, not to verify similarity.


I will probably implement an "N suggester" via lookup table allowing the following choices:

critical alpha: {0.05, 0.01}
critical beta: {0.05, 0.1, 0.2}
theta: {0.995, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65}


Note that for similarity testing, critical beta is typically set low (sometimes lower than 0.05), and theta is typically even lower than 0.65.  And sometimes critical alpha is allowed to rise to 0.1 or even higher.  But since I am not interested in this type of testing, such values will not be allowed.
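To make the "N suggester" idea concrete, here is a minimal sketch that searches for the values directly instead of using a lookup table. It assumes an exact one-sided binomial test in which theta is the listener's true proportion of correct answers; the function names are invented for the example and this is not how ABC/hr itself is written.

```python
from math import comb


def tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


def suggest(alpha, beta, theta, max_n=200):
    """Smallest (correct, total) meeting both error limits, or None if not found."""
    for n in range(1, max_n + 1):
        # Lowest passing score whose guessing probability stays within alpha.
        for c in range(n + 1):
            if tail(n, c, 0.5) <= alpha:
                break
        else:
            continue  # even n/n is not significant at this alpha
        # Type II risk: a listener with proportion theta scores below c.
        if 1 - tail(n, c, theta) <= beta:
            return c, n
    return None


if __name__ == "__main__":
    for theta in (0.995, 0.9, 0.8):
        print(theta, suggest(0.05, 0.2, theta))
```

With alpha = 0.05 and beta = 0.2, this search returns 5/5 for theta = 0.995, 7/8 for theta = 0.9, and 13/18 for theta = 0.8, which agrees with the presets listed above.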

ff123

Reply #12
Spreadsheet with values calculated for type I and type II errors, given correct trials, total trials, and effect size (what's called pmax)

http://ff123.net/export/TestSensitivityAnalyzer.xls

ff123

Reply #13
Latest screenshot:

http://ff123.net/export/newabx3.png

The program can suggest the number of trials for you to perform given an estimate of the effect size.  The presets are some values of alpha, beta, and theta based on my personal abx experience, and on the typical function of the ABX module in ABC/hr, which is to detect differences (verifying similarity is not important).  Or, if you perform a number of trials in training mode, you can get an estimate of the effect size that way.
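A small sketch of how a training-mode tally might be turned into an effect size estimate. It assumes theta is simply the observed proportion correct, and that the "can hear a difference" percentage relates to theta as 2*theta - 1, which is the relation implied by the preset figures earlier in the thread (0.9 giving 80%, 0.8 giving 60%). The function name is hypothetical.

```python
def estimate_effect(correct, total):
    """Return (theta_hat, detect_rate) from a training-mode tally."""
    if total <= 0:
        raise ValueError("need at least one training trial")
    theta_hat = correct / total
    # A listener who truly hears the difference a fraction d of the time and
    # guesses otherwise is right with probability d + (1 - d) / 2.
    detect_rate = max(0.0, 2 * theta_hat - 1)
    return theta_hat, detect_rate


if __name__ == "__main__":
    print(estimate_effect(18, 20))
```

An estimate from a short training run is noisy, so it is probably best treated as a starting guess for picking a preset rather than a measurement.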

I had a heated discussion with Arny Krueger, the author of the original PC-ABX, over the importance of these features:

http://groups.google.com/groups?hl=en&lr=&...001%25400.0.0.0

ff123

Reply #14
I've uploaded a beta version to:

http://ff123.net/export/abchr1.1beta.zip

It does not yet disable the appropriate slider if an ABX against the original is successful.  Also, the information written to file does not include the specified alpha, beta, or theta.  Nor does it include the resulting beta.

However, I think everything else pretty much works the way I want.  I'd appreciate some feedback or bug reports.

ff123

Reply #15
Excellent work ff123!

You are working your way there much faster than I am.

Your argumentation on the rao was also illuminating to read as you work your way through the logic step by step. It is very unfortunate to notice that AK, who is gung-ho on statistics initially, starts to move goal posts when his arguments are proven to be misguided (eventually claiming the discussion is only about statistics). That again is unfortunate, because the discussion is about improving the accuracy of statistical inference and the implications that has for practical testing! It's not just developing the methods for "mental masturbation" as somebody put it.

This is development of detection methods that has _practical_ significance. It is sad to notice that people fail to see this.

All in all, discussion in RAO has been so ad hominem based for so many years already that it takes a keen mind and perseverance to weed out really useful comments from amidst all the noise. I gave up hope many years ago.

I'm not sure if you looked up the two-tailed test by Leventhal yet, but it shines a little more light on the same issue (in terms of methods), imho.

Regards,
Halcyon

PS What are you using as statistics references? I've been recommended the following two as up-to-date books (but have not gotten them yet):

http://tinyurl.com/3dhl5

http://tinyurl.com/2a2yc

Reply #16
Quote
Excellent work ff123!

You are working your way there much faster than I am.

Your argumentation on the rao was also illuminating to read as you work your way through the logic step by step. It is very unfortunate to notice that AK, who is gung-ho on statistics initially, starts to move goal posts when his arguments are proven to be misguided (eventually claiming the discussion is only about statistics). That again is unfortunate, because the discussion is about improving the accuracy of statistical inference and the implications that has for practical testing! It's not just developing the methods for "mental masturbation" as somebody put it.

I knew going in that convincing Arny to change PC-ABX would be a lost cause.  But to see such disregard for the valid criticisms pointed out was stunning.

His general recommendation of 14/16 is looking for a p-value of 0.002!  This is a ridiculously low value which raises the bar unfairly against the detection of very subtle differences.

Arny's assertion that more trials don't help to hear subtler differences, which he claims to base on experience, is contradicted by the fact that he even recommends 14/16 at all.  Why not just recommend 9/9 trials instead?  That would give the same p-value of 0.002 (ignoring the non-standard criterion of significance), without the need for an extra 7 trials.  The answer is that 16 trials is better than 9 trials because one might make mistakes.  Why would somebody make mistakes?  Because the differences are subtle!  He fails to carry the argument all the way through, stopping at the point where he and his cohorts in the 70's decided that even subtler differences were unimportant.
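The two p-values can be checked with a few lines, assuming the standard one-sided binomial test against guessing (the helper name is just for this example):

```python
from math import comb


def p_value(correct, total):
    """Chance of getting at least `correct` of `total` trials right by guessing."""
    return sum(comb(total, k) for k in range(correct, total + 1)) / 2 ** total


if __name__ == "__main__":
    print(p_value(14, 16))  # 137/65536
    print(p_value(9, 9))    # 1/512
```

14/16 works out to 137/65536, about 0.0021, and 9/9 to 1/512, about 0.0020, so the two criteria are indeed nearly equivalent in terms of type I error.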

I should point out that in tests for similarity (according to Sensory Evaluation Techniques), small values of theta would be 0.625 or less.  They assume a large number of different testers instead of an individual tester performing many trials (the theta is different between the two types of tests, which Corbett pointed out).

An example, ignoring the theta transformation:

Let's say I want to show that two cables sound the same with a type II error risk of 0.05, and a theta of 0.625.  To control N somewhat, I'll allow the type I error risk to rise up to 0.2.  That still calls for an N of about 100!  No wonder Arny doesn't highlight the statistics.  He wouldn't be able to carry out his vendetta against "snake oil" with such intensity if his victims knew what it takes to really show that "there is no difference" with confidence.
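For anyone who wants to check the order of magnitude, here is a hedged sketch of the same calculation using an exact binomial search. It treats theta = 0.625 directly as the proportion correct (ignoring the theta transformation, as in the example above); the function names are made up, and the exact N it finds may differ a little from a table or a normal approximation.

```python
from math import comb


def tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))


def trials_needed(alpha, beta, theta, max_n=300):
    """Smallest n (with its passing score c) meeting both error limits."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if tail(n, c, 0.5) <= alpha:
                break
        else:
            continue  # no score is significant at this n
        if 1 - tail(n, c, theta) <= beta:
            return c, n
    return None


if __name__ == "__main__":
    print(trials_needed(alpha=0.2, beta=0.05, theta=0.625))
```

A quick normal approximation for the same inputs gives roughly 95 trials, so the search should land in the same neighbourhood as the "about 100" quoted above.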

His point about controlling fatigue is valid, but nobody said that many trials have to be performed in one sitting.

Quote
I'm not sure if you looked up the two-tailed test by Leventhal yet, but it shines a little more light on the same issue (in terms of methods), imho.


What was the reference again?  I looked up two articles by Burstein, which John Corbett recommended, but not the Leventhal.

Quote
PS What are you using as statistics references? I've been recommended the following two as up-to-date books (but have not gotten them yet):

http://tinyurl.com/3dhl5

http://tinyurl.com/2a2yc


Still using Sensory Evaluation Techniques, which has contained all of the information on both test techniques and statistics I've needed to date.

Reply #17
I performed my vorbis@pre-echo listening test with 1.1 beta. No problem. I like the idea of presets for the ABX module. One regret: impossible to perform a second ABX test on the same session. If I miss a « moderate difference » listening test (8 trials), I can't perform another one.

Reply #18
Quote
I performed my vorbis@pre-echo listening test with 1.1 beta. No problem. I like the idea of presets for the ABX module. One regret: impossible to perform a second ABX test on the same session. If I miss a « moderate difference » listening test (8 trials), I can't perform another one.

Thanks for giving it a test spin.

I've fixed one annoyance where the ABX screen would not clear if another set of files were loaded.

Regarding not being able to perform a second ABX test: yes, that is a pity, but it's only really problematic if a test administrator insists on seeing successful ABX results written to file.  Otherwise, you can still get personal results in training mode.  It also means that one needs to develop a feel for how many trials will be needed for success, which takes practice.

The upside is that I finally have no hesitation about unhiding the hidden reference for a successful ABX, which is nice compensation, I think.  Working on that part of the code now.

ff123

Reply #19
I've also noticed two or three times that comment windows were not cleared (i.e. they kept comments I had written in older tests).

It would be nice if the user could cancel the last trial during an ABX session. I sometimes regret my choices (shortcut mistakes, too-fast evaluation, etc.). When results are hidden (and only in that case), I suppose that cancelling the latest trial isn't a problem. Am I right?

Reply #20
Quote
I've also noticed two or three times that comment windows were not cleared (i.e. they kept comments I had written in older tests).

The only way I could think of to make that happen was to leave the comment window open when starting a new test.  I will make sure they get closed.

I'm going to release a second beta which unhides the reference.

Quote
It would be nice if the user could cancel the last trial during an ABX session. I sometimes regret my choices (shortcut mistakes, too-fast evaluation, etc.). When results are hidden (and only in that case), I suppose that cancelling the latest trial isn't a problem. Am I right?


Ech, you want yet another button?  It's already pretty cluttered.

ff123

Reply #21
http://ff123.net/export/abchr1.1beta2.zip

bug fix: open comments windows (including general comments) are closed and cleared if another test is started.

feature added: hidden reference is unhid if abx is successful.

ff123

Please test and report bugs/comments.

Reply #22
Aarrgh!  General comments are still buggy.  If I save just before loading another config file, general comments don't clear.

Also, if I start another test by clicking OK from the setup menu, it prematurely starts to end the test (it's supposed to give the listener a chance to save or cancel).

Will fix tonight.

ff123

Reply #23
Is there any way to avoid the slight gap when ABXing when you switch between files? It makes it a tad harder than I'd like to ABX the files.

Reply #24
Quote
Is there any way to avoid the slight gap when ABXing when you switch between files? It makes it a tad harder than I'd like to ABX the files.


Not the way I have it implemented currently.  Suffice it to say that I have no plans to change the current implementation (uses a single wavOut stream).  I stop each stream before starting another.

The ideal solution would be to synchronize multiple audio streams (how does one do that?), and then switch between the streams.
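One possible way to think about it, sketched here without any real audio API: decode all files into equal-length sample buffers, keep a single shared playback position, and have the output callback pull its next chunk from whichever buffer is currently selected. Everything below is hypothetical (class and method names included) and assumes the files are already time-aligned and of equal length; it says nothing about how this would be wired into wavOut.

```python
class SwitchableSource:
    """Several equal-length sample buffers sharing one playback cursor."""

    def __init__(self, buffers):
        lengths = {len(b) for b in buffers.values()}
        if len(lengths) != 1:
            raise ValueError("all buffers must have the same length")
        self.buffers = buffers
        self.active = next(iter(buffers))
        self.pos = 0

    def select(self, name):
        """Switch source; the cursor is untouched, so playback continues in place."""
        if name not in self.buffers:
            raise KeyError(name)
        self.active = name

    def read(self, n):
        """Return the next n samples from the active buffer and advance the cursor."""
        chunk = self.buffers[self.active][self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


if __name__ == "__main__":
    src = SwitchableSource({"A": [0, 1, 2, 3, 4, 5], "B": [10, 11, 12, 13, 14, 15]})
    print(src.read(2))   # samples from A
    src.select("B")
    print(src.read(2))   # B continues from the same position
```

Because only one output stream is ever fed, there is nothing to stop and restart when switching. A short crossfade at the switch point might still be needed to avoid clicks.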

ff123