Blind tests and HydrogenAudio
Hydrogenaudio Forums > Hydrogenaudio Forum > Site Related Discussion
Pio2001
I've finally updated Terms of Service rule 8 and posted a sticky explaining what an ABX test is.

I'd like to thank, in alphabetical order, Canar, Dibrom, FF123 and Garf for their earlier contributions, which I partially copied into the TOS and the sticky, and KikeG for his binomial table.
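The binomial table itself isn't reproduced in this thread. As a minimal sketch, assuming a one-sided binomial test against pure guessing (probability 0.5 per trial) and "below 5%" / "below 1%" thresholds, a table of the minimum number of correct answers per number of trials could be generated like this. The function names are hypothetical and this is not KikeG's code.

```python
from math import comb
from typing import Optional


def binomial_pvalue(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials` by guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct(trials: int, alpha: float) -> Optional[int]:
    """Smallest score whose p-value is below `alpha`, or None if even a perfect score isn't."""
    for correct in range(trials + 1):
        # The p-value shrinks as the score grows, so the first hit is the minimum.
        if binomial_pvalue(correct, trials) < alpha:
            return correct
    return None


if __name__ == "__main__":
    print("trials  p<5%  p<1%")
    for n in range(4, 31, 2):
        print(f"{n:6}  {str(min_correct(n, 0.05)):>4}  {str(min_correct(n, 0.01)):>4}")
```

With very few trials (e.g. 4), even a perfect score never gets below 5%, which is why the sketch returns None there.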

Any suggestion for improvement is welcome.
Digga
That's a nice thread for new users here. IMO it's well explained without getting too difficult.
However, there's one thing that does confuse me:

QUOTE
If the probability is in the green (<5%) or, better, the yellow zone (<1%), the test is considered as successful.
Isn't the usual colour scheme (used in PC games, EncSpot, traffic signs and whatnot) green, yellow, red, in order from good to bad?
So the quote suggests that a 5% chance of guessing is better than a 1% chance.

As I said, it's a very useful sticky, but to prevent confusion I think the colour thing should be changed...
Pio2001
Thanks for your suggestion; I'll let KikeG see whether he can update the binomial table.

By the way, I deliberately left out the "The Hydrogenaudio staff might not take action against users that post these harsh responses" part of the previous version, because I think we can no longer tolerate the "harsh responses" currently posted in many threads (I'm not singling anyone out; many people have been involved in harsh or unhelpful answers recently).
blessingx
Most ABX tools mentioned are Windows only. Are there other options for members who use a different OS?
I'm personally interested in OS X, but curious about Linux possibilities for those users also.
If there are not accepted ABX tools available for a particular OS, what are the ramifications for those users and comments they are allowed to make on HA? Just curious.
rjamorim
You can do ABX testing on any computer with Java using Schnofler's ABC/HR

http://rarewares.hydrogenaudio.org/others.html

Also, you can do it from a web browser (!) using smok3's ABX2go

http://users.volja.net/smoker/abx.htm

The Java comparator uses the same randomizing routines as the other tools, so it should be OK.

Dunno about ABX2go.
Jan S.
[rant]why don't you use the wiki?
http://doc.hydrogenaudio.org/wikis/hydrogenaudio/ABX[/rant]
Pio2001
I read this page. I would have used it if I'd had it from the beginning, but for some reason I don't remember, I only read it after finishing the draft of the new post, and I actually forgot that it came from the wiki (I had gathered the documentation long before, and yesterday all I started from were text files).

From the beginning I considered whether the new post should go in the wiki or in a sticky. The best answer was the wiki. The problem was that I'm really short of time, and I haven't yet learned the rules and formatting of the wiki. I thought it would have taken me one more hour. But if I had remembered this page, I would simply have copied it into a sticky instead of writing a new post.
Pio2001
I updated the FAQ with the two Java programs, as well as foobar2000 and Linabx (for Linux).

I changed the link in the TOS to point to the wiki rather than the sticky.
MxM
Picking up on this thread, I should mention that we had a thread in MP3 General about the differences between JS and TS. As I'm newer to the board than others, my way of working there was a little outside the rules... and step by step in that thread I learned what ABX is and so on. Afterwards came the necessary rule change mentioned here, so I decided to ask some things via PM before posting. After some discussion, Pio and I decided that some of the questions might be posted here to be discussed openly.

As I agree with that:


My first question was about WEBSPACE, as not everybody has webspace for posting their samples.

WHERE TO PUT THE SAMPLES ON THE WEB?

HA offers the option of uploading them to its own server, though the restriction is 30 seconds of copyrighted material. The second restriction is 9 MB, so that users with modems still have a chance to download them.

My next question was: MAY I UPLOAD MORE THAN 30 SECONDS IF THE MATERIAL IS NON-COPYRIGHTED AND MADE BY MYSELF? That way I could choose special effects or instrumentation that show what the problem is.

Pio answered that if it is not copyrighted in any way and there is no legal restriction from my own label (licensing or anything else problematic), I can upload more than 30 seconds, still limited to 9 MB for modem users.

So I came to the quality of the samples, thinking about sample faking, and in the same way we came to WHICH FORMAT SHOULD BE CHOSEN FOR UPLOADING THE SAMPLE.

The conclusion was that it MUST be lossless anyway. Since all presets and recommendations are available, there is no need to post more than the PURE UNENCODED LOSSLESS SAMPLE that forms the basis for the DOUBLE ABX TEST. Everything else can be reproduced by anybody else, since the settings MUST BE KNOWN (see the recommendation list of this forum). The question of particular formats is still open. IT IS RECOMMENDED TO USE .FLAC as the default lossless format, as it is portable to Linux and other OSes (I had a question about APE, which as far as I know became open source but still has not been ported, whereas FLAC has). In some cases, if the sample is short enough, .WAV CAN BE USED TOO, though you should still think of modem users.
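To put the 30-second and 9 MB limits side by side, here is rough arithmetic as a small sketch, assuming CD-quality audio (44.1 kHz, 16-bit, stereo), decimal megabytes, and ignoring the small WAV header. The constants and function names are assumptions for illustration, not HA's actual upload rules.

```python
# Assumed format: CD-quality PCM (44.1 kHz, 16-bit, stereo); WAV header ignored.
SAMPLE_RATE = 44_100      # samples per second per channel
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 176,400


def wav_size_mb(seconds: float) -> float:
    """Approximate uncompressed size in decimal megabytes (10**6 bytes)."""
    return seconds * BYTES_PER_SECOND / 1_000_000


def max_seconds(limit_mb: float) -> float:
    """Longest uncompressed clip that fits under `limit_mb` decimal megabytes."""
    return limit_mb * 1_000_000 / BYTES_PER_SECOND


print(f"30 s as WAV: {wav_size_mb(30):.1f} MB")            # 30 s as WAV: 5.3 MB
print(f"9 MB holds about {max_seconds(9):.0f} s of WAV")   # 9 MB holds about 51 s of WAV
```

A lossless codec such as FLAC shrinks these figures by an amount that depends on the material, so a FLAC clip can run somewhat longer within the same size limit.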

I think these were the questions:


WHERE TO PUT THE SAMPLES?
ARE THERE RESTRICTIONS ON FILE SIZE AND SAMPLE LENGTH?
WHICH FORMAT TO CHOOSE FOR UPLOADING THE UNENCODED SAMPLE?

I hope I didn't forget anything; if I did, Pio may find this post and add to it.


-max
PoisonDan
IMO, samples longer than 30 seconds aren't very useful for ABX tests. Quite the contrary, the sample should be as short as possible.
Digga
sooo...
any chance that the green-yellow thing could be corrected?
esldude
Hello,

Am rather a recent poster to the board.

Why don't any of the tests use two alternative forced choice testing?

Perhaps this has already been covered here.
tigre
QUOTE(esldude @ Dec 13 2003, 09:36 AM)
Why don't any of the tests use two alternative forced choice testing?

If I understand correctly what a "two alternative forced choice test" is, this is exactly what ABX programs do:

At each trial you have two choices:
1. A = X (and B = Y, depending on the program) or
2. B = X (and A = Y)

You're forced to choose between either 1. or 2.
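The trial structure just described can be sketched as below. This is not the code of any of the tools mentioned in this thread; it's a minimal illustration with hypothetical names (`run_abx`, `answer`), where the listener is modelled as a function that receives A, B and X and must reply "A" or "B".

```python
import random
from typing import Callable


def run_abx(a: object, b: object, trials: int,
            answer: Callable[[object, object, object], str],
            rng: random.Random) -> int:
    """Run `trials` ABX trials and return how many were answered correctly.

    On each trial X is secretly set to A or B at random. The listener,
    modelled by `answer(a, b, x)`, must reply "A" or "B": a forced choice
    with no "don't know" option.
    """
    correct = 0
    for _ in range(trials):
        x_label = rng.choice(["A", "B"])          # hidden assignment for this trial
        x = a if x_label == "A" else b
        reply = answer(a, b, x)
        if reply not in ("A", "B"):
            raise ValueError("forced choice: reply must be 'A' or 'B'")
        correct += reply == x_label
    return correct


# A listener who always hears the difference, and one who just guesses.
perfect = lambda a, b, x: "A" if x == a else "B"
rng = random.Random(2003)
guesser = lambda a, b, x: rng.choice(["A", "B"])
print(run_abx("original", "encoded", 16, perfect, random.Random(1)))  # 16
print(run_abx("original", "encoded", 16, guesser, rng))
```

The guessing listener's score hovers around half the trials, which is what the binomial p-value is measured against.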

Have I missed something?

BTW: Welcome to this place, esldude!
ff123
QUOTE(esldude @ Dec 12 2003, 11:36 PM)
Hello,

Am rather a recent poster to the board. 

Why don't any of the tests use two alternative forced choice testing?

Perhaps this has already been covered here.

A 2-AFC is typically used to determine directional preference. For example, a typical test from the food tasting industry would ask which of two beer samples is more bitter. The analog in music samples would be to ask which sample is "better" or "preferred."

This is subtly different from asking which sample is closer to the original.

A better test would be the Same/Different test, where listeners are presented with two samples and asked whether they are the same or different. In half the pairs the samples are different, and in half they're the same (the four combinations A/A, B/B, A/B and B/A would be equally distributed). This test is more general than the "A - Not A" test, in that A and B do not necessarily have to be the original (i.e., they can both be lossy samples).

The apparent disadvantage of this type of test is its inefficiency -- the information on possible differences is obtained by comparing responses obtained from the different pairs (A/B and B/A) with those obtained from matched pairs (A/A and B/B). Ideally, the listener would listen to all 4 pairs to avoid bias from creeping in (repeat multiple times). I would think that many more trials would have to be performed to achieve the same level of confidence that ABX produces.
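The design and the comparison described above can be sketched as follows, assuming hypothetical helper names and plain "same"/"different" response strings. It builds a shuffled, balanced set of the four pair types and reports how often the listener said "different" on matched versus mismatched pairs; a listener who hears nothing should say "different" about equally often for both kinds. A formal significance test on these counts is left out.

```python
import random
from collections import Counter

# The four pair types of a Same/Different test, equally represented.
PAIR_TYPES = [("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")]


def balanced_pairs(repeats: int, rng: random.Random) -> list:
    """Each pair type `repeats` times, in a random presentation order."""
    pairs = PAIR_TYPES * repeats   # new list; PAIR_TYPES itself is not shuffled
    rng.shuffle(pairs)
    return pairs


def different_rates(pairs: list, responses: list) -> dict:
    """Share of "different" responses on matched pairs vs. mismatched pairs."""
    said_different = Counter()
    shown = Counter()
    for (first, second), response in zip(pairs, responses):
        kind = "matched" if first == second else "mismatched"
        shown[kind] += 1
        said_different[kind] += response == "different"
    return {kind: said_different[kind] / shown[kind] for kind in shown}


pairs = balanced_pairs(3, random.Random(0))
responses = ["same" if a == b else "different" for a, b in pairs]  # ideal listener
print(different_rates(pairs, responses))
```

Because every trial contributes to only one of the two rates, more trials are needed than in ABX for comparable confidence, which matches the inefficiency noted above.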

The advantage of this test, of course, is that deciding whether A is the same as B is a simpler task than determining whether A or B sounds the same as X.

ff123
KikeG
QUOTE(Digga @ Dec 13 2003, 08:10 AM)
sooo...
any chance that the green-yellow thing could be corrected?

Maybe this Christmas, when I have more spare time. I'm sorry, but I don't think the colors used are something very important.
Digga
QUOTE(KikeG @ Dec 13 2003, 05:03 PM)
QUOTE(Digga @ Dec 13 2003, 08:10 AM)
sooo...
any chance that the green-yellow thing could be corrected?

Maybe this Christmas, when I have more spare time. I'm sorry, but I don't think the colors used are something very important.

I agree that it's not utterly important, but it might still confuse anyone new to this area (in every other aspect of daily life, the colour scheme is the other way round).
Of course there's no need to jump up and correct it immediately, but I would appreciate a change in the long run.
After all, I'm not in a position to demand such a thing; I can only suggest it.
KikeG
QUOTE(Digga @ Dec 13 2003, 05:11 PM)
I agree that it's not utterly important, but it might still confuse anyone new to this area (in every other aspect of daily life, the colour scheme is the other way round).

It depends. For me it's obvious that the smaller the p-value the better, since the "big" p-values have no colour. And, personally, I wouldn't draw any conclusion based just on the colours used.
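The zone logic behind the table's colouring can be sketched as a small function. The thresholds come from the sticky (below 5%, below 1%); the colour names are parameters because which colour goes with which zone is exactly what is being discussed here. The function name and defaults are hypothetical, with the defaults mirroring the current table.

```python
from typing import Optional


def zone_colour(p_value: float, strong: str = "yellow", weak: str = "green") -> Optional[str]:
    """Colour for a p-value: `strong` below 1%, `weak` below 5%, no colour otherwise.

    Defaults mirror the current table (green < 5%, yellow < 1%).
    """
    if p_value < 0.01:
        return strong
    if p_value < 0.05:
        return weak
    return None


print(zone_colour(0.038))                                  # green
print(zone_colour(0.004, strong="green", weak="yellow"))   # green (traffic-light order)
```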
Digga
QUOTE(KikeG @ Dec 13 2003, 05:30 PM)
For me it's obvious that the smaller the p-value the better

Right, that's obvious.

QUOTE
And, personally, I wouldn't extract any conclusion basing just on the colors used.
And this is where you might be wrong, IMHO. If you see a traffic light, which colour gives you the right to cross? Red?
This whole scheme of green, yellow and red representing abstract values like good or bad (or go/don't go, in one piece/broken, etc.) is a natural, unconscious thing in our society.
It's not just colours. It's what the colours mean, and here they mean the exact opposite of the p-value: 5% (= not as good) is green (= good), and 1% (= good) is yellow (= not as good).
That would be like the sentence: I like her very much, but I can't stand her...

So, while this may be a little far-fetched, the basic incongruent message is still there, and thus it's confusing.
I agree that everybody who takes a closer look won't have any problems, but why make this potential confusion possible in the first place?
esldude
ABX is not 2AFC.

It is as ff123 describes it.

You have two choices that are different in some way. You pick one. The question is whether or not the difference is perceptible. If you score 75% or better, then the difference is perceptible. This 75% value scales over differing sample sizes. It has some validity with 16 samples. And of course is statistically quite sound with 30 or more samples.

According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods. Subjectively to test subjects, it is usually simpler and easier. So extra sampling isn't that much of a big deal.

With some of the codecs getting pretty good at 128 kbps, I thought maybe the extra sensitivity of 2AFC would be useful.

Does anyone know of readily available software that uses a 2AFC methodology?
Digga
QUOTE(esldude @ Dec 13 2003, 10:53 PM)
If you score 75% or better, then the difference is perceptible. This 75% value scales over differing sample sizes.  It has some validity with 16 samples.  And of course is statistically quite sound with 30 or more samples.

75% doesn't have much validity at all. This result may give an indication, but I would definitely not say it's statistically valid with 30 trials.

QUOTE
According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods
can you point me to any links that back this up?
esldude
"Fundamentals of Hearing" by William Yost
"Psychology of hearing" by Brian C. J. Moore

The levels of significance for 2afc are different. A random result will yield 50% so the scale is 50-100% not 0-100%. And 75% is much more significant than 75% in ABX would be.
schnofler
QUOTE
QUOTE(esldude @ Dec 13 2003, 10:53 PM)

If you score 75% or better, then the difference is perceptible. This 75% value scales over differing sample sizes. It has some validity with 16 samples. And of course is statistically quite sound with 30 or more samples.


75% doesn't have much validity at all. This result may give an indication, but I would definitely not say it's statistically valid with 30 trials.

75% at 16 trials (12 correct): pval = 3.8%
75% at 30 trials (22 correct): pval = 0.81%
So, yes, 75% at 30 trials is a highly significant result.
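These p-values can be checked with a minimal one-sided binomial computation, assuming pure guessing gives each trial a 0.5 chance of being right. `abx_pvalue` is a hypothetical name. Since 75% of 30 is 22.5, the 23-correct case is included as well.

```python
from math import comb


def abx_pvalue(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


print(f"12/16: {abx_pvalue(12, 16):.2%}")  # 12/16: 3.84%
print(f"22/30: {abx_pvalue(22, 30):.2%}")  # 22/30: 0.81%
print(f"23/30: {abx_pvalue(23, 30):.2%}")  # 23/30: 0.26%
```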
Digga
QUOTE(esldude @ Dec 13 2003, 11:45 PM)
"Fundamentals of Hearing" by William Yost
"Psychology of hearing" by Brian C. J. Moore

The levels of significance for 2afc are different.  A random result will yield 50% so the scale is 50-100% not 0-100%.  And 75% is much more significant than 75% in ABX would be.

thx. if I have some spare time, I'm gonna do some basic reading.
ff123
QUOTE(esldude @ Dec 13 2003, 01:53 PM)
According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods.  Subjectively to test subjects, it is usually simpler and easier.  So extra sampling isn't that much of a big deal.

With some of the codecs getting pretty good at 128 kbps, I thought maybe the extra sensitivity of 2AFC would be useful.

Does anyone know of readily available software that uses a 2AFC methodology?

Yes, I've read that too. But again, 2-AFC is usually about directional preference for a single characteristic. So 2-AFC might be more sensitive than ABX (or actually ABC/HR, which is the inverse of ABX) if a question such as "Which of the following two samples is louder?" is asked.
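A single-characteristic 2-AFC trial of the "which is louder?" kind can be sketched as below. The stimuli are stood in for by their level in dB, and the names (`run_2afc`, `answer`) are hypothetical; real software would play audio instead.

```python
import random
from typing import Callable


def run_2afc(quieter: float, louder: float, trials: int,
             answer: Callable[[float, float], int],
             rng: random.Random) -> int:
    """2-AFC on one characteristic: "which of the two is louder?"

    Stimuli are represented by their level in dB. Each trial presents them in
    random order; `answer(first, second)` must return 0 or 1. Returns the
    number of trials where the louder stimulus was picked.
    """
    correct = 0
    for _ in range(trials):
        pair = [quieter, louder]
        rng.shuffle(pair)                  # randomize presentation order
        choice = answer(pair[0], pair[1])
        if choice not in (0, 1):
            raise ValueError("forced choice: answer must be 0 or 1")
        correct += pair[choice] == louder
    return correct


# A listener who can always tell which is louder:
attentive = lambda first, second: 0 if first > second else 1
print(run_2afc(-0.5, 0.0, 20, attentive, random.Random(7)))  # 20
```

Unlike ABX, the listener here is told which property to judge, which is why the method suits one directional characteristic rather than "any difference".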

ABX and ABC/HR are more robust in that multiple or complex characteristics can be tested for.

The Same/Different test I described would be OK for testing large groups of people with one trial each (instead of multiple trials for one person), but I think listener fatigue would be a real problem for one-person testing. ABX is already quite fatiguing as it is, and it's more efficient than Same/Different testing.

But as far as I know, there isn't a readily available piece of software that implements 2-AFC or Same/Different testing for audio signals.

ff123