How do I actually perform ABX tests?
Dec 29 2011, 08:19
Post #1
Group: Members Posts: 193 Joined: 28-September 08 Member No.: 58729
I have the ABX plugin with foobar, and I did some playing around with two files and didn't really know if I was doing the test correctly. Am I supposed to be comparing the difference between A and B or am I supposed to be looking for similarities between X and Y?

Dec 29 2011, 10:42
Post #2
Group: Members Posts: 35 Joined: 20-September 10 Member No.: 84009
You have to decide if X or Y is A or B.

Dec 29 2011, 11:04
Post #3
Group: Members Posts: 1049 Joined: 16-February 08 From: NL Member No.: 51347
A and B are known; X and Y are randomized. After listening intently to all four (A, B, X and Y), you should be able to say whether A is X or Y, and the same for B.
You must repeat this process a number of times ("trials") in a single ABX session, preferably 10-20. The reported probability that you were guessing will either hover around 50% or approach zero. In the 50% case, you were clearly guessing, and thus incapable of hearing a difference between A and B. In the second case, there's a good chance you can hear the difference. Try it once with completely different songs to be sure of what you're doing.
ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not. ABX tests are wholly incapable of determining which item has better quality.
This post has been edited by dhromed: Dec 29 2011, 11:14
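To see where the "around 50% or close to zero" figure comes from, here is a minimal Python sketch. It assumes each trial is an independent right/wrong answer and that a guessing listener is right half the time; the function names and the 0.95 hit rate for a listener who hears the difference are made up for illustration, and this is not the plugin's actual code.

```python
import random
from math import comb


def guess_probability(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def run_session(trials: int, hit_rate: float, rng: random.Random) -> int:
    """Simulate one session; `hit_rate` is the (assumed) chance of a right answer per trial."""
    return sum(rng.random() < hit_rate for _ in range(trials))


rng = random.Random(0)
for label, hit_rate in [("guessing", 0.5), ("hears a difference", 0.95)]:
    score = run_session(16, hit_rate, rng)
    prob = guess_probability(score, 16)
    print(f"{label}: {score}/16 correct, probability of guessing = {prob:.4f}")
```

A listener who guesses lands somewhere in the middle of the range, while one who is right on nearly every trial drives the probability toward zero.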

Dec 30 2011, 07:20
Post #4
Group: Members Posts: 2082 Joined: 18-December 03 Member No.: 10538
Quote: ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not.
Well... no. A 'shadow of a doubt' is exactly what remains in a statistics-based result. We quantify how large that shadow is via the p value: when we conclude that a difference was heard, it's our willingness to risk a false positive result. For p = .05 (a typical, though not necessarily appropriate, value for such tests) we accept a 1-in-20 chance that pure guessing would have produced a score this good, i.e. that our result was merely a fluke rather than informative. That's the shadow hanging over our conclusion. You can shrink this, and your conclusion can lie far beyond any *reasonable* doubt, but it never actually reaches zero.
This post has been edited by krabapple: Dec 30 2011, 07:21
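As a rough illustration of how the size of that shadow is chosen, the sketch below finds the lowest score that gets under a given p value. It assumes a one-sided binomial test against 50/50 guessing; `p_value` and `min_correct` are hypothetical helper names, not anything from the ABX plugin.

```python
from math import comb
from typing import Optional


def p_value(correct: int, trials: int) -> float:
    """One-sided binomial p value: chance of `correct` or more hits by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct(trials: int, alpha: float) -> Optional[int]:
    """Smallest score whose p value is at or below `alpha`, or None if no score gets there."""
    for correct in range(trials + 1):
        if p_value(correct, trials) <= alpha:
            return correct
    return None


for trials in (5, 10, 16, 20):
    for alpha in (0.05, 0.01):
        print(f"{trials} trials, p <= {alpha}: need {min_correct(trials, alpha)} correct")
```

Under these assumptions, 16 trials need at least 12 correct answers to get below p = .05, and 5 trials can never get below p = .01, however well you score.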

Dec 30 2011, 09:55
Post #5
Group: Members Posts: 1049 Joined: 16-February 08 From: NL Member No.: 51347
pf, nitpickins.
But of course, yes, there are facts, and then there are statistics.

Dec 30 2011, 14:36
Post #6
Group: Members Posts: 3080 Joined: 1-September 05 From: SE Pennsylvania Member No.: 24233
Or, as Mark Twain would say, "Lies, Damned Lies and Statistics".

Dec 30 2011, 19:40
Post #7
Group: Super Moderator Posts: 9262 Joined: 1-April 04 Member No.: 13167
Also, ABX tests are designed to demonstrate perceived differences. They aren't really intended to determine if things sound the same, let alone prove that things sound the same.
-------------------- Everything sounds the same until it is proven otherwise.

Dec 30 2011, 23:45
Post #8
Group: Members Posts: 619 Joined: 15-March 07 Member No.: 41501
Quote: pf, nitpickins. But of course, yes, there are facts, and then there are statistics.
http://xkcd.com/882/

Dec 31 2011, 20:48
Post #9
Group: Members Posts: 2082 Joined: 18-December 03 Member No.: 10538
Quote: pf, nitpickins. But of course, yes, there are facts, and then there are statistics.
Quote: http://xkcd.com/882/
Ooh, I like that! But I think, nitpickingly speaking, that one would have to perform the *green* test 20 times to make that 1-in-20 point, or show that a different color gets a 'significant' result in the next 20 rounds.
This post has been edited by krabapple: Dec 31 2011, 20:50
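The arithmetic behind the 1-in-20 point can be sketched in a few lines of Python. This assumes the tests are independent and that there is no real effect in any of them; the function names are made up for illustration, and the Bonferroni correction is a standard remedy that nobody in the thread brought up.

```python
def chance_of_false_alarm(tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `tests` independent no-effect tests looks 'significant'."""
    return 1 - (1 - alpha) ** tests


def bonferroni_alpha(tests: int, alpha: float = 0.05) -> float:
    """Stricter per-test threshold that keeps the overall false-alarm chance at or below `alpha`."""
    return alpha / tests


for tests in (1, 5, 20):
    overall = chance_of_false_alarm(tests)
    corrected = chance_of_false_alarm(tests, bonferroni_alpha(tests))
    print(f"{tests:2d} tests: {overall:.3f} uncorrected, {corrected:.3f} with correction")
```

With 20 jelly bean colors tested at p = .05 each, the chance that at least one comes out 'significant' by luck alone is about 64%.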

Jan 6 2012, 05:14
Post #10
Group: Members Posts: 296 Joined: 5-August 07 Member No.: 45913
Ha! That's good. Replace "acne" with "a rare form of bone cancer" and "green jelly beans" with "fluoridated water" and you have a real news story I have read about. The fluoride scaremonger sites reduce it to simply "fluoride causes bone cancer", but when you look into it, it turns out an undergraduate took a much larger existing study, which had found no correlation, and broke it down into a bunch of smaller subgroups. Sure enough, one age category of young boys showed a (slight) correlation between fluoridated water intake and this particular cancer, yet girls of the same age didn't, nor did any other age group.
I would think there must be a name for this kind of error. Does anyone know what it is?
This post has been edited by mzil: Jan 6 2012, 05:31

Jan 6 2012, 09:37
Post #11
Group: Members Posts: 39 Joined: 2-January 12 Member No.: 96196
Quote: I would think there must be a name for this kind of error. Does anyone know what it is?
I've seen it called clusters (if memory serves). Take any randomly distributed data set, like cancer incidence over a big enough area. Even if it's perfectly random, there will be clusters within this data set. So a particular town could have a high incidence of, say, brain cancer and also happen to have overhead power lines running through it. The brain thinks these two have to be related even though it's just statistical noise. There are huge problems with this in statistics, for obvious reasons.
Edit: From a quick Google, my memory is failing me; it's not called clusters. It's a sampling error problem that comes from using a small section of a population (and thus a serious problem in small tests if they are not repeated elsewhere). You can't know a priori whether your small sample happens to contain people/things from one of these clusters. This is all assuming that you're looking at something that is normally distributed, rather than something with a fat-tailed distribution (i.e. more chance of extreme events than one expects with a normal distribution), which complicates things further.
Sorry for all the edits, just woke up.
This post has been edited by nesf: Jan 6 2012, 09:49

Jan 6 2012, 12:04
Post #12
Group: Members Posts: 1468 Joined: 30-November 06 Member No.: 38207
Clusters are something else.
Dunno what the universal term for this is, but in insurance and in certain branches of economics it is called «selection». Simply put, you select the dice after you have rolled them. (Google «adverse selection»: there, you rush to act while everyone else still treats the dice as random. In this case, you do N trials and report the best, while the uninformed public thinks it is a random draw.)
I guess the prototypical joke is this science demonstration at some public fair: Scientist equips the audience with dice and tells them to roll N times and record the outcome. Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions. Journalist asks for a picture, for the newspaper story, of the guy who was so good at rolling dice.
This post has been edited by Porcus: Jan 6 2012, 12:09
-------------------- geocities.com/hydrogenaudio: http://goo.gl/tqYZj
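The dice demonstration is easy to simulate. The Python sketch below assumes fair dice and counts sixes as the "skill"; the audience size, the number of rolls and the function names are all made up for illustration.

```python
import random


def count_sixes(rolls: int, rng: random.Random) -> int:
    """Number of sixes in `rolls` throws of a fair die."""
    return sum(rng.randint(1, 6) == 6 for _ in range(rolls))


def fair_demo(people: int, rolls: int, rng: random.Random):
    """Everyone rolls once; return (all counts, the champion's count, the champion's retest)."""
    counts = [count_sixes(rolls, rng) for _ in range(people)]
    champion = max(counts)            # selected AFTER the dice were rolled
    retest = count_sixes(rolls, rng)  # the champion rolls again with the same fair die
    return counts, champion, retest


counts, champion, retest = fair_demo(people=200, rolls=60, rng=random.Random(0))
print(f"average sixes: {sum(counts) / len(counts):.1f} of 60")
print(f"best roller:   {champion} of 60")
print(f"same person, second attempt: {retest} of 60")
```

The best roller always beats the average, because that is how they were picked. The second attempt is an ordinary random draw again, which is why a result selected after the fact needs to be repeated before it means anything.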

Jan 11 2012, 01:07
Post #13
Group: Members Posts: 9 Joined: 28-August 07 Member No.: 46568
It's the "look-elsewhere effect". It's related to the law of truly large numbers: with a large enough sample size, even a very unlikely event is likely to turn up. If you select sample sets from a larger total, some will contain such very rare events.

Jan 11 2012, 08:50
Post #14
Group: Members Posts: 39 Joined: 2-January 12 Member No.: 96196
Quote: I guess the prototypical joke is this science demonstration at some public fair: Scientist equips the audience with dice and tells them to roll N times and record the outcome. Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions. Journalist asks for a picture, for the newspaper story, of the guy who was so good at rolling dice.
Quote: It's the "look-elsewhere effect". It's related to the law of truly large numbers: with a large enough sample size, even a very unlikely event is likely to turn up. If you select sample sets from a larger total, some will contain such very rare events.
Yeah, my memory is crap; I did all this stuff in college and forgot the names for it. The phrase "look-elsewhere effect" doesn't ring any bells for me, though. I think we called it something else. It was always presented to us as illness "clusters", i.e. a town with, say, a high rate of mental retardation that also happened to have fluoridated water, as a lesson in cause, effect and spurious correlation.
I think it's slightly different from what you're talking about, SoAnIs: it's more about finding unusual rates of something within a subset of the population that aren't consistent with the population rate than about finding a rare event within a sample. Something along the lines of "Let me pick my sample and I can prove anything."
It's like how in a randomly distributed data set there will be clusters of a particular event happening or not happening. Say heart disease was randomly distributed and we looked at a nation's distribution of it: we would by chance find towns and villages that have very high or very low rates of heart disease. Some people take these high or low rates and automatically assume that there must be some causal factor behind them, when really they can just be a product of chance. Like the journalist in Porcus' joke.
This post has been edited by nesf: Jan 11 2012, 09:07
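The town example can be sketched as a quick simulation. Everything here is an assumption for illustration: the 10% base rate, the town sizes and the function name are invented, and every resident has exactly the same risk, so there is no causal factor anywhere in the data.

```python
import random


def town_rates(towns: int, residents: int, base_rate: float, rng: random.Random) -> list:
    """Observed disease rate per town when every resident has the same `base_rate` risk."""
    rates = []
    for _ in range(towns):
        cases = sum(rng.random() < base_rate for _ in range(residents))
        rates.append(cases / residents)
    return rates


rates = town_rates(towns=500, residents=200, base_rate=0.10, rng=random.Random(0))
print(f"true rate everywhere: 10.0%")
print(f"lowest town:  {min(rates):.1%}")
print(f"highest town: {max(rates):.1%}")
```

Even with identical risk everywhere, the highest and lowest towns differ noticeably from the true rate, which is the kind of spread that gets mistaken for a cause.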