schnofler
Feb 24 2004, 20:46
For me, it takes much more time and effort to do this test compared to the mp3 test. I completed the mp3 test in three days, but with this one, I can't do more than two samples in a row before my concentration and motivation slip. Thus, like tigre, I usually stick to doing one or two a day.
I have completed 9 samples so far, and I'm optimistic that I'll be able to produce useful results with two of the remaining.
QUOTE(rjamorim @ Feb 25 2004, 09:22 AM)
So, I'd like to know, from the ones that tried to participate, the reason of such low response:
-Having problems with ABC/HR Java?
-The samples are too transparent?
-Getting tired of so many tests in a row?
-Another reason?
Simple reason. My computer died

I don't want to sit at a public computer taking the test... Looks a bit...
QUOTE(rjamorim @ Feb 24 2004, 05:22 PM)
So, I'd like to know, from the ones that tried to participate, the reason of such low response:
-Having problems with ABC/HR Java?
-The samples are too transparent?
-Getting tired of so many tests in a row?
-Another reason?
I'm sorry I just cannot seem to ABX these with any kind of consistency. I can do one sample at most before I lose all my concentration and the results become less than useful. I'm going to try again on Saturday since I am really busy this week and maybe I can at least give you some results. New headphones might help me too, since cheap earbuds get uncomfortable after a while. Sorry again.
westgroveg
Feb 25 2004, 00:11
QUOTE
So, I'd like to know, from the ones that tried to participate, the reason of such low response
I had problems d/ling the Sun Java Runtime v1.4.
kl33per
Feb 25 2004, 00:59
I'm still completing the test. Hopefully you'll have a flood of results right before the test closes.
The test is pretty tough compared to the mp3 test. And that one was already difficult. Test fatigue is a factor for me. Still, I've managed to mail in 5 results (one of them all 5's, though).
ff123
QUOTE(rjamorim @ Feb 24 2004, 05:22 PM)
Here I come to bitch again.
So far, I got samples from 6 (six) different listeners. And most of them didn't submit complete result sets.
Obviously, that's not enough to generate statistically valid results. Especially considering I didn't screen the results yet for ranked references.
So, I'd like to know, from the ones that tried to participate, the reason of such low response:
-Having problems with ABC/HR Java?
-The samples are too transparent?
-Getting tired of so many tests in a row?
-Another reason?
That would be my first listening test, and it seems very difficult. Maybe I need training but I am not able to consistently ABX the samples.
Also, I didn't find much time to work on it.
robUx4
Feb 25 2004, 04:31
QUOTE(rjamorim @ Feb 25 2004, 02:22 AM)
So far, I got samples from 6 (six) different listeners. And most of them didn't submit complete result sets.
Obviously, that's not enough to generate statistically valid results. Especially considering I didn't screen the results yet for ranked references.
Hi Roberto,
I downloaded all the necessary stuff yesterday to proceed with the (complete) test. I will only have time to do it on Saturday evening or Sunday. I hope it won't be too late for you.
edit: BTW that's also the first time I've done such a test.
askoff
Feb 25 2004, 05:03
Finally it's over. This test was much harder than the MP3 test. With the first couple of samples I got really frustrated, because I just couldn't find a spot where I could ABX. I almost gave up, until the first ABXable sample was found.
I sent a couple of results where I mentioned that I couldn't ABX it, but I guess that is one result also.
So the transparency was almost the issue for me.
i vote for frustration as well, the problem is that for this test i have to abx almost all the samples to actually be sure that things are as they seem to be..., in any case my ratings are like all from 4-5, so i dont see much point on sending such results in.
java abchr is a great app though.
Continuum
Feb 25 2004, 07:53
No real problem with Java, the save session feature is great.
The test is really difficult though. So far I have done samples 2, 5, 6 and 12. So far, I could find differences everywhere (I hope!), but this takes a lot of time. Two tests in a row is too much, it's more half test - wait - half test for me.
duartix
Feb 25 2004, 09:55
QUOTE
in any case my ratings are like all from 4-5, so i dont see much point on sending such results in
I believe it makes sense and that there are many reasons to still send your results in.
Even if the results say that all codecs are transparent. (My lowest rating so far is 4.7.) I'm not worrying too much if I can't spot any differences. It means the codecs are better than my ears.
But at least we can reach a conclusion. Without results we can't reach it!
rjamorim
Feb 25 2004, 10:58
QUOTE(duartix @ Feb 25 2004, 12:55 PM)
QUOTE
in any case my ratings are like all from 4-5, so i dont see much point on sending such results in
I believe it makes sense and that there are many reasons to still send your results in.
That's right. It's important to know if all codecs are transparent for you. That means they did very well.
Thank you very much for the explanations. I hope to get lots of results during the weekend.

So that you know: the test ends Sunday midnight, Brazilian time.
http://www.timeanddate.com/worldclock/
Yes, the test is quite hard. I completed the previous one in 2-3 hours, but doing all samples for this one took considerably longer, and I had to ABX much more. That's good: it shows that AAC has been improving.
Still, with enough effort it should be quite possible to distinguish the codecs. Practice makes perfect!
music_man_mpc
Feb 25 2004, 13:15
QUOTE(Garf @ Feb 25 2004, 10:56 AM)
Still, with enough effort it should be quite possible to distinguish the codecs. Practice makes perfect!
I agree, so far I have had to ABX one sample up to 67/102 before I was sure I heard a difference.

I am quite sure I did, though.
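For anyone wondering how a score like 67/102 translates into confidence, here is a minimal sketch. It assumes the usual one-sided binomial test against pure guessing (probability 0.5 per trial); the function name `abx_p_value` is made up for illustration and is not part of ABC/HR.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Upper tail of a Binomial(trials, 0.5) distribution, computed with exact integers.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


p = abx_p_value(67, 102)
print(f"67/102 -> p = {p:.4f}")
```

Under these assumptions 67/102 comes out at roughly p = 0.001, so a long run with a modest hit rate can still be good evidence of a real difference.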
eagleray
Feb 25 2004, 14:04
@Roberto
I saw that you used q 115 for faac.
Did you use a "c" setting or was it the default?
Thanks.
QUOTE(rjamorim @ Feb 25 2004, 06:58 PM)
That's right. It's important to know if all codecs are transparent for you. That means they did very well.
or maybe one just got bored enough with a specific sample?
So far I've only managed to ABX one of the samples
BUT I've been running a test a day or so and planning to send in the results when they're due

That and trying to rustle up a better pair of headphones
guruboolez
Feb 26 2004, 05:55
I'll send you my results in some hours.
I didn't find the test too difficult. It seems that some AAC encoders still have serious problems. I was more annoyed by some samples, which sounded distorted even with the reference.
I've noticed the same problems with ABC/HR for Java, especially with the piano sample. It didn't really disturb me during this test (but for higher bitrates, which need a lot of concentration, this software is unfortunately not usable on my computer).
Anyway, thank you for conducting the test (and to schnofler, for his software).
kl33per
Feb 26 2004, 06:48
Funny, I find this test much easier than the MP3 test.
Firstly, I'm not sure how many samples the MP3 test had, but having five to six samples is so much easier than having seven or eight, particularly when it comes to scoring them.
Secondly, I know that I personally have much more interest in AAC development than I do in MP3 development. Thus I gave the MP3 test away without really giving it a fair go, mainly because it didn't interest me. However, considering AAC is what I currently encode music in (Nero AACEnc to be exact), the outcome of this test thoroughly interests me, and therefore I wish to do my part to make it better. So while technically this test may be much harder, subconsciously I find it much easier to do as I'm motivated to do it.
Edit: Of course this could all be rubbish if my test results show that I consistently picked the wrong sample.
This test is difficult. At least on my equipment and with my ears. I guess 128kbps (aac) really is enough for me.
[proxima]
Feb 26 2004, 09:55
Test completed and results mailed.
I didn't find the test so difficult; some AAC encoders still have annoying artifacts. For some codecs ringing is a bad issue, quite annoying for me. Nevertheless, apart from some killer samples, a well tuned AAC encoder at 128 kbps is acceptable...
guruboolez
Feb 26 2004, 10:08
I agree. And one of them suffers from excessive lowpass to my ears (~15000 hertz, maybe less). Maybe not as bad as ringing or other distortions in daily playback, but on ABA/ABX comparisons, the consequences of the lowpass are really unpleasant and immediately betray the encoding.
rjamorim
Feb 26 2004, 10:51
QUOTE([proxima] @ Feb 26 2004, 12:55 PM)
Nevertheless, apart from some killer samples, a well tuned AAC encoder at 128 kbps is acceptable...
I agree. I guess several listeners will notice 128kbps is the transparency threshold for them...
Deimos
Feb 26 2004, 14:28
QUOTE(guruboolez @ Feb 26 2004, 01:55 PM)
I was more annoyed by some samples, which sounded distorted even with the reference.
I'm interested to know which reference samples you consider distorted.
Thanks in advance,
Deimos
I'm sending my results on Saturday. A hard test for me, most ratings at 4.5-5 stage.
guruboolez
Feb 26 2004, 20:18
Can't remember their names or numbers in the list. I've reported my feelings in the ABC/HR comments; I just need the decrypted files to answer you.
P.S. The first one troubles me, and waiting.wav is heavily distorted (or at least has a very unnatural voice) too. But there are some other files too.
It's probably personal trouble, linked to musical genres and mastering I'm not used to listening to (some people are troubled with classical: maybe too much fidelity).
kl33per
Feb 26 2004, 20:57
QUOTE(guruboolez @ Feb 27 2004, 02:08 AM)
I agree. And one of them suffers from excessive lowpass to my ears (~15000 hertz, maybe less). Maybe not as bad as ringing or other distortions in daily playback, but on ABA/ABX comparisons, the consequences of the lowpass are really unpleasant and immediately betray the encoding.
I noticed this also (I think I mentioned it in the results), particularly on one of the samples.
Raptus
Feb 28 2004, 08:22
Results sent for the 12 samples.
After some warming up I didn't have much trouble with the test. Most samples were easily picked because of obvious lowpass (my hearing goes up to 19 kHz) and ringing/smearing. Sample 5 was a bitch though, couldn't pick any until I found a spot where pre-echo occurred with one sample. Samples 1, 2, 6 and 10 were hard, too.
gear: Terratec DMX6Fire 24/96, oldschool Toshiba HR-80 Headphones
Among the encoders are two very good ones and two quite bad.
Looking forward to the results. Will there be an extension?
rjamorim
Feb 28 2004, 10:35
QUOTE(Raptus @ Feb 28 2004, 11:22 AM)
Looking forward to the results. Will there be an extension?
Yes. AAC winner vs. MP3 winner (Lame) vs. WMA Std., MPC, Vorbis and anchor.
guruboolez
Feb 28 2004, 10:42
Are there enough results for the moment? I really hope to see the final results in 48 hours, and not in 10 days...
rjamorim
Feb 28 2004, 10:56
QUOTE(guruboolez @ Feb 28 2004, 01:42 PM)
Are there enough results for the moment?
Hard to say. I surely received enough results, but I'm getting a large number of ranked references. Some of them can still be used (because the listener successfully ABXd the ranked reference), some must be dropped because no ABX was performed.
I'm still screening them to see what can be used and what can't. Hopefully there will be enough by tomorrow night.
elmar3rd
Feb 28 2004, 11:38
QUOTE(rjamorim @ Feb 28 2004, 04:56 PM)
QUOTE(guruboolez @ Feb 28 2004, 01:42 PM)
Are there enough results for the moment?
Hard to say. I surely received enough results, but I'm getting a large number of ranked references. Some of them can still be used (because the listener successfully ABXd the ranked reference), some must be dropped because no ABX was performed.
I'm still screening them to see what can be used and what can't. Hopefully there will be enough by tomorrow night.
Can ranked references be considered as transparent?
A ranked reference shows that the listener "fails" at this sample and that other effects (psychological, fitness) are responsible for the ranking.
Isn't this in some way the same as transparent?
rjamorim
Feb 28 2004, 11:41
QUOTE(elmar3rd @ Feb 28 2004, 02:38 PM)
Can ranked references be considered as transparent?
Yes, they can.
But I can't accept all ranked references, or I would be accepting files where the participant just randomly moved some sliders and sent me the results (like a very famous result set I received in the 64kbps test).
That's why I'm going to use ABX results to decide what will be used and what will be dropped.
i sent in my results now (i hope i dont have too many ranked references, its hard to avoid with encrypted results)
guruboolez
Feb 28 2004, 12:41
I'm really impatient to see my results, and the possible mistakes I made.
QUOTE(rjamorim @ Feb 28 2004, 07:41 PM)
That's why I'm going to use ABX results to decide what will be used and what will be dropped.
What happens in this case:
I successfully abxed (p < 0.01) but afterwards I'm exhausted and can't reliably hear the difference anymore and rank the reference - Will this count as 5.0?
rjamorim
Feb 28 2004, 18:32
QUOTE(tigre @ Feb 28 2004, 06:24 PM)
I successfully abxed (p < 0.01) but afterwards I'm exhausted and can't reliably hear the difference anymore and rank the reference - Will this count as 5.0?
Yes. I discussed the subject with Garf and ff123 (who are the two other Hydrogenaudio members with experience in public listening tests), and they agree that scoring 5 for ranked references successfully ABXd is OK.
Dologan
Feb 29 2004, 00:21
Hmm... strikes me as a little unfair. If someone went through the hassle of ABXing all the way to p < 0.01, a mere slip or lapse of concentration may make all that effort worthless, since s/he might as well have ranked it 5.0 right away with the first difficult sample and not ABXed at all. If I were the one deciding, I would 'switch' the rank of the reference to the sample, provided that the (misplaced) rank is above 4.0 (or maybe even a little higher).
An encrypted ABX result of p < 0.01 is very strong evidence that the person IS detecting a difference, and therefore you are discarding valuable and valid information by assigning a 5.0 to an obvious mistake in the ranking. In the case of subtle, neutral artifacts (which would be expected to be ranked at >4.0), it would not be too difficult for a fatigued or distracted person to confuse the encoding with the original, considering that neither sounds annoying to him/her. Ranking a reference as already annoying, however, well... we simply can't take the person's word for it, and hence it should be taken as 5.0, even in view of a successful ABX.
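The two ways of handling a ranked reference discussed so far can be compared side by side. This is only a rough sketch: the `Rating` record, the function names, the 0.01 cut-off for a "successful" ABX, and dropping results without a successful ABX under the second rule are all assumptions made for illustration, not how the test results are actually processed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    reference_score: float   # score given to the slider that was really the reference
    encoded_score: float     # score given to the slider that was really the encoded file
    abx_p: Optional[float]   # p-value of the listener's ABX run, None if no ABX was done


def score_grant_five(r: Rating, alpha: float = 0.01) -> Optional[float]:
    """Ranked reference plus successful ABX counts as 5.0; without ABX it is dropped."""
    if r.reference_score == 5.0:
        return r.encoded_score          # clean result, use it as submitted
    if r.abx_p is not None and r.abx_p < alpha:
        return 5.0
    return None                         # dropped


def score_switch(r: Rating, alpha: float = 0.01, floor: float = 4.0) -> Optional[float]:
    """Switch a mildly misplaced rank over to the encoded file; otherwise fall back to 5.0."""
    if r.reference_score == 5.0:
        return r.encoded_score
    if r.abx_p is not None and r.abx_p < alpha:
        return r.reference_score if r.reference_score > floor else 5.0
    return None                         # assumed: still dropped without a successful ABX


slip = Rating(reference_score=4.5, encoded_score=5.0, abx_p=0.005)
print(score_grant_five(slip), score_switch(slip))
```

For the tired-listener case above, the first rule gives the codec 5.0 while the switching rule gives it 4.5, which is the whole disagreement in a nutshell.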
Continuum
Feb 29 2004, 01:08
The best solution IMHO would be to reduce the two sliders to one after a significant ABX result.
well what if i rank the reference without abxing?
i mean i didnt do abx on all samples or till a result < 0.01 (yes i am lazy) when i thought i heard a difference right away (well who knows if i was right with these)
i mean i would think it would be ok to use these ranked references as 5.0 too (i think most people here are reliable enough not to send in randomly chosen results, like me)
robUx4
Feb 29 2004, 10:14
I'm conducting the test right now, but sometimes the applet crashes
schnofler
Feb 29 2004, 10:39
QUOTE(robUx4 @ Feb 29 2004, 08:14 AM)
I'm conducting the test right now, but sometimes the applet crashes

Well, if you want me to do something about it, you have to be a bit more precise in your error description... Either tell me how to reproduce the crash or start the application from a console and send me the error message. Thanks.
QUOTE(schnofler @ Feb 29 2004, 06:39 PM)
QUOTE(robUx4 @ Feb 29 2004, 08:14 AM)
I'm conducting the test right now, but sometimes the applet crashes

Well, if you want me to do something about it, you have to be a bit more precise in your error description... Either tell me how to reproduce the crash or start the application from a console and send me the error message. Thanks.
Crashed here too occasionally. Since I wanted to finish the test without unnecessary repetitions I didn't try to reproduce the problem. I'll try to describe it as well as I can though:
- Crash means that it becomes unresponsive, e.g. the ABX window can't be closed anymore, things disappear when another window is moved into the foreground, and the only way to close it down is Task Manager (Win2ksp4 here).
- These crashes happened when I did too much with the mouse, e.g. changing the playback range before playback had finished (normally playback just stops, but sometimes crashes happen) or when switching between A, B and X very often in a short time (Fast switching disabled here).
I can try to reproduce the problem if you tell what exactly "start the application from a console" means - and what I have to do.
BTW: Results sent. Was a hard piece of work...
schnofler
Feb 29 2004, 12:14
QUOTE(tigre)
I can try to reproduce the problem if you tell what exactly "start the application from a console" means - and what I have to do.
By this I mean opening a command prompt and using "java -jar abchr.jar" (in the directory where you unzipped the files) to start the application, instead of just clicking on abchr.jar in Explorer. If there's a fatal crash, Java will print debug information which can be useful to find the error.
Won't be necessary in this case, however. I'm pretty sure what the cause of these effects is. Thanks for your description, I'll try to fix it in the next version.
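For future bug reports, the console launch can also be scripted so that whatever Java prints ends up in a file that is easy to attach. A small sketch, assuming `java` is on the PATH and the script sits in the directory where the files were unzipped; the helper names and the log file name `abchr_crash.log` are made up for this example.

```python
import subprocess


def java_jar_command(jar_path: str = "abchr.jar") -> list:
    """Build the same command that would be typed at the prompt: java -jar abchr.jar"""
    return ["java", "-jar", jar_path]


def run_and_log(jar_path: str = "abchr.jar", log_path: str = "abchr_crash.log") -> int:
    """Start the application and write everything it prints (stdout and stderr) to log_path."""
    with open(log_path, "w", encoding="utf-8") as log:
        result = subprocess.run(
            java_jar_command(jar_path),
            stdout=log,
            stderr=subprocess.STDOUT,  # merge error output into the same file
        )
    return result.returncode


# Usage (not run here): exit_code = run_and_log()
```

If the program only freezes without printing anything, the log will simply stay empty, which is still worth mentioning in the report.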
AstralStorm
Feb 29 2004, 12:30
It may be the same crash, but well...
ABC-HR Java 0.4b3 SE crashes when I press stop at the end of the file (less than buffer length).
It bugs out most often with buffer length set to 2000ms.
Gentoo Linux post-2004.0, built with 2.6.3 headers and New Posix Thread Library.
Kernel 2.6.3-mm4
Terratec Aureon 7.1 on ALSA from the kernel, OSS emulation on (1.0.2c?)
ALSA-lib/utils - 1.0.2
No esound/arts/anything.
java version "1.4.1"
Java 2 Runtime Environment, Standard Edition (build Blackdown-1.4.1-01)
Java HotSpot Client VM (build Blackdown-1.4.1-01, mixed mode)
<edit>
More info: No loop, No fast switching
No text printed to std{out,err} on crash.
robUx4
Feb 29 2004, 12:53
QUOTE(schnofler @ Feb 29 2004, 07:14 PM)
QUOTE(tigre)
I can try to reproduce the problem if you tell what exactly "start the application from a console" means - and what I have to do.
By this I mean opening a command prompt and using "java -jar abchr.jar" (in the directory where you unzipped the files) to start the application, instead of just clicking on abchr.jar in Explorer. If there's a fatal crash, Java will print debug information which can be useful to find the error.
Won't be necessary in this case, however. I'm pretty sure what the cause of these effects is. Thanks for your description, I'll try to fix it in the next version.
I also got the exact same problem. I'd just like to add that I use the Loop a lot... Other than that, a very nice piece of work.
rjamorim
Feb 29 2004, 13:46
About considering ranked references as 5.0, as the ranked score, or drop them...
Here is the rationale Garf sent me:
QUOTE
Ranking a reference means that you underestimated the codec's performance on that sample.
Consider the following: you ABX it 110/200. That's a significant result, meaning you hear a difference. But you can't really tell the encoded one from the original, can you? Certainly reasonable to give 5.0 then.
Nice paradox

Here is the one Schnofler sent me:
QUOTE
Intuitively, I would either count the rating of the reference as the encoded sample's rating (the argument being that choosing sliders is just one more ABX trial) or discard the rating altogether (this being the conservative tactic, because the first method might be regarded sleazy). But as the listener obviously didn't intend to rate the sample as transparent and furthermore proved that it is in fact not transparent to him, your solution isn't immediately clear to me.
As you can see, two very good rationales, and both conflict.
What I will probably do is: at first consider only the "clean" results. If there are enough of these, the official test results will come from them. If not, I'll throw in the ranked references. I guess that's the best way to make everyone happy - otherwise, there would be no test results.
Now, the big question is, if I use the ranked references, should I use the ranked score, or grant a 5.0 score to them?
Please discuss.
Regards;
Roberto.
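The 110/200 figure in Garf's rationale above can be put into numbers. A minimal sketch, assuming a one-sided binomial test against guessing and a 0.05 significance level (neither is stated in the quote); the function names are hypothetical.

```python
from math import comb
from typing import Optional


def guess_tail(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of correct answers whose guessing probability is at most alpha."""
    for k in range(trials + 1):
        if guess_tail(k, trials) <= alpha:
            return k
    return None  # alpha is too strict for this many trials


print(f"110/200 -> p = {guess_tail(110, 200):.3f}")
print("correct answers needed out of 200:", min_correct_for_significance(200))
```

Under these assumptions 110/200 works out to a p-value of roughly 0.09, so it sits just short of the conventional 0.05 level. The point of the example holds either way: over enough trials, a hit rate only slightly above 50% can become statistically detectable even though the listener can barely tell the files apart.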
AstralStorm
Feb 29 2004, 14:03
My method:
If there's an ABX result with (number of trials)+1 on the margin (<0.1 but >0.05), treat the result as 5.0.
Else throw out the ranking.
Roberto, when will the test end exactly? Do you have enough results?
elmar3rd
Feb 29 2004, 14:06
QUOTE(rjamorim @ Feb 29 2004, 07:46 PM)
Now, the big question is, if I use the ranked references, should I use the ranked score, or grant a 5.0 score to them?
Assuming the ranked references are spread equally over the samples, this will of course affect the ratings, but also equally.
So maybe it's unimportant how they are rated.
What are the exact values of the ratings good for? The ratings of different listening-tests can't be directly compared (with a very bad anchor you tend to rate the other samples higher).