AAC @ 128kbps listening test discussion |
Feb 29 2004, 21:24
Post
#251
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (rjamorim @ Feb 29 2004, 08:46 PM) Now, the big question is, if I use the ranked references, should I use the ranked score, or grant a 5.0 score to them? Please discuss.
anyone knowing marx' dialectic? here is my approach using this method to find a synthesis between garf's and schnofler's theses
imo in the case of a ranked reference there are two possibilities why the user voted this way: he thought he heard a difference which
1) simply wasn't there
2) was there and he liked the encode quality better than the source (for whatever reason)
ad 1) this could be caused by a mistake (as i understand schnofler's thesis), but i doubt that anyone does the final ranking as one more abx without double checking that his final vote is correct (at least i wouldn't act this way)
so to say it can be divided into serious testers and not serious testers: for people who seriously attend this test such "failures" shouldn't happen normally (also considering point 2 i wouldn't discard the results from these); results of people who didn't seriously attend the test and voted in a hurry could be discarded (for example if there are too many (over the average) ranked sources in the results aso...)
ad 2) well that's how garf's thesis can be understood. in that way the voting would look like this: source: 5, encode: a score higher than 5. as the latter isn't possible, a vote of 5 is ok for the encode when the source was voted worse
to sum it up/the synthesis: ranked sources from not serious testers (which look like the person voted anything, too many (over the average) ranked sources aso...) can be discarded; all others should be used, and used with score 5.0
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 21:28
Post
#252
|
|
![]() Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13 |
You forgot the possibility:
a) User could hear a difference, ABX it, but it was rather small and he missed the right slider |
|
|
|
Feb 29 2004, 21:35
Post
#253
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (Garf @ Feb 29 2004, 09:28 PM) You forgot the possibility: a) User could hear a difference, ABX it, but it was rather small and he missed the right slider
that's point 1)
voting by mistake shouldn't happen (even without abxing); in fact abx helps to avoid these mistakes. i mean if someone really manages to abx the sample with a high probability, i doubt that on the final vote he will suddenly make a mistake
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 21:37
Post
#254
|
|
![]() Server Admin Group: Admin Posts: 4808 Joined: 24-September 01 Member No.: 13 |
No, you are completely wrong, see my earlier example.
You can ABX 110/200, which is significant, but your chance of pulling the correct slider is only 55%. |
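How strong a given ABX tally is depends on the threshold one picks. As a rough sketch, assuming a one-sided exact binomial test against pure guessing (the function name `abx_p_value` is made up for illustration and is not part of ABC/HR), a reader can check tallies such as 110/200 against the 95% confidence criterion mentioned elsewhere in this thread:

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials when guessing at random (p = 0.5 per trial)."""
    # Count all outcomes with `correct` or more hits, then divide by
    # the total number of equally likely outcomes (2 ** trials).
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # Tallies to inspect; 110/200 is the example from the post above.
    for correct, trials in [(12, 16), (110, 200)]:
        p = abx_p_value(correct, trials)
        print(f"{correct}/{trials}: p = {p:.4f}")
```

A tally passes a 95% criterion when the printed p value is below 0.05. Note that this only measures evidence of hearing a difference at all; it says nothing about how often the listener would pick the correct slider on a single final vote.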
|
|
|
Feb 29 2004, 21:37
Post
#255
|
|
![]() ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12 |
I think a kind of matrix can be constructed, with some of the options and reactions:
Options
1a. Don't allow any ranked references
1b. Allow only 1 ranked reference
1c. Allow multiple ranked references
2a. Ranked reference must be accompanied by ABX to 95% confidence
2b. Ranked reference does not need to be accompanied by ABX results
3a. Score of ranked reference is not lower than another properly-ranked codec
3b. Score of ranked reference is allowed to be lower than another properly-ranked codec
Reactions
A. Entire file is thrown out
B. Score of codec with ranked reference is changed to 5.0
C. Score of codec with ranked reference is given the listener rating
Obviously, the most conservative approach is 1a + A
Here's how I might order the choices, from most conservative to less so:
CODE
Options        Reaction
1a             A
1b, 2a, 3a     B
1b, 2a, 3a     C
1b, 2a, 3b     B
1b, 2a, 3b     C
1c, 2a, 3a     B
1c, 2a, 3a     C
1b, 2b, 3a     B
1b, 2b, 3a     C
ff123 |
|
|
|
Feb 29 2004, 21:37
Post
#256
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (AstralStorm @ Feb 29 2004, 05:03 PM) Roberto, when will the test end exactly? Do you have enough results?
I'll stop accepting results tonight, at midnight Brazilian time.
I probably have enough results, but maybe not enough if I dump the ranked references.
-------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 21:38
Post
#257
|
|
![]() Group: Members Posts: 715 Joined: 22-April 03 From: /dev/null Member No.: 6130 |
Check my proposal, it should eliminate those 'tiny differences'.
If you can ABX it well and you make a mistake, it shouldn't be counted. If you DIDN'T ABX it (or barely ABXed it), it should be treated as 5.0.
- Midnight Brazilian? So it is already closed?
- What does A option mean? WHOLE results file? I'd consider it if there are >2 ranked references. I'd throw out all results w/o passed ABX in the file then and of course all ranked references.
This post has been edited by AstralStorm: Feb 29 2004, 21:46
-------------------- ruxvilti'a
|
|
|
|
Feb 29 2004, 21:40
Post
#258
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
the results of abx'es should in no way influence the decision whether to use the ranked sources results or not!!!
no one is forced to do abx, you can't rely on whether someone did abx or not. in fact someone can do the whole test without abxing and without being unserious or having anything bad in mind
as i proposed, unserious testers should be sorted out by checking whether there are far more ranked sources than average in the results
QUOTE (Garf @ Feb 29 2004, 09:37 PM) No, you are completely wrong, see my earlier example. You can ABX 110/200, which is significant, but your chance of pulling the correct slider is only 55%.
well your example is not usable in this case as it's unrealistic/only theoretical, no one will do 200 abx'es
This post has been edited by bond: Feb 29 2004, 21:47
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 21:50
Post
#259
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (AstralStorm @ Feb 29 2004, 05:38 PM) Midnight brazillian? So it is already closed? Nope, there are still more than 6 hours to go: http://www.timeanddate.com/worldclock/ -------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 21:54
Post
#260
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (bond @ Feb 29 2004, 05:40 PM) as i proposed unserious testers should be sorted out via the way if there are far over the average ranked sources in the results
That makes no sense. Just because a listener is serious doesn't mean he'll come close to the average results.
QUOTE well your example is not usable in this case as its unrealistic/only theoretical noone will do 200 abx'es
Garf did almost that once, in the MAD challenge :B
This post has been edited by rjamorim: Feb 29 2004, 21:56
-------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 21:55
Post
#261
|
|
![]() ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12 |
QUOTE (bond @ Feb 29 2004, 12:40 PM) the results of abx'es should in no way influence the decision whether to use the ranked sources results or not!!! noone is forced to do abx, you cant rely on whether someone did abx or not in fact someone can do the whole test without abxing and without being unserious or having anything bad in mind as i proposed unserious testers should be sorted out via the way if there are far over the average ranked sources in the results
In my matrix, this is an option, so it is a matter of deciding (debating) how to order the list from most conservative to least conservative.
Another way of stating your proposal would be to come up with a numerical score which says how bad ranking the reference is with respect to the other scores. For example, let's say somebody scores:
A = 4.9 ranked reference
B = 3.0
C = 2.0
A figure of merit score might be the ratio of the ranked reference to the average of the other scores. First, transform into difference scores:
A = -0.1 ranked reference
B = -2.0
C = -3.0
then, F.O.M = A / average(B, C)
and if this ratio is under some acceptable value, you would either accept the results file as is, or change the rating of codec A to 5.0
This is a mess, isn't it? Much easier just to discard files with ranked references.
ff123 |
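The figure of merit described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `figure_of_merit` and the `top` parameter are invented here, and the post does not say what the "acceptable value" for the ratio would be, so no cutoff is built in.

```python
def figure_of_merit(ranked_reference_score, other_scores, top=5.0):
    """Ratio of the ranked reference's difference score to the mean
    difference score of the other (properly ranked) codecs.

    Difference scores are taken relative to `top` (5.0, the best
    rating), so 4.9 becomes -0.1 and 3.0 becomes -2.0.
    """
    if not other_scores:
        raise ValueError("need at least one properly ranked codec")
    ref_diff = ranked_reference_score - top
    other_diffs = [score - top for score in other_scores]
    mean_other = sum(other_diffs) / len(other_diffs)
    if mean_other == 0:
        # Every other codec was rated 5.0, so the ratio is undefined.
        raise ValueError("other codecs show no rated difference")
    return ref_diff / mean_other


if __name__ == "__main__":
    # The example from the post: A = 4.9 (ranked reference), B = 3.0, C = 2.0
    print(round(figure_of_merit(4.9, [3.0, 2.0]), 4))
```

For the example scores the ratio comes out at about 0.04, that is, the penalty given to the reference is small compared with the penalties given to the other codecs. A small ratio suggests a minor slip; a ratio near or above 1 would mean the reference was rated as badly as the real encodes.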
|
|
|
Feb 29 2004, 21:58
Post
#262
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (rjamorim @ Feb 29 2004, 09:54 PM) QUOTE (bond @ Feb 29 2004, 05:40 PM) as i proposed unserious testers should be sorted out via the way if there are far over the average ranked sources in the results That makes no sense. Just because a listener is serious doesn't mean he'll come close to the average results.
sure it makes sense, if you see it from the point of view that you don't want to sort out the serious ones, but the unserious ones (which will surely be far over the average)
This post has been edited by bond: Feb 29 2004, 22:02
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 21:59
Post
#263
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (ff123 @ Feb 29 2004, 05:55 PM) This is a mess, isn't it? Jesus Christ, it is! -------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 22:03
Post
#264
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (ff123 @ Feb 29 2004, 09:55 PM) This is a mess, isn't it? Much easier just to discard files with ranked references.
first of all, i assume that we don't have so many results that we can discard many of them only because there are some ranked references
second, your calculations show you are a developer (this isn't meant in a bad way)
my proposal: i would say the average user has 1 ranked reference per sample. if someone has an average (over all samples) of 2,5 he is out, all others are in with ranked sources scored as 5.0 (of course the reality can be different, but rjamorim will soon find this out)
easy and clean solution with no mess
third, everyone who does this hard test and then gets discarded because he ranked sources will feel pissed off
This post has been edited by bond: Feb 29 2004, 22:06
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 22:11
Post
#265
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE if someone has an average (over all samples) of 2,5 he is out, all others are in with ranked sources scored as 5.0 (of course the reality can be different, but rjamorim will soon find this out)
First, it was never in the plans to drop all of a listener's results because he ranked part of them. Unless something very creepy is going on (check the results package I linked earlier), even if a guy got 11 ranked references, the clean result will stay.
Also, what's with the 2,5?
-------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 22:17
Post
#266
|
|
![]() ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12 |
QUOTE (AstralStorm @ Feb 29 2004, 12:38 PM) What does A option mean? WHOLE results file? I'd consider it if there are >2 ranked references. I'd throw out all results w/o passed ABX in the file then and of course all ranked references.
A means throw out the whole file, which in the past was done if any reference was ranked. If you don't throw out the whole file, then you have the option of changing the scores of the ranked references to 5.0 or keeping the score (but assigning it to the codec instead of to the reference, of course).
It isn't possible (at least with my statistics software) to only throw out part of a file.
ff123 |
|
|
|
Feb 29 2004, 22:18
Post
#267
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (rjamorim @ Feb 29 2004, 10:11 PM) First, it was never in the plans to drop all of a listener's results because he ranked part of them. Unless something very creepy is going on (check the results package I linked earlier), even if a guy got 11 ranked references, the clean result will stay. Also, what's with the 2,5?
2.5 ranked references per sample (or a similar value) can be used as an indication of unserious testing, meaning the whole tester is out
all other ranked references are considered as coming from serious testers and will not be dropped but given a 5.0 score, as garf proposed
that's my proposal, but it doesn't seem to have many friends anyway, so do as you like
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 22:21
Post
#268
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (bond @ Feb 29 2004, 06:18 PM) 2.5 ranked references per sample (or a similar value) Another problem introduced by this is: where to draw the line? -------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Feb 29 2004, 22:24
Post
#269
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (rjamorim @ Feb 29 2004, 10:21 PM) QUOTE (bond @ Feb 29 2004, 06:18 PM) 2.5 ranked references per sample (or a similar value) Another problem introduced by this is: where to draw the line?
calculate the average (i guessed it's 1 ranked reference per sample) and add 1.5
also one can say that a user has a 50% chance to vote the reference, which is equal to 2.5 votes per sample; more than this 50% is too much bad luck for a serious tester
This post has been edited by bond: Feb 29 2004, 22:27
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 22:25
Post
#270
|
|
![]() Group: Members Posts: 144 Joined: 5-May 02 Member No.: 1974 |
QUOTE (rjamorim @ Feb 29 2004, 09:21 PM) Another problem introduced by this is: where to draw the line? Maybe those, who rated the reference below 4.5 or 4.0? |
|
|
|
Feb 29 2004, 22:37
Post
#271
|
|
![]() ABC/HR developer, ff123.net admin Group: Developer (Donating) Posts: 1396 Joined: 24-September 01 Member No.: 12 |
QUOTE (elmar3rd @ Feb 29 2004, 01:25 PM) QUOTE (rjamorim @ Feb 29 2004, 09:21 PM) Another problem introduced by this is: where to draw the line? Maybe those, who rated the reference below 4.5 or 4.0?
bond and Roberto were referring to how many ranked references should be acceptable.
bond's proposal is to actually look at the data and find out how many references are ranked on average in those results files where it occurs. Then add 1.5 to that number to determine where to draw the line of how many ranked references are acceptable. Over that line and the entire file is thrown out. Under that line, and the scores of the ranked reference codecs are changed to 5.0.
This is a reasonable and conservative proposal.
ff123 |
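The rule as restated above can be sketched as follows. The names `cutoff_line` and `decide` are hypothetical, the input is simply a list with the number of ranked references in each results file, and treating a count exactly on the line as acceptable is an assumption, since the post only says "over" and "under".

```python
def cutoff_line(ranked_counts, margin=1.5):
    """Mean number of ranked references among the files that contain
    at least one, plus `margin` (1.5 in bond's proposal).

    Returns None when no file contains a ranked reference.
    """
    affected = [count for count in ranked_counts if count > 0]
    if not affected:
        return None
    return sum(affected) / len(affected) + margin


def decide(ranked_count, line):
    """Classify one results file against the cutoff line."""
    if ranked_count == 0:
        return "keep as is"
    if line is not None and ranked_count > line:
        return "discard file"
    # Assumption: a count equal to the line is still acceptable.
    return "rescore ranked references to 5.0"


if __name__ == "__main__":
    # Made-up counts of ranked references per results file.
    counts = [0, 1, 1, 2, 0, 4]
    line = cutoff_line(counts)
    for count in counts:
        print(count, "->", decide(count, line))
```

With these invented counts the mean over affected files is 2.0, so the line sits at 3.5: the file with 4 ranked references is discarded and the others are kept or rescored. The real line would of course depend on the submitted results.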
|
|
|
Feb 29 2004, 22:47
Post
#272
|
|
![]() Group: Members Posts: 715 Joined: 22-April 03 From: /dev/null Member No.: 6130 |
Not a bad thing to do.
I'm for this proposal, it should filter out people trying to make results a white noise. -------------------- ruxvilti'a
|
|
|
|
Feb 29 2004, 23:10
Post
#273
|
|
|
Java ABC/HR developer Group: Developer Posts: 175 Joined: 17-September 03 Member No.: 8879 |
I still think that the existence of a successful ABX test is a much better indication of seriousness. Of course, as bond noted, a listener can be serious without doing any ABX tests. But if he has done some, then we can be *certain* that he was indeed serious about it.
Just to clarify, my thoughts, which Roberto posted above, were only about ranked references with successful ABX tests. If there is no ABX test or a failed ABX, I think the file should be discarded, since for all we know the listener just played around with the sliders (see the results package Roberto posted earlier).
If there is a successful ABX result, I personally don't find it necessary to discard the results, especially if there are not enough results for such luxury. It's unfortunate that we can't just throw out the ranked reference and still consider the other rankings. On the other hand, it's obvious that it might cast a shadow on the professionalism of the test if we consider the rating of the reference for the encoded sample. In my personal opinion, I wouldn't mind using this method if the listener obviously made an effort, but I can see that it might not be in the best interest of the test as a serious reference later on. That considered, it might be best to count a ranked reference with a valid ABX result as 5.0 (contrary to my earlier thoughts).
Furthermore, I guess Roberto will as usual publish the results files, so if anyone is interested in what the results would have looked like when calculated with a laxer method, we could still do this after the test ("unofficially" so to say).
This post has been edited by schnofler: Feb 29 2004, 23:12 |
|
|
|
Feb 29 2004, 23:22
Post
#274
|
|
|
Group: Members Posts: 881 Joined: 11-October 02 Member No.: 3523 |
QUOTE (schnofler @ Feb 29 2004, 11:10 PM) If there is no ABX test or a failed ABX, I think the file should be discarded, since for all we know the listener just played around with the sliders
i think our different opinions on that maybe simply depend on the personal way of doing the test? (as i said before, i don't think that the relationship pointed out by schnofler is that clear at all)
anyway, i think i made my point clear on how to detect unserious testers in a, imho, clearer way than via abx results
-------------------- I know, that I know nothing (Socrates)
|
|
|
|
Feb 29 2004, 23:57
Post
#275
|
|
|
Java ABC/HR developer Group: Developer Posts: 175 Joined: 17-September 03 Member No.: 8879 |
QUOTE i think our different opinions on that maybe simply depend on the personal way of doing the test?
It is indeed the case that I never rank a sample without an ABX unless the difference is extremely obvious to me. However, I prefer to see it the other way round: my preference on how to do the test stems from my opinion on what should be considered serious, not vice versa.
Anyway, please don't take this personally. As I said above, I personally don't have any problems considering results as valid if it is somehow clear to me that the listener was serious (and yes, simply because you participate in this discussion, I trust you on your seriousness). The problem is how to make this reasonable for others. We have to choose a definite, easily justifiable way of doing this, so the results of this test can be used as a serious reference. A successful ABX test is, in my opinion, a very strong sign that the listener did indeed put a considerable effort into this. The proposal you support just doesn't seem as strong to me.
Just to clear things up, I think there is a bit of misunderstanding about this proposal. If I understood you correctly (please correct me if I'm wrong), you want to look at all the results files from a certain listener, count the ranked references, and calculate the average of ranked references per file. If it's above 2.5, throw out the whole set of files from this listener. But from this
QUOTE (ff123) Over that line and the entire file is thrown out.
I understood that ff123 wants to decide this on a file-by-file basis (again, please correct me if it's a misunderstanding). In any case, if you move sliders randomly you still have a 50% chance that your results (single files or the whole set) will be accepted, which is far too high in my opinion. If it is done that way, I would draw the line much lower (e.g. 1 ranked reference on average).
I still like using ABX results better, because they actually give you evidence of the listener's efforts, while the other proposal just aims at making an educated guess. |
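The 50% figure can be checked with a small calculation for the single-file case. This is a sketch under assumptions that are not stated outright in the thread: 5 codecs per sample (inferred from bond's "50% chance ... equal to 2.5 votes per sample"), and a random listener who ranks the reference instead of the encode with probability 0.5, independently for each codec. The function name `acceptance_probability` is made up.

```python
from math import comb


def acceptance_probability(codecs, cutoff, p_ranked=0.5):
    """Probability that a file filled in at random contains at most
    `cutoff` ranked references and is therefore accepted."""
    total = 0.0
    for k in range(codecs + 1):
        if k <= cutoff:
            # Binomial probability of exactly k ranked references.
            total += comb(codecs, k) * p_ranked**k * (1 - p_ranked) ** (codecs - k)
    return total


if __name__ == "__main__":
    for cutoff in (2.5, 1):
        p = acceptance_probability(5, cutoff)
        print(f"cutoff {cutoff}: accepted with probability {p:.4f}")
```

Under these assumptions a cutoff of 2.5 lets a random file through half of the time, while a cutoff of 1 lowers that to under one in five. Averaging over a listener's whole set of files would change the numbers, so this covers only the file-by-file reading.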
|
|
|