
Sample specific discussions: sample #2, Public MP3 listening test @ 128 kbps
Alex B
post Nov 26 2008, 14:45
Post #1





Group: Members
Posts: 1303
Joined: 14-September 05
From: Helsinki, Finland
Member No.: 24472



As I said in the test results thread, I would like to discuss each sample separately. The overall results were tied and all encoders seem to be roughly equal, but on closer inspection each encoder had problems with at least some samples.

It would be useful to analyze each sample separately in order to find out what kind of problems the testers noticed and how severe they are. The discussion would help to understand the test results and probably also help the codec developers in their work. This has not been done before, but I think the outcome would be valuable.

Some testers added comments to the result files. Those comments are useful if the tester intends to revisit a saved session later. Unfortunately, the comments in the result files are quite hidden and cannot easily be evaluated and compared. That's why I didn't add comments to my results (except for some unfinished, partially wrong comments in one of my first result files - I meant to delete them, but forgot).

This thread is for Sample #2. Please try to keep the discussion on topic. If you want to discuss any other sample, feel free to start a new thread for it. I am hoping that eventually we'll have 14 separate threads - one for each sample. I'll add them myself if others haven't done so before me.


Sample #2 - Vangelis_Chariots_of_Fire

The overall results:

[results chart image not preserved]

The results from the individual testers:

[per-tester results chart image not preserved]
I sorted the testers so that the most critical tester is the first on the left.

Since Sebastian already removed the test sample links I uploaded the first two sample packages to RapidShare so that anyone can listen to the actual samples: http://rapidshare.com/files/167567675/Samples_01_and_02.zip (7.5 MB)


I'll check my results and relisten to the samples later today. I'll post my personal comments after that.

This post has been edited by Alex B: Nov 26 2008, 15:11


--------------------
http://listening-tests.freetzi.com
Sebastian Mares
post Nov 26 2008, 21:18
Post #2





Group: Members
Posts: 3629
Joined: 14-May 03
From: Bad Herrenalb
Member No.: 6613



The problem is that as long as the reference isn't ranked, I cannot simply refuse to accept results, even if the low anchor is rated higher than a contender. Otherwise people might blame me for selecting only the results I like. That would be fatal for me in an AAC test, for example, because some people have already told me that I am biased since I work for Nero.
The only thing I did was to discard all of a user's results if they had a very high number of results with ranked references (like 9 out of 14). In those cases, I contacted the submitter and asked why this happened. Some people replied that they had simply guessed; after I asked them to redo the test with ABX if possible, I included only the new results. Others were affected by the ABC/HR problem - they wrote down the results on paper first without knowing that reloading the configuration files re-randomizes the contenders - and others didn't reply at all. However, only a very small number of people were affected (I think a total of maybe 3 submitters).
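The screening rule described above can be sketched roughly like this. The data layout, score values, and the exact threshold are hypothetical illustrations, not the real ABC/HR result-file format:

```python
# Sketch of the screening rule: drop a submitter's results entirely when the
# hidden reference was "ranked" (scored below the top grade of 5.0) in too
# many of the test samples. All names, scores, and the threshold are made up.

def reference_ranked_count(results):
    """Count samples in which the hidden reference got less than 5.0."""
    return sum(1 for sample in results if sample["reference"] < 5.0)

def screen_submitters(submissions, max_ranked=8):
    """Keep only submitters who ranked the reference in at most max_ranked samples."""
    return {name: results for name, results in submissions.items()
            if reference_ranked_count(results) <= max_ranked}

# Example: one careful listener, one who apparently guessed (9 of 14 refs ranked).
submissions = {
    "listener_a": [{"reference": 5.0}] * 12 + [{"reference": 4.5}] * 2,
    "listener_b": [{"reference": 3.0}] * 9 + [{"reference": 5.0}] * 5,
}
print(sorted(screen_submitters(submissions)))  # ['listener_a']
```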


--------------------
http://listening-tests.hydrogenaudio.org/sebastian/
Alex B
post Nov 27 2008, 00:42
Post #3





Group: Members
Posts: 1303
Joined: 14-September 05
From: Helsinki, Finland
Member No.: 24472



QUOTE (Sebastian Mares @ Nov 26 2008, 22:18) *
The problem is that as long as the reference isn't ranked, I cannot simply refuse to accept results even if the low anchor is rated higher than a contender...

But I can. :)

I removed the results that I judged unsuitable for the purpose of this thread. (They were absolutely valid for the public listening test, but here I am trying to get more info about the detected differences between the actual contenders.)



Before → After:
iTunes:     4.37 → 4.19
LAME 3.98:  4.34 → 4.04
Low anchor: 2.59 → 1.55
FhG:        4.22 → 3.99
LAME 3.97:  3.25 → 2.77
Helix:      4.67 → 4.55
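The mechanics of such a recomputation are simple: excluding listeners and re-averaging the remaining scores. A minimal sketch, using invented listener names and scores (not the actual test data):

```python
# Recompute a codec's mean score before and after excluding some listeners.
# The listener names and scores below are invented for illustration only.

def mean_score(scores, exclude=()):
    """Average the scores of all listeners not in the exclude set."""
    kept = [s for name, s in scores.items() if name not in exclude]
    return sum(kept) / len(kept)

# Hypothetical per-listener scores for one codec.
helix = {"t1": 5.0, "t2": 4.6, "t3": 4.4, "t4": 4.2}
print(round(mean_score(helix), 2))                  # → 4.55 (all listeners)
print(round(mean_score(helix, exclude={"t4"}), 2))  # → 4.67 (one excluded)
```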


This post has been edited by Alex B: Nov 27 2008, 00:57


--------------------
http://listening-tests.freetzi.com