What if...
Most samples were rated 5.0, and some testers even ranked the references. A few of the more experienced testers were able to avoid using 5.0 at all.
The current criterion for a valid result is that no reference is ranked. However, this makes some testers very conservative, rating 5.0 whenever they are in doubt.
What if results with ranked references were used as well?
What if only results with no 5.0 ratings were used?
How would the overall results then look?
In the second row of the table below, I have converted references ranked 4.0 and above to 5.0 ratings and included these "invalid" results.
In the third row, I have reduced the set to only the results without any 5.0 ratings. The last column gives the number of results in each row.
                   iTunes  LAME    Nero    Shine   AuTuV   WMA pro Results
Official result    4.74    4.60    4.68    2.35    4.79    4.70    402
Ranked references  4.74    4.60    4.70    2.38    4.78    4.72    464
No 5.0 ratings     3.90    3.74    3.57    1.51    3.91    3.69     54
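The filtering behind the three rows can be sketched in Python. This is a minimal sketch under assumptions: the `Result` record layout and all function names are hypothetical, each result is assumed to hold one rating per codec plus the rating given to that codec's hidden reference, and the third row is assumed to start from the official valid set rather than the enlarged one.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

CODECS = ["iTunes", "LAME", "Nero", "Shine", "AuTuV", "WMA pro"]


@dataclass
class Result:
    """Hypothetical record: one tester's ratings for one sample."""
    codec: dict[str, float]      # rating given to each codec (1.0-5.0)
    reference: dict[str, float]  # rating given to each hidden reference (5.0 = not ranked)


def official(results):
    """Current criterion: a result is valid only if no reference was ranked."""
    return [r for r in results if all(s == 5.0 for s in r.reference.values())]


def ranked_references(results, threshold=4.0):
    """Row 2: references ranked at or above `threshold` are reset to 5.0,
    so those results count as valid; lower-ranked references still exclude."""
    kept = []
    for r in results:
        if all(s >= threshold for s in r.reference.values()):
            kept.append(Result(dict(r.codec), {c: 5.0 for c in r.reference}))
    return kept


def no_five_ratings(results):
    """Row 3 (assumed to start from the official set): no codec rated 5.0."""
    return [r for r in official(results) if all(s < 5.0 for s in r.codec.values())]


def codec_means(results):
    """Mean rating per codec, rounded to two decimals as in the table."""
    return {c: round(mean(r.codec[c] for r in results), 2) for c in CODECS}
```

Feeding the full set of results through `codec_means(official(...))`, `codec_means(ranked_references(...))` and `codec_means(no_five_ratings(...))` would give one row each, and `len(...)` of each filtered list gives the last column.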
ANOVA analysis for "No 5.0 ratings":
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis
Number of listeners: 54
Critical significance: 0.05
Significance of data: 0.00E+00 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings
Source of          Degrees     Sum of    Mean
variation          of Freedom  squares   Square    F        p
Total                323       368.45
Testers (blocks)      53        57.63
Codecs eval'd          5       233.42     46.68    159.82   0.00E+00
Error                265        77.40      0.29
---------------------------------------------------------------
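As a cross-check of the ANOVA table, here is a minimal Python sketch of the standard randomized-block ANOVA, with testers as blocks and codecs as treatments. The function name `blocked_anova` and the input layout (one row per tester, one column per codec) are my own assumptions, not FRIEDMAN's code. With 54 testers and 6 codecs it gives 323, 53, 5 and 265 degrees of freedom, and the mean squares above follow from the sums of squares: 233.42 / 5 = 46.68 and 77.40 / 265 = 0.29.

```python
def blocked_anova(ratings):
    """Randomized block ANOVA on a testers x codecs matrix of ratings.

    ratings: list of rows, one per tester (block), one column per codec.
    Returns degrees of freedom, sums of squares, mean squares and F.
    """
    b = len(ratings)         # blocks (testers)
    k = len(ratings[0])      # treatments (codecs)
    grand = sum(map(sum, ratings)) / (b * k)

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_testers = k * sum((sum(row) / k - grand) ** 2 for row in ratings)
    col_means = [sum(row[j] for row in ratings) / b for j in range(k)]
    ss_codecs = b * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_testers - ss_codecs  # residual after testers and codecs

    df = {"total": b * k - 1, "testers": b - 1,
          "codecs": k - 1, "error": (b - 1) * (k - 1)}
    ms_codecs = ss_codecs / df["codecs"]
    ms_error = ss_error / df["error"]
    return {
        "df": df,
        "ss": {"total": ss_total, "testers": ss_testers,
               "codecs": ss_codecs, "error": ss_error},
        "ms_codecs": ms_codecs,
        "ms_error": ms_error,
        "F": ms_codecs / ms_error,
    }
```

The sketch stops at F; the p column comes from the upper tail of the F(5, 265) distribution, which for example `scipy.stats.f.sf(F, 5, 265)` would compute if SciPy is available.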
Fisher's protected LSD for ANOVA: 0.205
Means:
AuTuV iTunes LAME WMA-pro Nero Shine
3.91 3.90 3.74 3.69 3.57 1.51
---------------------------- p-value Matrix ---------------------------
          iTunes   LAME     WMA-pro  Nero     Shine
AuTuV     0.943    0.099    0.042*   0.001*   0.000*
iTunes             0.114    0.049*   0.002*   0.000*
LAME                        0.696    0.110    0.000*
WMA-pro                              0.227    0.000*
Nero                                          0.000*
-----------------------------------------------------------------------
AuTuV is better than WMA-pro, Nero, Shine
iTunes is better than WMA-pro, Nero, Shine
LAME is better than Shine
WMA-pro is better than Shine
Nero is better than Shine
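The LSD of 0.205 and the conclusions above can be reproduced approximately from the rounded numbers in the output. The sketch assumes the usual Fisher LSD formula, LSD = t(0.975, 265) × sqrt(2 × MSE / 54), with the t critical value hard-coded as roughly 1.969 rather than looked up; the constant and function names are hypothetical. Because it works on the rounded means rather than the raw ratings, it only approximates what FRIEDMAN does.

```python
from itertools import combinations
from math import sqrt

T_CRIT = 1.969            # approx. two-sided 5% t critical value for 265 df (assumed)
MS_ERROR = 77.40 / 265    # error mean square from the ANOVA table
N_LISTENERS = 54

# Rounded means as printed by FRIEDMAN
MEANS = {"AuTuV": 3.91, "iTunes": 3.90, "LAME": 3.74,
         "WMA-pro": 3.69, "Nero": 3.57, "Shine": 1.51}


def fisher_lsd(t_crit, ms_error, n):
    """Least significant difference between two codec means, each over n listeners."""
    return t_crit * sqrt(2 * ms_error / n)


def better_than(means, lsd):
    """Map each codec to the codecs whose mean it exceeds by more than lsd."""
    out = {}
    for a, b in combinations(means, 2):
        hi, lo = (a, b) if means[a] >= means[b] else (b, a)
        if means[hi] - means[lo] > lsd:
            out.setdefault(hi, []).append(lo)
    return out


lsd = fisher_lsd(T_CRIT, MS_ERROR, N_LISTENERS)
print(f"LSD = {lsd:.3f}")
for codec, worse in better_than(MEANS, lsd).items():
    print(f"{codec} is better than {', '.join(worse)}")
```

Run as-is, it prints LSD = 0.205 followed by the same five "is better than" lines as FRIEDMAN. Note that iTunes vs. WMA-pro (difference 0.21, p = 0.049) only just clears the threshold.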
Edit: Added ANOVA analysis