It would be useful to analyze each sample separately to find out what kinds of problems the testers noticed and how severe they are. Such a discussion would help in understanding the test results and would probably also help the codec developers in their work. This has not been done before, but I think the outcome would be valuable.
Some testers added comments to the result files. Those comments are useful if the tester intends to revisit a saved session later. Unfortunately, the comments are buried in the result files and cannot easily be evaluated or compared. That's why I didn't add comments to my results (except for some unfinished, partially wrong comments in one of my first result files; I meant to delete them but forgot).
This thread is for Sample #1. Please try to keep the discussion on topic. If you want to discuss any other sample, feel free to start a new thread for it. I am hoping that eventually we'll have 14 separate threads, one for each sample. I'll add them myself if others have not done so before me.
Sample #1 - finalfantasy
The overall results:

The results from the individual testers:

I sorted the testers so that the most critical tester is first on the left.
Since Sebastian already removed the test sample links, I uploaded the first two sample packages to RapidShare so that anyone can listen to the actual samples: http://rapidshare.com/files/167567675/Samples_01_and_02.zip (7.5 MB)
I'll check my results and listen to the samples again later today. I'll post my personal comments after that.