rjamorim
May 23 2004, 23:33
Hello.
I'd like to announce the results of the Multiformat at 128kbps listening test.
Vorbis aoTuV is tied with Musepack for first place, Lame MP3 is tied with iTunes AAC for second place, WMA Standard is in third place, and Atrac3 comes last.
The results page is here:
http://www.rjamorim.com/test/multiformat128/results.html
For those in a hurry, here are the zoomed overall results:

Big thanks to everyone who helped and participated.
Best regards;
Roberto.
magic75
May 23 2004, 23:45
Now that was a surprise... Lame as good as AAC??? Did anyone expect that?
ScorLibran
May 23 2004, 23:55
Vorbis (aoTuV) and MPC tied for first place. LAME and iTunes tied for second. Then WMA-S in third, and ATRAC3 at the back of the pack.
Funny that there was no real consistency this time across music types with the formats tested. It tends to contradict theories about certain formats excelling with certain types of music. At least among these samples.
What the!

... surprised!
QuantumKnot
May 24 2004, 00:01
Whoa, look at aoTuV!!
It is now as good as MPC. Very good work, Aoyumi. Vorbis is now back in the spotlight.
harashin
May 24 2004, 00:07
I believed Musepack would win the test, especially at such a bitrate range (-q4.15). Anyway, it's a very interesting result; good job Roberto and all participants.
guruboolez
May 24 2004, 00:07
Surprisingly, MPC 1.14 (the same version tested last year) is no longer tied with iTunes AAC, but “wins”.
ATRAC3 (minidisc) is obviously a poor encoding solution.
aoTuV is without doubt a great step forward for Vorbis!
rjamorim
May 24 2004, 00:15
The codes:
1 - Vorbis aoTuV
2 - Musepack
3 - Lame MP3
4 - iTunes AAC
5 - Atrac3
6 - WMA Std.
The decryption key:
http://www.rjamorim.com/test/multiformat12...multiformat.key
Very good results by aoTuV. It seems all the others have a new target for 128kbps quality now.
One thing which this test shows is that VBR coding (aoTuV, MPC) is definitely the way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).
rjamorim
May 24 2004, 00:21
QUOTE(JohnV @ May 24 2004, 03:18 AM)
One thing which this test shows is that VBR coding (AoTuV, MPC) is definitely way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).
Yes. That is also true for Lame. With a very good VBR implementation, it got close to the best AAC implementation at that bitrate.
Let's hope Apple implements VBR in their codec, and Ahead improves their implementation considerably.
ScorLibran
May 24 2004, 00:22
QUOTE(harashin @ May 24 2004, 01:07 AM)
I believed Musepack would win the test especially such bitrate range(-q4.15).
I thought so too.
I anticipated a tie between MPC and QT-AAC, then Vorbis in second place, then LAME, then WMA-S and ATRAC at the back. Vorbis and QT-AAC both surprised me.
harashin
May 24 2004, 00:29
My browsers (Firefox, MSIE) don't show the test comments correctly. Also, the title of this page seems to be wrong.
rjamorim
May 24 2004, 00:33
QUOTE(harashin @ May 24 2004, 03:29 AM)
My browsers(Firefox, MSIE) don't show test comments correctly.
It's XML. IE should show something like this:
http://esc17.midphase.com/~calmerc/screenshots/screen-1.jpg
XML is worse for readability but easier to parse. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.
QUOTE
Also, the title of this page seems to be wrong.
Fixed. Thanks for reporting.
harashin
May 24 2004, 00:38
QUOTE(rjamorim @ May 24 2004, 03:33 PM)
XML is worse for readability but easier to be parsed. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.
I expected something like a raw *.txt format. Thanks for the clarification.
rjamorim
May 24 2004, 00:40
QUOTE(harashin @ May 24 2004, 03:38 AM)
I expected something like in raw *.txt format. Thanks for clarification.
Schnofler already has a converter from xml -> txt in ABC/HR. But it only works for encrypted results ATM. Hopefully he'll add support for already decrypted results.
Wow, what really impresses me is that I don't think there was one sample where the vorbis encoder did poorly. This is a little shocking after the last test. Excellent work aoTuV!
Der_Iltis
May 24 2004, 01:32
Surprise surprise!
I hope this'll give vorbis development a new boost.
Gabriel
May 24 2004, 01:34
Oh! Joy!
rjamorim
May 24 2004, 01:38
QUOTE(Gabriel @ May 24 2004, 04:34 AM)
Oh! Joy!
I'm happy my test is spreading happiness.
wow, now that's what I didn't expect
- vorbis aotuv: vorbis is back, and I am proud to have helped find out which vorbis encoder should be used
- mpc vs aac: funny that mpc was that much better than itunes (with a setting only 0.15 higher than in the last test)
- wma9: lol, worse than mp3! (and I'm even surprised it got rated that high; even at 128 it sometimes had this metallic sound) -> go away m$
- atrac3: even worse than wma9 -> go away sony
and if you take this test as a comparison between some online music stores (itunes vs. wma9-based ones vs. sony's new store), itunes clearly comes out as the winner, leaving wma9 far behind!
FireStarter
May 24 2004, 01:56
I see there is a very small margin between mpc and aoTuV. How would aoTuV do at higher bitrates?
JeanLuc
May 24 2004, 01:59
Very interesting results ...
I think it could be an interesting addition to show the bitrate for each encoder in the specific diagrams for each sample ...
The more I think of it the more impressed I am with the performance of LAME. Very good work Gabriel (and consider changing -V 5 default --athaa-sensitivity).
Raptus
May 24 2004, 02:00
How many results were discarded because of ranked refs?
rjamorim
May 24 2004, 02:06
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?
54
Mind you, I didn't discard results that ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
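For readers unfamiliar with the ABX criterion above: the p-value is the probability of getting at least that many ABX trials right by pure guessing. A minimal Python sketch, assuming a one-sided binomial test with a 50% guess rate per trial; `abx_p_value` and `keep_result` are hypothetical names that only mirror the rule as Roberto describes it, not the code actually used for this test:

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: the chance of scoring at least `correct`
    out of `trials` ABX trials by guessing (50% per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


def keep_result(ranked_reference: bool, correct: int, trials: int,
                alpha: float = 0.05) -> bool:
    """Hypothetical screening rule: a result that ranked the hidden
    reference is kept only if the listener also ABXed that sample pair
    to p <= alpha."""
    if not ranked_reference:
        return True
    if trials == 0:  # no ABX session was run for this pair
        return False
    return abx_p_value(correct, trials) <= alpha


# Example: 12 correct out of 16 trials
print(round(abx_p_value(12, 16), 4))
print(keep_result(True, 12, 16))
print(keep_result(True, 8, 16))
```

With these assumptions, 12/16 gives p ≈ 0.038 (kept), while 8/16 is no better than chance (discarded).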
Some comments:
1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis. Note that problem samples are not synonymous with high bitrate! I would hope Frank could look into what's going on with mpc on this sample.
2. It's not clear that all of the samples which didn't show significant differences (there were 4) would have benefited much from a larger listener sample size. The Bartok_strings2.wav and OrdinaryWorld.wav samples in particular are pretty evenly rated across the board.
Roberto did a separate analysis omitting these 4 samples and the overall results were very similar to the results with all 18 samples, except that with 18 samples the confidence level increased. So I'd say they helped out, even if individually they didn't show significant differences between codecs.
3. The absolute ratings of iTunes are remarkably stable in the tests it's been featured in (4.39, 4.42, 4.20, and 4.26 on this one), even though the tests are not strictly comparable.
4. MPC should have been expected (and it did appear) to be slightly better this time around than the last multiformat test since its quality setting was tweaked up slightly (from 4 to 4.15).
5. Excellent job on AoTuVb2, Aoyumi and everybody else who was involved. Seeing such a high score in the test shouldn't have been a big surprise, since those virtuoso tuning ears were rating the beta2 version at around 4.0 overall.
6. Lame is still improving. Good job Gabriel and [proxima]
ff123
Edit: After checking, I see that MPC's absolute score went down from 4.51 to 4.47, so comment 4 is not consistent with what actually happened. But then again, it's not strictly correct to compare scores on one test with scores on another.
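ff123's point 2 (extra samples tightened the overall confidence even though individually they showed no significant differences) follows from how interval width scales with the number of ratings. A rough sketch, assuming a simple normal-approximation interval (z · s / √n) rather than the analysis Roberto and ff123 actually ran, which the thread doesn't spell out; the ratings below are made up:

```python
from statistics import NormalDist, mean, stdev


def mean_with_ci(scores, confidence=0.95):
    """Mean score and the half-width of a normal-approximation
    confidence interval: z * s / sqrt(n)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(scores) / len(scores) ** 0.5
    return mean(scores), half_width


# Hypothetical ratings, not data from this test
few = [4.0, 4.5, 5.0, 4.5]
more = few * 3  # same spread, three times as many ratings

for label, data in (("4 ratings", few), ("12 ratings", more)):
    m, h = mean_with_ci(data)
    print(f"{label}: {m:.2f} +/- {h:.2f}")
```

The mean stays the same, but the interval roughly halves, which is why pooling "non-significant" samples can still sharpen the overall ranking.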
Raptus
May 24 2004, 02:25
QUOTE(rjamorim @ May 24 2004, 12:06 AM)
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?
54
Mind you that I didn't discard results that ranked the reference but on that sample pair ABXd the samples to a pval of 0.05 or less.
Ok.
That's around 15% of the results... And to me it still doesn't feel right to treat them as irrelevant for the stats...
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
Grease
May 24 2004, 02:28
I found my chanchan listening test result wrongly classified as a NewYorkCity result.
-Grease
rjamorim
May 24 2004, 02:32
QUOTE(Raptus @ May 24 2004, 05:25 AM)
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B
I won't ever understand these people.
QUOTE(Raptus @ May 24 2004, 12:25 AM)
QUOTE(rjamorim @ May 24 2004, 12:06 AM)
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?
54
Mind you that I didn't discard results that ranked the reference but on that sample pair ABXd the samples to a pval of 0.05 or less.
Ok.
Thats around 15% of the results... And for me it still doesn't feel right to take them as irrelevant for the stats...
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
Some results with ranked refs are worse than others. Roberto showed me results from one person whose listening results I wouldn't trust at all, they were so bad (meaning lots of ranked refs).
There is always a question about how these results should be treated, and there are probably multiple ways of handling them. The fairest and simplest way seems to be to just throw them away if you have enough results that you can afford to do that, which in this case seems to be true.
ff123
rjamorim
May 24 2004, 02:33
QUOTE(Grease @ May 24 2004, 05:28 AM)
I found my chanchan listening test result wrongly classified as a NewYorkCity result.
No worries, that classification happened while uploading. I'll move it back to the correct folder later.
guruboolez
May 24 2004, 02:37
QUOTE(ff123 @ May 24 2004, 09:21 AM)
1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis. Note that problem samples are not synonymous with high bitrate! I would hope Frank could look into what's going on with mpc on this sample.
The problem seems to be the low volume. MPC --radio has some trouble with low-volume samples, especially when there's a slight amount of noise. Debussy.wav is just one example among hundreds of this problem.
The problem is shocking if the playback volume is exceptionally high, but it is probably less annoying under normal playback conditions (which may explain the relatively good overall rating of the encoding; I expected it to be lower).
Note that the standard preset also suffers from this problem, but it's less critical...
Wow. That is interesting. LAME with the --athaa-sensitivity switch and aoTuV being that strong.
Thanks to all participating and - of course - to all these great codec developers and especially to Roberto himself!!!
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy    iTunes  MPC   Vorbis  Lame  WMA   Atrac3
bitrate  128     155   149     133   128   132
Score    4.34    4.41  4.68    4.11  4.37  3.76
I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.
AC3??? Vorbious???
Get some sleep.
maybe it would make sense to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result for its lousy mp3 encoder?
- Lyx
lame's result is fairly amazing. I was about to begin encoding my cd collection into iTunes aac for an iPod I'm about to purchase. I think I'll just stick with lame now. Its level of quality combined with its compatibility across mp3 players is an unbeatable combination.
rjamorim, can you plz make a zoomed "music store codecs only" chart too (aac, wma9, atrac3)? I think it would be very interesting and important to have such a chart handy to show people that, when choosing where to buy songs from, not only the price but also the quality matters, and it varies a lot

btw did i already thank you for your great test? thanks a lot!

QUOTE(Lyx @ May 24 2004, 11:52 AM)
maybe it would make sence to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result with its lousy mp3-encoder?
yep, and maybe add "mp3" to lame too (and maybe ogg to vorbis), at least in the final chart, to rule out any possible misunderstanding
QuantumKnot
May 24 2004, 05:11
A big thank you to Roberto for his efforts in conducting this test. Let's hope it's not the last one, too.
SebastianG
May 24 2004, 05:28
QUOTE(XXX @ May 24 2004, 02:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for Atrac3 and LAME, and the close-to-160 test for MPC/Vorbis.
Yup, it's hard to compare CBR encoders with VBR encoders.
Whatever you do, it's wrong.

Usually, with these settings, all the encoders tend to produce files at around 128 kbps on an "average" sound file. That's why I think it's ok to compare these codecs with these settings. Many test samples were chosen to be hard to encode (weren't they?). VBR encoders use higher bitrates in those complex situations; CBR encoders don't.
Bad luck for the CBR encoders.
So... you can ask yourself: is the choice of test samples fair?
I don't know...
bye,
Sebastian
JeanLuc
May 24 2004, 05:37
QUOTE(XXX @ May 24 2004, 10:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy iTunes MPC Vorbis Lame WMA Atrac3
bitrate 128 155 149 133 128 132
Score 4.34 4.41 4.68 4.11 4.37 3.76
I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.
That's why I suggested putting the bitrates into the score graphs for each sample... so everyone can see at which average bitrate each codec's result was obtained.
QUOTE(cuan @ May 24 2004, 02:01 PM)
lame's result is fairly amazing. I was about to begin encoding my cd collection into iTunes aac for an iPod im about to purchase. I think ill just stick with lame now. It's level of quality combined with it's compatiblity between mp3 players is an unbeatable combination.
I suggest you also do your own tests, concentrating for example on pre-echo etc. (I'm not saying that either one is better; I have not compared LAME 3.96 -V5 --athaa-sensitivity 1 against iTunes 4.2 on pre-echo).
Remember, however, that these are average results from a group with a limited number of samples and listeners of differing abilities. They show the average quality pretty well, but don't necessarily show some of the details which might be interesting for you.
Also, I think that Lame 3.96 -V5 --athaa-sensitivity 1 has not been tested enough to say it doesn't fail (badly) in certain cases, maybe even pretty often. Imo iTunes 4.2 AAC is safer in this sense.
But if that's not a big deal for you, that Lame setting does seem pretty good on average.
QUOTE(QuantumKnot @ May 24 2004, 12:11 PM)
A big thank you to Roberto for his efforts in conducting this test. Let's hope that it is not the last too
I second the thanks to Roberto and everyone else involved (including all the testers).
Roberto: come on, be honest, you would really miss all the hick-hack and nag-nag going hand in hand with the tests, wouldn't you?
diskvask
May 24 2004, 06:04
QUOTE(rjamorim @ May 24 2004, 09:32 AM)
QUOTE(Raptus @ May 24 2004, 05:25 AM)
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B
I won't ever understand these people.
I think a lot of people thought that the test was going to be very easy (me included): "Come on, it's 128kbit! That sounds like crap, everybody knows that."
...only to find out that no major imperfections could be found in the couple of samples they tried. Sample 1 looks like it was one of the hardest ones to abx; a very tough start, especially for someone who had set his mind on the assumption above.
And besides, abx is an exhausting way of testing, and it can be very frustrating/demotivating if you don't get the results you're expecting ;).
QUOTE(bond @ May 23 2004, 11:50 PM)
woow, now thats what i not expected
- wma9: lol, worse than mp3! (and i even wonder that it got rated that high, even at 128 it had this metallic sound sometimes) -> go away m$
it's a pity that wma9 Pro wasn't included in the test...
...in the last test that included it, it performed quite well
QUOTE(Jojo @ May 24 2004, 03:07 PM)
QUOTE(bond @ May 23 2004, 11:50 PM)
woow, now thats what i not expected
- wma9: lol, worse than mp3! (and i even wonder that it got rated that high, even at 128 it had this metallic sound sometimes) -> go away m$
it's a pitty that wma9
Pro was included in the test

...last test it was included it performed quite well
The answer to why wma9 pro was not included is here:
http://www.hydrogenaudio.org/forums/index....ndpost&p=199103
QUOTE(XXX @ May 24 2004, 11:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy iTunes MPC Vorbis Lame WMA Atrac3
bitrate 128 155 149 133 128 132
Score 4.34 4.41 4.68 4.11 4.37 3.76
I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.
Where did you get those numbers from?
@ Roberto
A big thanks for making this test possible. I hope you reconsider making more tests in the future.
About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?
QUOTE(XXX @ May 24 2004, 01:12 PM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy iTunes MPC Vorbis Lame WMA Atrac3
bitrate 128 155 149 133 128 132
Score 4.34 4.41 4.68 4.11 4.37 3.76
I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.
See here how the average bitrates were decided for this test (personally I'm not absolutely sure it was enough). Obviously the settings in that table closest to 128 were used:
http://www.hydrogenaudio.org/forums/index....ndpost&p=207203
Also, the correct average bitrates for the 18 samples tested are (instead of what you said):
CODE
iTunes  MPC  aoTuV  Lame  WMA  Atrac3   (kbps)
128     136  135    134   128  132
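The thread doesn't say exactly how these averages were computed. Assuming each file's bitrate is size × 8 / duration, a per-codec average can be taken either as a plain mean of per-file bitrates or as total bits over total duration, and the two differ when a VBR encoder spends more bits on a short, hard clip (Sebastian's point above). A sketch with hypothetical file sizes; `bitrate_kbps` and `average_bitrates` are illustrative names:

```python
def bitrate_kbps(size_bytes: int, seconds: float) -> float:
    """Average bitrate of one file in kbit/s (1 kbit = 1000 bits)."""
    return size_bytes * 8 / seconds / 1000


def average_bitrates(files):
    """files: list of (size_bytes, duration_seconds) for one codec.
    Returns (mean of per-file bitrates, duration-weighted bitrate)."""
    per_file = [bitrate_kbps(size, secs) for size, secs in files]
    simple_mean = sum(per_file) / len(per_file)
    total_bytes = sum(size for size, _ in files)
    total_secs = sum(secs for _, secs in files)
    return simple_mean, bitrate_kbps(total_bytes, total_secs)


# Hypothetical VBR encodes: an easy 30 s clip and a hard 10 s clip
files = [(480_000, 30.0), (240_000, 10.0)]
print(average_bitrates(files))  # (160.0, 144.0)
```

Neither number is claimed to be the method behind the table above; the point is only that "average bitrate" needs a stated definition before VBR and CBR results are compared.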
guruboolez
May 24 2004, 06:54
Roberto> what software did you use to obtain the wma9 files? Is it 2-pass VBR at 128 kbps? What decoder? I've tried to reproduce the same waveform with different settings, and I wasn't able to do it.