Full Version: Multiformat@128kbps listening test - FINISHED
rjamorim
Hello.

I'd like to announce the results of the Multiformat at 128kbps listening test.

Vorbis aoTuV is tied with Musepack for first place, Lame MP3 is tied with iTunes AAC for second place, WMA Standard is in third place, and Atrac3 comes last.

The results page is here:
http://www.rjamorim.com/test/multiformat128/results.html

For those in a hurry, here are the zoomed overall results:
[Image: zoomed overall results chart]

Big thanks to everyone that helped and participated.

Best regards,
Roberto.
magic75
Now that was a surprise... Lame as good as AAC??? Did anyone expect that?
ScorLibran
Vorbis (aoTuV) and MPC tied for first place. LAME and iTunes tied for second. Then WMA-S in third, and ATRAC3 at the back of the pack.

Funny that there was no real consistency this time across music types with the formats tested. Tends to oppose theories about certain formats excelling with certain types of music. At least among these samples.
bidz
What the! huh.gif ... surprised!
QuantumKnot
Whoa, look at aoTuV!! ohmy.gif

It is now as good as MPC. Very good work, Aoyumi. Vorbis is now back in the spotlight. biggrin.gif
harashin
I believed Musepack would win the test, especially in this bitrate range (-q4.15). Anyway, it's a very interesting result. Good job, Roberto and all participants.
guruboolez
Surprisingly, MPC 1.14 (the same version tested last year) is no longer tied with iTunes AAC, but “wins”.
ATRAC3 (minidisc) is obviously a poor encoding solution.
aoTuV is without doubt a great step forward for Vorbis!
rjamorim
The codes:

1 - Vorbis aoTuV
2 - Musepack
3 - Lame MP3
4 - iTunes AAC
5 - Atrac3
6 - WMA Std.

The decryption key:
http://www.rjamorim.com/test/multiformat12...multiformat.key
JohnV
Very good results by aoTuV. It seems all the others have a new target for 128kbps quality now.
One thing this test shows is that VBR coding (aoTuV, MPC) is definitely the way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).
rjamorim
QUOTE(JohnV @ May 24 2004, 03:18 AM)
One thing which this test shows is that VBR coding (AoTuV, MPC) is definitely way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).

Yes. That is also true for Lame. With a very good VBR implementation, it got close to the best AAC implementation at that bitrate.

Let's hope Apple implements VBR in their codec, and Ahead improves their implementation considerably.
ScorLibran
QUOTE(harashin @ May 24 2004, 01:07 AM)
I believed Musepack would win the test especially such bitrate range(-q4.15).

I thought so too.

I anticipated a tie between MPC and QT-AAC, then Vorbis in second place, then LAME, then WMA-S and ATRAC at the back. Vorbis and QT-AAC both surprised me.
harashin
My browsers (Firefox, MSIE) don't show the test comments correctly. Also, the title of this page seems to be wrong.
rjamorim
QUOTE(harashin @ May 24 2004, 03:29 AM)
My browsers(Firefox, MSIE) don't show test comments correctly.

It's XML. IE should show something like this:
http://esc17.midphase.com/~calmerc/screenshots/screen-1.jpg

XML is worse for readability but easier to parse. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.

QUOTE
Also, the title of this page seems to be wrong.


Fixed. Thanks for reporting.
harashin
QUOTE(rjamorim @ May 24 2004, 03:33 PM)
XML is worse for readability but easier to be parsed. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.

I expected something like a raw *.txt format. Thanks for the clarification.
rjamorim
QUOTE(harashin @ May 24 2004, 03:38 AM)
I expected something like in raw *.txt format. Thanks for clarification.

Schnofler already has a converter from xml -> txt in ABC/HR. But it only works for encrypted results ATM. Hopefully he'll add support for already decrypted results.
Bonzi
Wow, what really impresses me is that I don't think there was one sample where the vorbis encoder did poorly. This is a little shocking after last test. Excellent work aoTuV!
Der_Iltis
Surprise surprise!
I hope this'll give vorbis development a new boost.
Gabriel
Oh! Joy!
rjamorim
QUOTE(Gabriel @ May 24 2004, 04:34 AM)
Oh! Joy!

laugh.gif

I'm happy my test is spreading happiness.
bond
woow, now thats what i not expected

- vorbis aotuv: vorbis is back, and i am proud to have helped finding out what vorbis encoder should be used smile.gif
- mpc vs aac: funny that mpc was that much better than itunes (with a setting only 0.15 higher than in the last test)
- wma9: lol, worse than mp3! (and i even wonder that it got rated that high, even at 128 it had this metallic sound sometimes) -> go away m$
- atrac3: even worse than wma9 -> go away sony

and if you take this test as a comparison between some online music stores (itunes vs. wma9 based ones vs. sonys new store) itunes clearly comes out as the winner, leaving wma9 behind by far!
FireStarter
I see there is a very small margin between mpc and aoTuV. How would aoTuV react at higher bitrates?
JeanLuc
Very interesting results ...

I think it could be an interesting addition to show the bitrate for each encoder in the specific diagrams for each sample ...
dev0
The more I think of it the more impressed I am with the performance of LAME. Very good work Gabriel (and consider changing -V 5 default --athaa-sensitivity).
Raptus
How many results were discarded because of ranked refs?
rjamorim
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?

54

Mind you, I didn't discard results where the listener ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
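
For readers unfamiliar with the ABX p-value mentioned above, here is a minimal sketch of how such a rule could be checked. It assumes the usual one-sided binomial test against pure guessing (p = 0.5). The function names and the exact form of the keep/discard rule are illustrative assumptions, not the actual script used for this test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    ABX trials by pure guessing (one-sided binomial test, p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def keep_result(ranked_reference: bool, correct: int, trials: int,
                alpha: float = 0.05) -> bool:
    """Hypothetical reading of the rule: a result that ranked the reference
    is kept only if the listener also ABXed that pair to p <= alpha."""
    if not ranked_reference:
        return True
    return trials > 0 and abx_p_value(correct, trials) <= alpha


if __name__ == "__main__":
    print(abx_p_value(7, 8))        # 9/256, about 0.035
    print(keep_result(True, 7, 8))  # kept: p <= 0.05
    print(keep_result(True, 6, 8))  # discarded: p is about 0.145
```

With 8 trials, for example, 7 correct answers are enough to get under 0.05, while 6 are not.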
ff123
Some comments:

1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis. Note that problem samples are not synonymous with high bitrate! I would hope Frank could look into what's going on with mpc on this sample.

2. It's not clear that all of the samples which didn't show significant differences (there were 4) would have benefited much from a larger listener sample size. The Bartok_strings2.wav and OrdinaryWorld.wav samples in particular are pretty evenly rated across the board.

Roberto did a separate analysis omitting these 4 samples and the overall results were very similar to the results with all 18 samples, except that with 18 samples the confidence level increased. So I'd say they helped out, even if individually they didn't show significant differences between codecs.

3. The absolute rating of iTunes is remarkably stable across the tests it's been featured in (4.39, 4.42, 4.20, and 4.26 on this one), even though the tests are not strictly comparable.

4. MPC should have been expected (and it did appear) to be slightly better this time around than the last multiformat test since its quality setting was tweaked up slightly (from 4 to 4.15).

5. Excellent job on AoTuVb2, Aoyumi and everybody else who was involved. Seeing such a high score in the test shouldn't have been a real big surprise since those virtuoso tuning ears were rating the beta2 version at around 4.0 overall.

6. Lame is still improving. Good job Gabriel and [proxima]

ff123

Edit: After checking, I see that MPC's absolute score went down from 4.51 to 4.47, so comment 4 is not consistent with what actually happened. But then again, it's not strictly correct to compare scores on one test with scores on another.
Raptus
QUOTE(rjamorim @ May 24 2004, 12:06 AM)
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?

54

Mind you that I didn't discard results that ranked the reference but on that sample pair ABXd the samples to a pval of 0.05 or less.

Ok.
That's around 15% of the results... And for me it still doesn't feel right to take them as irrelevant for the stats...

What about all the /.ers? rolleyes.gif
Seems they were just interested in wasting bandwidth after all laugh.gif
Grease
I found my chanchan listening test result wrongly classified as a NewYorkCity result.


-Grease
rjamorim
QUOTE(Raptus @ May 24 2004, 05:25 AM)
What about all the /.ers?  rolleyes.gif
Seems they were just interested in wasting bandwidth after all  laugh.gif

More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B

I won't ever understand these people. headbang.gif
ff123
QUOTE(Raptus @ May 24 2004, 12:25 AM)
QUOTE(rjamorim @ May 24 2004, 12:06 AM)
QUOTE(Raptus @ May 24 2004, 05:00 AM)
How many results were discarded because of ranked refs?

54

Mind you that I didn't discard results that ranked the reference but on that sample pair ABXd the samples to a pval of 0.05 or less.

Ok.
Thats around 15% of the results... And for me it still doesn't feel right to take them as irrelevant for the stats...

What about all the /.ers? rolleyes.gif
Seems they were just interested in wasting bandwidth after all laugh.gif

Some results with ranked refs are worse than others. Roberto showed me results from one person whose listening results I wouldn't trust at all, they were so bad (meaning lots of ranked refs).

There is always a question about how these results should be treated, and there are probably multiple ways of handling them. The fairest and simplest way seems to be to just throw them away if you have enough results that you can afford to do that, which in this case seems to be true.

ff123
rjamorim
QUOTE(Grease @ May 24 2004, 05:28 AM)
I found my chanchan listening test result wrongly classified as a NewYorkCity result.

No worries, that classification happened while uploading. I'll move it back to the correct folder later.
guruboolez
QUOTE(ff123 @ May 24 2004, 09:21 AM)
1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis.  Note that problem samples are not synonymous with high bitrate!  I would hope Frank could look into what's going on with mpc on this sample.

The problem seems to be low volume. MPC --radio has some trouble with low-volume samples, especially when there's a slight amount of noise. Debussy.wav is just one example among hundreds of this problem.
The problem is shocking if the playback volume is exceptionally high, but is probably less annoying under normal playback conditions (which may explain the relatively good overall rating of the encoding; I expected it to be lower).

Note that standard preset also suffers from this problem, but it's less critical...
amano
Wow. That is interesting. LAME with the --athaa-sensitivity switch and aoTuv being that strong.

Thanks to all participating and - of course - to all these great codec developers and especially to Roberto himself!!!
XXX
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.

Leahy     iTunes   MPC   Vorbis   Lame    WMA   Atrac3
bitrate      128   155      149    133    128      132
Score       4.34  4.41     4.68   4.11   4.37     3.76

I am aware of the rationalization. I am aware of the overall average. But let this be an "oh?" to those that don't and aren't.
amano
AC3??? Vorbious???

Get some sleep. biggrin.gif
Lyx
maybe it would make sense to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result with its lousy mp3-encoder?

- Lyx
cuan
LAME's result is fairly amazing. I was about to begin encoding my CD collection into iTunes AAC for an iPod I'm about to purchase. I think I'll just stick with LAME now. Its level of quality combined with its compatibility across MP3 players is an unbeatable combination.
bond
rjamorim, can you please make a zoomed "music store codecs only" chart too (aac, wma9, atrac3)? i think it would be very interesting and important to have such a chart handy for showing people that, when they choose where to buy songs, not only the prices but also the quality is very important and varies a lot smile.gif

btw did i already thank you for your great test? thanks a lot! smile.gif

QUOTE(Lyx @ May 24 2004, 11:52 AM)
maybe it would make sence to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result with its lousy mp3-encoder?

yepa and maybe add "mp3" to lame too, (and maybe ogg to vorbis) at least in the final chart to exclude all possible misunderstandings smile.gif
QuantumKnot
A big thank you to Roberto for his efforts in conducting this test. Let's hope that it is not the last too wink.gif
SebastianG
QUOTE(XXX @ May 24 2004, 02:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for Atrac3 and LAME, and the close-to-160 test for MPC/Vorbis.


Yup, it's hard to compare CBR encoders with VBR encoders.
Everything you do is wrong smile.gif

Usually all encoders tend to produce files at around 128 kbps on an "average" sound file with the same settings. That's why I think it's ok to compare these codecs with these settings. Many test samples were chosen to be hard to encode (weren't they?). VBR encoders use higher bitrates in those complex situations. CBR encoders don't.
Bad Luck for the CBR encoders.

So... you can ask yourself: Is the choice of test samples fair ?
I don't know...

bye,
Sebastian
JeanLuc
QUOTE(XXX @ May 24 2004, 10:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.

Leahy iTunes MPC Vorbis Lame WMA Atrac3

bitrate 128 155 149 133 128 132

Score 4.34 4.41 4.68 4.11 4.37 3.76

I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.

That's why I suggested to put the bitrates into the score graphs for each sample ... so everyone can see at which average bitrate the codec's result has been obtained.
JohnV
QUOTE(cuan @ May 24 2004, 02:01 PM)
lame's result is fairly amazing. I was about to begin encoding my cd collection into iTunes aac for an iPod im about to purchase. I think ill just stick with lame now. It's level of quality combined with it's compatiblity between mp3 players is an unbeatable combination.

I suggest you also do your own tests, concentrating for example on pre-echo etc. (I'm not saying that either one is better; I have not compared LAME 3.96 -V5 --athaa-sensitivity 1 against iTunes 4.2 on pre-echo).
Remember, however, that these are average results from a group with a restricted number of samples and listeners with different abilities. It shows the average quality pretty well, but doesn't necessarily show some of the details which might be interesting for you.
Also, I think that Lame 3.96 -V5 --athaa-sensitivity 1 is not tested enough to say it doesn't fail (badly) in certain cases, even pretty often. Imo iTunes 4.2 AAC is safer in this sense.

But, if it's not so big deal, that Lame setting does seem on average pretty good. smile.gif
Digga
QUOTE(QuantumKnot @ May 24 2004, 12:11 PM)
A big thank you to Roberto for his efforts in conducting this test.  Let's hope that it is not the last too  wink.gif

second the thanks to Roberto and everyone else involved (including all the testers).

Roberto: come on, be honest, you would really miss all the hick-hack and nag-nag going hand in hand with the tests, wouldn't you wink.gif biggrin.gif
diskvask
QUOTE(rjamorim @ May 24 2004, 09:32 AM)
QUOTE(Raptus @ May 24 2004, 05:25 AM)
What about all the /.ers?  :rolleyes:
Seems they were just interested in wasting bandwidth after all  :lol:

More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B

I won't ever understand these people. :frustrated:

I think a lot of people thought that the test was going to be very easy (me included), "Come on, it's 128kbit! That sounds like crap, everybody knows that.".

...only to find out that no major imperfections could be found in the couple of samples tried. Sample 1 looks like it was one of the hardest ones to abx; very tough start, especially for someone who had set his mind on the assumption above.

And besides, abx is an exhausting way of testing and it can be very frustrating/unmotivating if you don't get the results you're expecting ;).
Jojo
QUOTE(bond @ May 23 2004, 11:50 PM)
woow, now thats what i not expected

- wma9: lol, worse than mp3! (and i even wonder that it got rated that high, even at 128 it had this metallic sound sometimes) -> go away m$

it's a pity that wma9 Pro wasn't included in the test sad.gif...last test it was included in, it performed quite well
JohnV
QUOTE(Jojo @ May 24 2004, 03:07 PM)
QUOTE(bond @ May 23 2004, 11:50 PM)
woow, now thats what i not expected

- wma9: lol, worse than mp3! (and i even wonder that it got rated that high, even at 128 it had this metallic sound sometimes) -> go away m$

it's a pity that wma9 Pro wasn't included in the test sad.gif...last test it was included in, it performed quite well

Answer why wma9 pro was not included is here: http://www.hydrogenaudio.org/forums/index....ndpost&p=199103
dev0
QUOTE(XXX @ May 24 2004, 11:12 AM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.

Leahy iTunes MPC Vorbis Lame WMA Atrac3

bitrate 128 155 149 133 128 132

Score 4.34 4.41 4.68 4.11 4.37 3.76

I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.

Where did you get those numbers from?
echo
@ Roberto

A big thanks for making this test possible. I hope you reconsider making more tests in the future.

About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?
JohnV
QUOTE(XXX @ May 24 2004, 01:12 PM)
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.

Leahy iTunes MPC Vorbis Lame WMA Atrac3

bitrate 128 155 149 133 128 132

Score 4.34 4.41 4.68 4.11 4.37 3.76

I am aware of the rationalization. I am aware of the overall average. But let this be a n"oh?" to those that don't and aren't.

See here how the average bitrates were decided for this test (personally I'm not absolutely sure if it was enough). Obviously those settings in the table close to 128 were used:
http://www.hydrogenaudio.org/forums/index....ndpost&p=207203

Also the correct average bitrates for the 18 samples tested are (instead of what you said):
CODE
iTunes   MPC   aoTuV   Lame   WMA   Atrac3
128      136   135     134    128   132
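
To put those averages in perspective, here is a small sketch that computes how far each codec's average bitrate over the 18 samples sits above the nominal 128 kbps target. The bitrates are taken from the table above. The variable and function names are only illustrative.

```python
# Average bitrates (kbps) over the 18 test samples, from the table above.
avg_bitrates = {
    "iTunes": 128,
    "MPC": 136,
    "aoTuV": 135,
    "Lame": 134,
    "WMA": 128,
    "Atrac3": 132,
}

TARGET_KBPS = 128  # nominal bitrate of the test


def percent_over_target(kbps: float, target: float = TARGET_KBPS) -> float:
    """Relative deviation from the target bitrate, in percent."""
    return (kbps - target) / target * 100


if __name__ == "__main__":
    for codec, kbps in avg_bitrates.items():
        print(f"{codec:<7} {kbps:>4} kbps  {percent_over_target(kbps):+.2f}%")
```

By this measure the largest deviation is MPC at 6.25% above 128 kbps, with aoTuV and Lame close behind.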
guruboolez
Roberto> what software did you use to obtain the wma9 files? Is it VBR 2-pass 128 kbps? What decoder? I've tried to reproduce the same waveform with different settings, and I wasn't able to do it.