Full Version: Dial-up bitrate listening test - Finished!
rjamorim
Hello.

I'd like to announce the results of the Dial-up bitrates (32kbps) listening test

Nero Digital Audio won, tied with Coding Technologies' MP3pro. Ogg Vorbis, WMA Std., the 7kHz lowpass, Real Audio and QDesign Music Codec come in second place, with Vorbis a little below the others. Lame loses.

The results page is here:
http://www.rjamorim.com/test/32kbps/results.html

For those in a hurry, here are the zoomed overall results:
[Image: zoomed overall results plot]

Big thanks to everyone that helped and participated.

Best regards;

Roberto Amorim.
rjamorim
...and, with this test concluded, I retire from the listening test "scene", at least for the foreseeable future. It's time to move on, take a break, and give others the opportunity to conduct their own tests. I might return to conducting tests later, but don't expect anything for 2004.

As proper, I would like to take some time to acknowledge everyone that helped me in one way or another. Credit where credit is due.

First, and of course most of all, I would like to thank my master Darryl Miyaguchi (ff123). For his support, ideas, opinions, apps, and for his willingness to share his knowledge. If it wasn't for him, most likely these tests wouldn't have happened.

Second, the listening test participants. I just grabbed your results and threw them together; the real show was being performed by you, in the background. I can't thank you enough, and you are the reason for the success of my tests.

Now, more or less out of order (I'm sure my memory won't be able to remember everyone and everything you did to contribute. Please cut me some slack...) :

-JohnV, Garf, Guruboolez and everyone else that contributed with opinions, hints, suggestions and criticism.
-Guruboolez, Tigre, ff123, [proxima], QuantumKnot, dev0 and others, for conducting parallel listening tests that contributed valuable information to the main listening tests.
-Verloren, and later 1and1.com, for providing the page's hosting space.
-Menno, Verloren, Spoon, Dibrom and ScorLibran for providing package hosting space.
-Spoon, Guruboolez, JohnV, ff123 and everyone else that tested bitrate deviation of VBR profiles over a large amount of tracks.
-Everyone that uploaded samples to be featured in my tests.
-Ivan Dimkovic, Gabriel Bouvigne, Menno Bakker, Alexander Lerch, Karl Lillevold, two Dolby developers and an Apple developer, for providing valuable information on the codecs developed by them.
-ScorLibran, mdmuir, rpop, Mac, ff123 and several other people that supported me through difficult times.
-schnofler, for creating the wonderful Java ABC/HR comparator, and for always being eager to help me with it.
-Phong, for coming up with "chunky", a result file parser that helped me a lot and made result processing much, much faster.
-AstralStorm, for providing an obfuscated .bat file (that unfortunately didn't get used much), and Stux for converting my .bat to .sh script. Also ErikS, Roynux and Saeger for coming up with .sh scripts for my tests.
-Everyone else that I shamefully forgot to mention... blush.gif

Also, of course, I would like to thank HydrogenAudio, for all the help and support provided, by both the community and the staff.

Very best regards;

Roberto José de Amorim
25/07/2004
Omion
At last! I've been waiting for this with bated breath and wetted pants since the 11th! Thanks, Roberto! biggrin.gif
QuantumKnot
Very interesting results, esp how SBR has helped MP3 so much. smile.gif

EDIT: Oh, and many thanks to Roberto for all his efforts in organising these superb listening tests. smile.gif
rc55
I was very surprised to see RA and WM get such a hammering! Wow.

All we need now is an end-to-end streaming solution for NeroDigital and we're as most uk blokes over 50 would say "cooking with gas".

I'd keep typing but I think the RSI in my hands is gonna kill me.

Ruairi
JohnV
Thank you for all your hard work, Roberto. All glory also to ff123, but I think you really "standardized" the testing. Hopefully similar kind of testing will continue.
This is very early but there are some early plans to create a web based result delivery system on HA server, which would make the testing and result processing/calculations faster. Hopefully it will happen.
I think everybody agrees that testing like this is very valuable and shouldn't end here.
PoisonDan
Thank you very much for this listening test and previous listening tests, Roberto. They've been extremely valuable. smile.gif
Garf
I'm a bit surprised by the large error margins in the result. A quick count shows more than 900 results per codec (a great number), which I'd expect to give more confident results.
rjamorim
QUOTE(Garf @ Jul 25 2004, 09:16 PM)
I'm a bit surprised by the large error margins in the result. A quick count shows more than 900 results per codec (a great number), which I'd expect to give more confident results.
*



You probably didn't notice it at the results page: I'm using Tukey Parametric HSD now.

ff123 recommended it. I think it's because we can give up some sensitivity (according to ff123, Tukey is less sensitive than Anova) because there are plenty of listeners to bring the error margins relatively down. The advantage is that Tukey is less likely to give you false positives.
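
To illustrate why guarding against false positives matters with this many contenders, here is a small sketch (not the analysis that was run for the test). It assumes a 5% per-comparison level and, as a simplification, independent comparisons. The Bonferroni threshold is shown only because it is the simplest correction of this kind; Tukey's HSD is computed differently.

```python
from math import comb

N_CODECS = 8   # contenders in the overall plot, including the 7kHz lowpass anchor
ALPHA = 0.05   # assumed per-comparison significance level (95% confidence)

def pairwise_comparisons(k: int) -> int:
    """Number of distinct codec pairs among k codecs."""
    return comb(k, 2)

def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive in m comparisons,
    under the simplifying assumption that they are independent."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / m

m = pairwise_comparisons(N_CODECS)
print("pairs:", m)
print("uncorrected familywise error:", round(familywise_error(ALPHA, m), 3))
print("Bonferroni per-comparison alpha:", round(bonferroni_alpha(ALPHA, m), 5))
```

With 28 pairs, an uncorrected 5% test per pair leaves a high chance of at least one spurious "A is better than B", which is the trade-off described above.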
Garf
I saw it. But even so, >900 results is a large number. So I'm surprised the margins are still 0.4 or so points wide.

For comparison, the previous test had no more than 300 results per sample and the margins were as small as 0.2!

I'd expect a nonparametric test to give larger margins, but this is extreme. Also, the requirement for a normal distribution should be much easier satisfied with this low bitrate than with the previous high bitrate tests, so the choice to use parametric before and nonparametric now looks very weird to me.
JohnV
If the previous tests used Anova, can we get Anova results also from this test please.
It would be interesting.
Ivan Dimkovic
I'd also go with the same statistical analysis method so we can compare with earlier listening tests. At the very least, ANOVA should be included as well if people want to use a different method.

Just my $0.02 smile.gif
guruboolez
Thanks for all efforts done in 18 months, Roberto smile.gif These tests were really informative and enriching.

Two comments on the last one:

- ratings are really high. I guess that most people didn't link their ratings to the corresponding descriptions. Understandable.

- Parametric Stereo encodings are not really far from 'old' MP3pro. I expected better results. Maybe further tunings could change the current situation.


I've a technical question: does AAC with parametric stereo use SBR tool? Because I could detect the same kind of artifacts with mp3pro and AAC (a very grainy sound). They are very similar.
glistener
Now that the results are available, I wanted to know how I ranked the samples. The results files are decrypted and it seems that I need to know which sample corresponds to which codec. I know that
  • 1 = Lame
  • 2 = Ogg Vorbis
  • 4 = Lowpass
  • 5 = Nero MP4
And I know that 3, 6, 7 and 8 are WMA, MP3pro, Real and QDesign. But which one is which?

rjamorim, I thank you very much for your superb work with the listening tests. smile.gif
rjamorim
QUOTE(JohnV @ Jul 25 2004, 09:54 PM)
If the previous tests used Anova, can we get Anova results also from this test please.
It would be interesting.
*


No problem smile.gif

I'll do it tomorrow. I badly need some sleep now.

If you want, you can check the raw Anova results for the overall plot:

CODE
Fisher's protected LSD for ANOVA:   0.272

Means:

Nero     MP3pro   WMA      Real     QDesign  Lowpass  Vorbis   Lame    
 3.30     3.10     2.68     2.59     2.58     2.56     2.48     1.79  

---------------------------- p-value Matrix ---------------------------

        MP3pro   WMA      Real     QDesign  Lowpass  Vorbis   Lame    
Nero     0.146    0.000*   0.000*   0.000*   0.000*   0.000*   0.000*  
MP3pro            0.003*   0.000*   0.000*   0.000*   0.000*   0.000*  
WMA                        0.506    0.449    0.378    0.144    0.000*  
Real                                0.926    0.828    0.423    0.000*  
QDesign                                      0.900    0.478    0.000*  
Lowpass                                               0.559    0.000*  
Vorbis                                                         0.000*  
-----------------------------------------------------------------------

Nero is better than WMA, Real, QDesign, Lowpass, Vorbis, Lame
MP3pro is better than WMA, Real, QDesign, Lowpass, Vorbis, Lame
WMA is better than Lame
Real is better than Lame
QDesign is better than Lame
Lowpass is better than Lame
Vorbis is better than Lame


So, the only difference in the final rankings is that MP3pro becomes statistically better than WMA. (With Tukey, they are tied by a tiny margin that I considered safe enough to disregard when deciding which codec is better than which.)
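
For anyone who wants to check the "is better than" list by hand, here is a minimal sketch. It assumes the single LSD value of 0.272 applies uniformly to every pair of means; the actual analysis tool works from the p-value matrix, so treat this as a cross-check rather than a reimplementation.

```python
from itertools import combinations

LSD = 0.272  # Fisher's protected LSD from the output above

# Overall means from the output above
MEANS = {
    "Nero": 3.30, "MP3pro": 3.10, "WMA": 2.68, "Real": 2.59,
    "QDesign": 2.58, "Lowpass": 2.56, "Vorbis": 2.48, "Lame": 1.79,
}

def better_than(means, lsd):
    """Map each codec to the codecs whose mean it exceeds by more than lsd."""
    wins = {name: [] for name in means}
    for a, b in combinations(means, 2):
        hi, lo = (a, b) if means[a] >= means[b] else (b, a)
        if means[hi] - means[lo] > lsd:
            wins[hi].append(lo)
    return wins

for codec, beaten in better_than(MEANS, LSD).items():
    if beaten:
        print(f"{codec} is better than {', '.join(beaten)}")
```

The Nero/MP3pro gap (0.20) is below the LSD, so they stay tied, while the MP3pro/WMA gap (0.42) is above it.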

QUOTE(guruboolez @ Jul 25 2004, 10:02 PM)
- ratings are really high. I guess that most people didn't link their ratings to the corresponding descriptions. Understandable.


Indeed. I believe they used the numbers just as a parameter to compare one codec against the other. Well, shouldn't affect the final rankings negatively, unless someone bases himself on the ranking labels to extrapolate quality.

QUOTE
I've a technical question: does AAC with parametric stereo use SBR tool?
*


Yes.
rjamorim
QUOTE(glistener @ Jul 25 2004, 10:04 PM)
But which one is which?
*


Here:

1 - Lame
2 - Vorbis
3 - WMA
4 - Lowpass
5 - Nero
6 - MP3pro
7 - Real
8 - QDesign
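
If you want to relabel your own results with this key, something like the following sketch would do. The `ratings` dict and its scores are made up for illustration; the real result files have their own layout, which is not reproduced here.

```python
# Sample-number-to-codec key as posted above
SAMPLE_KEY = {
    1: "Lame", 2: "Vorbis", 3: "WMA", 4: "Lowpass",
    5: "Nero", 6: "MP3pro", 7: "Real", 8: "QDesign",
}

def label_ratings(ratings):
    """Replace sample numbers with codec names and sort best-first.

    `ratings` is a hypothetical {sample_number: score} dict.
    """
    named = {SAMPLE_KEY[n]: score for n, score in ratings.items()}
    return sorted(named.items(), key=lambda item: item[1], reverse=True)

# Made-up scores, for illustration only
example = {1: 1.5, 5: 3.8, 6: 3.6, 3: 2.9}
for codec, score in label_ratings(example):
    print(codec, score)
```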
guruboolez
I'd like to add a small criticism of one of your conclusions.

QUOTE
how well codecs evolved since mid-2000, when EBU conduced their acclaimed MUSHRA formal listening test. You can compare the results of this test with that one using the 7kHz lowpass. Back then, the anchor clearly won over all other codecs. This time, it often lost to most of them in each individual sample.


QUOTE
QDesign also was a pleasant surprise. Considering it's an encoder that hasn't been developed since mid-1999


I suppose that the MUSHRA listening test included the same QDesign encoder as this last test. In other words, if QDesign was inferior to the 7kHz anchor in the MUSHRA test, it should be inferior again this time. But that's not the case. In 2000, the 7kHz anchor was rated at ~60 and QDesign was far behind, below 40. In 2004, both are at the same level (2.56 vs 2.58).

I suppose that the "collective taste" of the people who performed the MUSHRA test was significantly different from the "collective taste" of the participants in this listening test. Otherwise, QDesign would have obtained a lower rating (or the anchor a much better one).
Under these conditions, it seems difficult to draw conclusions about the "progress" of encoders. Or at least, it's very difficult or maybe impossible to compare the MUSHRA results with the recent ones.
rjamorim
Well, I believe one of the reasons QDesign performed so badly at that test is because it had many more speech samples, proportionally (4 in 9, while my test had 3 in 18). And you can clearly see that speech is pure poison for QDesign.

But this is a problem specific to QDesign. I'm sure it would have performed much worse, maybe close to the results obtained by MUSHRA, if the proportion of speech samples had been the same. This problem doesn't affect other codecs, so I still believe codecs other than QDesign can be related to MUSHRA, to some extent.


Edit: Besides, QDesign has been resampled to 32kHz for my test. MUSHRA tested it at 44.1kHz (section 7.4)
guruboolez
You're right: these two differences (resampling and poison-sample) could explain the whole difference. Nice clarification.

Sweet dreams smile.gif
ff123
QUOTE(Garf @ Jul 25 2004, 04:24 PM)
I saw it. But even so, >900 results is a large number. So I'm surprised the margins are still 0.4 or so points wide.

For comparison, the previous test had no more than 300 results per sample and the margins were as small as 0.2!

I'd expect a nonparametric test to give larger margins, but this is extreme. Also, the requirement for a normal distribution should be much easier satisfied with this low bitrate than with the previous high bitrate tests, so the choice to use parametric before and nonparametric now looks very weird to me.
*



For clarification, the analysis is still parametric (Tukey's parametric analysis), but it corrects for the large number of comparisons being made, a more sophisticated version of a Bonferroni correction.

The confidence intervals of the individual samples are probably seeing more of a reduction than the overall results. That's where the large number of listeners are being applied.

For the overall results, you're basically averaging the listeners for each individual sample, and then using those averaged results in computing the confidence intervals. So N = 18, not > 900.

If you expect a huge number of listeners, it's much better to increase the number of music samples than to let only a few samples accumulate lots of listeners.
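
A rough sketch of the point about N, under simplifying assumptions: it uses a plain standard error of the mean for a single codec rather than the blocked analysis actually used, and the scores are hypothetical, not test data. Listeners are averaged within each sample first, so the spread that matters is the spread between samples.

```python
from math import sqrt
from statistics import mean, stdev

def overall_standard_error(ratings_by_sample):
    """Standard error of one codec's overall mean, plus the effective N.

    ratings_by_sample: {sample_name: [listener scores]}.
    Listeners are averaged within each sample first, so the effective
    N is the number of samples, not the number of individual results.
    """
    sample_means = [mean(scores) for scores in ratings_by_sample.values()]
    n = len(sample_means)
    return stdev(sample_means) / sqrt(n), n

# Hypothetical scores for one codec on three samples (not test data)
few = {"s1": [2.0, 3.0], "s2": [4.0, 4.0], "s3": [1.0, 2.0]}
# Ten times as many listeners per sample, with the same per-sample means
many = {name: scores * 10 for name, scores in few.items()}

print(overall_standard_error(few))
print(overall_standard_error(many))  # unchanged: the per-sample means are the same
```

In this toy case, multiplying the listeners by ten leaves the overall error untouched; only adding more samples would raise N.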

ff123
ff123
BTW, Roberto,

Great job as always, and sorry to see you leave the testing scene (hopefully not permanently).

ff123
Birch
Would it be too much trouble to post separate results for music vs. speech? I wonder how speech affected the results.

Thanks, for all your work. I've been waiting for this. It will be great to see what happens at this bitrate over the next couple years.

Birch
rjamorim
QUOTE(Birch @ Jul 25 2004, 11:21 PM)
Would it be too much trouble to post separate results for music vs. speech?  I wonder how speech affected the results.
*


I will definitely post results for only the music samples, since it's interesting to see how much QDesign improves when removing the voice samples.

I won't post results for the speech samples only because there are only 3 of them, and that's not enough to generate significant results. It's a better idea to conduct a speech listening test comparing vocodecs. I discussed some of this idea with jmvalin already.
kanuac
QUOTE(rjamorim @ Jul 26 2004, 12:21 AM)
...and, with this test concluded, I retire from the listening test "scene", at least for the foreseeable future. It's time to move on, take a break, and give others the opportunity to conduct their own tests. I might return to conducting tests later, but don't expect anything for 2004.

Thank you very much, Roberto. Your work has been quite valuable.
And I hope the break won't be too long. smile.gif wink.gif
Gabriel
Thank you very much for your work Roberto.
robUx4
Yes just thank you !
I guess you'll probably still be working on Rarewares smile.gif

I expected Vorbis to do a little better. But I'm glad Nero (HE-AAC + PS) does so well. We need support in embedded devices now.
SirGrey
At last !
Thanks, Roberto !
---
Yeah, seems that SBR is a good idea when the bitrate is low...
fileman
Another Thanks from me, Roberto! You always did a great job with the tests - I hope there will be somebody who will continue...
vinu
Thank you Roberto, for all your listening tests, not just this one. They have all been very informative (to me). Good luck on your future.

I've been more of a lurker than a poster on these forums, but have learnt (and am still learning) a great deal from the collective knowledge of all the 'regulars' who frequent these forums.

Thank you!
eltoder
QUOTE(Garf @ Jul 26 2004, 06:24 AM)
I saw it. But even so, >900 results is a large number. So I'm surprised the margins are still 0.4 or so points wide.

For comparison, the previous test had no more than 300 results per sample and the margins were as small as 0.2!

I'd expect a nonparametric test to give larger margins, but this is extreme. Also, the requirement for a normal distribution should be much easier satisfied with this low bitrate than with the previous high bitrate tests, so the choice to use parametric before and nonparametric now looks very weird to me.
*


The margins are so wide because results vary a lot between samples.

-Eugene
Vietwoojagig
QUOTE(rjamorim @ Jul 26 2004, 12:21 AM)
Hello.

I'd like to announce the results of the Dial-up bitrates (32kbps) listening test
Hi Rjamorim,

Nice test, interesting results, great job.

One question: do you have an Excel file with all the results? I'd like to do my own statistical analysis.

thanks
bleh
w00t, awesome.
rjamorim
QUOTE(robUx4 @ Jul 26 2004, 05:03 AM)
I guess you'll probably still be working on Rarewares smile.gif
*


Sure. And even if I stopped, I believe John33 and xmixahlx would keep doing a good job maintaining it. smile.gif

QUOTE
I expected Vorbis to do a little better.


Yeah, I hope Monty looks into enabling 32kHz encoding at 32kbps.

QUOTE
One question: do you have an Excel file with all the results? I'd like to do my own statistical analysis.


Sure. The Excel file is here:
http://www.rarewares.org/rja/32kbps.zip

I also included the result tables generated by chunky, so that you can feed them to Friedman using different parameters.

Please disregard the comments in the spreadsheet. They haven't been updated since the MP3 test.

Regards;

Roberto.
glistener
In order to reduce a slashdot effect I wgeted http://www.rjamorim.com/test/32kbps/results.html (--mirror --no-parent etc.) Now I am ready to zip it and share it via the ed2k net. Others might make a torrent or use another p2p application that has secure hashes (i.e. not KaZaA). Is that okay, or would I be infringing your copyright, rjamorim? Or the listening test participants' copyright? (Their comments are included in the zip as well.)

BTW crawling the results, I noticed that elmar3rd's results are not included, because the "#" isn't escaped in the href.

EDIT: punctuation
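
To show what goes wrong with an unescaped "#" in a link, here is a small sketch. It is written in Python for illustration only (the actual indexer is a PHP script), and the file name is hypothetical since the real one is not given in the thread.

```python
from urllib.parse import quote, urlsplit

def href_for(filename: str) -> str:
    """Percent-encode a file name for use in an href.

    An unescaped '#' starts the URL fragment, so everything from it
    onward is never sent to the server.
    """
    return quote(filename)

# Hypothetical file name containing '#'
name = "elmar3rd#1.txt"

# Unescaped: the part after '#' is parsed as a fragment, not as the path
print(urlsplit(name).path, "|", urlsplit(name).fragment)

# Escaped: '#' becomes %23 and the whole name stays in the path
print(href_for(name))
```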
rjamorim
QUOTE(glistener @ Jul 26 2004, 12:07 PM)
In order to reduce a slashdot effect I wgeted http://www.rjamorim.com/test/32kbps/results.html (--mirror --no-parent etc.) Now I am ready to zip it and share it via the ed2k net. Others might make a torrent or use another p2p application that has secure hashes (i.e. not KaZaA).
*


Well, that's a welcome initiative, but how is it going to help save bandwidth?

Maybe include the ed2k link in the Slashdot article, if it's ever posted?

If someone creates a torrent, I can put it at RW's tracker.

QUOTE
Is that okay, or would I be infringing your copyright, rjamorim?


Nah, all those pages are free for anyone to copy and mirror if they want.

I actually created a big mirror archive of the listening tests page (except the samples folder), so that people can keep it and mirror elsewhere if rjamorim.com ever goes down.

QUOTE
Or the listening test participants' copyright? (Their comments are included in the zip as well.)


They are informed in the readme that all results are going to be published, so I believe they can't (and wouldn't) complain.

QUOTE
BTW crawling the results, I noticed that elmar3rd's results are not included, because the "#" isn't escaped in the href.


I'll ask Mac to fix the php indexer. You can also get the .rar mirror file to obtain his results.

Thank you.
ff123
This line needs updating in the text of the explanation:

"For example, in the chanchan plot below, Lame is rated better than Atrac3 with 95% confidence. And iTunes is rated better than Lame with greater than 95% confidence."

ff123

I wonder why Nero didn't do much better than mp3pro? Are the advantages of aac over mp3 not as big at this bitrate?
rjamorim
QUOTE(ff123 @ Jul 26 2004, 02:08 PM)
"For example, in the chanchan plot below, Lame is rated better than Atrac3 with 95% confidence. And iTunes is rated better than Lame with greater than 95% confidence."
*


Oops. Thank you very much for pointing that out. Just fixed it.
bond
roberto, thanks a lot for all your hard work conducting the listening tests! they were really very important in bringing light into the codec jungle smile.gif


my comments on the results:
ahead: plz start investing much more time on working on your LC aac encoder!
Garf
QUOTE(ff123 @ Jul 26 2004, 07:08 PM)
I wonder why Nero didn't do much better than mp3pro?  Are the advantages of aac over mp3 not as big at this bitrate?
*



If you look you'll see all the other codecs are close together too (hence my pedantry about the analysis).

It's more like SBR being so big an advantage that it swamps the other differences.
JohnV
QUOTE(ff123 @ Jul 26 2004, 08:08 PM)
I wonder why Nero didn't do much better than mp3pro?  Are the advantages of aac over mp3 not as big at this bitrate?
*


There's imo still much more potential for ND to develop at 32kbps compared to mp3pro. Remember that this was the first semi-public version of this low-bitrate implementation, which hasn't even been officially released yet, only made available for this test.
Ivan Dimkovic
I'd like to point out that Nero Digital technology will certainly improve, as will SBR, in the future. At this moment, it has shown great potential and definitely has more room to become even better.

3.3 is not a bad score for 32 kb/s and beta technology smile.gif

I would like to thank everybody who took part in this test.
Gabriel
QUOTE
I wonder why Nero didn't do much better than mp3pro? Are the advantages of aac over mp3 not as big at this bitrate?

I have the feeling that the AAC-LC part of the Ahead encoder is "far" from being optimal.
That does not mean that it is bad, but it means that there is still a large margin for progress.

note: from an external point of view, I think that it would have been a bad choice to spend a lot of time fine tuning the LC part if that would imply spending less time on SBR and PS.
StoneRoses
QUOTE(Gabriel @ Jul 27 2004, 02:40 PM)
I have the feeling that the AAC-LC part of the Ahead encoder is "far" from being optimal.
That does not mean that it is bad, but it means that there is still a large margin for progress.

note: from an external point of view, I think that it would have been a bad choice to spend a lot of time fine tuning the LC part if that would imply spending less time on SBR and PS.
*



I feel it too.
I compared Nero VBR AAC-LC at mid-to-low bitrates with Quicktime (CBR), for use on my mobile phone, which does not support AAC-HE. I think there is a lot of room for improvement.

When Quicktime offers its VBR encoder, Ahead will get serious competition at mid-to-high bitrates. Or will Nero decide to avoid competition and focus only on the low-bitrate side?
ScorLibran
Based on its performance in the 128kbps test, I didn't expect Vorbis to be tied with the back of the pack in this one. Just goes to show that formats perform very differently at different bitrates when compared to other formats.

Thanks for all the work you've done on this test, Roberto, and for the work you've done on all the others. You've been an invaluable part of the audio encoding community with these selfless contributions of time and effort.

Best of luck to you in your future aspirations. smile.gif
Birch
Hi Roberto,
Did you get a chance to do a total music-only results chart? If so, where is it posted?

Thanks,
Birch
thana
QUOTE(Birch @ Jul 30 2004, 04:58 AM)
Hi Roberto,
Did you get a chance to do a total music-only results chart? If so, where is it posted?


i would be interested in those too. also the results with anova analysis (for comparability purposes with the other tests) would be nice. smile.gif
rjamorim
Surprisingly enough, I'm on another trip :B

I'll try to produce these graphs next monday.
vinnie97
A late thank you to Roberto as well! BTW, excuse the noob question but how does one view the .ecf files created in the test? I'd like to review my results and I guess they need to be decrypted in some fashion.
rjamorim
QUOTE(vinnie97 @ Aug 1 2004, 07:46 AM)
how does one view the .ecf files created in the test? I'd like to review my results and I guess they need to be decrypted in some fashion.
*


On ABC-HR Java, go to Tools -> Process test results.

Once there, you can load the .ecf files and decrypt them with the admin key:
http://www.rjamorim.com/test/32kbps/comments/32kbps.key

You can also convert them from XML to txt. txt is easier for reading, and XML is easier for parsing.

Regards;

Roberto.
vinnie97
Again, thanks. I found that I most consistently rated WMA and mp3pro the highest, followed by nero and qdesign. ohmy.gif I think the graininess of SBR prevented me from voting for mp3pro and nero as the clear winners.