
Dial-up bitrate listening test - Finished!

Hello.

I'd like to announce the results of the Dial-up bitrates (32kbps) listening test.

Nero Digital Audio won, tied with Coding Technologies' MP3pro. Ogg Vorbis, WMA Std., the 7kHz lowpass anchor, Real Audio and the QDesign Music Codec came in second place, with Vorbis a little below the others. Lame came in last.

The results page is here:
http://www.rjamorim.com/test/32kbps/results.html

For those in a hurry, here are the zoomed overall results:


Big thanks to everyone who helped and participated.

Best regards;

Roberto Amorim.

Dial-up bitrate listening test - Finished!

Reply #1
...and, with this test concluded, I retire from the listening test "scene", at least for the foreseeable future. It's time to move on, take a break, and give others the opportunity to conduct their own tests. I might return to conducting tests later, but don't expect anything for 2004.

As is proper, I would like to take some time to acknowledge everyone who helped me in one way or another. Credit where credit is due.

First, and of course most of all, I would like to thank my master Darryl Miyaguchi (ff123) for his support, ideas, opinions, apps, and his willingness to share his knowledge. If it weren't for him, these tests most likely wouldn't have happened.

Second, the listening test participants. I just grabbed your results and threw them together; the real show was performed by you, in the background. I can't thank you enough, and you are the reason for the success of my tests.

Now, more or less out of order (I'm sure I won't be able to remember everyone and everything you did to contribute, so please cut me some slack...):

-JohnV, Garf, Guruboolez and everyone else who contributed opinions, hints, suggestions and criticism.
-Guruboolez, Tigre, ff123, [proxima], QuantumKnot, dev0 and others, for conducting parallel listening tests that contributed valuable information to the main listening tests.
-Verloren, and later 1and1.com, for providing the page's hosting space.
-Menno, Verloren, Spoon, Dibrom and ScorLibran for providing package hosting space.
-Spoon, Guruboolez, JohnV, ff123 and everyone else who tested the bitrate deviation of VBR profiles over a large number of tracks.
-Everyone who uploaded samples to be featured in my tests.
-Ivan Dimkovic, Gabriel Bouvigne, Menno Bakker, Alexander Lerch, Karl Lillevold, two Dolby developers and an Apple developer, for providing valuable information on the codecs developed by them.
-ScorLibran, mdmuir, rpop, Mac, ff123 and several other people who supported me through difficult times.
-schnofler, for creating the wonderful Java ABC/HR comparator, and for always being eager to help me with it.
-Phong, for coming up with "chunky", a result file parser that helped me a lot and made result processing much, much faster.
-AstralStorm, for providing an obfuscated .bat file (which unfortunately didn't get used much), and Stux for converting my .bat into a .sh script. Also ErikS, Roynux and Saeger for coming up with .sh scripts for my tests.
-Everyone else that I shamefully forgot to mention...

Also, of course, I would like to thank HydrogenAudio for all the help and support provided by both the community and the staff.

Very best regards;

Roberto José de Amorim
25/07/2004

Dial-up bitrate listening test - Finished!

Reply #2
At last! I've been waiting for this with bated breath and wetted pants since the 11th! Thanks, Roberto!
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Dial-up bitrate listening test - Finished!

Reply #3
Very interesting results, especially how much SBR has helped MP3.

EDIT:  Oh, and many thanks to Roberto for all his efforts in organising these superb listening tests. 

Dial-up bitrate listening test - Finished!

Reply #4
I was very surprised to see RA and WM get such a hammering! Wow.

All we need now is an end-to-end streaming solution for NeroDigital and we're, as most UK blokes over 50 would say, "cooking with gas".

I'd keep typing but I think the RSI in my hands is gonna kill me.

Ruairi
rc55.com - nothing going on

Dial-up bitrate listening test - Finished!

Reply #5
Thank you for all your hard work, Roberto. All glory also to ff123, but I think you really "standardized" the testing. Hopefully this kind of testing will continue.
It's still very early, but there are some plans to create a web-based result delivery system on the HA server, which would make the testing and result processing/calculations faster. Hopefully it will happen.
I think everybody agrees that testing like this is very valuable and shouldn't end here.
Juha Laaksonheimo

Dial-up bitrate listening test - Finished!

Reply #6
Thank you very much for this listening test and previous listening tests, Roberto. They've been extremely valuable.
Over thinking, over analyzing separates the body from the mind.

Dial-up bitrate listening test - Finished!

Reply #7
I'm a bit surprised by the large error margins in the results. A quick count shows more than 900 results per codec (a great number), which I'd expect to give tighter margins.

Dial-up bitrate listening test - Finished!

Reply #8
Quote
I'm a bit surprised by the large error margins in the results. A quick count shows more than 900 results per codec (a great number), which I'd expect to give tighter margins.


You probably didn't notice it on the results page: I'm using Tukey's parametric HSD now.

ff123 recommended it. I think it's because we can give up some sensitivity (according to ff123, Tukey is less sensitive than ANOVA), since there are plenty of listeners to bring the error margins down. The advantage is that Tukey is less likely to give you false positives.
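
To make the difference concrete, here is a minimal sketch (in Python, using scipy and statsmodels; this is not the script used for the actual test, and the codec names and scores below are made up) that runs a plain one-way ANOVA and then Tukey's HSD on simulated ratings:

Code: [Select]
# Minimal sketch: one-way ANOVA vs. Tukey's HSD on simulated ratings.
# Codec names and scores are made up for illustration only.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
codecs = ["Nero", "MP3pro", "WMA", "Vorbis"]
true_means = [3.3, 3.1, 2.7, 2.5]          # roughly the reported overall means
scores = np.concatenate([rng.normal(m, 0.8, 120) for m in true_means])
groups = np.repeat(codecs, 120)

# One-way ANOVA: is there any difference among the codecs at all?
f_stat, p_val = stats.f_oneway(*(scores[groups == c] for c in codecs))
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4f}")

# Tukey's HSD: which pairs differ, with the family-wise error rate controlled.
# Its intervals are wider than uncorrected pairwise tests, so fewer false positives.
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))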

Dial-up bitrate listening test - Finished!

Reply #9
I saw it. But even so, >900 results is a large number. So I'm surprised the margins are still 0.4 or so points wide.

For comparison, the previous test had no more than 300 results per sample and the margins were as small as 0.2!

I'd expect a nonparametric test to give larger margins, but this is extreme. Also, the requirement for a normal distribution should be much more easily satisfied at this low bitrate than in the previous high-bitrate tests, so the choice to use a parametric test before and a nonparametric one now looks very weird to me.

Dial-up bitrate listening test - Finished!

Reply #10
If the previous tests used ANOVA, can we also get ANOVA results from this test, please?
It would be interesting.
Juha Laaksonheimo

Dial-up bitrate listening test - Finished!

Reply #11
I'd also go with the same statistical analysis method so we can compare with the earlier listening tests; at the very least, ANOVA should be added in addition if people prefer a different method.

Just my $0.02

Dial-up bitrate listening test - Finished!

Reply #12
Thanks for all the effort over these 18 months, Roberto. These tests were really informative and enriching.

Two comments on the last one:

- the ratings are really high. I guess most people didn't link their rating to the corresponding description. Understandable.

- Parametric Stereo encodings are not really far from the 'old' MP3pro. I expected better results. Maybe further tuning could change the current situation.


I have a technical question: does AAC with Parametric Stereo use the SBR tool? I ask because I could detect the same kind of artifacts with MP3pro and AAC (a very grainy sound); they are very similar.

Dial-up bitrate listening test - Finished!

Reply #13
Now that the results are available, I wanted to know how I ranked the samples. The result files have been decrypted, and it seems I need to know which sample corresponds to which codec. I know that
  • 1 = Lame
  • 2 = Ogg Vorbis
  • 4 = Lowpass
  • 5 = Nero MP4
And I know that 3, 6, 7 and 8 are WMA, MP3pro, Real and QDesign. But which one is which?

rjamorim, I thank you very much for your superb work with the listening tests.

Dial-up bitrate listening test - Finished!

Reply #14
Quote
If the previous tests used ANOVA, can we also get ANOVA results from this test, please?
It would be interesting.


No problem

I'll do it tomorrow. I badly need some sleep now.

If you want, you can check the raw Anova results for the overall plot:

Code: [Select]
Fisher's protected LSD for ANOVA:   0.272

Means:

Nero     MP3pro   WMA      Real     QDesign  Lowpass  Vorbis   Lame    
 3.30     3.10     2.68     2.59     2.58     2.56     2.48     1.79  

---------------------------- p-value Matrix ---------------------------

        MP3pro   WMA      Real     QDesign  Lowpass  Vorbis   Lame    
Nero     0.146    0.000*   0.000*   0.000*   0.000*   0.000*   0.000*  
MP3pro            0.003*   0.000*   0.000*   0.000*   0.000*   0.000*  
WMA                        0.506    0.449    0.378    0.144    0.000*  
Real                                0.926    0.828    0.423    0.000*  
QDesign                                      0.900    0.478    0.000*  
Lowpass                                               0.559    0.000*  
Vorbis                                                         0.000*  
-----------------------------------------------------------------------

Nero is better than WMA, Real, QDesign, Lowpass, Vorbis, Lame
MP3pro is better than WMA, Real, QDesign, Lowpass, Vorbis, Lame
WMA is better than Lame
Real is better than Lame
QDesign is better than Lame
Lowpass is better than Lame
Vorbis is better than Lame


So, the only difference in the final rankings is that MP3pro becomes statistically better than WMA (with Tukey, they are tied by a tiny margin that I considered safe enough to disregard when deciding which is better than which).
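
For anyone who wants to play with this kind of analysis, here is a minimal sketch of Fisher's protected LSD on made-up ratings (this is not ff123's tool and not the real result data; the group size, means and the 0.05 level are assumptions):

Code: [Select]
# Sketch of Fisher's protected LSD: run the overall ANOVA first, and only if it
# is significant compare pairs of means against the least significant difference.
# All numbers are made up; equal group sizes are assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reported = {"Nero": 3.3, "MP3pro": 3.1, "WMA": 2.68, "Lame": 1.79}
n = 18                                   # hypothetical number of ratings per codec
data = {name: rng.normal(mean, 0.8, n) for name, mean in reported.items()}

f_stat, p_val = stats.f_oneway(*data.values())
if p_val < 0.05:                         # "protected": pairs only after a significant F-test
    k = len(data)
    df_error = k * (n - 1)
    mse = np.mean([np.var(x, ddof=1) for x in data.values()])  # pooled error variance
    lsd = stats.t.ppf(0.975, df_error) * np.sqrt(2 * mse / n)
    print(f"LSD = {lsd:.3f}")
    names = list(data)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = abs(np.mean(data[a]) - np.mean(data[b]))
            print(f"{a} vs {b}: |diff| = {diff:.3f} -> {'differ' if diff > lsd else 'tied'}")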

Quote
- the ratings are really high. I guess most people didn't link their rating to the corresponding description. Understandable.


Indeed. I believe they used the numbers just as a scale to compare one codec against another. It shouldn't affect the final rankings negatively, unless someone relies on the rating labels to extrapolate quality.

Quote
I have a technical question: does AAC with Parametric Stereo use the SBR tool?


Yes.

Dial-up bitrate listening test - Finished!

Reply #15
Quote
But which one is which?


Here:

1 - Lame
2 - Vorbis
3 - WMA
4 - Lowpass
5 - Nero
6 - MP3pro
7 - Real
8 - QDesign

Dial-up bitrate listening test - Finished!

Reply #16
I'd like to add a small criticism about one of your conclusions.

Quote
how well codecs have evolved since mid-2000, when the EBU conducted their acclaimed MUSHRA formal listening test. You can compare the results of this test with that one using the 7kHz lowpass. Back then, the anchor clearly won over all other codecs. This time, it often lost to most of them in each individual sample.


Quote
QDesign also was a pleasant surprise. Considering it's an encoder that hasn't been developed since mid-1999


I suppose the MUSHRA listening test included the same QDesign encoder as this last test. In other words, if QDesign was inferior to the 7kHz anchor in the MUSHRA test, it should be inferior again this time. But that's not the case. In 2000, the 7kHz anchor was rated at ~60 and QDesign was far behind, below 40. In 2004, both are at the same position (2.56 vs 2.58).

I suppose that the "collective taste" of the people performing the MUSHRA test was significantly different from the "collective taste" of the participants in this listening test. Otherwise, QDesign would have obtained a lower rating (or the anchor a much better one).
Under these conditions, it seems difficult to draw conclusions about the "progress" of encoders. Or at least, it's very difficult, maybe impossible, to compare the MUSHRA results with the recent ones.

Dial-up bitrate listening test - Finished!

Reply #17
Well, I believe one of the reasons QDesign performed so badly in that test is that it had proportionally many more speech samples (4 out of 9, while my test had 3 out of 18). And you can clearly see that speech is pure poison for QDesign.

But this is an isolated problem with QDesign. I'm sure it would have performed much worse, maybe close to the results obtained in the MUSHRA test, if the proportion of speech samples had been the same. This problem doesn't affect the other codecs, so I still believe codecs other than QDesign can be compared to the MUSHRA results, to some extent.


Edit: Besides, QDesign was resampled to 32kHz for my test; MUSHRA tested it at 44.1kHz (section 7.4).

Dial-up bitrate listening test - Finished!

Reply #18
You're right: these two differences (resampling and the poison samples) could explain the whole difference. Nice clarification.

Sweet dreams

Dial-up bitrate listening test - Finished!

Reply #19
Quote
I saw it. But even so, >900 results is a large number. So I'm surprised the margins are still 0.4 or so points wide.

For comparison, the previous test had no more than 300 results per sample and the margins were as small as 0.2!

I'd expect a nonparametric test to give larger margins, but this is extreme. Also, the requirement for a normal distribution should be much more easily satisfied at this low bitrate than in the previous high-bitrate tests, so the choice to use a parametric test before and a nonparametric one now looks very weird to me.


For clarification, the analysis is still parametric (Tukey's parametric analysis), but it corrects for the large number of comparisons being made; it's a more sophisticated version of a Bonferroni correction.

The confidence intervals of the individual samples are probably seeing more of a reduction than the overall results. That's where the large number of listeners is being applied.

For the overall results, you're basically averaging the listeners for each individual sample, and then using those averaged results in computing the confidence intervals.  So N = 18, not > 900.

If you expect a huge number of listeners, it's much better to increase the number of music samples than to let only a few samples accumulate lots of listeners.
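
Here is a small sketch of that point with made-up numbers (a hypothetical 18-sample, 50-listener layout, not the real test data): once you average over listeners within each sample, the overall confidence interval is driven by the spread of the 18 per-sample means, so piling more listeners onto the same few samples quickly stops helping.

Code: [Select]
# Sketch: overall confidence interval from per-sample means (N = 18) versus
# naively pooling every individual rating. All numbers are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_samples, n_listeners = 18, 50          # hypothetical test layout
sample_difficulty = rng.normal(0.0, 0.5, n_samples)   # some samples are harder than others
ratings = 3.0 + sample_difficulty[:, None] + rng.normal(0.0, 1.0, (n_samples, n_listeners))

sample_means = ratings.mean(axis=1)      # one mean per music sample

# 95% CI half-width from the 18 per-sample means (what the analysis actually uses)...
ci_18 = stats.sem(sample_means) * stats.t.ppf(0.975, n_samples - 1)

# ...versus naively pooling all 900 ratings as if they were independent.
ci_900 = stats.sem(ratings.ravel()) * stats.t.ppf(0.975, ratings.size - 1)

print(f"half-width using N = 18 sample means : {ci_18:.3f}")
print(f"half-width naively pooling N = 900   : {ci_900:.3f}")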

ff123

Dial-up bitrate listening test - Finished!

Reply #20
BTW, Roberto,

Great job as always, and sorry to see you leave the testing scene (hopefully not permanently).

ff123

Dial-up bitrate listening test - Finished!

Reply #21
Would it be too much trouble to post separate results for music vs. speech?  I wonder how speech affected the results.

Thanks for all your work. I've been waiting for this. It will be great to see what happens at this bitrate over the next couple of years.

Birch

Dial-up bitrate listening test - Finished!

Reply #22
Quote
Would it be too much trouble to post separate results for music vs. speech?  I wonder how speech affected the results.


I will definitely post results for only the music samples, since it's interesting to see how much QDesign improves when removing the voice samples.

I won't post results for the speech samples alone because there are only 3 of them, and that's not enough to generate significant results. It would be better to conduct a dedicated speech listening test comparing voice codecs; I have already discussed this idea with jmvalin.

Dial-up bitrate listening test - Finished!

Reply #23
Quote
...and, with this test concluded, I retire from the listening test "scene", at least for the foreseeable future. It's time to move on, take a break, and give others the opportunity to conduct their own tests. I might return to conducting tests later, but don't expect anything for 2004.

Thank you very much, Roberto. Your work has been quite valuable.
And I hope it won't be too long a break.

Dial-up bitrate listening test - Finished!

Reply #24
Thank you very much for your work, Roberto.