MPC vs OGG VORBIS vs MP3 at 175 kbps, listening test on non-killer samples
Pio2001
post Jul 12 2004, 19:59
Post #26


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



Since you used sequential ABX tests, with a maximum of 50 trials and stopping at p<=0.05, then, according to this post, the corrected p-value for the successful ones is
p=0.1579
We can see from your logs that, among the 60 possible original-vs-encoded ABX tests, you succeeded in 21 with p<=0.05. If you had been guessing, about 9 successes would have been expected instead of 21.
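
The figures in this post can be checked with a short calculation. The sketch below is only an illustration, not the method from the linked post: it assumes that the one-sided binomial p-value is checked after every trial and that the tester stops at the first p<=0.05 or after 50 trials. The function names are made up for the example; 0.1579 is simply the corrected value quoted above.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sequential_false_positive_rate(max_trials=50, alpha=0.05):
    """Chance that a pure guesser reaches p <= alpha at some point.

    Assumed stopping rule: the one-sided p-value is checked after every
    trial, and the test stops at the first p <= alpha.
    """
    alive = {0: 1.0}  # alive[k] = P(k correct so far, test not stopped yet)
    stopped = 0.0
    for n in range(1, max_trials + 1):
        nxt = {}
        for k, prob in alive.items():
            for k2 in (k, k + 1):  # wrong or right answer, 50/50 when guessing
                nxt[k2] = nxt.get(k2, 0.0) + prob * 0.5
        alive = {}
        for k, prob in nxt.items():
            if binom_tail(k, n) <= alpha:
                stopped += prob  # this path counts as a "successful" ABX test
            else:
                alive[k] = prob
    return stopped


corrected_p = 0.1579  # value quoted in the post
n_tests = 60          # original-vs-encoded ABX tests in the logs

print("false-positive rate under the assumed rule:", sequential_false_positive_rate())
print("successes expected from guessing:", n_tests * corrected_p)
print("P(21 or more successes by guessing):", binom_tail(21, n_tests, corrected_p))
```

The last line treats the 60 tests as independent, which is itself an assumption.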
Pio2001
post Jul 12 2004, 20:18
Post #27


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



I fed this table into ff123's analyzer:

CODE
MP3-V2    MP3-V3    MPC-q5    MGX-q5.5  MGX-q5.99 MGX-q6    
2.00      1.50      3.00      2.00      2.00      3.20      
1.50      1.00      4.00      2.90      2.90      3.50      
3.00      2.50      2.80      3.00      3.30      4.00      
3.00      2.00      4.00      2.00      2.00      2.30      
1.50      1.00      4.90      2.50      2.50      3.30      
3.00      1.80      3.80      2.20      2.40      3.00      
1.50      1.20      3.50      1.80      2.30      3.40      
1.50      2.70      4.00      2.00      2.00      2.30      
3.00      2.80      4.20      1.60      1.50      3.00      
3.00      2.30      4.00      2.30      2.50      3.50      



I chose ANOVA, p=0.05, which gives:

CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 10
Critical significance:  0.05
Significance of data: 1.24E-08 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total               59          45.38
Testers (blocks)     9           3.67
Codecs eval'd        5          26.01    5.20   14.92  1.24E-08
Error               45          15.69    0.35
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.532

Means:

MPC-q5   MGX-q6   MGX-q5.9 MP3-V2   MGX-q5.5 MP3-V3  
 3.82     3.15     2.34     2.30     2.23     1.88  

---------------------------- p-value Matrix ---------------------------

        MGX-q6   MGX-q5.9 MP3-V2   MGX-q5.5 MP3-V3  
MPC-q5   0.015*   0.000*   0.000*   0.000*   0.000*  
MGX-q6            0.004*   0.002*   0.001*   0.000*  
MGX-q5.9                   0.880    0.679    0.088    
MP3-V2                              0.792    0.119    
MGX-q5.5                                     0.192    
-----------------------------------------------------------------------

MPC-q5 is better than MGX-q6, MGX-q5.99, MP3-V2, MGX-q5.5, MP3-V3
MGX-q6 is better than MGX-q5.99, MP3-V2, MGX-q5.5, MP3-V3


Conclusion: if I understand the above properly, for guruboolez's ears and samples,
  • MPC standard is the winner
  • Vorbis Megamix -q6 is second
  • All others are tied for third place.
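
The ANOVA table above can be reproduced from the ratings with the textbook randomized-block formulas. This is a minimal sketch, not ff123's actual source code; `blocked_anova` and its dictionary keys are names invented for the example. It stops at the F statistic, because turning F into a p-value needs an F-distribution routine that the Python standard library does not provide.

```python
CODECS = ["MP3-V2", "MP3-V3", "MPC-q5", "MGX-q5.5", "MGX-q5.99", "MGX-q6"]

# One row per music sample (block), one column per codec, as in the table above.
RATINGS = [
    [2.00, 1.50, 3.00, 2.00, 2.00, 3.20],
    [1.50, 1.00, 4.00, 2.90, 2.90, 3.50],
    [3.00, 2.50, 2.80, 3.00, 3.30, 4.00],
    [3.00, 2.00, 4.00, 2.00, 2.00, 2.30],
    [1.50, 1.00, 4.90, 2.50, 2.50, 3.30],
    [3.00, 1.80, 3.80, 2.20, 2.40, 3.00],
    [1.50, 1.20, 3.50, 1.80, 2.30, 3.40],
    [1.50, 2.70, 4.00, 2.00, 2.00, 2.30],
    [3.00, 2.80, 4.20, 1.60, 1.50, 3.00],
    [3.00, 2.30, 4.00, 2.30, 2.50, 3.50],
]


def blocked_anova(rows):
    """Sums of squares and F statistic for a randomized block design."""
    b, t = len(rows), len(rows[0])           # blocks (samples), treatments (codecs)
    correction = sum(map(sum, rows)) ** 2 / (b * t)
    ss_total = sum(x * x for row in rows for x in row) - correction
    ss_blocks = sum(sum(row) ** 2 for row in rows) / t - correction
    col_sums = [sum(row[j] for row in rows) for j in range(t)]
    ss_codecs = sum(c * c for c in col_sums) / b - correction
    ss_error = ss_total - ss_blocks - ss_codecs
    df_codecs, df_error = t - 1, (b - 1) * (t - 1)
    f_stat = (ss_codecs / df_codecs) / (ss_error / df_error)
    return {
        "ss_total": ss_total,
        "ss_blocks": ss_blocks,
        "ss_codecs": ss_codecs,
        "ss_error": ss_error,
        "f": f_stat,
        "means": {name: s / b for name, s in zip(CODECS, col_sums)},
    }


result = blocked_anova(RATINGS)
for key in ("ss_total", "ss_blocks", "ss_codecs", "ss_error", "f"):
    print(f"{key:10s} {result[key]:.2f}")
for name, mean in sorted(result["means"].items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {mean:.2f}")
```

The printed sums of squares, F value and means should agree with the analyzer output above to two decimals.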
guruboolez
post Jul 13 2004, 02:08
Post #28





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



First, thanks for the analysis. I couldn't have done it myself. But...

I wonder: lame -V 3 sounded the worst on 8 of 10 samples; and on one of the two remaining samples, -V 3 received the same rating as vorbis -q 5.50 and a lower rating than -q 5.99. Lame -V 3 sometimes shows weird artifacts (organ, harpsichord) that are not audible with vorbis.
In short, lame -V 3 is worse than vorbis -q 5.99 eight times, equal once, and better once, and it has the stronger artifacts. That's why there is no doubt that -V 3 is not competitive against the other contenders.


So how is it possible that a statistical tool concludes that both encoders are "identical"? I'm unfortunately not a statistician, but I was the tester, not Mr Friedman ;) For me, it defies common sense, or at least my overall impression.

Is this kind of analysis suited to results obtained by ONE listener on MULTIPLE samples? I saw:
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of ***listeners***: 10


Could someone enlighten me?

This post has been edited by guruboolez: Jul 13 2004, 02:14
ff123
post Jul 13 2004, 02:17
Post #29


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (guruboolez @ Jul 12 2004, 05:08 PM)
Is this kind of analysis suited to results obtained by ONE listener on MULTIPLE samples? I saw:
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of ***listeners***: 10


Could someone enlighten me?
*


The tool does make the assumption that if you were to draw a histogram of the music samples by "difficulty" (average rating across all codecs), you would end up with a bell curve. But even if this assumption is violated, the analysis is robust enough that you'd probably still get a reasonable answer.

Short answer: you can replace "listeners" with "music samples" to give an indication of which encoder you personally prefer, with Pio's important qualifications that the results apply only for you, and only for the group of samples you tested.

ff123

This post has been edited by ff123: Jul 13 2004, 02:18
guruboolez
post Jul 13 2004, 02:30
Post #30





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



But how do you explain the fact that this analysis completely changes the listener's conclusions? In this example, how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.

This post has been edited by guruboolez: Jul 13 2004, 02:35
ff123
post Jul 13 2004, 04:03
Post #31


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (guruboolez @ Jul 12 2004, 05:30 PM)
But how do you explain the fact that this analysis completely changes the listener's conclusions? In this example, how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.
*


MGX-q5.99 is better than MP3-V3 with a p-value of 0.088, so it doesn't meet statistical significance, but the numbers suggest it is better. You'd probably get more definitive results with a handful more samples.

ff123
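
The pairwise verdicts in the analyzer output follow from comparing each difference of means with Fisher's protected LSD (0.532). A minimal sketch of that comparison, using only the means and the LSD printed in the output earlier in the thread; `significant_pairs` is a hypothetical helper name, and the codec labels are the truncated ones the tool printed:

```python
from itertools import combinations

# Means and LSD copied from the analyzer output quoted earlier in the thread.
means = {
    "MPC-q5": 3.82,
    "MGX-q6": 3.15,
    "MGX-q5.9": 2.34,
    "MP3-V2": 2.30,
    "MGX-q5.5": 2.23,
    "MP3-V3": 1.88,
}
lsd = 0.532


def significant_pairs(means, lsd):
    """Pairs of codecs whose mean ratings differ by more than the LSD."""
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > lsd
    ]


for a, b in significant_pairs(means, lsd):
    print(f"{a} vs {b}: difference {means[a] - means[b]:.2f} > {lsd}")

# The pair discussed in this post falls short of the threshold:
gap = means["MGX-q5.9"] - means["MP3-V3"]
print(f"MGX-q5.9 vs MP3-V3: difference {gap:.2f} < {lsd}")
```

The gap between MGX-q5.99 and MP3-V3 is 0.46, the largest of the non-significant differences, which fits the borderline p-value of 0.088.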
guruboolez
post Jul 13 2004, 04:13
Post #32





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



OK. Another question: are these "confidence values" linked to the ratings, or to the ABX results?
If I had chosen to stay close to the EBU (or ITU, I never remember which) grading scale, with most ratings falling between 4 and 5 (rather than 1 and 4), wouldn't the confidence margin be ruined?
ff123
post Jul 13 2004, 05:07
Post #33


ABC/HR developer, ff123.net admin


Group: Developer (Donating)
Posts: 1396
Joined: 24-September 01
Member No.: 12



QUOTE (guruboolez @ Jul 12 2004, 07:13 PM)
OK. Another question: are these "confidence values" linked to the ratings, or to the ABX results?
If I had chosen to stay close to the EBU (or ITU, I never remember which) grading scale, with most ratings falling between 4 and 5 (rather than 1 and 4), wouldn't the confidence margin be ruined?
*


ABX results are not considered at all when the ANOVA results are computed.

It doesn't matter at all whether you use a rating scale from 1 to 5 or from 1 to 10. The only thing that matters is the relative difference between the codecs. Also, the fact that the analysis is "blocked" means that the program accounts for the fact that some music samples (the difficult ones) have lower average ratings than others.

The single best way to improve confidence in your results is to listen to as many different samples as possible.

ff123
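
Both points in this post can be illustrated numerically. The sketch below uses a small made-up ratings table (not guruboolez's data) and a hypothetical helper `blocked_f` that computes the blocked-ANOVA F statistic. It shows that F is unchanged when the 1-to-5 ratings are squeezed linearly into the 4-to-5 range, and when a whole sample is rated uniformly harder or easier.

```python
from math import isclose


def blocked_f(rows):
    """F statistic for the codec effect in a randomized block design."""
    b, t = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (b * t)
    row_means = [sum(row) / t for row in rows]
    col_means = [sum(row[j] for row in rows) / b for j in range(t)]
    ss_codecs = b * sum((m - grand) ** 2 for m in col_means)
    ss_error = sum(
        (rows[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(b)
        for j in range(t)
    )
    return (ss_codecs / (t - 1)) / (ss_error / ((b - 1) * (t - 1)))


# Made-up ratings: 3 samples (rows) x 3 codecs (columns), for illustration only.
ratings = [
    [2.0, 3.0, 4.0],
    [1.5, 2.0, 3.5],
    [3.0, 3.5, 4.0],
]

# Linear squeeze of the 1..5 scale into 4..5 (1 -> 4, 5 -> 5).
compressed = [[0.25 * x + 3.75 for x in row] for row in ratings]

# Each sample made uniformly "easier" or "harder" by a constant offset.
shifted = [[x + d for x in row] for row, d in zip(ratings, (0.0, -0.5, 1.0))]

print(blocked_f(ratings), blocked_f(compressed), blocked_f(shifted))
assert isclose(blocked_f(ratings), blocked_f(compressed), rel_tol=1e-9)
assert isclose(blocked_f(ratings), blocked_f(shifted), rel_tol=1e-9)
```

The invariance only covers a linear change of scale. If a compressed scale also made the listener's ratings coarser or noisier, the result could still change.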
Pio2001
post Jul 13 2004, 11:15
Post #34


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



QUOTE (ff123 @ Jul 13 2004, 04:03 AM)
QUOTE (guruboolez @ Jul 12 2004, 05:30 PM)
how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.
*


MGX-q5.99 is better than MP3-V3 with a p-value of 0.088,
*



In other words, it's not impossible that -V3 was inferior 80% of the time by chance, because the differences between the ratings are not that large compared to the random variation in your ratings.
The result might have been different if I had chosen a threshold above 0.05. This means that, although not significant at this threshold, -V3 is nonetheless likely inferior to q5.99. It's likely, but not certain.
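
As a rough illustration of "80% of the time by chance": if each of the 10 samples were an independent coin flip between the two encoders (a simplifying assumption that ignores ties and the size of the rating differences, and is not what the ANOVA computes), the chance of one encoder losing at least 8 times out of 10 can be counted directly. `prob_at_least` is a name made up for this sketch.

```python
from math import comb


def prob_at_least(k, n):
    """P(at least k losses out of n) when each sample is a fair coin flip."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


p = prob_at_least(8, 10)
print(p)  # 56/1024 = 0.0546875
```

That is just above the 0.05 threshold, so on its own an 8-out-of-10 count is suggestive rather than conclusive.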
guruboolez
post Jul 13 2004, 12:01
Post #35





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



Anyway, I plan to add more results progressively over time :)
2Bdecided
post Jul 13 2004, 13:07
Post #36


ReplayGain developer


Group: Developer
Posts: 4945
Joined: 5-November 01
From: Yorkshire, UK
Member No.: 409



Fascinating thread. Thank you guruboolez!

D.
westgroveg
post Jul 13 2004, 13:16
Post #37





Group: Members
Posts: 1235
Joined: 5-October 01
Member No.: 220



QUOTE (guruboolez @ Jul 13 2004, 11:01 PM)
Anyway, I plan to add more results progressively over time :)
*

Great, thanks a lot for sharing your results with us guruboolez.
phong
post Jul 13 2004, 15:53
Post #38





Group: Members
Posts: 346
Joined: 7-July 03
From: 15 & Ryan
Member No.: 7619



This may be the thread that pushes me into actually reading some vorbis code. It would be interesting to find the real culprit behind this 5.99 -> 6.0 discontinuity, or at least eliminate some possibilities. For example, it would be interesting to produce an encoder (just for testing purposes) that turned on lossless channel coupling at 5 instead of 6. Based on others' posts though, I doubt that's the culprit.

So where's the gremlin in our Cheerios?

Another issue is the whole point of the -q settings... According to vorbis docs, if you pick a -q setting, future versions of vorbis will have the same "quality" at that setting but at a lower bitrate. In the tuned versions that are being produced, mostly the quality has increased but at the expense of increasing bitrate. "In theory" the whole scale could be adjusted so that the same -q levels produced the same bitrates, or if there were some way to quantify quality, they could produce the same quality at a lower bitrate. "In practice" that seems technically difficult, not to mention there is no consistent definition of what each -q level is supposed to achieve, or a standard corpus of music to benchmark bitrates on.

A common question is what the "transparency setting" for a given codec is. Strictly speaking, the answer is always "listen for yourself". For mp3 or mpc, the practical answer is "lame --preset standard" or "mpc --standard". For vorbis, no one can agree because nobody ever decided on any particular "excellence step" (to steal guru's terminology, which I hope becomes a meme). Some will say "start with -q 4 and work your way up", others will recommend -q 5 or -q 6 (which, from these and previous tests, is the one I think is best supported by the evidence). Even at -q 6, does vorbis even approach the consistency of mpc --standard, or even lame aps?

I guess the good news is that there's lots of interest in tuning vorbis suddenly after what seems like years of inactivity. Maybe there is finally some progress to some sort of excellence step.

This post has been edited by phong: Jul 13 2004, 15:54


--------------------
I am *expanding!* It is so much *squishy* to *smell* you! *Campers* are the best! I have *anticipation* and then what? Better parties in *the middle* for sure.
http://www.phong.org/
guruboolez
post Jul 13 2004, 16:09
Post #39





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (phong @ Jul 13 2004, 03:53 PM)
This may be the thread that pushes me into actually reading some vorbis code.  It would be interesting to find the real culprit behind this 5.99 -> 6.0 discontinuity, or at least eliminate some possibilities.  For example, it would be interesting to produce an encoder (just for testing purposes) that turned on lossless channel coupling at 5 instead of 6.  Based on others' posts though, I doubt that's the culprit.
*

Uncoupled vorbis encoders were released by QuantumKnot and Aoyumi (or Nyaochi, or Harashin, I can't remember), and the coarseness of vorbis disappeared, even at lower -q settings. But the bitrate is seriously higher.

Anyway, the aoTuV tuning severely reduces this problem. But some traces remain...

This post has been edited by guruboolez: Jul 13 2004, 16:12
ScorLibran
post Jul 14 2004, 04:07
Post #40





Group: Banned
Posts: 769
Joined: 1-July 03
Member No.: 7495



Thanks for the time and effort you put into this test, guru. It provides invaluable info for those of us interested in these codecs, but without enough time to perform the tests ourselves.

I'd like to perform a similar test using a sample set of rock music. Though since my hearing sensitivity isn't NEAR what yours is, I may end up not being able to distinguish any differences at this bitrate. I can at least try, though.
indybrett
post Jul 14 2004, 04:09
Post #41





Group: Members (Donating)
Posts: 1350
Joined: 4-March 02
From: Indianapolis, IN
Member No.: 1440



@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)

This post has been edited by indybrett: Jul 14 2004, 04:12


--------------------
flac>fb2k>kernel streaming>audiophile 2496>magni>dt990 pro
QuantumKnot
post Jul 14 2004, 04:16
Post #42





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



QUOTE (phong @ Jul 14 2004, 12:53 AM)
A common question is what the "transparency setting" for a given codec is. Strictly speaking, the answer is always "listen for yourself". For mp3 or mpc, the practical answer is "lame --preset standard" or "mpc --standard". For vorbis, no one can agree because nobody ever decided on any particular "excellence step" (to steal guru's terminology, which I hope becomes a meme). Some will say "start with -q 4 and work your way up", others will recommend -q 5 or -q 6 (which, from these and previous tests, is the one I think is best supported by the evidence). Even at -q 6, does vorbis even approach the consistency of mpc --standard, or even lame aps?

*


One of the problems with Vorbis quality is that it doesn't seem consistent. At -q 4.35, Roberto's 128 kbps listening test showed that aoTuV beta 2 was quite good in quality. But as we go up the q scale, bitrate gets consistently higher, yet problems still exist here and there. There doesn't seem to be a particular q that is transparent. Either it is pre-echo that kills transparency or coarse rendering or something else. I think more tuning needs to be done in the q 5,6,7 range to iron out all these problems.
QuantumKnot
post Jul 14 2004, 04:18
Post #43





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



QUOTE (indybrett @ Jul 14 2004, 01:09 PM)
@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)
*


I think only the wonderful ears of guruboolez or other golden-eared members can answer that question with certainty. For me, the only concern is whether or not I've missed something again while doing the merging. :(
guruboolez
post Jul 14 2004, 11:32
Post #44





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



QUOTE (indybrett @ Jul 14 2004, 04:09 AM)
@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)
*

One of the last files I added to this first batch of results is Orion II.wav, which is problematic with non-GT3 vorbis (the trombone generates something like micro-attacks). On this sample, the results would probably be much better:

http://audiotests.free.fr/tests/200..._megamix_q5.png
http://audiotests.free.fr/tests/200..._megamix_q6.png

As you can see, I heard serious improvements with GT3 very recently.

But I don't know whether I should retest this sample: would that be acceptable?

A second result might improve with megamix: I think it's the Weihnachts-Oratorium sample. There's a short passage with brass and, if I recall correctly the impression I had during the blind test, a slight blurring was audible with the vorbis encodings. But here I don't think the results could be much better.

For the eight other files, I don't know. Maybe the additional tunings performed by the SVN team have audible consequences on quality with all samples, good or bad. Megamix II was released before I saw any test of this 1.1 RC1, and before I had tested it myself.

[edit: in bold]

This post has been edited by guruboolez: Dec 29 2005, 21:49
QuantumKnot
post Jul 14 2004, 11:41
Post #45





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



QUOTE (guruboolez @ Jul 14 2004, 08:32 PM)
For the eight other files, I don't know. Maybe the additional tunings performed by the SVN team have audible consequences on quality with all samples, good or bad. Megamix II was released before I saw any test of this 1.1 RC1, and before I had tested it myself.
*


The impression I got from Monty's announcement and the commit logs is that 1.1 RC1 is essentially aoTuV beta 2 with some fixes for bitrate management and a tonality bug of some sort. Low pass filter cutoffs have changed (about 18 kHz now for q 4 as opposed to 20 kHz) but I'm not sure if that was in aoTuV beta 2 or is a new tweak.
indybrett
post Jul 15 2004, 03:04
Post #46





Group: Members (Donating)
Posts: 1350
Joined: 4-March 02
From: Indianapolis, IN
Member No.: 1440



I would really like to see FAAC in future tests, unless it is already known to be so inferior as to not be worth testing.

It's free (sort of), it's gapless, and there are nice encoder/frontends for it.

I could not even guess what quality setting would produce results equal to Lame -APS or Vorbis -q6, or if any quality setting would achieve this level of quality.

Edit: I suppose what I'm really saying is that it would be nice if it were being actively tuned the way Vorbis now is.

This post has been edited by indybrett: Jul 15 2004, 03:38


--------------------
flac>fb2k>kernel streaming>audiophile 2496>magni>dt990 pro
guruboolez
post Jul 15 2004, 03:39
Post #47





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



Well, I can't test every encoder. I'm not considering faac, because it's not really optimized for my taste. It also suffers from severe problems even at high bitrate, especially with vocal or other tonal signals, due to weird short-block artifacts (see the warbling with compostelle.flac).
BTW, even in the developer's opinion (Krzysztof aka knik), faac isn't optimized for high bitrate:

QUOTE
I don't think faac is very optimized for high bitrates (and it's still not very optimized at all). I usually use it at ~125kbps.

Author: knik
Date: 11-09-03 16:55

source

faac has improved with time, but this bug is still present.
Nero AAC or a hypothetical gapless QuickTime AAC encoder would be preferable in my opinion, though they are not free, and not as friendly as a CLI encoder like faac.

This post has been edited by guruboolez: Jul 15 2004, 03:39
indybrett
post Jul 15 2004, 03:41
Post #48





Group: Members (Donating)
Posts: 1350
Joined: 4-March 02
From: Indianapolis, IN
Member No.: 1440



Nero would be great, except you have to buy a rather large software package, and then use external software to encode from FLAC or anything worthwhile.

If only iTunes/Quicktime were gapless...


--------------------
flac>fb2k>kernel streaming>audiophile 2496>magni>dt990 pro
QuantumKnot
post Jul 15 2004, 04:12
Post #49





Group: Developer
Posts: 1245
Joined: 16-December 02
From: Australia
Member No.: 4097



QUOTE (indybrett @ Jul 15 2004, 12:04 PM)
Edit: I suppose what I'm really saying is that it would be nice if it were being actively tuned the way Vorbis now is.
*


If someone gave me an iPod, I'd probably be compelled to work on FAAC since it's in my interest to have a free VBR AAC encoder. :D Just kidding. ;) I wouldn't have much of a clue anyway.
guruboolez
post Jul 16 2004, 23:22
Post #50





Group: Members (Donating)
Posts: 3474
Joined: 7-November 01
From: Strasbourg (France)
Member No.: 420



indybrett, or anyone with some experience of faac: what setting would give me a bitrate of approximately 175 kbps? I have only a little experience with faac, and from what I've seen, the quality scale (-q) apparently doesn't correspond to a target bitrate (i.e. -q 100 doesn't seem to output 100 kbps, at least with some material; cf. Roberto's 128 kbps AAC test, where the setting was -q 115, not -q 128).

I'm interested in giving faac a chance (at least in a preliminary test), but I'm not really motivated to hunt for the ideal setting. I also don't want to test something and then be flamed for using the wrong settings. Help would be appreciated :)
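
One way to find such a setting empirically is to encode the test samples at a few -q values and compute the average bitrate from file size and duration. The sketch below only does the arithmetic; the file sizes, durations and -q values in it are invented placeholders, not measurements of faac, and the helper names are hypothetical.

```python
def average_kbps(files):
    """Average bitrate of (size_in_bytes, duration_in_seconds) pairs, in kbps."""
    total_bits = sum(size * 8 for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    return total_bits / total_seconds / 1000  # 1 kbps = 1000 bits per second


def closest_setting(encodes, target_kbps):
    """Return the setting whose average bitrate is nearest to the target."""
    return min(encodes, key=lambda q: abs(average_kbps(encodes[q]) - target_kbps))


# Placeholder data: quality setting -> encoded files (bytes, seconds).
encodes = {
    100: [(1_500_000, 100.0)],
    150: [(2_150_000, 100.0)],
    200: [(2_800_000, 100.0)],
}

for q, files in encodes.items():
    print(f"-q {q}: {average_kbps(files):.0f} kbps")
print("closest to 175 kbps:", closest_setting(encodes, 175))
```

Averaging over the whole sample set matters here, since a quality-based setting can produce very different bitrates from one sample to the next.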

This post has been edited by guruboolez: Jul 16 2004, 23:25