Full Version: MPC vs OGG VORBIS vs MP3 at 175 kbps
guruboolez
PRELIMINARY NOTES

• My internet access is now very limited. Therefore, the encoders I’m using for my tests are not necessarily the most recent ones available on the web. These tests were done when vorbis 1.1 RC1 was released, but I didn’t have access to this information…

• This test is something like a work-in-progress. I plan to add more results with time.



I. PURPOSE OF THE TEST

Like many people on this board, my main motivation for audio encoding lies in the possibility of listening to and enjoying music in high quality directly from the computer, which allows very fast browsing and access to an entire record collection. High quality encoding is a requirement, security a need. I successively used lame mp3, musepack and now lossless, which offers the security of digital data identical to the CD.
Nevertheless, lossy encoding is still interesting: modern hard disks are not necessarily big enough for every collection, and I think there are some benefits in feeding an expensive digital jukebox with audio encodings that are “better than just good”, where “just good” would be something like AAC/Vorbis at 128 kbps: fine but perfectible.

The choice of the best lossy encoder isn’t really problematic. Musepack (mpc) still wins most approval, and is considered fully transparent with the --standard preset. Nevertheless, several elements encouraged me to seriously question this leading position of mpc.

• 1/ by occasionally testing the standard preset of mpc, I discovered that small differences are sometimes audible with ordinary music. Now if mpc isn’t fully transparent at 175 kbps, this format is definitely comparable (which doesn’t mean “equal”) to other lossy solutions, which suffer from the same limitation.

• 2/ the leading position of mpc was established a long time ago. It was declared the “best lossy format” when the challengers were not very strong: beta versions of vorbis, lame < 3.90, suboptimal aac encoders. But now there are powerful vorbis encoders (the recent “megamix” merge looks like a serious challenger), optimized AAC encoders (QuickTime CBR and Nero VBR), and mature MP3 solutions (the VBR presets of lame). The leading position must therefore be questioned again, at least by people able to detect differences.

• 3/ This challenge becomes necessary with the growing number of devices supporting new audio formats like AAC and Vorbis. MPC is still confined to the computer, or in the best case to PDAs, and is maybe doomed to this limited usage.


Consequently, I’ve tried to pit mpc --standard against other serious encoding solutions, in order to get a better, up-to-date and personal idea of the relative quality of this encoder compared to modern and convenient challengers.



II. CHALLENGERS

Against musepack --standard, I decided to pit two formats: MP3 with lame 3.97a3 and OGG VORBIS with the recent combined encoder named “megamix”. Some explanations follow.

• first, there is no AAC encoder in the arena. I was tempted to use Nero AAC, but the latest version I have (2.6.2.0) has some recognized quality problems and is about to be made obsolete by the third version of Ivan Dimkovic’s encoder. No need to test something outdated… I was also tempted to take QuickTime AAC, though it’s not VBR and not very flexible (nothing between 160 and 192 kbps: annoying for a fair comparison with MPC --standard). But this encoder is not really suitable, in my opinion, for HQ listening, at least when the user is fond of opera and when most of his CDs absolutely need real gapless playback. AAC will be added later, but for now it’s absent from this test.


• the choice of the lame MP3 version is highly problematic too. Three choices are possible: the last “tested” release (3.90.3), the last gold release (3.96) or the last alpha release (3.97 alpha 3). I’ve decided not to use 3.90.3. I know that for some people this encoder is the best mp3 codec ever released; I also know that, for historical reasons, 3.90.3 is probably the safest choice. But the difference between the dead 3.90.x branch and the active 3.9x one is not only related to quality: the 3.9x releases are much faster (not a luxury considering the slowness of the 3.90.x presets), more complete (a full and redesigned VBR preset scale: the nice -V 5 used in Roberto’s listening test is, for example, a new feature unavailable in 3.90.x), and, last but not least, in perpetual evolution. There’s nobody to correct flaws in 3.90.x, whereas bugs audible with 3.9x could be corrected or reduced by Gabriel, Robert, Takehiro and the other developers.
I definitively ruled out 3.90.x for another important reason: there’s no VBR preset corresponding to the MPC --standard bitrate. --alt-preset standard is clearly too high, --medium too low, the -Y switch is a hack, and ABR is probably not efficient enough. With the 3.9x branch, there’s an existing preset between --standard and --medium: -V 3. And the -V 3 average bitrate should be close to the MPC --standard one.

Then: 3.96 “gold” or 3.97 alpha? I decided on the alpha release. I know the risks (of regression, though there is also a chance of progress). I also know that 3.96 is buggy in its fast mode: this convinced me to use a corrected release, even though the test doesn’t concern the fast mode of lame.


• the choice of the vorbis version is less problematic. Recent tests were done. CVS/GT3b2 couldn’t resist the aoTuV/GT3/QK32 dream team (aka megamix), at least up to -q 5,99. And even higher, GT3b2 (the previous reference encoder for high bitrates) doesn’t really sound superior (except maybe for one family of problems: micro-attacks). I also recall that I began this test unaware of the release of 1.1 RC1. This latest encoder nevertheless seems to be inferior to “megamix” (the essential but maybe ‘excessive’ tunings of Garf, used at settings above -q 5,00, are apparently missing from this RC1 version). The use of “megamix” is therefore pertinent, and my test is probably not outdated by this welcome pre-release of oggenc 1.1.


• I haven’t forgotten the promising WMApro: I was really pleased, even enthusiastic, about the quality reached by this format with classical music at mid bitrates. Nevertheless, I didn’t include this format in the test. First, I had to limit the number of competitors. Second, I’m not familiar with this encoder and don’t know which setting is best (which VBR mode? Is the WMApro VBR implementation reliable, or is 2-pass ABR preferable? etc.). Last: there is still no hardware device for WMApro (though that’s not a reason to exclude an audio format from a test including MPC, it’s a disappointing situation).



III. SAMPLES

Mid/High bitrate tests are, for me at least, especially painful. It doesn't mean that I hate them, quite the opposite. ...
Samples only concern « classical music », with one exception. I deliberately limited my choice to music I like. It's not snobbery, and it's not an egocentric attitude: other music is much harder for me to ABX, and my motivation would quickly disappear with music I don't really like. In other words, the scope of these results is VERY LIMITED: they concern my subjectivity (and only mine), and a particular genre of samples (natural instruments, recorded according to high-fidelity principles and not to the marketing “loud” one).
There are solo instruments (organ with Dom Bedos; harpsichord with Fuga; trombones with Orion II), instruments with small accompaniment (cymbals with Krall and Marche Royale; drums with Marche Royale, 2nd part), orchestra (Weihnachts-Oratorium and Platée), chorus (Ich bin der Welt abhanden gekommen) and voice (“Dover, giustizia, amor”). Additional information (artist, performer…) is available in the file tags.



IV. SETTINGS

Comparing VBR encoders/settings is problematic. The ideal approach is to fix a target bitrate, and then to find the corresponding preset for each encoder. I followed the usual (and IMHO the best) methodology: the setting must be calibrated on a wide selection of music, not on the selected samples.
The target bitrate is the average bitrate of the MUSEPACK --standard preset. This average can’t be evaluated precisely: it’s somewhere between 170 and 180 kbps, so approximately 175 kbps. I have verified this value with my classical music library, and people have reported similar values with completely different music.
The remaining task is now to find the corresponding VBR settings for LAME MP3 and Vorbis “megamix”.
This is where the problems begin…
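
As a rough illustration of the calibration idea described above (matching a VBR setting to a target on a wide selection of music rather than on the test samples), the average bitrate of a whole library can be derived from total encoded size and total duration. This is only a minimal sketch: the function name and the example figures are hypothetical and are not taken from the test.

```python
def average_bitrate_kbps(tracks):
    """Duration-weighted average bitrate of a set of encoded tracks.

    tracks: iterable of (size_in_bytes, duration_in_seconds) pairs.
    """
    tracks = list(tracks)
    total_bits = sum(size * 8 for size, _ in tracks)
    total_seconds = sum(duration for _, duration in tracks)
    if total_seconds <= 0:
        raise ValueError("total duration must be positive")
    return total_bits / total_seconds / 1000.0  # 1 kbps = 1000 bit/s


# Hypothetical figures, for illustration only.
library = [
    (6_562_500, 300.0),  # a track encoded at 175 kbps
    (3_825_000, 180.0),  # a track encoded at 170 kbps
    (9_450_000, 420.0),  # a track encoded at 180 kbps
]
print(f"{average_bitrate_kbps(library):.1f} kbps")
```

Dividing total bits by total time (instead of averaging the per-track bitrates) keeps long tracks from being under-weighted, which matters when comparing settings across whole albums.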


4.1. VORBIS SETTINGS

• The biggest problem lies in the difference in vorbis’ average bitrate, at the same setting, depending on the kind of music encoded. Classical is bitrate-friendly compared to most other stereo and modern material. With the CVS encoder, I estimated this difference at 10…15 kbps on average for -q 5…6. With “megamix” (or other GT3b2-based encoders), this difference might reach 25…30 kbps for the same setting. I don’t know what to do…
- by testing vorbis with a -q value corresponding to 175 kbps for classical but 200…210 kbps for pop/rock… people may blame me for pitting an advantaged vorbis challenger against musepack.
- by testing vorbis with a -q value corresponding to 175 kbps for pop/rock but 140 kbps for classical, the test would be pointless for me (the winner between mpc@175 and vorbis@145 isn’t very hard to guess…).
- by testing vorbis with a half-baked -q value, I fear that the test won’t correspond to either situation.

• The second big problem is related to a break in the linearity of the vorbis quality scale. Between -q 5,99 and -q 6,00, there’s a substantial bitrate difference (~10 kbps), which also corresponds to a serious quality difference, at least with vorbis 1.00 – 1.01 (including GT3b2). aoTuV (and therefore “megamix”) is based on the same code, but its tuning tried to correct or minimize the quality gap between the two settings. I discovered that for classical music, the fair vorbis setting is very close to this 5,99 value. 6,00 is slightly too high, and I could disadvantage mpc by comparing it to vorbis -q 6,00. On the other hand, I have the feeling that -q 6,00 would show the full potential of vorbis, and that the extra 8…10 kbps could be worth it for daily use. Would someone give up the correction of a quality bug available at a low price (a +5% increase in file size), especially with archiving in mind? Seriously, I don’t think so.

For all these reasons, I’ve decided to put vorbis megamix in the arena at three different settings:
-q 6,00: clearly too “heavy” compared to mpc --standard with non-classical music, but interesting to test against -q 5,99 (to see if the frontier between these two settings still exists with aoTuV/Megamix/1.1)
-q 5,99: the setting whose bitrate matches mpc --standard for classical music (still too heavy with other music), but with maybe suboptimal quality for vorbis
-q 5,50: a more universal setting for an acceptable test against mpc --standard. It would be interesting to compare the quality difference between 5,50/5,99 and 5,99/6,00. I suspect (and fear) a much greater jump within the latter pair than within the former.


4.2. LAME SETTINGS

I discovered that the bitrate of the -V 3 preset (lame 3.97a3) is really close to the average bitrate of mpc --standard. This applies at least to classical music (I don’t have enough material to measure the average bitrate on other musical genres). -V 3 will therefore be tested.
I’ve also decided to add -V 2 (--preset standard). The bitrate is higher, but I really want to see if this historically leading preset of lame MP3 is competitive against musepack. It will also be interesting to see how lame -V 2 performs compared to vorbis megamix, which is also playable on portable players, but with bad consequences for battery life.


4.3. BITRATE TABLE

Instead of posting a bitrate table for the short samples used in the test, I prefer posting data about more audio material. Average bitrates for ~20 albums (mostly classical), and additional data for tracks coming from 50 different CDs (+15 others in mono), are available in the following tables:
OpenOffice: http://audiotests.free.fr/tests/200...RATE175kbps.sxc
Excel: http://audiotests.free.fr/tests/200...RATE175kbps.xls




V. RESULTS AND CONCLUSIONS





First comment: I've added 10 points to each score. I had to find a solution to prevent misinterpretation of scores which could at first appear excessively severe. I didn't use a low anchor for this test, and slight flaws sometimes appear terribly annoying in such tests, which lowers the scores considerably. By artificially shifting all the scores, I also intended to disconnect my notation from the EBU scale (4 = "perceptible but not annoying"; 3 = "slightly annoying", etc.).



With only 10 results, I can’t draw strong conclusions. But some elements of a conclusion are now appearing:

MPC --standard has a serious chance of being the best of the three competitors: eight times in first place, once in second, and never in last. Very good performance. We can also note that the --standard setting wasn’t sufficient to reach “transparency” (except for the organ sample, with negative ABX tests). Nevertheless, I can seriously expect full transparency with a higher setting: none of these samples (except maybe the chorus one) showed severe artifacts, just slight differences. It’s typically the kind of “problem” that disappears with a higher bitrate. Anyway, I’m impressed, because I didn’t think that MPC --standard was so far ahead...

LAME MP3 has, in my opinion, little chance of competing with vorbis and musepack at ~175 kbps. The new -V 3 setting sits in last place eight times: too many, even with a limited set of samples. It doesn't mean that -V 3 sounds bad, it's just inferior to modern lossy formats at a similar bitrate. But with improvements, who knows...
But the -V 2 setting (aka --alt-preset standard) is apparently competitive, and can fight (and sometimes win) against vorbis “megamix” -q 5.50 and -q 5.99. The only problem: the bitrate is not the same anymore (195 kbps vs something between 162 and 180 kbps, though with classical music only). It must also be pointed out that LAME -V 2 and -V 3 suffer from huge artifacts (the harpsichord and organ samples are severely damaged to my ears), whereas vorbis artifacts were never so bad (except, maybe, with the Orion II sample: micro-attack problems).
In short, LAME -V 2 (--preset standard) is apparently competitive with VORBIS “megamix” -q 5,99, at least with classical music. It would be interesting to see how both contenders perform with other kinds of music at the same settings, which implies a completely different bitrate range (+10..15% with vorbis, and maybe -x% with lame).

• I expected a lot from the vorbis mixture. The progress of “megamix” over the CVS encoder is really impressive, and I really wondered how it would perform against other challengers. In the end I’m disappointed, for several reasons:
- First, the coarse-sounding problem of the format is still audible with “megamix” up to 5,99. There is no need to suspect any of the GT3b2 or QK32 tunings of ruining the benefits of the original aoTuV in this area: the noise problem is particularly audible in “tonal” moments, which are encoded with pure aoTuV code (the samples are bit-for-bit identical between the aoTuV encoder and the megamix one). This additional noise is probably not too disturbing in daily listening, but in direct comparison with the other challengers, the contrast is still annoying. The problem doesn’t really lie in noise, but in a coarse rendering of voices or instruments: lack of subtlety, fat texture… I think that this problem is a legacy of an internal change that occurred during the RC3 development of Vorbis, in spring 2002. I think I established this fact at ~128 kbps some months ago (correct me if I’m wrong), and I suppose it’s still true at ~160…170 kbps, even with aoTuV (based on the same buggy “final” CVS code).
- Second reason to be disappointed: due to this remaining coarseness problem occurring up to -q 5,99, there’s still a substantial quality gap between this setting and the rounded -q 6,00. It’s my fault: I expected the aoTuV tuning to erase the existing frontier between -q 5,99 and -q 6,00, but this encoder only reduced the gap. There is a ~10 kbps difference between 5,50 and 5,99 but few quality improvements. There is also a 10 kbps difference between 5,99 and 6,00, but huge quality progress is audible. For daily use of the vorbis encoder, there’s no real problem with this difference: the 10 additional kbps of -q 6,00 are obviously worth it if someone is looking for high quality or archiving, and there’s no need to hesitate. But for my test or any similar one, this difference is much more problematic. On one side, I can’t pit mpc --standard against megamix -q 6,00 on a fair basis (the average bitrates don’t match anymore). On the other side, it’s pointless to compare mpc --standard to a handicapped vorbis setting (5,99). It’s like using musepack at --quality 4.99, which also suffers from problems (and a bitrate gap) that don’t exist anymore at --quality 5.00. Cruel dilemma…
- Third reason to be disappointed: even at -q 6,00 (with its 10 extra kbps), megamix apparently couldn’t reach the quality of musepack --standard. More samples are of course needed to reinforce these first conclusions, but I really fear that the solution doesn’t lie in the selection of samples, but rather in further development.



As I said at the very beginning, I consider this test a first step. Additional results should, and normally will, complete this first phase. I expect a quick release of the Nero AAC encoder in version 3.0.0.0 to add some spice to the test. A separate test, pitting vorbis megamix against the new 1.1, must also be done, in order to be sure that megamix is the best vorbis encoder at this bitrate.

I'd also like to see this test followed up by other people. It would help to compare different HQ encoders on an empirical basis. Feel free to post some results, even for one sample, in this topic.



APPENDIX. SAMPLE LOCATION

I've uploaded all samples to a temporary link. I can't keep them online for long, so don't wait if you're planning to do personal tests. ABX logs are available in each archive. Samples are in the OptimFROG lossless audio format.
guruboolez
Additional results (2004.08.22):




Cumulative results:




see this post on page 4 for more details.
westgroveg
QUOTE
The problem not really lies on noise, but on coarse rendering of voice or instruments: lack of subtlety, fat texture… I think that this problem is a legacy of internal change occurring during RC3 development of Vorbis, in spring 2002.

From what I can understand, this problem disappears after q5.99.
phong
Outstanding work as usual guru. I don't know if it would really solve the fairness issue, but could you increase the mpc setting to 5.1 or 5.2 or something to make it the same bitrate as megamix at -q 6? The bitrates could be matched that way without having to put either codec on the bad side of one of those annoying "thresholds".
guruboolez
QUOTE (westgroveg @ Jul 12 2004, 01:34 AM)
From what I can understand this this problem disappears after q5.99.

Yes, the complete range between -q -1 and -q 5,99 is affected by this phenomenon. It's easy to notice with CVS encoders (except 1.1). RC3 and earlier releases are probably free of this problem, and aoTuV/1.1 reduce the amount of coarseness.
It's a great shame that this quality frontier is located so high on the bitrate scale. At -q 4 or -q 5, it would be less annoying. But here, this fat sound also affects encodings at 170...210 kbps, an HQ setting which should be free of this kind of problem.
westgroveg
I think that, to be a fair test, MPC should use the q7/Insane profile, and LAME 3.90.3 should also be included using --alt-preset standard; this would put all formats at a similar bitrate.

The results of LAME 3.90.3 against 3.96 are not convincing enough.
guruboolez
QUOTE (phong @ Jul 12 2004, 01:36 AM)
Outstanding work as usual guru.  I don't know if it would really solve the fairness issue, but could you increase the mpc setting to 5.1 or 5.2 or something to make it the same bitrate as megamix at -q 6? The bitrates could be matched that way without having to put either codec on the bad side of one of those annoying "thresholds".

It's a solution, but I don't like it. It's not for the reference to be fitted to the challengers, but the opposite. Most people use the --standard preset with mpc. They won't use --quality 5.2 or 5.4 and waste bits.
The first step of excellence for mpc is --standard, which corresponds to ~175 kbps on average with the 1.14 encoder. If the first step of excellence for vorbis megamix lies at -q 6, which corresponds to 185...210 kbps, it's a vorbis handicap (a developers' choice; good or bad, I can't say), which shows that the first encoder is more efficient than the second one. In other words, there's an advantage in using mpc: the optimal quality is accessible at a lower bitrate. A test shouldn't upset this balance.

Anyway, even with a lower bitrate, mpc seems to keep some distance from megamix -q 6,00. I don't expect great changes from using a slightly higher setting for musepack.
westgroveg
QUOTE (guruboolez @ Jul 12 2004, 12:49 PM)
Anyway, even with lower bitrate, mpc seems to maintain some distance with megamix -q 6,00. I don't expect great changes by using a slightly higher setting for musepack.



QUOTE
We could also note that –standard setting wasn’t sufficient for reaching the “transparency” level (except for the organ sample, with negative ABX tests). Nevertheless, I could seriously expect full transparency with higher setting: none of this sample (except maybe the chorus one) showed severe artifacts, but just slight differences.


I think it would be interesting to see.
guruboolez
QUOTE (westgroveg @ Jul 12 2004, 01:59 AM)
QUOTE (guruboolez @ Jul 12 2004, 12:49 PM)
I don't expect great changes by using a slightly higher setting for musepack.



QUOTE
I could seriously expect full transparency with higher setting: none of this sample (except maybe the chorus one) showed severe artifacts, but just slight differences.



It might appear to be a contradiction, but in my past experience, problems are never solved by adding a few kbps. A more substantial increase (from standard to extreme, there's a 30 kbps difference) is, I'm sure, needed in most cases to "solve" problems (i.e. to lower the distortion level below the tester's threshold of hearing).

Adding 0.2...0.5 points to a quality level is rarely convincing: look at the difference between vorbis 5.50 and 5.99: almost nonexistent.
indybrett
QUOTE (guruboolez @ Jul 11 2004, 07:41 PM)
QUOTE (westgroveg @ Jul 12 2004, 01:34 AM)
From what I can understand this this problem disappears after q5.99.

Yes, the complete range between -q -1 and -q 5,99 is affected by this phenomenon. It's easy to notice with CVS encoders (except 1.1). RC3 and inferior release are probably free of this problem, and aoTuV/1.1 lower the amplitude of coarsness.
It's a great shame that this quality frontier is located so high in the bitrate scale. At -q4 or -q5, it would be less annoying. But here, this fat sound also affect encoding at 170...210 kbps, HQ setting which should be free of this kind of problem.



Is it coincidence that the problem goes away after q5.99, which also happens to be the point at which lossless channel coupling begins?
kjoonlee
QUOTE (indybrett @ Jul 12 2004, 10:54 AM)
Is it coincidence that the problem goes away after q5.99, which also happens to be the point at which lossless channel coupling begins?

Lossless channel coupling can be used below q5.99 as well. Q5.99 and below can use a mixture of lossy and lossless coupling if necessary. Q6 is the point at which lossy channel coupling is no longer used.
indybrett
QUOTE (kjoonlee @ Jul 11 2004, 09:16 PM)
QUOTE (indybrett @ Jul 12 2004, 10:54 AM)
Is it coincidence that the problem goes away after q5.99, which also happens to be the point at which lossless channel coupling begins?

Lossless channel coupling can be used below q5.99 as well. Q5.99 and below can use a mixture of lossy and lossless coupling if neccessary. Q6 is the point at which lossy channel coupling is no longer used.


So, is it coincidence that the problem goes away at the point at which lossy channel coupling is no longer used?
kjoonlee
QUOTE (indybrett @ Jul 12 2004, 11:17 AM)
So, is it coincidence that the problem goes away at the point at which lossy channel coupling is no longer used?

Could be, because q5.99 might have been using lossless coupling exclusively.
Faelix
QUOTE (guruboolez @ Jul 11 2004, 08:50 PM)
MPC is still confined to computer, or in best case on PDA – and is maybe doomed to this limited usage.


It would be wonderful if this best case were true, but no: on my Palm I can only listen to MP3, Ogg Vorbis and WMA. And I know the same applies to PocketPC, besides some obscure AAC player. Musepack is unfortunately really confined to computers.
QuantumKnot
Very interesting test. It does confirm the well-known weakness of Vorbis on classical music, and more work needs to be put in to correct this. I'm not sure what is causing the difference in quality between -q 5.99 and 6. The switching off of lossy stereo at -q 6 is one candidate, but point stereo only causes stereo collapse on high frequency bands. Noise normalisation also affects higher frequencies and turns off at -q 7, I think, so that may not be the reason either. hmm....I don't know.
Dologan
QUOTE (Faelix @ Jul 11 2004, 08:33 PM)
QUOTE (guruboolez @ Jul 11 2004, 08:50 PM)
MPC is still confined to computer, or in best case on PDA – and is maybe doomed to this limited usage.


It would be wonderful if this best case were true, but no: on my Palm I can only listen to MP3, Ogg Vorbis and WMA. And I know the same applies to PocketPC, besides some obscure AAC player. Musepack is unfortunately really confined to computers.


Hopefully, not for long. ;) See here.
Gabriel
Very interesting...
Pio2001
Thank you for sharing.

Do you have details on the ABX tests ? Did you do them for every sample ? Do you train before beginning ? How much ABX sessions do you perform ? What were the results ?

Last time you posted something like this, no one cared to perform a statistical analysis in order to rank the encoders with 95 % confidence bars. I guess I'll have to do it myself, but since I don't know how, it will take some time.
Pio2001
By the way, did you use ABC/HR ?
guruboolez
QUOTE (Pio2001 @ Jul 12 2004, 11:58 AM)
Do you have details on the ABX tests ?


If you're talking about ABX logs and comments, they're all in the .zip archive accompanying each sample. Not the best idea, I must say. I'll upload the log files in a separate, slim archive.


QUOTE
Did you do them for every sample ?


Yes, but for some files I gave up ABXing encoded files against encoded files. Sometimes the difference is very small, and these kinds of tests need much more concentration. I nevertheless tried to compare encoded files with each other when they shared the same kind of flaw, in order to get a better idea of which sounded better.

QUOTE
Do you train before beginning ?


No. I didn't use the latest ff123 ABC/HR software (which offers a training module). The only training I've done was with the Diana Krall sample. It's a sample I discovered some time ago, when I noticed that mpc --standard produces audible distortions on the cymbals. I first began my test with this sample as a dilettante, without ambition, comparing MPC against one Vorbis encoding and one MP3 encoding. Then I decided to avoid some possible criticism about bitrate by using a wider set of encodings for vorbis and mp3, in order to see how these formats perform even at higher bitrates: at their optimal quality (the "excellence step" for each format: --standard, --alt-preset standard, and -q 6,00).

QUOTE
How much ABX sessions do you perform ? What were the results ?


Generally, I stopped when the p-value was low enough. For some files, I ruined the results by making some mistakes. Being angry, I damaged the results even more. Therefore, for some files, I went up to 50 trials in order to reach a satisfying p-value again.
But you can find precise values by downloading the archive on my ftp.
robUx4
Could you consider adding WavPack hybrid mode (only using the lossy part) for similar bitrates ? Because if it doesn't perform bad, it could be a serious alternative (you can have both a lossy and lossless file).
guruboolez
QUOTE (robUx4 @ Jul 12 2004, 12:13 PM)
Could you consider adding WavPack hybrid mode (only using the lossy part) for similar bitrates ? Because if it doesn't perform bad, it could be a serious alternative (you can have both a lossy and lossless file).

Hybrid encoders perform poorly at this bitrate. At least with classical: they sound terribly noisy, and the coarseness of vorbis is nothing compared to them. These encoders (DualStream and WavPack lossy) are more interesting at ~300 kbps (or maybe lower, with very loud music, like metal). Otherwise, I would have included one of these hybrid encoders.
guruboolez
Log files are more easily accessible >> HERE <<
manusate
Very interesting as always, Guruboolez. Thank you very much.



Enjoy!
dev0
Celsus' trolling attempt has been split into the Recycle Bin.
Pio2001
Since you used sequential ABX tests, with a maximum of 50 trials, stopping at p<=0.05, then, according to this post, the corrected p value for the ones that are successful is
p=0.1579
We can see from your logs that among the 60 possible original vs encoded ABX tests, you succeeded in 21 of them with p<=0.05. If you were guessing, about 9 successes would have been expected instead of 21.
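
To make the reasoning above more concrete, here is a minimal sketch of why stopping "as soon as p<=0.05" inflates the false-positive rate. It assumes that the one-sided binomial p-value is checked after every single trial and that the listener gives up at 50 trials; the stopping rule used in the referenced post may differ, so this is not guaranteed to reproduce the 0.1579 figure exactly. The function names are mine.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: chance of >= `correct` hits by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def sequential_false_positive_rate(max_trials=50, alpha=0.05):
    """Probability that a pure guesser reaches p <= alpha at some point,
    when the p-value is checked after every trial up to max_trials."""
    alive = {0: 1.0}  # alive[k] = P(k correct so far and test not yet stopped)
    stopped = 0.0
    for n in range(1, max_trials + 1):
        nxt = {}
        for k, prob in alive.items():
            for hit in (0, 1):  # each guess is right with probability 1/2
                nxt[k + hit] = nxt.get(k + hit, 0.0) + prob * 0.5
        alive = {}
        for k, prob in nxt.items():
            if abx_p_value(k, n) <= alpha:
                stopped += prob  # the tester would stop here and claim success
            else:
                alive[k] = prob
    return stopped


rate = sequential_false_positive_rate()
expected_by_chance = 60 * 0.1579  # corrected p quoted above, times 60 tests
print(f"false-positive rate with optional stopping: {rate:.4f}")
print(f"successes expected from guessing alone: {expected_by_chance:.1f}")
```

The second figure is simply the number of tests multiplied by the corrected probability, which is where the "about 9 successes" comes from.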
Pio2001
I fed this table in ff123's analyzer :

CODE
MP3-V2    MP3-V3    MPC-q5    MGX-q5.5  MGX-q5.99 MGX-q6    
2.00      1.50      3.00      2.00      2.00      3.20      
1.50      1.00      4.00      2.90      2.90      3.50      
3.00      2.50      2.80      3.00      3.30      4.00      
3.00      2.00      4.00      2.00      2.00      2.30      
1.50      1.00      4.90      2.50      2.50      3.30      
3.00      1.80      3.80      2.20      2.40      3.00      
1.50      1.20      3.50      1.80      2.30      3.40      
1.50      2.70      4.00      2.00      2.00      2.30      
3.00      2.80      4.20      1.60      1.50      3.00      
3.00      2.30      4.00      2.30      2.50      3.50      



I chose Anova, p=0.05, which gives

CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 10
Critical significance:  0.05
Significance of data: 1.24E-08 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total               59          45.38
Testers (blocks)     9           3.67
Codecs eval'd        5          26.01    5.20   14.92  1.24E-08
Error               45          15.69    0.35
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.532

Means:

MPC-q5   MGX-q6   MGX-q5.9 MP3-V2   MGX-q5.5 MP3-V3  
 3.82     3.15     2.34     2.30     2.23     1.88  

---------------------------- p-value Matrix ---------------------------

        MGX-q6   MGX-q5.9 MP3-V2   MGX-q5.5 MP3-V3  
MPC-q5   0.015*   0.000*   0.000*   0.000*   0.000*  
MGX-q6            0.004*   0.002*   0.001*   0.000*  
MGX-q5.9                   0.880    0.679    0.088    
MP3-V2                              0.792    0.119    
MGX-q5.5                                     0.192    
-----------------------------------------------------------------------

MPC-q5 is better than MGX-q6, MGX-q5.99, MP3-V2, MGX-q5.5, MP3-V3
MGX-q6 is better than MGX-q5.99, MP3-V2, MGX-q5.5, MP3-V3


Conclusion: if I understand the above properly, for guruboolez's ears and samples,
  • MPC standard is the winner
  • Vorbis Megamix -q6 is second
  • All others are tied for third place.
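
The figures in the table above can be cross-checked by hand. The sketch below is only an illustration, not the FRIEDMAN program's own code: it recomputes F and Fisher's LSD from the printed sums of squares, then flags the pairs of means whose gap exceeds the LSD. The critical t value (about 2.014 for 45 degrees of freedom, two-sided 0.05) is hard-coded from standard tables, and all variable names are assumptions.

```python
import math
from itertools import combinations

# Values copied from the ANOVA table above.
ss_codecs, df_codecs = 26.01, 5
ss_error, df_error = 15.69, 45
n_samples = 10  # blocks (the tool labels them "listeners")

ms_codecs = ss_codecs / df_codecs
ms_error = ss_error / df_error
f_stat = ms_codecs / ms_error

# Assumption: two-sided 5% critical value of Student's t with 45 degrees
# of freedom, taken from standard tables (about 2.014).
T_CRIT = 2.0141
lsd = T_CRIT * math.sqrt(2 * ms_error / n_samples)

means = {
    "MPC-q5": 3.82, "MGX-q6": 3.15, "MGX-q5.9": 2.34,
    "MP3-V2": 2.30, "MGX-q5.5": 2.23, "MP3-V3": 1.88,
}

# A pair is declared different when the gap between means exceeds the LSD.
significant = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > lsd
]

print(f"F = {f_stat:.2f}, LSD = {lsd:.3f}")
for a, b in significant:
    print(f"{a} > {b} (gap {means[a] - means[b]:.2f})")
```

Only pairs involving MPC-q5 or MGX-q6 exceed the LSD, which matches the two "is better than" lines printed by the tool. The 0.46 gap between MGX-q5.9 and MP3-V3 falls short of it.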
guruboolez
First, thanks for the analysis. I can't do this. But...

I wonder: lame -V 3 appeared to sound the worst on 8/10 samples; and on one of the two remaining samples, -V 3 obtained the same score as vorbis -q 5.50 and a lower score than -q 5.99. Lame -V 3 sometimes shows weird artifacts (organ, harpsichord) that are not audible with vorbis.
In short, lame -V 3 is worse than vorbis -q 5.99 eight times, equal once, and better once, and it has the strongest artifacts. That's why there is no doubt that -V 3 is not competitive against the other contenders.


So how can a statistical tool conclude that both encoders are "identical"? To me (I'm unfortunately not a statistician, but I was the tester, and not Mr Friedman ;) ) it defies common sense, or at least my overall impression.

Is this kind of analysis suited to results produced by ONE listener on MULTIPLE samples? I saw:
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of ***listeners***: 10


Could someone enlighten me?
ff123
QUOTE (guruboolez @ Jul 12 2004, 05:08 PM)
Is this kind of analysis suited to results produced by ONE listener on MULTIPLE samples? I saw:
CODE
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of ***listeners***: 10


Could someone enlighten me?
*


The tool does make the assumption that if you were to draw a histogram of the music samples by "difficulty," (average rating across all codecs) you would end up with a bell curve. But even if this assumption is violated, it is robust enough that you'd probably still get a reasonable answer.

Short answer: you can replace "listeners" with "music samples" to give an indication of which encoder you personally prefer, with Pio's important qualifications that the results apply only for you, and only for the group of samples you tested.

ff123
guruboolez
But how do you explain the fact that this analysis completely changes the conclusions of the listener? In this example, how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.
ff123
QUOTE (guruboolez @ Jul 12 2004, 05:30 PM)
But how do you explain the fact that this analysis completely changes the conclusions of the listener? In this example, how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.
*


MGX-q5.99 is better than MP3-V3 with a p-value of 0.088, so it doesn't meet statistical significance, but the numbers suggest it is better. You'd probably get more definitive results with a handful more samples.

ff123
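
As a rough illustration of why a handful more samples could help: the LSD shrinks with the square root of the number of samples. The sketch below assumes, hypothetically, that the error mean square (15.69 / 45) and the 0.46 gap between MGX-q5.9 and MP3-V3 would stay unchanged with more samples, and it approximates the critical t value by 2.0. Neither assumption is guaranteed to hold in a real test.

```python
import math

MS_ERROR = 15.69 / 45      # error mean square from the ANOVA table
GAP = 2.34 - 1.88          # MGX-q5.9 mean minus MP3-V3 mean
T_APPROX = 2.0             # assumption: rough 5% critical t value


def lsd_for(n_samples: int) -> float:
    """Approximate Fisher's LSD for a given number of samples (blocks)."""
    return T_APPROX * math.sqrt(2 * MS_ERROR / n_samples)


# Smallest sample count at which the unchanged gap would exceed the LSD.
n_needed = next(n for n in range(10, 200) if lsd_for(n) < GAP)

for n in (10, 12, n_needed):
    print(f"{n} samples: LSD ~ {lsd_for(n):.3f}")
print(f"gap of {GAP:.2f} exceeds the LSD from about {n_needed} samples")
```

Under those assumptions the threshold is crossed at roughly 14 samples, four more than the 10 tested.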
guruboolez
OK. Another question: are these "confidence values" linked to the ratings, or to the ABX results?
If I had chosen to stay close to the EBU (or ITU, I never know) rating system, with most ratings falling between 4 and 5 (rather than 1 and 4), wouldn't the confidence margin be ruined?
ff123
QUOTE (guruboolez @ Jul 12 2004, 07:13 PM)
OK. Another question: are these "confidence values" linked to the ratings, or to the ABX results?
If I had chosen to stay close to the EBU (or ITU, I never know) rating system, with most ratings falling between 4 and 5 (rather than 1 and 4), wouldn't the confidence margin be ruined?
*


ABX results are not considered at all when the ANOVA results are computed.

It doesn't matter at all whether you use a ranking scale from 1 to 5 or from 1 to 10. The only thing that matters is the relative difference between the codecs. Also, the fact that the analysis is "blocked" means that the program accounts for the fact that some music samples (the difficult ones) have lower average ratings than others.

The single best way to improve confidence in your results is to listen to as many different samples as possible.

ff123
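
The point about the rating scale can be checked numerically. The sketch below uses a small hypothetical ratings matrix (not guruboolez's data) and a hand-written blocked ANOVA, not the FRIEDMAN program itself. It squeezes ratings from a 1 to 5 scale linearly into the 4 to 5 range and compares the F statistic for the codec effect before and after. This covers a linear rescaling only; a listener who actually rated on a compressed scale might spread the ratings differently.

```python
def blocked_anova_f(ratings):
    """F statistic for the codec effect in a randomized block design.

    ratings[i][j] is the score of codec j on sample (block) i.
    """
    n_blocks = len(ratings)
    n_codecs = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n_blocks * n_codecs)
    block_means = [sum(row) / n_codecs for row in ratings]
    codec_means = [
        sum(row[j] for row in ratings) / n_blocks for j in range(n_codecs)
    ]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    # Variation between samples ("difficulty") is removed from the error term.
    ss_blocks = n_codecs * sum((m - grand) ** 2 for m in block_means)
    ss_codecs = n_blocks * sum((m - grand) ** 2 for m in codec_means)
    ss_error = ss_total - ss_blocks - ss_codecs

    df_codecs = n_codecs - 1
    df_error = (n_blocks - 1) * (n_codecs - 1)
    return (ss_codecs / df_codecs) / (ss_error / df_error)


# Hypothetical ratings: 4 samples (rows) x 3 codecs (columns), 1-5 scale.
ratings = [
    [4.0, 3.0, 2.0],
    [3.5, 3.0, 1.5],
    [2.0, 1.5, 1.0],
    [4.5, 3.5, 3.0],
]

# Linear squeeze of the 1-5 scale onto 4-5 (1 -> 4.0, 5 -> 5.0).
compressed = [[3.75 + 0.25 * x for x in row] for row in ratings]

f_original = blocked_anova_f(ratings)
f_compressed = blocked_anova_f(compressed)
print(f"F on 1-5 scale: {f_original:.4f}")
print(f"F on 4-5 scale: {f_compressed:.4f}")
```

Both runs give the same F, because a linear rescaling multiplies the codec and error sums of squares by the same factor.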
Pio2001
QUOTE (ff123 @ Jul 13 2004, 04:03 AM)
QUOTE (guruboolez @ Jul 12 2004, 05:30 PM)
how could lame -V 3 appear equal to vorbis -q 5.99, for me and for the tested samples, if for me and for the tested samples -V 3 is inferior 80% of the time? It's something I can't understand.
*


MGX-q5.99 is better than MP3-V3 with a p-value of 0.088,
*



In other words, it's not impossible that -V3 was inferior 80% of the time by chance, because the differences between the ratings are not that big compared to the random variation of your ratings.
The result might have been different if I had chosen a threshold higher than 0.05. This means that, although the difference is not significant at this threshold, -V3 is nonetheless likely inferior to q5.99. It's likely, but not certain.
guruboolez
Anyway, I plan to progressively add more results with time :)
2Bdecided
Fascinating thread. Thank you guruboolez!

D.
westgroveg
QUOTE (guruboolez @ Jul 13 2004, 11:01 PM)
Anyway, I plan to progressively add more results with time :)
*

Great, thanks a lot for sharing your results with us guruboolez.
phong
This may be the thread that pushes me into actually reading some vorbis code. It would be interesting to find the real culprit behind this 5.99 -> 6.0 discontinuity, or at least eliminate some possibilities. For example, it would be interesting to produce an encoder (just for testing purposes) that turned on lossless channel coupling at 5 instead of 6. Based on others' posts though, I doubt that's the culprit.

So where's the gremlin in our Cheerios?

Another issue is the whole point of the -q settings... According to vorbis docs, if you pick a -q setting, future versions of vorbis will have the same "quality" at that setting but at a lower bitrate. In the tuned versions that are being produced, mostly the quality has increased but at the expense of increasing bitrate. "In theory" the whole scale could be adjusted so that the same -q levels produced the same bitrates, or if there were some way to quantify quality, they could produce the same quality at a lower bitrate. "In practice" that seems technically difficult, not to mention there is no consistent definition of what each -q level is supposed to achieve, or a standard corpus of music to benchmark bitrates on.

A common question is what the "transparency setting" for a given codec is. Strictly speaking, the answer always is "listen for yourself". For mp3 or mpc, the practical answer is "lame --preset standard" or "mpc --standard". For vorbis, no one can agree because nobody ever decided on any particular "excellence step" (to steal guru's terminology, which I hope becomes a meme). Some will say "start with -q 4 and work your way up", others will recommend -q 5 or -q 6 (which, from these and previous tests, is the one I think is best supported by the evidence.) Even at -q 6, does vorbis even approach the consistency of mpc --standard, or even lame aps?

I guess the good news is that there's lots of interest in tuning vorbis suddenly after what seems like years of inactivity. Maybe there is finally some progress to some sort of excellence step.
guruboolez
QUOTE (phong @ Jul 13 2004, 03:53 PM)
This may be the thread that pushes me into actually reading some vorbis code.  It would be interesting to find the real culprit behind this 5.99 -> 6.0 discontinuity, or at least eliminate some possibilities.  For example, it would be interesting to produce an encoder (just for testing purposes) that turned on lossless channel coupling at 5 instead of 6.  Based on others' posts though, I doubt that's the culprit.
*

Uncoupled vorbis encoders were released by QuantumKnot and Aoyumi (or Nyaochi, or Harashin, can't remember), and the coarseness of vorbis disappeared, even at lower -q settings. But the bitrate is seriously higher.

Anyway, the aoTuV tuning severely reduces this problem. But some traces remain...
ScorLibran
Thanks for the time and effort you put into this test, guru. It provides invaluable info for those of us interested in these codecs, but without enough time to perform the tests ourselves.

I'd like to perform a similar test using a sample set of rock music. Though since my hearing sensitivity isn't NEAR what yours is, I may end up not being able to distinguish any differences at this bitrate. I can at least try, though.
indybrett
@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)
QuantumKnot
QUOTE (phong @ Jul 14 2004, 12:53 AM)
A common question is what the "transparency setting" for a given codec is. Strictly speaking, the answer always is "listen for yourself". For mp3 or mpc, the practical answer is "lame --preset standard" or "mpc --standard". For vorbis, no one can agree because nobody ever decided on any particular "excellence step" (to steal guru's terminology, which I hope becomes a meme). Some will say "start with -q 4 and work your way up", others will recommend -q 5 or -q 6 (which, from these and previous tests, is the one I think is best supported by the evidence.) Even at -q 6, does vorbis even approach the consistency of mpc --standard, or even lame aps?

*


One of the problems with Vorbis quality is that it doesn't seem consistent. At -q 4.35, Roberto's 128 kbps listening test showed that aoTuV beta 2 was quite good in quality. But as we go up the q scale, bitrate gets consistently higher, yet problems still exist here and there. There doesn't seem to be a particular q that is transparent. Either it is pre-echo that kills transparency or coarse rendering or something else. I think more tuning needs to be done in the q 5,6,7 range to iron out all these problems.
QuantumKnot
QUOTE (indybrett @ Jul 14 2004, 01:09 PM)
@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)
*


I think only the wonderful ears of guruboolez or other golden-eared members can answer that question with certainty. For me, the only concern is whether or not I've missed something again while doing the merging. :(
guruboolez
QUOTE (indybrett @ Jul 14 2004, 04:09 AM)
@Guruboolez

Do you think Megamix II would improve the results of this test?

Edit: Sorry, I should open that same question up to QuantumKnot :)
*

One of the last files I added to this first batch of results is Orion II.wav, which is problematic with non-GT3 vorbis (something like micro-attacks are generated by the trombone). On this sample, the results would probably be much better:

http://audiotests.free.fr/tests/200..._megamix_q5.png
http://audiotests.free.fr/tests/200..._megamix_q6.png

As you can see, I heard serious improvements with GT3 very recently.

But I don't know whether I should retest this sample: would that be acceptable?

A second result might improve with megamix: I think it's the Weihnachts-Oratorium sample. There's a short passage with brass and, if I recall correctly the impression I had during the blind test, a slight blurring was audible with the vorbis encodings. But here I don't think the results could be much better.

For the eight other files, I don't know. Maybe the additional tunings performed by the SVN team have audible consequences on quality with all samples. Good or bad. Megamix II was released before I saw any test of 1.1 RC1, and before I had tested it myself.

[edit: in bold]
QuantumKnot
QUOTE (guruboolez @ Jul 14 2004, 08:32 PM)
For the eight other files, I don't know. Maybe the additional tunings performed by the SVN team have audible consequences on quality with all samples. Good or bad. Megamix II was released before I saw any test of 1.1 RC1, and before I had tested it myself.
*


The impression I got from Monty's announcement and the commit logs is that 1.1 RC1 is essentially aoTuV beta 2 with some fixes for bitrate management and a tonality bug of some sort. Low pass filter cutoffs have changed (about 18 kHz now for q 4 as opposed to 20 kHz) but I'm not sure if that was in aoTuV beta 2 or is a new tweak.
indybrett
I would really like to see FAAC in future tests, unless it is already known to be so inferior as to not be worth testing.

It's free (sort of), it's gapless, and there are nice encoder/frontends for it.

I could not even guess what quality setting would produce results equal to Lame -APS or Vorbis -q6, or if any quality setting would achieve this level of quality.

Edit: I suppose what I'm really saying is that it would be nice if it were being actively tuned the way Vorbis now is.
guruboolez
Well, I can't test every encoder. I'm not considering faac, because it's not really optimized to my taste. It also suffers from severe problems even at high bitrate, especially with vocal or other tonal signals due to weird short-block artifacts (see the warbling with compostelle.flac).
BTW, even in the developer's opinion (Krzysztof aka knik), faac isn't optimized for high bitrate:

QUOTE
I don't think faac is very optimized for high bitrates (and it's still not very optimized at all). I usually use it at ~125kbps.

Author: knik
Date: 11-09-03 16:55

source

faac has improved with time, but this bug is still present.
Nero AAC or a hypothetical gapless QuickTime AAC encoder are preferable in my opinion, though they are not free, and not as friendly as a CLI encoder like faac.
indybrett
Nero would be great, except you have to buy a rather large software package, and then use external software to encode from FLAC or anything worthwhile.

If only iTunes/Quicktime were gapless...
QuantumKnot
QUOTE (indybrett @ Jul 15 2004, 12:04 PM)
Edit: I suppose what I'm really saying is that it would be nice if it were being actively tuned the way Vorbis now is.
*


If someone gave me an iPod, I'd probably be compelled to work on FAAC since it's in my interest to have a free VBR AAC encoder. :D Just kidding. ;) I wouldn't have much of a clue anyway.
guruboolez
indybrett, or anyone with some experience of faac: what setting would give me a bitrate of approximately 175 kbps? I have only a little experience with faac, and according to it, the quality scale (-q) doesn't appear to correspond to a target bitrate (i.e. -q 100 doesn't seem to output 100 kbps, at least with some material; cf. Roberto's 128 AAC test, where the setting was -q 115, and not -q 128).

I'm interested in giving faac a chance (at least in a preliminary test), but I'm not really motivated to find the ideal setting for that. I also don't want to test something and be flamed for using wrong settings. Help would be appreciated :)