Last summer I performed a blind listening comparison between three audio formats, all set for ~175 kbps encodings. The purpose of the test was to investigate encoding quality with classical music (and only classical) and to see which format would be the most efficient (i.e. the closest to transparency at the lowest possible bitrate) for this kind of music. As the starting point for bitrate I took MPC --standard, a preset indisputably recognized as the best encoding solution, outputting 175...190 kbps on average. And indeed, the test ended with musepack's superiority. MPC even beat Vorbis and MP3 at presets with a higher bitrate (~195 kbps for LAME and ~185 kbps for Vorbis, against ~175 kbps for musepack). MPC encodings therefore sounded better and were smaller at the same time. Amazed by the gap between the contenders, I concluded that test with these words: "I didn't think that MPC --standard was so far ahead".
My vacation is now nearly over. During my free time I performed a big check-up of lossy quality at 80 kbps and 96 kbps (the report still has to be translated into English).
Why run the same test again?
As a result of the constant evolution of most audio encoders, I consider my previous results really outdated. Let me recall that the Vorbis encodings were done with MEGAMIX I (a hybrid encoder merging the aoTuV beta 2, Garf Tuned 2 and Quantum Knot tunings). This encoder didn't last long and doesn't exist anymore: it was replaced by MEGAMIX II, then by the official 1.1 with the Impulse Trigger Profile + Impulse Noisetune switches, which was finally followed by aoTuV beta 3 and beta 4. The same goes for LAME: 3.97 alpha 3 was tested, and since then the LAME developers have released eight new versions of this alpha plus a few others (lame_may, lame_june...)! MPC has also changed: from 1.14 beta to 1.15 alpha, which is now considered safe to use.
As a consequence of this evolution, problems audible last year (a kind of ringing with LAME, noise and coarseness with Vorbis) may have been fixed or at least reduced. The first purpose of my test is therefore to check the results of recent tunings made for high-bitrate settings.
There's also a second point that motivated me to run the test again, and this point is called AAC. I didn't test AAC last year, for technical and ethical reasons. Technically, the iTunes encoder couldn't be set to ~175 kbps; Apple's AAC encoder also wasn't gapless, which makes it unsuitable for my conception of artefact-free encodings. I also felt it would be dishonest to include Nero AAC: it had known issues with classical music, and a new encoder supposed to solve these problems had been announced as imminent. Some readers suggested including faac, but I felt it was unfair to test an encoder that was probably not the state of the art of the AAC format against the most advanced implementations of the other formats (MEGAMIX and LAME 3.97).
I never regretted this choice. But the absence of AAC frustrated my curiosity for a long time, because I had absolutely no idea how this format performed compared to the other contenders. That's why I decided to include AAC this time, no matter what. WMAPro will also be tested if possible.
The purpose of my test is therefore to get a fresh snapshot of the current performance of all modern lossy formats with classical music, using the most advanced implementation of each of them.
I. Choosing the encoders
Since my purpose is to test the most advanced encoders, the choice of encoder shouldn't be controversial for most formats:
• MP3: LAME 3.97 alpha 11. Release date: July 2005. Note: --vbr-new encoding mode.
• MPC: mppenc 1.15v. Release date: March 2005.
• Vorbis: aoTuV beta 4. Release date: June 2005, updated in July 2005 (merged with SVN 1.1.1).
• WMAPro: no choice here: it's 9.1 or nothing. Release date: during 2004.
Choosing the right AAC encoder is much harder:
• Apple AAC: there's still no VBR mode in iTunes. It's therefore currently impossible to use Apple's AAC encoder unless the other contenders happen to output an average bitrate close to either 160 kbps or 192 kbps, which is unlikely...
• Nero Digital AAC: the most advanced VBR AAC encoder, and therefore the best placed to represent the AAC format. Big problem: should I use the 'high' (default) encoder or rather the 'fast' one, which is really better at lower bitrates with classical music? The first one is still recommended by all of Nero's developers (Garf, JohnV and Ivan Dimkovic), and that is a valid reason to choose it over something they don't consider stable enough. But the situation may have changed since their recommendation, and I wouldn't dismiss too quickly the possibility of using an encoder that better handles the difficulties specific to the musical genre I'm testing. The debate could have been endless if a trivial but objective argument hadn't settled it: the average bitrate of the VBR mode of both encoders (see below).
• faac AAC: testing faac might also be interesting. Just for fun, it would also let me pit four different open-source implementations of four different formats against each other.
II. Targeting a bitrate
The purpose of my test is not to see what encoders can do at xxx kbps on each sample; I don't plan to force each encoding to reach a precise bitrate. My purpose is to stay close to the real usage of the vast majority of listeners (if not all...): using one fixed setting for every encoding, which should statistically correspond on average to the desired bitrate. That's why it's fundamental to know precisely the average bitrate corresponding to a given preset. And there's only one way to get it: encoding many tracks or albums.
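A minimal sketch of how an average bitrate over a whole collection can be computed, assuming it is weighted by duration (total encoded bits divided by total playing time). The post doesn't say how the averages were actually obtained, so the `Track` class, the helper names and the two example tracks are hypothetical. The unweighted mean of per-track bitrates is shown for contrast: with it, a short track counts as much as a long one.

```python
from dataclasses import dataclass


@dataclass
class Track:
    name: str
    size_bytes: int    # size of the encoded file on disk
    duration_s: float  # playing time in seconds


def average_bitrate_kbps(tracks):
    """Duration-weighted average: total encoded bits / total playing time."""
    total_bits = sum(t.size_bytes * 8 for t in tracks)
    total_seconds = sum(t.duration_s for t in tracks)
    return total_bits / total_seconds / 1000


def mean_of_track_bitrates_kbps(tracks):
    """Unweighted mean of per-track bitrates: every track counts equally."""
    per_track = [t.size_bytes * 8 / t.duration_s / 1000 for t in tracks]
    return sum(per_track) / len(per_track)


# Two hypothetical tracks: a short 200 kbps one and a long 120 kbps one.
tracks = [
    Track("hypothetical_short", 3_000_000, 120.0),
    Track("hypothetical_long", 9_000_000, 600.0),
]
print(f"weighted:   {average_bitrate_kbps(tracks):.2f} kbps")         # 133.33 kbps
print(f"unweighted: {mean_of_track_bitrates_kbps(tracks):.2f} kbps")  # 160.00 kbps
```

With real files, `size_bytes` also includes tags and container overhead, so this slightly overestimates the pure audio bitrate.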
Last year, I used ~20 classical (+3 non-classical) albums as a reference. This year, I decided to be more methodical. I'm now using 150 different full tracks taken from 150 different CDs in order to increase the variety of encoded material. It's important to note that I didn't choose those tracks randomly. I carefully worked to get a representative microcosm of my full classical library, balanced across the main ensemble types (vocal, orchestral, chamber, solo recordings). This collection is simply the 150 full tracks from which I extracted 150 short samples in order to build a "catalogue raisonné" of musical situations occurring in classical music (see this test).
I genuinely expect this methodically built selection to be a highly representative panel of my classical collection. This assumption can be checked by comparing the average bitrate obtained with WavPack -fx5 (my entire digital library of more than 1000 CDs is encoded with this preset): 642 kbps for the selection of 150 tracks against 635 kbps for the complete set of more than 15000 tracks. The deviation is only about 1%!
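The deviation quoted above is a simple relative difference. This sketch uses the rounded figures from the text, which gives about 1,1%; since 642 and 635 are rounded to whole kbps, the unrounded averages could land slightly on either side of 1%.

```python
def relative_deviation_pct(sample_kbps, population_kbps):
    """Deviation of the sample average from the population average, in percent."""
    return (sample_kbps - population_kbps) / population_kbps * 100


# WavPack -fx5 averages quoted in the text (rounded to whole kbps).
selection_kbps = 642   # 150 selected tracks
library_kbps = 635     # full library, more than 15000 tracks

deviation = relative_deviation_pct(selection_kbps, library_kbps)
print(f"{deviation:.2f} %")  # 1.10 %
```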
III. Observing bitrates
I started with MPC, which provides the reference bitrate. All other competitors have to be set to get a similar value.
• MPC: --quality 5 corresponds to 184,54 kbps. This is higher than what I first expected (~175 kbps); maybe the 150 reference tracks are not as representative as supposed. I also tried 1.14 (used last year) with the same preset and --xlevel: 176,28 kbps, much closer to the usual average bitrate of the --standard profile, which reassures me about the representativeness of my track collection. The bitrate has therefore inflated by 4,7% from 1.14 to 1.15v with classical music.
=> I'll therefore look for a setting of every other encoder that outputs 184,54 kbps ±2% (about 180,8...188,2 kbps).
• MP3: I first tried -V2 --vbr-new, which corresponds to the former --preset fast standard. The average bitrate is 181,79 kbps. This value is lower than what I estimated last year (that's why I had tested -V3 in addition to -V2)... Indeed, 3.97 alpha 3 -V2 outputs 192,99 kbps. Nice gain (-5,80%): obviously the LAME developers also worked on efficiency. The gain is large enough that LAME --preset standard can now be fairly compared to MPC --standard. But let me stress again that this only applies to classical music (I suppose the bitrate is higher with other genres affected by the sfb21 issue).
• Vorbis: aoTuV beta 4 -q6,00 gives 181,48 kbps. This is lower than I expected, and also lower than the MPC --standard bitrate. I get 186,99 kbps with the old MEGAMIX I, so the bitrate has been lowered with the latest aoTuV.
-q6,00 could therefore be directly compared to MPC --standard and LAME --preset fast standard (for classical music).
• WMAPro: VBR75 gives 150,24 kbps. The next available preset is VBR90, which gives 203,96 kbps. Both are very far from the range I set, so WMAPro can't compete in this test.
• Nero Digital AAC: like LAME and WMAPro, Nero Digital doesn't offer a fine-grained VBR scale, only seven presets. -internet gives ~142 kbps for both the 'high' and 'fast' encoders. -streaming gives 176,14 kbps with the high encoder and 193,33 kbps with the fast one. Consequently neither is inside the set range; the closest is -streaming high, which is therefore the least unacceptable solution (recall that the 'high' encoder is still the recommended one).
• faac AAC: this is the only AAC encoder able to fit into the set bitrate range (thanks to its fine-grained VBR scale, à la Vorbis and MPC). faac -q 175 gives 180,92 kbps. This -q setting probably won't correspond to ~180 kbps with other musical genres, which is another occasion to recall that the whole test is specific to classical music and nothing else.
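The ±2% acceptance window can be written down explicitly. This sketch recomputes it from the MPC reference (184,54 kbps) and checks every setting measured above; the dictionary labels are just descriptive strings, and Python needs a decimal point where the text uses a comma.

```python
REFERENCE_KBPS = 184.54  # mppenc 1.15v --quality 5 on the 150-track set
TOLERANCE = 0.02         # ±2 %

low = REFERENCE_KBPS * (1 - TOLERANCE)
high = REFERENCE_KBPS * (1 + TOLERANCE)

# Average bitrates measured on the 150-track set (kbps).
measured = {
    "LAME 3.97a11 -V2 --vbr-new": 181.79,
    "aoTuV beta 4 -q6,00": 181.48,
    "faac -q 175": 180.92,
    "Nero -streaming (high)": 176.14,
    "Nero -streaming (fast)": 193.33,
    "WMAPro VBR75": 150.24,
    "WMAPro VBR90": 203.96,
}


def in_range(kbps):
    """True if an average bitrate falls inside the reference window."""
    return low <= kbps <= high


print(f"target range: {low:.2f} ... {high:.2f} kbps")
for setting, kbps in measured.items():
    status = "inside" if in_range(kbps) else "outside"
    print(f"{setting:28s} {kbps:7.2f} kbps  {status}")
```

Only the LAME, aoTuV and faac settings land inside the window, which matches the conclusions above.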
Summary table
Encoder     2004 build (kbps)   2005 build (kbps)   Change (kbps)   Change (%)
MPC         176,28              184,54              +8,26           +4,69
MP3         192,99              181,79              -11,20          -5,80
Vorbis      186,99              181,48              -5,51           -2,95
AAC faac    not tested          180,92              --              --
AAC Nero    not tested          176,14              --              --
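The two "change" columns follow from the first two: the change in kbps is the new bitrate minus the old one, and the percentage is relative to last year's bitrate. A small sketch reproducing them (the `change` helper is a hypothetical name; decimal points replace the commas used in the table):

```python
def change(old_kbps, new_kbps):
    """Return (difference in kbps, difference in % of the old bitrate)."""
    delta = new_kbps - old_kbps
    return delta, delta / old_kbps * 100


# Averages on the 150-track set: last year's build vs this year's build.
bitrates = {
    "MPC": (176.28, 184.54),     # mppenc 1.14 -> 1.15v
    "MP3": (192.99, 181.79),     # LAME 3.97a3 -> 3.97a11, -V2
    "Vorbis": (186.99, 181.48),  # MEGAMIX I -> aoTuV beta 4
}

for encoder, (old, new) in bitrates.items():
    delta, pct = change(old, new)
    print(f"{encoder:7s} {delta:+7.2f} kbps {pct:+6.2f} %")
```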
=> faac, LAME and aoTuV are very close to each other (the difference is less than 0,9 kbps!). MPC has a higher bitrate (+3 kbps) and Nero Digital a lower one (-5 kbps). The gap between the extremes is worrying: approximately 5%, corresponding to 8 kbps. That's not a huge difference, but these eight missing kbps may lead to a significant difference in quality. I could discard Nero Digital from this test, but I would consider that a mistake. For my own curiosity, I'm also very impatient to see how an advanced implementation of AAC performs compared to the other formats, even if the bitrates are not fully comparable.
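The spreads discussed above can be reproduced from the 2005 column of the table. This sketch takes LAME, aoTuV and faac as the central cluster and expresses the MPC-to-Nero gap both in kbps and as a percentage; the text doesn't say which bitrate the "approximately 5%" is relative to, so both bases are printed.

```python
bitrates_2005 = {  # kbps on the 150-track set
    "MPC": 184.54,
    "LAME": 181.79,
    "aoTuV": 181.48,
    "faac": 180.92,
    "Nero": 176.14,
}

cluster = [bitrates_2005[name] for name in ("LAME", "aoTuV", "faac")]
cluster_mean = sum(cluster) / len(cluster)
cluster_spread = max(cluster) - min(cluster)

highest = max(bitrates_2005.values())
lowest = min(bitrates_2005.values())
gap_kbps = highest - lowest

print(f"cluster spread:  {cluster_spread:.2f} kbps")
print(f"MPC vs cluster:  {bitrates_2005['MPC'] - cluster_mean:+.2f} kbps")
print(f"Nero vs cluster: {bitrates_2005['Nero'] - cluster_mean:+.2f} kbps")
print(f"extreme gap: {gap_kbps:.2f} kbps "
      f"({gap_kbps / lowest * 100:.2f} % of Nero, "
      f"{gap_kbps / highest * 100:.2f} % of MPC)")
```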
=> As a consequence, I decided to test both Nero Digital AAC and faac AAC, and I will consider Nero Digital's presence as an interesting "bonus" rather than a full competitor. That's why my final diagram (plots) will graphically separate the Nero AAC results from the other contenders. I hope this will avoid unnecessary debate about any kind of unfairness based on bitrate disparity.
SUMMARY
Encoders and settings to be tested:
• AAC: faac 1.24.1. Release date: end 2004 (?). Setting: -q175
• AAC: Nero Digital aacenc32 v.3.2.0.15. Release date: June 2005. Setting: -streaming (high/default encoder).
• MP3: LAME 3.97 alpha 11. Release date: July 2005. Setting: -V2 --vbr-new
• MPC: mppenc 1.15v. Release date: March 2005. Setting: --quality 5
• Vorbis: aoTuV beta 4 based on 1.1.1. Release date: July 2005. Setting: -q6,00
IV. Additional information
I performed all my previous listening tests on a Creative Audigy 2 soundcard, which resamples everything to 48000 Hz. Some people consider that this internal resampling (transparent in my opinion) treats musepack unfairly and would bias any listening test. To cut the controversy short, I installed my (better) Terratec DMX6Fire 24/96, which doesn't resample 44100 Hz files (I'm not using it anymore for daily listening because of interference with my VIA chipset).
HARDWARE & SOFTWARE SETTINGS:
• soundcard: Terratec DMX6Fire 24/96
• headphone: BeyerDynamic DT-531
• amp: Onkyo MT-5
• software player: Java ABC/HR 0.5 beta 5.
• software decoder: foobar2000 0.83 (to automatically get files free of offset and to work around my incompatibility issues with Vorbis).
TESTING PRINCIPLES:
• ABX phase: to limit listening fatigue and to finish the test before leaving my apartment, I restricted the ABX tests to the most transparent encodings (score > 4.00).
• Number of trials: eight trials as a minimum. Recall that schnofler's ABC/HR software doesn't reveal the score until the test is closed by the user (and a closed test can't be resumed). Therefore the number of trials doesn't have to be fixed in advance: as long as the score is hidden, the p-value isn't ruined. That's why I add more trials when I suspect bad results. I never exceed 16 trials: if something is really transparent, I don't persecute the encoding.
• Grading: my grading was very severe last year, using the full range of the scale (many scores were below 2.0). That's why I had decided to add 10 points to each score (in order to disconnect the grades from the usual corresponding scale). This year, I tried to respect the ITU scale. When a difference is audible but not really annoying, the grade is at least 4.0, and my hair must stand on end before I give a grade below 2.0 (from "annoying" to "very annoying"). Grading is still severe (I keep in mind that all encodings were set at ~180 kbps), and that's why the results I get here absolutely cannot be compared to other listening tests I have done, especially those performed at low bitrate settings. By the way, there are no anchors in this test (a high anchor is of course unnecessary here).
• Samples: Same as last year. See this thread.
• Gain: I didn't modify the gain of any file. All were played at their original volume.
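An ABX result is usually summarised by the probability of scoring at least that well by pure guessing. A minimal sketch of that standard one-sided binomial calculation follows; that Java ABC/HR reports exactly this value is an assumption, and the scores in the loop are hypothetical. It treats the final number of trials as if it had been fixed in advance, which is the assumption the hidden-score argument above relies on.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: probability of at least `correct` right
    answers out of `trials` when every answer is a coin flip (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


# Hypothetical scores within the 8...16 trial window used in this test.
for correct, trials in [(8, 8), (7, 8), (6, 8), (12, 16), (14, 16)]:
    print(f"{correct:2d}/{trials:2d}  p = {abx_p_value(correct, trials):.4f}")
```

For example, 7/8 gives p = 9/256 (about 0.035), while 6/8 gives about 0.145, which is why a borderline result calls for extra trials.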
