AAC: Ahead vs Apple (end 2004)

2004-12-09 07:23:27

Last year, I tested various audio formats at 130 kbps on a suite of 15 classical samples, and several AAC encoders were among the contenders. The conclusions were highly positive for QuickTime AAC, which took first place with the most amazing results I had ever heard at 128 kbps. But they were also very severe for Nero AAC: quality was simply awful on most samples, and it finished in last position. Obviously, Nero AAC suffered a lot with “classical” music and, according to the collective test led by rjamorim, this encoder (updated) was also inferior to Apple AAC with other kinds of music. I found this situation annoying, because on PC the Ahead encoder is much more flexible than Apple’s (gapless playback, many GUIs allowing transcoding from various sources with tag transfer, SBR…). I know that Ivan Dimkovic tried to find the cause of these troubles, probably a long and difficult task. Therefore, I promised myself not to publish any test including an unfinished AAC encoder: out of respect for Ivan first, and to give him the time needed to tweak his encoder.

Yesterday, I played a bit with the latest public encoder (2.9.9.998; the final 3.0.000 is now very close). Quality is clearly far from the disaster I heard last year and has become more than acceptable. Nevertheless, the usual problems are still audible: lifeless sound (the encoder removes some precious musical information, which gives a hollow sound) and sometimes severe distortions (big holes in the frequency spectrum). Disappointed, I decided not to investigate further and to wait again for the next big step. Out of curiosity, I also ran small benchmarks (speed, bitrate) and noticed a strange thing. Though 3.0.000 should be released in the near future, there are still some inconsistencies in the current encoder. For example, the “fast” mode is not faster than “high” (the default mode) and, even stranger, the bitrate can be completely different. On harpsichord tracks, for example, the -internet VBR profile leads to ~180 kbps with “fast” but to the normal 128 kbps with “high”. Is the “fast” mode a totally different codec? An experimental one? Or the future 3.0.000? Suspicious, I encoded some problem samples in “fast” mode, and all the quality problems noticed with the default mode completely disappeared (bitrates for “high” and “fast” were similar, and in some cases really lower: see this example). To guard against unjustified enthusiasm, I ran quick blind tests: the results went beyond my most optimistic expectations and confirmed the serious quality difference between the two settings. Same conclusion for CBR 128: “fast” is clearly better than “high” on all the samples I tested. Not only better, but also excellent, and this is certainly the most interesting point.

The immediate problem with VBR “fast” is the average bitrate. Slightly higher than 128 kbps on average (tested on a few tracks), it sometimes jumps very high on specific tracks (not only short samples: full albums are apparently affected too)… especially on the tracks I like the most :/. This is annoying in two cases:
• When filling a small flash player (the future iPod flash for example, but it could also be a problem for the iPod mini), an excessive bitrate drastically reduces the amount of music that fits.
• When doing a complete listening test: people will always complain about biased results, etc…
Both problems disappear with CBR/ABR encodings. Now, if Nero CBR is close to, if not better than (who knows?), a VBR mode, why not compare it with the current reference, iTunes AAC (CBR only), which could lose its crown? After all, there are some suspicions about Apple’s new encoders (since the strange frequency behaviour introduced with QuickTime 6.5.1), and the Ahead encoder has a chance to win this match. A full test is definitely worth doing.
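
To put the flash-player concern in numbers, here is a rough sketch of how playing time shrinks when the average bitrate drifts upward. The 1 GB capacity is purely an assumption for illustration (no player size is given above), and container overhead and tags are ignored; the two bitrates are the 128 kbps target and the ~180 kbps seen on harpsichord tracks.

```python
def hours_of_music(capacity_bytes, bitrate_kbps):
    """Approximate playing time that fits in capacity_bytes at a given average bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8  # kbps -> bytes per second
    return capacity_bytes / bytes_per_second / 3600


CAPACITY = 1_000_000_000  # hypothetical 1 GB flash player (assumption)

for kbps in (128, 180):
    print(f"{kbps} kbps: {hours_of_music(CAPACITY, kbps):.1f} h")
```

Under these assumptions, the same player holds roughly 29% less music at 180 kbps than at 128 kbps.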

Therefore, I decided to do this test now rather than wait for the final release of Nero AAC 3. But I’m not interested in a basic, bloody competition between Apple and Ahead products; I’d like to answer other interesting questions, such as:

• How big is the difference between Ahead CBR and Ahead VBR? Is VBR necessarily better? On all samples? Are the few kbps added by the –internet VBR mode worth it compared to CBR 128?
• Did iTunes progress in one year? Quality was so good last year… Or has it really worsened, as noticed or maybe feared by some people? The old, winning iTunes codec had a lowpass cutoff fixed at 16 kHz, whereas the new one now reaches 18 kHz: is that really a good thing?
• How much progress for Ahead AAC? A big jump or limited improvements in one year?

To answer these questions, I decided to reuse the samples from last year and to compare directly, in the same test, the old encodings (Nero-loser-2003 & Apple-winner-2003) with the new line (Nero-2004 & Apple-2004). Therefore, the challengers are:
—   iTunes 4.1.052 AAC (corresponding to QuickTime 6.4) at 128 kbps [CBR]
—   iTunes 4.7.042 AAC (corresponding to QuickTime 6.5.2) at 128 kbps [CBR]
—   Nero AAC 2.5.9.7 at 128 kbps [CBR rather than VBR: CBR was slightly better on average than VBR on last test]
—   Nero AAC 2.9.998 at CBR 128 kbps “fast”
—   Nero AAC 2.9.998 at VBR –internet “fast”.

By bringing both CBR and VBR for the Ahead encoder into the same arena, the results could satisfy two categories of people:
• those who can’t bear mixing bananas and oranges, and refuse any comparison between a CBR encoder and a VBR one. They can compare Apple and Ahead performance at exactly the same file size.
• those who don’t see any reason to handicap a full-featured encoder by forcing constant bitrate when VBR is probably the more efficient solution (better quality and/or smaller file size). They can compare Ahead in its (supposedly) best encoding mode with Apple, which is CBR only.

A few words about methodology.

I used an Audigy2 soundcard and Beyerdynamic DT-531 headphones. The ABC/ABX tool was ff123’s ABC/HR. Following Pio2001’s recommendations, I ABXed all samples using a fixed number of trials; but for personal reasons, I chose two different values: eight when the encoding was easy to detect, sixteen in other cases or when some stupid mistakes occurred in the first 8 trials. I made just one exception, with one sample and one encoding (Mozart – Requiem and iTunes 4.7). When I failed an ABX test, I didn’t raise the ABC rating to 5.0 (which could be considered the logical conclusion), but I usually reconsidered all ratings during/after ABX sessions.
Lastly, following Gabriel’s recommendations, I didn’t listen to the first second of each sample (removed by selecting a specific range in ABC/HR). I focused the ratings on the first seconds (5…10 sec.) in order to upload smaller files.
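
As a side note on the fixed trial counts, here is a minimal sketch of how an ABX score is usually turned into a p-value: the chance of getting at least that many correct answers by pure guessing (one-sided binomial test, p = 0.5 per trial). The 0.05 threshold is the common convention and an assumption here, since no pass criterion is stated above; the function names are made up for the example.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def min_correct(trials, alpha=0.05):
    """Smallest score whose guessing probability falls below alpha (assumed threshold)."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) < alpha:
            return correct
    return None


for trials in (8, 16):
    need = min_correct(trials)
    print(f"{trials} trials: need {need} correct (p = {abx_p_value(need, trials):.4f})")
```

Fixing the number of trials in advance matters because stopping as soon as the running score looks good inflates the chance of a lucky pass.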

And now: RESULTS :-)

>> click on image/link for results <<

FEW COMMENTS

• iTunes 4.1.052 AAC encoder was the very best encoder last year, superior even to all other tested audio formats (including vorbis or wmaPRO 2-pass). This time, the same encoder on the same samples finishes in last place (the low anchor aside)! Developers were obviously very active in 2004! Old iTunes produced the worst sound on 6 samples (out of 15). Last year, with less aggressive challengers, this encoder obtained 4.0 as final score (=“perceptible difference but not annoying”). Now, artefacts are less acceptable compared with the competition (final score: 3.0/5 = “slightly annoying”).

• iTunes 4.7.042 AAC encoder was slightly inferior to its older cousin on two samples only (Compostelle.wav: slight pre-echo on a bell; Mahler: additional distortions on brass), but superior on 10 other samples. In other words, the feared or claimed regression did not happen, and there is even clearly audible progress. Based on this limited test, I don’t see any reason to keep the old iTunes software. A few words about quality: the sound is less lowpassed (it’s clearly audible with some specific instruments: cymbals, harpsichord…), and this extra information doesn’t increase the level of audible distortion. There’s also no ringing (at least no more than with the old encoder). Some problems are unfortunately still here, like an annoying amount of pre-echo on a piano piece (Brahms – Hungarian Dance 6). Very nice overall quality!

• Nero 2.5.9.7. Intolerable quality! It wasn’t a surprise: I used it as the low anchor, and it was perfect in this role (with one exception: Haendel (female), or Hercules.wav).

• Nero 2.9.998 CBR. Different from Apple AAC CBR, but the performances are absolutely tied. Better with 7 samples, and worse with the eight others. This encoder loses a lot of points on the two harpsichord tracks (Apple’s encoder is excellent here). Note that this encoder is better on average (and on 9 samples) than the winning encoder of the last test! Amazing progress in one year!

• Nero 2.9.998 VBR. Probably one of the best things I’ve ever heard at this bitrate (the other strong competitor could be vorbis AoTuV: a direct comparison would be highly interesting). It outperforms both CBR encoders. Best sound on 10 samples. Never the worst (which means that the VBR mode is reliable and apparently doesn’t suffer in unusual situations, like low volume/bitrate encoding). Only problem: some bitrate inflation, sometimes hard to explain (the organ sample, for example: slightly worse quality than CBR 128, but bitrate >200…300 kbps for one short moment), sometimes highly judicious (Bayle sample: perfect reproduction of this complex sound; real progress with harpsichord compared to CBR mode, but still inferior to Apple CBR). Probably some room for progress on pre-echo.

FEW STATISTICS

I fed ff123’s analysis tool with the following table (one row per sample, one column per encoder):

QT64      QT652     OLDNERO   NEROCBR   NEROVBR
2.0       3.0       1.0       2.0       4.5
2.5       3.0       1.5       4.0       5.0
3.0       3.5       1.5       4.0       5.0
3.0       4.0       1.0       5.0       4.5
3.0       3.2       1.0       4.0       4.0
5.0       5.0       4.4       5.0       5.0
3.5       3.0       1.0       2.5       4.5
4.0       4.5       1.5       2.0       3.5
2.5       2.5       1.5       3.0       4.5
3.0       2.5       1.5       3.5       3.5
3.0       3.0       1.5       4.5       4.0
2.5       3.3       1.5       4.0       4.0
3.0       3.8       1.5       2.0       3.0
2.5       3.5       1.6       3.0       4.5
3.2       3.7       1.4       2.7       3.2

• ANOVA conclusions

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 15
Critical significance:  0.05
Significance of data: 2.10E-014 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total               74         100.71
Testers (blocks)    14          21.40
Codecs eval'd        4          56.20   14.05   34.05  2.10E-014
Error               56          23.11    0.41
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.470

Means:

NEROVBR  QT652    NEROCBR  QT64     OLDNERO  
  4.18     3.43     3.41     3.05     1.56   

---------------------------- p-value Matrix ---------------------------

         QT652    NEROCBR  QT64     OLDNERO  
NEROVBR  0.002*   0.002*   0.000*   0.000*   
QT652             0.932    0.105    0.000*   
NEROCBR                    0.124    0.000*   
QT64                                0.000*   
-----------------------------------------------------------------------

NEROVBR is better than QT652, NEROCBR, QT64, OLDNERO
QT652 is better than OLDNERO
NEROCBR is better than OLDNERO
QT64 is better than OLDNERO

Three reliable conclusions based on my hearing and on 15 classical samples:
- the old Nero AAC encoder is inferior to all other challengers
- the new Nero AAC encoder, with VBR –internet and “fast” mode, is superior to all other challengers
- old iTunes, new iTunes and new Nero CBR are tied
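
For readers who want to check the figures by hand, here is a minimal sketch of the randomized-block ANOVA applied to the ratings table above (samples as blocks, encoders as treatments). It is not ff123's code, only the textbook sums-of-squares computation; the critical t value 2.003 (two-sided 5%, 56 degrees of freedom) is hardcoded from a table to keep the example free of external libraries.

```python
from math import sqrt

CODECS = ["QT64", "QT652", "OLDNERO", "NEROCBR", "NEROVBR"]
RATINGS = [  # one row per sample, copied from the table above
    [2.0, 3.0, 1.0, 2.0, 4.5], [2.5, 3.0, 1.5, 4.0, 5.0], [3.0, 3.5, 1.5, 4.0, 5.0],
    [3.0, 4.0, 1.0, 5.0, 4.5], [3.0, 3.2, 1.0, 4.0, 4.0], [5.0, 5.0, 4.4, 5.0, 5.0],
    [3.5, 3.0, 1.0, 2.5, 4.5], [4.0, 4.5, 1.5, 2.0, 3.5], [2.5, 2.5, 1.5, 3.0, 4.5],
    [3.0, 2.5, 1.5, 3.5, 3.5], [3.0, 3.0, 1.5, 4.5, 4.0], [2.5, 3.3, 1.5, 4.0, 4.0],
    [3.0, 3.8, 1.5, 2.0, 3.0], [2.5, 3.5, 1.6, 3.0, 4.5], [3.2, 3.7, 1.4, 2.7, 3.2],
]

n_blocks, n_codecs = len(RATINGS), len(CODECS)
grand_total = sum(sum(row) for row in RATINGS)
correction = grand_total ** 2 / (n_blocks * n_codecs)  # correction for the mean

# Sums of squares: total, blocks (samples), treatments (codecs), residual error.
ss_total = sum(x * x for row in RATINGS for x in row) - correction
ss_blocks = sum(sum(row) ** 2 for row in RATINGS) / n_codecs - correction
column_sums = [sum(row[j] for row in RATINGS) for j in range(n_codecs)]
ss_codecs = sum(s * s for s in column_sums) / n_blocks - correction
ss_error = ss_total - ss_blocks - ss_codecs

df_codecs = n_codecs - 1
df_error = (n_blocks - 1) * (n_codecs - 1)
ms_codecs = ss_codecs / df_codecs
ms_error = ss_error / df_error
f_stat = ms_codecs / ms_error

T_CRIT = 2.003  # tabulated t(0.975, df=56); hardcoded assumption, not computed here
lsd = T_CRIT * sqrt(2 * ms_error / n_blocks)  # Fisher's least significant difference

means = {name: s / n_blocks for name, s in zip(CODECS, column_sums)}
print({name: round(m, 2) for name, m in means.items()})
print(f"SS codecs={ss_codecs:.2f} blocks={ss_blocks:.2f} error={ss_error:.2f}")
print(f"F={f_stat:.2f}  LSD={lsd:.3f}")
```

Two means whose difference exceeds the LSD are declared different at the 5% level, which is how the p-value matrix above separates NEROVBR and OLDNERO from the three encoders in the middle.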

• TUKEY PARAMETRIC conclusions

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 15
Critical significance:  0.05
Tukey's HSD:   0.662

Means:

NEROVBR  QT652    NEROCBR  QT64     OLDNERO  
  4.18     3.43     3.41     3.05     1.56   

-------------------------- Difference Matrix --------------------------

         QT652    NEROCBR  QT64     OLDNERO  
NEROVBR    0.747*   0.767*   1.133*   2.620* 
QT652               0.020    0.387    1.873* 
NEROCBR                      0.367    1.853* 
QT64                                  1.487* 
-----------------------------------------------------------------------

NEROVBR is better than QT652, NEROCBR, QT64, OLDNERO
QT652 is better than OLDNERO
NEROCBR is better than OLDNERO
QT64 is better than OLDNERO

The conclusions are exactly the same as those of the ANOVA analysis.
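
The difference matrix can be read mechanically: a pair of encoders is separated when the gap between their means exceeds the critical distance. A small sketch, using the rounded means and the HSD value printed above (the function name is invented for the example):

```python
from itertools import combinations

MEANS = {"NEROVBR": 4.18, "QT652": 3.43, "NEROCBR": 3.41, "QT64": 3.05, "OLDNERO": 1.56}
HSD = 0.662  # Tukey's critical distance, as reported by the tool


def significant_pairs(means, threshold):
    """Return (better, worse) pairs whose mean difference exceeds the threshold."""
    ranked = sorted(means, key=means.get, reverse=True)
    return [(a, b) for a, b in combinations(ranked, 2) if means[a] - means[b] > threshold]


pairs = significant_pairs(MEANS, HSD)
for better, worse in pairs:
    print(f"{better} is better than {worse} (+{MEANS[better] - MEANS[worse]:.2f})")
```

Running the same function with Fisher's LSD (0.470) as the threshold yields the same pairs, which is why both analyses end with identical conclusions here.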

Applause for the Apple and Ahead developers :-)

• ABX logs are available here
• SAMPLES (11 MB) are available here. Save them: I can’t keep them online for very long.

Reply #1 – 2004-12-09 07:52:48

it is always a pleasure to see you make tests guruboolez, thank you for your efforts.

Reply #2 – 2004-12-09 09:29:42

EDIT: LINK for result is now working.... I've also changed the link directing to the wrong old test.

Digga> thanks

Ahead team> some clarifications about this “fast” encoder, which isn't faster, but seems to be a different encoder?

Reply #3 – 2004-12-09 09:50:23

I like those results.
Top AAC encoders seem to have made good progress. I guess that AAC would now be quite a bit better than mp3 in a 128 kbps group test.
That is also quite good for low bitrates, as previously (http://www.rjamorim.com/test/64test/results.html) Ahead's LC core was obviously inferior to Apple's. In the 64k test Ahead won because of the SBR part. Let's do a few extrapolations based on the 64k test:
Look at the margin between FhG mp3 and mp3pro. This margin was only because of SBR. Now, let's assume that the LC core from Ahead is now tied to Apple's one. Add the sbr margin to the Apple rating, and you have an incredible potential rating that would score past 4!

Regarding the "fast" encoder, my guess would be that it is trying to directly compute scalefactor values instead of trying to reach an optimal combination by iterating (but I have no knowledge of the internals of this encoder, it's just a guess).

Reply #4 – 2004-12-09 10:36:06

>>Add the sbr margin to the Apple rating, and you have an incredible potential
>>rating that would score past 4!
Heh. Would be really interesting to test.
There were rumours that Apple would bring us HE-AAC, but it seems they are still just rumours...
BTW, thanks, guruboolez !

Reply #5 – 2004-12-09 12:06:39

As always, it's very comprehensive and very enjoyable to read. Thanks guruboolez.

Reply #6 – 2004-12-09 23:59:53

interesting that fast produced better results than high for you guruboolez (thanks for this comparison!)

i hope ivan can give us some explanations here (i always hoped that neros pure lc-aac encoder would get much better, it was really needed)

Reply #7 – 2004-12-10 02:46:53

As always, a very interesting read. Thanks for your work, guruboolez. Nero AAC is becoming very interesting, indeed.

Reply #8 – 2004-12-10 06:48:38

It's nice to see Nero VBR perform so well. Thank you for this guruboolez. Because I do see AAC being more the real successor to mp3 as it has businesses backing it, it's nice to see the encoders mature as time progresses.

Oh and great work by all you programmers at Ahead. You guys really improved your codec in a fairly short amount of time.

Reply #9 – 2004-12-11 18:29:33

Interesting read! Good to get confirmation on the ringing issue with QT, and big props to the nero guys

Reply #10 – 2004-12-22 04:39:33

A highly interesting read indeed.
Thank you very much
