QT AAC, aoTuV (Vorbis), libopus, LAME (MP3) at high quality settings

Reply #25 – 2013-02-10 12:36:13

Thanks for the grateful and welcoming reactions, I feel all warm and fuzzy! It was interesting to perform the measurements and a pleasure to share the results with you, and given your reactions I'd definitely do it again. With a bonus: next time I'll include (Jplus-scripted) logs for those who care about it, and provide p-values that are calculated in the same way as in foobar2000.

Speaking of the next time. IgorC, yes I do think I'll test libopus 1.1 while it's still in alpha stage. You provide a solid argument (devs need listening results in order to improve their encoder), and while I'd never use alpha software for my music collection it can't hurt to just try how it performs. To be honest I've become quite curious anyway because of the change from constrained to unconstrained VBR. I really like unconstrained VBR.

I might try aoTuV 6.03b as well, while I'm at it. I can probably save a lot of time by restricting myself to the samples that I already found a difference in.

Question on good HA forums style: should I post the additional results in this thread or in a new topic?

Stay tuned!

Reply #26 – 2013-02-10 12:54:41

Quote from: Jplus on 2013-02-10 12:36:13

To be honest I've become quite curious anyway because of the change from constrained to unconstrained VBR. I really like unconstrained VBR.

I might try aoTuV 6.03b as well, while I'm at it. I can probably save a lot of time by restricting myself to the samples that I already found a difference in.

Not to give you assignments, but it would be interesting to note the actual bit rate and see how quality/transparency compares at a given target between hard and easy samples.

If all your current samples come from a set of cases considered hard tests for lame, I'd guess they lean to the high rate side for others as well.

Reply #27 – 2013-02-10 14:43:26

Writing down the (average) bitrate of each encoded file seems like a good idea to me, I'll do that.

Not all of my samples are considered hard for LAME (at least not anymore); in fact, sample 8 was not taken from the LAME testing page at all. I'm sure that sample 8 is easy because it's invariably encoded at lower bitrates than expected for the given preset. I think sample 5 is easy as well, because it doesn't sound complex at all and I almost never heard a difference in it, but I'd have to check the bitrates to verify.

Reply #28 – 2013-02-11 16:37:14

Jplus,

It might be worth mentioning that ABX is adequate for a "lossless vs lossy" comparison, while if you want to strictly compare the performance of a few lossy encoders, ABC/HR exists for that purpose. ABC/HR Java is quite easy to use.

Some links
http://wiki.hydrogenaudio.org/index.php?title=ABC/HR
http://www.rarewares.org/rja/ListeningTest.pdf

Reply #29 – 2013-02-11 18:04:53

As promised: additional measurements for the latest versions of libopus and aoTuV, with precise bitrate information. Logs are in the appendix.

But first a reaction to IgorC, because their post above appeared while I was writing this report.

IgorC: that's certainly worth pointing out. Thank you! While my primary question is "at what presets do these codecs reach transparency" and my secondary question is "which codec will cost me the least amount of disk space if I encode everything transparently", I am ranking the codecs by these results, and that ranking will be misleading at lower bitrates. If I were to ask instead "what codec will give me the best results if my target bitrate is X kbps", this experiment would not provide an answer (unless X=190).

I'm interested in the latter question as well and I think I might want to perform an ABC/HR test sometime in the future in order to answer it. I'm thinking that 96kbps might be a good target. Perhaps I should try a second target as well. Judging from the documentation that you referred to, and given that I've used samples that "push me to the safe side", I'd probably need to make some changes to my selection of listening samples. I'd be most grateful if anyone is willing to provide their input on these considerations!

Now, back to the current report.

Results in a nutshell
Opus 1.1a shows a considerable improvement over version 1.0.2 in terms of the variability of bitrates. I reached full transparency at preset 192 rather than at 224. I also judged preset 160 to be very close to transparency. In terms of efficiency, however, the improvement seems less extreme: in CBR mode I would now trust files with a total bitrate of 220kbps or greater, where previously I would have done so at 230kbps.
AoTuV 6.03b seems to be a much more gradual improvement over release 1, which is perhaps unsurprising since aoTuV is older and more mature than Opus. This time I judged q6 to be fully transparent rather than very close to it (but see discussion in the aoTuV section). Regardless, q6 is still my optimal setting if I were to use aoTuV for my music. In the OP I stated that I would trust aoTuV files in CBR mode at 200kbps or greater, but that was based solely on expected bitrates. Now that I've paid full attention to the observed bitrates, I have to increase that estimate to a shocking 290kbps.
The new bitrate information does not affect the conclusions for QT AAC and LAME. If I were to rank the codecs for their performance at high quality settings, QT AAC ends up in a distinct first place and Opus in second place, while I'd have a hard time deciding whether to put LAME or aoTuV next. Theoretically aoTuV seems better because of the expected bitrates associated with my optimal setting in both codecs (192-224 for aoTuV q6 versus 256 for LAME V0), but since aoTuV encodes everything at equal or higher bitrates than expected while LAME seems to actually meet the target, aoTuV might not be the more efficient of the two.

Equipment, procedure, and so on
Nothing changed compared to the OP, except that I used the following additional software:

XLD plugin for encoding with aoTuV 6.03b.
XLD plugin for encoding with libopus 1.1a.
opusinfo from opus-tools 0.1.6 for reading out the Opus bitrates.
ogginfo from vorbis-tools 1.4.0 for reading out the Vorbis bitrates.
My own scripts for keeping track of ABX scores, calculating p-values and producing formatted logs.

In the Opus search I restricted myself to the eight numbered samples from the OP, skipping over the other eight samples in which I've never heard any difference. In the aoTuV search I only listened to samples 1 and 3 because I didn't expect to hear a difference anywhere else.
The limits for "marginal difference" and "clear difference" are still the same (respectively 0.05 and 0.002), but that means my criteria have become stricter as my p-values are now calculated the same way as in foobar2000, which yields slightly larger numbers.
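
A minimal sketch of how such p-values can be computed, assuming the foobar2000-style figure is the one-sided binomial probability of scoring at least this well by pure guessing (an assumption, though it does reproduce the numbers in the appendix logs). The names `abx_p_value` and `verdict` are illustrative only and are not taken from the scripts mentioned above; the two thresholds are the 0.05 and 0.002 limits stated in the previous paragraph.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: chance of scoring >= `correct` by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def verdict(p: float, marginal: float = 0.05, clear: float = 0.002) -> str:
    """Map a p-value onto the three labels used in this report."""
    if p < clear:
        return "clear difference"
    if p < marginal:
        return "marginal difference"
    return "no difference"


# Final subtotals of three tests from the appendix, as (correct, trials).
for correct, trials in [(10, 10), (21, 30), (19, 30)]:
    p = abx_p_value(correct, trials)
    print(f"{correct}/{trials}  p={p:.7g}  {verdict(p)}")
```

Because the count includes every score at least as good as the observed one, a perfect 5/5 batch alone gives 1/32 = 0.03125, which is already below the marginal limit but far from the clear one.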

Observed bitrates
1. Bitrates for the files encoded with libopus 1.1a and aoTuV 6.03b, together with ALAC bitrates of the original files as an indicator of entropy. A zero means "not encoded".
Official expected bitrate for Vorbis q4 (according to ogginfo) is 129kbps.

Code:

  opus.96 opus.128 opus.160 opus.192 aotuv.q4 aotuv.q5 aotuv.q6 alac
1     114      151      188      226      257      321        0 1078
2     170        0        0        0      214        0        0  673
3      95        0        0        0      139      164      198  872
4     137      172      209      239      138        0        0  817
5     112        0        0        0      137        0        0  683
6     141      180      220      255      165        0        0  835
7     111      147        0      221      161        0        0  983
8      94        0        0        0      146        0        0  811

2. Bitrates for the files encoded with QT AAC and LAME, as they were used in the OP. ALAC bitrates again included as a reference.
Expected bitrate for QT AAC q45 is 105 kbps. Expected for LAME V5 is 135kbps.

Code:

  qtaac.q45 qtaac.q54 qtaac.q82 qtaac.q91 lame.V5 lame.V3 lame.V0 alac
1       102         0         0         0     168     206     296 1078
2       161       186         0         0     230       0       0  673
3       103         0         0         0     146       0       0  872
4        84         0         0         0     127       0       0  817
5        93         0         0         0     126       0       0  683
6        81        95         0         0     129     171       0  835
7       107         0         0         0     168       0       0  983
8        70         0       128       159     128       0       0  811

Some interesting observations:

Judging by the ALAC bitrates, samples 1 and 7 have the greatest entropy, which makes sense given the high density of attacks and the large amounts of noise, but in the lossy codecs they are usually not the files with the highest bitrates. The only exception is sample 1 in aoTuV, which might suggest that aoTuV depends more on entropy than the other codecs.
Sample 2 seems to have the lowest entropy, but is usually lossy encoded at the highest bitrate (second-highest in aoTuV). Apparently that sample has the most complex psychoacoustics. However, I could only hear a difference in this sample at Opus 96 and QT AAC q45 (perhaps also at q54 if I were to listen again after training).
Following the same reasoning as above, sample 5 is probably both low in entropy and psychoacoustically simple. This again makes sense, as it only contains some drum beats with silence in between and a single voice singing 'aaa' at the start. Sample 8 also seems to be psychoacoustically simple but at medium entropy.
QT AAC and LAME deviate significantly from the expected bitrate in both directions while Opus and aoTuV seem to deviate only upwards.
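
The observations above can be checked mechanically. The sketch below is illustrative only: it copies the bitrates from the two tables for the one setting per codec at which all eight samples were encoded, and it treats the Opus target of 96 as the "expected" bitrate, which is an assumption on my part. The helper names `deviation_range` and `highest_sample` are made up for this example.

```python
# Bitrates (kbps) copied from the two tables above, samples 1-8,
# for the one setting per codec at which every sample was encoded.
observed = {
    "opus.96":   [114, 170, 95, 137, 112, 141, 111, 94],
    "aotuv.q4":  [257, 214, 139, 138, 137, 165, 161, 146],
    "qtaac.q45": [102, 161, 103, 84, 93, 81, 107, 70],
    "lame.V5":   [168, 230, 146, 127, 126, 129, 168, 128],
}

# Expected bitrates as stated in the post. Using the target (96) as the
# expected value for Opus is an assumption made for this sketch.
expected = {"opus.96": 96, "aotuv.q4": 129, "qtaac.q45": 105, "lame.V5": 135}


def deviation_range(name):
    """Return (lowest, highest) deviation from the expected bitrate in kbps."""
    diffs = [rate - expected[name] for rate in observed[name]]
    return min(diffs), max(diffs)


def highest_sample(name):
    """Return the 1-based number of the sample with the highest bitrate."""
    rates = observed[name]
    return rates.index(max(rates)) + 1


for name in observed:
    low, high = deviation_range(name)
    print(f"{name:10s} {low:+4d} .. {high:+4d} kbps, peak at sample {highest_sample(name)}")
```

With these numbers QT AAC q45 and LAME V5 land on both sides of their expected bitrate, aoTuV q4 is always above it, and Opus 96 drops at most 2 kbps below its target. Sample 2 is the peak everywhere except aoTuV, where sample 1 is.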

Opus
1. Target 96, samples 1-8. Clear differences in 1, 4, 6, 7.
2. Target 192, samples 1, 4, 6, 7. No difference.
3. Target 128, samples 1, 4, 6, 7. Clear differences in 1, 4, marginal difference in 6.
4. Target 160, samples 1, 4, 6. Marginal difference in 4.

AoTuV
1. q4, samples 1, 3. Clear difference in 1 (at 257kbps!), marginal difference in 3.
2. q5, samples 1, 3. Marginal difference in 3.
3. q6, sample 3. No difference (but see comment).
Comment: I believe in stage 3 I heard the same difference as in stage 2, but it was too subtle for me to prove it. As you can see from the log, I came quite close to the marginal difference limit before the last batch. So strictly speaking, aoTuV q6 might be "extremely close to transparent" instead of "fully transparent".
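
To see how close the q6 run came to the limit, the batches can be replayed one at a time. This is a rough sketch, again assuming a one-sided binomial p-value and batches of five trials as in the logs; `running_p_values` is a hypothetical helper, not part of any tool used for this report.

```python
from itertools import accumulate
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value: chance of scoring >= `correct` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def running_p_values(batch_scores, batch_size=5):
    """Return the p-value of the cumulative score after each batch."""
    subtotals = accumulate(batch_scores)  # running number of correct answers
    return [
        abx_p_value(correct, batch_size * batch)
        for batch, correct in enumerate(subtotals, start=1)
    ]


# Per-batch scores of aotuv.q6.sample3, taken from the appendix log.
q6_sample3 = [4, 4, 3, 3, 3, 2]
for batch, p in enumerate(running_p_values(q6_sample3), start=1):
    print(f"batch {batch}: p = {p:.7g}")
```

The running value stays between roughly 0.054 and 0.06 from batch 2 through batch 5 without ever dropping below 0.05, and the last batch pushes it back up to about 0.10.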

Appendix: logs
Often there's no log of the tests in which I heard no difference, because I didn't even try to identify the Xs.
The p-values are compatible with the percentages in foobar2000 logs.

Code:

opus.96.sample1
batch  score  subtotal  p
    1    5/5      5/ 5  0.03125
    2    5/5     10/10  0.0009765625
clear difference

opus.96.sample3
batch  score  subtotal  p
    1    2/5      2/ 5  0.8125
    2    2/5      4/10  0.828125
no difference

opus.96.sample4
batch  score  subtotal  p
    1    5/5      5/ 5  0.03125
    2    5/5     10/10  0.0009765625
clear difference

opus.96.sample6
batch  score  subtotal  p
    1    5/5      5/ 5  0.03125
    2    4/5      9/10  0.01074219
    3    5/5     14/15  0.0004882812
clear difference

opus.96.sample7
batch  score  subtotal  p
    1    4/5      4/ 5  0.1875
    2    5/5      9/10  0.01074219
    3    4/5     13/15  0.003692627
    4    4/5     17/20  0.001288414
clear difference

opus.192.sample4
batch  score  subtotal  p
    1    2/5      2/ 5  0.8125
    2    2/5      4/10  0.828125
no difference

opus.128.sample1
batch  score  subtotal  p
    1    4/5      4/ 5  0.1875
    2    4/5      8/10  0.0546875
    3    4/5     12/15  0.01757812
    4    5/5     17/20  0.001288414
clear difference

opus.128.sample4
batch  score  subtotal  p
    1    5/5      5/ 5  0.03125
    2    5/5     10/10  0.0009765625
clear difference

opus.128.sample6
batch  score  subtotal  p
    1    2/5      2/ 5  0.8125
    2    3/5      5/10  0.6230469
    3    4/5      9/15  0.3036194
    4    4/5     13/20  0.131588
    5    3/5     16/25  0.1147615
    6    5/5     21/30  0.02138697
marginal difference

opus.160.sample4
batch  score  subtotal  p
    1    2/5      2/ 5  0.8125
    2    4/5      6/10  0.3769531
    3    2/5      8/15  0.5
    4    3/5     11/20  0.4119015
    5    4/5     15/25  0.2121781
    6    4/5     19/30  0.1002442
    7    4/5     23/35  0.04476554
marginal difference

opus.160.sample6
batch  score  subtotal  p
    1    3/5      3/ 5  0.5
    2    4/5      7/10  0.171875
    3    4/5     11/15  0.05923462
    4    2/5     13/20  0.131588
    5    1/5     14/25  0.345019
    6    4/5     18/30  0.1807973
    7    3/5     21/35  0.1552523
no difference

aotuv.q4.sample1
batch  score  subtotal  p
    1    4/5      4/ 5  0.1875
    2    5/5      9/10  0.01074219
    3    4/5     13/15  0.003692627
    4    5/5     18/20  0.0002012253
clear difference

aotuv.q4.sample3
batch  score  subtotal  p
    1    3/5      3/ 5  0.5
    2    3/5      6/10  0.3769531
    3    5/5     11/15  0.05923462
    4    3/5     14/20  0.05765915
    5    4/5     18/25  0.02164263
marginal difference

aotuv.q5.sample3
batch  score  subtotal  p
    1    2/5      2/ 5  0.8125
    2    3/5      5/10  0.6230469
    3    3/5      8/15  0.5
    4    4/5     12/20  0.2517223
    5    3/5     15/25  0.2121781
    6    5/5     20/30  0.04936857
    7    4/5     24/35  0.0204798
marginal difference

aotuv.q6.sample3
batch  score  subtotal  p
    1    4/5      4/ 5  0.1875
    2    4/5      8/10  0.0546875
    3    3/5     11/15  0.05923462
    4    3/5     14/20  0.05765915
    5    3/5     17/25  0.05387607
    6    2/5     19/30  0.1002442
no difference
