As promised: additional measurements for the latest versions of libopus and aoTuV, with precise bitrate information. Logs are in the appendix.
But first a reaction to IgorC, because their post above appeared while I was writing this report.
IgorC: that's certainly worth pointing out. Thank you! My primary question is "at what presets do these codecs reach transparency" and my secondary question is "which codec will cost me the least amount of disk space if I encode everything transparently". However, I am ranking the codecs by these results, and that ranking will be misleading at lower bitrates. If I were to ask instead "what codec will give me the best results if my target bitrate is X kbps", this experiment would not provide an answer (unless X=190).
I'm interested in the latter question as well and I think I might want to perform an ABC/HR test sometime in the future in order to answer it. I'm thinking that 96kbps might be a good target. Perhaps I should try a second target as well. Judging from the documentation that you referred to, and given that I've used samples that "push me to the safe side", I'd probably need to make some changes to my selection of listening samples. I'd be most grateful if anyone is willing to provide their input on these considerations!
Now, back to the current report.
Results in a nutshell
Opus 1.1a shows a considerable improvement over version 1.0.2 in terms of the variability of bitrates. I reached full transparency at preset 192 rather than at 224. I also judged preset 160 to be very close to transparency. In terms of efficiency, however, the improvement seems less dramatic: in CBR mode I would trust files with a total bitrate of 220kbps or greater, whereas previously I would only do so from 230kbps.
AoTuV 6.03b seems to be a much more gradual improvement over release 1, which is perhaps unsurprising since aoTuV is older and more mature than Opus. This time I judged q6 to be fully transparent rather than very close to it (but see discussion in the aoTuV section). Regardless, q6 is still my optimal setting if I were to use aoTuV for my music. In the OP I stated that I would trust aoTuV files in CBR mode at 200kbps or greater, but that was based solely on expected bitrates. Now that I've paid full attention to the observed bitrates, I have to increase that estimate to a shocking 290kbps.
The new bitrate information does not affect the conclusions for QT AAC and LAME. If I were to rank the codecs by their performance at high quality settings, QT AAC would end up in a distinct first place and Opus in second place, while I'd have a hard time deciding whether to put LAME or aoTuV next. Theoretically aoTuV seems better because of the expected bitrates associated with my optimal setting in both codecs (192-224 for aoTuV q6 versus 256 for LAME V0), but since aoTuV encodes everything at equal or higher bitrates than expected while LAME seems to actually meet the target, aoTuV might not be the more efficient of the two.
Equipment, procedure, and so on
Nothing changed compared to the OP, except that I used the following additional software:
- XLD plugin for encoding with aoTuV 6.03b.
- XLD plugin for encoding with libopus 1.1a.
- opusinfo from opus-tools 0.1.6 for reading out the Opus bitrates.
- ogginfo from vorbis-tools 1.4.0 for reading out the Vorbis bitrates.
- My own scripts for keeping track of ABX scores, calculating p-values and producing formatted logs.
In the Opus search I restricted myself to the eight numbered samples from the OP, skipping the other eight samples in which I've never heard any difference. In the aoTuV search I only listened to samples 1 and 3 because I didn't expect to hear a difference anywhere else.
The limits for "marginal difference" and "clear difference" are unchanged (0.05 and 0.002, respectively). In practice my criteria have become stricter, though, because my p-values are now calculated the same way as in foobar2000, which yields slightly larger numbers.
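For anyone who wants to check the numbers: here is a minimal Python sketch of that calculation, assuming the p-value is the one-sided binomial tail, i.e. the probability of getting at least the observed number of correct answers by pure guessing. That assumption reproduces the values in the appendix. The function names and the strict `<` comparisons against the limits are illustrative choices, not something foobar2000 prescribes.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


def verdict(p: float, marginal: float = 0.05, clear: float = 0.002) -> str:
    """Classify a p-value using the two limits (strict '<' is an assumption)."""
    if p < clear:
        return "clear difference"
    if p < marginal:
        return "marginal difference"
    return "no difference"


print(abx_p_value(10, 10))           # 10 out of 10 correct
print(verdict(abx_p_value(23, 35)))  # 23 out of 35 correct
```

A perfect 10/10 gives 1/1024, which is already below the "clear difference" limit, while a single 5/5 batch (1/32) only counts as marginal.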
Observed bitrates
1. Bitrates for the files encoded with libopus 1.1a and aoTuV 6.03b, together with ALAC bitrates of the original files as an indicator of entropy. A zero means "not encoded".
Official expected bitrate for Vorbis q4 (according to ogginfo) is 129kbps.
sample  opus.96  opus.128  opus.160  opus.192  aotuv.q4  aotuv.q5  aotuv.q6  alac
1           114       151       188       226       257       321         0  1078
2           170         0         0         0       214         0         0   673
3            95         0         0         0       139       164       198   872
4           137       172       209       239       138         0         0   817
5           112         0         0         0       137         0         0   683
6           141       180       220       255       165         0         0   835
7           111       147         0       221       161         0         0   983
8            94         0         0         0       146         0         0   811
2. Bitrates for the files encoded with QT AAC and LAME, as they were used in the OP. ALAC bitrates again included as a reference.
Expected bitrate for QT AAC q45 is 105 kbps. Expected for LAME V5 is 135kbps.
sample  qtaac.q45  qtaac.q54  qtaac.q82  qtaac.q91  lame.V5  lame.V3  lame.V0  alac
1             102          0          0          0      168      206      296  1078
2             161        186          0          0      230        0        0   673
3             103          0          0          0      146        0        0   872
4              84          0          0          0      127        0        0   817
5              93          0          0          0      126        0        0   683
6              81         95          0          0      129      171        0   835
7             107          0          0          0      168        0        0   983
8              70          0        128        159      128        0        0   811
Some interesting observations:
- Judging by the ALAC bitrates, samples 1 and 7 have the greatest entropy, which makes sense given the high density of attacks and the large amounts of noise, but in the lossy codecs they are usually not the files with the highest bitrates. The only exception is sample 1 in aoTuV, which might suggest that aoTuV depends more on entropy than the other codecs.
- Sample 2 seems to have the lowest entropy, but the lossy codecs usually encode it at the highest bitrate (second-highest in aoTuV). Apparently that sample has the most complex psychoacoustics. However, I could only hear a difference in this sample at Opus 96 and QT AAC q45 (perhaps also at q54 if I were to listen again after training).
- Following the same reasoning, sample 5 is probably both low in entropy and psychoacoustically simple. This again makes sense, as it only contains some drum beats with silence in between and a single voice singing 'aaa' at the start. Sample 8 also seems to be psychoacoustically simple, but at medium entropy.
- QT AAC and LAME deviate significantly from the expected bitrate in both directions while Opus and aoTuV seem to deviate only upwards.
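The last observation can be checked with a bit of arithmetic. The Python sketch below takes the bitrates at the lowest preset of each codec from the two tables and prints the smallest and largest deviation from the expected bitrate. It assumes that the Opus target of 96 is also its expected bitrate; the variable and function names are made up for this example.

```python
# Bitrates (kbps) of samples 1-8 at the lowest preset of each codec,
# copied from the two tables above.
observed = {
    "opus.96":   [114, 170, 95, 137, 112, 141, 111, 94],
    "aotuv.q4":  [257, 214, 139, 138, 137, 165, 161, 146],
    "qtaac.q45": [102, 161, 103, 84, 93, 81, 107, 70],
    "lame.V5":   [168, 230, 146, 127, 126, 129, 168, 128],
}

# Expected bitrates as stated with the tables. For Opus the target (96)
# is ASSUMED to be the expected bitrate.
expected = {"opus.96": 96, "aotuv.q4": 129, "qtaac.q45": 105, "lame.V5": 135}


def deviation_range(codec):
    """Return (lowest, highest) deviation from the expected bitrate, in percent."""
    target = expected[codec]
    devs = [100 * (bitrate - target) / target for bitrate in observed[codec]]
    return round(min(devs)), round(max(devs))


for codec in observed:
    low, high = deviation_range(codec)
    print(f"{codec:10s} {low:+4d}% .. {high:+4d}%")
```

With these numbers Opus dips at most about 2% below its target and aoTuV never goes below the expected bitrate, while QT AAC ranges from roughly -33% to +53% and LAME from roughly -7% to +70%.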
Opus
1. Target 96, samples 1-8. Clear differences in 1, 4, 6, 7.
2. Target 192, samples 1, 4, 6, 7. No difference.
3. Target 128, samples 1, 4, 6, 7. Clear differences in 1, 4, marginal difference in 6.
4. Target 160, samples 1, 4, 6. Marginal difference in 4.
AoTuV
1. q4, samples 1, 3. Clear difference in 1 (at 257kbps!), marginal difference in 3.
2. q5, samples 1, 3. Marginal difference in 3.
3. q6, sample 3. No difference (but see comment).
Comment: I believe that in stage 3 I heard the same difference as in stage 2, but it was too subtle for me to prove it. As you can see from the log, I came quite close to the marginal difference limit before the last batch. So, strictly speaking, aoTuV q6 might be "extremely close to transparent" instead of "fully transparent".
Appendix: logs
Often there's no log of the tests in which I heard no difference, because I didn't even try to identify the Xs.
The p-values are compatible with the percentages in foobar2000 logs.
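A log in the layout used below can be reproduced from the batch scores alone. This is a minimal Python sketch, not the script that produced these logs: it assumes batches of five trials, the one-sided binomial p-value, and seven significant digits, all inferred from the logs themselves. `format_log` is a hypothetical name.

```python
from math import comb


def format_log(name, batch_scores, batch_size=5):
    """Build a log with running subtotals and p-values from per-batch scores."""
    lines = [name, "batch score subtotal p"]
    correct = 0
    for batch, score in enumerate(batch_scores, start=1):
        correct += score
        trials = batch * batch_size
        # One-sided binomial tail: chance of at least `correct` hits by guessing.
        p = sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
        lines.append(f"{batch} {score}/{batch_size} {correct}/{trials:2d} {p:.7g}")
    return "\n".join(lines)


print(format_log("opus.96.sample6", [5, 4, 5]))
```

Feeding it the batch scores of any log in this appendix should give back the same subtotals and p-values, which is a quick way to double-check a log for arithmetic slips.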
opus.96.sample1
batch score subtotal p
1 5/5 5/ 5 0.03125
2 5/5 10/10 0.0009765625
clear difference
opus.96.sample3
batch score subtotal p
1 2/5 2/ 5 0.8125
2 2/5 4/10 0.828125
no difference
opus.96.sample4
batch score subtotal p
1 5/5 5/ 5 0.03125
2 5/5 10/10 0.0009765625
clear difference
opus.96.sample6
batch score subtotal p
1 5/5 5/ 5 0.03125
2 4/5 9/10 0.01074219
3 5/5 14/15 0.0004882812
clear difference
opus.96.sample7
batch score subtotal p
1 4/5 4/ 5 0.1875
2 5/5 9/10 0.01074219
3 4/5 13/15 0.003692627
4 4/5 17/20 0.001288414
clear difference
opus.192.sample4
batch score subtotal p
1 2/5 2/ 5 0.8125
2 2/5 4/10 0.828125
no difference
opus.128.sample1
batch score subtotal p
1 4/5 4/ 5 0.1875
2 4/5 8/10 0.0546875
3 4/5 12/15 0.01757812
4 5/5 17/20 0.001288414
clear difference
opus.128.sample4
batch score subtotal p
1 5/5 5/ 5 0.03125
2 5/5 10/10 0.0009765625
clear difference
opus.128.sample6
batch score subtotal p
1 2/5 2/ 5 0.8125
2 3/5 5/10 0.6230469
3 4/5 9/15 0.3036194
4 4/5 13/20 0.131588
5 3/5 16/25 0.1147615
6 5/5 21/30 0.02138697
marginal difference
opus.160.sample4
batch score subtotal p
1 2/5 2/ 5 0.8125
2 4/5 6/10 0.3769531
3 2/5 8/15 0.5
4 3/5 11/20 0.4119015
5 4/5 15/25 0.2121781
6 4/5 19/30 0.1002442
7 4/5 23/35 0.04476554
marginal difference
opus.160.sample6
batch score subtotal p
1 3/5 3/ 5 0.5
2 4/5 7/10 0.171875
3 4/5 11/15 0.05923462
4 2/5 13/20 0.131588
5 1/5 14/25 0.345019
6 4/5 18/30 0.1807973
7 3/5 21/35 0.1552523
no difference
aotuv.q4.sample1
batch score subtotal p
1 4/5 4/ 5 0.1875
2 5/5 9/10 0.01074219
3 4/5 13/15 0.003692627
4 5/5 18/20 0.0002012253
clear difference
aotuv.q4.sample3
batch score subtotal p
1 3/5 3/ 5 0.5
2 3/5 6/10 0.3769531
3 5/5 11/15 0.05923462
4 3/5 14/20 0.05765915
5 4/5 18/25 0.02164263
marginal difference
aotuv.q5.sample3
batch score subtotal p
1 2/5 2/ 5 0.8125
2 3/5 5/10 0.6230469
3 3/5 8/15 0.5
4 4/5 12/20 0.2517223
5 3/5 15/25 0.2121781
6 5/5 20/30 0.04936857
7 4/5 24/35 0.0204798
marginal difference
aotuv.q6.sample3
batch score subtotal p
1 4/5 4/ 5 0.1875
2 4/5 8/10 0.0546875
3 3/5 11/15 0.05923462
4 3/5 14/20 0.05765915
5 3/5 17/25 0.05387607
6 2/5 19/30 0.1002442
no difference