Multi-Codec Listening Test: 96-128-192-256Kbps
Hydrogenaudio Forums > Hydrogenaudio Forum > Uploads
sauvage78
I searched back through the forum & various sample databases for the worst problem samples I could find for Vorbis, & I used Audacity to edit the ones I was able to ABX, usually cutting them from 10-30 sec down to 1-2 sec. When the problem was only in the right or left channel, I even duplicated the channel with the artefact to turn the sample into stereo.
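For anyone who wants to repeat the channel-duplication edit outside Audacity, here is a minimal Python sketch using only the standard `wave` and `array` modules. It assumes a 16-bit PCM stereo WAV, and the function name `duplicate_channel` is hypothetical, not part of any tool used in this test. It copies the chosen channel (0 = left, 1 = right) into both output channels.

```python
import array
import sys
import wave

def duplicate_channel(src_path, dst_path, channel=0):
    """Copy one channel of a 16-bit stereo WAV into both output channels.

    channel=0 keeps the left channel, channel=1 keeps the right one.
    """
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV")
        params = src.getparams()
        raw = src.readframes(src.getnframes())

    samples = array.array("h", raw)      # interleaved L, R, L, R, ...
    if sys.byteorder == "big":
        samples.byteswap()               # WAV sample data is little-endian
    picked = samples[channel::2]         # keep only the channel with the artefact
    out = array.array("h", [0]) * (2 * len(picked))
    out[0::2] = picked                   # new left channel
    out[1::2] = picked                   # new right channel
    if sys.byteorder == "big":
        out.byteswap()                   # back to little-endian for writing

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(out.tobytes())
```

The output keeps the original sample rate and length; only the content of the other channel is replaced.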

I ended up with 5 very short, truly killer samples, which I ABXed at various levels to see whether higher bitrates were transparent or not. Here are the results:





All these samples come from real CDs, but they are so focused on the artefact that they may give newbies a bad idea of Vorbis,
so let it be clear: aoTuV Beta5.7 is a very good lossy codec. I tested aoTuV because it is IMHO the best overall codec around. (Edit: After adding other codecs, sadly I don't think so anymore.)
Using the best DCT codec around was simply a way for me to find the worst killer samples around.
This is more a ritual killing of Vorbis, gathering all my (small) knowledge against it, than a normal ABX test. I hit exactly where it hurts, so it is normal that it is painful for Vorbis.

So if I can ABX aoTuV at up to -q8, it means that other lossy codecs will probably fail even worse (especially MP3) (Edit: not true). It doesn't mean that Vorbis sounds bad at all on average music.
It only means that no lossy codec is transparent on every sample; that simply doesn't exist, no matter the bitrate.

But, in a way, this test is meant to prove that people using lossy at overkill bitrates are not protected from problem samples.
I don't even have golden ears; I can't hear above 18 kHz...

You can run the test yourself: everything is included in the attached archive, every sample at every bitrate. Everything is already encoded & ReplayGained, so all you have to do is listen & compare your results against mine; my logs are included (when successful). You can test your hearing very quickly because the samples are very short. I prepared everything to make your life easier.

I didn't discover these samples; none of them is new. But I renamed some of them for my own use (I named each sample after its artist), & also because they have been shortened so much that they are sometimes unrecognizable.
Anyone collecting problem samples should have a look ... these 5 samples are really worth it. I spent a couple of hours just editing them to focus on the pure artefact.
You don't have to point other people to the artefact inside the sample; the sample IS the artefact.
In fact, my only goal with this listening test wasn't to test Vorbis, but to start a small collection of killer samples for any DCT codec in order to know where they fail.
So I absolutely don't care whether Vorbis sounds good or bad ... I currently use 100% lossless.

Indeed, this test is dedicated to Monty & Aoyumi. Thanks for Vorbis; I hope it will help you improve the codec.

PS:
If you know the origin (Artist-Album ...) of the Castanets sample, I am interested in this information.
If you know very good killer samples, I am interested. Please describe & post them so that I can add them to my test next time.
If Gabriel or Roberto ever reads this, please fix http://lame.sourceforge.net/quality.php (I get a 403 error when I try to download samples).

Edit1: Inclusion of Average Bitrate in the table for later comparison to others codecs.
Edit2: Added Nero AAC
Edit3: Added Musepack
Edit4: Added Lame MP3
Edit5: Added Itunes AAC
Edit6: Added Abfahrt Hinwil Sample
Edit7: Errata N°1
Edit8: Added aoTuV exp-bs1, Errata N°2
Edit9: Added Celt
halb27
QUOTE (sauvage78 @ Mar 16 2009, 12:35)
... It only means that there is no lossy codec which is transparent on every samples, that simply doesn't exist, no matter the bitrate. ...

I see you're using lossyWAV at -P quality. At a bitrate like that or higher, and with a well-designed codec that keeps the straightforward signal path of the PCM data, I think we can believe that any audible deviation from the original is very subtle, if audible at all.
It's quite astonishing, however, that you are able to ABX Vorbis at 256 kbps with an annoying result.
sauvage78
I'm bumping my own thread just to let people know that I added Nero AAC to the comparison.

As you can see, Nero AAC shines, but that doesn't mean Vorbis is bad: I selected problem samples specific to Vorbis & then tested them on Nero AAC, so it is unfair to Vorbis.
I admit I don't like MPEG ... but that said, I was impressed by Nero Q0.55 ...
Canar
Thank you for your efforts. Listening tests take time and dedication! I use aoTuV in my Vorbis streaming component for foobar2000, so this affects me as well.

Have you sent aoyumi a message linking to your test? He would probably be interested.

Finally, would you consider testing -q10 as well? I would like to know if these artifacts cover the whole breadth of Vorbis.
rpp3po
Nice work! Thank you!

Still, in my opinion, it somewhat damages Vorbis to see it failing this way. I know these are specifically targeted problem samples and every lossy codec probably has some skeletons in the closet, but some of them are already quite old (like Castanets) and still don't work. In contrast, the Nero devs, for example, were able to straighten out most critical samples thrown at them here at HA over the years.

I really would have liked to see Vorbis succeed. Even when it produced artifacts at lower bitrates, I always found them more pleasing / less annoying than those of other codecs. I don't know why. Despite being an excellent codec, Vorbis missed the wave of momentum it would have needed a couple of years ago, in my opinion, and was overtaken by LAME-MP3 and AAC. A phone/player manufacturer can probably satisfy far more than 90% of its user base just by supporting those two formats. Vorbis' Tremor showing serious quality problems on some hardware players probably didn't help either. Maybe that all reverses if Vorbis gets included in the next W3C spec, but I don't know about the current state.
uart
Yes a very impressive test sauvage78.

Any chance of adding Lame 3.98 to the comparison?
Kim_C
Thanks for your effort. Excellent visualisation of results!

If you have time and are interested, I think the latest iTunes / QuickTime AAC would be a good addition to the comparison. It'd be interesting to see how well it performs against Nero AAC now, because in previous tests iTunes was slightly better than Nero.
dgauze
Thanks for conducting this test, sauvage78! I hope your hard work will help contribute to the advancement of both codecs.


QUOTE (sauvage78 @ Mar 16 2009, 04:35)
I searched back on the forum & on various samples databases for the worst problem samples I could found for vorbis & I edited with audacity the ones that I was able to ABX, usually from 10-30 sec to 1-2 sec, I even duplicated the channel from mono to stereo if it happens that the problem was only on the right or left channel.


It would be very interesting to see how Vorbis performs on samples known to be problematic for the Nero AAC encoder.
sauvage78
I just added Musepack.

I consider this thread a work in progress, so I may add other codecs & other killer samples later as my mood changes or my knowledge increases.

In particular, I plan to test:
1: Official Vorbis (full range) & aoTuV Q10
2: Lossy|Flac at --portable & a lower setting near 256Kbps (maybe --zero), but not higher (adding the Ginnungagap sample)
3: Lame MP3 (not a priority, but useful as a reference)

... actually, I don't plan to test iTunes, for the simple reason that I don't want to install it. Even if it were good, I want a CLI encoder, so iTunes is useless for my personal needs.
If my system ever breaks, I may test it before I reinstall ... but that will be much later (maybe when Windows 7 comes out).
Whatever iTunes' audio quality, I consider Nero AAC better for personal use, because Nero is both CLI & already transparent at ~192Kbps. So testing iTunes would only be useful to get an idea of the quality provided by the Music Store, which I don't use. Even if iTunes beat Nero, it would be a marginal quality gain (& maybe only at 128Kbps & below) for the huge pain of using iTunes.

I don't promise anything, as I am short of time. I do it for myself; I only publish it because the job is done honestly, to the best of my knowledge, so it may as well be public. I also do it in case I have done something wrong, & to help developers. But I must say I don't care much about the opinion of others. I know what I hear: I spot the artefact before I begin ABXing. I don't ABX randomly at all; that's why my score is almost always 100% success or 100% failure. It's never "I think I hear something", except in very rare cases marked in yellow. I am 100% sure that I can redo the test & get the exact same result, except for the yellow results.

Currently I consider the Kraftwerk & Rush samples (especially the Rush sample) to be Vorbis bugs that could easily be fixed IMHO.
So far I have never heard the Rush bug outside of Vorbis (I hear a toad in the background ...). I really love that Rush song, so it is important to me that this particular bug gets fixed.

Concerning Musepack, its huge problem is not its quality at high bitrates but its lack of flexibility. It doesn't matter that you can now put it in MKV; the guys from Doom9 would never use a codec that performs so badly at mid/low bitrates. Even at 128Kbps Musepack doesn't compete with Vorbis/AAC (even if the table doesn't show it, as both look almost tied, aoTuV q4 beats Musepack radio because aoTuV's artefact is always softer) ... Musepack is just an improvement over Lame at ~192Kbps IMHO (Edit: After testing Lame MP3, not even true ...). Musepack seems to beat aoTuV in the table, but this is because the samples are heavily targeted at Vorbis' flaws. I am confident that once the Vorbis bugs are fixed, Vorbis -q6 will beat Musepack extreme ... or at least be tied at high bitrates & beat it at mid/low bitrates.

Also, the 10/12 on Nero Castanets Q0.35 is due to boredom & lack of focus. I am 100% sure that I can get 8/8 there, since I get 8/8 at Q0.40, a higher quality setting, so the weaker score at Q0.35 isn't logical. But I don't care about retrying, as I am sure of myself here.

Canar:
For confirmation at Q10, see the original thread about the Rush sample problem:
http://www.hydrogenaudio.org/forums/index....showtopic=44862
I am very confident that I can get the same score.

Edit1: Some thoughts about my methodology.
I chose 8 trials because I wanted a number that would not take too long to ABX & that would protect me from a row of lucky guesses. In my experience, it easily happens that you sometimes guess right up to 4 times in a row; from 5 or 6 successes in a row, it becomes unlikely that you were guessing. At 7 or 8 successes in a row, I consider the result valid. I hesitated between 8 & 12 trials, because at 12/12 there isn't even the shadow of a doubt. In the end, I decided to split the difference. First I do 8 trials & see how confident I am in the validity of the result. If I get 8/8, I consider the result valid, especially if I can identify what I am listening to. If I get 5/8 or less, I consider the test a failure, especially if I cannot identify what I am listening to. If I get 6 or 7 out of 8, I go up to 12 trials. Then, if I get at least 10 successes out of 12, I consider the test valid, especially if I know what I am listening to. If I get 9 successes out of 12, I consider the test invalid, especially if I don't know what I am listening to.
It never happened that I got a 9/12 result while knowing what I was listening to; I chose & edited my samples specifically so that this never happens.
Surprisingly, what did happen is that I was able to get an 8/8 result without knowing what I was listening to. Such cases were very rare (1 or 2) & are marked in yellow (but not every yellow result is in this case). I consider them valid, as I think there is a very slight overall modification in the audio without it being a flaw that you can point out. For me it doesn't necessarily mean that I was guessing; it means that I was very, very close to the transparency point. Also, it sometimes happens that it sounds different but not bad, just slightly different. I may re-test the yellow results later, but yellow means the quality is good anyway, so it may not be worth the headache.
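To put numbers on those thresholds: under the usual assumption that every trial is an independent 50/50 guess, the chance of scoring at least k out of n by luck alone is a one-sided binomial tail. A minimal Python sketch (the helper name `abx_p_value` is hypothetical, not something foobar2000's ABX component exposes):

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials`
    when every answer is a pure 50/50 guess (one-sided binomial tail)."""
    lucky_outcomes = sum(comb(trials, k) for k in range(correct, trials + 1))
    return lucky_outcomes / 2 ** trials

for correct, trials in [(8, 8), (7, 8), (6, 8), (5, 8), (12, 12), (10, 12), (9, 12)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

This gives about 0.004 for 8/8, 0.035 for 7/8, 0.145 for 6/8, 0.019 for 10/12 and 0.073 for 9/12, which lines up with counting 8/8 and 10/12 as passes and 9/12 as a failure at the common 0.05 level. One caveat: deciding to extend a 6/8 or 7/8 run to 12 trials after seeing the first 8 makes the overall false-positive rate slightly higher than the single 12-trial figure suggests.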

Edit2: If a moderator reads this, please edit the title
from Listening Test: aoTuV Beta5.7 on 5 Killer Samples, 96-128-192-256Kbps (17 ABX Log+Files)
to Multi-Codec Listening Test: 96-128-192-256Kbps, Killer Samples targeting Vorbis (with logs)
Thanks
uart
QUOTE
I specially plan to test:
1: Official Vorbis (all range) & Aotuv Q10
2: Lossy|Flac at --portable & a setting lower near 256Kbps (maybe --zero), but not higher (Add Ginnungagap sample)
3: Lame MP3 (not a priority, but useful as a reference)


Great, looking forward to seeing those. Especially the Lame reference, to put the test into perspective for me (as an MP3 user).
Aoyumi
sauvage78:
Thank you for the test.

somebody:
I can't fix these problems right away, because I am slowly wrestling with another problem.
In the meantime, I will write down my views for anyone who is motivated.

Kraftwerk/Rush/Autechre
Please improve block switching. These problems happen at high bitrates because the switch from short blocks back to long blocks happens too early. But simply changing the threshold is an inefficient fix.

Harlem
In fact, it is point stereo that affects the sound here. In addition, there are some block-switching problems.

All the best!
Canar
QUOTE (sauvage78 @ Mar 17 2009, 23:05)
Edit2: If ever a moderator read this plz edit the title
from Listening Test: aoTuV Beta5.7 on 5 Killer Samples, 96-128-192-256Kbps (17 ABX Log+Files)
to Multi-Codec Listening Test: 96-128-192-256Kbps, Killer Samples targeting vorbis (with logs)
Done-da-dun-dun-DOOOOONE
sauvage78
As several people requested it, I added Lame MP3 sooner than I planned.
I must say I didn't expect Lame MP3 to compete so well.

I don't plan to add any other tests for some time now, because I am bored & I don't have the time anyway. So I will draw my own personal conclusions.
In my opinion:

at low/mid bitrate (96/128Kbps)
1: Nero
2: Vorbis/Lame (tied but only due to vorbis bugs, overall I still favor vorbis)
3: Musepack (not even in the competition)

at high bitrate (192/256Kbps)
almost all tied, except vorbis which has serious problems that are not normal at this bitrate.

I don't know whether I should be happy or sad that Lame competes so well, because it means that, most likely, nobody will ever write an open-source AAC encoder as good as what x264 is for AVC in the video codec world. I was secretly wishing that Lame MP3 would be awful.

Overall, I didn't expect such a big difference between Vorbis & the other codecs. I knew I was hurting Vorbis, but I thought that, as transform codecs, both Nero & Lame would suffer too (particularly Lame) ... It just didn't happen. In the same way, I knew Musepack was bad at low bitrates, as it is not designed/optimized for them, but I thought that, because it is a subband codec (& also influenced by 128Kbps comparisons where it was not bad at all), it might not suffer too much at mid bitrates. I noticed a difference in the nature of the artefacts (Musepack smears much more than the other codecs at 96/128Kbps), but Musepack was hurt very badly by my samples too. Whether a codec is a transform or a subband codec is not enough on its own to say anything about its quality IMHO. As long as the limits of the technology are not reached, the implementation matters much more than the technology used. That's why Lame just doesn't want to die. In my opinion, the claim that Musepack is as good as Nero/Vorbis at 128Kbps is not true. I used to think that Musepack was better than Lame; I don't think so anymore. Maybe there was a time when it was true, I don't know.

I also thought that the difference between AAC & MP3 would be bigger. Even if both often seem tied in the table, AAC is always better quality-wise. When both Nero AAC & Lame MP3 are ABXable & tied, the artefact is always softer with Nero AAC.

The only reason not to use Nero (I don't) is that it is patented & closed. Quality-wise it is brilliant. Congratulations to Ivan & Co; I wish Vorbis were as good as Nero AAC.

Personally, before this test I thought I might encode some of my rips ... after this test I am back to lossless. ... Ignorance is bliss.

PS: Thanks for the title, Canar.

Note: In all honesty, this test was conducted by an anti-MP3 & anti-MPC user. Concerning Lame, I think it is high time for a Lame AAC version. Concerning Musepack, I admit I never understood the Musepack fanaticism. Nowadays there is no rational reason to use Musepack. Anyway, I don't think I was unfair to any of these codecs, even if I dislike them a lot.
halb27
I had hoped you would also test lossyWAV --portable. (Settings at ~256 kbps, which you wrote you also wanted to test, are not very useful IMO, though it would be useful to test at low -q settings, for instance -q 1.0 or -q 1.5 -V.)
Maybe, in order to limit your hard work, you can test just -q 1.5 -V. That would be great.
Nick.C
-V is not required for beta v1.1.3e - it is the default (and only) spreading function.
DigitalDictator
Thanks for the test! Very informative and nice table!

QUOTE
... actually I don't plan to test Itunes for the simple reason that I don't want to install it. Even if it would be good, I want a CLI encoder so Itunes is useless for my personal needs.

Can't someone encode the iTunes samples and send them to you? then you don't have to install the program. I'm also interested in how iTunes compares with Nero.

Lame seems to do a pretty good job as well, right?
sauvage78
halb27:
I will not test lossyWAV --zero to see whether it sounds good, but to see how different its artefacts sound from the usual DCT artefacts, & also to see whether DCT problem samples affect it in some way. Following the same idea, I plan to test Ginnungagap to see how DCT codecs react to it. Maybe everything will be transparent due to the different technology; I can't know if I don't test. All I know is that I don't rely on others anymore to tell me what sounds good. I haven't followed lossyWAV development lately, as it is too technical for me (especially as I am not a native English speaker).
Nick asked me to test his new spreading function, & I am interested in doing so (Edit: well, it seems it's too late). But I currently use neither Vorbis nor lossy|flac on a daily basis, so I will test when I get some time, & if it's not too late. The problem with lossyWAV is also that it is very hard to ABX: so far I am only able to ABX it on Ginnungagap or at very low settings, which are not supposed to be transparent anyway. I quickly tried the samples provided by the guy who could ABX --portable, & I wasn't able to ABX them. The more transparent the codec, the longer it takes to ABX it, which is why I tested pure lossy first. The yellow & failed ABX trials take much more time & are much more boring than the orange & red results. I am not nuts & paranoid; I don't even try to ABX transparent audio at random. Autechre & Ginnungagap produce very similar artefacts, which is why I particularly want to compare these two samples. My interest in testing lossy|flac will rise again as my free HDD space decreases, but it may be months before I come back to lossyWAV; I have to study ... sorry. I still think lossyWAV is great, even if it's a little too big for my taste.

Quote DigitalDictator:
"Lame seems to do a pretty good job as well, right?"
Yes, I was surprised by its quality (especially at V7, which I expected to be in the red zone).

Well, in all honesty, I don't plan to test iTunes anytime soon, even if I had the pre-encoded samples. I will never use this codec for my personal use.
DigitalDictator
QUOTE
well in all honesty I don't plan to test itunes anytime soon even if I had the pre-encoded samples. I will never use this codec for my personal use.

Too bad, there are many out here who use iTunes and who are really interested in the comparison between the iTunes codec and others. So what if we ask nicely?
sauvage78
I just added iTunes AAC.
I must say I was very disappointed by the results. I expected it to be much better, especially given its reputation at mid bitrates. I was hoping it might even be better than Nero AAC at 128Kbps ... I was far from reality ... overall even Lame MP3 beats iTunes AAC, which was quite a big surprise to me ... it is also affected by the Kraftwerk sample while Nero is not. Because it's the same technology as Nero AAC, I didn't expect iTunes AAC to fail on this sample.

DigitalDictator:
I didn't test it for you, but because I realized I had never tested the VBR version of the iTunes AAC codec; the last time I installed such terrible software on my machine, it was iTunes version 4.0 with CBR only ...
This is the first & last time I test iTunes AAC. Not only is it software for children (70 MB, 5 firewall alerts, 4 folders in Program Files, dead keys in the registry), but it sounds very average. Not especially bad (except on Kraftwerk), not especially good ...

Anyway, it's done & I know what it's worth. It is not such a bad codec overall; after all, at low bitrates it beats Musepack & at high bitrates it beats Vorbis ... but in the AAC codec battle, Nero wins by a good margin quality-wise, & its CLI is much friendlier.

When I started my test, I didn't expect it to be such a triumph for Nero AAC (& to some extent for Lame MP3 too) ... to my dismay, Vorbis is losing ground. Vorbis is only very good for streaming, because its artefacts are usually soft at low bitrates. For webmasters Vorbis is great, but for CD archiving it's not such a good option, at least for now ... I hope it can be fixed.

Don't even ask me to add any codec for ages. I didn't count, but it takes more than 3 hours just for one codec. Each time I need to organize the files (encode/rename), do the ABX test, do the table, edit the screenshot, re-upload the PNG to ImageShack, edit the topic ... & I don't count the time spent finding & editing samples that I could ABX ... I'd rather add new killer samples than new codecs now.

Just take what I give as it comes & be happy with it ... or leave. I am BORED of ABXing !!!

Edit: I added the result table to the uploads. It's in OpenOffice .odt format; in case anyone is willing to run the same test & publish their results, it may save them some time ...
Busemann
The bitrates on the Kraftwerk sample on the QT files seem awfully low, and different from what I get when encoding it with QT 7.6.

Edit: I see you used a "custom" sample and I guess you must have used the custom 256 vbr setting rather than iTunes Plus too.. my mistake!
sauvage78
Yes, it is the same sample, but edited (shortened, & the channel with the artefact duplicated to make it stereo) to focus on the specific artefact that I could hear.
You are right, the bitrate is low, but it's not a problem with my sample; it is an iTunes bug. At a low bitrate too (72Kbps), Musepack achieves transparency on the same sample.
You can download all the samples (both lossy & lossless), everything is in the archives.
Busemann
QUOTE (sauvage78 @ Mar 20 2009, 08:03)
Yes, it is the same sample but edited (shortened & channel with the artefact duplicated to make it stereo) to focus on the specific artefact that I could heard.
You are right the bitrate is low, but it's not a problem with my sample, it is an Itunes bug. With a low bitrate too (72Kbps) Musepack achieve transparency on the same sample.
You can download all the samples (both lossy & lossless), everything is in the archives.


Yeah, I tried the "regular" Kraftwerk sample and the bitrates were significantly higher at all settings, which is why I got confused. I was also a few kbps off on the 256kbps samples, but I encoded with the default iTunes Plus setting, which uses the maximum quality setting of the encoder. I'm not sure the Plus files would sound much different, though.
rpp3po
Your Quicktime 7.6 evaluation is definitely flawed. The bitrates don't match and are partly totally amiss. Your ABX findings are also inexplicable.

Attached you'll find the correct encodings (QT 7.6) for target bitrate mode (constrained VBR/iTunes) and, where Quicktime 7.6 really excels, target quality mode (true VBR).

In true VBR mode Quicktime considers the following bitrates sufficient at the highest quality level:

Autechre: 222kbit/s
Castanets: 200kbit/s
Harlem: 216kbit/s
Kraftwerk: 298kbit/s
Rush: 196kbit/s

All are quite high on average, even for the highest Q setting. Using the same setting QT averages at about 185kbit/s over my whole music collection.

So Quicktime correctly identifies problematic content and adjusts bitrate accordingly. That your own Kraftwerk sample shows 162 kbit/s for the 256kbit/s constrained encode seems quite off the mark.

AAC is inherently a VBR format. Forcing it into certain bitrates is really not helpful. With MP3, CBR at least increased compatibility, but that's not the case for AAC. Just let it flow at Q127 and it will give you both very small and very large files at a very reasonable total average.
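As a quick check on "quite high on average": the plain mean of the five true-VBR figures listed above is 226.4 kbit/s. A plain mean of per-file bitrates only equals the real average bitrate when all files have the same length, though; for files of different durations, total bits over total time is the fairer figure. A minimal Python sketch (both helper names are hypothetical):

```python
def mean_bitrate(kbps_values):
    """Plain mean of per-file bitrates (fair only if all files are equally long)."""
    values = list(kbps_values)
    return sum(values) / len(values)

def pooled_bitrate(files):
    """Duration-weighted average bitrate in kbit/s.

    `files` is a list of (size_in_bytes, duration_in_seconds) pairs.
    """
    total_bits = sum(size * 8 for size, _ in files)
    total_seconds = sum(seconds for _, seconds in files)
    return total_bits / total_seconds / 1000

# True-VBR bitrates (kbit/s) reported above for QuickTime 7.6 at its highest quality
qt_true_vbr = {"Autechre": 222, "Castanets": 200, "Harlem": 216, "Kraftwerk": 298, "Rush": 196}
print(mean_bitrate(qt_true_vbr.values()))  # 226.4
```

`pooled_bitrate` needs real file sizes and durations, which are not listed in this thread, so it is shown without numbers here.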
sauvage78
I didn't use QuickTime; I used the iTunes interface & changed the import settings, which were 256Kbps VBR by default, to 96-128-192-256Kbps VBR ... maybe there is another AAC encoder or advanced settings in QuickTime ... I didn't have many parameters in iTunes. I don't use iTunes at all for my personal use, so maybe I simply used the wrong software ... I didn't have any true VBR or ABR options.
I am looking into it. Unfortunately, I have already gotten rid of iTunes. My HDD is allergic to it ... 20 dead keys after uninstallation & a Firefox plugin I never asked it to install ...

Edit: Can I do it with the QuickTime installed by iTunes? A few years ago QuickTime wasn't freeware, if I recall correctly.
rpp3po
Yes, sadly iTunes on Windows is a real pain compared to OS X.
Alex B
These are interesting tests, but I don't think any serious conclusions can be drawn, because the durations are only a second or two. In the public HA listening tests, the first two seconds of the encoded samples have always been cut off, because the lossy codecs may first need to adapt to the content. I don't know how severe the problem can be or which codecs & settings are most affected, but that has been the accepted practice.

In addition, as sauvage78 stated, anyone who interprets the results must remember that they are valid only for these specific samples, which represent a total of 8 seconds of quite unusual sound clips. I'd recommend playing through the original lossless samples once before drawing any conclusions.

EDIT: fixed a typo (adopt > adapt)
rpp3po
QUOTE (Alex B @ Mar 20 2009, 17:02)
In the public HA listening tests the first two seconds of the encoded samples have always been cutted off because the lossy codecs may first need to adopt to the content.


Lossy audio codecs don't "adopt" to content over time (n-pass video coding is different). In fact, they don't even have any memory about the past surviving the current frame boundary (except maybe a bit reservoir for bitrate constraints).
rpp3po
QUOTE (sauvage78 @ Mar 20 2009, 17:00)
Edit: Can I do it with the quicktime installed by Itunes ? a few years ago quicktime wasn't a freeware if I recall well.


You can just use my samples or create your own with Quicktime Player's "export" function (and see that they are identical to mine). But there is no batch interface on Windows, yet, so better save your time for the actual testing, if you don't have a Mac available.
sauvage78
Alex B:
I can only agree with you, as some people seem to only look at the colored table & say WOW ... that is not the right way to do things; you have to test for yourself to put it in perspective.
That said, the Harlem sample is applause, so every live CD is potentially affected by applause.
In the same way, killer samples are very common in electronic music: songs from NIN, Ministry, Marilyn Manson, Fear Factory ... are full of effects that can be very similar to the Kraftwerk/Rush/Autechre samples.

So overall, even if it's only 10 sec in an ocean of music ... I think you can draw some conclusions from my test, both for live music & for electronic music.
But I agree it definitely needs more samples from other genres.

I can only tell you that this test is very serious & very honest ... I didn't spend two days on a cheap test. I have better things to do in life, especially as I don't use lossy !!!
Unless you're a real sadomasochist, you don't listen to Autechre for 30 min in a row for fun ... that much I can tell you ...
Alex B
QUOTE (rpp3po @ Mar 20 2009, 18:06)
Lossy audio codecs don't "adopt" to content over time (n-pass video coding is different). In fact, they don't even have any memory about the past surviving the current frame boundary (except maybe a bit reservoir for bitrate constraints).

I just spent half an hour trying to find the original reason for adding 1000-2000 ms of additional offset in the public listening tests, but unfortunately my searches didn't find the correct threads/posts. It may be related to the bit reservoir behavior or something else, but if I recall correctly it has something to do with audio quality in the very beginning of the encoded samples.
guruboolez
The test is really interesting (ABXing Vorbis at -q8 is not common), but the cross-comparison of different audio coders could be misleading. I'll just quote the original poster's words:

QUOTE
I searched back on the forum & on various samples databases for the worst problem samples I could found for vorbis (…)

QUOTE
but that doesn't mean Vorbis is bad, because I selected problem samples specific to vorbis & then tested it on Nero AAC, so it is unfair for vorbis.


Maybe a big red warning at the top of the first message would avoid future confusion.

Anyway, thank you for your test (and welcome to the club of people disgusted by the ABX procedure)
Alex B
I finally found at least one thread in which Gabriel explains why "additional offset" should be used.

QUOTE (Gabriel @ Nov 21 2004, 15:05)
I would like to suggest a little change to the recommended practices in listening tests.

Most modern codecs work based on the recent context. They usually have a way to adapt the bitrate to the content that takes into consideration the recent past bitrate (a window). Many encoders also have a psychoacoustic model that takes into consideration the previous psychoacoustic parameters/results.

Right now, when listening to samples, we usually encode a short sample with the encoder and listen to the result.
But the encoder needs some time to adapt its models (bitrate and psychoacoustic), and of course will not be able to properly adapt at the very beginning of the sample. If the sample had not been extracted from the full track, the encoder would have had some time to adapt its models. It means that encoding a short sample is not totally representative of how this portion would be encoded in a "real" encode.

That is why I am proposing the following:
When encoding a short sample, allow a 1 second margin at the beginning and at the end of the sample so the encoder can adapt its models. This should not be 1s of silence, but a real 1s of content.
For ease of use, this could even be taken into consideration by the testing tools.

For video, the vqeg already has a similar recommendation: 1s at the beginning and 1s at the end should not be considered for tests, in order to let encoders stabilize themselves.

And:
QUOTE (Gabriel @ Nov 22 2004, 16:33)
Well, I suggested 1s because we have to find a reasonable value.
I do not know about WMA encoders, but even 1s is not optimal for Lame, as the ATH adjustment might need more than 1s to stabilise.
But 1s is still way better than nothing and does not reduce the sample that much.

Regarding the tests themselves, I think it would be very nice to have the tools automatically restrict the default playback range by 1s at both ends.
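A minimal sketch of what such a tool-side restriction could look like, assuming the tool works in sample frames and simply narrows its default playback range. The function name `abx_window` and the choice to refuse clips shorter than the two margins are illustrative assumptions, not features of foo_abx or any other existing tool:

```python
from typing import Tuple


def abx_window(total_frames: int, sample_rate: int, margin_s: float = 1.0) -> Tuple[int, int]:
    """Return (start, end) frame indices of the default ABX playback range.

    The first and last `margin_s` seconds are still encoded (so the encoder can
    adapt its bitrate and psychoacoustic models) but are excluded from listening.
    """
    margin = round(margin_s * sample_rate)
    start, end = margin, total_frames - margin
    if end <= start:
        raise ValueError("clip too short to keep any audio after removing both margins")
    return start, end
```

For a 5-second clip at 44.1 kHz, `abx_window(5 * 44100, 44100)` returns `(44100, 176400)`. A 2-second clip leaves nothing once both 1s margins are removed, so the sketch raises instead of silently playing an empty range.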
sauvage78
It may be true in theory, but my experience shows that it is not true in practice, at least for Vorbis. For a very simple reason ... before I had my "artefact only" 2 sec samples ... I had to test 10 to 30 sec samples to actually find the artefact ... and it never happened that what I heard differed between the 1-2 sec sample & the 10-30 sec sample ... This is true for Vorbis at -q2, which is the codec/setting I used to find my artefacts/samples ... I don't know about other codecs. I doubt it, especially as the argument comes from Gabriel & Lame MP3 competes very well; if it was a real problem, Lame wouldn't compete so well.

Edit: When I have more time I will re-encode both the long & short samples at Lame V7, then decode both to WAV & cut the long WAV to match the short sample. If what you say affects audio quality, I should be able to ABX a difference. I think I won't find any, but I prefer to be 100% sure.
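The comparison planned above needs the long decode trimmed to exactly the span of the short clip. A minimal sketch using Python's standard `wave` module, assuming both encodes have already been decoded to WAV and that the start offset of the short clip inside the long track is known. It also assumes any leading encoder/decoder delay has already been removed (an MP3 decode can carry extra leading samples unless the decoder trims them). `cut_wav` is a hypothetical helper name:

```python
import wave


def cut_wav(src_path: str, dst_path: str, start_s: float, length_s: float) -> int:
    """Copy `length_s` seconds starting at `start_s` from one WAV file into another.

    Returns the number of sample frames written. Sample format, channel count and
    rate are copied unchanged, so the result can be ABXed against the short decode.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(round(start_s * rate))             # seek to the first frame of the clip
        data = src.readframes(round(length_s * rate))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)                         # header frame count is patched after writing
        dst.writeframes(data)
    return len(data) // (params.nchannels * params.sampwidth)
```

For example, `cut_wav("long_V7.wav", "long_V7_cut.wav", start_s=12.0, length_s=2.0)` (file names and offset hypothetical) would produce a file that can be loaded into foo_abx next to the decode of the short encode.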
IgorC
Making statements that codec A is better than B based on ABX results is nonsense. ABC/HR is required.
sauvage78
Well, it is an A/B vs. C/D test using foobar2000's ABX component, so the reference was hidden: I didn't know which file was my reference. I conducted the test in 2 parts: first I determined which of A & B I considered to be the lossless file, then I determined which of C & D was closer to it. It is not an X-selected-reference vs. random A/B test. What it really lacks IMHO is statistical validity & a larger sample database ... with time I can fix the first problem, but I cannot force others to test ... any quality claims are to be taken carefully. But if each time someone tests codecs everyone jumps in & says it is not valid for reason XYZ ... it is not surprising that tests like this don't show up more often. Not only is it boring as hell, but the whole world suddenly disagrees with you. The only thing you can do is to test as transparently and as scientifically as possible & then give your opinion so that others can disagree openly. I have nothing against criticism. All the files are here & my ABX logs too, so the test can be re-run forever by others until it is proven scientifically valid. There is a small part of truth in this test. Readers should just be aware that it is not THE truth. If such a test didn't give a hint/a clue/an orientation, it wouldn't even be worth testing audio for yourself.

I ran the test for myself, and I am very confident of the result for myself. But in the same way that I don't blindly trust tests made by others, I don't expect others to blindly trust me. It is not a problem for me if you disagree; it's a problem for me if I made a mistake ... like not using the optimal settings for iTunes AAC. (But my test is valid within the settings I used, which are iTunes for Windows' default import settings; I will edit the table to make it clearer for Mac OS users.)

For me, it is nonsense to make quality claims within the same level of flaw. I cannot honestly tell which is better between two orange/medium or two yellow/light artefacts. But if you tell me that I cannot say that a result I marked red sounds worse than one I marked green, that is so evidently false that I can only disagree. I didn't rate the samples from 1 to 5 because I consider that scale too wide to be honest, so I rated them 1/2/3. Because my scale is smaller, the difference between ratings is larger. Trust me, red is awful & yellow shouldn't be ABXable for anyone without ABXing experience.

I am pretty confident that I can make some quality claims because I am very confident that I was able to find the right anchor in the first place. I agree this is very unscientific & personal ... but I cannot disagree with myself, I am not yet schizophrenic ;) It's your job to disagree!
rpp3po
@Alex B: Calling all lossy codecs oblivious was maybe too general. MP3 does in fact have frame interdependencies, so frame 3 could depend on frame 2's content. I doubt that more than a quarter of a second would make a difference, but I don't know the actual implementation. Anything concerning rate control is rather messy compared to AAC anyway, in my opinion. AAC doesn't have frame interdependencies, so you wouldn't need leading silence for this kind of test. I don't know how this is handled in Vorbis.
menno
QUOTE (rpp3po @ Mar 20 2009, 13:12)
@Alex B: Calling all lossy codecs oblivious was maybe too general. MP3 does in fact have frame interdependencies, so frame 3 could depend on frame 2's content. I doubt that more than a quarter of a second would make a difference, but I don't know the actual implementation. Anything concerning rate control is rather messy compared to AAC anyway, in my opinion. AAC doesn't have frame interdependencies, so you wouldn't need leading silence for this kind of test. I don't know how this is handled in Vorbis.


You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none), but nothing is stopping an encoder from keeping track of, and using, a lot of past (or even future) data, as long as the bitstream conforms.
rpp3po
QUOTE (menno @ Mar 20 2009, 22:22)
You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none)...


You must know. Just out of curiosity, which would that be?
menno
QUOTE (rpp3po @ Mar 20 2009, 13:28)
QUOTE (menno @ Mar 20 2009, 22:22)
You're mistaken here, an AAC decoder has very minimal inter frame dependencies (but not none)...


You must know. Just out of curiosity, which would that be?


For LC it is only the overlap-add in the filterbank, but this has no influence as long as the frames are presented to the decoder in the same order as the encoder output them. Then there is inter-frame prediction for the MAIN profile (who uses that?). And for SBR there is a header with some configuration data emitted only once every so many frames, as well as a lot of influence on parameters from previous frames.
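The overlap-add dependency menno describes can be seen in a textbook MDCT. This is an illustrative sketch, not Nero's or any AAC decoder's code: a plain O(N²) MDCT with a sine window (one of the window shapes AAC allows, alongside KBD), 50% overlap and a tiny N chosen for readability. Each frame on its own carries time-domain aliasing that its neighbour cancels, so every hop of output is the sum of two adjacent frames, and reconstruction is exact only when the frames arrive complete and in order.

```python
import math
from typing import List


def mdct(frame: List[float]) -> List[float]:
    """Forward MDCT: 2N windowed time samples -> N coefficients (direct sum)."""
    N = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs: List[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that sine-windowed overlap-add reconstructs the input exactly.
    """
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def sine_window(N: int) -> List[float]:
    # Symmetric, and satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 == 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def encode(x: List[float], N: int) -> List[List[float]]:
    """Cut x into 2N-sample frames with hop N, window them and MDCT each one."""
    w = sine_window(N)
    pad_end = N + (-len(x)) % N   # zero-pad so every input sample is covered by two frames
    padded = [0.0] * N + list(x) + [0.0] * pad_end
    return [mdct([wi * s for wi, s in zip(w, padded[start:start + 2 * N])])
            for start in range(0, len(padded) - N, N)]


def decode(frames: List[List[float]], N: int, length: int) -> List[float]:
    """IMDCT each frame, window it again and overlap-add; aliasing cancels between neighbours."""
    w = sine_window(N)
    out = [0.0] * ((len(frames) + 1) * N)
    for i, coeffs in enumerate(frames):
        for n, s in enumerate(imdct(coeffs)):
            out[i * N + n] += w[n] * s
    return out[N:N + length]


if __name__ == "__main__":
    N = 8
    x = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(40)]
    y = decode(encode(x, N), N, len(x))
    print(max(abs(a - b) for a, b in zip(x, y)))  # floating-point rounding error only
```

Zeroing a single frame (`frames[2] = [0.0] * N`) corrupts only the two hops that frame overlaps, i.e. output samples N to 3N - 1; everything else still reconstructs exactly.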
rpp3po
I am sad to report that QuickTime AAC miserably failed my ABX test on the Kraftwerk sample in all versions up to 274 kbit/s (256 kbit/s constrained VBR / iTunes Plus), even the ones with corrected bitrate:

CODE
foo_abx 1.3.3 report
foobar2000 v0.9.6.2
2009/03/20 23:29:36

File A: Y:\Downloads\01- DCT Killer Samples (Lossless)\01- Artefact+Context\QT7.6_VBR_Target-Bitrate\04- Kraftwerk (Artefact+Context) QT7.6_256kbs_VBR_constrained.m4a
File B: Y:\Downloads\01- DCT Killer Samples (Lossless)-1\01- Artefact+Context\04- Kraftwerk (Artefact+Context) Lossless.flac

23:29:36 : Test started.
23:30:33 : 01/01  50.0%
23:30:41 : 02/02  25.0%
23:30:59 : 03/03  12.5%
23:31:12 : 03/04  31.3%
23:31:30 : 04/05  18.8%
23:31:38 : 05/06  10.9%
23:31:58 : 06/07  6.3%
23:32:13 : 07/08  3.5%
23:32:21 : 08/09  2.0%
23:32:39 : 09/10  1.1%
23:32:51 : 10/11  0.6%
23:33:08 : 11/12  0.3%
23:33:20 : 12/13  0.2%
23:33:29 : 13/14  0.1%
23:33:31 : Test finished.

----------
Total: 13/14 (0.1%)


It's instantly noticeable, just try it yourself. The synth sound in the middle part (from 00:01) is completely muffled.

And I can also completely reproduce sauvage78's finding that Nero is already transparent at q0.40:

CODE
foo_abx 1.3.3 report
foobar2000 v0.9.6.2
2009/03/20 23:57:41

File A: Y:\Downloads\03- Nero AAC 1.3.3.0\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Lossless.flac
File B: Y:\Downloads\03- Nero AAC 1.3.3.0\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Nero AAC 1.3.3.0 Q0.40.mp4

23:57:41 : Test started.
23:58:58 : 00/01  100.0%
23:59:23 : 01/02  75.0%
23:59:47 : 01/03  87.5%
00:00:00 : 01/04  93.8%
00:00:14 : 01/05  96.9%
00:00:28 : 01/06  98.4%
00:00:40 : 02/07  93.8%
00:00:51 : 03/08  85.5%
00:01:01 : 03/09  91.0%
00:01:12 : 04/10  82.8%
00:01:20 : 04/11  88.7%
00:01:27 : 05/12  80.6%
00:01:35 : 05/13  86.7%
00:01:39 : Test finished.

----------
Total: 5/13 (86.7%)
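The percentages foo_abx prints next to each running score match a one-sided binomial test: the probability of getting at least that many trials right by pure guessing (p = 0.5 per trial). For example, 13/14 gives 15/16384, about 0.09%, shown as 0.1%, while the Nero result of 5/13 gives about 86.7%, i.e. no evidence the two files could be told apart. A sketch that reproduces the logged values (a re-derivation, not foo_abx's source):

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for score in [(13, 14), (5, 13), (20, 21), (24, 26)]:
    print(f"{score[0]}/{score[1]}: {abx_p_value(*score):.1%}")
```

A common convention is to require a value below 5% (some prefer 1%) before calling a result a positive ABX. Because the log shows the running value after every trial, stopping as soon as it dips below the threshold makes false positives more likely than the final number suggests, so fixing the number of trials in advance is the safer habit.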
C.R.Helmrich
My two cents on the behavior of a codec during the first 1 or 2 seconds of audio, in short form:

At the beginning of an encoding, the bit reservoir is full (since previously, there was no audio)
=> The bit reservoir can be "drained" more aggressively than in the middle of an encoding
=> more than the targeted average bits per time can be spent
=> the first few frames (or tenths of a second) most likely sound better than later audio parts

Of course, this only applies to CBR coding. A VBR codec doesn't have to enforce a very strict bit rate per second (or per x frames), so there is no, or a very lenient, bit reservoir.
=> For VBR, quality should be the same for the first few frames and later frames.
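The chain of reasoning above can be sketched as a toy CBR bit-reservoir model. This is not LAME's or any real encoder's reservoir logic: it simply follows the post's assumption that the reservoir starts full, and all the numbers (bits per frame, reservoir size, demand) are arbitrary illustrations.

```python
from typing import List


def simulate_cbr_reservoir(demand: List[int], mean_bits: int, max_reservoir: int,
                           start_full: bool = True) -> List[int]:
    """Toy CBR model returning the bits actually spent per frame.

    Each frame is credited `mean_bits`; it may also borrow whatever is stored in
    the reservoir, and unspent bits are saved back, up to `max_reservoir`.
    """
    reservoir = max_reservoir if start_full else 0
    spent = []
    for want in demand:
        available = mean_bits + reservoir
        used = min(want, available)
        reservoir = min(max_reservoir, available - used)
        spent.append(used)
    return spent


# A constantly difficult passage that would like 160 bits per frame at a 100-bit average.
print(simulate_cbr_reservoir([160] * 8, mean_bits=100, max_reservoir=300))
```

With a full reservoir the first five frames get the 160 bits they ask for and only then settle at the 100-bit average; starting from an empty reservoir, every frame gets 100. A VBR encoder without a strict per-frame budget would not show this start-up bonus.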

Still, I recommend using samples longer than 1 or 2 seconds because, as said, an encoder might need a few frames to adjust to the input, and because our hearing also needs some time to get accustomed to the stimulus (especially if it's something noisy and transient like sauvage78's test set and there is a distinct click/pop when looping a test item).

sauvage78, which items did you use for ABXing? The artefact+context, or the artefact-only? The former ones are long enough, the latter ones not, in my opinion.
/mnt
Wow, I didn't know that Ogg aoTuV can have some really serious pre-echo problems.

CODE
foo_abx 1.3.3 report
foobar2000 v0.9.6.3
2009/03/21 00:36:40

File A: C:\Downloads\02__aoTuV_Beta5.7\02- aoTuV Beta5.7\04- Kraftwerk\04- Kraftwerk (Artefact Only) aoTuV Beta5.7 256Kbps.ogg
File B: C:\Downloads\02__aoTuV_Beta5.7\02- aoTuV Beta5.7\04- Kraftwerk\04- Kraftwerk (Artefact Only) Lossless.flac

00:36:40 : Test started.
00:36:56 : 01/01 50.0%
00:37:00 : 02/02 25.0%
00:37:18 : 02/03 50.0%
00:37:21 : 03/04 31.3%
00:37:25 : 04/05 18.8%
00:37:30 : 05/06 10.9%
00:37:35 : 06/07 6.3%
00:37:39 : 07/08 3.5%
00:37:42 : 08/09 2.0%
00:37:47 : 09/10 1.1%
00:37:50 : 10/11 0.6%
00:37:55 : 11/12 0.3%
00:37:59 : 12/13 0.2%
00:38:10 : 13/14 0.1%
00:38:13 : 14/15 0.0%
00:38:18 : 15/16 0.0%
00:38:22 : 16/17 0.0%
00:38:27 : 17/18 0.0%
00:38:30 : 18/19 0.0%
00:38:33 : 19/20 0.0%
00:38:38 : 20/21 0.0%
00:38:44 : Test finished.

----------
Total: 20/21 (0.0%)


Pre-echo all the way through the synth, causing smearing and making it sound muffled. Pretty bad for a 358 kbps file.

I can also confirm that iTunes AAC has the same problem as well.

CODE
foo_abx 1.3.3 report
foobar2000 v0.9.6.3
2009/03/21 00:30:22

File A: C:\Downloads\04__iTunes_AAC_8.1.0.52\04- iTunes AAC 8.1.0.52\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) iTunes 8.1.0.52, QuickTime 7.6 256Kbps VBR.m4a
File B: C:\Downloads\04__iTunes_AAC_8.1.0.52\04- iTunes AAC 8.1.0.52\04- Kraftwerk\04- Kraftwerk (Artefact Only) (Duplicated Right Channel) Lossless.flac

00:30:22 : Test started.
00:30:30 : 01/01 50.0%
00:30:34 : 02/02 25.0%
00:30:42 : 02/03 50.0%
00:31:05 : 03/04 31.3%
00:31:09 : 04/05 18.8%
00:31:12 : 04/06 34.4%
00:31:15 : 05/07 22.7%
00:31:19 : 06/08 14.5%
00:31:25 : 07/09 9.0%
00:31:32 : 08/10 5.5%
00:31:36 : 09/11 3.3%
00:31:39 : 10/12 1.9%
00:31:44 : 11/13 1.1%
00:31:50 : 12/14 0.6%
00:31:55 : 13/15 0.4%
00:31:59 : 14/16 0.2%
00:32:03 : 15/17 0.1%
00:32:07 : 16/18 0.1%
00:32:12 : 17/19 0.0%
00:32:15 : 18/20 0.0%
00:32:20 : 19/21 0.0%
00:32:26 : 20/22 0.0%
00:32:29 : 21/23 0.0%
00:32:40 : 22/24 0.0%
00:32:46 : 23/25 0.0%
00:32:52 : 24/26 0.0%
00:32:58 : Test finished.

----------
Total: 24/26 (0.0%)


Same problem.
sauvage78
rpp3po:
I have re-installed iTunes in order to see if I did anything wrong; here is the encoder/setting that I used:
Preferences/Import Settings/Custom then I only changed the bitrate.



I tried to export using the free QuickTime Player bundled with the Windows iTunes version; here is what I got: buy the Pro version ;(

So I think true VBR & constrained VBR are only available to QuickTime Pro users. It may be free for Mac OS users, as it may come bundled with their OS, I don't know (I have never touched an Apple in my whole life). From what I gathered, there are 4 encoding modes for QuickTime AAC:

0 - Constant Bit Rate (CBR)
1 - Average Bit Rate (ABR)
2 - Variable Bit Rate Constrained (VBR Constrained)
3 - Variable Bit Rate (VBR)

Obviously, if what you call true VBR is the 4th mode, my files are neither true VBR nor VBR Constrained; I think they should be either CBR or ABR. Can someone (rpp3po maybe) using a Mac & having access to QuickTime Player Pro encode the lossless version to CBR & ABR & tell me if the bitrate matches mine? As there was a radio button for VBR on Windows (see the screenshot), I called my files VBR, but I think they must be ABR.

I need a confirmation so that I know exactly what I ABXed & can edit the table accordingly. Thanks a lot to anyone who solves this mystery ;)

C.R.Helmrich:
I used the short versions, as marked in the ABX logs. But if this is a real problem I will find it; give me some time. I need to know, as I intend to find even more short samples with artefacts to make the ABXing time shorter.
rpp3po
I have already cleared this up here. In iTunes you have the choice between VBR constrained and ABR. Maybe the bitrate confusion comes from the fact that your listed bitrates are for the artifact-only encodes and mine are for artifact+context? A too-short sample shouldn't really worsen a codec's ability to prevent artifacts, but you can't really compare bitrates for such ultra-short clips.
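One reason bitrates of ultra-short clips are hard to compare can be made concrete with a little arithmetic. The sketch assumes, hypothetically, a fixed 4,000 bytes of container/header data that does not scale with duration; the real amount depends on the format and muxer. It only models that overhead and says nothing about how an encoder's VBR decisions change on a clip made entirely of difficult audio, which is another plausible reason for the mismatch.

```python
def reported_kbps(audio_kbps: float, duration_s: float, fixed_overhead_bytes: int) -> float:
    """Average bitrate of a file = (audio payload + fixed overhead) / duration."""
    audio_bits = audio_kbps * 1000 * duration_s
    return (audio_bits + fixed_overhead_bytes * 8) / duration_s / 1000


print(reported_kbps(256, 2.0, 4000))   # 2-second clip
print(reported_kbps(256, 30.0, 4000))  # 30-second clip
```

The same hypothetical 4,000 bytes inflate a 256 kbit/s stream to 272 kbit/s over 2 seconds but only to about 257 kbit/s over 30 seconds.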

If you want to try the QT Pro version just for this test and not for anything else, you may google for pablo/nop. ;)
sauvage78
Thanks a lot, I get what you meant with your googling thing ;) but it is not my intention to test codecs which are not free for us mortal Windows users. I will re-focus on aoTuV vs. Nero AAC because this is where my interest really lies. I will change VBR to VBR constrained in the table.
DigitalDictator
What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.
knucklehead
QUOTE (DigitalDictator @ Mar 21 2009, 18:20)
What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.


You need to read all the posts.

If you have a Mac, you can actually get all the options in a very convenient way for free.

Some folks seem to have real problems with the idea of using a Mac.

Some of those ideas might be rational,

some perhaps a bit less than rational....

ktf
I was playing around with Vorbis today, and I found another sample which you might find interesting. It is lossless, directly rendered from the trial version of FL Studio. The problem is, when coded with Vorbis (aoTuV, latest beta) at quality 2, you can hear distortion in the foreground synth.

Edit: typo
sauvage78
I tried to catch something but failed ... can you tell me more exactly when & what to listen for? I focused on the synth at the beginning, middle & end, and found nothing.

Don't worry, this happens often ;) This morning, after discovering the eig sample from Mo0zOoH, I tried to ABX his two other samples:

Autechre — [Gantz Graf EP #01] Gantz Graf [3:58];
Nine Inch Nails — [Quake OST #01] Quake Theme [5:08].

from this thread:
Mo0zOoH's problem samples

I cannot ABX the first one at all & I can ABX the second one but only at lame V7 so it wasn't worth it.

That's why it's hard to put such a test together, first you must waste time listening to things that others can hear while you cannot... It's boring & frustrating.
Thanks anyway.

Edit:
In the future, I will split the table in two, as I will definitely add new samples to this test (Ministry & Abfahrt Hinwil are planned; I already know that Lame MP3 will have 2 medium artefacts at V2 on these). I do not plan to test the new samples on iTunes & Musepack: for various reasons (not only audio quality) I don't think these codecs are worth my time. I think the same of Lame MP3, but Lame MP3 is actually a good anchor & is useful to identify the artefacts, so I decided to keep it even though I will never use it personally. I will focus on Vorbis/Nero/Lame & later Lossy|Flac. Also, I want people to be able to test for themselves, so I want them to be able to download the samples, but there is an upload limit that I am already close to reaching. So when I add new samples I will reorganise my files to remove duplicate lossless files from the archive & gain some space. I want this thread to be heavily oriented toward Vorbis, so that maybe one day its flaws get fixed.
DigitalDictator
QUOTE (knucklehead @ Mar 22 2009, 04:12)
QUOTE (DigitalDictator @ Mar 21 2009, 18:20)
What are you talking about? They are all free. Maybe not Quicktime Pro, but iTunes is, and so are the rest.


You need to read all the posts.

Sure I read all the posts. I just think he confuses codecs with applications and platforms. It's still the same codec AFAIK. Not important anyway.