Topic: Multi-Codec Listening Test: 96-128-192-256Kbps (Read 61341 times)

Multi-Codec Listening Test: 96-128-192-256Kbps

I searched the forum & various sample databases for the worst problem samples I could find for Vorbis, & I edited the ones I was able to ABX with Audacity, usually cutting them from 10-30 sec down to 1-2 sec. When the problem was only on the left or right channel, I even duplicated that channel to both sides of the stereo pair.
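
As a rough illustration of the edit described above (trim to the artefact, then copy the affected channel to both sides), here is a minimal Python sketch. It is not the Audacity workflow itself: the function name, the list-of-pairs sample layout and the toy numbers are all assumptions made for the example.

```python
def focus_on_artifact(frames, sample_rate, start_s, end_s, keep="left"):
    """Cut a stereo clip to [start_s, end_s) and copy one channel to both sides.

    frames is a list of (left, right) sample pairs. All names here are
    hypothetical and only illustrate the manual edit described in the post.
    """
    if keep not in ("left", "right"):
        raise ValueError("keep must be 'left' or 'right'")
    first = int(start_s * sample_rate)
    last = int(end_s * sample_rate)
    channel = 0 if keep == "left" else 1
    # Keep only the channel that carries the artefact and duplicate it.
    return [(frame[channel], frame[channel]) for frame in frames[first:last]]


# Toy input: 10 frames at a made-up rate of 2 frames per second.
clip = [(i, -i) for i in range(10)]
print(focus_on_artifact(clip, sample_rate=2, start_s=1, end_s=3, keep="right"))
```

Real audio would be read from and written back to a WAV file; that part is left out to keep the sketch short.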

I ended up with 5 very short, truly killer samples, which I ABXed at various quality levels to see whether higher bitrates were transparent or not. Here is the result:





All these samples come from real CDs, but they are so focused on the artefact that they may give newbies a bad idea of Vorbis,
so let it be clear: aoTuV Beta5.7 is a very good lossy codec. I tested aoTuV because it is IMHO the best overall codec around. (Edit: After adding other codecs, sadly I don't think so anymore)
Using the best DCT codec around was simply a way for me to find the worst killer samples around.
This is more a ritual killing of Vorbis, gathering all my (small) knowledge against it, than a normal ABX test. I hit exactly where it hurts, so it is normal that it is painful for Vorbis.

So if I can ABX aoTuV at up to -q8, it means that other lossy codecs will probably fail even worse (especially MP3) (Edit: not true). It doesn't mean that Vorbis sounds bad at all on average music.
It only means that no lossy codec is transparent on every sample; that simply doesn't exist, no matter the bitrate.

But, in some way, this test is made to prove that people using lossy at overkill bitrates are not protected from problem samples.
I don't even have golden ears, I can't hear above 18 kHz...

You can run the test for yourself: everything is included in the attached archive, every sample at every bitrate. Everything is already encoded & ReplayGained; all you have to do is listen & compare your results against mine. My logs are included (when successful). You can test your hearing very quickly because the samples are very short. I prepared everything to make your life easier.

I didn't discover these samples; none of them is new. But I renamed some of them for my own use (I named each sample after its artist) & also because they have been shortened so much that they are sometimes unrecognizable.
Anyone collecting problem samples should have a look ... these 5 samples are really worth it. I spent a couple of hours just editing them to focus on the pure artefact.
You don't have to point other people to the artefact inside the sample, it IS the artefact.
In fact my real goal with this listening test wasn't to test Vorbis, but to start a small collection of killer samples for any DCT codec in order to know where they fail.
So I absolutely don't care if Vorbis sounds good or bad ... I use 100% lossless at the moment.

Indeed, this test is dedicated to Monty & Aoyumi. Thanks for Vorbis, I hope it will help you improve the codec.

PS:
If you know the origin (Artist-Album ...) of the Castanets sample, I am interested in this information.
If you know very good killer samples, I am interested. Please describe & post them so that I can add them to my test next time.
If ever Gabriel or Roberto reads this, please fix http://lame.sourceforge.net/quality.php (I get a 403 error when I try to download samples)

Edit1: Inclusion of Average Bitrate in the table for later comparison to other codecs.
Edit2: Added Nero AAC
Edit3: Added Musepack
Edit4: Added Lame MP3
Edit5: Added iTunes AAC
Edit6: Added Abfahrt Hinwil Sample
Edit7: Errata N°1
Edit8: Added aoTuV exp-bs1, Errata N°2
Edit9: Added Celt

Reply #1
... It only means that no lossy codec is transparent on every sample; that simply doesn't exist, no matter the bitrate. ...

I see you're using lossyWAV at -P quality. At a bitrate like that or higher, and with a well-designed codec that keeps the straightforward signal path of the PCM data, I think we can believe that any deviation from the original is very subtle, if audible at all.
It's quite astonishing, however, that you are able to ABX Vorbis at 256 kbps with an annoying result.
lame3995o -Q1.7 --lowpass 17

Reply #2
I am bumping my own thread just to let people know that I added Nero AAC to the comparison.

As you can see, Nero AAC shines, but that doesn't mean Vorbis is bad: I selected problem samples specific to Vorbis & then tested them on Nero AAC, so it is unfair to Vorbis.
I admit I don't like MPEG ... but that said, I was impressed by Nero Q0.55 ...

Reply #3
Thank you for your efforts. Listening tests take time and dedication! I use aoTuV in my Vorbis streaming component for foobar2000, so this affects me as well.

Have you sent aoyumi a message linking to your test? He would probably be interested.

Finally, would you consider testing -q10 as well? I would like to know if these artifacts cover the whole breadth of Vorbis.

Reply #4
Nice work! Thank you!

Still, in my opinion it kind of damages Vorbis to see it failing this way. I know these are specifically targeted problem samples and every lossy codec probably has some skeletons in the closet, but some of them are already quite old (like Castanets) and still don't work. In contrast, the Nero devs, for example, were able to straighten out most critical samples thrown at them here at HA over the years.

I really would have liked to see Vorbis succeed. Even when it produced artifacts at lower bitrates, I always found them more pleasing / less annoying than those of other codecs. I don't know why. Despite being an excellent codec, Vorbis in my opinion missed the wave of momentum it would have needed a couple of years ago and was overtaken by LAME MP3 and AAC. A phone/player manufacturer can probably satisfy far more than 90% of its user base just by supporting those two formats. Vorbis' Tremor showing serious quality problems on some hardware players probably didn't help either. Maybe that all reverses if Vorbis gets included in the next W3C spec, but I don't know about the current state.

Reply #5
Yes, a very impressive test, sauvage78.

Any chance of adding Lame 3.98 to the comparison?

Reply #6
Thanks for your effort. Excellent visualisation of results!

If you have time and are interested, I think the latest iTunes / QuickTime AAC would be a good addition to the comparison. It'd be interesting to see how well it performs against Nero AAC now, because in previous tests iTunes was slightly better than Nero.

Reply #7
Thanks for conducting this test, sauvage78! I hope your hard work will help contribute to the advancement of both codecs.


Quote
I searched the forum & various sample databases for the worst problem samples I could find for Vorbis, & I edited the ones I was able to ABX with Audacity, usually cutting them from 10-30 sec down to 1-2 sec. When the problem was only on the left or right channel, I even duplicated that channel to both sides of the stereo pair.


It would be very interesting to see how Vorbis performs on samples known to be problematic for the Nero AAC encoder.

Reply #8
I just added Musepack.

I consider this thread a work in progress, so I may add other codecs & other killer samples later as my mood changes or my knowledge increases.

In particular, I plan to test:
1: Official Vorbis (all range) & aoTuV Q10
2: Lossy|Flac at --portable & a lower setting near 256Kbps (maybe --zero), but not higher (Add Ginnungagap sample)
3: Lame MP3 (not a priority, but useful as a reference)

... actually I don't plan to test iTunes, for the simple reason that I don't want to install it. Even if it were good, I want a CLI encoder, so iTunes is useless for my personal needs.
If my system ever gets broken, I may test it before I re-install ... but that will be much later (maybe when Windows 7 comes out).
No matter the iTunes audio quality, I consider Nero AAC better for personal use, because Nero is both CLI & already transparent at ~192Kbps. Testing iTunes would only be useful to get an idea of the quality provided by the Music Store, which I don't use. Even if iTunes beat Nero, it would be a marginal quality gain (& maybe only at 128Kbps & below) for the huge pain of using iTunes.

I don't promise anything as I am short of time. I do it for myself; I only publish it because the job is done honestly, to the best of my knowledge, so it can just as well be public. I also do it in case I have done something wrong & to help developers. But I must say I don't care much about the opinion of others. I know what I hear: I spot the artefact before I begin ABXing. I don't ABX randomly at all, that's why my score is almost always 100% success or 100% failure. It's never "I think that I hear something", except in the very rare cases noted as yellow. I am 100% sure that I can redo the test & get the exact same result, except for yellow results.

Currently I consider the Kraftwerk & Rush samples (especially the Rush sample) as Vorbis bugs that could easily be fixed IMHO.
So far I have never heard the Rush bug outside of Vorbis. (I hear a toad in the background ...). It happens that I really love that Rush song, so it is important that this particular bug gets fixed.

Concerning Musepack, its huge problem is not its quality at high bitrate but its lack of flexibility. It doesn't matter if you can now put it in MKV; guys from Doom9 would never use a codec which performs so badly at mid/low bitrate. Even at 128Kbps Musepack doesn't compete with Vorbis/AAC (even if the table doesn't show it, as both look almost tied; aoTuV q4 beats Musepack radio because the aoTuV artefact is always softer) ... Musepack is just an improvement compared to Lame ~192Kbps IMHO (Edit: After testing Lame MP3, not even true ...). Musepack seems to beat aoTuV in the table, but this is because the samples are heavily targeted at Vorbis' flaws. I am confident that once the Vorbis bugs get fixed, Vorbis -q6 will beat Musepack extreme ... or at least be tied at high bitrate & beat it at mid/low bitrates.

Also, the 10/12 on Nero Castanets Q0.35 is due to boredom & lack of focus. I am 100% sure that I can get an 8/8 there, as I get 8/8 at Q0.40, which doesn't seem logical. But I don't care about re-trying as I am sure of myself here.

Canar:
for a confirmation at Q10 see the original thread of the Rush sample problem:
http://www.hydrogenaudio.org/forums/index....showtopic=44862
I am very confident that I can get the same score.

Edit1: Some thoughts about my methodology.
I chose 8 trials because I wanted a number that would not take too long to ABX & that would protect me from a row of lucky guesses. From my experience it easily happens that you guess right up to 4 times in a row; starting at 5 or 6 successes in a row it becomes unlikely that you were guessing. At 7 or 8 successes in a row I consider the result valid. I hesitated between 8 & 12 trials, because at 12/12 you don't have even the shadow of a doubt. In the end, I decided to split the difference. First I do 8 trials & see how confident I am in the validity of the result. If I get 8/8, I consider the result valid, especially if I can identify what I am listening to. If I get 5/8 or less, I consider the test a failure, especially if I cannot identify what I am listening to. If I get 6 or 7 out of 8, I go up to 12 trials. Then, if I get at least 10 successes out of 12 trials, I consider the test valid, especially if I know what I am listening to. If I get 9 successes out of 12 trials, I consider the test invalid, especially if I don't know what I am listening to.
It never happened that I had a result of 9/12 while knowing what I was listening to; I chose & edited my samples specifically so that this would never happen.
Surprisingly, what did happen is that I was able to get an 8/8 result while not knowing what I was listening to. Such cases were very rare (1 or 2) & marked as yellow (but not every yellow result is such a case). I consider them valid, as I think there is a very slight overall modification of the audio without it being a flaw that you can point out. For me it doesn't necessarily mean that I was guessing; it means that I was very, very close to the transparency point. Also, it sometimes happens that it sounds different but not bad, just slightly different. I may re-test yellow results later, but yellow means the quality is good anyway, so it may not be worth the headache.
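
For readers who want to apply the same rule to their own logs, the two-stage 8-then-12 procedure can be sketched in a few lines of Python, together with the chance of reaching a given score by pure guessing. This is only an illustration: the function names are made up, the case of fewer than 9 out of 12 is assumed to count as invalid (the post only mentions 9/12), and the guessing probability treats each stage as a fixed-length test, so it does not account for the optional extension from 8 to 12 trials.

```python
from math import comb


def guess_probability(correct, trials):
    """Chance of at least `correct` right answers in `trials` by coin-flip guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def verdict(first8, total12=None):
    """Apply the 8-then-12 rule from the post.

    first8  : successes in the first 8 trials
    total12 : successes out of 12, once the test has been extended
    """
    if first8 == 8:
        return "valid"
    if first8 <= 5:
        return "failure"
    # 6 or 7 out of 8: the result is ambiguous, so extend to 12 trials.
    if total12 is None:
        return "extend to 12 trials"
    # Assumption: anything below 10/12 is treated like the 9/12 case.
    return "valid" if total12 >= 10 else "invalid"


print(guess_probability(4, 4))    # four lucky guesses in a row
print(guess_probability(8, 8))    # 8/8 by chance
print(guess_probability(10, 12))  # at least 10/12 by chance
print(verdict(7, total12=10))
```

By this count, four correct guesses in a row happen by chance 1 time in 16, 8/8 happens 1 time in 256, and 10 or more out of 12 happens 79 times in 4096 (a little under 2%).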

Edit2: If ever a moderator reads this, please edit the title
from Listening Test: aoTuV Beta5.7 on 5 Killer Samples, 96-128-192-256Kbps (17 ABX Log+Files)
to Multi-Codec Listening Test: 96-128-192-256Kbps, Killer Samples targetting vorbis (with logs)
Thanks

Reply #9
Quote
In particular, I plan to test:
1: Official Vorbis (all range) & aoTuV Q10
2: Lossy|Flac at --portable & a lower setting near 256Kbps (maybe --zero), but not higher (Add Ginnungagap sample)
3: Lame MP3 (not a priority, but useful as a reference)


Great, looking forward to seeing those. Especially the Lame reference, to put the test into perspective for me (as an MP3 user).

Reply #10
sauvage78:
Thank you for the test.

To anybody interested:
I can't fix these problems immediately, because I am slowly wrestling with another problem.
In the meantime, here are my views for anyone motivated to work on them.

Kraftwerk/Rush/Autechre
Please improve block switching. These problems occur at high bitrates because the switch from short to long blocks happens too early. However, simply changing the threshold is an inefficient fix.

Harlem
Here it is actually point stereo that affects the sound. In addition, there are some problems with block switching.

All the best!

Reply #11
Edit2: If ever a moderator reads this, please edit the title
from Listening Test: aoTuV Beta5.7 on 5 Killer Samples, 96-128-192-256Kbps (17 ABX Log+Files)
to Multi-Codec Listening Test: 96-128-192-256Kbps, Killer Samples targetting vorbis (with logs)
Done-da-dun-dun-DOOOOONE

Reply #12
As several people requested it, I added Lame MP3 sooner than I thought.
I must say I didn't expect Lame MP3 to compete so well.

I don't plan to add any other tests for some time now, because I am bored & I don't have the time anyway. So I will draw my own personal conclusion.
In my opinion:

at low/mid bitrate (96/128Kbps)
1: Nero
2: Vorbis/Lame (tied but only due to vorbis bugs, overall I still favor vorbis)
3: Musepack (not even in the competition)

at high bitrate (192/256Kbps)
almost all tied, except vorbis which has serious problems that are not normal at this bitrate.

I don't know if I should be happy or sad that Lame competes so well, because it means that, most likely, nobody will ever code an open source AAC codec as good as what x264 is for AVC in the video codec world. I was secretly wishing that Lame MP3 would be awful.

Overall, I didn't expect that there would be such a big difference between Vorbis & the other codecs. I knew I was hurting Vorbis, but I thought that, as transform codecs, both Nero & Lame would suffer too (particularly Lame) ... It just didn't happen. In the same way, I knew Musepack was bad at low bitrates, as it is not designed/optimized for them, but I thought that, because it is a subband codec (& also influenced by 128Kbps comparisons where it was not bad at all), it would maybe not suffer too much at mid bitrate. I noticed a variation in the nature of the artefacts (Musepack smears much more than other codecs at 96/128Kbps), but Musepack was hurt very badly by my samples too. The fact that a codec is a transform or a subband codec is not enough on its own to tell anything about its quality IMHO. As long as the limit of the technology is not reached, the implementation is much more important than the technology used. That's why Lame just doesn't want to die. In my view, the claim that Musepack is as good as Nero/Vorbis at 128Kbps is not true. I used to think that Musepack was better than Lame; I don't think so anymore. Maybe there was a time when it was true, I don't know.

I also thought that the difference between AAC & MP3 would be bigger. Even if both often seem tied in the table, AAC is always better quality-wise. When both Nero AAC & Lame MP3 are ABXable & tied, the artefact is always softer with Nero AAC.

The only reason for not using Nero (I don't) is that it is patented & closed. Quality-wise it is brilliant. Congratulations to Ivan & Co, I wish Vorbis were as good as Nero AAC.

Personally, before this test I thought that I would maybe encode some of my rips ... after this test I am back to lossless. ... ignorance is bliss.

PS: Thanks for the title, Canar.

Note: In all honesty, this test was conducted by an anti-MP3 & anti-MPC user. Concerning Lame, I think that it is more than time for a Lame AAC version. Concerning Musepack, I admit I never understood the Musepack fanaticism. Nowadays there is no rational reason to use Musepack. Anyway, I think that I wasn't unfair to any of these codecs, even if I dislike them a lot.

Reply #13
I hoped you would also test lossyWAV --portable. (Settings at ~256 kbps, which you wrote you also wanted to test, are not very useful IMO, though it would be useful to test at low -q settings, for instance -q 1.0 or -q 1.5 -V.)
Maybe, in order to keep your hard work limited, you can test just -q 1.5 -V. That would be great.
lame3995o -Q1.7 --lowpass 17

Reply #14
-V is not required for beta v1.1.3e - it is the default (and only) spreading function.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #15
Thanks for the test! Very informative and nice table!

Quote
... actually I don't plan to test iTunes, for the simple reason that I don't want to install it. Even if it were good, I want a CLI encoder, so iTunes is useless for my personal needs.

Can't someone encode the iTunes samples and send them to you? Then you don't have to install the program. I'm also interested in how iTunes compares with Nero.

Lame seems to do a pretty good job as well, right?
//From the barren lands of the Northsmen

Reply #16
halb27:
I will not test lossyWAV --zero to see if it sounds good, but to see how different its artefacts sound from the usual DCT artefacts & also to see if DCT problem samples affect it in some way. Following the same idea, I plan to test Ginnungagap to see how DCT codecs react to it. Maybe everything will be transparent due to the different technology; I can't know if I don't test. All I know is that I don't rely on others anymore to tell me what sounds good. I haven't followed lossyWAV development lately as it is too technical for me (especially as I am not a native English speaker).
Nick asked me to test his new spreading function, and I am interested in doing so. (Edit: well, it seems it's too late) But I currently use neither Vorbis nor lossy|flac on a daily basis, so I will test when I get some time & if it's not too late. The problem with lossyWAV is also that it is very hard to ABX; so far I am only able to ABX it on Ginnungagap or at very low settings, which are not supposed to be transparent anyway. I quickly tried the samples provided by the guy who could ABX --portable; I wasn't able to ABX his samples. The more transparent the codec is, the longer it takes to ABX it; that's why I tested pure lossy first. The yellow & failed ABX trials take much more time & are much more boring than orange & red results. I am not nuts & paranoid, I don't even try to ABX transparent audio randomly. Autechre & Ginnungagap produce very similar artefacts; that's why I want to compare these two samples in particular. My interest in testing lossy|flac will rise again as my HDD space decreases, but it can be months before I come back to lossyWAV, I have to study ... sorry. I still think lossyWAV is great, even if it's a little too big for my taste.

Quote DigitalDictator:
"Lame seems to do a pretty good job as well, right?"
Yes, I was surprised by its quality (especially at V7, which I expected to be in the red zone).

Well, in all honesty I don't plan to test iTunes anytime soon, even if I had the pre-encoded samples. I will never use this codec for my personal use.

Reply #17
Quote
Well, in all honesty I don't plan to test iTunes anytime soon, even if I had the pre-encoded samples. I will never use this codec for my personal use.

Too bad, there are many out here who use iTunes and who are really interested in the comparison between the iTunes codec and others. So if we ask nicely?
//From the barren lands of the Northsmen

Reply #18
I just added iTunes AAC.
I must say I was very disappointed by the results. I expected it to be much better, especially given its reputation at mid bitrates. I was hoping that maybe it would be better than Nero AAC at 128Kbps ... I was far from reality ... overall even Lame MP3 beats iTunes AAC, which was quite a big surprise to me ... also, it is affected by the Kraftwerk sample while Nero is not. Because it's the same technology as Nero AAC, I didn't expect iTunes AAC to fail on this sample.

DigitalDictator:
I didn't test it for you, but because I realized I had never tested the VBR version of the iTunes AAC codec; the last time I installed such terrible software on my machine, it was iTunes version 4.0 with CBR only ...
This is the first & last time I test iTunes AAC: not only is it software for children (70 MB, 5 firewall alerts, 4 folders in Program Files, dead keys in the registry), but it also sounds very average. Not especially bad (except Kraftwerk), not especially good ...

Anyway, it's done & I know what it is worth. It is not such a bad codec overall; after all, at low bitrates it beats Musepack & at high bitrates it beats Vorbis ... but in the context of the AAC codec battle, Nero wins by a good margin quality-wise & its CLI is so much friendlier.

When I started my test I didn't expect it to be such a triumph for Nero AAC (& to some extent for Lame MP3 too) ... to my dismay Vorbis is losing ground. Vorbis is only very good for streaming, because its artefacts are usually soft at low bitrates. For webmasters Vorbis is great, but for CD archiving it's not such a good option. At least for now ... I hope it can be fixed.

Don't even ask for any codec addition for ages. I didn't count, but it takes more than 3 hours just for one codec. Each time I need to organize the files (encode/rename), do the ABX test, do the table, edit the screenshot, re-upload the PNG to ImageShack, edit the topic ... & I am not counting the time spent finding & editing samples that I could ABX ... I'd rather add new killer samples than add new codecs now.

Just take what I give as it comes & be happy with it ... or leave. I am BORED of ABXing !!!

Edit: I added the result table to the uploads, in OpenOffice .odt format, in case anyone is willing to run the same test & publish their results; it may save them some time ...

Reply #19
The bitrates on the Kraftwerk sample on the QT files seem awfully low, and different from what I get when encoding it with QT 7.6.

Edit: I see you used a "custom" sample and I guess you must have used the custom 256 vbr setting rather than iTunes Plus too.. my mistake!

Reply #20
Yes, it is the same sample but edited (shortened & channel with the artefact duplicated to make it stereo) to focus on the specific artefact that I could hear.
You are right that the bitrate is low, but it's not a problem with my sample, it is an iTunes bug. At a similarly low bitrate (72Kbps), Musepack achieves transparency on the same sample.
You can download all the samples (both lossy & lossless); everything is in the archives.

Reply #21
Quote
Yes, it is the same sample but edited (shortened & channel with the artefact duplicated to make it stereo) to focus on the specific artefact that I could hear.
You are right that the bitrate is low, but it's not a problem with my sample, it is an iTunes bug. At a similarly low bitrate (72Kbps), Musepack achieves transparency on the same sample.
You can download all the samples (both lossy & lossless); everything is in the archives.


Yeah, I tried the "regular" Kraftwerk sample and the bitrates were significantly higher on all settings, which is why I got confused. I was also a few kbps off on the 256kbps samples, but I encoded with the default iTunes Plus setting, which uses the maximum quality setting of the encoder. I'm not sure if the Plus files would sound much different though.

Reply #22
Your QuickTime 7.6 evaluation is definitely flawed. The bitrates don't match, and some are completely off. Your ABX findings are also inexplicable.

Attached you'll find the correct encodings (QT 7.6) for target bitrate mode (constrained VBR/iTunes) and, where Quicktime 7.6 really excels, target quality mode (true VBR).

In true VBR mode Quicktime considers the following bitrates sufficient at the highest quality level:

Autechre: 222kbit/s
Castanets: 200kbit/s
Harlem: 216kbit/s
Kraftwerk: 298kbit/s
Rush: 196kbit/s

All are quite high on average, even for the highest Q setting. Using the same setting QT averages at about 185kbit/s over my whole music collection.
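
To put those figures side by side, here is a small Python sketch that averages the five true-VBR bitrates listed above and compares them with the 185 kbit/s collection average. It only does arithmetic on the numbers quoted in this post; the variable names are made up for the example.

```python
# True-VBR bitrates (kbit/s) reported above for the highest quality level.
top_quality_kbps = {
    "Autechre": 222,
    "Castanets": 200,
    "Harlem": 216,
    "Kraftwerk": 298,
    "Rush": 196,
}
collection_average_kbps = 185  # quoted average over a whole music collection

mean_kbps = sum(top_quality_kbps.values()) / len(top_quality_kbps)
all_above = all(rate > collection_average_kbps for rate in top_quality_kbps.values())

print(f"mean over the five samples: {mean_kbps:.1f} kbit/s")
print(f"ratio to collection average: {mean_kbps / collection_average_kbps:.2f}")
print(f"every sample above the collection average: {all_above}")
```

The five samples average 226.4 kbit/s, roughly 22% above the collection figure, and each of them individually sits above 185 kbit/s.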

So Quicktime correctly identifies problematic content and adjusts bitrate accordingly. That your own Kraftwerk sample shows 162 kbit/s for the 256kbit/s constrained encode seems quite off the mark.

AAC is inherently a VBR format. Forcing it into certain bitrates is really not helpful. With MP3 the latter at least increased compatibility, but that's not the case for AAC. Just let it flow at Q127 and it will give you both very small and very large files at a very reasonable total average.

Reply #23
I didn't use QuickTime; I used the iTunes interface & changed the import settings, which were 256Kbps VBR by default, to 96-128-192-256Kbps VBR ... maybe there is another AAC encoder or advanced settings in QuickTime ... I didn't have many parameters within iTunes. I don't use iTunes at all for my personal use, so maybe I simply used the wrong software ... I didn't have any True-VBR or ABR options.
I am looking into it. Unfortunately I have already got rid of iTunes. My HDD is allergic to it ... 20 dead keys after uninstallation & a Firefox plugin I never asked it to install ...

Edit: Can I do it with the QuickTime installed by iTunes? A few years ago QuickTime wasn't freeware, if I recall correctly.

Reply #24
Yes, sadly iTunes on Windows is a real pain compared to OS X.