
Nine different codecs 100-pass recompression test

Hi everyone!

I recently discovered this forum and enjoyed reading the listening tests, so I decided to run a test myself. Have you ever wondered how different codecs are affected by re-encoding / recompressing? Of course, recompressing audio is a bad idea, but sometimes it can't be avoided. To clear things up, I ran a test with the following encoders:
  • WMA Professional 10 (wmapro)
  • WMA 9.2 (wma)
  • Musepack (mpc)
  • Fraunhofer Mp3 surround encoder (mp3s)
  • LAME (mp3)
  • Quicktime AAC (qaac)
  • Nero AAC (nero)
  • Vorbis OGG (vorbis)
  • Opus (opus)

Quality settings: Low (~96 kbps) and high (~256 kbps)
Bitrate modes: CBR, ABR and VBR

I encoded the original sample with the respective encoder, decoded it back to WAV, and encoded the result again, repeating this for 100 passes. Then I listened to the results to determine which encoder held up best.
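Since the loop itself just drives external encoders, the mechanism is easiest to see with a toy model (my own illustration, not any of the encoders tested): a crude "codec" that slightly low-passes and quantizes the signal, applied 100 times, shows how the loss compounds with every pass.

```python
import math

def toy_codec_pass(samples):
    """One crude 'lossy codec' round trip: a 3-tap moving average
    (a stand-in for high-frequency loss) followed by 16-bit quantization."""
    out = []
    n = len(samples)
    for i in range(n):
        avg = (samples[max(i - 1, 0)] + samples[i] + samples[min(i + 1, n - 1)]) / 3
        out.append(round(avg * 32767) / 32767)  # snap to a 16-bit PCM level
    return out

# A 440 Hz tone at an 8 kHz sample rate stands in for the original sample.
original = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]

signal = list(original)
errors = {}
for p in range(1, 101):
    signal = toy_codec_pass(signal)
    if p in (1, 10, 100):
        # RMS distance from the original grows with the pass count.
        rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, original)) / len(original))
        errors[p] = rms
```

The exact numbers are meaningless; the point is just that the distance from the original grows monotonically with the pass count, which is why 100 passes exaggerate differences that one pass hides.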

RESULTS

AAC is the clear winner by far. It is virtually unaffected by the number of passes. With all other codecs, sound quality degraded as the number of encoding passes increased, especially at low bitrates.

At low bitrates, AAC was the only codec providing satisfactory results. All other encoders fell far behind, producing audible compression artifacts such as crackling noises, muffled sound and hissing. At high bitrates, LAME and Musepack can compete with AAC, but the other encoders still fall far behind.

It's interesting to see how much some encoders profit from an increased bitrate when recompressing many times. For AAC, the clear winner, it didn't matter. Musepack, on the other hand, placed 9th at the low bitrate setting, but at the high bitrate it was almost as good as AAC and placed 4th. LAME behaved similarly: it produced loud crackling noises at low bitrates and placed 8th, but sounded almost perfect at high bitrates and placed 3rd.

Other codecs, such as WMA, the Fraunhofer MP3s encoder, Opus and Ogg Vorbis, were largely unaffected by bitrate. For these, quality was determined mainly by the number of recompression passes.

In general, WMA and the Fraunhofer MP3s codec were the most disappointing. WMA produced loud hissing and crackling noises, while the Fraunhofer encoder sounded bland and muffled, discarding brilliance and detail. The only reason Fraunhofer placed decently is that it doesn't produce loud crackling or hissing, which to my ears is even worse than a merely muffled or dull sound. Of course, that's purely subjective.

Some encoders not only degraded sound quality, but also had other quirks. For example, the LAME encoder lowers the volume with every encoding pass; the 100th pass was virtually inaudible, and I had to normalize the audio to hear anything at all.

Other encoders produced erroneous files and garbage. The Fraunhofer encoder added silence to the beginning and end of each file and repeated parts of the sample at the end; after 100 passes, it created a 12-second file (the original file was 7 seconds). Winamp and foobar2000 even reported a length of 1:02 minutes for the Fraunhofer file, although playback ended after 12 seconds. The Vorbis encoder did something similar, resulting in a reported length of 2 seconds while playback actually ended at 7 seconds. I can't really say whether I did something fundamentally wrong or whether it's the encoders' fault, but in the end, the Fraunhofer and Vorbis encoders produced corrupted files. For the listening test, I tried to fix all such errors (added silence, corrupted files), since I wanted to judge sound quality only.
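The normalization step I mean is simple peak normalization. A minimal sketch (illustrative plain Python; `peak_normalize` is a made-up helper, not the tool I actually used):

```python
def peak_normalize(samples, target=0.99):
    """Rescale a list of float samples in [-1.0, 1.0] so the loudest
    sample reaches `target`, preserving all relative levels."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / peak
    return [s * gain for s in samples]

# A very quiet signal (as after many LAME passes) brought back up:
quiet = [0.001, -0.0005, 0.0002]
loud = peak_normalize(quiet)
```

Because this only applies a single linear gain, it makes the file audible again without changing the relative shape of the waveform, so the artifacts themselves can still be judged fairly.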

You can view the complete test on my homepage, where I have also attached the test audio samples so you can hear them in your browser. I also visualized the waveform of each sample; it's very interesting to see.

http://bernholdtech.blogspot.de/2013/03/Ni...ssion-test.html

For example, this is the original file:

[waveform image]

This is after 100 re-encodings with Nero AAC:

[waveform image]

And this is after 100 re-encodings with Ogg Vorbis:

[waveform image]

This is after 100 re-encodings with WMA (Windows Media Audio):

[waveform image]

Reply #1
Interesting.
Though I am surprised that Vorbis did so badly.

Have you tried aoTuV b6.03?
It should be more resilient than stock libvorbis.


Reply #2
I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the reference Vorbis encoder, so I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.


Reply #3
You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.


Reply #4
Wow, you uploaded all the files playable in the browser. Thank you for sharing this with us.


Reply #5
I was always wondering about this, but never had enough spare time to do it. Thank you.


Reply #6
You probably need to manually adjust the encoder's gain in LAME so that it does not change the volume when encoding.

If the audio shifted in time for Vorbis, something probably went wrong. Vorbis supports gapless playback by default, so no change in length should occur.


Thank you, I will try that. Do you think this affected the sound quality of LAME? Regarding Vorbis, the length hasn't actually changed; it's just reported wrongly in the audio players I used. It shows as 0:02 in the playlist, but when I actually play it, it's perfectly normal (7 seconds). When I decode it back to WAV, the length is also correct. So I didn't bother much; it shouldn't make a difference to sound quality anyway.


Reply #7
I read somewhere (Wikipedia, I think) that the improvements of aoTuV are periodically merged back into the reference Vorbis encoder, so I assumed it wouldn't make much of a difference. I'm not very familiar with Vorbis, though. If you say so, it may be worth testing that, too.


Yes, that is true. Vanilla Vorbis has only merged the beta 2 code, but the most recent aoTuV is beta 6, so it could be worth testing beta 6.
Thanks for an interesting test.


Reply #8
Hm, re-encoding 100 times seems like a bit of overkill, and generally not a very realistic test? (In that sense, it doesn't clear things up much.)

I would assume more practical results would come from one or two passes, starting 1. from a lossless source and 2. from a lossy high-bitrate encode (comparable to typical purchases from iTunes or Amazon), both with a low bitrate as the target (say, for portable use), and comparing the two resulting files.


Reply #9
Yes, 100 times is not practical.

But there's a reason for it. I encoded 100 times because it makes it much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes, and I assume that a codec which sounds better than another after 100 re-encodes will also sound better after 1 or 2. However, for the listening test, I only used the results after 100 re-encodes.

I also added results for 10, 25 and 50 passes; they are available in the "detailed results" section on the web page (scroll down). These results are less extreme than you might expect.


Reply #10
I ran your files through the Python script (from yesterday's waveform thread), as it also colors the waveform by spectral intensity.

Here is the result: http://db.tt/q9gXzysF


Reply #11
I encoded 100 times because it makes it much easier to see how a codec performs. You may not be able to hear any difference after 1 or 2 re-encodes, and I assume that a codec which sounds better than another after 100 re-encodes will also sound better after 1 or 2.

Careful there, your assumption looks like it might go against certain rules here...

Not being able to hear any differences after a few re-encodes is also a perfectly valid result (and a much more useful one than a somewhat artificial overkill scenario).

All that said, thank you for the effort (particularly for the "detailed results" section, 10 passes ;P ). I also always wanted to do a similar test, but never got around to it.


Reply #12
It seems as though this test would be more suited to comparing different versions of the same codec, if anything at all.

As it stands, you are comparing codecs with different encoding techniques on one particular sample that may suit a few of the codecs tested, but not others.

In that sense, this test doesn't tell us much of anything.


Reply #13
After 100 passes, rounding error probably starts to be a problem. I wonder what effect the intermediate formats used by the decoder/encoder have on quality. Software that can output/read float probably has an advantage here over 16-bit (or even 24-bit) PCM.
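A toy illustration of that point (plain Python; no claims about what any specific decoder does internally): quantizing to 16 bits is harmless on its own, but once any processing happens between passes, a 16-bit intermediate keeps adding rounding error while a float intermediate does not.

```python
import random

def q16(x):
    """Round a sample in [-1, 1] to the nearest 16-bit PCM level."""
    return round(x * 32767) / 32767

random.seed(1)
samples = [random.uniform(-1, 1) for _ in range(1000)]

# Quantization by itself is idempotent: re-quantizing adds no new error.
once = [q16(s) for s in samples]
twice = [q16(s) for s in once]
assert once == twice

# But with any processing between passes (here a tiny gain change, as a
# stand-in for the codec's DSP), the 16-bit intermediate accumulates
# fresh rounding error on every pass; the float intermediate does not.
x_int16 = list(samples)
x_float = list(samples)
for _ in range(100):
    x_int16 = [q16(s * 0.999) for s in x_int16]
    x_float = [s * 0.999 for s in x_float]

drift = max(abs(a - b) for a, b in zip(x_int16, x_float))
```

After the loop, `drift` is nonzero: the 16-bit path has wandered away from the float path purely through accumulated rounding, which is the advantage float intermediates have in a long transcode chain.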


Reply #14
This test is very comprehensive! Good job.

Here is result: http://db.tt/q9gXzysF


Excellent, but your results are sorted per pass count per codec, and I think it's more interesting to see the progression of the decay for each codec and setting. Perhaps the data reaches some kind of plateau after a certain number of transcode cycles, or instead accelerates toward Shannon's oblivion.

I've aligned the Vorbis-low images in Photoshop at the 0, 10, 25, 50 and 100-pass measuring points, but there was little I could see because of the gap between 50 and 100.



Reply #16
No, this does not belong in Listening Tests and will be moved shortly.

There have already been complaints that this discussion is not in keeping with TOS8 and I have a hard time disagreeing.

While I understand that this took time and effort, I do not agree that the results are particularly meaningful, let alone useful. It's a lot easier to push a few buttons and let the computer chug away than it is to actually conduct double blind tests.

This is a far cry from the level of analysis that members of this forum are capable of presenting.



Reply #17
While it perhaps doesn't belong in Listening Tests, I don't believe it should be binned. I found the results quite interesting, particularly how some codecs manage to keep some semblance of the source file while others destroy it almost beyond recognition. Of course nobody is going to encode a file 100 times, but it's an interesting test nonetheless.



Reply #19
@greynol: Could you clarify why this infringes TOS #8, and from what point of view is this useless?

Concretely, it is a test of codec regression, and I don't even need to listen to the samples from Ogg Vorbis and WMA to know that they will sound notably different, just by looking at the waveforms above. (Edit: OK, the final table classification would probably need an ABC/HR result to back it up.)

You will probably also remember some tests made a few years ago that studied transcoding from one codec to another; in that case, Musepack seemed to be the best source to transcode to MP3.
That test required a listening test because it was a single pass, not 100, and because it was testing inter-codec transcoding instead of transcoding to self.


Concretely, this test can answer several things:

If a user is going to transcode some files, and the origin and destination formats are known, there is an empirical way to know whether quality will degrade quickly (making the decision to transcode less desirable).

Whether there is a codec that, given the need to transcode, will add the least amount of artifacts and/or be the most stable in doing so.



@bernhold: Like saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is that? (And maybe which versions of the other codecs, and which tool was used?)


Reply #20
Don't get me wrong.

It's clear that there is a variety of methods for testing audio codecs, and everybody is free to adopt and defend any of them.

But as one may notice, there are no comments from the people who are usually involved in listening tests here.
Either everything is perfect and there is nothing to say, or everything is plain wrong and there is nothing to say.
Take a guess.



Reply #22

@bernhold: Like saratoga said, it would be interesting to change the gain that LAME applies by default (which I thought it no longer did) with --scale 1. That said, which version of LAME is that? (And maybe which versions of the other codecs, and which tool was used?)

This would be sensible, and perhaps also a lossless version of your 8-second sample.


Reply #23
What would be interesting IMO and, I think, much more useful to actual users, would be a test with repeated iterations of re-encoding material from various uncompressed and lossy settings to various other lossy settings, with DBTs after each, aiming to determine when degradation becomes audible and perhaps its extent compared to other workflows. Then again, I have a hunch that effects would become audible well before 100 passes, which I agree is a number so improbable in reality that it’s not useful in any concrete sense and is purely an abstract ‘what if’. The workload in the test I suggested would come much less from the number of passes and much more from the need to choose various source and destination encoders/settings and determine how to assess their effects and the resulting relative quality. Anyway… pure speculation.


Reply #24
Maybe also something similar to the transcoding test linked by [JAZ], but perhaps as a cross-referenced table for a selected 30-second sample and selected bitrate: pass the original signal to every encoder, and then pass each result again to every other encoder. Such a table would look interesting to me, and I might well do it out of curiosity, but a public test would definitely need an ABX report, which is not needed here (in this thread's test).