Short re-encoding blind listening test

Re-encoding from a lossy source (sometimes called transcoding) is not optimal (the output quality is necessarily lower than that of an encoding made directly from the original source), but it is often used for convenience. Are some lossy formats better than others as a source for re-encoding? Which one could be considered the best source?

We have little empirical evidence to answer this. Some tests were published in the past; the purpose of mine is to add a few more data points (but only a few).



1/ Samples

The following test is very limited. I’ve only used four samples (you know, time and motivation…). It will be impossible to draw any strong conclusions from so few samples, but maybe some interesting leads will appear. The four samples are those selected by ff123 for his 128 kbps listening test.
http://ff123.net/samples.html

2/ Bitrate

Always the most disputed point… I must explain my choice.
First, I had to select the bitrate of both input and output. For the output, the choice was easy: MP3 as the format, ABR 128 as the setting. It’s probably one of the most universal settings. For the input, the choice of bitrate was harder.
On one hand, we have perceptual encoders (mp3, mpc, vorbis, aac), which can reach transparency at 170…190 kbps for most people. On the other hand, there are hybrid encoders, which need a much higher bitrate (300 kbps) to be fully transparent, but which are reputed to be a better source for re-encoding. I therefore had two reasonable choices:
- to set all formats to 300 kbps. It might be interesting, but few people use --quality 10 for mpc, -q 9.5 for ogg vorbis, or CBR 320 for MP3 or AAC. Therefore, I discarded this solution.
- to compromise and use 256 kbps as the average bitrate. This bitrate is much more common than 300 kbps. It corresponds to --preset extreme (mp3) and to -q8 (vorbis), and is near --insane (mpc). These settings are of course not really popular, but they are not rare either. On the other hand, hybrid encoders have progressed recently (DualStream can encode decently at 230 kbps, and WavPack 4 lossy goes as low as 196 kbps). 256 kbps is probably still not optimal for a hybrid format, but it should be more than acceptable, perhaps even more so for re-encoding purposes.

3/ Input challengers

I’ve decided on the most common formats: AAC, MP3, MPC, Vorbis, & WavPack lossy. I took the average bitrate of the WV4 encodings as the standard (261 kbps), and tried to obtain the same with the other formats.

• MPC: mppenc 1.15u and --quality 7.5 (--insane is ~230 kbps and --braindead ~270 kbps)

• Vorbis: I preferred aoTuV beta3 to the official 1.1 encoder. -q 8.3 matches 261 kbps.

• AAC: the choice was more problematic. I first tried Nero AAC VBR (with the ‘fast’ encoder), but no preset corresponds to the targeted bitrate. Therefore, I opted for CBR 256. Instead of Nero AAC, I used iTunes AAC. I have little evidence to justify this choice, but: 1/ this encoder was superior to Nero CBR at CBR 128 in the last two collective tests organized by Roberto; 2/ the newer Nero AAC encoder (the ‘fast’ mode) was still considered unfinished by JohnV not long ago; 3/ iTunes AAC has fewer pre-echo issues at high bitrate; 4/ not related to quality, but iTunes AAC is twice as fast and runs on a second platform.

• MP3: I preferred lame 3.97 alpha 8 to any ‘stable’ version. I did more than 800 blind comparisons with lame 3.97 (from alpha 5 to alpha 8), and in my “double-blind constructed” opinion this encoder has nothing to envy in older versions. Unfortunately, the highest VBR preset (-V0 or --preset extreme) can’t match the targeted bitrate (242 kbps instead of 261). I hesitated for a long time between ABR 256 and -V 0, but in the end I opted for VBR. I’ve been reading HA.org for more than three years, and people using --preset extreme far outnumber those using ABR/CBR at this bitrate. Consequently, my setting was: -V0 --vbr-new (it performed slightly better in my recent tests, at least with lower VBR settings: -V4, V3 & V2).

• WavPack 4: I hesitated for a moment between -hb256 and -hb256x, but the encoding speed of the -x optimisation decided for me (a 3 GHz computer is probably needed to encode at x2… mine reached ½ real time!).
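
Matching the input formats to a common average bitrate, as described above, only requires file sizes and durations. A minimal sketch in Python, assuming that kbps means 1000 bit/s and that the average is taken over total size and total duration (the post does not say how the 261 kbps figure was averaged). The function names and the example numbers are hypothetical, not values from this test.

```python
def average_bitrate_kbps(size_bytes: int, duration_s: float) -> float:
    """Average bitrate of one file in kbps (1 kbps = 1000 bit/s)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_s / 1000


def overall_bitrate_kbps(files) -> float:
    """Bitrate over a whole set of (size_bytes, duration_s) pairs.

    Total bits divided by total seconds, so longer samples weigh more.
    This weighting is an assumption, not the method used in the post.
    """
    total_bytes = sum(size for size, _ in files)
    total_seconds = sum(duration for _, duration in files)
    return average_bitrate_kbps(total_bytes, total_seconds)


# Hypothetical example: a 10 s clip of 326250 bytes
print(average_bitrate_kbps(326250, 10.0))  # 261.0
```

With such a helper, an encoder setting can be nudged up or down until the overall figure lands near the target.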



4/ ADDITIONAL NOTES


• I performed two separate listening tests for two samples (rawhide.wav and dogies.wav). Each test corresponds to a different part of the sample.
• As reference, I didn’t use an uncompressed file, but simply an optimal mp3 encoding (i.e. one encoded directly from the original, uncompressed source).
• iTunes encoding offsets were removed by schnofler’s ABC/HR tool; gain was systematically applied to remove any volume difference between files (it wasn’t really necessary).

5/ RESULTS


ABX log files are here.





iTunes AAC

It suffered three times: with dogies_1, rawhide_1 and wayitis. Each time I noticed an additional artifact:

• “but some drums have an ugly coloration” (dogies.wav (piano & drums))
• “audible distortions on voice” (rawhide.wav)
• “piano notes are excessively distorted (coloring)” (wayitis.wav)

With the other files, quality was identical to the reference to my ears (and even slightly better [i.e. less aggressive], I’d say, with cymbals on rawhide_2).



LAME MP3 --preset extreme

Clearly the worst challenger. Extracts from the ABX log files:

• “form of ringing: sound is very fluctuating. Flabby.” (dogies.wav (piano & drums))
• “cymbals are very unstable” (dogies.wav (cymbals))
• “drums are distorted, unstable” (fossiles.wav)
• “cymbals are much more distorted.” (rawhide.wav (cymbals))
• “horrible fluctuating/unstable noise” (wayitis.wav)

Each time, there was the same kind of distortion. It’s a form of ringing, very typical of lossy encoding, which ruins the quality of background noise or ambiance. I was often amazed by the huge difference between the encoded file and the re-encoded one. I didn’t imagine that re-encoding could have such an impact on quality…
Keep in mind that the bitrate was also the lowest. But I don’t think this slightly lower bitrate explains such poor performance. For example, the wayitis.wav sample has a higher bitrate with MP3 as source than with Vorbis as source, but:
MP3 (260 kbps) -> MP3 (128) : rating = 1.5
OGG (252 kbps) -> MP3 (128) : rating = 5.0
Despite the higher bitrate, quality was much worse…



MUSEPACK --quality 7.5

One of the best sources according to this small test. Transparent three times, and best once. Nevertheless, I noticed problems on cymbals, which are slightly more distorted with mpc as source.

• “additional distortions on cymbals” (dogies.wav (cymbals))
• “cymbals are distorted” (rawhide.wav (cymbals))



Ogg VORBIS aoTuV -q 8.3


Best source, together with Musepack: transparent three times, and best once. I mainly noticed one specific problem: a ‘drooling sound’ (in other words, imprecise edges). It’s something similar to smearing, but with something else I can’t really describe.

• “excessively drooling: smearing is audible, and sound isn't very stable” (dogies.wav (piano & drums))
• “but with additional degradation (smearing, 'drooling' sound)” (rawhide.wav (cymbals))



WavPack 4 lossy -hb256


As expected, there was audible noise, and it handicaps the format. But there are two important things I’d like to point out:
- first, noise wasn’t always audible (not ABXable at least). At this sub-optimal bitrate (for a hybrid format), I honestly expected more audible problems. It’s a very good point.
- second, audible problems don’t necessarily consist of additional noise; there are artifacts which don’t differ from those triggered by perceptual encoders used as source for re-encoding. I noticed this with dogies.wav, and less clearly with cymbals on rawhide.wav.

• “Very noisy. I wouldn't say that this noise isn't disturbing. Anyway, there's an annoying artifact in the middle of the tested part” (dogies.wav (piano & drums))
• “noise is sometimes noticeable; drums are slightly aggressive (noise)” (fossiles.wav)
• “distorted cymbals. A bit aggressive” (rawhide.wav (cymbals))
• “noise (I can't locate it... it's a very strange one)” (wayitis.wav)




6/ GENERAL CONCLUSIONS

It’s hard to draw general conclusions with only four samples. But we can note some interesting points which clearly differ from common claims and/or assumptions:

• when re-encoding from one lossy format to another, keeping the same format doesn’t necessarily help to maintain quality. LAME high bitrate encodings are (here) the worst source for LAME output… All other lossy encodings are much better inputs.

• using a hybrid format doesn’t necessarily keep re-encoding free of additional artifacts. Hybrid encoders are probably artifact free (at least if we don’t consider noise an artifact, which is debatable), but this additional noise can trigger extra artifacts when re-encoding!

• subband encoders (such as mpc) aren’t necessarily a better source for lossy re-encoding.

• the quality degradation isn’t constant: some parts don’t suffer from the re-encoding process, while others (dogies_1; rawhide_2) are much more sensitive.


Therefore, I would be very careful before claiming that such and such a technique is better for re-encoding.




7/ APPENDIX: statistical analysis

• ANOVA analysis:

OGG is better than MP3
MPC is better than MP3
WV4 is better than MP3
AAC is better than MP3


• FRIEDMAN analysis

OGG is better than MP3
MPC is better than MP3

Reply #1
Very interesting.

I am wondering one thing about mp3:
You probably decoded the high bitrate mp3 file with something that removed the encoder delay. Thus on re-encoding frames have the same boundaries as in the first generation.

I am wondering if using different frame boundaries would have changed the result or not.

edit: anyway, I expect mp3 to finish last, but which of the two methods would give the best quality, and would there be any significant difference between them?

 

Reply #2
Quote
I am wondering one thing about mp3:
You probably decoded the high bitrate mp3 file with something that removed the encoder delay. Thus on re-encoding frames have the same boundaries as in the first generation.

Exactly, I used foobar2000 for decoding and re-encoding.

Reply #3
Interesting...
Thanks, guruboolez !!

Reply #4
Goes to show how bad transcoding can be, even if the target bit-rate is much higher.

It'd be interesting to see an abx of these samples not transcoded, but that would probably be a 5.0 on all I suppose.

Reply #5
Another interesting result - thank you!

If possible, at what bitrate do you think WV4 would deliver 5.0 on all transcoded samples in this particular test?

Cheers,
David.

Reply #6
Quote
If possible, at what bitrate do you think WV4 would deliver 5.0 on all transcoded samples in this particular test?

I don't know. I'm planning a second test (320 -> 128), which could answer your question. But I have to find some free time first (and maybe first check the MP3 issue raised by Gabriel).

Could someone try to encode dogies.wav to WV4 -hb256 and then to lame --abr 128, and check the ~4.5 - 6.0 range? I'd like to know if other people are also annoyed by the distortion I've heard on this specific part.

Reply #7
Guru thanks for the test.  I am not familiar with this dogies sample. I tried to abx mp3 -hb256 vs mp3 original ( lame 3.97a9 -preset 128) - around 3.4-4.7 secs I picked up maybe a puff of hiss?  abx 7/8, cannot abx -b256x or -hb320. I don't know if that was what you heard too.

I have abxed only very few transcodes with wavpack using bitrates 320-450k -hb (also had good results with 320). Each time these minor differences  disappeared when using -x or -hx switch.

Reply #8
Thanks for the reply. I didn't notice noise, but a weird distortion (the reference file wasn't perfect either).
Next time, I'll try to upload a short part somewhere in order to make things easier for other people.

Reply #9
As a sidenote: if the limited number of samples is indeed any indication, then the mp3-"repacker" posted a few days ago may be more valuable than first thought - at least when the target bitrate isn't 128kbit but instead something around 200kbit.

- Lyx
I am arrogant and I can afford it because I deliver.

Reply #10
Very interesting results, thanks, guruboolez.  But I'd say it would be much more interesting to have some other things compared:
PCM -> MPC q7 (q6, q8) -> MP3, OGG, AAC of lower bitrates (e.g. for portable use — usually about 130-200 kbps);
PCM -> WV 320 (or 384, or less) kbps -> MP3, OGG, AAC of lower bitrates (same).
As you can probably see, I'm very interested in the “archiveness” of lossy codecs, i.e. how well their quality holds up (and Musepack along with Wavpack lossy are, AFAIK, the best at bitrates past ~200). For instance, is there any real need for lossless encodings meant for later transcoding, if a fully unchanged audio stream is not the point (that is, if there are no really perceptible artifacts)? Or, if there are some, how do they affect quality, and which lossy source is better for transcoding?
What do you think about that?
Infrasonic Quartet + Sennheiser HD650 + Microlab Solo 2 mk3. 

Reply #11
Quote
But I'd say it would be much more interesting to have some other things compared:
PCM -> MPC q7 (q6, q8) -> MP3, OGG, AAC of lower bitrates (e.g. for portable use — usually about 130-200 kbps);
PCM -> WV 320 (or 384, or less) kbps -> MP3, OGG, AAC of lower bitrates (same).


note: just because someone has made a useful test doesn't mean he is obliged to conduct every other test combination someone is interested in.
Feel free to try those and report the results; that would be interesting too.


Reply #13
Talking about subband codecs,

couldn't someone take MPEG 1 Layer one coding, give it some real vbr, and it would be higher quality than MPC?


As seen here:
http://en.wikipedia.org/wiki/MP3

    * Layer 1: excellent at 384 kbit/s
    * Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s, good at 192...224 kbit/s
    * Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s, good at 128...192 kbit/s

Reply #14
Guru, your tests are always both thorough and interesting. Thanks!

I am actually amazed at how well WavPack did in this test! After all, all of the other codecs here are pretty much considered transparent at 256 kbps, while WavPack lossy is certainly not transparent at that bitrate. I suspect that one possibility is that for the other codecs tested, the largest part of the degradation in quality was due directly to the transcoding. With WavPack, perhaps most of the degradation was already present in the 256 kbps version with little further degradation occurring during the transcode. Just an idea.

Keep up the great work! 

Reply #15
I did some additional tests.


2Bdecided> If possible, at what bitrate do you think WV4 would deliver 5.0 on all transcoded samples in this particular test?

I tried to answer this question by testing one of the "worst" samples of my test (dogies) at higher bitrates. I directly compared four transcodings, with sources corresponding to:
-hb256 [the old one]
-hb300
-hb350
-hb400

SUMMARY OF RESULTS: I was able to ABX all files. But to be honest, I was very surprised when the software revealed these positive ABX scores. The difference was very subtle with -hb350 and -hb400 (small noise, very hard to locate).
256->128 kbps obtained 2/5 (I noticed the artifact again)
300->128 kbps obtained 3/5 (the artifact was still there)
350->128 = 4/5 (but a better score would be 4.5...4.8)
400->128 = 4.5/5 (4.9 is more realistic).

Anyway, these results don't answer the original question, "at what bitrate do you think WV4 would deliver 5.0 on all transcoded samples". But the difference was so subtle with WavPack 4 transcoded from 350 or 400 kbps that I would answer "400 kbps".

ABX log is available here


******************************************************************
******************************************************************
******************************************************************



A bit later, I performed another comparison, testing WV4@350 against MPC@~330 (--quality 10). Musepack performed very well at -q7.5 with this sample, and could therefore be considered a strong competitor (and also a very interesting one for "transcoding" purposes, because of its ultra-fast decoding speed).

SUMMARY OF RESULTS:
MPC->MP3:
NOTE = 3/5
COMMENT = "additional distortions are really annoying"
ABX score  = 10/12 pval = 0.019

WV4->MP3:
NOTE = 4.5/5
COMMENT = "a bit noisy"
ABX score = 6/12 pval = 0.612

WV4 vs MPC ABX score: 18/20 pval < 0.001
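
The p-values quoted next to the ABX scores in this thread are consistent with a one-sided binomial test against pure guessing (probability 0.5 per trial). A minimal sketch in Python, assuming that this is how the ABX tool computes them; the function name is illustrative and is not taken from any tool mentioned here.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials`
    by guessing alone (one-sided binomial test, p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Scores reported above
print(f"{abx_p_value(10, 12):.4f}")  # 0.0193
print(f"{abx_p_value(6, 12):.4f}")   # 0.6128
print(f"{abx_p_value(18, 20):.4f}")  # 0.0002
```

Under this assumption, a score of 6/12 is what guessing produces more than half of the time, which is why it counts as a failed ABX.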

In other words, MPC@330 is clearly a worse source for transcoding than WV4 (which, I remind you, was poorer than mpc at the lower bitrate). MPC->MP3 re-encoding introduces additional distortions, easy to hear and a bit annoying. Note that this time I failed to ABX WV4->MP3 (I succeeded in the previous test).
This time, the results confirm the theory (?): hybrid encoders are better than perceptual encoders for transcoding purposes.

ABX log is available here




P.S. I also have some results for MP3 and for Gabriel, but I must write up the story first

Reply #16
Gabriel> "I am wondering if using different frame boundaries would have changed the result or not."

I started the test by comparing a re-encoding made directly by foobar2000 (which removes the encoder delay) and a re-encoding made in two steps:
- high bitrate MP3 decoded into PCM by the MAD decoder
- PCM file encoded with lame 3.97a8 at 128 kbps (abr)

I first noticed some minor differences on the playback range used so far, but the difference was much more obvious where the cymbals start (06.00 - 08.00).

RESULTS:
re-encoding using MAD was better (ABXed against the fb2k re-encoding = 16/16 pval < 0.001 !)
The vibrating effect was clearly less pronounced.

ABX log is available here


Now, what's the exact cause of this positive difference? Are the modified boundaries responsible for the improvement in quality? Or does the MAD library do something special (like dithering) which could increase decoding quality?
Second question: is the difference also audible on another part of this sample (the beginning, for example)?

I tried to answer this second question first, and launched a new ABX test with the same contenders but on another listening range:
RESULTS:
I gave the MAD source a slightly better rating in the ABC module, but failed the ABX: 6 out of 16, pval = 0.894.
But I was still convinced that one file was better on the selected range, and started a second test: I again gave MAD a better rating in the ABC module and failed the ABX another time: 11 out of 16, pval = 0.105.
I'm sure that a difference exists, but it's apparently a very subtle one.


=> using different frame boundaries for MP3 transcoding doesn't necessarily help to maintain quality. But on some occasions the impact is clearly positive.


ABX logs are available here



******************************************************************
******************************************************************
******************************************************************


Last kind of test: comparing three different MP3 decoders:
- mpglib (foobar2000), which removes the encoder delay and keeps the same frame boundaries
- MAD, which doesn't remove the encoder delay and therefore changes the frame boundaries when re-encoding. This decoder adds dither when decoding.
- Fraunhofer (Winamp5), which doesn't remove the delay and doesn't add dither.

I used the same problematic range as before (cymbals).


RESULTS:

mpglib was clearly worse than both MAD & Fhg, which are identical. The first one is severely distorted, whereas the other two are much better (but still easy to ABX against a clean MP3@128 encoding).

ABX log is available here

This result is important, because it indicates that the gain in quality first noticed with MAD isn't linked to dither or to something else inherent to the MAD library, but is a consequence of the different frame boundaries between the two MP3 encodings (the first HQ one and the second 'transcoding').
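
Why keeping the delay moves the frame boundaries can be illustrated with simple modular arithmetic. A rough sketch in Python, assuming 1152 samples per MPEG-1 Layer III frame and the commonly cited LAME figures of 576 samples of encoder delay and 529 samples of decoder delay. These delay values are assumptions and were not measured in this thread; the model also ignores the overlap between MDCT windows, and the function name is hypothetical.

```python
FRAME_SAMPLES = 1152   # samples per MPEG-1 Layer III frame
ENCODER_DELAY = 576    # assumed LAME encoder delay, in samples
DECODER_DELAY = 529    # assumed decoder delay, in samples


def frame_shift(leading_samples: int, frame_samples: int = FRAME_SAMPLES) -> int:
    """Shift (in samples, modulo one frame) between first-generation and
    second-generation frame boundaries, given how many extra leading
    samples the decoder left in front of the audio.

    0 means the second encoding reuses the same frame boundaries.
    """
    if leading_samples < 0:
        raise ValueError("leading_samples must be non-negative")
    return leading_samples % frame_samples


# Decoder that removes the delay (the foobar2000 case described above)
print(frame_shift(0))                              # 0
# Decoder that keeps the delay (the MAD / Fraunhofer case described above)
print(frame_shift(ENCODER_DELAY + DECODER_DELAY))  # 1105
```

Under these assumptions, any decoder that leaves the delay in place gives the second encoder misaligned frames, which matches the observation that MAD and Fhg behave alike while mpglib differs.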




Finally, I performed another test, this time using another sample (wayitis), which suffers a lot from MP3 to MP3 re-encoding. I compared mpglib to Fhg.

RESULTS:

Fhg > mpglib (26 out of 32, pval < 0.001, with a final 14/14 when I decided to focus my attention on a small artifact I noticed on the right channel with one file [mpglib]).
Nevertheless, the difference between the two transcodings was really small: both suffer from severe ringing issues. In other words, re-encoding with different frame boundaries improves the quality, but the audible difference is (apparently) not a very big one. It is probably important enough to justify it, though... more tests are welcome

P.S. ABX log is available here

Reply #17
I fear that my narration is not very clear to other people. Just to sum up:

1/ WavPack 400 kbps was still ABXable on dogies.wav, due to a subtle noise (so subtle that I was surprised to pass the test)

2/ At ~350 kbps, WavPack4 performed much better than MPC at a similar bitrate (-q10). MPC transcoded to MP3 had more distortions than MP3 encoded properly from WAV, or even than MP3 encoded from WV4.

3/ foobar2000 allows MP3 to be re-encoded properly, that is, without the extra delay or offset added by the first generation of encoding. After some tests, it appeared that keeping the delay had a positive impact on quality. To keep the delay, I used different decoders (MAD and Fraunhofer; results were identical between MAD and Fhg). The impact on quality is more or less important (it depends on the sample).

I also remind you that removing the encoder/decoder delay (as foobar2000 does) is necessary to obtain a gapless transcoded file (of course, it only matters if the device is able to perform gapless playback).

Reply #18
Great!  thanks guru.

Would it be possible for you to confirm that the wavpack -x switch will have a positive effect on the subtle difference as this has been my experience?

e.g - b400x and hb400x

In other words: is -x worth the time for demanding, non-standard cases?

Reply #19
Quote
Great!  thanks guru.

Would it be possible for you to confirm that the wavpack -x switch will have a positive effect on the subtle difference as this has been my experience?

e.g - b400x and hb400x


Well... I'm using -x optimisations for my lossless encodings despite my slow computer, but -hbx is painfully slow. That's why I didn't include it in my tests at first.
But in time, I may do it (but don't count on it too much).

Reply #20
I am most interested in -bx as it's quicker to encode & decode than -hbx.

thanks.

Reply #21
Quote
note: just because someone has made a useful test doesn't mean he is obliged to conduct every other test combination someone is interested in.
Feel free to try those and report the results; that would be interesting too.

Another note: I wasn't obliging, I was suggesting.
See, the problem is that my sound gear (cheap ~$20 Philips cans with no amp, and an SB Live! 5.1 soundcard) is too crappy to perform any real tests. I can fairly easily ABX most of the killer samples (up to --api) and spot artifacts in some other test samples, but when it comes to real music, clipping occurs here and there, making any attempt to spot an artifact in all that distortion almost unbearable! I can't wait to buy a new pair of headphones, either the Grado SR80/125 or the more expensive Sennheiser HD580/600. I believe my hearing is slightly better than average (I can ABX a 22KHz sine from null wave in total silence! ), but it is too hard to perform a quality test (especially at high bitrates) with this garbage.
Until then, I'd really like someone with golden ears to perform a short yet useful test for all of us, and I really appreciate guruboolez's skill in testing and hearing!
I'm looking forward to performing many useful ABX tests comparing different encoders (including the latest lame alphas!).  Still, I think I'll try to do some these days with my present gear.

@ guruboolez:
One question: what is better for transcoding at bitrates around ~270 kbps? I assume this is MPC, right?

@ shadowking:
Thanks for the link.
Infrasonic Quartet + Sennheiser HD650 + Microlab Solo 2 mk3. 

Reply #22
Quote
@ guruboolez:
One question: what is better for transcoding at bitrates around ~270 kbps? I assume this is MPC, right?

I don't know. I've only tested four files, and with ~256 kbps encodings as source.

Reply #23
Sorry for the late reply, I've been away from the forum for a while.

Very useful test results, guru. From my side it's nice to know that I wasn't completely imagining things a couple of years ago.

I was also pleasantly surprised by how well wv256 performed in your test, as I normally think of it as somewhat less than transparent.

Den

Reply #24
I've only just seen this thread - thank you again guru!

I think the WV4@350 vs MPC@~330 test is especially critical. People (myself included) often use lossy codecs with higher settings in order to "future proof" the files, or for subsequent transcoding, or just to make themselves feel better.

Also, I've heard people (claiming to be very knowledgeable, but who obviously aren't!) saying how if you must transcode, stick to the same type of codec - this is clearly nonsense too!

I'll be pointing other people to this thread, because the results go totally against what is "commonly accepted" and "unquestioned" by so many people!

In fact, could this thread find its way into the FAQ please? Maybe under "what can you do if you know you have to transcode..." or similar?

Thanks again guru!

Cheers,
David.