Encode/decode cycles in lossy codecs

Many of the discussions in this forum imply that if an audio sample is repeatedly encoded and decoded with any lossy codec, the quality of the sample always degrades progressively -- the more encode/decode cycles, the worse the quality loss.

I wonder if this is necessarily true? For example -- suppose that we define a "lossy codec" that simply cuts off all frequencies above, say, 19 kHz. When we first encode and then decode, the WAV that is produced (call it WAV1) differs from the original (WAV0) by having all the high frequencies removed. But if we then encode and decode again, we shouldn't do any further damage; WAV2, WAV3, etc. should be identical to WAV1, apart from errors introduced by, e.g., rounding, the impossibility of a perfectly sharp filter, and so on.
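To make this concrete, here is a small numpy sketch of that hypothetical lowpass-only codec (purely my own illustration, not any real codec): applying it a second time changes nothing beyond floating-point noise.

Code
# A toy "codec" that only brickwall-lowpasses at 19 kHz via the FFT.
# Applying it twice does no more damage than applying it once.
import numpy as np

def brickwall_lowpass(x, fs=44100, cutoff=19000.0):
    """Zero out all FFT bins above `cutoff` Hz and transform back."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[freqs > cutoff] = 0.0
    return np.fft.irfft(X, n=len(x))

rng = np.random.default_rng(0)
wav0 = rng.standard_normal(44100)      # one second of stand-in "audio"
wav1 = brickwall_lowpass(wav0)         # first encode/decode cycle
wav2 = brickwall_lowpass(wav1)         # second encode/decode cycle

print(np.max(np.abs(wav1 - wav0)))     # large: high frequencies were removed
print(np.max(np.abs(wav2 - wav1)))     # ~1e-16: the second pass is a no-op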

Then the question is: why do the actual lossy codecs lead to a progressive decrease in quality with repeated encoding and decoding?  Is it just a matter of a gradual accumulation of floating-point errors, or is something deeper at work?

Encode/decode cycles in lossy codecs

Reply #1
Quote
I wonder if this is necessarily true?

It is true. Try decoding and re-encoding an MP3 twenty times and you can hear it. I used --aps and I could definitely hear heavy artifacts by the third encode, even through crummy computer speakers with a 20" fan running. Anyone could hear bad artifacts by the 5th re-encoding.
"Something bothering you, Mister Spock?"

Encode/decode cycles in lossy codecs

Reply #2
MP3 is far more complex than that; it doesn't just "take out" frequencies in a certain band. That's just one part of it, the lowpass filter. What it does, in very basic terms, is estimate which parts of the music the human ear can and can't hear, or to what degree they can be heard. It's not just about high frequencies being outside your hearing range; there's "masking" too -- i.e. when a drum is hit, it will obscure some or all of an acoustic guitar playing along with it (as far as the human ear is concerned). These obscured frequencies are discarded by the MP3 encoder (or altered, depending on exactly what it thinks is best... but I'll keep it basic here).

What it comes down to is that MP3 is a "perceptual" lossy encoder -- again, on a basic level, it cuts the large WAV down vastly in size by making a (very) educated guess at what you can and can't hear. When it's presented with a WAV file that has already been encoded to MP3 and decoded again, it has exactly the same job to do; it doesn't recognise it as a transcode -- as far as it's concerned, it is a WAV file like any other. So, again, it makes educated guesses at what you can and can't hear... and inevitably, because the WAV it is given already differs from the original, still more accuracy is sacrificed in order to cut the file down to size; as I said, that's the job of a lossy encoder.
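If it helps to see the idea in code, here is a deliberately crude caricature (nothing like LAME's real psymodel, and the 40 dB figure is made up): treat any spectral component more than 40 dB below the loudest one as masked and throw it away.

Code
# Crude "masking" toy: drop anything more than 40 dB below the strongest
# spectral component.  Real psymodels work per critical band, not globally.
import numpy as np

fs = 44100
t = np.arange(fs) / fs
loud_drum    = 1.0   * np.sin(2 * np.pi * 100  * t)   # strong low tone
quiet_guitar = 0.002 * np.sin(2 * np.pi * 1200 * t)   # ~54 dB quieter partial
x = loud_drum + quiet_guitar

X = np.fft.rfft(x)
mag_db = 20 * np.log10(np.abs(X) + 1e-12)
masked = mag_db < (mag_db.max() - 40.0)    # "inaudible" under this toy rule
X[masked] = 0.0                            # discard the masked components
y = np.fft.irfft(X, n=len(x))

print(f"bins kept: {np.count_nonzero(~masked)} of {masked.size}")
print(np.max(np.abs(x - y)))               # ~0.002: the quiet partial is gone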

Encode/decode cycles in lossy codecs

Reply #3
Quote
Is it just a matter of a gradual accumulation of floating-point errors, or is something deeper at work?

Something like that, but these errors aren't introduced by floating-point precision and the like. They're introduced intentionally, in the quantisation step.

(Very simplified explanation of quantisation in lossy codecs: it means rounding the sample value so that it needs fewer bits than the original, while still making no audible difference to the human ear.)

And of course this quality degradation doesn't hold for every lossy codec one can imagine, but it does hold for the existing ones. It's probably hard to obtain non-trivial compression ratios (plus transparency) without entering the territory where quality degradation due to re-encoding becomes significant.
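In miniature, the quantisation step looks like this (a toy uniform quantiser with a made-up step size; MP3's actual quantiser is non-uniform, but the principle is the same):

Code
# Toy uniform quantiser: round a coefficient to a multiple of `step`.
# The error is at most step/2; the encoder picks `step` so that this
# error stays below what it believes the ear can detect.
import numpy as np

def quantise(c, step):
    return np.round(c / step)      # a small integer -> few bits to store

def dequantise(q, step):
    return q * step                # the decoder rebuilds the coefficient

c = 0.7314159
step = 0.05
q = quantise(c, step)              # 15.0
c_hat = dequantise(q, step)        # 0.75
print(q, c_hat, abs(c - c_hat))    # error 0.0186 <= step/2 = 0.025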

Encode/decode cycles in lossy codecs

Reply #4
1. Quantisation noise accumulates.
Well, this factor is already mentioned in this thread.
Here is how I would put it (pardon me for quoting myself):
Quote
Quantisation noise is adjusted to be inaudible, masked by the original sound. Re-encoding over and over again introduces new quantisation noise which is masked not by the original signal but by the signal of the encoded file -- which already contains noise. So the noise accumulates and may become audible.


(By the way, quantisation of the samples themselves is done only in MPC; in MP3, the transform coefficients are quantised.)

2. Temporal resolution degrades.
time resolution, explained by Frank Klemm
As one can see, the signal gets smeared (pre-echo etc.), so after re-encoding the smearing only becomes worse. (The sketch after this list illustrates the pre-echo effect.)
I assume the smearing is caused by the windowing in the (time-to-frequency) transformation step; somebody with a fresher memory of a signal-processing course might correct me.

3... anything else that I forget?...
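Here is a small sketch of point 2 (a plain DCT stands in for the windowed transform a real encoder uses, and the numbers are made up): coarsely quantising the coefficients of one long block spreads the error over the whole block, so noise shows up well before the transient that caused it.

Code
# Pre-echo in miniature: quantisation error in the transform domain is
# spread across the whole block by the inverse transform, i.e. it also
# lands *before* the transient.
import numpy as np
from scipy.fft import dct, idct

N = 1024
x = np.zeros(N)
x[900] = 1.0                           # a sharp transient late in the block

X = dct(x, type=2, norm='ortho')       # stand-in for the codec's transform
step = 0.02                            # deliberately coarse quantisation
X_q = np.round(X / step) * step
y = idct(X_q, type=2, norm='ortho')

pre = y[:800] - x[:800]                # region well before the transient
print("max |error| before the transient:", np.max(np.abs(pre)))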

Encode/decode cycles in lossy codecs

Reply #5
Each encoding introduces small errors.
On the first encoding, you introduce small errors.
On the second encoding, the encoder tries to accurately reproduce what you gave it -- that is, the sample with small errors. So it also tries to reproduce those errors, but in the process it introduces new ones.

The third pass likewise tries to reproduce those errors, and so on.

The point is that the errors are now part of the encoder's input signal, so in each round you accumulate new ones and keep the old ones.
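A toy numpy experiment along these lines (purely illustrative: it replaces the quantiser with additive noise at a fixed level relative to whatever the "codec" is fed, which is roughly how fresh, largely uncorrelated coding noise behaves from one generation to the next):

Code
# Each generation keeps the old errors (they are part of its input) and
# adds new ones, so the error relative to the original keeps growing --
# roughly 3 dB per doubling of generations for independent noise.
import numpy as np

rng = np.random.default_rng(1)
original = rng.standard_normal(44100)          # stand-in for the source WAV

def toy_lossy_pass(x, rel_error=0.05):
    """Return x plus new 'coding noise' at -26 dB relative to x."""
    noise = rel_error * np.sqrt(np.mean(x**2)) * rng.standard_normal(len(x))
    return x + noise

x = original
for gen in range(1, 21):
    x = toy_lossy_pass(x)
    if gen in (1, 2, 5, 10, 20):
        err_db = 10 * np.log10(np.mean((x - original)**2) / np.mean(original**2))
        print(f"generation {gen:2d}: error vs original = {err_db:5.1f} dB")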

Encode/decode cycles in lossy codecs

Reply #6
Thanks so much for your replies -- they've been very helpful.

I had naively imagined that an ideal lossy codec would be similar to what is called a "projection operator" in the mathematics of vector spaces; given an input audio sample, we would project out the part of it (the subspace) that is audible to the human ear, and discard the remainder.  Then, if the audible and inaudible pieces didn't interact with one another in any way (i.e., if they were "orthogonal"), we could apply the projection operator (codec) repeatedly without any further  degradation in audio quality.
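In code, the idea I had in mind is something like this (pure numpy, purely illustrative):

Code
# An orthogonal projection P onto the "audible" subspace is idempotent:
# P(P(x)) = P(x), so a codec built this way would cause no further loss
# after the first application.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((100, 10))    # a basis for the "audible" subspace
Q, _ = np.linalg.qr(B)                # orthonormalise it
P = Q @ Q.T                           # the projection operator

x = rng.standard_normal(100)          # the "audio" vector
once  = P @ x
twice = P @ (P @ x)
print(np.max(np.abs(twice - once)))   # ~1e-16: no further degradation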

I think this would be an excellent idea if it were possible, but Messer suggests above that it probably isn't feasible if we want both transparency and significant compression.

From your replies, there appear to be at least three obstacles: quantisation noise, masking, and temporal/frequency smearing.

I've googled around on the topic of "quantisation noise", and it seems that it enters in two ways: 1) In the initial digitization of the analog sound signal, this being analogous to round-off error; and  2) When the signal is "dithered" to make the noise less "correlated", and hence less noticeable.  The dithering noise that is added by the codec appears to be the larger effect, and to be more responsible for the increase in noise levels on repeated encodings.  Is this correct, or have I misinterpreted what I've read?

It seems that the way that codecs exploit "masking" prevents the audible and inaudible parts of the sample from being neatly separated into independent pieces.  A given high note overtone on the piano, for example, might be retained by the codec if it is played alone, but discarded if it overlaps with a great wallop on the bass drum.  Is this right?

And finally, there is temporal/frequency smearing, caused by the impossibility of reproducing a perfectly sharp transient in time without an infinite number of frequencies, and vice-versa.  I would suspect that this might be the least significant of the three effects.

So -- is this little summary more or less correct?  And, more importantly, can anyone here see if there might be a way to make a "projection-operator" codec that would be both practical and effective at compression (or at least as effective as the lossless compression techniques)?

Encode/decode cycles in lossy codecs

Reply #7
This is something I've always wondered about myself. It's easy to imagine how transcoding introduces errors in image and video files: the image data is stored in squares, so when you decode, the image comes back as all these segments with boundaries.

Anyway, in image compression the codecs rely on the continuity of shapes and colors. Hard boundaries freak them out (they're the JPEG equivalent of Fatboy.wav). This is mostly fine because almost nothing created by nature has hard boundaries (notice, too, that Fatboy.wav is computer generated). So if, on recompression, the new block boundaries don't line up exactly with the old ones, you've got big problems, because the codec is now trying to encode the artifacts of the last encode.

Now, with audio this is tougher for me to get my head around. I guess it's about the same thing, since the audio is still broken down into something like the JPEG squares (frames), and if they don't line up exactly each time there could be trouble. However, every time I try to ABX a transcoded file against the WAV, it's transparent in the same places as the non-transcoded file. This is after one transcode, and I don't care to do any more. In other words, it seems to me that audio can handle one transcode just fine.

Encode/decode cycles in lossy codecs

Reply #8
Quote
Now, with audio this is tougher for me to get my head around. I guess it's about the same thing, since the audio is still broken down into something like the JPEG squares (frames), and if they don't line up exactly each time there could be trouble.

No - MP3/Vorbis/AAC/... overlap each block to prevent this problem. I can't explain it any better than was done before in this thread though.
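Roughly, the idea is this (a sketch of plain 50%-overlapped Hann windowing with overlap-add; the real codecs use an MDCT with time-domain aliasing cancellation, but the overlap principle is the same):

Code
# 50%-overlapped (periodic) Hann windows sum to exactly 1, so if nothing
# is quantised, overlap-adding the blocks reconstructs the signal with no
# visible "block boundaries" at all.
import numpy as np

N = 1024                                  # block length
hop = N // 2                              # 50% overlap
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

rng = np.random.default_rng(0)
x = rng.standard_normal(hop * 40)
y = np.zeros_like(x)

for start in range(0, len(x) - N + 1, hop):
    block = window * x[start:start + N]   # analysis: window the block
    # ... transform, quantise, inverse transform would happen here ...
    y[start:start + N] += block           # synthesis: overlap-add

interior = slice(hop, len(x) - hop)       # skip the partially covered edges
print(np.max(np.abs(y[interior] - x[interior])))   # ~1e-16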

Compare it to the artifacts you get in a JPEG when you encode small, very high-contrast text at a low quality setting: it produces stray pixels/spikes around the letter edges. That is the same phenomenon as pre-echo around transients in audio codecs.

The spikes are also high contrast, so when you encode again, same problem...

Encode/decode cycles in lossy codecs

Reply #9
Quote
2) When the signal is "dithered" to make the noise less "correlated", and hence less noticeable.  The dithering noise that is added by the codec appears to be the larger effect, and to be more responsible for the increase in noise levels on repeated encodings.  Is this correct, or have I misinterpreted what I've read?

No, no -- that's a different matter. Dithering is most commonly used when mastering CDs (when reducing from 20-24 bit to 16 bit), for the reasons you stated: to decorrelate the quantisation noise. In that case the noise is only 1-2 bits, caused by rounding-off errors.
(This type of dithering can sometimes also be done by a lossy decoder, but not by the encoder.)

The quantisation noise introduced by a lossy encoder is orders of magnitude larger than dither noise; its level depends on the ATH, masking, etc.
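For scale (back-of-the-envelope arithmetic, not tied to any particular encoder):

Code
# The noise from rounding/dithering to 16 bits sits near the 16-bit noise
# floor, around -100 dBFS.  A lossy encoder instead parks its quantisation
# noise just below the masking threshold, i.e. typically only a few tens
# of dB below the music in each band -- vastly more noise energy.
import numpy as np

lsb = 2.0 ** -15                       # one 16-bit LSB, full scale = +-1.0
rounding_rms = lsb / np.sqrt(12)       # RMS of uniform rounding error
print(20 * np.log10(rounding_rms))     # about -101 dBFS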

Quote
A given high note overtone on the piano, for example, might be retained by the codec if it is played alone, but discarded if it overlaps with a great wallop on the bass drum.  Is this right?

Yes.

Quote
And finally, there is temporal/frequency smearing, caused by the impossibility of reproducing a perfectly sharp transient in time without an infinite number of frequencies, and vice-versa.  I would suspect that this might be the least significant of the three effects.

What you say ("impossibility...") is true for a continuous signal. A discrete transient signal can be perfectly reconstructed from a finite number of discrete transform coefficients (equal to the number of samples). Well, I'm sure you knew that.
So the smearing problem lies somewhere in the combination of the filters and window functions that are used.
The evil thing about smearing is that its width doesn't depend on the bit-rate; it's tied to the encoder's window width.
E.g. transcoding from 320 kbps to 320 kbps will add little quantisation noise, but the smearing is always present.
So it is a very significant effect too. People who are very sensitive to pre-echo will suffer more (as the smearing can quickly exceed the pre-echo threshold).

Encode/decode cycles in lossy codecs

Reply #10
Quote
The evil thing about smearing is that its width doesn't depend on the bit-rate.


Not directly, no. But indirectly it does: if we encode with smaller blocks, or with more short blocks, we reduce the pre-echo width at the cost of bitrate.

Encode/decode cycles in lossy codecs

Reply #11
Quote
Quote

The evil thing about smearing is that its width doesn't depend on the bit-rate.


Not directly, no. But indirectly it does: if we encode with smaller blocks, or with more short blocks, we reduce the pre-echo width at the cost of bitrate.

Yes, it's true.

On the other hand, in MPC the pre-echo width doesn't depend on the bit-rate at all.

Anyway, what I was thinking is that perhaps the psymodel can find the optimal distribution of short/long blocks over a passage so that transparency is achieved at a certain bitrate. I would expect that increasing the bit-rate for that passage would not change the short/long block usage very much. It is just speculation, though...
But it should be easy to verify. Perhaps I'll check it sometime...