Clipping: possible way to make it harmless
Hydrogenaudio Forums > Hydrogenaudio Forum > General Audio
Gabriel
The lossy encoding process can introduce clipping:
Let's say the maximum value is 1. If we have a sample that is 0.95, it can be encoded as 1.05 after the requantization step.
We know that this introduces clipping once decoded.

But from the encoder's point of view, this quantization was harmless. The problem is that we get clipping in the time domain once decoded, while the encoder intended it in the frequency domain.

Imagine if the decoder clipped in the frequency domain. On decoding, the 1.05 value would be truncated to 1 (the maximum value). This is still correct, according to the required precision computed by the encoder.
Once transformed into the time domain, this would be of maximum value, but not more, and so there would be no time domain clipping.

conclusion: I think that clipping is harmless in the frequency domain, so on decoding it should be handled at this step.
NumLOCK
Because of the properties of the MDCT, to my knowledge this is correct, yes :-)

QUOTE
conclusion: I think that clipping is harmless in the frequency domain, so on decoding it should be handled at this step.

Almost harmless, indeed.

By the way, if the clipping is too severe (i.e. > 1.2), one should avoid changing the output energy too much, maybe by changing the phase of strong frequency components so that they don't add up at the same places. But that would be prohibitively complex to do with MDCT, I think.

EDIT: We could also do the frequency-domain clipping in a post-processing DSP pass, if the time-domain samples have not already been clipped to 16 bits (foobar2000?)
2Bdecided
Gabriel,

Having followed the link to here from the other thread, I now think I understand what you mean. I think. I'm not 100% sure though! I certainly didn't understand the first time I read this thread!

I won't review what actually happens in en/decoding, mp3gaining etc., because I know you know this better than I do! I'm also assuming that you are saying something that has not been said before. I take it as fact that we should (or at least can) deal with clipping by scaling the waveform during decode before truncating it to 16 bits.

Are you saying simply this: We could scale (or simply clip) the frequency domain components. Is that what you mean?

Thus, rather than clipping in the time domain (which usually adds harmonics), we effectively do a frequency dependent filter operation (by reducing the value of some of the spectral data points).

But "clipping" in the frequency domain is not directly related to "clipping" in the time domain. For one thing, the mp3 file never reaches clipping in the frequency domain, because the scale factors and MDCT coefs have more headroom than is needed. For another, you can only predict a level within the frequency domain that is JUST below clipping in the time domain for a single frequency component. Once you start adding the frequency components together, it's impossible; you don't know if it will clip in the time domain until you transform it into the time domain!
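To put rough numbers on the "adding the frequency components together" point, here is a small sketch. It uses plain cosines rather than real MDCT basis functions, and the amplitudes, frequencies and phases are made up for illustration. Two components with identical magnitudes give a time-domain peak above or below full scale depending only on their relative phase, so a per-component magnitude limit cannot tell the two cases apart.

```python
import math

def peak_of_sum(amps, freqs, phases, n=4096):
    """Largest |sum(a * cos(f*t + p))| over one period, sampled at n points."""
    best = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        s = sum(a * math.cos(f * t + p) for a, f, p in zip(amps, freqs, phases))
        best = max(best, abs(s))
    return best

amps = [0.55, 0.55]   # each component is well below full scale (1.0)
freqs = [1, 2]        # illustrative harmonic numbers, not MDCT bins

aligned = peak_of_sum(amps, freqs, [0.0, 0.0])          # crests coincide at t = 0
shifted = peak_of_sum(amps, freqs, [0.0, math.pi / 2])  # same magnitudes, one phase moved

print(f"aligned peak: {aligned:.3f}")
print(f"shifted peak: {shifted:.3f}")
```

With these values the aligned case peaks at 1.1 (it would clip), while the shifted case stays a little under 1.0.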


I'm not sure I understood what you were saying - if I've got it completely wrong, please explain - it sounds interesting.

Cheers,
David.
http://www.David.Robinson.org/
Gabriel
The encoding process is: (T is time domain, F frequency domain)

wav(T)->mdct(F)->requantization(F)->mp3 bitstream(F)

Then decoding is:

mp3 bitstream(F)->(F)imdct(T)->scaling(T)

Our annoying clipping occurs in "scaling(T)", where data is transformed from a range of -1/+1 (roughly) up to -32768/+32767. The problem is that some values were outside the -1/+1 range, so they will "overflow" the targeted 16-bit range.
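As a minimal sketch of that "scaling(T)" step: the function below converts decoded floats to 16-bit integers and hard-clips whatever overflows. The function name `to_int16`, the scale factor of 32767 and the rounding are assumptions for illustration; real decoders differ in these details.

```python
def to_int16(samples):
    """Scale floats nominally in [-1.0, 1.0] to 16-bit integers.

    Values that overflow the 16-bit range are hard-clipped.
    Returns the integer samples and the number of clipped samples.
    """
    out = []
    clipped = 0
    for x in samples:
        v = int(round(x * 32767.0))  # assumed scale factor
        if v > 32767:
            v = 32767
            clipped += 1
        elif v < -32768:
            v = -32768
            clipped += 1
        out.append(v)
    return out, clipped

decoded = [0.5, 0.98861, 1.1, -1.05]  # example values; the last two exceed -1/+1
pcm, n_clipped = to_int16(decoded)
print(pcm, n_clipped)
```

The last two values land on the limits of the 16-bit range, which is the time-domain clipping being discussed.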

But why is this clipping occurring? It originates in the "requantization(F)" step. In this step the encoder decided something like:
"this 0.98861 data can be encoded with lower precision, let's encode it as 1.1" (of course numbers are just examples)

This was right for the encoder, because it judged that our ears won't be able to hear any difference.
But on decoding, it clips in "scaling(T)".

The encoder never intended this situation, because it was reasoning in the frequency domain.

So my idea is to add a step in decoding:
mp3 bitstream(F)->clipping(F)->(F)imdct(T)->scaling(T)

This way, we should avoid (or at least reduce a lot) clipping in the time domain.

The problematic part is:
QUOTE
For another, you can only predict a level within the frequency domain that is JUST below clipping in the time domain for a single frequency component. Once you start adding the frequency components together, it's impossible; you don't know if it will clip in the time domain until you transform it into the time domain!

I am not sure about this part. I thought that this was the case with FFT but not with MDCT.
DickD
QUOTE
The problematic part is:
QUOTE
For another, you can only predict a level within the frequency domain that is JUST below clipping in the time domain for a single frequency component. Once you start adding the frequency components together, it's impossible; you don't know if it will clip in the time domain until you transform it into the time domain!


I am not sure about this part. I thought that this was the case with FFT but not with MDCT.


I think the FFT makes it especially hard to predict the clipping levels because FFT of a block includes phase information (the complex part), allowing not just cosines (symmetric about time zero) but sines (antisymmetric about time zero) and all phases in between.

However, when you add a whole lot of incommensurate frequencies (i.e. not harmonics of each other) at various amplitudes, it's quite possible for the trough of one component cosine to line up with the peak of another component cosine, thus reducing the peak amplitude of the superposed signal.

If this happened in the original CD audio, but a particular frequency was removed as psychoacoustically inaudible (masked) or its phase was changed, it may be that at some point in the block, even if the amplitude of all component frequencies was perfectly preserved, the peaks of two components might now reinforce more strongly than in the original.

There's a theorem (I found it on the web, but can't remember the name) that states this effect, which initially seems paradoxical. By removing frequencies, you're removing energy, yet the peak can rise to a higher level?
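A tiny numerical illustration of that seeming paradox, using an example signal chosen here (it is not taken from the theorem mentioned above): cos(t) - cos(3t)/3 peaks at about 0.943, but once the third harmonic is removed the remaining cos(t) peaks at 1.0. Energy went down and the peak went up.

```python
import math

def peak_of_sum(amps, freqs, phases, n=4096):
    """Largest |sum(a * cos(f*t + p))| over one period, sampled at n points."""
    best = 0.0
    for i in range(n):
        t = 2.0 * math.pi * i / n
        s = sum(a * math.cos(f * t + p) for a, f, p in zip(amps, freqs, phases))
        best = max(best, abs(s))
    return best

# cos(t) - cos(3t)/3: the third harmonic flattens the crest of the fundamental.
with_third = peak_of_sum([1.0, 1.0 / 3.0], [1, 3], [0.0, math.pi])

# Drop the third harmonic (as if it had been judged inaudible): only cos(t) remains.
without_third = peak_of_sum([1.0], [1], [0.0])

print(f"peak with 3rd harmonic:    {with_third:.4f}")
print(f"peak without 3rd harmonic: {without_third:.4f}")
```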

I wouldn't be surprised if some of the abused mastering tools in use today to make pop CDs ever louder, actually made use of the inaudibility of phase information to reduce peak levels while requiring no clipping by juggling the phase of certain frequency components to minimise the peak levels before scaling to full-scale to maximise the loudness.

Regards,

Dick Darlington
Gabriel
Here is the answer from Frank Klemm about this idea:

QUOTE
It does not avoid any clipping, but it significantly degrades quality.

- You can make almost no prediction about the time domain from changes
 in the MDCT domain. Decreasing coefficients changes the time
 representation; predicting more than that is impossible.
- Rounding of coefficients is done by very sophisticated rules to preserve
 the perceived sound. Any change is very likely to decrease quality,
 because some properties are no longer fulfilled.
- Clipping is no problem. You can decrease the level of the signal by 6 dB
 before encoding without any problem, and the signal is still TOO LOUD.
- Clipping in the decoder occurs exactly when the original signal was
 heavily hard-clipped. The encoding/decoding process smooths this clipping
 a little bit, and what you see is this little "de-clipping". See the
 clipping page.


Seems that it would not work correctly...
NumLOCK
QUOTE
- You can make almost no prediction about the time domain from changes
in the MDCT domain. Decreasing coefficients changes the time
representation; predicting more than that is impossible.

Gabriel, he's right: it can't be calculated by a single "guess". But an iterative method similar to the mp3 rate-distortion loop (if I'm not mistaken) could be useful for the purpose (i.e. switching freq_domain -> time_domain several times, to measure the concrete effect of MDCT coefficient changes).
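A very rough sketch of what such a trial-and-error loop could look like. Everything here is hypothetical: the direct-form `imdct` ignores windowing and overlap-add, `tame_peak` and its "shrink the largest coefficient by 10%" rule are invented for illustration, and nothing is claimed about how the result would sound.

```python
import math

def imdct(coeffs):
    """Direct-form inverse MDCT: N coefficients -> 2N samples (no window, no overlap-add)."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for i in range(2 * n)
    ]

def tame_peak(coeffs, limit, step=0.9, max_iter=1000):
    """Shrink the largest coefficient until the time-domain peak is within limit.

    Returns the adjusted coefficients and the number of adjustments made.
    """
    coeffs = list(coeffs)
    for iteration in range(max_iter):
        peak = max(abs(s) for s in imdct(coeffs))   # freq_domain -> time_domain
        if peak <= limit:
            return coeffs, iteration
        k = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
        coeffs[k] *= step                            # hypothetical adjustment rule
    raise RuntimeError("peak still above limit after max_iter passes")

original = [6.0, -1.5, 0.75, 2.0, -0.5, 0.25, 1.0, -2.0]  # made-up block
start_peak = max(abs(s) for s in imdct(original))
limit = 0.8 * start_peak            # pretend the block overshoots full scale by 25%

adjusted, passes = tame_peak(original, limit)
print("passes:", passes)
print("per-coefficient gain:", [round(a / o, 3) for a, o in zip(adjusted, original)])
```

Note that the per-coefficient gains come out unequal, so this kind of loop changes the spectral balance of the block rather than just its level.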

QUOTE
- Rounding of coefficients is done by very sophisticated rules to preserve
the perceived sound. Any change is very likely to decrease quality,
because some properties are no longer fulfilled.

... which means that freq-domain clipping should be done *before* psychoacoustics processing.

QUOTE
- Clipping is no problem. You can decrease the level of the signal by 6 dB
before encoding without any problem, and the signal is still TOO LOUD.
- Clipping in the decoder occurs exactly when the original signal was
heavily hard-clipped. The encoding/decoding process smooths this clipping
a little bit, and what you see is this little "de-clipping". See the
clipping page.

Indeed. Really nice page btw, an interesting read.
2Bdecided
Thank you Gabriel - I understand clearly what you were saying now.

I don't think it would work.

Also, if it works, there's no point:

We can't know what to change unless we adopt an iterative procedure, as suggested by NumLOCK. And, if we only change SOME of the MDCT coefficients, then we're effectively filtering the signal (either EQing, or, if we're really clever and lucky, just phase shifting). Surely we don't want to filter the strongest (probably) spectral components of the signal just to stop it clipping? So, the only solution is to reduce them all in level by the same amount. But scaling all the MDCT coefficients by the same amount in the frequency domain is just the same as scaling all the samples (in the time domain). There's no real advantage or disadvantage to either method - except that, if the scaling is done on the time domain signal while decoding, at least you know HOW MUCH you have to scale it to prevent clipping, without iteration or guessing.
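The equivalence follows from the inverse transform being linear, and can be checked numerically. The sketch below uses a direct-form inverse MDCT on a single made-up block; windowing and overlap-add are left out (they are linear too), and the coefficient values are arbitrary.

```python
import math

def imdct(coeffs):
    """Direct-form inverse MDCT: N coefficients -> 2N samples (no window, no overlap-add)."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for i in range(2 * n)
    ]

coeffs = [3.0, -1.5, 0.75, 2.0, -0.5, 0.25, 1.0, -2.0]  # made-up block
block = imdct(coeffs)

# The gain needed to bring the peak to full scale is only known
# after looking at the time-domain samples.
peak = max(abs(s) for s in block)
gain = 1.0 / peak

scaled_in_f = imdct([gain * c for c in coeffs])  # scale coefficients, then transform
scaled_in_t = [gain * s for s in block]          # transform, then scale samples

max_diff = max(abs(a - b) for a, b in zip(scaled_in_f, scaled_in_t))
print(f"largest difference between the two routes: {max_diff:.2e}")
```

Both routes give the same samples up to floating-point rounding, but only the time-domain route tells you the gain in a single pass.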

Which is a pity, because it did look like a nice idea!

I think, if you're going to do something sensible in the encoder, then the encoder should automatically unclip its own files - but this job probably belongs in an external utility, because it can only be done after the whole file has been encoded.

Cheers,
David.
P.S. I don't fully appreciate the difference between FFT and MDCT. I know the equations (well, not off the top of my head!), but intuitively I don't realise the difference. So the above isn't 100% certain - but it seems to match what Frank said, so it's probably right!