Full Version: Re-Encoding quality loss
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - Tech
miwalter
Hi,

I wondered about the following:

If I encode using LAME at 320 kbps (or some other setting), decode the encoded file to WAV and then re-encode it, why do I lose quality?

As I understand MP3 compression, it "just" cuts the not-so-audible information from the original signal. So with the supposedly "best audio quality" setting, re-encoding this signal should leave nothing more to cut, shouldn't it?

Thanks for replies.

Mirko
Sawg
MP3 is not perfect. Audible artifacts (even at 320 kbps) get introduced into the audio. Upon re-encoding, the encoder cannot distinguish these artifacts from normal music. When transcoding at 320 kbps, the extra loss is usually insignificant or totally transparent to the average listener. But it still happens, even at higher bitrates.

It's like making a Xerox copy of a copy. Each time the document gets worse, though if the copier is set to its highest quality setting the loss is minimal.


Or something like that.
kjempen
Encoders don't just cut the not-so-audible sound; they also add "noise" (artifacts).

Now think about how one encoding adds "noise" and cuts out some sound, and then you decode it. After decoding, it still has the same quality as the encoded audio, i.e. it differs from the original because the codec is not lossless. Encode it a second time, and you remove even more audio and add even more "noise".

The result should be pretty obvious...


Imagine what it would be like if those newbies were right: improve audio quality by, for example, re-encoding a 128 kbps MP3 to a 192 kbps MP3! :D
miwalter
If I understand the information right, MP3 encoding adds noise.
Is this noise a side effect of the encoding, or is it errors introduced by mathematical rounding (I don't know if that's the correct English expression; I mean something like cutting 3.141562 down to 3.1416)?

I'm insisting because most theoretical articles I read concluded that lossy compression just cuts off (inaudible) pieces. But there was no clue (or maybe there was, but I can't remember) why noise would be added...

Is it theoretically possible to do lossy encoding that only cuts something, without adding anything?

Ok. I _can_ hear that re-encoded audio is losing quality. But I wondered why.

Thanks,
Mirko
SometimesWarrior
QUOTE
Originally posted by miwalter
If I understand the information right, MP3 encoding adds noise.
Is this noise a side effect of the encoding, or is it errors introduced by mathematical rounding (I don't know if that's the correct English expression; I mean something like cutting 3.141562 down to 3.1416)?

I'm insisting because most theoretical articles I read concluded that lossy compression just cuts off (inaudible) pieces. But there was no clue (or maybe there was, but I can't remember) why noise would be added...

Is it theoretically possible to do lossy encoding that only cuts something, without adding anything?

I'm still learning this stuff myself, but I'll try to explain what I think I know ;)

MP3 encoding doesn't just add noise due to mathematical rounding errors; there is also something called quantization noise that is added to the sound. When a lossy encoder "throws out" sound which it thinks is inaudible or least important, noise takes its place. (At least, that is how I understand it... a more knowledgeable person might have to correct me on that.)

Here is a quote by the designer of MPC:

QUOTE
Originally posted by Buschel (from this thread)
the problem is the following:

1st encoding: the encoder will add quantization noise up to an amount that is normally imperceptible and will cut frequencies that won't be perceived.

next encodings: the signal has become noisier and contains less energy at higher frequencies. both will raise the allowed masking threshold -> even more noise will be added. even if the encoder will calculate exactly the same masking thresholds the result after encoding will be noisier...

it may be possible to put up a psychoacoustic model which won't be disturbed by preencoded noise but this will lead to higher bitrates for 1st time encoding... as most users will encode original wavs i see no need to achieve such behaviour.
I hope this quote helps explain things a bit.
Garf
QUOTE
Originally posted by SometimesWarrior
I'm still learning this stuff myself, but I'll try to explain what I think I know ;)

MP3 encoding doesn't just add noise due to mathematical rounding errors; there is also something called quantization noise that is added to the sound.


Same thing - quantization is rounding (more or less)

--
GCP
Garf
QUOTE
Originally posted by miwalter

I'm insisting because most theoretical articles I read concluded that lossy compression just cuts off (inaudible) pieces. But there was no clue (or maybe there was, but I can't remember) why noise would be added...


You have the wrong picture of what happens. The 'cutting out' of the inaudible pieces consists of rounding (quantizing) the data that represents them. There's no actual 'deleting' or anything of the sort taking place. Quantizing introduces noise, but it's up to the encoder to figure out how much of this noise it can introduce and still keep it inaudible.

The noise being added is a direct effect of the inaudible pieces being 'cut out'.
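
A minimal sketch of that point, assuming a plain uniform quantizer. Real MP3 quantization is non-uniform and works on frequency-domain coefficients, so the `quantize` helper, the step size and the sample values here are purely illustrative. Rounding to a coarse grid is the "cutting", and the noise is simply whatever the rounding changed:

```python
def quantize(value, step):
    """Round a value to the nearest multiple of `step` (illustrative uniform quantizer)."""
    return round(value / step) * step

samples = [0.12, 0.47, -0.33, 0.81]   # made-up sample values
step = 0.25                           # coarse grid: fewer levels, fewer bits

quantized = [quantize(s, step) for s in samples]
# Nobody adds the noise on purpose: it is the difference the rounding leaves behind.
noise = [q - s for q, s in zip(quantized, samples)]

print(quantized)   # [0.0, 0.5, -0.25, 0.75]
print(noise)
```

Each noise value is at most half a step in size, so a finer grid (more bits) means less noise and a coarser grid (fewer bits) means more.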

--
GCP
Garf
QUOTE
Originally posted by miwalter
So, using the supposed-to-be "best audio quality" setting, when I re-encode this signal it just has nothing to cut, has it?


The problem is that you're moving the signal through a series of different representations (WAV->MP3->WAV->MP3), all of which can't exactly store the result of the previous one, so you have loss there as well.
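
A toy illustration of that chain, under loud assumptions: each "encode/decode" is modelled as snapping integer samples to a grid, and the grid sizes 7 and 5 and the sample values are invented for the example. It is not how an MP3 encoder works internally; it only shows why a second pass is harmless when it lands on exactly the same grid and lossy again when it does not:

```python
def quantize(samples, step):
    """Snap every sample to the nearest multiple of `step` (stand-in for one lossy pass)."""
    return [step * round(s / step) for s in samples]

def total_error(original, processed):
    """Sum of absolute differences from the original samples."""
    return sum(abs(a - b) for a, b in zip(original, processed))

original = [37, -58, 12, 81]          # made-up integer PCM samples

gen1 = quantize(original, 7)          # first pass, grid of 7
same_grid = quantize(gen1, 7)         # second pass on the identical grid
other_grid = quantize(gen1, 5)        # second pass on a different grid

print(gen1)         # [35, -56, 14, 84]
print(same_grid)    # [35, -56, 14, 84]  (nothing left to round)
print(other_grid)   # [35, -55, 15, 85]
print(total_error(original, gen1), total_error(original, other_grid))   # 9 12
```

In a real re-encode the second pass practically never sees the same grid: the decoded output has been rounded back to PCM, and the encoder's decisions are made on a signal that already contains the first pass's noise. Individual samples can land closer to the original by luck, but nothing pulls them back towards it.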

--
GCP
miwalter
Quantizing means "collect nearly identical pieces of information into one and fill the new 'space' with some random noise"?

Well. I guess I don't understand the _why_ behind this. Why not leave the 'space' empty? The player could just add normal background noise (I don't know if this is necessary, but I guess it is, like in mobile phone transmissions).

Maybe I understood something completely wrong:
I have a waveform audio signal. Some frequencies are known to be inaudible, so they are cut off (= compression). The rest could be saved into the encoded file and voilà: you have a compressed audio signal without noise. If you decode this signal you should get the "cut" waveform (why noise?). If you then compress again at the same compression level (= kbps or parameters), the encoder simply notices that there are no more inaudible frequencies and encodes to the same signal as before.
Or are compressed audio signals more than just cut-down waveforms? Maybe there are some other "optimizations"? I know about joint stereo and so on, but those things only merge duplicated data into one (so I guess).

Is it in theory possible to compress only the way I described or are there some technical reasons I don't know (yet)?
spoon
Has anyone done any tests on this - I would like to see the same file, say at 160Kbps - converted from and to a wave file a number of times, possibly do a listening test on it. However to keep the results fair, the listeners would not know anything about the quantities of conversions done, so you might have:

Original
10 conversions
2 conversions
5 conversions
25 conversions
7 conversions
18 conversions
Garf
QUOTE
Originally posted by miwalter
Quantizing means "collect nearly identical pieces of information into one and fill the new 'space' with some random noise"?


No, the noise happens _because_ of the quantization, nobody is adding it on purpose :)

--
GCP
Garf
QUOTE
Originally posted by miwalter

I have a waveform audio signal. Some frequencies are known to be inaudible, so they are cut off (= compression). The rest could be saved into the encoded file and voilà: you have a compressed audio signal without noise.


You can't store them perfectly. Any imperfect storage implies some quantization. And then you've got a compressed audio signal with noise :)

--
GCP
miwalter
QUOTE
Originally posted by Garf


You can't store them perfectly. Any imperfect storage implies some quantization. And then you've got a compressed audio signal with noise :)
GCP


Ahm. Ok. Mirko's action:

Why can't I store them perfectly? I thought that every audio signal can be described as a sine/cosine function. If that is so, only the player may have difficulties decoding the signal "perfectly", because to some degree you have to round this function. I think for pictures there already is a format that compresses without those "imperfect storage" problems (LuraWave) and that does exactly that (functions).

Then about the noise. Is it possible to explain why noise is an automatic effect of quantization? I thought quantization is "collecting" nearly identical information into one. So the one is "shorter" than the original signal. I understand that there has to be some "filling" for this shortened information. But why automatically noise?

I understand the basics of audio-signals (I know some things about GSM-modulation so it should be possible for me to understand the "whys").

Thanks a lot. Now I'm really into it ;-)
Garf
QUOTE
Originally posted by miwalter

Ahm. Ok. Mirko's action:

Why can't I store them perfectly? I thought that every audio signal can be described as a sine/cosine function.


...can be described as an _infinite_ series of ...

QUOTE

Then about the noise. Is it possible to explain why noise is an automatic effect of quantization? I thought quantization is "collecting" nearly identical information into one. So the one is "shorter" than the original signal. I understand that there has to be some "filling" for this shortened information. But why automatically noise?


What else would it be? If the leftover signal had some structure, your compression/quantization would be suboptimal.

--
GCP
Sunhillow
I'll try to explain this another way (sorry for somehow strange english)

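- the frequency spectrum is divided into 32 subbands
- the encoder calculates if any of these subbands is masked by its neighbours. If yes, it will not be encoded
- within each subband the encoder calculates the allowed noise level which would be masked by the music content
- the difference between signal and allowed noise is the necessary encoding accuracy for this subband. Mostly this is only a few bits, so there will be lots of quantisation noise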

You see, some parts of the input signal might really be left out, but additionally there will be very much quantization noise in the subbands which are kept to be encoded. And in "normal" music, most subbands will be encoded.

If you re-encode such a signal, the encoder cannot know which parts belong to the former input signal and which are just encoding artifacts from the first encoding step.

Re-encoding only makes sense if you have to convert high quality MP3s to a considerably lower quality (and bitrate).

Hope this helps
miwalter
QUOTE
Originally posted by Garf

...can be described as an _infinite_ series of ...
GCP


Oha. That is news. But... oh well. I guess I have to read some more, first. My main problem is, why do I have to have an infinite series of those functions. When it is possible with picture-data, in the "correct" display you could see the picture as a waveform, why not with audio?

QUOTE
Originally posted by Garf

What else would it be? If the leftover signal had some structure, your compression/quantization would be suboptimal.
GCP


Else? Well. Nothing. That would be optimal compression, wouldn't it? Once I have compressed/quantized the signal, there would be no "rest" to produce noise. If I leave some signal over, why is that needed?


QUOTE
Originally posted by Sunhillow

the frequency spectrum is divided into 32 subbands
the encoder calculates, if any of these subbands is masked by its neighbours. If yes, it will not be encoded


Masked means "the previous signal overlaps the next one, so the next one is not audible"? Ok. I understand that.

QUOTE
Originally posted by Sunhillow

within each subband the encoder calculates the allowed noise level which would be masked by the music content


Wow. That's fast. Why is there noise left? Is this masking process not complete (e.g. so some information is needed, not everything is overlapping)? Then I understand, why there is noise left (the signal, that is left by "subtracting" the overlapped signal). I could call this "noise" the rest of the original signal, couldn't I?

QUOTE
Originally posted by Sunhillow

the difference between signal and allowed noise is the necessary encoding accuracy for this subband. Mostly this is only a few bits, so there will be lots of quantisation noise


"Only a few bits" - "so there will be lots of quantization noise". This doesn't match, does it?

QUOTE
Originally posted by Sunhillow

If you re-encode such a signal, the encoder cannot know which parts belong to the former input signal and which are just encoding artifacts from the first encoding step.


Ahaaaaaaa. So. Encoding is not perfect. There has to be another way to reduce audio data without those "side effects". Why not cut the signal down to a level that a human ear can hear? Ok. That could be philosophical.

Thanks a lot.

Mirko
whathe
I would also appreciate a brief technical explanation of why “quantization noise” is an unavoidable artifact.

Since encoding involves an algorithm transforming a digitally recorded signal, the statement, “You can’t store them perfectly. Any imperfect storage implies some quantization,” is perplexing. :P

Also, what percentage of a typical encoded signal is quantization noise? It must be insignificant because encoder designers have apparently not tried to minimize it, only to keep it below an audible threshold on the first encoding. On the other hand, if the quantization noise introduced can be described as “artifacts…indistinguishable from normal music” for the encoder and it produces audible quality deterioration with one re-encode, then maybe it’s not so insignificant in terms of %bits.
Garf
QUOTE
Originally posted by miwalter

Oha. _That_ is news. But... oh well. I guess I have to read some more, first. My main problem is, why do I have to have an infinite series of those functions.


That is maths...

QUOTE

When it is possible with picture-data, in the "correct" display you could see the picture as a waveform, why not with audio?


I'm not sure I understand your question here.

QUOTE

Else? Well. Nothing. That would be optimal compression, wouldn't it? Once I have compressed/quantized the signal, there would be no "rest" to produce noise.


If you have perfect (i.e. lossless) compression/quantization, then yes, there is no leftover and no noise. But you won't get good compression this way :)

QUOTE

If I left some signal, why is that needed?


I don't understand what you mean.

QUOTE

Is this masking process not complete (e.g. so some information is needed, not everything is overlapping)? Then I understand, why there is noise left (the signal, that is left by "subtracting" the overlapped signal). I could call this "noise" the rest of the original signal, couldn't I?


I think you got the point here :)

The signal that's reconstructed from the quantized version isn't exactly the same as the original - the differences are the noise.

QUOTE

"Only a few bits" - "so there will be lots of quantization noise". This doesn't match, does it?


It's perfectly right. If the signal is stored in very few bits, that means it was quantized heavily, so there will be a lot of quantization noise.

QUOTE

Ahaaaaaaa. So. Encoding is not perfect. There has to be another way to reduce audio data without those "side effects".


Well, yes, lossless coding. But that won't give you good compression. The discussion was about MP3.

--
GCP
Garf
QUOTE
Originally posted by whathe
I would also appreciate a brief technical explanation of why “quantization noise” is an unavoidable artifact.

Since encoding involves an algorithm transforming a digitally recorded signal, the statement, “You can’t store them perfectly. Any imperfect storage implies some quantization,” is perplexing. :P


The problem was that he was talking about cutting out frequencies. A digital signal doesn't come as a set of frequencies - it comes as a PCM stream of bits. (i.e. it's in an amplitude/time representation rather than a frequency/time representation). If you want to work on the frequencies you need to transform the signal. Doing this transformation, cutting out an _exact_ set of frequencies (and how are you going to determine them with 100% accuracy?) and perfectly storing back the result is practically (almost?) impossible.
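
To make the two representations concrete, here is a small sketch. It uses a plain textbook DFT as a stand-in for the transform step; an actual MP3 encoder uses a polyphase filterbank followed by an MDCT, and the function name, block length and test tones below are assumptions made up for the example:

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive DFT: turn amplitude/time samples into per-frequency magnitudes."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

N = 64
# PCM-like input: a list of amplitudes over time, nothing labelled as a "frequency".
pcm = [math.sin(2 * math.pi * 5 * t / N) + 0.25 * math.sin(2 * math.pi * 12 * t / N)
       for t in range(N)]

mags = dft_magnitudes(pcm)
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(strongest)   # [5, 12]
```

The two tones only show up this cleanly because they fit the 64-sample block exactly. Real music does not, so its energy smears across many bins, which is part of why "cutting out an exact set of frequencies" is not a clean operation.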

Another problem is that the number of frequencies that disappear totally will be very small, so you'll have almost no compression.

QUOTE

Also, what percentage of a typical encoded signal is quantization noise?  It must be insignificant because encoder designers have apparently not tried to minimize it, only to keep it below an audible threshold on the first encoding.


The codec designers do everything they can to minimize it. The problem is just: to get more compression, you want more quantization. Hence bigger errors. As long as you keep those errors below the threshold of hearing, you're ok.

--
GCP
Sunhillow
QUOTE
Wow. That's fast. Why is there noise left? Is this masking process not complete (e.g. so some information is needed, not everything is overlapping)? Then I understand, why there is noise left (the signal, that is left by "subtracting" the overlapped signal). I could call this "noise" the rest of the original signal, couldn't I?


there is no noise "left". the quantisation is so inaccurate that noise will be added.

If you have an accuracy of 4 bits, which is not so uncommon in data reduction, only 2^4=16 different values of the signal form may be encoded. The difference to the original signal is called "quantisation noise"
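
A rough sketch of the 4-bit case, assuming a simple uniform quantizer applied directly to a full-scale test tone (a real encoder quantizes subband or transform coefficients, and the helper names here are invented for the illustration). It counts how many distinct values survive and measures the signal-to-noise ratio:

```python
import math

def quantize_to_bits(samples, bits):
    """Map samples in [-1, 1] onto 2**bits levels and reconstruct them."""
    levels = 2 ** bits
    out = []
    for s in samples:
        code = min(levels - 1, int((s + 1.0) / 2.0 * levels))   # integer code 0..levels-1
        out.append((code + 0.5) / levels * 2.0 - 1.0)           # centre of that level
    return out

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB, where noise = original minus reconstruction."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 10.0 * math.log10(signal_power / noise_power)

tone = [math.sin(2 * math.pi * 3 * n / 1000) for n in range(1000)]

for bits in (4, 8, 16):
    decoded = quantize_to_bits(tone, bits)
    print(bits, "bits:", len(set(decoded)), "distinct values,",
          round(snr_db(tone, decoded), 1), "dB SNR")
```

With 4 bits only 16 distinct values come out, and every extra bit buys roughly 6 dB of signal-to-noise ratio. The encoder's job is to spend just enough bits in each subband that the remaining noise stays masked.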
JohnV
QUOTE
Originally posted by whathe
I would also appreciate a brief technical explanation of why “quantization noise” is an unavoidable artifact.
It's not an "unavoidable artifact". Encoders try to quantize so that the difference (quantization noise) is as inaudible as possible. When we are talking about artifacts, distortion or just 14/16 ABX results, we are talking about audible quantization noise. If the sample is transparent, then the encoder succeeded in hiding the effect of the quantization (= quantization noise).

However, if we re-encode even this supposedly transparent, let's say 320 kbps, MP3 file with the best possible settings, it again goes through a few lossy phases and an imperfect psychoacoustic model, and when it is finally quantized again it moves further away from the original (more quantization noise than in the first encoding). At some point, maybe even with the first re-encoding round, the quantization noise may become audible.
IveyLeaguer
A very interesting discussion.

If you deal in audio enough, you will sooner or later confront the noise factor, or, should I say, it will confront you.

Whether you are dealing with a retail CD in which the quality has been compromised because it is many 'levels' removed from the Master, reproducing, remixing or remastering a recording, or, as is the case here, compression, there is no escaping the 'quantization error buildup' factor, as my friends here have already pointed out. Mathematically, it's simple enough really - take 3.6 vs. 3.6666666666666666......., as an extreme example - sooner or later you have to round off and the result is going to be less accurate than the original number - resulting, for our purposes, in noise added to the file, as has already been discussed.


I remix and remaster recordings from never compressed (with few exceptions) Wave files with what is probably the finest noise reduction software on the planet - under $50,000 anyway. Even in this format, there is no escaping the quantization errors that inevitably occur when altering a file. For example, if I have a file that needs 1) noise reduction 2) equalization, and say, 3) a high quality Tube Amp applied, and then normalization, that's 4 steps. Each of these steps naturally adds quantization noise. Fortunately, I can apply these functions simultaneously in one action, as opposed to sequentially, effectively eliminating the inevitable quantization errors. Though the sequential results are outstanding, far better than the original CD source file, there is still a slight, and sometimes audible difference (to a trained ear, on a good day) in the result. That is because the special filter that allows the simultaneous processing maintains the highest possible resolution for the signal throughout the process, from the first filter to the fourth, thus eliminating any need for file conversion and the resulting buildup of quantization errors.
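
The sequential-versus-simultaneous point can be sketched with exact arithmetic. Everything in this example is an assumption made for illustration: the four "processing steps" are reduced to simple gain changes, the gain values are invented, and the samples are tiny integers hand-picked so the effect is visible (on many other inputs both paths give the same result):

```python
from fractions import Fraction

# Four hypothetical processing steps, each reduced to a gain change.
GAINS = [Fraction(7, 10), Fraction(13, 10), Fraction(9, 10), Fraction(12, 10)]

def sequential(sample, gains):
    """Apply each step and round back to an integer sample after every one."""
    for g in gains:
        sample = round(sample * g)
    return sample

def single_pass(sample, gains):
    """Combine all steps at full precision and round only once at the end."""
    total = Fraction(1)
    for g in gains:
        total *= g
    return round(sample * total)

samples = [2, 3, 4, 9, 12]
print([sequential(s, GAINS) for s in samples])    # [1, 4, 5, 8, 11]
print([single_pass(s, GAINS) for s in samples])   # [2, 3, 4, 9, 12]
```

The combined gain is 0.9828, so the exact results are all within 0.21 of the original values. The single-pass path lands on the nearest integer each time, while the sequential path drifts by a whole step because every intermediate rounding feeds into the next stage.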

Anyway, when it comes to lossy compression, these factors are compounded, as you are now in the business of selecting audio to be removed, further magnifying the quantization issues and adding even more noise to the equation - one of many reasons there are so many lousy-sounding MP3s. What's really amazing to me, considering the inherent noise factors, is that the guys around here have been able to develop and achieve (given the right software and settings, of course) 70 - 90% compression resulting in a small file that is, for all practical purposes, both transparent in quality and noise free.

Personally, I would avoid transcoding if at all possible. You are apparently interested in quality audio, have asked some excellent questions, and received the usual expert responses. I don't know which sources you referred to, but when it comes to audio compression, I would highly recommend that you follow, as I have, the advice on this board.

Happy Listening!
miwalter
QUOTE
Originally posted by Sunhillow
 

there is no noise "left". the quantisation is so inaccurate that noise will be added.

If you have an accuracy of 4 bits, which is not so uncommon in data reduction, only 2^4=16 different values of the signal form may be encoded. The difference to the original signal is called "quantisation noise"


I think I didn't understand this. I only have 4 bits to encode the whole spectrum? No way.

QUOTE
Originally posted by Garf
 
When it is possible with picture-data, in the "correct" display you could see the picture as a waveform, why not with audio?
I'm not sure I understand your question here.


Data can be represented by waves, by tables or whatever. And the wavelet-style compression of picture data (LuraWave) can describe the picture information with mathematical functions. They can match so exactly that you may resize the picture (nearly) without quality loss.
I thought: why not do this with audio data? It's not magic data, it's just some frequencies and durations(?).

QUOTE
Originally posted by Garf
 
It's perfectly right. If the signal is stored in very few bits, that means it was quantized heavily, so there will be a lot of quantization noise.


Sunhillow said the same. I'm not sure why that has to be. Ok. If you want the smallest possible file, you have to use very few bits to represent the data. But when the goal is maximum quality for _audible_ audio data (cut down frequencies, cut overlapping parts and so on), there should be no reason for forcing the data into a few bits.

Very enriching discussion.
Sunhillow
QUOTE
I think I didn't understand this. I only have 4 bits to encode the whole spectrum? No way.

4 bits to encode one single sample within one subband. This example is very bad, because there are 32 subbands, and even if only half of them are used, there is less than 4 bits per subband to encode one single sample of the audio signal.

<edit>
QUOTE
QUOTE
Originally posted by Garf

It's perfectly right. If the signal is stored in very few bits, that means it was quantized heavily, so there will be a lot of quantization noise.


Sunhillow said the same. I'm not sure why that has to be. Ok. If you want the smallest possible file, you have to use very few bits to represent the data. But when the goal is maximum quality for _audible_ audio data (cut down frequencies, cut overlapping parts and so on), there should be no reason for forcing the data into a few bits.


You must understand, every deviation from the original waveform may be called "noise". If you encode with few bits, the deviation will be quite big for most samples.

You insist on maximum quality for audible audio data. This is the keyword. The encoder tries to use as many bits as necessary to introduce only inaudible noise. If you always want a resolution of 16 bits, lossless compression is the way to go for you.
</edit>
miwalter
QUOTE
Originally posted by Sunhillow
 
You must understand, every deviation from the original waveform may be called "noise". If you encode with few bits, the deviation will be quite big for most samples.

You insist on maximum quality for audible audio data. This is the keyword. The encoder tries to use as many bits as necessary to introduce only inaudible noise. If you always want a resolution of 16 bits, lossless compression is the way to go for you.


Someone in this discussion said that a waveform may be 1. cut down to the frequencies humans can hear and 2. cleared of overlapping (= inaudible) signals.
So. The signal that is left after this processing has to be encoded.
Why should I have to encode this using fewer than 16 bits (if that's the indicator for optimum quality)?

Don't get me wrong. I really try to understand this matter and I'm reading other sources as well. But I'm not yet satisfied with the information.

I know that I may use lossless compression. But why should I, when my ears only get some of the signals that are technically possible to encode (or to represent on a CD, e.g.)?
KikeG
QUOTE
Originally posted by miwalter

Why should I have to encode this using fewer than 16 bits (if that's the indicator for optimum quality)?


In order to have some data reduction. If you used the whole 16 bits, you would save no space, and the previous operations would make no sense.

The point is to throw away as much information as possible, and then encode what is left using the fewest possible bits. I mean, the utility of throwing away "parts" of the signal is to be able to encode what is left using fewer bits.

Optimum quality is 16 bits, but then no compression is possible. The utility of compression is to save space, lowering the objective quality of the signal while maintaining the subjective quality of the signal as much as possible.
Sunhillow
QUOTE
Why should I have to encode this using fewer than 16bits (if that's the indicator for optimum quality)?


you may encode this using 24 bits if you want to. But the purpose of data reduction is to use as few bits as possible - and as many as necessary.

If you want to cut out some frequency ranges, try to eliminate some silent harmonics, and afterwards insist on encoding this in full 16 bit resolution, I don't see the sense.
Encoding a thinned out musical signal with Monkey surely will result in a somewhat smaller file than without thinning, but the difference won't be worth the effort I think.
miwalter
QUOTE
Originally posted by KikeG

In order to have some data reduction. If you used the whole 16 bits, you would save no space, and the previous operations would make no sense.


Well. You mean that the source signal has to be handled in such a way that it can be written with fewer bits. I understand. But when I cut the "input" signal with the methods described above, I have already reduced the data and may write the rest with 16 bits. Because of the methods (cutting, overlapping) there should be fewer "16-bit pieces" than in the source signal.
Maybe I have to transfer the original wave format into some other representation, but in theory there should be data reduction without losing necessary information (= producing noise).

QUOTE
Originally posted by KikeG

The point is to throw away as much information as possible, and then encode what is left using the fewest possible bits. I mean, the utility of throwing away "parts" of the signal is to be able to encode what is left using fewer bits.


Yes. So when I have 1,000 16-bit pieces in the source signal, then I could have 800 in the destination one. There is reduction without losing information in important pieces, isn't there?

QUOTE
Originally posted by KikeG

Optimum quality is 16 bits, but then, no compression is possible.


Ok. If I transfer the original 1,000 16-bit pieces into 1,000 16-bit destination pieces there is no compression. But when I reduce the data to 800 there should be.

Thank you very much.
miwalter
QUOTE
Originally posted by Sunhillow
 
If you want to cut out some frequency ranges, try to eliminate some silent harmonics, and afterwards insist on encoding this in full 16 bit resolution, I don't see the sense.


Seen this way there is no sense, I agree. In the other posting I tried to explain better, what I mean.

Thanks.
KikeG
QUOTE
Originally posted by miwalter

Yes. So when I have 1,000 16-bit pieces in the source signal, then I could have 800 in the destination one. There is reduction without losing information in important pieces, isn't there?


Then you would have an average of 12.8 bits per original "piece" or sample. (Edit: no good analogy...)

It is more effective to "cut" things in every piece than to reduce the number of pieces. I'd say that simply "throwing away" pieces is generally not possible if you want to achieve a high data reduction and also retain subjective quality and avoid obvious artifacts.

If you reduce the amount of data, in the end that unavoidably translates into a bit reduction. The trick is to do the highest data reduction in the way that achieves the best quality, and one way of achieving this is using subband filtering and encoding.
Sunhillow
QUOTE
Originally posted by miwalter


Seen this way there is no sense, I agree. In the other posting I tried to explain better, what I mean.

Thanks.


I think I did understand what you mean, and I still think this makes no sense. Using full resolution audio data you will never get better results than LPAC or Monkey's Audio.
This is like blurring a photograph and saving it in RLE compressed, lossless format. The improvement in filesize will be marginal.
whathe
So quantization noise is unavoidable in the sense that greater lossy compression is, for a given psychoacoustic model, only achieved at the expense of increased quantization noise (?)--basic data reduction theory I gather.

I assume that quantization noise is well defined theoretically and is calculated in the same way in any lossy compression strategy. So do the various competing lossy compression strategies differ only in their choice of data pre-processing? (For example, musepack is distinguished by its use of more sub bands, if I recall correctly.)

Also, perhaps an interesting, very basic feature of lossy compression strategies is revealed by the failure of the photocopier analogy I have seen frequently used here. The photocopier analogy is usually given here as an example of iterative data loss. In the context, though, of this discussion (quantization noise) perhaps a better analogy would be a “dirty” photocopier that adds noise to the information remaining. Even so, if each copy were “99 percent transparent”, one would expect the 4th copy to be nearly indistinguishable from the 3rd, but easily distinguishable from the 1st copy or original, etc. The failure of this analogy would be due to the global and random nature of the information loss of the dirty photocopier vs. data compression. Perhaps this analogy could be modified to a silly extreme so as to work.

Similarly, a naive prediction would be that a good lossy encoder should produce adjacent re-encodes that are 99 percent transparent to each other, which is not the case at all. So, would you explain to the naïve person that the encoder couldn’t almost transparently encode any sound sample that contains a lot of quantization noise? If noise is the limiting factor then any encoder would give equally bad results on re-encoding, but if not, would a different encoder using different codec strategies reduce the amplification of artifacts on re-encode? That is, could an ogg encoded sample sound better than an mpc encode if the original sample turns out to be a decoded mpc?

Anyway, thanks to the experts for their previous concise and informative responses.
miwalter
QUOTE
Originally posted by Sunhillow

I think I did understand what you mean, and I still think this makes no sense. Using full resolution audio data you will never get better results than LPAC or Monkey's Audio.
This is like blurring a photograph and saving it in RLE compressed, lossless format. The improvement in filesize will be marginal.


Sorry. Maybe it is my bad english. I'll try once more.

Say I have 1,000 input pieces of 16 bits each (that is, values ranging from 0 to 2^16 - 1 = 65535). The encoder analyses them and concludes that every 5th one contains no data ("0000000000000000" in bits), so it cuts them off and adds a few pointers/notes to the destination (= encoded) file to tell the player.
So it has 800 pieces left. Now it analyses overlapping and sees that about a quarter of these definitely contain things a human ear cannot hear. Here the encoder cuts out those signals. I understand that in _this_ particular step some things are "left over" from the cleaning (because of rounding errors, or because the signal is not as identifiable as thought *g*), and this may be where what you would call noise occurs, I guess.
All the time the pieces were 16 bit and they stay so; even the ones with "noise" will have 16-bit quality (OK, but may sound weird).

Is this it (simplified)?
miwalter
QUOTE
Originally posted by KikeG

Then you would have an average of 12.8 bits per original "piece" or sample.

It is more effective to "cut" things in every piece than to reduce the number of pieces. I'd say that simply "throwing away" pieces is generally not possible if you want to achieve a high data reduction and also retain subjective quality and avoid obvious artifacts.



Gracias. So my picture of just "cutting" out some things is maybe more theory than practice.

QUOTE
Originally posted by KikeG

If you reduce the amount of data, in the end that unavoidably translates into a bit reduction. The trick is to achieve the highest data reduction in a way that keeps the best quality, and one way of achieving this is using subband filtering and encoding.


This is questionable in my logic. Please take a look at my other posting. Maybe you could see my point and where my error is.
KikeG
QUOTE
Originally posted by miwalter

This is questionable in my logic. Please take a look at my other posting. Maybe you could see my point and where my error is.


Ok, maybe theoretically what I said is not rigorously exact; you could do what you explained.

In the real world, though, things are not as easy as you describe. What you propose is part of what is done in lossless compression. However, if you want to achieve high compression rates, you have to be more drastic and actually throw things away, leading to bit reduction and increased quantization noise.

One thing is how things could be, and another thing is how they actually are.
miwalter
QUOTE
Originally posted by KikeG


Ok, maybe theoretically what I said is not rigorously exact; you could do what you explained.

In the real world, though, things are not as easy as you describe. What you propose is part of what is done in lossless compression. However, if you want to achieve high compression rates, you have to be more drastic and actually throw things away, leading to bit reduction and increased quantization noise.

One thing is how things could be, and another thing is how they actually are.


Ok. I can accept that. So some things are cleared.

I thought it over. I can't accept ;)

If you reduce the data from e.g. 16bit to 12bit, then why should there be noise added? I don't understand the logic behind "reducing bits means introducing noise".
JohnV
QUOTE
Originally posted by KikeG
Then you would have an average of 12.8 bits per original "piece" or sample.
I don't know what the heck you are actually talking about, but the internal resolution of samples with codecs encoding 16 bit audio is over 16bits.
KikeG
QUOTE
Originally posted by JohnV
I don't know what the heck you are actually talking about, but the internal resolution of samples with codecs encoding 16 bit audio is over 16bits.


??? Hey, it was just an example. Maybe the internal representation is over 16 bit, I don't know well the exact methods involved in the different perceptual codecs, but as average you "save" several bits for every original 16 bit sample.

Isn't it what some more knowledgeable (than me) people's last posts were trying to explain, about higher quantization noise due to the actual use of fewer bits to store the signal?
miwalter
@KikeG:

I have edited my post.
JohnV
QUOTE
Originally posted by KikeG
??? Hey, it was just an example. Maybe the internal representation is over 16 bit, I don't know well the exact methods involved in the different perceptual codecs, but as average you "save" several bits for every original 16 bit sample.

Isn't it what some more knowledgeable (than me) people's last posts were trying to explain, about higher quantization noise due to the actual use of fewer bits to store the signal?
Yeah, of course the average bits/sample is lower with the encoded file. The discussion just started to go weird, I mean that the 16 bit resolution concept and average bits/sample maybe started to mix. I don't know, maybe it was just me. :)
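To keep the two ideas apart, here is a minimal sketch of the "average bits per original sample" arithmetic. It assumes CD-format input (44.1 kHz, stereo, 16 bit) and bitrates counted in units of 1000 bits per second; the function name is made up for illustration. The result says nothing about the internal resolution a codec uses while processing, which is a separate question.

```python
SAMPLE_RATE = 44_100    # CD audio: samples per second, per channel
CHANNELS = 2
BITS_PER_SAMPLE = 16

def average_bits_per_sample(bitrate_kbps):
    """Average number of stored bits per original PCM sample at a given bitrate."""
    return bitrate_kbps * 1000 / (SAMPLE_RATE * CHANNELS)

# Uncompressed CD audio for comparison.
cd_kbps = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE / 1000
print(f"CD audio: {cd_kbps:.1f} kbps, {BITS_PER_SAMPLE} bits per sample")

for kbps in (320, 160):
    avg = average_bits_per_sample(kbps)
    print(f"{kbps} kbps: about {avg:.2f} bits per original sample on average")

# The example from earlier in the thread: 800 surviving 16-bit pieces
# spread over 1000 original pieces.
example_average = 800 * 16 / 1000
print(f"800 of 1000 pieces kept: {example_average} bits per original piece")
```

The average is only bookkeeping: it is the file size divided by the number of input samples, not a statement that each sample is stored with that many bits.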
IveyLeaguer
QUOTE
Originally posted by JohnV
 
The discussion just started to go weird, i mean that 16 bit resolution concept and average bits/sample maybe started to mix. I don't know, maybe it was just me.


No, I don't think so. I was thinking the same before the 16bit stuff. Looking at the original questions, there have been ample answers, though the quantization stuff was interesting. The guess here is our poster is sort of creating his logic as he goes along - either that, or just being argumentative. But hey, maybe it's just us. :)
krick
QUOTE
Originally posted by miwalter

If you reduce the data from e.g. 16bit to 12bit, then why should there be noise added? I don't understand the logic behind "reducing bits means introducing noise".


I think the confusion here is semantics. You don't introduce "noise", you introduce "distortion". i.e. the audio signal is distorted from the original. If you get enough distortion, it becomes audible. The trick with lossy compression is to remove parts of the audio signal that result in the least audible distortion to the listener.

One problem that we are dealing with here is that CD digital audio is already a form of lossy compression. The physical sounds created by the musician's instruments are analog waves. To put this on a CD digitally, you have to take "samples" of the original sound at intervals that are close enough together to fool the human ear into thinking it is hearing a continuous sound wave. According to Nyquist's Theorem, you only need to sample at 2X the maximum frequency to capture the sound effectively. Since humans can hear up to about 20 kHz, a sample rate of 44.1 kHz was chosen for CD audio.

As an aside, the sample rate concept is similar to motion pictures in a theatre. Your eyes are shown 48 frames (visual samples) per second (each of the 24 frames per second is shown twice) and your brain is fooled into thinking it's seeing actual motion.

The number of bits used for the sample determines the number of distinct "levels" that the sample can be represented with. If you think of a sample as a snapshot of the audio wave at a given instant, it should be easy to see that if you have more bits, you can represent the sample more accurately. For example, say we were sampling the audio volume. With 8 bit samples, we could take a snapshot of the volume and round it to the nearest of any of 256 different volume levels (note that the rounding introduces error and possibly distortion). If we only had 4 bit samples, we would have to round it to one of 16 different volume levels (which introduces a bit more error). If we only had 2 bit samples, we would have to round it to one of 4 different volume levels (which introduces a LOT more error). And if we could only use 1 bit samples, the signal would have to be rounded to either of two volume levels, ON or OFF!!!. This is most certainly not what we want to happen.
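The rounding described above can be tried out directly. This is only a sketch: the test signal (a slow sine scaled into the range 0.0 to 1.0) and the helper names are invented for illustration, and real converters work on signed values rather than a 0 to 1 "volume".

```python
import math

def quantize(x, bits):
    """Round x (expected in 0.0..1.0) to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 1.0 / (levels - 1)       # distance between two adjacent levels
    return round(x / step) * step

# Hypothetical test signal: one cycle of a sine wave scaled into 0.0..1.0.
signal = [0.5 + 0.5 * math.sin(2 * math.pi * n / 1000) for n in range(1000)]

def worst_error(bits):
    """Largest distance between a sample and its rounded version."""
    return max(abs(x - quantize(x, bits)) for x in signal)

for bits in (8, 4, 2, 1):
    print(f"{bits} bit: {2 ** bits:3d} levels, worst rounding error {worst_error(bits):.4f}")
```

The worst-case error is never more than half the distance between two levels, so every bit removed roughly doubles it.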

Now, to compress digital audio further, you have several options. You can reduce the sample rate, or you can keep the same sample rate and throw away redundant samples.

Reducing the sample rate results in audio that doesn't completely represent the whole frequency range that humans can hear. This isn't really a good option.

To throw away redundant samples, you have to come up with an algorithm that determines what samples are redundant. The quality of this algorithm is what separates the good MP3 encoders from the crappy MP3 encoders.

If the samples are thrown away in the wrong areas, it creates audible distortion. In MP3 audio, this usually manifests itself at the high end in cymbal crashes that aren't crisp (or are non-existent). In extreme cases, you hear an annoying swirling effect.

So to recap, what is "added" to MP3 audio isn't really "noise" in the traditional way one would think of noise (i.e. static), it's actually distortion because the original audio signal is no longer represented accurately.
KikeG
Just a couple of things:

The purpose of sampling is not to "fool" the ear; it is to capture a signal within a specified bandwidth and amplitude error (= quantization "background" noise or distortion). The output from any working DAC is a smooth, continuous waveform, limited in frequency to nearly half the sampling rate. The analogy with motion pictures is not adequate.

Depending on how you do the things, this quantization error appears as simple signal-uncorrelated background noise (using dither), or signal-correlated distortion (no dither, or what I believe is done at compressors internally).

To miwalter:

The fewer bits, the more error when sampling and/or playing the signal --> the more quantization noise or distortion.

The signal to noise ratio (SNR) measures the relationship between signal ("good" information) and noise (error ). For linear PCM digital audio, the max. achievable SNR, in dB, is calculated:

SNR (dB) = 6.02 * N

Where N is the number of bits used to capture/reproduce the signal.


Also, in lossy compressors you don't remove "redundant" information as the main means of achieving data reduction (however, this is true for lossless compressors); you remove "inaudible" information.

Edit: er... sorry... the actual formula is: SNR(dB) = 6.02 * N + 1.76
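A rough numerical check of the corrected formula, as a sketch only. It assumes a full-scale sine wave as the signal, plain rounding with no dither, and an arbitrarily chosen test frequency; the function name is made up for illustration.

```python
import math

def measured_snr_db(bits, n_samples=50_000):
    """SNR of a full-scale sine after rounding to a `bits`-bit grid, without dither."""
    step = 2.0 / (2 ** bits)            # full scale spans -1.0..+1.0
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        # Arbitrary frequency (in cycles per sample) so that the samples
        # do not fall into a short repeating pattern.
        x = math.sin(2 * math.pi * 0.01234567 * n)
        error = round(x / step) * step - x
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    formula = 6.02 * bits + 1.76
    print(f"{bits:2d} bit: measured {measured_snr_db(bits):6.2f} dB, formula {formula:6.2f} dB")
```

The measured values should land close to the formula, and dropping from 16 to 12 bits costs about 24 dB, which is the sense in which "fewer bits" means "more noise".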
Garf
QUOTE
Originally posted by miwalter

Someone in this discussion said that a waveform may be 1. cut down to the frequencies humans can hear and 2. overlapping (= inaudible) signals might be resolved.
So. The signal that is left after this processing has to be encoded.
Why should I have to encode this using fewer than 16bits (if that's the indicator for optimum quality)?


If you use 16 bits, then you aren't compressing anything.

Moreover, you're assuming perfect knowledge about what frequencies are and aren't hearable. That knowledge isn't available to the encoder.

--
GCP
Frank Klemm
QUOTE
Originally posted by KikeG
Just a couple of things:


SNR (dB) = 6.02 * N

Where N is the nº of bits used to capture/reproduce the signal.


Completely wrong:

Technical unweighted SNR for audio without dithering:
SNR = 6.02 dB * N + 1.76 dB

Technical unweighted SNR for audio with equal distributed dither:
SNR = 6.02 dB * N - 1.25 dB

Technical unweighted SNR for audio with triangular distributed dither:
SNR = 6.02 dB * N - 3.01 dB

Technical unweighted SNR for audio with subtractive dithering:
SNR = 6.02 dB * N + 1.76 dB

Technical unweighted SNR for video without dithering:
SNR = 6.02 dB * N + 10.79 dB

Weighted SNRs depend on the sampling frequency because the noise is spread over 0...fs/2. You can also weight the noise using A-weighting, B-weighting, CCIR-weighting, psychoacoustic models for audio, or some other CCIR-weighting for video using lattice filters.

Noise shaping increases SNR. Amount depends on sampling frequency. You get
ca. 4 dB for fs=32 kHz, ca. 15 dB for fs=44.1 kHz and ca. 35...40 dB for fs=96 kHz.

Although SNR = 6 * N is the most frequently posted formula, it is technically wrong, and it also fails to mention that a 16 bit system can have an SNR anywhere from 88 dB (CCIR, triangular dithered) to 113 dB (Psycho, subtractive dither, noise shaping), depending on measurement and implementation. These differences are larger than the difference between a 12 bit and a 16 bit system.

If someone says 16 bit = 96 dB, don't believe him. He didn't understand the problem ;-)

Formulas can be computed using simple math (most difficult part is to integrate f(x)=x²)
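The constants in the unweighted formulas above can be reproduced with a few lines of arithmetic. This is a sketch under the usual textbook assumptions, none of which are spelled out in the post: the rounding error is uniform over one quantization step (power 1/12 in units of step squared), "equal distributed" dither is rectangular and one step wide, triangular dither is two steps wide, the audio reference is a full-scale sine, and the video reference is the peak-to-peak value.

```python
import math

# All powers are in units of (one quantization step)^2 with the common factor
# 2^(2N) left out; that factor is what turns into the "6.02 dB * N" term.
ROUNDING_NOISE = 1 / 12      # uniform error, one step wide
RECT_DITHER = 1 / 12         # assumed: rectangular dither, one step wide
TRI_DITHER = 1 / 6           # assumed: triangular dither, two steps wide
SINE_POWER = 1 / 8           # full-scale sine: amplitude 1/2, power amplitude^2 / 2
PEAK_TO_PEAK_POWER = 1.0     # video convention: peak-to-peak value squared

def offset_db(signal_power, noise_power):
    """Constant term of the SNR formula for the given signal and noise powers."""
    return 10 * math.log10(signal_power / noise_power)

per_bit_db = 20 * math.log10(2)   # each extra bit halves the step size

offsets = {
    "audio, no dither": offset_db(SINE_POWER, ROUNDING_NOISE),
    "audio, rectangular dither": offset_db(SINE_POWER, ROUNDING_NOISE + RECT_DITHER),
    "audio, triangular dither": offset_db(SINE_POWER, ROUNDING_NOISE + TRI_DITHER),
    # Subtractive dither is removed again on playback, leaving only the rounding noise.
    "audio, subtractive dither": offset_db(SINE_POWER, ROUNDING_NOISE),
    "video, no dither": offset_db(PEAK_TO_PEAK_POWER, ROUNDING_NOISE),
}

print(f"per bit: {per_bit_db:.2f} dB")
for name, value in offsets.items():
    print(f"{name}: {value:+.2f} dB")
```

The 1/12 comes from integrating x² over one quantization step, which is the integral mentioned above.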
KikeG
QUOTE
Originally posted by Frank Klemm


Completely wrong:

Technical unweighted SNR for audio without dithering:
  SNR = 6.02 dB * N + 1.76 dB

If someone says 16 bit = 96 dB, don't believe him. He didn't understand the problem ;-)


Obviously, I was assuming the simplest case of undithered signal. And it is true that I forgot the 1.76 term.

But from there to say that it is "completely wrong" or that because someone says 16 bit=96 dB, "he doesn't understand the problem", well, I'd say that's quite a big leap.
Destroid
QUOTE
Originally posted by krick
I think the confusion here is semantics


After reading this a while I really believe so also.

Just in case there are ambiguities, the point should be made:
Noise does not necessarily mean "unpleasant" or "sour" (i.e. FM radio static) but should actually be viewed as 'non-bit-perfect' in the case of digital audio. The quantized signal will not compare exactly to the original. But it should sound pretty damn close -- so close that the space savings are justified for the near-perfect representation of the original file as heard (not as integral data storage).

However, unpleasant and sour artifacts that are audible are not the result of effective quantization.

QUOTE
Originally posted by spoon
Has anyone done any tests on this - I would like to see the same file, say at 160Kbps - converted from and to a wave file a number of times, possibly do a listening test on it. However to keep the results fair, the listeners would not know anything about the quantities of conversions done, so you might have:

Original
10 conversions
2 conversions
5 conversions
25 conversions
7 conversions
18 conversions


I am going to do this, but I probably won't do it in a real-world form (i.e. changing bitrates higher/lower when re-encoding, or using the --aps). I will probably use lame -b 160 -m j each time and see when it gets out of hand. My computer is too slow to do multiple --aps encodes without annoying me :P

--edit
====Results====
Sample: Metallica - Metallica (JAP) Nothing else matters - excerpt 4:29 - 6:02
Exe ID: LAME version 3.92 MMX (http://www.mp3dev.org/)
Encode line: lame -b 160 -m j
Decode line: lame --decode
Other: No ABX'ing, crappy speakers and the fan was running cuz it was hot :-/

Note: EAC indicated 100% peak level from the CD track, may have screwed me from the start.
Each transcode had an increased percentage of stereo frames and a decreased percentage of joint-stereo frames. Also, each encode/decode increased the total number of frames by 1.
This sample had the loudest chorus w/orchestra+drums, overdriven guitar+solo and single guitar+single voice. 160kbps was barely enough the first time but around the fourth transcode it suffered very badly with loud attacks, hi-hats, single voice. After the eleventh transcode the music was so horribly mangled that it was difficult to differentiate the sixteenth with the twenty-fifth transcode except during the quietest section at the end where the least amount of transcode damage was sustained.
miwalter
I think I now understand, why quality is lost during encoding.

I conclude for myself and for those, who are interested, if I have understood now:

The "noise" is more or less the signal that's left out after cutting the redundant or inaudible signal. This one I understood quite fast, I thought :-)

The 16bit thing: So the analogy with Huffman coding and so on is not right. I thought that a sample of audio would be 16 bits wide and I could simply cut out 6 bits and paste in the 6 bits from the next 16 bits (during the "cutting" process) without introducing other things (noise). I was wrong, obviously.
So the lossy encoders try to analyse a lot more than just 16 bits at each step and try to cut out those signals that are irrelevant to the listener. Because of the complexity of "guessing right", some signals are left in the encoded song that may not be pleasant. This would also be called "noise" (maybe "distortion", too). The interesting thing is that this "noise" could be inaudible (that's the optimal case) but may also be noticed as clearly audible artifacts (this is what I was calling noise before). I could see this as "added", because the original signal is definitely changed into something else. To be clear here: this noise is not some random distortion that is added during encoding; it's simply parts of the original signal that were not completely cut off, and maybe some information that has changed because of mathematical rounding errors (or non-optimal coding, I guess).

If this encoded audio signal is decoded and re-encoded, the algorithm has to "guess" again, with the same problems as described above. So it adds noise again (maybe). When I do this again and again, more and more of this noise becomes clearly audible and so the quality fades. It wouldn't if it were easy to differentiate between the original and the changed (encoded) signal (then the changed signal could simply be left out during re-encoding).
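The "has to guess again" effect can be seen even in a toy model. The sketch below is not MP3 and not any real codec: it is a made-up "codec" that rounds the sum and difference of neighbouring sample pairs to a grid. If every generation uses exactly the same pairing, re-encoding changes nothing after the first pass. If the pairing shifts by one sample between generations (standing in, as an assumption, for an encoder that does not see the data the same way twice), every generation rounds again and the error builds up.

```python
import math

STEP = 0.05   # hypothetical quantizer step of the toy codec

def encode_decode(samples, offset):
    """Toy lossy codec: quantize sum/difference of sample pairs starting at `offset`."""
    out = list(samples)
    for i in range(offset, len(samples) - 1, 2):
        s = (samples[i] + samples[i + 1]) / math.sqrt(2)
        d = (samples[i] - samples[i + 1]) / math.sqrt(2)
        s = round(s / STEP) * STEP      # the lossy step: round to the grid
        d = round(d / STEP) * STEP
        out[i] = (s + d) / math.sqrt(2)
        out[i + 1] = (s - d) / math.sqrt(2)
    return out

def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def run_generations(count):
    """Return RMS error against the original per generation, for both cases."""
    original = [math.sin(0.05 * n) + 0.5 * math.sin(0.31 * n) for n in range(2000)]
    aligned = shifted = original
    aligned_errors, shifted_errors = [], []
    for generation in range(1, count + 1):
        aligned = encode_decode(aligned, offset=0)                # same pairing every time
        shifted = encode_decode(shifted, offset=generation % 2)   # pairing alternates
        aligned_errors.append(rms_error(original, aligned))
        shifted_errors.append(rms_error(original, shifted))
    return aligned_errors, shifted_errors

aligned_errors, shifted_errors = run_generations(5)
for gen, (a, s) in enumerate(zip(aligned_errors, shifted_errors), start=1):
    print(f"generation {gen}: aligned {a:.4f}, shifted {s:.4f}")
```

In the aligned case the decoded values already sit on the grid, so rounding them again is harmless. That is the situation the original question had in mind, and it only holds when the second encoder makes exactly the same decisions on exactly the same data as the first.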

Besides: I was not trying to build my logic during discussion. I also was not trying to argue against the experts here :-/

Thanks everyone!