Topic: lossywav for lossy codecs? (Read 18749 times)

lossywav for lossy codecs?

Hi. As far as I know, lossyWAV is designed as a pre-processor for lossless codecs, where it significantly reduces the bitrate.

What happens if lossyWAV is used before lossy encoding? How big is the average bitrate reduction? And if it is considerable enough to warrant the lossyWAV + lossy encoder combination, what is the impact in terms of quality degradation? Can the artifacts of lossy encoders be disproportionately worsened by lossyWAV (i.e. do the degradations multiply instead of simply adding)?

thanks.

Reply #1
That would be lossy transcoding where lossyWAV is the source. You won't save on bitrate, as lossyWAV isn't nearly as aggressive as traditional lossy codecs: lossyWAV Q2 >> mp3 is near lossless, and you might save no more than 2 kbps.

Reply #2
Many thanks for your reply.
I think I understood what you said ...
... although I wouldn't use the term "lossless" in connection with mp3, that's confusing (lossywav Q2 >> mp3 is near lossless?)

Reply #3
If you pre-process with lossywav before conventional lossy encoding then the only effect will be to add some noise. The final result could be either larger or smaller depending on the effect that the added noise has.

Reply #4
lossyWAV adds white noise when it rounds the least significant bits to zero; this noise covers the frequency range up to Nyquist equally. So a lossy transcode of lossyWAV-processed audio will (probably) have more high-frequency noise to contend with than the lossless original.
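As a rough illustration of the rounding described above, here is a minimal Python sketch (not lossyWAV's actual code; the helper names and the random test signal are assumptions). It rounds integer samples to the nearest multiple of 2**k, i.e. zeroes the k least significant bits, and checks that the added error is bounded by half a step with a mean power close to the textbook q²/12 model for uniform rounding noise. It only checks the noise level; treating that noise as white is the usual model for busy signals.

```python
import random

def zero_lsbs(samples, k):
    """Round integer samples to the nearest multiple of 2**k (ties round up),
    so the k least significant bits of every output sample are zero."""
    q = 1 << k
    return [((x + q // 2) // q) * q for x in samples]

def error_power(original, processed):
    """Mean squared difference between processed and original samples."""
    return sum((y - x) ** 2 for x, y in zip(original, processed)) / len(original)

rng = random.Random(0)
signal = [rng.randint(-20000, 20000) for _ in range(20000)]  # busy, full-band test signal
k = 4
q = 1 << k
out = zero_lsbs(signal, k)
print(all(y % q == 0 for y in out))                          # every sample has k zero LSBs
print(max(abs(y - x) for x, y in zip(signal, out)) <= q // 2)
print(error_power(signal, out), q * q / 12)                  # measured vs. uniform-noise model
```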
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #5
Thank you for your to-the-point replies. What I love so much about you experts here is that you can answer potentially difficult questions in 1 or 2 sentences and be so clear and understandable that even beginners like me (most of the time) have no trouble following. (And I'm not talking just about your replies in this thread but about your replies in general; I just wanted to use this occasion...)

Anyway, so in essence there is absolutely no good reason to use lossyWAV in conjunction with lossy encoding, right?

Reply #6
Another way to put it: lossy codecs will not take advantage of lossyWAV processing, because they do their own lossy processing and can do it better with the original (lossless) input.
In theory, there is no difference between theory and practice. In practice there is.

Reply #7
ok. that's another way to put it. thanks. But is my conclusion correct nevertheless?

Reply #8
Quote
ok. that's another way to put it. thanks. But is my conclusion correct nevertheless?

You can check for yourself.
I was also curious, so I tried some files with different lossyWAV settings as a preprocessor for Musepack.
The result was almost always a bigger file (0.1 to 0.6%) and just once a smaller one (about 0.1%).
As for quality, I didn't find any degradation, but at those settings (q5) mpc is usually transparent for natural music, so I didn't bother much.

Reply #9
thanks for the additional testing. Much appreciated.

Actually, it was not so much that I didn't believe it; I rather wanted to see whether the conclusion ("there is absolutely no good reason to use lossyWAV in conjunction with lossy encoding") is correct and then move on.

Reply #10
A good explanation I once saw someone post for why pre-processing the WAV file with lossyWAV before lossy encoding with MP3 (for example) barely affects the size of the resulting MP3 file is that the processes lossyWAV and MP3 apply to the WAV file are somewhat "orthogonal" to each other.

When you think about it a bit, MP3 reduces the size of a WAV file by converting the signal to the frequency domain (using a filterbank and the MDCT, a Fourier-related transform), then processing it further by removing hard-to-encode noise that cannot be heard, or by adding noise that also cannot be heard but makes the signal easier to encode. All that lossyWAV does to a WAV file is set a few trailing bits (least significant bits) of the audio samples to zero, depending on the noise floor in those samples' block. By doing long, boring and tedious calculations, one could mathematically prove that removing a couple of least significant bits from a quantised signal (such as that in a WAV file) does not significantly affect its Fourier transform. Thus, as far as MP3 is concerned, lossyWAV has done nearly nothing to the WAV file that is to be encoded.
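The "depending on the noise floor in the block" part can be sketched as a toy rule. This is a deliberately simplified, hypothetical model, not lossyWAV's algorithm: it assumes a single 64-sample DFT, a 3-bin smoothing window and a fixed 6 dB safety margin (the real program uses several FFT lengths, spreading functions and other safeguards). It finds the lowest smoothed spectral level in the block and removes as many bits as it can while keeping the rounding noise, about (2**k)**2 / 12 per sample, below that level.

```python
import cmath
import math
import random

def power_spectrum(block):
    """Per-bin power |X_k|^2 / N for k = 0..N/2 (plain O(N^2) DFT, fine for a sketch).
    With this scaling, white noise of variance s2 has an expected level of s2 in every bin."""
    n = len(block)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))) ** 2 / n
            for k in range(n // 2 + 1)]

def bits_to_remove(block, margin_db=6.0, spread=3, max_bits=15):
    """Hypothetical toy rule: keep rounding noise (2**k)**2 / 12 below the
    lowest smoothed spectral level minus a safety margin."""
    p = power_spectrum(block)[1:-1]                  # ignore DC and Nyquist bins
    smoothed = [sum(p[i:i + spread]) / spread for i in range(len(p) - spread + 1)]
    floor = min(smoothed)
    allowed = floor / 10 ** (margin_db / 10)
    if 12 * allowed < 4:                             # even k = 1 (noise ~4/12) would be too loud
        return 0
    return min(max_bits, int(0.5 * math.log2(12 * allowed)))

rng = random.Random(1)
n = 64
loud = [rng.randint(-8000, 8000) for _ in range(n)]
quiet = [rng.randint(-50, 50) for _ in range(n)]
tone = [round(10000 * math.sin(2 * math.pi * 4 * i / n)) for i in range(n)]
print(bits_to_remove(loud), bits_to_remove(quiet), bits_to_remove(tone))
```

A loud broadband block gets more bits removed than a quiet one, and a pure tone at an exact bin frequency gets none, because its spectrum has empty bins and so leaves no room for added noise.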

If, however, a lossy encoder such as MP3 or Ogg Vorbis were programmed to recognise when a block of at least 512 samples (or so) has had its x least significant bits set to zero, and to use that information to encode those samples more efficiently, perhaps a greater compression ratio could be achieved. But as far as current lossy encoders go, nearly all they care about in the signal to be encoded is what its Fourier transform looks like; they then proceed to blindly butcher that transform in a way that can't be heard by humans (according to the encoder's calculations) but makes the signal easier to encode. They don't care whether the least significant bits are zero or not.
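Lossless codecs already exploit this kind of structure: FLAC's subframe header has a "wasted bits-per-sample" field for blocks whose low bits are all zero, which is what lossyWAV relies on. A minimal sketch of detecting and stripping such bits follows; the function names are hypothetical, and it only illustrates the check, not FLAC's actual code.

```python
def wasted_bits(block):
    """Number of trailing zero bits shared by every sample in the block.
    Returns 0 for an all-zero block (a codec would code that as a constant)."""
    combined = 0
    for x in block:
        combined |= x          # OR keeps a 1 in every bit position set in any sample
    if combined == 0:
        return 0
    return (combined & -combined).bit_length() - 1   # index of the lowest set bit

def strip_wasted_bits(block):
    """Shift the shared zero bits out; a decoder would shift them back in."""
    k = wasted_bits(block)
    return k, [x >> k for x in block]

print(strip_wasted_bits([8, 16, -24, 0]))   # (3, [1, 2, -3, 0])
```

A lossy encoder would have to be built to look for this; as described above, current ones do not.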

Reply #11
Quote
By doing long, boring and tedious calculations one could mathematically prove that removing a couple of least significant bits from a quantised signal (such as that in a WAV file) does not significantly affect its Fourier transform.

Also, one thing worth noting that I forgot to mention about the end of this sentence: the fact that removing a couple of least significant bits from a quantised signal does not significantly affect the signal's Fourier transform is the main reason why the artefacts of lossyWAV's processing are difficult to hear and why lossyWAV works so well. In a certain sense, our auditory system is also based on transforming incoming sound waves from the time domain to the frequency domain, albeit by a process somewhat different from the Fourier transform. The point is that lossyWAV's removal of a couple of least significant bits will not significantly affect the signal's transfer to the frequency domain by any sensible time-to-frequency method, including our auditory system's and MP3's transform.

So basically, this whole story boils down to a quite simple summary: MP3 is unaffected by lossyWAV's preprocessing precisely because lossyWAV is near-lossless. It doesn't change the signal significantly, therefore it doesn't significantly change the way MP3 "sees" the signal either, and that's all there is to it.

Reply #12
Almost everything you are saying in those two posts above is either just not true or at the very least extremely misleading.

Applying something like lossyWAV will significantly change the result of the Fourier Transform (or MDCT or whatever).

It's in the phases after that that the psymodel will decide that certain frequency bands can be encoded in a very noisy manner. So noisy that the (smaller) noise added by lossyWAV will disappear into it. lossyWAV is able to determine where this will happen because it has a psymodel of its own.

This psymodel is important because it allows lossyWAV to detect how many bits can be safely removed on a per-block basis. The need to switch this per block (which represents a very small segment of time) is clearly at odds with your erroneous claims above. You would be able to see that even with the same noise floor, the effect lossyWAV has is very much dependent on the signal itself (specifically, its tonality, which is one of the most important factors in a psymodel AND an mp3 encoder).

So, there is no "orthogonality" at all. If there were, preprocessing with lossyWAV would help, but as you can see it clearly does not. The reason there's so little added benefit is that they are both doing the same kind of processing.

Reply #13
Quote
Almost everything you are saying in those two posts above is either just not true or at the very least extremely misleading.

Applying something like lossyWAV will significantly change the result of the Fourier Transform (or MDCT or whatever).

Not really. It will change it, but not significantly. Look, if you don't believe me, here are two frequency analyses of the lossless and the lossyWAV-preprocessed versions of "Turn It On" from Franz Ferdinand's latest album, from 0.08 seconds to 0.18 seconds (right where there's a strong and slightly clipped transient, so lossyWAV actually does some serious business):

[Image: frequency analysis, lossless]
[Image: frequency analysis, lossyWAV]

[Image: difference between the images (lossless - lossyWAV)] (The lower-left text differs only because it depends on the cursor position; my cursor obviously wasn't in the same place when I took the two screenshots.)

The difference is laughably insignificant. You won't actually make me write out those long, boring and tedious calculations I mentioned earlier before you believe me, will you? I'm a mathematician, BTW; I know what I'm talking about here.

Quote
It's in the phases after that that the psymodel will decide that certain frequency bands can be encoded in a very noisy manner. So noisy that the (smaller) noise added by lossyWAV will disappear into it. lossyWAV is able to determine where this will happen because it has a psymodel of its own.

This psymodel is important because it allows lossyWAV to detect how many bits can be safely removed on a per-block basis. The need to switch this per block (which represents a very small segment of time) is clearly at odds with your erroneous claims above. You would be able to see that even with the same noise floor, the effect lossyWAV has is very much dependent on the signal itself (specifically, its tonality, which is one of the most important factors in a psymodel AND an mp3 encoder).

lossyWAV doesn't have a proper psymodel; at least, it's completely unlike anything that MP3 does. All lossyWAV does is determine the noise floor in a given block of 512, 1024 or however many samples. MP3 does a completely different thing. lossyWAV works on a per-block basis because the noise floor changes over time; MP3 works on a per-block basis because lots of things change over time. You don't need a psymodel to determine the noise floor. Even if what lossyWAV uses could be called "psymodel", it still does a completely different thing with it than what MP3 does. You can use a knife to stab a person or to spread mayonnaise on your bread. Those two are not the same actions just because you used the same tool.

Quote
So, there is no "orthogonality" at all. If there were, preprocessing with lossyWAV would help, but as you can see it clearly does not.

I'm sorry, but then you apparently don't know what orthogonality means, or we're misunderstanding each other. Up and down is orthogonal to forward and backward. If I move just up or down ten metres, I haven't moved forward or backward at all. I.e., changes in "up" and "down" don't induce any changes in "forward" and "backward", just as the changes lossyWAV makes induce almost no changes in the result of MP3 processing; that's why I said their processes are somewhat orthogonal.

edit: A clarification - lossyWAV won't affect the Fourier transform significantly precisely because it adds white noise (and not much white noise at that) to the signal. If it added one single frequency or just a couple of frequencies then the alteration would be significant.

Reply #14
Quote
lossyWAV doesn't have a proper psymodel

True, it doesn't, but using spreading functions of different widths at different frequencies is certainly related to ear processing. After this stage (or its equivalent + masking) mp3 would look at the energy in each spectral band, while lossyWAV (currently) only cares about the lowest.

I'm glad you jumped in with the frequency analysis plots. I wouldn't have felt quite so comfortable telling Garf he was very wrong (since he's usually very right!).

Cheers,
David.

Reply #15
Well, if he's wrong it doesn't mean that he's an idiot, which is what most people tend to hear when someone tells them they're wrong on the Internet. I don't think he's an idiot; after all, I recall seeing a bunch of his posts on this forum, even though I myself don't have many. No one is always right. I hope someone will be there to correct me the next time I'm wrong, which should be at about tea-time.

The Fourier transform "dissects" the signal into base frequencies. More precisely, it decomposes it into sine- and cosine-like oscillatory functions such that summing those oscillatory functions gives back the original signal. For a change in the signal to induce significant changes in the oscillatory functions that make it up, it would have to be either strong noise spread across many base-oscillatory frequencies, or noise that is not necessarily strong but is concentrated in one or a few base-oscillatory frequencies. Since the removed bits are a couple of the least significant ones, the added noise is relatively weak, and because it's white noise it isn't concentrated on any specific frequency. In light of what I've said in the sentence before the previous one, that basically means that removing a couple of least significant bits from the signal will not significantly change the result of its Fourier transform.
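The spread-versus-concentrated argument can be checked numerically. The sketch below, assuming a 64-point DFT and arbitrary test signals, builds two disturbances with exactly the same total energy: white noise, and a single sinusoid at one bin frequency. Because the DFT is linear, the change each one causes in a block's spectrum is just its own DFT. By Parseval's theorem both add the same total spectral energy, but the sinusoid piles it into one bin pair, while the white noise spreads it thin.

```python
import cmath
import math
import random

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)) for k in range(n)]

def energy(x):
    return sum(v * v for v in x)

n = 64
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(n)]
tone = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
# Rescale the tone so both disturbances carry exactly the same energy.
scale = math.sqrt(energy(white) / energy(tone))
tone = [scale * v for v in tone]

# Per-bin power change caused by each disturbance.
white_bins = [abs(c) ** 2 for c in dft(white)]
tone_bins = [abs(c) ** 2 for c in dft(tone)]
print(sum(white_bins), sum(tone_bins))    # equal: both are n * energy (Parseval)
print(max(white_bins), max(tone_bins))    # the tone's peak is far larger
```

For a tone at an exact bin frequency, the peak is n/2 times the average per-bin level, whereas for white noise the largest bin is typically only a few times the average.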


edit: Also, 2Bdecided, it's nice to have a word with the guy who started the whole lossyWAV thing.

Reply #16
Quote
removing a couple of least significant bits from the signal will not significantly change the result of its Fourier transform.

...or if it would (e.g. low level and/or tonal signal), lossyWAV won't do it.

(I know you know this, but just in case someone else reads and misinterprets...!)

Cheers,
David.

Reply #17
Quote
removing a couple of least significant bits from the signal will not significantly change the result of its Fourier transform.
...or if it would (e.g. low level and/or tonal signal), lossyWAV won't do it.

It's a very important qualification (I would even say it negates the original statement and it's why I protested), and it's why you need the psymodel in lossyWAV.

I suspect we both agree on this point.

Reply #18
Quote
In light of what I've said in the sentence before the previous one, that basically means that removing a couple of least significant bits from the signal will not significantly change the result of its Fourier transform.

It's going to add broadband noise all over the spectrum, with a strength related to how many bits you remove. If this noise is masked, you're fine. If it's not, you have problems. It will not be masked for tonal signals. In this case, the broadband noise that you added is a very significant change because it will be audible. Just because it looks small on a graph doesn't mean it's insignificant. I think we've talked about that enough already.

lossywav for lossy codecs?

Reply #19
Quote
You don't need a psymodel to determine the noise floor.

This is just wrong. You do need a psymodel to know the allowable noise floor that still gives a perceptually identical result.

If you can do this without a psymodel, I'm dying to know how. It would be a publishable invention.

Quote
Even if what lossyWAV uses could be called "psymodel", it still does a completely different thing with it than what MP3 does. You can use a knife to stab a person or to spread mayonnaise on your bread. Those two are not the same actions just because you used the same tool.


Both psymodels are used to determine the allowable quantization noise to introduce. If we're doing lame analogies I would say that one is spreading butter and the other is spreading chocolate paste.

Reply #20
Quote
I suspect we both agree on this point.

It's a good idea, but at the moment I'm unsure how precisely lossyWAV could benefit from a proper psymodel when basically all the freedom lossyWAV has to change the bit values in a single block is one bit: the least significant of those that remain non-zeroed, because all less significant bits end up zeroed and all more significant ones shouldn't be changed because they're above the noise floor. In fact, I have a feeling that anything other than rounding would just produce worse results.

Unless...

Here's an idea.

You find the noise floor (which is relatively close to just finding the weakest frequency band) in the block, but you also find out which frequency band is the strongest in that block. Then you alternate between rounding up and rounding down the least significant above-the-noise-floor bit in that block, so that the rounding pattern aligns with the frequency of the strongest band. This might go wrong, because now it would be as I described in my previous post: weak noise, but concentrated in a single base-oscillatory frequency. But it might also turn out to be good, so testing and tweaking should be done. Think about it: instead of adding the noise to all the frequency bands, you only add it to the strongest one, and since that one is the strongest it should be the least "damaged" by the added noise of all the bands. Two things to note:

1) It should definitely be scaled depending on the difference between the value of the strongest band and the others (in other words, how big an outlier the strongest band is). Maybe the resulting rounding should be a certain linear combination of the currently used pure white-noise rounding and the aimed-to-align-with-the-strongest-frequency rounding. By that I mean:

bit_value = round(t*rounded + (1-t)*aimed_for_strongest_frequency_band), t ∈ [0,1]

How close t is to 0 or 1 would depend on how much the strongest frequency band outlies.

2) Care should be taken that where the adjacent blocks touch there's no unnecessary additional strange noise because of a sudden jump from one frequency to another.
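One possible reading of this proposal, as a hypothetical sketch: "aimed" rounding picks, for each sample, whichever neighbouring multiple of the step (floor or ceiling) makes the sign of the rounding error follow a sinusoid at the strongest band's frequency, and t blends it with plain rounding before snapping back onto the grid. The interpretation, the function name and the use of the sign of a sine are assumptions, not something specified in the thread, and only listening tests could show whether it helps at all.

```python
import math

def blended_rounding(samples, k, freq, t):
    """Hypothetical sketch of the proposal: quantize to multiples of q = 2**k,
    mixing plain rounding (weight t) with rounding whose error follows the sign
    of a sinusoid at `freq` cycles/sample (weight 1 - t)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    q = 1 << k
    out = []
    for n, x in enumerate(samples):
        rounded = ((x + q // 2) // q) * q          # nearest multiple of q
        down = (x // q) * q                         # floor to a multiple of q
        up = down if down == x else down + q        # ceiling to a multiple of q
        aimed = up if math.sin(2 * math.pi * freq * n) >= 0 else down
        mix = t * rounded + (1 - t) * aimed
        out.append(round(mix / q) * q)              # snap back onto the grid of multiples of q
    return out

x = [3, 10, -7, 25, 0, 17, -30, 5]
print(blended_rounding(x, 3, 0.1, 1.0))   # t = 1: plain nearest rounding
print(blended_rounding(x, 3, 0.1, 0.0))   # t = 0: error pushed towards the chosen frequency
```

Both candidate values are neighbouring multiples of q, so whatever t is, every output keeps its k least significant bits at zero and stays within one step of the input.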

Reply #21
Quote
It's going to add broadband noise all over the spectrum, with a strength related to how many bits you remove. If this noise is masked, you're fine. If it's not, you have problems. It will not be masked for tonal signals.

Yes, that's kind of why lossyWAV removes hardly any bits when there are too many silent frequency bands. I didn't mean that removing the least significant bits won't change the spectrum significantly no matter what, but that, given lossyWAV's good estimates of the noise floor, removing the least significant bits below that noise floor won't change the spectrum significantly.

So yeah, it was all a misunderstanding.

Reply #22
Quote
It's a good idea, but at the moment I'm unsure how precisely lossyWAV could benefit from a proper psymodel when basically all the freedom lossyWAV has to change the bit values in a single block is one bit: the least significant of those that remain non-zeroed, because all less significant bits end up zeroed and all more significant ones shouldn't be changed because they're above the noise floor. In fact, I have a feeling that anything other than rounding would just produce worse results.

You're a mathematician. If your DSP skills are good enough you should be able to figure out that/how it's possible to turn signal block x into y where the samples of y have k least significant bits that are zeros and where the difference, y-x, has any spectral shape you want. Hint: You also have to touch the other bits above the least significant bits as well. doccolinni, meet noise shaping. noise shaping, meet doccolinni.

This is how a noise shaping quantizer usually looks like:



In the case of lossyWAV, the added noise should always be "below the signal". This keeps it predictable and lets us get away without any dithering.
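The usual noise-shaping quantizer can be sketched in code, assuming the common error-feedback structure (this illustrates noise shaping in general, not lossyWAV's implementation; the function name and coefficients are hypothetical). Each sample is quantized to a multiple of 2**k after adding filtered past quantization errors, so every output still has k zero LSBs. The total added noise y - x equals the raw error filtered by 1 + c1·z^-1 + c2·z^-2 + ...; with c = [-1] that is 1 - z^-1, which pushes the noise away from DC towards high frequencies.

```python
import random

def noise_shaped_quantize(samples, k, coeffs):
    """Quantize integer samples to multiples of q = 2**k with error feedback.
    The added noise is y[n] - x[n] = e[n] + sum(c_i * e[n-1-i]), where e is the
    raw rounding error, i.e. e filtered by 1 + c_1 z^-1 + c_2 z^-2 + ..."""
    q = 1 << k
    history = [0.0] * len(coeffs)          # past raw errors, most recent first
    out, raw_errors = [], []
    for x in samples:
        v = x + sum(c * e for c, e in zip(coeffs, history))
        y = int(((v + q / 2) // q) * q)     # nearest multiple of q (ties round up)
        e = y - v                            # raw quantization error of this sample
        history = [e] + history[:-1] if history else []
        out.append(y)
        raw_errors.append(e)
    return out, raw_errors

rng = random.Random(0)
x = [rng.randint(-3000, 3000) for _ in range(1000)]
y, e = noise_shaped_quantize(x, 4, [-1])
noise = [b - a for a, b in zip(x, y)]
print(all(v % 16 == 0 for v in y))   # still 4 zero LSBs in every sample
print(sum(noise))                     # telescopes to e[-1]: DC content stays within half a step
```

Note that the higher bits are touched too: the feedback can move a sample to a multiple of q that is not the nearest one, which is exactly what lets the error take on a chosen spectral shape.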

So... a "proper psymodel" can be very useful, here.

Unless ...

Quote
You find the noise floor (which is relatively close to just finding the weakest frequency band) in the block

What you call "noise floor" is actually not that interesting. At least not to me. What's interesting is the masking threshold.

The only difference between MP3 and what lossyWAV+FLAC could do in terms of reducing the data rate by introducing noise is that the noise can only be controlled in steps of 6 dB. That's it. You can shape noise in time (by selecting different word lengths in different blocks) and you can shape the noise spectrally (by using noise shaping filters). As for the noiseless part of coding: FLAC decorrelates the signal via linear prediction, whereas MP3/AAC/etc. do it with their filterbanks.
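The "steps of 6 dB" figure follows from the usual uniform-quantization noise model: rounding to a step of q = 2^k adds a noise power of about q²/12, so each extra bit removed quadruples the noise power, i.e. adds 20·log10(2) ≈ 6.02 dB. A small sketch under that model, assuming a 16-bit full scale as the reference (the function name is hypothetical):

```python
import math

def rounding_noise_dbfs(k, full_scale_bits=16):
    """Approximate level (dB relative to a full-scale sine) of the rounding
    noise added when the k least significant bits are zeroed, using q**2 / 12."""
    q = 2 ** k
    noise_power = q * q / 12
    full_scale_sine_power = (2 ** (full_scale_bits - 1)) ** 2 / 2
    return 10 * math.log10(noise_power / full_scale_sine_power)

for k in range(1, 6):
    print(k, round(rounding_noise_dbfs(k), 1))
```

Within one block, then, the only knob is k, and turning it moves the noise level in fixed increments of about 6 dB.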

Cheers,
SG

Reply #23
Quote
It's a good idea, but at the moment I'm unsure how precisely lossyWAV could benefit from a proper psymodel when basically all the freedom lossyWAV has to change the bit values in a single block is one bit: the least significant of those that remain non-zeroed, because all less significant bits end up zeroed and all more significant ones shouldn't be changed because they're above the noise floor. In fact, I have a feeling that anything other than rounding would just produce worse results.

You're a mathematician. If your DSP skills are good enough you should be able to figure out how it's possible to turn signal block x into y where the samples of y have k least significant bits that are zeros and where the difference, y-x, has any spectral shape you want. Hint: You also have to touch the other bits above the least significant bits as well. doccolinni, meet noise shaping. noise shaping, meet doccolinni.

So... a "proper psymodel" can be very useful, here.

Cheers,
SG

Thanks for the sarcasm. I know what noise shaping is, but then you've basically made noisy the bits that were calculated to be above the noise floor, and FLAC again has to encode noise. Yes, you've helped it by reducing the number of non-zero bits, but you've also made its job just as complicated as before by making additional bits noisy. The whole point of zeroing the bits below the noise floor was to remove the bits that are noisy, and now that you've done that, you also want to make more bits (which previously weren't noise) noisy. As the saying goes, you're sawing off the branch you're sitting on.

edit: Oh, you mean shaping the bits below the noise floor? Ah, that's a different story then... But then fewer LSBs are going to be zero. It's all a bunch of trade-offs that have to be weighed by testing; we can speculate ad infinitum about what would and wouldn't work better.

Reply #24
So much discussion for no real idea!

LossyWAV's success is based on 2Bdecided's observation that lossless codecs like FLAC or TAK can take advantage of a certain number of least significant bits being zero in all the samples of a block. So lossyWAV rounds the least significant bits in a block to zero, controlled by a kind of simple, robust psymodel. That's the mechanism.

This mechanism is irrelevant when swapping the lossless codec for a lossy one. Without a specific positive idea of how lossyWAV could also help a lossy codec, lossyWAV's output can't be considered anything but original signal + noise, something Nick.C wrote in one of the first posts here. No improvement is to be expected for a lossy codec when noise is added to the signal.
The question of why lossyWAV doesn't work with a lossy codec isn't a useful question anyway. It should be the other way around: there should be an attractive idea of why it should work.
Of course it's not wrong to try, but when you try, find that the approach doesn't work, and have no idea why it should work, there isn't much sense in continuing with it. Purely theoretical reasoning about whether an idea works, and theoretical reasoning about why it doesn't, is a bit strange. Strangely enough, the OP didn't even try his idea in a simple little real-world experiment before posting.
lame3995o -Q1.7 --lowpass 17