hellokeith
Jul 20 2009, 02:08
I was wondering if there has been any work done on lossy audio encoding that is NOT based on perceptual encoding?
I can't pinpoint exactly a situation where you would need to deliver compressed audio without regard to its (humanly perceived) quality. Maybe something to do with sonar, whales, or general atmospheric research?
You're asking about something that makes no sense. The point of lossy encoding is to throw out the parts that humans cannot hear. Lossy is intrinsically aimed at some target audience. Without a target audience, you cannot begin to make intelligent decisions about what data should be thrown out. Therefore, it makes no sense to consider lossy audio that has no target audience.
Bit shaving combined with lossless compression would count as "lossy encoding NOT based on perceptual encoding", and is often discussed here.
lossyWAV is but one example. See the link for a discussion not only of the implementation itself, but also of the considerations made, the pros and cons, and the rationale. Far from making no sense, IMHO.
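A minimal sketch of "bit shaving plus lossless compression" in Python, under stated assumptions: shaving here just rounds every 16-bit sample to a multiple of 2**k, and zlib stands in for a real lossless audio codec (FLAC, for instance, can signal such "wasted bits" in its format). The test signal, the fixed `k`, and all function names are hypothetical; lossyWAV itself decides how many bits to remove per block from a spectral analysis rather than using a fixed count.

```python
import math
import random
import struct
import zlib

def make_test_signal(n=44100, seed=1):
    """Hypothetical input: a 440 Hz tone plus a little noise, as 16-bit integers."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        x = 0.5 * math.sin(2 * math.pi * 440 * i / 44100) + rng.gauss(0, 0.01)
        samples.append(max(-32768, min(32767, round(x * 32767))))
    return samples

def shave_bits(samples, k):
    """Round each sample to the nearest multiple of 2**k (clamped to 16-bit range).

    This is plain requantization: no model of hearing is involved.
    """
    step = 1 << k
    return [max(-32768, min(32768 - step, step * round(s / step))) for s in samples]

def packed_size(samples):
    """Bytes after generic lossless compression of 16-bit little-endian PCM."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    return len(zlib.compress(raw, 9))

original = make_test_signal()
shaved = shave_bits(original, 8)
print("original:", packed_size(original), "bytes")
print("shaved 8 bits:", packed_size(shaved), "bytes")
```

The lossless stage never "knows" anything was removed; it simply finds the zeroed low bits cheap to code.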
Woodinville
Jul 20 2009, 07:41
Stock ADPCM (remember that?) is a non-perceptual lossy algorithm.
Sebastian Mares
Jul 20 2009, 08:12
Isn't WavPack lossy also a non-perceptual lossy encoder?
muaddib
Jul 20 2009, 08:51
ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size based on the assumption that humans will accept the difference from the original, if they hear any difference at all.
odyssey
Jul 20 2009, 09:29
QUOTE (muaddib @ Jul 20 2009, 09:51)

ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size based on the assumption that humans will accept the difference from the original, if they hear any difference at all.
I'd say the same for LossyWAV, given the discussions there have been regarding fine-tuning?
QUOTE (odyssey @ Jul 20 2009, 04:29)

QUOTE (muaddib @ Jul 20 2009, 09:51)

ADPCM and WavPack Lossy could be considered lossy encoders with a very simple perceptual algorithm, because they still reduce size based on the assumption that humans will accept the difference from the original, if they hear any difference at all.
I'd say the same for LossyWAV, given the discussions there have been regarding fine-tuning?
Clearly LossyWAV can be used in a perceptual manner, and IIRC the default is to shave the bitdepth only to the edge of audability.
Its fundamental technique (of decreasing bitdepth), though, is not dependent on human audio perception - unlike any lossy encoder which tosses masked frequencies and the like.
QUOTE (Soap @ Jul 20 2009, 05:11)

Clearly LossyWAV can be used in a perceptual manner, and IIRC the default is to shave the bitdepth only to the edge of audability. Its fundamental technique (of decreasing bitdepth), though, is not dependent on human audio perception - unlike any lossy encoder which tosses masked frequencies and the like.
Not true - the fundamental parameters of LossyWAV's operation (the widths of the spectrum analysis windows and the per-band masking estimation) are purely psychoacoustic in derivation. (One could imagine aliens with radically different cochlear structure, perhaps with a much coarser/finer cilia density or whatnot, for which the FFT window sizes used in LossyWAV would be suboptimal.)
The key point about LossyWAV is that it makes far, far fewer assertions about the nature of human hearing compared to other lossy encoding schemes - so that we can presumably be more certain about its transparency, for all possible inputs, than many other codecs.
shadowking
Jul 20 2009, 12:38
Current WavPack lossy uses automatic noise shaping and mid/side switching, so there is a basic model there. That it often reaches near-transparency at 250 kbps or even lower is a testament to that.
QUOTE (Axon @ Jul 20 2009, 06:31)

Not true - the fundamental parameters of LossyWAV's operation (the widths of the spectrum analysis windows and the per-band masking estimation) are purely psychoacoustic in derivation. (One could imagine aliens with radically different cochlear structure, perhaps with a much coarser/finer cilia density or whatnot, for which the FFT window sizes used in LossyWAV would be suboptimal.)
The key point about LossyWAV is that it makes far, far fewer assertions about the nature of human hearing compared to other lossy encoding schemes - so that we can presumably be more certain about its transparency, for all possible inputs, than many other codecs.
I'm not sure what statement of mine you are disagreeing with. I stated that LossyWAV uses a basic perceptual model to determine audibility (though I spelled audibility wrong in that post), but that the basic technique in which LossyWAV throws away data is not itself based on perceptual encoding.
Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.
shadowking
Jul 20 2009, 13:25
They are not perceptual according to Florin Ghido:
Disadvantages of OptimFROG DualStream compared with transform coders (TC), such as MPC, OGG, MP3, AAC, WMA etc.:
- as it does not take into account the human auditory system limitations, it needs a much higher bitrate (up to twice) to achieve perceptual transparency. However, together with reaching perceptual transparency, many other important audio qualities are preserved
http://www.losslessaudio.org/DualStream.php
Mike Giacomelli
Jul 20 2009, 13:51
In a sense, even optimizing RMS error is perceptual, in that it assumes people are less likely to hear smaller errors.

More often a codec is considered perceptual if it's taking advantage of the specific time/bandwidth limits of human hearing, or some other effect that's specific to how the ear works and not generally applicable to most systems (i.e. higher SNR == better).
QUOTE (Soap @ Jul 20 2009, 04:43)

Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.
No, but choosing what data to throw away is still based on human hearing. That's the point I'm making. The codecs that simply increase the noise floor without taking into account human hearing are not based on perceptual encoding, but they're roughly equivalent to simply decreasing bit-depth anyhow.
Edit: Put another way, is 16-bit PCM lossy? It's certainly lossy when starting with 24-bit sources. And 4-bit ADPCM is lossy when starting with just about any source. But are these "lossy codecs"? That seems to be the core of this discussion. What is a lossy codec, precisely? What is perceptual encoding? How far are we willing to stretch these terms?
I acknowledge that I'm probably wrong with respect to WavPack lossy and similar. However, that loss is basically equivalent to simply decreasing the bit-depth. Are we considering that to be "lossy encoding"? That makes sense to me in one regard, but by that same reasoning, it would make 16-bit PCM necessarily a "lossy codec" in certain frames of reference.
Wait. You're telling me that shaving bitdepth is roughly equivalent to simply decreasing bit-depth?
QUOTE (shadowking @ Jul 20 2009, 14:25)

They are not perceptual according to Florin Ghido:
QUOTE
[...]
Advantages of OptimFROG DualStream compared to WavPack hybrid:
- true separate quantization levels for each channel
- quality mode maintains constant quality throughout the whole file
- advanced noise shaping option, improving transparency
[...]
They most certainly are. How can noise shaping not be a perceptual technique?
QUOTE (Soap @ Jul 20 2009, 07:59)

Wait. You're telling me that shaving bitdepth is roughly equivalent to simply decreasing bit-depth?
Yup. By increasing the noise floor, you decrease the amount of information needed to reconstruct the signal in a manner precisely analogous to decreasing bit-depth. Unless I've got some massive misunderstanding going on...
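To illustrate the equivalence described here, a short Python sketch measures the signal-to-noise ratio after requantizing a full-scale sine to a given bit depth. The test conditions (plain rounding, no dither, an arbitrary test frequency) and the function name are assumptions; each bit removed should raise the noise floor by roughly 6.02 dB, in line with the textbook SNR ≈ 6.02·N + 1.76 dB for a full-scale sine.

```python
import math

def requantize_snr_db(bits, n=1 << 16, freq=0.0123456):
    """Round a full-scale sine to `bits` bits and return the measured SNR in dB.

    Full scale [-1, 1] is divided into 2**bits steps; no dither, no clipping.
    `freq` (cycles per sample) is an arbitrary test value.
    """
    step = 2.0 / (1 << bits)
    sig_pow = err_pow = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i)
        y = step * round(x / step)      # reduced-precision sample
        sig_pow += x * x
        err_pow += (y - x) ** 2
    return 10 * math.log10(sig_pow / err_pow)

for bits in (8, 12, 16):
    print(f"{bits:2d} bits: measured {requantize_snr_db(bits):6.2f} dB, "
          f"rule of thumb {6.02 * bits + 1.76:6.2f} dB")
```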
QUOTE (Soap @ Jul 20 2009, 13:43)

I'm not sure what statement of mine you are disagreeing with. I stated that LossyWAV uses a basic perceptual model to determine audibility (though I spelled audibility wrong in that post), but that the basic technique in which LossyWAV throws away data is not itself based on perceptual encoding.
Shaving bitdepth and increasing the noise floor is a technique not dependent on an understanding of human hearing, and as such has little in common with most perceptual encoding techniques which exploit "holes" in human hearing.
You really seem totally confused.
What do you think MP3 encoders do besides determine audibility and shave off bits where they can afford to?
What do you think a perceptual model does besides finding "holes in the human hearing"?
QUOTE (Garf @ Jul 20 2009, 11:06)

What do you think MP3 encoders do besides determine audibility and shave off bits where they can afford to?
What do you think a perceptual model does besides finding "holes in the human hearing"?
I'm the one confused?
I don't know where I went wrong with my explanation - but clearly I did, for you are the third to tell me I'm wrong and then basically state what I've already stated.
I said bitdepth, not rate (though we could argue all day that it is a distinction without a difference).
LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human system has trouble hearing (unless you want to argue low-amplitude sounds are just that) due to masking and that ilk. This technique is not judging the audibility of transients, masked frequencies, etc.
The fact that LossyWAV does attempt (successfully) to determine audibility is something I mentioned in my first post. The point was, though, that LossyWAV is an example of decreasing bitdepth, which is an easy technique and can be applied blindly, can be considered a lossy encoding style, and need not be based on human perception.
QUOTE (Soap @ Jul 20 2009, 17:19)

LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human system has trouble hearing (unless you want to argue low-amplitude sounds are just that) due to masking and that ilk.
But these really are the same thing, which is why I asked you those 2 questions.
People think the MP3 encoders "remove sounds", but they do no such thing. What they really do is just add noise (by reducing bitdepth/rate to code each band!). They use a perceptual model to know where they can do this so it affects audibility the least.
There really is no difference at all between the two.
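A toy Python sketch of "adding noise by reducing the precision used to code each band". Assumptions, all hypothetical: random numbers stand in for one block of filterbank/MDCT output, three made-up bands, hand-picked step sizes, and a uniform mid-tread quantizer (real MP3/AAC quantizers are non-uniform, power-law ones, and their steps come from the psychoacoustic model and rate loop). A coarser step means more noise in that band, and coefficients smaller than half a step come out as zero.

```python
import random

def quantize_bands(coeffs, bands, steps):
    """Mid-tread quantize each band of coefficients with its own step size.

    Returns (integer indices, reconstructed coefficients).
    """
    idx = [0] * len(coeffs)
    rec = [0.0] * len(coeffs)
    for (lo, hi), q in zip(bands, steps):
        for i in range(lo, hi):
            idx[i] = round(coeffs[i] / q)
            rec[i] = idx[i] * q
    return idx, rec

rng = random.Random(0)
coeffs = [rng.gauss(0, 1.0) for _ in range(256)]   # hypothetical block of coefficients
bands = [(0, 64), (64, 128), (128, 256)]           # hypothetical band edges
steps = [0.05, 0.5, 4.0]                            # hypothetical per-band step sizes
idx, rec = quantize_bands(coeffs, bands, steps)
for (lo, hi), q in zip(bands, steps):
    noise = sum((rec[i] - coeffs[i]) ** 2 for i in range(lo, hi)) / (hi - lo)
    zeros = sum(1 for i in range(lo, hi) if idx[i] == 0)
    print(f"band {lo}-{hi}: step {q}, noise power {noise:.4f}, zeroed {zeros}/{hi - lo}")
```

The same operation reads as "adding noise" (error up to half a step per line) or as "removing sounds" (lines that round to zero), which is the point being argued.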
shadowking
Jul 20 2009, 16:49
So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.
They lowpass, they do all sorts of tricks (which their model tells them are unlikely to be audible) to improve compression efficiency. They do not simply decrease precision.
Decreasing the bitdepth across the board isn't the same game at all. There are similarities, but the differences are so great that comparisons are invalid IMHO.
And as I stated from my first post (I'll work on my clarity), LossyWAV attempts to decrease bitdepth with applied intelligence (a basic psymodel if you will), but decreasing bitdepth ("bitshaving") need not be done with the aid of a model.
SebastianG
Jul 20 2009, 17:18
QUOTE (shadowking @ Jul 20 2009, 16:49)

So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.
Exactly. Plus a little more.
This "adding noise by quantization" works if the signal is "self-dithering". In transform codecs you also have nonlinear quantization artefacts for very low signal-to-noise ratios (for "nonlinear artefacts" read "not really noise anymore"). This is where the "metallic/watery" sounds come from. Also, the really high frequencies are usually chopped off ("quantized to zero" so to speak).
Finally you can go further into the parametric coding direction (perceptual noise substitution, intensity stereo, ...)
Still, the "adding noise by quantization" idea (reduced precision) is the most important one that is involved, Soap. Pretty much anything else that is part of such a codec (filterbank, Huffman coding, etc.) I would classify as "noiseless" building blocks.
Cheers!
SG
rpp3po
Jul 20 2009, 17:41
Every AD conversion is already a lossy process. You choose quantization word length and rate according to your needs. There are plenty of use cases, where those parameters are not chosen based on perceptual models. Take two sonars, A with a very long range but coarse, B with a short range but very precise. If both systems are able to process about the same data rate, you'd choose a higher sensitivity and word length for A and a higher temporal resolution for B while sacrificing the other.
Woodinville
Jul 20 2009, 17:42
QUOTE (SebastianG @ Jul 20 2009, 09:18)

QUOTE (shadowking @ Jul 20 2009, 16:49)

So MP3 and co. use a much more aggressive method of adding noise to reduce bitrate, as the psymodel is more advanced. I think that's the way I understand it.
Exactly. Plus a little more.
This "adding noise by quantization" works if the signal is "self-dithering". In transform codecs you also have nonlinear quantization artefacts for very low signal-to-noise ratios (for "nonlinear artefacts" read "not really noise anymore"). This is where the "metallic/watery" sounds come from. Also, the really high frequencies are usually chopped off ("quantized to zero" so to speak).
Finally you can go further into the parametric coding direction (perceptual noise substitution, intensity stereo, ...)
Still, the "adding noise by quantization" idea (reduced precision) is the most important one that is involved, Soap. Pretty much anything else that is part of such a codec (filterbank, Huffman coding, etc.) I would classify as "noiseless" building blocks.
Cheers!
SG
Well, that's a bit off, I think. MP3 and the usual run of perceptual encoders change the bit resolution in a filterbank representation based on a perceptual model crossed with the bit rate requirements.
Dithering in the quantizer is not to be desired; it would introduce more noise at high frequencies. Controlling watery/jingling sounds comes about more from making sure that you don't have lines popping on and off, on and off, across successive blocks.
Some other non-perceptual methods (by the definition I use in my tutorials, i.e. those that have no explicit perceptual model, as opposed to some passive accommodation) are most any ADPCM (but not including the work done at Lucent that did perceptual noise shaping), G.722, most CELP for voice (postfiltering is a kinda-sorta-perceptual technique), APCM (does anyone really use it?), delta-mod of various kinds (CVSD, etc.), zip archiving, Unix compress ...
All non-perceptual at the heart, they don't actively change noise injection according to an explicit perceptual model.
Woodinville
Jul 20 2009, 17:45
QUOTE (Garf @ Jul 20 2009, 08:39)

QUOTE (Soap @ Jul 20 2009, 17:19)

LossyWAV is an example of decreasing bitdepth (increasing noise), not tossing content which the human system has trouble hearing (unless you want to argue low-amplitude sounds are just that) due to masking and that ilk.
But these really are the same thing, which is why I asked you those 2 questions.
People think the MP3 encoders "remove sounds", but they do no such thing. What they really do is just add noise (by reducing bitdepth/rate to code each band!). They use a perceptual model to know where they can do this so it affects audibility the least.
There really is no difference at all between the two.
Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're really doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.
And rate loops, by and large, are not strictly speaking "bit allocation", rather they allocate noise and count the bits that result.
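A minimal Python sketch of a rate loop that "allocates noise and counts the bits that result". Assumptions, all hypothetical: a zeroth-order entropy estimate stands in for the encoder's actual Huffman coding, a single global step is used for the whole block, and the starting step, 1.5 dB increment (factor 2**0.25), and budget are arbitrary choices. It is not the loop of any particular encoder.

```python
import math
import random
from collections import Counter

def estimated_bits(indices):
    """Zeroth-order entropy of the indices, in bits: an idealized stand-in for
    the Huffman/arithmetic coding a real codec would use."""
    n = len(indices)
    return -sum(c * math.log2(c / n) for c in Counter(indices).values())

def rate_loop(coeffs, bit_budget, step=1e-3, factor=2 ** 0.25, max_iter=200):
    """Allocate noise first, count bits second.

    Each pass picks a quantizer step (i.e. a noise level), quantizes, and
    counts the resulting bits; the step grows until the count fits the budget.
    """
    for _ in range(max_iter):
        idx = [round(c / step) for c in coeffs]
        bits = estimated_bits(idx)
        if bits <= bit_budget:
            return step, idx, bits
        step *= factor
    raise ValueError("bit budget not reachable within max_iter passes")

rng = random.Random(0)
coeffs = [rng.gauss(0, 1.0) for _ in range(1024)]   # hypothetical block of coefficients
budget = 2 * len(coeffs)                              # hypothetical budget: 2 bits per line
step, idx, bits = rate_loop(coeffs, budget)
print(f"step {step:.3f} -> {bits:.0f} bits (budget {budget})")
```

Note that the loop's control variable is the step (noise), and the bit count is only ever an outcome.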
QUOTE (Canar @ Jul 19 2009, 18:40)

You're asking about something that makes no sense. The point of lossy encoding is to throw out the parts that humans cannot hear. Lossy is intrinsically aimed at some target audience. Without a target audience, you cannot begin to make intelligent decisions about what data should be thrown out. Therefore, it makes no sense to consider lossy audio that has no target audience.
Sure it makes sense. Lossless compressors decrease size by taking advantage of the redundancy in the data. There is no model of perception there (although there is a degree of modelling the content).
You could take a lossless codec like FLAC and do something like add rate distortion optimization on the coded residue without using a psychoacoustic metric for distortion and instead just optimizing for the best SNR.
I'm not sure *why* you'd want this... music for dolphins perhaps.

But it's not an illogical question.
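One way to picture the suggestion above is a Lagrangian choice of quantizer step per block that minimizes D + λ·R with D measured as plain squared error, so the only "model" is SNR. This Python sketch is an assumption-laden illustration, not something FLAC does: the Gaussian stand-in for a prediction residual, the candidate steps, the λ values, and the entropy-based rate estimate are all hypothetical.

```python
import math
import random
from collections import Counter

def estimated_bits(indices):
    """Zeroth-order entropy estimate of the coded indices, in bits."""
    n = len(indices)
    return -sum(c * math.log2(c / n) for c in Counter(indices).values())

def rd_choose_step(block, steps, lam):
    """Pick the step minimizing J = D + lam * R for one block.

    D is squared error (no hearing model); R is the estimated bit count.
    """
    best = None
    for q in steps:
        idx = [round(x / q) for x in block]
        dist = sum((i * q - x) ** 2 for i, x in zip(idx, block))
        rate = estimated_bits(idx)
        cost = dist + lam * rate
        if best is None or cost < best[0]:
            best = (cost, q, dist, rate)
    _, q, dist, rate = best
    return q, dist, rate

rng = random.Random(3)
residual = [rng.gauss(0, 100.0) for _ in range(4096)]   # hypothetical prediction residual
steps = [2.0 ** (k / 2) for k in range(24)]              # hypothetical candidate steps
for lam in (1.0, 100.0, 10000.0):
    q, dist, rate = rd_choose_step(residual, steps, lam)
    snr = 10 * math.log10(sum(x * x for x in residual) / max(dist, 1e-12))
    print(f"lambda={lam:g}: step={q:.2f}, {rate / len(residual):.2f} bits/sample, SNR={snr:.1f} dB")
```

Raising λ trades SNR for bits; nothing in the cost refers to how anyone (human or dolphin) hears the result.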
SebastianG
Jul 20 2009, 20:26
QUOTE (Woodinville @ Jul 20 2009, 17:42)

Well, that's a bit off, I think. MP3 and the usual run of perceptual encoders change the bit resolution in a filterbank representation based on a perceptual model crossed with the bit rate requirements.
I focused on the parts that are lossy. Of course, this is done in some domain that is favorable w.r.t. energy compaction (or "diagonalization") and allows clever distribution of the noise.
QUOTE (Woodinville @ Jul 20 2009, 17:42)

Dithering in the quantizer is not to be desired; it would introduce more noise at high frequencies. Controlling watery/jingling sounds comes about more from making sure that you don't have lines popping on and off, on and off, across successive blocks.
I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively. You could use subtractive dithering to blend smoothly between low-SNR quantization and PNS for noisy-type signal parts. If it's supposed to sound noisy I want my artefacts to be noisy as well and not just some "lines popping on and off". Those weird high frequency artefacts can be really annoying...
QUOTE (Woodinville @ Jul 20 2009, 17:42)

Some other non-perceptual methods (by the definition I use in my tutorials, i.e. those that have no explicit perceptual model, as opposed to some passive accommodation) are most any ADPCM (but not including the work done at Lucent that did perceptual noise shaping), G.722, most CELP for voice (postfiltering is a kinda-sorta-perceptual technique), [...]
I think most CELP coders come with a code book search that applies perceptual weighting -- albeit a very simple one (that is possibly based on the LPC filter). I guess this is somewhere in the gray area between "perceptual" and what you call "passive accomodation". :-)
Cheers!
SG
Woodinville
Jul 20 2009, 21:38
QUOTE (SebastianG @ Jul 20 2009, 12:26)

I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.
It will raise the bit rate of the regions of the spectrum that are being subtractively dithered. This is not necessarily what you want or need in order to get the best result.
As to the fixed-Q noise shaping in some CELPs, I suppose you could call it an attempt. It certainly improved quality.
QUOTE (NullC @ Jul 20 2009, 11:52)

I'm not sure *why* you'd want this... music for dolphins perhaps.
That's the point I'm getting at. Unless you have a reason to do it, what sense does doing it make?
SebastianG
Jul 21 2009, 19:10
QUOTE (Woodinville @ Jul 20 2009, 21:38)

QUOTE (SebastianG @ Jul 20 2009, 12:26)

I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.
It will raise the bit rate of the regions of the spectrum that are being subtractively dithered.
I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.
Cheers!
SG
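Subtractive dithering as described above can be sketched in a few lines of Python. The assumptions: a uniform scalar quantizer with step `q`, dither drawn uniformly over one step, and encoder and decoder sharing a pseudo-random seed (the seed-passing mechanism and the function names are hypothetical). With a constant input below half a step, plain rounding always outputs zero, while the subtractively dithered reconstruction averages to the true value and its error stays within ±q/2. Bit rate is not modeled here.

```python
import random

def encode(xs, q, seed):
    """Quantize x + d, where d is pseudo-random dither spanning one step."""
    rng = random.Random(seed)
    return [round((x + rng.uniform(-q / 2, q / 2)) / q) for x in xs]

def decode(indices, q, seed):
    """Regenerate the identical dither sequence and subtract it after dequantizing."""
    rng = random.Random(seed)
    return [i * q - rng.uniform(-q / 2, q / 2) for i in indices]

q = 1.0
signal = [0.3 * q] * 50_000                  # a constant below half a step
plain = [q * round(x / q) for x in signal]   # undithered: always rounds to 0
recon = decode(encode(signal, q, seed=1234), q, seed=1234)
print("undithered mean:", sum(plain) / len(plain))
print("subtractive-dither mean:", sum(recon) / len(recon))
```

Because the decoder removes the dither again, it does not add to the final error; the error is simply the rounding error of `x + d`.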
Woodinville
Jul 21 2009, 19:30
QUOTE (SebastianG @ Jul 21 2009, 11:10)

QUOTE (Woodinville @ Jul 20 2009, 21:38)

QUOTE (SebastianG @ Jul 20 2009, 12:26)

I think you should reconsider this. I actually expect subtractive dithering to increase the perceived quality-per-bit ratio when applied correctly and selectively.
It will raise the bit rate of the regions of the spectrum that are being subtractively dithered.
I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.
Cheers!
SG
Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE. Not the noise level, the BIT RATE, and ergo raise the noise level elsewhere because you'll have to back down on bits elsewhere.
The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros, and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.
C.R.Helmrich
Jul 21 2009, 23:09
QUOTE (Woodinville @ Jul 21 2009, 20:30)

Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE. Not the noise level, the BIT RATE, and ergo raise the noise level elsewhere because you'll have to back down on bits elsewhere.
The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros, and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.
Agreed. I recently took a look at the number of MDCT lines which are not quantized to zero in AAC, and even at relatively high bit rates (in terms of bits per sample), the number is surprisingly low. I would say that for certain bit rates, almost 90% of the high-frequency part of the spectrum is quantized to zero. Please correct me if I'm wrong, but I think that, depending on the PDF of the dither, dithering would lower this to 50%.
Going back to earlier posts: I never became friends with the (apparently common) notion that transform coders add noise which is inaudible. I also prefer calling it "removal of spectral and temporal acoustic information and trying to conceal this removal from the human listener".
Time-domain coders, on the other hand, usually introduce wide-band noise which they then try to hide psychoacoustically by means of spectral (and temporal) noise shaping. That's also what LossyWAV does.
Chris
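To make the zero-count argument concrete, here is a rough Python measurement under assumed conditions: Laplacian-distributed "high-frequency lines" whose scale is a fifth of the quantizer step, a plain mid-tread quantizer, and the same quantizer with uniform dither added before rounding. It reports the fraction of zero indices and a zeroth-order entropy estimate per line; that estimate treats the indices as if the entropy coder did not know the dither. The distribution, scale, seeds, and names are illustrative choices, not measurements from any AAC encoder.

```python
import math
import random
from collections import Counter

def laplacian(n, scale, rng):
    """Hypothetical stand-in for small high-frequency spectral lines."""
    return [rng.choice((-1, 1)) * rng.expovariate(1 / scale) for _ in range(n)]

def zeroth_order_entropy(indices):
    """Bits per index for an entropy coder that ignores any dither."""
    n = len(indices)
    return -sum(c / n * math.log2(c / n) for c in Counter(indices).values())

def stats(indices):
    zeros = sum(1 for i in indices if i == 0) / len(indices)
    return zeros, zeroth_order_entropy(indices)

rng = random.Random(42)
q = 1.0
coeffs = laplacian(100_000, 0.2 * q, rng)
plain = [round(c / q) for c in coeffs]
dith_rng = random.Random(7)
dithered = [round((c + dith_rng.uniform(-q / 2, q / 2)) / q) for c in coeffs]
print("undithered: zero fraction %.3f, %.3f bits/line" % stats(plain))
print("dithered:   zero fraction %.3f, %.3f bits/line" % stats(dithered))
```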
hellokeith
Jul 22 2009, 02:06
QUOTE (Woodinville @ Jul 20 2009, 11:45)

Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're realy doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.
I still don't really understand the difference between noise shaping and dither, which I think is related to what you are saying, but is "adding noise" a function/goal, or is it just a by-product of removing information? It seems counter-productive to add bit rate in order to mask some destructive operation you have done on the audio data.
knutinh
Jul 22 2009, 16:52
The last page seems to be focused on nitty gritty details.
To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?
Now, the implicit/explicit model may be good or bad, leading to better or worse compromises between rate and perceptual distortion. Even an analog tape recorder with no Dolby noise reduction has some perceptually sensible use of the physics of the magnetic media.
-k
Woodinville
Jul 23 2009, 17:05
QUOTE (hellokeith @ Jul 21 2009, 18:06)

QUOTE (Woodinville @ Jul 20 2009, 11:45)

Well, in that many lines in the spectrum are reduced to zero bits, and thereby zeroed in the spectrum, this is a way of removing parts of the signal. What you're realy doing is removing INFORMATION that isn't audible, by injecting noise through the process of using large quantizer step sizes.
I still don't really understand the difference between noise shaping and dither, which I think is related to what you are saying, but is "adding noise" a function/goal, or is it just a by-product of removing information? It seems counter-productive to add bit rate in order to mask some destructive operation you have done on the audio data.
Well, in the classical sense I'm talking about neither. You could think of what a perceptual coder does as signal-dependent noise shaping, but that's a gross oversimplification.
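For the classical (non-perceptual) sense of the two terms, a small Python sketch may help; the test signal, step size, seed, and helper names are assumptions. Dither (here triangular-PDF, added before rounding) makes the quantization error behave like signal-independent noise; noise shaping (here a fixed first-order error-feedback loop) does not remove that noise but moves its spectrum, so the total error grows while the low-frequency error shrinks. A perceptual coder would instead make the shaping signal-dependent, which this sketch does not attempt.

```python
import math
import random

def tpdf(rng):
    """Triangular-PDF dither spanning +/-1 quantizer step."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def quantize_plain(xs, q, rng):
    """Dithered rounding only: the error is signal-independent and spectrally flat."""
    return [q * round(x / q + tpdf(rng)) for x in xs]

def quantize_shaped(xs, q, rng):
    """The same dithered quantizer inside a first-order error-feedback loop.

    Each new input has the previous quantizer error subtracted, so the output
    error is e[n] - e[n-1]: its spectrum is tilted toward high frequencies.
    """
    out, e_prev = [], 0.0
    for x in xs:
        v = x - e_prev
        y = q * round(v / q + tpdf(rng))
        e_prev = y - v
        out.append(y)
    return out

def lowband_error_rms(xs, ys, block=64):
    """Crude low-frequency error measure: RMS of the error summed over blocks."""
    sums = [sum(y - x for x, y in zip(xs[s:s + block], ys[s:s + block]))
            for s in range(0, len(xs) - block + 1, block)]
    return math.sqrt(sum(v * v for v in sums) / len(sums))

rng = random.Random(5)
q = 1 / 128
xs = [0.4 * math.sin(2 * math.pi * 0.002 * n) for n in range(16_384)]
plain = quantize_plain(xs, q, rng)
shaped = quantize_shaped(xs, q, rng)
for name, ys in (("plain", plain), ("shaped", shaped)):
    total = math.sqrt(sum((y - x) ** 2 for x, y in zip(xs, ys)) / len(xs))
    print(f"{name:6s}: total error {total / q:.2f} steps RMS, "
          f"low-band error {lowband_error_rms(xs, ys) / q:.2f}")
```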
Woodinville
Jul 23 2009, 17:07
QUOTE (knutinh @ Jul 22 2009, 08:52)

The last page seems to be focused on nitty gritty details.
To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?
Now, the implicit/explicit model may be good or bad, leading to better or worse compromises between rate and perceptual distortion. Even an analog tape recorder with no Dolby noise reduction has some perceptually sensible use of the physics of the magnetic media.
-k
Well, as I said, I tend to reserve the term "perceptual encoder" for encoders that have a specific, explicit perceptual model.
This is to separate them from LMS coders, that may have some fixed or signal-dependent noise shaping based on rule, or from homomorphic models.
hellokeith
Jul 24 2009, 02:51
QUOTE (knutinh @ Jul 22 2009, 10:52)

To reply to the thread starter:
Isn't it kind of obvious that if you have a lossy encoding, meaning that errors are introduced to the signal in order to save bits, then any implementation will have some error characteristic, and that characteristic is going to be the result of some implicit or explicit model of the application that is to use the decoded data (such as a human listener)?
For example's sake, say you are using a microphone to measure/record air pressure and atmospheric qualities. That analog-to-digital data must be stored, and to be stored it must be encoded. A human ear may never even listen to that data, as it may go into some processing application. So if a lossy encoding scheme is employed in this case, it would be difficult in my mind to describe it as perceptual.
I didn't really want to get knee-deep in examples, because I don't feel I'm knowledgeable enough on digital audio to properly explain enough examples to encompass my original query, but hopefully you get the gist.
bryant
Jul 24 2009, 04:02
What you are talking about makes perfect sense to me. There are many types of sampled analog data that have nothing to do with human audio perception. There is your example of sounds which are basically too low in frequency to hear, or seismic data (which are basically sounds in the ground), or even non-audio data like EKGs. I'm sure there are also cases where perfectly audible sounds are being used for some purpose other than listening (like scientific study or process control) where regular perceptual encoders might destroy important (but likely inaudible) details.
For data like this an obvious first choice would be lossless encoders, and I have been approached about using WavPack for a whole host of non-audio applications (especially because WavPack can handle floating point data). But I'm sure there are also situations in which non-perceptual lossy encoding would be appropriate, but the characteristics of the signal and the relevant information would have to be taken into account before knowing what would work.
2Bdecided
Jul 24 2009, 10:10
I think that last line hits the point.
Any lossy encoder works on the basis that a smaller difference is better - the question is what domain you measure that difference in - or how you calculate it.
The calculation may have a little perceptual relevance, a lot, or none at all. In truth it's quite hard to have none at all - once you're looking in a log, power, or floating-point domain, you could argue it's got some basis in perception - though you could also argue that the world is logarithmic, and linear calculations are somewhat artificial/unnatural. (That argument could get quite philosophical!)
So on this basis, the line between perceptual and non-perceptual becomes quite fuzzy.
Much clearer is to look at the various limitations of human hearing, and to say in what way, if at all, a given codec exploits them. If there is a cut off, it's between the codecs that (at least) dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking, and those that don't.
That's what I'd call a perceptual codec, but it's probably abusing the term. I can't think of a simple term that properly nails this distinction though!
mp2, mp3, aac, vorbis, musepack etc are "codecs that dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking".
lossyWAV (at present), NICAM, WavPack (most / ?all? lossy modes), ADPCM, a-law, u-law etc are not "codecs that dynamically spectrally shape the coding noise related to some estimate of short-term spectral masking"
Some of those codecs in the second list have things like log or floating point representation of sample values, fixed noise shaping related to hearing threshold, and even something approaching a spectral masking calculation - these are perceptually related tricks - but the codecs don't dynamically spectrally shape the coding noise.
It's one useful distinction.
Cheers,
David.
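As an example of a fixed, log-domain "perceptually related trick" from the second list, here is a Python sketch of μ-law companding. It uses the continuous μ-law curve with μ = 255; actual G.711 uses a segmented, piecewise-linear approximation of it, so treat this as a simplification, and the helper names as hypothetical. Quantizing uniformly in the companded domain keeps the relative precision roughly constant across levels, but the noise is never reshaped according to the signal's spectrum.

```python
import math

MU = 255.0  # mu value of G.711 mu-law

def mu_compress(x):
    """Continuous mu-law characteristic, mapping [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mu_law_roundtrip(x, bits=8):
    """Quantize uniformly in the companded (log-like) domain, then expand."""
    step = 2.0 / (1 << bits)
    return mu_expand(step * round(mu_compress(x) / step))

def linear_roundtrip(x, bits=8):
    """Plain uniform quantization with the same number of bits."""
    step = 2.0 / (1 << bits)
    return step * round(x / step)

for x in (0.001, 0.01, 0.1, 0.9):
    print(f"x={x:<6}  mu-law err {abs(mu_law_roundtrip(x) - x):.6f}  "
          f"linear err {abs(linear_roundtrip(x) - x):.6f}")
```

Quiet samples get much finer steps than with linear 8-bit coding, and loud samples coarser ones, whatever the music happens to be doing.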
SebastianG
Jul 26 2009, 17:42
QUOTE (Woodinville @ Jul 21 2009, 19:30)

QUOTE (SebastianG @ Jul 21 2009, 11:10)

I think you're missing something. The idea of subtractive dithering is to use the same dither signal on both sides (encoder and decoder). This has two implications: Since the noise is subtracted again at the decoder side "it is not part of" the overall error (--> higher SNR compared to additive dithering using the same linear quantizer). Also, it allows you to account for shifted PDFs the noiseless coding stage has to deal with. Actually, selecting code books (same quantizer step size, different offsets) in a pseudo-random fashion is equivalent to subtractive dithering.
Yes, I understand subtractive dither. HOWEVER, it will mean that there are more quantized values that are non-zero, and in doing so, will raise the BIT RATE.
I think you missed my point. I'm not considering an entropy coder that ignores the known dither signal. The entropy coder I would use accounts for the known dither signal.
QUOTE (Woodinville @ Jul 21 2009, 19:30)

The problem is not that it adds noise, but that it adds BIT RATE. Consider: if I have something that is 7/8 zeros, and I dither, I'm going to be back to 1 bit per line vs. 0.2 bits per line. At high frequencies, that is a bleepin' lot of bits to lose at low rates.
Let's assume we have a memoryless source which emits symbols from a 3-symbol alphabet, "zero", "plusone", and "minusone", with probabilities 7/8, 1/16, 1/16. The entropy is 0.669 bits. That's much more than 0.2 bits.
Also, this is a good example of what I was talking about w.r.t. sound of the artefacts and "linearity". If it is supposed to sound noisy and we don't have "enough bitrate" the undithered mid-tread scalar quantizer will sound awful.
Cheers!
SG
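The entropy figure above is easy to check. This short Python sketch (the helper name is hypothetical) computes the Shannon entropy of the stated memoryless source, and also splits it into a binary "zero or not" decision plus one sign bit for the 1/8 of symbols that are non-zero.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits per symbol, of a memoryless source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

h = entropy_bits([7 / 8, 1 / 16, 1 / 16])
print(f"{h:.3f} bits per symbol")   # prints: 0.669 bits per symbol

# Same figure split into "is it zero?" plus one sign bit for the 1/8 non-zero cases.
h_split = entropy_bits([7 / 8, 1 / 8]) + 1 / 8
```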
Woodinville
Jul 27 2009, 02:59
QUOTE (SebastianG @ Jul 26 2009, 09:42)

Let's assume we have a memoryless source which emits symbols from a 3-symbol alphabet, "zero", "plusone", and "minusone", with probabilities 7/8, 1/16, 1/16. The entropy is 0.669 bits. That's much more than 0.2 bits.
Also, this is a good example of what I was talking about w.r.t. sound of the artefacts and "linearity". If it is supposed to sound noisy and we don't have "enough bitrate" the undithered mid-tread scalar quantizer will sound awful.
Cheers!
SG
The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.
As to the mid-tread quantizer sounding awful, that's what all the best coders use.
SebastianG
Jul 27 2009, 08:06
QUOTE (Woodinville @ Jul 27 2009, 02:59)

The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.
What do you mean by "capture more of the signal"? Please explain.
QUOTE (Woodinville @ Jul 27 2009, 02:59)

As to the mid-tread quantizer sounding awful, that's what all the best coders use.
OK. So, that's proof that you can't do better, right?
Check out JPEG2000 part 2, specifically the trellis-coded quantization part. Check out "CELT" (Jean-Marc Valin's new low-delay full-bandwidth speech coder).
Cheers!
SG
Woodinville
Jul 27 2009, 17:30
QUOTE (SebastianG @ Jul 27 2009, 00:06)

QUOTE (Woodinville @ Jul 27 2009, 02:59)

The problem with your point is that if you capture more of the signal, you have more entropy to code, no matter how you cut it.
What do you mean by "capture more of the signal"? Please explain.
Consider: when you capture the information below half a quantizer step, you are adding information to the bitstream.
Yes. Really.
QUOTE
QUOTE (Woodinville @ Jul 27 2009, 02:59)

As to the mid-tread quantizer sounding awful, that's what all the best coders use.
OK. So, that's proof that you can't do better, right?
Check out JPEG2000 part 2, specifically the trellis-coded quantization part. Check out "CELT" (Jean-Marc Valin's new low-delay full-bandwidth speech coder).
Cheers!
SG
Well, the issue really is "how compressible" (in the noiseless sense) is the data. There is a tradeoff, and the results to date seem pretty clear for audio.