Full Version: Lossy-based lossless coding
Hydrogenaudio Forums > Hydrogenaudio Forum > Scientific Discussion
Efenstor
I've got a rather stupid idea: has anybody tried using wavelet/DCT/FFT-based compression as a "point of reference" for lossless compression?

I mean, you compress the high frequencies using MP3-like but NOT psychoacoustic coding, then decode the frame, compute the difference from the original uncompressed frame, and compress that difference with some strong compression, storing it together with the lossy frame.

I think in any case such compression would be much more effective than La/Monkey's Audio/Optimfrog.

Or maybe those encoders are using it already?
JohnV
Wavpack hybrid compression can do this.
There's not any gain in efficiency though.
Efenstor
No gain? It's impossible! For wave data frequency representation is much more efficient than byte-by-byte. They probably used a psychoacoustic approach in their hybrid compression, but it should be oriented toward compression effectiveness, not toward the human ear! For example: the low and mid-range frequencies can be compressed well enough using byte-by-byte methods, so use DCT or wavelets mostly for the high frequencies.

It must give some gain anyway! I compared files compressed using MP3 128 kbps with original using WaveLab's comparison function: differences are rather subtle (at least for 16-bit data).

And what about a multi-stage approach: the differences are also compressed using wavelets, then unpacked and compared with the original differences, then those second-stage differences are compressed using the Huffman method (with delta compression) and/or maybe some dictionary-based method, and so on.
robUx4
I think this should be investigated more. Imagine HE-AAC with the difference losslessly encoded. I think it's possible to gain efficiency compared to usual lossless codec.
Garf
QUOTE(Efenstor @ May 15 2004, 06:52 PM)
No gain? It's impossible! For wave data frequency representation is much more efficient than byte-by-byte.

Not if it has to decode back identically to the 'byte-by-byte' version.
Garf
QUOTE(Efenstor @ May 15 2004, 06:52 PM)
It must give some gain anyway! I compared files compressed using MP3 128 kbps with original using WaveLab's comparison function: differences are rather subtle (at least for 16-bit data).

Try to losslessly encode the difference and look at that bitrate. It'll be surprisingly high. Now add the mp3's bitrate.

Compare to plain lossless.
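
The comparison Garf suggests can be sketched in a few lines. This is only a toy under loud assumptions: `toy_lossy` is a hypothetical stand-in that drops low bits (it is not MP3 or any real codec), the test signal is synthetic, and zlib stands in for a lossless audio coder. It shows the bookkeeping (lossy layer + residual must add back to the original exactly), not realistic bitrates.

```python
import math
import random
import struct
import zlib


def make_signal(n=4096, seed=0):
    """Synthetic 16-bit test signal: a 440 Hz tone plus light noise (not real music)."""
    rng = random.Random(seed)
    return [
        int(8000 * math.sin(2 * math.pi * 440 * i / 44100)) + rng.randint(-50, 50)
        for i in range(n)
    ]


def toy_lossy(samples, shift=6):
    """Hypothetical stand-in for lossy encode+decode: discard the low `shift` bits."""
    return [(s >> shift) << shift for s in samples]


def split(samples, shift=6):
    """Return (lossy approximation, residual) with lossy + residual == samples."""
    lossy = toy_lossy(samples, shift)
    residual = [s - a for s, a in zip(samples, lossy)]
    return lossy, residual


def merge(lossy, residual):
    """Rebuild the original samples from the two layers."""
    return [a + r for a, r in zip(lossy, residual)]


def compressed_size(samples):
    """Bytes after packing as 16-bit little-endian PCM and running zlib."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return len(zlib.compress(raw, 9))


if __name__ == "__main__":
    original = make_signal()
    lossy, residual = split(original)
    assert merge(lossy, residual) == original  # the scheme must be exactly reversible
    print("plain lossless:  ", compressed_size(original))
    print("lossy + residual:", compressed_size(lossy) + compressed_size(residual))
```

Running it prints the two totals side by side; the numbers depend entirely on the toy signal and on zlib, so they say nothing about real codecs beyond showing how the comparison is set up.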
Efenstor
It is because psychoacoustic-based lossy compression has been used! It should be optimized to simplify compression of the difference data! It should not be psychoacoustic, it should be "compressionacoustic".

The major idea of good compression is to predict what byte will be next and use it as the point of reference to reduce data which corrects the predicted value. I think that "bad quality" audio can be used as a point of reference very efficiently.

Anyway audio compression is not my profession, I'm a desktop programmer (mostly image processing and file utilities), I think someone who made some lossless codec would experiment a bit with this approach.

Interesting, does any lossless codec use a dictionary-based compression (i.e. like WinRAR or WinAce)?
Garf
QUOTE(Efenstor @ May 15 2004, 07:38 PM)
I think that "bad quality" audio can be used as a point of reference very efficiently.

Why would it be?
Efenstor
Because it reduces the range of difference and, if it's compressionacoustic, makes the difference more smooth and thus easily compressed.

Sure, noise is unpredictable, but is there so much loud white noise?
Efenstor
I dunno, maybe it was a stupid idea indeed. But no one has tried. Maybe there is something in it yet.
SebastianG
QUOTE(Efenstor @ May 15 2004, 09:57 AM)
I dunno, maybe it was a stupid idea indeed. But no one has tried. Maybe there is something in it yet.


Check out the LTAC thread

bye,
Sebi
Efenstor
Mostly what I meditated on. Though clearly my thought was to research compressionacoustics. After all, all sound is a bunch of sines, so it may really be possible to find out which frequencies give minimum differences when extracted, re-created and compared with the original wave. At least in most cases.
Ivan Dimkovic
There is a devil called "entropy of the source" - whatever you do with the source, it is mathematically impossible to losslessly code a signal in such a way that it requires fewer bits than its own entropy. Discussing this is more like inventing a perpetual motion machine or trisecting the angle, better forget about that :)

A hybrid lossless codec implemented perfectly will be a percent or a few percent less efficient than plain lossless due to signalling overhead.
Pinky's brain
Besides, if you use an invertible transform why not simply losslessly code the coefficients?

AFAICS the best performing lossless audio coders use LMS IIR prediction, and indeed multistage compression is almost always inefficient.

See :
http://edocs.tu-berlin.de/diss/2003/kim_jonghwa.pdf (easy to read, nothing too complex here)
http://www.eurecom.fr/~mary/publications.html (uses multistage compression, but math is too dense for me to get my head around)

Lossless image coders are moving to using Bayesian techniques using multiple predictors, using the history of prediction errors to determine the "weighting" of each predictor (the simplest way is to do straight weighting of the predictions or choose the prediction with the highest weight and then use traditional entropy coding assuming a Laplacian or GG distribution ... but in theory you could compute a compound probability distribution for the actual coefficient and use that for coding, instead of using differential coding).

If I (as a complete layman I should add) wanted to improve lossless audio coding I would go the same way. Use multiple LMS trained IIR predictors with different adaptation rates, and maybe a library of pretrained predictors, and use the history of prediction errors of each to weight them.
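
A much simplified sketch of the "use the error history to pick among predictors" idea, for illustration only. It swaps in two fixed integer predictors (previous sample, and linear extrapolation from the last two samples) instead of LMS-trained IIR filters, and it picks the one with the smaller smoothed error rather than blending them. All names and the `decay_shift` parameter are hypothetical. The point it shows is that the decoder can repeat the same choice because it sees the same history, so no side information is needed.

```python
def _predictions(history):
    """Two fixed integer predictors (assumed here purely for illustration)."""
    prev = history[-1] if history else 0
    extrapolated = 2 * history[-1] - history[-2] if len(history) >= 2 else prev
    return (prev, extrapolated)


def _run(values, decoding, decay_shift=3):
    """Shared encoder/decoder loop so both sides update their state identically."""
    history, out = [], []
    err = [0, 0]  # smoothed absolute error of each predictor (integers only)
    for v in values:
        preds = _predictions(history)
        best = 0 if err[0] <= err[1] else 1  # choice depends only on past samples
        if decoding:
            sample = v + preds[best]
            out.append(sample)
        else:
            sample = v
            out.append(sample - preds[best])
        for i, p in enumerate(preds):
            err[i] += abs(sample - p) - (err[i] >> decay_shift)
        history.append(sample)
    return out


def encode(samples):
    """Turn samples into prediction residuals."""
    return _run(samples, decoding=False)


def decode(residuals):
    """Invert encode() exactly."""
    return _run(residuals, decoding=True)


if __name__ == "__main__":
    ramp = list(range(0, 100, 5))
    print(encode(ramp))
    assert decode(encode(ramp)) == ramp
```

On a steady ramp the extrapolating predictor quickly wins and the residuals drop to zero; a real coder would then entropy-code those residuals.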
cabbagerat
I decided to test this out, so I wrote a shell script that:
  • Encodes a wave with LAME
  • Subtracts the resultant mp3 from the original
  • Flac compresses the resulting difference file
  • tar.bz2 compresses the two files (MP3 and FLAC difference) together
This process generally produces output that is larger than the FLAC of the original file, but within 20% of it. Some samples do REALLY badly (like BigYellow from the current 128 kbps test - the compressed version is bigger than the original) and I haven't yet found a sample that does better than straight FLAC with this process. Interestingly, compression rates vary wildly with the individual sample and LAME settings (some samples get best compression with LAME --preset insane, others with --preset 64). While this is hardly a scientifically interesting test, it was relatively interesting to write and test.

Ivan Dimkovic is right about information entropy. Lossless audio encoders are a very interesting case here - as they can losslessly compress a sound file into a smaller number of bits than its information entropy predicts. While this sounds impossible, it's not as some of the information contained in the waveform is actually contained in the decoding algorithm. Thus lossless audio compressors will do very badly with the vast majority of waveforms, but very well with ones that represent sensical audio data.

A good place to start if you don't know anything about the subject is here:
http://en.wikipedia.org/wiki/Information_theory
http://en.wikipedia.org/wiki/Information_entropy
http://en.wikipedia.org/wiki/Kolmogorov_complexity
robUx4
Mmm, I don't think that you can compress below the entropy. Or I don't know how you compute it. But given the nature of sound, you can predict that the next sample will be somewhat like the previous one. It's this difference that needs to be encoded... Of course with pure electronic music things get harder since the signal doesn't necessarily have this "sound" nature...

The problem with this approach (compressing the difference) is that it's like encoding pure noise. All the "sound" nature of the signal is removed and nothing remains but noise... But when the lossy codecs get closer and closer to the source with a small bitrate, this noise can be compressed to fewer bits (imagine if the difference is never bigger than 8 bits in a 16-bit source, you already get 50% compression for free). So IMO this should be investigated a bit further. Even with MP3, if you choose 320 kbps rather than 64 kbps, you spend more bitrate on the lossy codec. That's why I think that HE-AAC could be a good candidate in this case. The result is not so far from the original, compared to the bitrate.

Anyone willing to do a test with MP3 and/or Vorbis + FLAC and/or ZIP at different bitrates ?

Or is there any command-line tool to compute the difference of 2 WAV files ? (maybe I should code one)...
NumLOCK
QUOTE(cabbagerat @ May 17 2004, 11:36 AM)
Ivan Dimkovic is right about information entropy.

Indeed. :)

QUOTE
Lossless audio encoders are a very interesting case here - as they can losslessly compress a sound file into a smaller number of bits than its information entropy predicts.


Wrong ! The reason why you can usually compress a sound file, is because PCM coding needs more bits than what the actual entropy of the data would require.

Therefore when you code the data in a more clever way (ie: remove redundant data), you will approach that theoretical limit (which is a hard limit, determined by the real entropy value).

QUOTE
While this sounds impossible, it's not as some of the information contained in the waveform is actually contained in the decoding algorithm.

Wrong.. this is a misconception.
Even with a 10GB algorithm which compresses a 1MB sound file, the worst-case compression (with a clever algorithm) would be 1MB + 1 bit, and the best-case compression (with the same algorithm) *could* be 1 bit.

A 1-bit compressed file contains just the information: "YES, it is the exact data which is known to the algorithm". Unfortunately it can only be done with one set of input data. And it will expand all other possibilities by 1 bit (1st bit, which will say "NO, this is something else").

When you take more space for the algorithm you can make it more clever, but still, all data-dependent information will be in the compressed file. Otherwise you cannot unpack the files.

QUOTE
Thus lossless audio compressors will do very badly with the vast majority of waveforms, but very well with ones that represent sensical audio data.

Yes, because:
- sensical audio data is redundant (ie: contains correlation)
- and: the algorithm is made to take advantage of these correlations.

Unfortunately: you cannot make a compression algorithm which is good on any (i.e. random) data, so you will expand random data by at least 1 bit, no matter how complex and clever the algorithm is.

Edit: By the way, if you could make an algorithm which reduces any data by just 1 bit, you could also use it several times, and therefore compress anything down to zero bits :D
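
The counting argument behind this can be checked directly. A minimal sketch (function names are made up for illustration): there are 2^n bit strings of exactly n bits, but only 2^n - 1 strings that are strictly shorter, so no lossless scheme can map every n-bit input to a shorter output without two inputs colliding.

```python
def count_strings(n_bits):
    """Number of distinct bit strings of exactly n_bits bits."""
    return 2 ** n_bits


def count_shorter_strings(n_bits):
    """Number of distinct bit strings with fewer than n_bits bits (empty string included)."""
    return sum(2 ** k for k in range(n_bits))


def can_shrink_every_input(n_bits):
    """True only if there are enough shorter outputs for every n-bit input."""
    return count_shorter_strings(n_bits) >= count_strings(n_bits)


if __name__ == "__main__":
    for n in (1, 8, 16):
        print(n, count_strings(n), count_shorter_strings(n), can_shrink_every_input(n))
```

Whatever the length, the shorter strings are always exactly one too few, which is the same "1 bit" that shows up in the argument above.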
cabbagerat
QUOTE
By the way, if you could make an algorithm which reduces any data by just 1 bit, you could also use it several times, and therefore compress anything down to zero bits

Which would be very cool. I am really looking forward to the new LAME --preset onebit. :)
QUOTE
Wrong ! The reason why you can usually compress a sound file, is because PCM coding needs more bits than what the actual entropy of the data would require.

I didn't know that and find it very interesting. Do you have any references? I am not doubting you, I would just like to know more.
QUOTE
A 1-bit compressed file contains just the information: "YES, it is the exact data which is known to the algorithm". Unfortunately it can only be done with one set of input data. And it will expand all other possibilities by 1 bit (1st bit, which will say "NO, this is something else").
When you take more space for the algorithm you can make it more clever, but still, all data-dependent information will be in the compressed file. Otherwise you cannot unpack the files.

I agree completely. However, your 1-bit file contains, obviously, precisely one bit of information (in the Shannon sense). However the original contained more - which shows that the compressor decreased the information entropy of the original file. This information can't, by definition, disappear - it's stored in the algorithm itself. There is as much (or more) information in (Algorithm+Compressed File) as in (Original File). If there weren't, then there couldn't be a 1->1 mapping between compressed files and uncompressed files.

Suppose I copy all my MP3s (and OGGs, MPCs, etc) into a big database indexed by their MD5 sum. Then I replace all my MP3 files with 128 bit binary files containing this sum. I have "compressed" my MP3 collection to a couple of hundred kilobytes. However that doesn't mean that they would be any good without the database (part of the algorithm), which runs to tens of gigabytes.
QUOTE
The result is not so far from the original, compared to the bitrate.

The result isn't very far, perceptually, from the original - but that doesn't mean that there isn't still a huge amount of data in the difference file. Some of the frequencies in the original will not reduce substantially in amplitude if the encoded file were subtracted from the original.
robUx4
QUOTE(cabbagerat @ May 17 2004, 06:34 PM)
QUOTE
Wrong ! The reason why you can usually compress a sound file, is because PCM coding needs more bits than what the actual entropy of the data would require.

I didn't know that and find it very interesting. Do you have any references? I am not doubting you, I would just like to know more.

The entropy of digital silence is close to nothing (1 bit ?). And it can be encoded at 24 bits 192kHz.
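
The silence example can be put into numbers with a crude estimate. The sketch below computes a zeroth-order empirical entropy, which is an assumption on my part: it treats samples as independent and ignores correlation between neighbours, so it is only a rough indication of how many bits per sample the content needs compared with what the PCM container spends.

```python
import math
from collections import Counter


def entropy_bits_per_sample(samples):
    """Zeroth-order empirical entropy in bits per sample (ignores inter-sample correlation)."""
    n = len(samples)
    counts = Counter(samples)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    silence = [0] * 192000            # one second of digital silence at 192 kHz
    four_levels = [0, 1, 2, 3] * 1000  # four equally likely values
    print("silence:    ", entropy_bits_per_sample(silence), "bits/sample vs 24 in the container")
    print("four levels:", entropy_bits_per_sample(four_levels), "bits/sample")
```

By this estimate the silence needs no bits per sample at all, so only something like its length has to be stored, while PCM still spends the full word size on every sample.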
robUx4
QUOTE(cabbagerat @ May 17 2004, 06:34 PM)
QUOTE
The result is not so far from the original, compared to the bitrate.

The result isn't very far, perceptually, from the original - but that doesn't mean that there isn't still a huge amount of data in the difference file. Some of the frequencies in the original will not reduce substantially in amplitude if the encoded file were subtracted from the original.

I know, but IMO it would be nice to run a few tests to confirm this really is the wrong way.
cabbagerat
QUOTE
I know, but IMO it would be nice to run a few tests to confirm this really is the wrong way.

Fair enough. I have done some fiddling myself, but with exams coming up I don't have time to do any conclusive experiments. It would be cool if somebody did them though.

If anybody is interested, they can get my script and sources from here: http://users.smuts.uct.ac.za/~mbrooker/lossyless.tar.gz
Nothing that most here couldn't put together in half an hour, though.
robUx4
How do you do the WAV subtraction? That's all I'm missing (I've started coding such a utility called pcm-)...

edit: I see you do a basic difference on raw data. I was thinking about doing it with WAV (with my own WAV classes). But if LAME, OGG and FLAC can handle raw audio, that's fine.
Pamel
QUOTE(cabbagerat @ May 17 2004, 12:34 PM)
I agree completely. However, your 1-bit file contains, obviously, precisely one bit of information (in the Shannon sense). However the original contained more - which shows that the compressor decreased the information entropy of the original file. This information can't, by definition, disappear - it's stored in the algorithm itself. There is as much (or more) information in (Algorithm+Compressed File) as in (Original File). If there weren't, then there couldn't be a 1->1 mapping between compressed files and uncompressed files.

I'm happy to see that someone finally brought up the (Algorithm+Compressed File) problem. There is a theory of compression that goes something like this:

QUOTE
It is impossible to make a compression algorithm that can compress random data where the resulting compressed data plus the size of the algorithm is smaller than the original data.

Given this, how is anything compressed losslessly? Well, most data isn't very random. Images and sounds have distinct patterns that allow them to be compressed by programs that depend on those patterns. If you try to compress data that does not contain those patterns, your resulting file plus the size of the algorithm will always be larger than the original file.

Now, how does that apply to the poster's question? Well, if you compress data (audio data in this case) in a lossy manner, then take the difference of the lossy data from the original, the resulting file will be lacking those patterns that are needed for good compression. In fact, the data contained in the diff will be nearly random. So, bad compression.

More information about this is available from the Comp.Compression FAQ, or you can ask directly at the Comp.Compression usenet group.

IIRC, there was a prize being offered at one time to anyone that could break this 'Law of Compression'.
Pamel
Found it. Here is a Wikipedia reference to the challenge. And here is the quote from the FAQ:
QUOTE
Steve Tate <srt@cs.unt.edu> suggests a good challenge for programs
that are claimed to compress any data by a significant amount:

    Here's a wager for you: First, send me the DEcompression algorithm.  Then I
    will send you a file of whatever size you want, but at least 100k.  If you
    can send me back a compressed version that is even 20% shorter (80k if the
    input is 100k) I'll send you $100.  Of course, the file must be able to be
    decompressed with the program you previously sent me, and must match
    exactly my original file.  Now what are you going to provide
    when... er... if you can't demonstrate your compression in such a way?

So far no one has accepted this challenge (for good reasons).

Mike Goldman <whig@by.net> makes another offer:

    I will attach a prize of $5,000 to anyone who successfully meets this
    challenge.  First, the contestant will tell me HOW LONG of a data file to
    generate.  Second, I will generate the data file, and send it to the
    contestant.  Last, the contestant will send me a decompressor and a
    compressed file, which will together total in size less than the original
    data file, and which will be able to restore the compressed file to the
    original state.

    With this offer, you can tune your algorithm to my data.  You tell me the
    parameters of size in advance.  All I get to do is arrange the bits within
    my file according to the dictates of my whim.  As a processing fee, I will
    require an advance deposit of $100 from any contestant.  This deposit is
    100% refundable if you meet the challenge.


Now, I am not saying that what Efenstor says is impossible, just not with the tools available. You would need to use an algorithm that produced a file that was very close to the original. This means that any codecs that use any type of "psychoacoustic-based approach" would need to be skipped since the data removed by this approach would create more "random" data for the diff. With an accurate enough file, you could create a diff that was mostly zeros. Then to compress that, you wouldn't want to use a lossless audio codec as those are tuned to audio needs. You would need to use an algorithm tuned to the data produced by these diffs that was mostly zeros.

This MIGHT be an efficient way to do it, but testing would be needed to see for sure, and that is a lot of work to really do it right.
NumLOCK
QUOTE(cabbagerat @ May 17 2004, 06:34 PM)
I agree completely. However, your 1-bit file contains, obviously, precisely one bit of information (in the Shannon sense). However the original contained more - which shows that the compressor decreased the information entropy of the original file. This information can't, by definition, disappear - it's stored in the algorithm itself. There is as much (or more) information in (Algorithm+Compressed File) as in (Original File). If there weren't, then there couldn't be a 1->1 mapping between compressed files and uncompressed files.

Suppose I copy all my MP3s (and OGGs, MPCs, etc) into a big database indexed by their MD5 sum. Then I replace all my MP3 files with 128 bit binary files containing this sum. I have "compressed" my MP3 collection to a couple of hundred kilobytes. However that doesn't mean that they would be any good without the database (part of the algorithm), which runs to tens of gigabytes.

You're right, when we consider specific files (ie: 1000 predefined MP3's), the size of the algorithm should be taken into account.

In the more general sense, ie. when you need to compress any given MP3, you can make a database of all possible MP3's and then, during compression, just store the index to the received file.

Unfortunately if you really have all possible MP3s in the database, then the index size will be equal to the file size -- thus an MD5 sum will not be enough to differentiate them ;)

About the entropy: In fact the entropy of a file whose contents can be completely predicted, is zero.

Note: One possibility for compressing an MP3 further would be to build a table of all possible MP3s of a given length (which should produce a fairly big table). OK, now there's a 1:1 relationship between indices and elements. Then, eliminate all awful-sounding possibilities from the table. Now you gain a few bits on the indices, which means further lossless compression of MP3s B)

This would mean, obviously, exponentially large memory and processing requirements. In fact, it would already become infeasible with a 16-byte MP3 -- which doesn't even cover the full header :D
SirGrey
QUOTE
<Wrong ! The reason why you can usually compress a sound file, is because PCM coding needs more bits than what the actual entropy of the data would require. >
I didn't know that and find it very interesting. Do you have any references? I am not doubting you, I would just like to know more.

Read about Huffman codes (or coding).
The idea that any *real* data uses more space because of correlation of values is very interesting to understand :D
The idea is that in any *real* alphabet you use to represent the data, some elements are used more frequently than others. For example, the letter "E" is used more often than any other in English text. So it is a waste of bits to code every E with 8 bits. It should get a much shorter code.
And the same applies to 16Bit audio sample representation and so on...
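
A small sketch of Huffman code construction makes the point concrete. It only computes code lengths (enough to count bits), uses a made-up seven-symbol message, and is not taken from any codec in this thread.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data):
    """Return {symbol: code length in bits} for a Huffman code built from symbol counts."""
    freq = Counter(data)
    if len(freq) == 1:
        return {next(iter(freq)): 1}  # a lone symbol still needs one bit per occurrence
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far})
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(data):
    """Total payload bits if data is coded with its own Huffman code (code table not counted)."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[s] for s in data)


if __name__ == "__main__":
    message = "aaaabbc"
    print(huffman_code_lengths(message))
    print(encoded_bits(message), "bits instead of", 8 * len(message))
```

The frequent symbol gets the short code and the rare ones get longer codes, which is why the total drops well below 8 bits per symbol. The same principle applies to prediction residuals in audio coders, where small values are far more common than large ones.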
cabbagerat
QUOTE
QUOTE
It is impossible to make a compression algorithm that can compress random data where the resulting compressed data plus the size of the algorithm is smaller than the original data.

Given this, how is anything compressed losslessly? Well, most data isn't very random. Images and sounds have distinct patterns that allow them to be compressed by programs that depend on those patterns. If you try to compress data that does not contain those patterns, your resulting file plus the size of the algorithm will always be larger than the original file.

Maybe I should rephrase that - how about "It is impossible to make a compression algorithm that can compress data where the amount of information entropy contained in the sum of the algorithm and the compressed file is less than the information entropy contained in the original."
QUOTE
Then, eliminate all awful-sounding possibilities from the table.

Can't do that - It would eliminate half of all current releases ;)
QUOTE
Read about Huffman codes (or coding).
The idea that any *real* data use more space because of correlation of values is very interesting to understand

Fair enough - I think I misunderstood what numLOCK was saying - I heard:
"The PCM representation of an arbitrary wave form is not the most compact representation that preserves all the original data". Or "All waveforms can be expressed more efficiently in a way other than PCM".
QUOTE
How do you do the WAV subtraction? That's all I'm missing (I've started coding such a utility called pcm-)...

edit: I see you do a basic difference on raw data. I was thinking about doing it with WAV (with my own WAV classes). But if LAME, OGG and FLAC can handle raw audio, that's fine.

I use SoX to convert to RAW, subtract them, then use SoX to convert back to wav. Mostly this is because I was too lazy to write a proper program that understands WAV. Using libsndfile this should be trivial, however.
The relevant command lines (for 16 bit, 44100Hz stereo):
sox in.wav -r 44100 -c 2 -w -s out.raw
sox -r 44100 -c 2 -w -s in.raw out.wav
robUx4
I ran some tests here (without an audio editor to check some things). Apparently the Vorbis@128+FLAC combination can give good results on some files (and poor ones on others). Vorbis@64+FLAC gave worse results than the 128 one.

Bzip2 can't compress any of the residual noise I've produced with different combinations.

MP3 has this encoder delay problem that seems to make sample accurate files impossible. (I also had problems with the --nogap option).

I may try the same thing with AAC using FAAC/FAAD. Hopefully there is no such encoder delay problem...

But at least I found one file in which I gained 14% compared to the same file compressed with FLAC. So this could work practically with more tuning.
robUx4
OK, it seems to be useless with AAC also. So I'll just drop the idea from now on...
SebastianG
QUOTE(robUx4 @ May 18 2004, 03:06 AM)
Bzip2 can't compress any of the residual noise I've produced with different combinations.


Bzip2 makes use of the Burrows-Wheeler transform. This is a kind of permutation which usually (i.e. for text) leads to long runs of the same symbol because there is a strong inter-symbol relationship. Since the residual is a very noisy digital signal, this transform is more or less useless.

Most - if not all - general purpose compressors don't perform very well in this case.
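
This is easy to see with Python's `bz2` module. The sketch compares repetitive text with pseudo-random bytes; the random bytes are only a stand-in for a noisy residual (a real residual is 16-bit audio-like data, not uniform noise), so take it as an illustration of the trend rather than a measurement.

```python
import bz2
import random


def bz2_ratio(data):
    """Compressed size divided by original size (below 1.0 means it got smaller)."""
    return len(bz2.compress(data, 9)) / len(data)


def make_noise(n, seed=0):
    """n pseudo-random bytes, standing in for a noise-like residual."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def make_text(n):
    """n bytes of highly repetitive text, the kind of input BWT handles well."""
    phrase = b"the quick brown fox jumps over the lazy dog. "
    return (phrase * (n // len(phrase) + 1))[:n]


if __name__ == "__main__":
    print("text ratio: ", round(bz2_ratio(make_text(65536)), 4))
    print("noise ratio:", round(bz2_ratio(make_noise(65536)), 4))
```

The text shrinks to a tiny fraction of its size while the noise comes out slightly larger than it went in, since there is no symbol context for the transform to exploit.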

I think the best approach would be to use Vorbis at -q6 and to code the difference via LPC + Rice coding. The cool thing about Vorbis in this case is: the already-coded floor curve tells you something about the time/frequency energy distribution within the difference signal. This information could be used to calculate an LPC whitening filter and a good guess for the optimal Rice coding parameter k. So the usual FLAC overhead (LPC filter and Rice coding parameters) can be minimized.

The bad thing is: All this has to be done in a 100% deterministic way in order to be lossless.
If you don't want to rely on a particular floating-point implementation (which is certainly a good idea!) then you have to do everything using integer arithmetic. (tough job that is!)

Sure, it's possible somehow, but IMHO not worth the effort.
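
For readers who have not met Rice coding, here is a minimal integer-only sketch with parameter k. The zigzag mapping of signed values and the use of a '0'/'1' text string for the bitstream are choices made for readability here, not a description of FLAC's actual bitstream layout.

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values, k):
    """Encode signed integers: unary quotient, a '0' terminator, then k remainder bits."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0%db" % k))
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` values produced by rice_encode with the same k."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":  # read the unary quotient
            q += 1
            pos += 1
        pos += 1                 # skip the '0' terminator
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | r))
    return out


if __name__ == "__main__":
    residual = [0, -1, 1, 5, -7, 2, -3]
    for k in range(4):
        coded = rice_encode(residual, k)
        assert rice_decode(coded, k, len(residual)) == residual
        print("k =", k, "->", len(coded), "bits")
```

The coded length depends strongly on k relative to the typical residual magnitude, which is why a good guess for k from side information already present in the lossy stream would save overhead.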

QUOTE(robUx4 @ May 18 2004, 03:06 AM)
But at least I found one file in which I gained 14% compared to the same file compressed with FLAC. So this could work practically with more tuning.


That is a surprise. I suppose this is a very rare case.

bye,
Sebastian
mcbevin
QUOTE
I ran some tests here (without an audio editor to check some things).


Firstly, in my experience it's _very_ important to make fully sure that whatever you're doing is bit-identically reversible. Otherwise you can be 99% sure, whenever you find yourself with some big improvement in compression, that you've just made a mistake somewhere.


QUOTE
But at least I found one file in which I gained 14% compared to the same file compressed with FLAC. So this could work practically with more tuning.


That's _very_ interesting, even if it's not a common case. But first, it would be very good if you could have a test setup whereby you do the compression and then the decompression and then bitwise compare the two files to be sure you're doing something that can work.

I take it that when you say this file gained 14%, you mean the total size of the .ogg(?) file + FLAC-compressed residual(?) was 14% smaller than FLAC compressing the original wav? If so it would definitely be worth looking further into, especially if the music file used wasn't too unusual. You could also try with some of the better compressing lossless audio codecs to see if they can be improved as well.
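
A round-trip check of the kind described here can be very small. In this sketch `encode` and `decode` are placeholders for whatever pipeline is being tested (zlib is used only so the example runs on its own), and hashes are compared so the same check works for large files.

```python
import hashlib
import zlib


def sha256_hex(data):
    """Hex digest used to compare the original with the round-tripped bytes."""
    return hashlib.sha256(data).hexdigest()


def is_bit_identical(original, encode, decode):
    """True if decode(encode(original)) reproduces the original bytes exactly."""
    return sha256_hex(decode(encode(original))) == sha256_hex(original)


if __name__ == "__main__":
    pcm = bytes(range(256)) * 64  # placeholder for the raw audio bytes under test
    print("zlib round trip ok:", is_bit_identical(pcm, zlib.compress, zlib.decompress))
```

Only compression gains measured on a pipeline that passes this check are worth reporting; a mismatch usually means misaligned samples or a lossy step that was not accounted for.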
cabbagerat
My own tests reveal that the OGG+Flac combination can get very close to, or in rare cases a few percent smaller than, raw FLAC on a minority of samples. On about 80% of samples it is between 10 and 50% larger.
QUOTE
Bzip2 makes use of the Burrows-Wheeler transform. This is a kind of permutation which usually (i.e. for text) leads to long runs of the same symbol because there is a strong inter-symbol relationship. Since the residual is a very noisy digital signal, this transform is more or less useless.

That's interesting. Most of the reason I used bzip2 is that I type "tar cjf" almost by reflex, rather than anything else.
QUOTE
Sure, it's possible somehow, but IMHO not worth the effort.

I would tend to agree. Even if it does manage to do a few percent better than FLAC (which looks doubtful) it is much, much slower and implementing seeking and the rest would be a real pain. It was an interesting idea though.
robUx4
I agree on all this.

If you want to test the 14% one yourself, it's LFO "Blown" which can be found on my website. And yes it gives 14% better compression than FLAC alone. But the other files I tried were much bigger. So IMO it's not really worth spending more time on this.
mcbevin
Well the importance of this is not that one should try and create a lossless coder by slapping ogg and flac together. However if, even on only a few songs, ogg+flac beat flac alone, especially if by such a large number as 14%, then that shows that there's a huge potential room for improvement.

However, it's already known that FLAC has room for improvement - just look at comparisons to the other encoders. If you could repeat the tests with La or Optimfrog (I am admittedly the La developer but the reason I would like to see the results with La is that La also generally has the best compression) and there was improvement in some cases, then that would be important. The next step would then _not_ be to slap ogg+la/ofr together, but rather to determine what it was that ogg was doing that was improving things, to code something doing something similar but modified to suit lossless compression, modify the lossless compressor to be more suited to the new signal, put the two together in the filter pipeline, play around a bit, and then see if that couldn't give significantly better results.

As you can imagine, a difficult and lengthy process, which is why it's something that would only be worth undertaking if some more definitive results were first available.

Now I would take this 'blown' file and construct a test with it except the file on your website is an mp3, and converting mp3->wav and then doing the test would be rather meaningless (i trust this isn't how you've performed it?).
robUx4
QUOTE(mcbevin @ May 19 2004, 02:00 PM)
Now I would take this 'blown' file and construct a test with it except the file on your website is an mp3, and converting mp3->wav and then doing the test would be rather meaningless (i trust this isn't how you've performed it?).

This is how I did it. Actually the converted file is audio anyway, not even with the maximum entropy it originally had, but the difference is not so important in this case (what if you want to encode a poor recording with this codec ?). Otherwise you can buy the CD "Sheath" which is on Warp.

The only interesting point in this kind of hybrid codec would be to have both a lossy part and a lossless backup. That means when you want to put the audio file on a portable device or stream it, you don't have to re-encode it but can just use the lossy part with rather good quality. That would be convenient in the future (large HDs and portable devices with little CPU power).
mcbevin
QUOTE
This is how I did it. Actually the converted file is audio anyway, not even with the maximum entropy it originally had, but the difference is not so important in this case (what if you want to encode a poor recording with this codec ?). Otherwise you can buy the CD "Sheath" which is on Warp.


Deary me. It almost goes without saying that if you lossy-encode something, decode it, lossy-encode it again, and decode it again, you might be able to compress the final result better with a 'lossy + lossless on the residual' approach than with a pure lossless approach.

I.e., it's quite plausible that some lossy codecs (though I don't know which, if any) have a property whereby (taking ogg as an example) after performing the conversions wav->ogg1->wav1->ogg2->wav2, ogg1==ogg2 bit-identically and thus wav1==wav2. The residual would then be a bunch of zeroes, which would of course be easy to compress, and the combination of the ogg plus the close-to-0-byte residual would be more efficient than losslessly compressing wav1. Or, if this is not the case, it's at least very possible that wav1 and wav2 are much more similar than wav and wav1, even if you use different lossy codecs for the two stages.
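
To make the wav->wav1->wav2 argument concrete, here is a minimal sketch in Python. It does not use any real codec: `toy_lossy_roundtrip` is a hypothetical stand-in (a plain quantizer with an assumed step size `STEP`) chosen only because it has the idempotence property described above, and the input is synthetic random data rather than audio. Whether Vorbis, MP3 or any other real codec behaves this way is not something the sketch can show.

```python
import random

STEP = 64  # assumed quantizer step size; purely illustrative


def toy_lossy_roundtrip(samples, step=STEP):
    """Stand-in for 'lossy encode + decode': snap samples to multiples of step."""
    return [int(round(s / step)) * step for s in samples]


def residual(reference, decoded):
    """Sample-by-sample difference that a lossless stage would have to store."""
    return [r - d for r, d in zip(reference, decoded)]


random.seed(0)
wav = [random.randint(-32768, 32767) for _ in range(1000)]  # synthetic "original"

wav1 = toy_lossy_roundtrip(wav)    # wav  -> lossy -> wav1
wav2 = toy_lossy_roundtrip(wav1)   # wav1 -> lossy -> wav2

res_first = residual(wav, wav1)    # residual when starting from the true original
res_second = residual(wav1, wav2)  # residual when starting from already-decoded audio

print("non-zero residual samples, wav  vs wav1:", sum(1 for r in res_first if r))
print("non-zero residual samples, wav1 vs wav2:", sum(1 for r in res_second if r))
```

With this toy quantizer the second residual is all zeroes while the first is not, which is why a test that starts from a decoded mp3 says little about how the approach would do on the original CD audio.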


QUOTE
The only interesting point of this kind of hybrid codec would be to have both a lossy part and a lossless backup. That means when you want to put the audio file on a portable device or stream it, you don't have to re-encode it but can just use the lossy part, at rather good quality. That would be convenient in the future (large HDs and portable devices with little CPU power).


That could be interesting except that:
1. two lossless compressors already do this quite well.
2. using a lossy compressed file as the base and then encoding the residual is generally a horrible basis for lossless compression if you're looking for good compression.

From my perspective the interesting thing would be if some _techniques_ from lossy compression could be incorporated into a lossless codec. In general I'm dubious of the idea as the needs of lossy and lossless compression are so different, but I try to keep an open mind to all possibilities.
SebastianG
QUOTE(mcbevin @ May 19 2004, 05:00 AM)
The next step would then not be to slap ogg+la/ofr together, but rather to determine what it was that ogg was doing that was improving things, to code something doing something similar but modified to suit lossless compression, modify the lossless compressor to be more suited to the new signal, put the two together in the filter pipeline, play around a bit, and then see if that couldn't give significantly better results.

I guess frequency-adaptive channel decorrelation, which Vorbis does using adaptive vector codebooks, would be one thing that helps reduce the file size of ogg+FLAC in comparison to pure FLAC.
FLAC's channel decorrelation is rather poor compared to Vorbis'.

Here's an advanced idea of how one could try to decorrelate a stereo signal.

1) choose a channel CH1 out of {L,R} to be coded first; CH2 is the other channel, which is coded second.
2) code channel CH1 "the usual way" (i.e. LPC filter + residual)
3) calculate a "good decorrelation-filter impulse response"
4) code this filter somehow
5) calculate the decorrelation residual by CH2' = CH2 - filter(CH1)
6) code channel CH2' the usual way

This decorrelation filter should minimize the energy of [CH2 - filter(CH1)].
Its impulse response could be modeled as a weighted sum of bandpass filters. These weights could then be coded compactly by something like delta & Huffman coding.
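
Steps 5 and 6, plus the matching decoder step, could be sketched roughly like this in Python. Everything named here is an assumption for illustration: the fixed-point taps `TAPS_Q` with `SHIFT` fractional bits stand in for whatever filter step 3 would actually produce, the signals are synthetic, and the filter is a plain short FIR rather than a sum of bandpass filters. The one point the sketch is meant to show is that the prediction has to be computed in integers, identically on both sides, so that CH2 can be restored bit-exactly.

```python
import random


def predict(ch1, taps_q, shift):
    """Integer FIR prediction of CH2 from CH1; taps are fixed-point (value / 2**shift)."""
    out = []
    for n in range(len(ch1)):
        acc = sum(tap * ch1[n - k] for k, tap in enumerate(taps_q) if n - k >= 0)
        out.append(acc >> shift)  # same deterministic rounding in encoder and decoder
    return out


def encode_ch2(ch1, ch2, taps_q, shift):
    """Step 5: CH2' = CH2 - filter(CH1)."""
    return [b - p for b, p in zip(ch2, predict(ch1, taps_q, shift))]


def decode_ch2(ch1, resid, taps_q, shift):
    """Decoder side: CH2 = CH2' + filter(CH1), using the already-decoded CH1."""
    return [r + p for r, p in zip(resid, predict(ch1, taps_q, shift))]


def energy(x):
    return sum(v * v for v in x)


random.seed(0)
TAPS_Q, SHIFT = [0, 3], 2  # hypothetical filter: 0.75 * CH1 delayed by one sample

# Synthetic stereo pair: CH2 is a filtered copy of CH1 plus a little independent noise.
ch1 = [random.randint(-20000, 20000) for _ in range(512)]
ch2 = [p + random.randint(-4, 4) for p in predict(ch1, TAPS_Q, SHIFT)]

resid = encode_ch2(ch1, ch2, TAPS_Q, SHIFT)
restored = decode_ch2(ch1, resid, TAPS_Q, SHIFT)

print("lossless:", restored == ch2)
print("energy CH2:", energy(ch2), " energy CH2':", energy(resid))
```

In this artificial case the residual is just the small added noise, so it is far cheaper to code than CH2 itself; on real music the gain would depend on how well the filter matches the actual inter-channel relationship.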

To exploit phase correlations we could assign two bandpass filters to the same frequency band, with a phase shift of 90° between them. This way we would be able to predict CH2 well even if there are phase differences.

The actual weights could be calculated the following way:
1) calculate FFT on both channels
2) divide spectrum into smaller subbands
3) calculate subband energies and (complex) cross-correlation factors
4) calculate weights by weight[subband]:=crosscorr[subband]*energy[CH2][subband]/energy[CH1][subband]
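
A rough Python sketch of steps 1) to 4) follows. Note the assumption: it computes the weight as the per-subband least-squares solution, i.e. the complex cross-spectrum sum divided by the CH1 energy, which is the complex value w minimizing the energy of X2 - w*X1 within the band. That may or may not be exactly what the formula in step 4 intends, depending on how crosscorr and energy are normalized. The function names, the naive DFT, the frame length and the band count are all made up for illustration.

```python
import cmath
import random


def dft(x):
    """Naive O(N^2) DFT; good enough for one short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def subband_weights(ch1, ch2, num_bands):
    """Per-subband complex weight w minimizing sum |X2 - w * X1|^2 over the band."""
    half = len(ch1) // 2 + 1            # keep the non-redundant bins of a real signal
    x1 = dft(ch1)[:half]
    x2 = dft(ch2)[:half]
    edges = [round(b * half / num_bands) for b in range(num_bands + 1)]
    weights = []
    for lo, hi in zip(edges, edges[1:]):
        energy1 = sum(abs(a) ** 2 for a in x1[lo:hi])
        cross = sum(b * a.conjugate() for a, b in zip(x1[lo:hi], x2[lo:hi]))
        weights.append(cross / energy1 if energy1 > 0 else 0j)
    return weights


random.seed(1)
ch1 = [random.uniform(-1.0, 1.0) for _ in range(64)]  # synthetic frame
ch2 = [0.5 * s for s in ch1]                          # CH2 is CH1 at half amplitude

weights = subband_weights(ch1, ch2, num_bands=4)
for band, w in enumerate(weights):
    print(band, w)
```

For this synthetic pair every subband weight comes out at about 0.5 with a negligible imaginary part. A phase difference between the channels would show up as a non-zero imaginary part, which is what the pair of 90°-shifted bandpass filters would be used to reproduce.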

Any comments / suggestions?

bye,
SebastianG
Efenstor
As I said before, none of the existing codecs fit the need, especially those which extract the noise component and remake it when decoding (e.g. Vorbis).

I compressed a wave using Musepack (which codec doesn't really matter), decoded it, compared it with the original, generated the alpha-file and listened to it. As one could expect, it consisted exclusively of noises, the kind of noises which the human ear somehow cannot hear.

In the case of lossy-based coding it should behave otherwise: it should code pure sines VERY roughly but encode noises as finely as possible. It would probably not be an MP3-like compression at all. It should split the wave into noise and sines and spend 9 parts out of 10 on the noise and 1 on the sines.

In other words, it requires much more in-depth research. It is impossible to prove or disprove it using existing psychoacoustic-based codecs.
robUx4
Yes, as Pamel said, a lossy codec in this case would probably have no need for a psychoacoustic model.

And about the source, I'm sorry, but the decoded file is just music that has almost the same entropy as the original. So whether a codec will produce almost the same result is not important. That's what you would expect from a codec anyway! And actually that would have been the case with MP3 but not Vorbis. And the opposite happens...