Compressing audio by analyzing it as an image

I am by no means an expert on audio compression, but I was just thinking about compressing audio by analyzing it as an image.

Just as a test, I created a 1-bit image that was 441 x 65536 in size: 441 columns to represent 1/100th of a second at 44.1 kHz, and 65536 rows for the amplitude range. I drew a 1-pixel line straight across so that each individual pixel represented a sample point of the audio (I could have drawn a sine wave, but a straight line was easier). So I had 1 pixel in each of the 441 columns that was white instead of black. I saved this as a PNG file (lossless) with a size of 3,803 bytes, then compressed that to .7z (lossless) and got a size of 337 bytes. So if my thinking is right, a 441 x 65536 image should cover every possible sample value in 1/100th of a second of audio at a size of 337 bytes, which would be 337 bytes x 100 x 60 = 2,022,000 bytes for 1 minute of audio, and unless I'm mistaken it would be lossless as well.
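For anyone who wants to repeat the experiment without an image editor, here is a rough Python sketch. It is not the exact procedure from the post: it assumes the image is stored column by column as packed bits, and it uses Python's lzma module as a stand-in for 7-Zip, so the compressed size will not match the 337 bytes quoted above. The name one_hot_image is made up for this example.

```python
import lzma

WIDTH = 441      # samples in 1/100 s at 44.1 kHz
HEIGHT = 65536   # one row per possible 16-bit amplitude value

def one_hot_image(samples):
    """Pack a 1-bit image column by column, with one lit pixel per sample."""
    bytes_per_column = HEIGHT // 8
    image = bytearray(len(samples) * bytes_per_column)
    for col, value in enumerate(samples):
        bit = col * HEIGHT + value          # position of the lit pixel
        image[bit // 8] |= 0x80 >> (bit % 8)
    return bytes(image)

flat = [32768] * WIDTH            # the "straight line" test signal
image = one_hot_image(flat)
packed = lzma.compress(image)     # stand-in for the .7z step (assumption)
raw_pcm_bytes = WIDTH * 2         # the same samples as plain 16-bit PCM

print("uncompressed image bytes:", len(image))
print("compressed image bytes:  ", len(packed))
print("raw 16-bit PCM bytes:    ", raw_pcm_bytes)
```

The number to keep in mind is the last one: the same 441 samples take 882 bytes as ordinary 16-bit PCM before any compression at all, so that is the baseline the image approach has to beat.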

OK, I'm probably missing something here that makes this impossible, so go easy on me if I'm totally off base.
What do you think?

Reply #1
Nope, won't work.  No real signal is as simple as what you made.  Make anything more complicated and it won't compress to 337 bytes.

Reply #2
After reading the thread subject, I was waiting for the word "fractal" to be mentioned in the text ...

Since it's about an idea ... here's one lossless method (at least on Windows):

- compress the hard drive using the built-in system feature

Juha

 

Reply #3
Interesting.

I did a similar test, saving a monochrome straight line in .bmp format. The size, as expected, was around 3.6 MB. Compressing it with .7z gave 739 bytes (so we can see that the PNG compression step was unnecessary).

With a slightly more complex drawing (a curve), the final .7z was 1.3 KB. I believe that for a real signal (and stereo!) there would still be some compression compared to a .wav file, but I doubt that it could beat standard lossless audio encoders. Also, the encoders/decoders would need to handle a notably larger amount of data (at 3.6 MB per 1/100th of a second, 1 second of CD audio in this image format is roughly 360 MB per channel!)

7-Zip applies dictionary coding to the data, which only works this well because, with so much empty space, the data is highly predictable. The more the signal varies, the less predictable the bytes containing those single dots become.

Reply #4
Quote
After reading the thread subject, I was waiting for the word "fractal" to be mentioned in the text ...

... looks like "fractal audio coding" has been researched already - http://research.cs.queensu.ca/home/xiao/doc/Thesis.pdf

Juha

Reply #5
Well, I was bored and tired last night when I was thinking about it. Obviously a straight line compresses far more easily than an actual signal. So as another test I generated a text file of 441 x 65536 zeroes, then set a 1 at a random position in each of the 441 columns. That's probably the worst-case signal I could think of. The smallest I could get the final file after playing with different compressors was 4,090 bytes, which is 24,540,000 bytes per minute; double that for stereo without any kind of joint-stereo concept.
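To put those per-minute figures next to plain uncompressed audio, here is the arithmetic as a small Python sketch. The 337 and 4,090 byte chunk sizes are the ones reported in this thread; the only thing added is the size of raw 16-bit mono PCM at 44.1 kHz, and the helper name per_minute is made up for this example.

```python
SAMPLE_RATE = 44_100           # samples per second (CD audio)
BYTES_PER_SAMPLE = 2           # 16-bit PCM
CHUNKS_PER_MINUTE = 100 * 60   # one chunk is 1/100 s

def per_minute(bytes_per_chunk):
    """Scale a compressed size for 1/100 s up to one minute of mono audio."""
    return bytes_per_chunk * CHUNKS_PER_MINUTE

raw_pcm = SAMPLE_RATE * BYTES_PER_SAMPLE * 60   # uncompressed mono PCM
straight_line = per_minute(337)                 # flat-line test from the first post
random_case = per_minute(4090)                  # random-position test

print("raw PCM:      ", raw_pcm)
print("straight line:", straight_line)
print("random case:  ", random_case)
print("random case vs raw PCM: %.1fx" % (random_case / raw_pcm))
```

So the random case comes out more than four times larger than simply storing the samples as 16-bit numbers. That fits the intuition that 441 truly random 16-bit values carry 441 x 16 bits = 882 bytes of information, and no lossless format can go below that on average.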

So ya, not very efficient.

Reply #6
audio != text != image

Without even getting into the details, this much should already be apparent. They do not compress the same way.

Reply #7
Yes, I'm aware audio != text != image. I was just thinking of a different way of interpreting the audio, such as grooves on a record or magnetic material in cassettes. You see audio represented as an image all the time, like when looking at a waveform in Audacity. Technically you should be able to create a program that could look at the actual waveform and produce the audio from that, like digital grooves. That's what I was thinking. Anyway, it doesn't matter; it would never be practical.

Reply #8
Computers do not know what the data is that they are compressing; it is just a bunch of numbers. So a mono audio signal is the same as a long 1-pixel-high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.

A method that I just thought of that might be helpful for sample-based music is to compare the current bar to the previous one and encode the differences. So if the same drum samples are being used at a constant tempo, there is some redundancy to be exploited.
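As a toy illustration of the bar-differencing idea (not how any real encoder does it), here is a Python sketch. It assumes the bar length in samples is known exactly and that the "drum" is a synthetic decaying sine; the names bar_residual and restore are made up for this example.

```python
import math

def bar_residual(samples, bar_len):
    """Keep the first bar, then store each sample minus the one a bar earlier."""
    return [s - samples[i - bar_len] if i >= bar_len else s
            for i, s in enumerate(samples)]

def restore(residual, bar_len):
    """Undo bar_residual exactly, so the scheme stays lossless."""
    out = []
    for i, r in enumerate(residual):
        out.append(r + out[i - bar_len] if i >= bar_len else r)
    return out

BAR = 1000  # hypothetical bar length in samples
drum = [int(8000 * math.exp(-n / 100) * math.sin(n / 5)) for n in range(BAR)]

# Case 1: the identical drum hit repeated four times.
loop = drum * 4
loop_res = bar_residual(loop, BAR)

# Case 2: the same loop with another instrument mixed on top.
mixed = [d + int(3000 * math.sin(i / 7.3)) for i, d in enumerate(loop)]
mixed_res = bar_residual(mixed, BAR)

print("nonzero residuals, pure loop:", sum(r != 0 for r in loop_res[BAR:]))
print("nonzero residuals, mixed:    ", sum(r != 0 for r in mixed_res[BAR:]))
```

With a bit-identical loop, everything after the first bar becomes zeros, which compresses to almost nothing. As soon as anything else is mixed in that does not repeat at the bar length, the residual is no longer zero, although it can still be restored exactly.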

Reply #9
Quote
A method that I just thought of that might be helpful for sample-based music is to compare the current bar to the previous one and encode the differences. So if the same drum samples are being used at a constant tempo, there is some redundancy to be exploited.

Perhaps you should read about how audio encoders work.  You'd possibly enjoy it.  The method you describe is simplistic compared to the tools they carry in their bag.

Reply #10
Quote
Perhaps you should read about how audio encoders work.  You'd possibly enjoy it.  The method you describe is simplistic compared to the tools they carry in their bag.

I realise that audio encoders are very clever and use all kinds of methods. I was not aware that my suggestion was one of them, and in the spirit of this thread (novel compression ideas) I thought I would throw it out there regardless of its merit.

Reply #11
Quote
yes I'm aware audio != text != image.  I was just thinking of a different way of interpreting the audio.  Such as grooves on a record or magnetic material in cassettes.  You see a representation of audio as an image all the time like looking at a wave in audacity.


PCM audio is exactly equivalent to a %samples%×1 greyscale image with a certain pixel bit depth. One could go for a 1-bit image that's 2^n high and %samples% wide like you did, but staying with a 1-D sample axis and a bit depth per sample makes other audio concepts (like filtering, noise, dither, anti-aliasing, etc.) transferable to the image domain as well.
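A small Python sketch of that equivalence, using a tiny 3-bit depth so the bitmap stays readable. The function names to_bitmap and from_bitmap are made up for this example, and the bitmap is kept as plain lists rather than a real image format.

```python
def to_bitmap(samples, depth):
    """Turn n-bit samples into columns of a 1-bit image that is 2^n pixels high."""
    height = 1 << depth
    return [[1 if row == s else 0 for row in range(height)] for s in samples]

def from_bitmap(columns):
    """Read each sample back as the row index of the single lit pixel."""
    return [col.index(1) for col in columns]

DEPTH = 3                      # 3-bit samples, values 0..7
samples = [0, 3, 7, 4, 1]

bitmap = to_bitmap(samples, DEPTH)
greyscale_bits = len(samples) * DEPTH        # %samples% x 1 image, n bits per pixel
bitmap_bits = len(samples) * (1 << DEPTH)    # %samples% x 2^n image, 1 bit per pixel

print(from_bitmap(bitmap) == samples)
print(greyscale_bits, bitmap_bits)
```

Both forms hold exactly the same information and convert back and forth without loss, but the 1-bit form needs 2^n bits per sample instead of n. At 16 bits that is 65536 bits versus 16, which is the overhead the general-purpose compressor then has to win back.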

You won't be able to paint any reasonable audio in Photoshop, though, because it has a size limit of 300,000 pixels, which at 44.1 kHz would amount to only a few seconds of audio.

The waveform, as you may realise, is a plot of the relative sound pressure generated by the audio signal. Like that, all other visual displays of sound, however accurate or intuitive, are interpretations of the 1s and 0s. A spectrogram is closest to how we perceive sound, but it's actually grossly inaccurate and utterly unsuitable as a "visual" method of storage*.



*) this doesn't mean that there aren't any programs that can generate audio from an image by interpreting it as a spectrogram. There was an interesting little experimental program I played with some years ago, but I can't remember the name. Something with a C.

Reply #12
Quote
Computers do not know what the data is that they are compressing; it is just a bunch of numbers. So a mono audio signal is the same as a long 1-pixel-high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.

A method that I just thought of that might be helpful for sample-based music is to compare the current bar to the previous one and encode the differences. So if the same drum samples are being used at a constant tempo, there is some redundancy to be exploited.
Quote
Perhaps you should read about how audio encoders work.  You'd possibly enjoy it.  The method you describe is simplistic compared to the tools they carry in their bag.
Simplistic 'on paper', maybe, but surely not in terms of computation?

Any samples will be mixed with other instruments, perhaps interpolated due to having originated in a different sample rate, etc. So an encoder (lossless or lossy, though I suppose the latter is slightly more feasible) can't just say "four of these".

It's not much more far-fetched than expecting an encoder to be able to compress, almost to nothing, a whole chorus that has the same notes/lyrics as the last; that can't be done, since the instrumental/vocal takes will be different, etc.

Iain actually hinted at this himself with "that is something a human is best at determining." The samples, section, etc. may sound the same to us, but the subtle differences may be enough to make lossless compression quite inefficient, i.e. they won't be the same to a computer. And it certainly wouldn't be lossless to encode such regions as "just put another sample/chorus that sounds (about) the same here" (and I imagine a lossy method for this is unlikely to emerge).

Reply #13
Quote
*) this doesn't mean that there aren't any programs that can generate audio from an image by interpreting it as a spectrogram. There was an interesting little experimental program I played with some years ago, but I can't remember the name. Something with a C.

Coagula.  Pretty fun.

Reply #14
Photosounder is a one-of-a-kind image-sound editing program.  - http://photosounder.com/

Juha

Reply #15
Quote
Computers do not know what the data is that they are compressing; it is just a bunch of numbers. So a mono audio signal is the same as a long 1-pixel-high image as far as a computer is concerned. The best compression approach is determined by the information contained in the data, and that is something a human is best at determining.


The computer itself may not know what the data is; however, that doesn't mean it doesn't care. Music compression, video compression, image compression, and archive compression are all largely different because the data is interpreted as different things. Namely, the difference is in which similarities can be exploited. If you assume that the signal is just random, you have to work with a set of rules where you can't count on finding similarity very often. However, if you knew that there were 5.1 channels of audio with large similarities between the channels, you could already start to look somewhere else. If you know it's an image, you can look at adjacent pixels for similarity. If you know it's a video, you can look at adjacent frames for motion changes.
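Here is a minimal Python sketch of what "knowing it's audio" buys you, using the simplest possible trick: storing the difference between neighbouring samples. This is only a toy under stated assumptions (a pure 440 Hz test tone, made-up function names delta_encode and delta_decode); real lossless audio encoders use considerably more refined predictors.

```python
import math

def delta_encode(samples):
    """Keep the first sample, then store each sample minus its predecessor."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the original samples by summing the differences back up."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

RATE = 44_100
# 1/100 s of a 440 Hz tone at a fairly loud level (test signal, an assumption)
tone = [round(20_000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(441)]

deltas = delta_encode(tone)
peak_sample = max(abs(s) for s in tone)
peak_delta = max(abs(d) for d in deltas[1:])

print("largest sample magnitude:    ", peak_sample)
print("largest difference magnitude:", peak_delta)
```

Because neighbouring samples of a smooth signal are close together, the differences span a much smaller range than the samples themselves and so need fewer bits each, while decoding still gives back every sample exactly. A general-purpose archiver has no way of knowing that this particular similarity is worth looking for.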

It's perfectly valid to attempt to find different ways to find similarities, and different interpretations can offer that.

tl;dr: the computer does care what type of data something is, as it has specialized algorithms for compressing different types of data.