Full Version: New Lossy Audio Codec
Hydrogenaudio Forums > Misc. > Off-Topic
cabbagerat
While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.

So I wrote a shell script which does the following:
1) Takes a 10 second sample of an MP3 and converts it to 8bit 44100Hz raw PCM
2) Arranges the data into a square image and Jpegs it
3) Unjpegs it and converts back to raw PCM data
4) Creates a WAV from the raw sound
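
Steps 2 and 3 can be sketched in Python roughly as follows. This is not the original shell script (which used ImageMagick and sox); the function names, the padding value of 128 and the choice of a binary PGM container are assumptions made for illustration only.

```python
import math


def samples_to_square(samples: bytes, pad: int = 128):
    """Lay 8-bit samples out row by row in the smallest square that fits.

    `pad` fills the unused tail of the square; 128 is mid-scale (silence)
    for unsigned 8-bit PCM. Name and default are assumptions of this sketch.
    """
    if not samples:
        raise ValueError("need at least one sample")
    side = math.isqrt(len(samples) - 1) + 1  # ceil(sqrt(n))
    padded = samples + bytes([pad]) * (side * side - len(samples))
    return side, padded


def to_pgm(side: int, pixels: bytes) -> bytes:
    """Wrap the pixels in a binary PGM (P5) greyscale header."""
    return b"P5\n%d %d\n255\n" % (side, side) + pixels


def square_to_samples(pixels: bytes, original_length: int) -> bytes:
    """Undo samples_to_square by dropping the padding."""
    return pixels[:original_length]


# 10 seconds of 8-bit mono audio at 44100 Hz (all zeros, just for sizing)
audio = bytes(441000)
side, pixels = samples_to_square(audio)
print(side, len(pixels) - len(audio))  # image side, number of padding bytes
```

The PGM bytes could then be handed to any JPEG encoder, and the decoded greyscale pixels passed back through `square_to_samples`.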

I used imagemagick and sox to perform all the necessary conversions.

Looking just at compression, JPEG performs very poorly compared to MP3. Obviously changing the JPEG quality factor made a big difference, but even at terrible quality the images were pretty large compared to the MP3.

We sat down, whipped out the abx program from the LAME source and very quickly decided that JPEG is not a great audio codec. At 95% quality the music was alright - similar quality to a 64kbps MP3. The music degraded quickly as we increased the compression. At 75% the music started sounding really horrible - with weird artifacts unlike anything I had heard before. The samples were more or less recognisable up till about 20% quality factor; any less and we couldn't tell Al Di Meola from Springbok Nude Girls.

Several conclusions can be drawn from this test:
- Procrastination leads people to do all sorts of insane things
- MP3, Vorbis and the rest do all sorts of magic unrelated to just dumping data
- JPEG's habit of dividing an image into 8x8 pixel blocks produces some very strange artifacts, including what sounded like pre- and post-echoes with up to a second delay

At this point some of the involved parties started blaming the fact that we were transcoding for the bad quality of the sound. Another student blamed my speaker cables. It was an interesting experiment. I was very surprised that the sound didn't come out completely mangled. :)
JohnV
Why didn't you use the original wav source? :)
I have absolutely no idea if it would have made a difference, but doesn't JPG perform better with "smooth" rather than "dithered" data? Maybe the mp3 encoding makes the data more "dithered" (very unscientific description, but I'm tired, in fact I don't know if I'm talking only BS), like it has some of the dc-coeffs 0...
NumLOCK
JPEG performs very poorly on sound, because it's not continuous across 8-pixel boundaries. Thus you get discontinuities every 8 samples, basically adding a square wave to your sound :D

The more you compress, the more it becomes unusable.
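
A rough way to see the boundary effect is a 1-D stand-in for what JPEG does: transform each run of 8 samples with a DCT, quantize the coefficients coarsely, and transform back. This is only a simplified sketch (real JPEG uses a 2-D 8x8 DCT with quantization tables and entropy coding); the uniform `step` and all function names are assumptions.

```python
import math

N = 8  # block length, matching JPEG's 8-pixel blocks


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Orthonormal DCT-II of one block of N samples."""
    return [
        _scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, x in enumerate(block))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def blockwise_codec(signal, step):
    """Quantize each independent block; len(signal) must be a multiple of N."""
    out = []
    for i in range(0, len(signal), N):
        quantized = [round(c / step) * step for c in dct(signal[i:i + N])]
        out.extend(idct(quantized))
    return out


def jump_sizes(signal):
    """Largest sample-to-sample jump inside blocks and across block edges."""
    inner, boundary = [], []
    for i in range(1, len(signal)):
        jump = abs(signal[i] - signal[i - 1])
        (boundary if i % N == 0 else inner).append(jump)
    return max(inner), max(boundary)


ramp = [0.5 * i for i in range(64)]          # a perfectly smooth input
decoded = blockwise_codec(ramp, step=16)     # harsh quantization
print(jump_sizes(decoded))                   # (inside blocks, at block edges)
```

With the harsh step used here the smooth ramp turns into a staircase: each block decodes to a flat level, and all the movement is pushed into jumps at the block edges.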

You should definitely try JPEG2000 !!
dreamliner77
There was a program mentioned at HA probably about a year ago that did the same thing (converted wav's to jpg's and back). I forget the name of the program; I have it on my computer, but seeing that I'm at a buddy's house, I can't remember. Maybe someone else remembers what I'm talking about.
danchr
Have you tried the other way around? Compressing a picture using MP3?
ShootThemLater
Do you think the RIAA's searches include pictures? If so, could this be the future of trading music on P2P networks? I'm sure I'm not the first person to have this thought. Could you also use PNG?
phong
PNG would be lossless, but would have much poorer compression than any lossless audio codec. With JPEG compression, you could produce better compression by using a "blur" filter, but I imagine it would have the same (or similar) effect as a combination of a lowpass plus echoes added before and after displaced by a number of samples equal to the width of the image.
rjamorim
QUOTE(dreamliner77 @ Oct 11 2003, 08:43 PM)
Maybe someone else remembers what I'm talking about.

I posted it on a News thread, more than a year ago.

http://www.webcenter.ru/~vsoft/BitmapPlayer.htm
cabbagerat
danchr asked above whether I had tried compressing an image using MP3. I hadn't yet, but I decided to try it. Turns out, LAME --alt-preset standard isn't an all-bad image encoder. Compression is a bit disappointing, but the picture quality isn't bad at all.

I exported a picture I took of my dog to PPM, stripped the header then converted it to wav with sox. I ran lame on it then used sox to export the raw sound data. I needed to strip off a pile of bytes at the beginning (about 2 lines worth) then re-add the ppm header.

Lame came in with an average bitrate of 130, which created a 340KB file from a 921KB original. Compared to JPEG, it's pretty poor compression, but a 3:1 ratio isn't that bad. The output image is a little bit softer with odd aliasing artifacts on fine details. The colour saturation also seems to have been increased.
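
As a quick sanity check on those figures, here is a small calculation. The sizes and bitrate come from the post; reading "130" as kbps and treating the raw image as 8-bit mono audio at 44100 Hz are assumptions (the post does not give the sox parameters for this run).

```python
original_bytes = 921_000   # "921KB original"
mp3_bytes = 340_000        # "340KB file"
bitrate_bps = 130_000      # "average bitrate of 130", assumed to mean kbps

ratio = original_bytes / mp3_bytes

# Assumption: one byte per sample at 44100 samples per second,
# so bytes / 44100 gives the duration in seconds.
duration_s = original_bytes / 44100
predicted_mp3_bytes = duration_s * bitrate_bps / 8

print(f"compression ratio  : {ratio:.2f}:1")
print(f"duration as audio  : {duration_s:.1f} s")
print(f"predicted MP3 size : {predicted_mp3_bytes / 1000:.0f} KB")
```

Under those assumptions the predicted size comes out very close to the reported 340KB, and the ratio is a little under 3:1.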

Maybe with a carefully built image it will be possible to see stuff like pre-echo artifacts.

If anybody is interested you can get the input and output images from here:
Original Image
Output Image

Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.
Ivan Dimkovic
QUOTE
While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.


That simply won't work well - because audio coders exploit the irrelevancy according to the human psychoacoustics, adding noise in frequency regions that are masked by outer-inner ear transfer and inner-ear processing.

Good audiovisual coders exploit the visual irrelevancy - so, you will end up with noise allocated in regions that do not correspond to psychovisual masking criteria.
sld
QUOTE(cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?
Hanky
Not meaning to be offensive, but I know more pleasant ways to waste my time
Joe Bloggs
QUOTE(sld @ Oct 12 2003, 06:04 PM)
QUOTE(cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?

That really made me laugh :lol:
Pio2001
The artifacts in the dog picture are mostly horizontal lines. When the picture is converted to sound, is it scanned line after line?

It's too bad that the sound is one-dimensional while the picture is two-dimensional.
This experiment with jpeg compression will mostly show the effects of the descanning-filtering-rescanning. I think that any other filter, as long as it is a function of the neighbouring pixels (blur, artistic effects...), would have given the same kind of sound artifacts.
The "pre and post echos up to one second delay" come from the fact that you "listen" to the picture line after line. When one dot is blurred, it expands into the lines above and below it in the picture, which are converted into sound data playing long before or long after the central dot.

You should get a fine pre/post echo effect by applying a vertical motion blur to the picture instead of a jpeg compression ;)
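
The vertical-blur idea can be tried on a toy signal. The sketch below uses a 3-tap box blur as a stand-in for "vertical motion blur"; the function names and the tiny 5-sample-wide image are assumptions chosen to keep the example readable.

```python
def vertical_blur(rows):
    """3-tap vertical box blur; the top and bottom rows are clamped."""
    height = len(rows)
    blurred = []
    for y in range(height):
        above = rows[max(y - 1, 0)]
        below = rows[min(y + 1, height - 1)]
        blurred.append([(a + b + c) / 3
                        for a, b, c in zip(above, rows[y], below)])
    return blurred


def echo_positions(signal, width):
    """Scan the signal into rows, blur vertically, rescan, list non-zero samples."""
    rows = [signal[i:i + width] for i in range(0, len(signal), width)]
    flat = [v for row in vertical_blur(rows) for v in row]
    return [i for i, v in enumerate(flat) if v != 0]


def row_delay_seconds(width, sample_rate=44100):
    """Time offset between vertically adjacent pixels once rescanned as audio."""
    return width / sample_rate


width = 5
click = [0.0] * 25
click[12] = 3.0                       # a single click in the middle
print(echo_positions(click, width))   # [7, 12, 17]
print(row_delay_seconds(665))         # seconds per row for a 665-wide image
```

The click comes back with copies exactly one row (`width` samples) before and after it. For a 665-pixel-wide image at 44100 Hz (roughly what a 10-second clip needs as a square), one row is about 15 ms, so every extra row of vertical smearing pushes the echo another 15 ms away.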
dreamliner77
Yes Roberto that was it. Now I know I'm not crazy. Well, not that crazy anyway.
n68
Gday..

Just read this, and it reminds me of a little utility
called Camouflage. It disguises an mp3 as a jpeg.

It was a huge thing among the Streamload community
a couple of years back, and it's still in use as far as I can see.

And I can't hear any "damage" on the files.

There are a few of those programs, and Camouflage
is the best one.

For those who want to try this out:
http://www.freewaredownloads.de/cgi-bin/de...tail.cgi?ID=228
or do a Google search.
Use a picture as a template. The program packs the mp3 file
together with the picture in a jpeg container,
with an option to add a password. When you uncamouflage the file,
you get the option to extract the mp3 or the picture.
The file becomes about 10 KB bigger.

:)
Mac
That's just 'gluing' two files together, not encoding sound as jpeg :) You are just hiding mp3 data inside a jpeg, from what I remember?
NeoRenegade
Yup. And from what I remember it's not limited to MP3 and JPEG. I think you could stuff pretty much anything in.
n68
Gday..

@Mac.. I believe I wrote "reminds me".. ;)
I am totally aware of the fact that Camouflage just
writes a container with a different extension..
it does not encode/transcode the data.. :)
Eugene
Actually, I tried the other way round:

picture -> mp3 -> picture

I don't think I will spend too much time on that, but here is a short, easy proof of concept:

http://eugene.ath.cx/graphic2mp3/

OK, OK... applying mp3 to the raw RGB data would have been better; compressing a tga header and fixing up the resulting wav data with a new tga header sucks...

But hey, you can see the image!

But why on earth is it turned around 180 degrees?!

Anyway, it was fun...

Eugene
Eugene
Ah, I forgot the sizes :-(

tga (wav) size: 5760 KB

jpg size before mp3 conversion (90%): 420 KB

jpg size after mp3 conversion (90%): 1460 KB

(as expected, more entropy in the decoded mp3 -> wav -> tga)

mp3 size: 992 KB (--alt-preset standard), 544 KB (--alt-preset 128)

(as expected, a compressor that knows about the data it operates on does better than a generic compressor or a compressor for such a different format)

Eugene
tuxp3
Why not try splitting each second of audio into a bitmap, then making the bitmaps into an avi and compressing it with xvid (1-pass, 100%)?
Here's the catch: the bitmaps have to be nearly lossless copies of the original (~10 sec) wave after being rebuilt... If someone could make the 10 bitmaps for me, I could do the rest :)
Just a crazy idea: using video codecs to store music (with minimal loss of data).
Thanks, tuxp3
In short:
10 sec mp3 -> raw pcm audio -> (10x) bmp pics -> 10-frame avi -> xvid 100% quality -> then back
(as little sound data loss as possible)
Niknak
The problem is that if you write the audio file to the uncompressed image one line at a time, left to right, then you're going to hit a JPEG block boundary every 8 pixels and you get 8 different parts of the audio within each block.

I think the best bet would be to walk through the image buffer following a Hilbert curve - that way you will get the highest correlation between samples that are close to each other in the audio and pixels that are close to each other in the image. You'll need to pad your data with zeros to make the image dimensions a power of 2.
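
For anyone wanting to try the Hilbert layout, here is a minimal Python sketch. The index-to-coordinate mapping is the standard iterative Hilbert construction, not code from this thread, and the helper names are assumptions. Padding defaults to zero as suggested above.

```python
def hilbert_d2xy(n: int, d: int):
    """Map distance d along a Hilbert curve to (x, y) in an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def samples_to_hilbert_image(samples: bytes, n: int, pad: int = 0):
    """Place sample d at the d-th point of the curve; unused pixels get `pad`."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if len(samples) > n * n:
        raise ValueError("too many samples for this image size")
    image = [[pad] * n for _ in range(n)]
    for d, value in enumerate(samples):
        x, y = hilbert_d2xy(n, d)
        image[y][x] = value
    return image


def hilbert_image_to_samples(image, length: int) -> bytes:
    """Read the first `length` samples back out along the same curve."""
    n = len(image)
    out = bytearray(length)
    for d in range(length):
        x, y = hilbert_d2xy(n, d)
        out[d] = image[y][x]
    return bytes(out)


image = samples_to_hilbert_image(bytes(range(16)), 4)
for row in image:
    print(row)
```

Consecutive samples always land in horizontally or vertically adjacent pixels, which is the property the post is after.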
Niknak
Oh yeah, and do 8 bit audio and a monochrome image. I know JPEG only does colour images but you can convert to colour before compressing and back to mono after decompressing. The colour channels in the JPEG will compress down to virtually nothing.

Results will still be poor but will probably be the best you're going to get.

You can Google for a Hilbert Curve if you don't know what it is.
Doctor
Also, if you want to feed a color picture to an audio codec, make sure to feed each color as a separate audio channel (24 bit -> 8-bit 3-channel). Should be relatively simple with a generic image editor, or if you can make the codec recognize 24 bits as three samples.
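One way to read "24 bit -> 8-bit 3-channel" is that interleaved RGB bytes already have the layout of 3-channel 8-bit PCM frames, so they can be wrapped in a WAV header as they are. The sketch below uses Python's standard `wave` module; the function names and the 44100 Hz rate are assumptions. Encoders that only take mono or stereo input would need the planes fed separately, which `split_planes` covers.

```python
import io
import wave


def rgb_to_three_channel_wav(rgb: bytes, sample_rate: int = 44100) -> bytes:
    """Wrap interleaved RGB bytes as a 3-channel, 8-bit (unsigned) WAV."""
    if len(rgb) % 3:
        raise ValueError("RGB data must be a multiple of 3 bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(3)      # one channel per colour
        w.setsampwidth(1)      # 8-bit samples
        w.setframerate(sample_rate)
        w.writeframes(rgb)
    return buf.getvalue()


def wav_to_rgb(wav_bytes: bytes) -> bytes:
    """Recover the interleaved RGB bytes from such a WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())


def split_planes(rgb: bytes):
    """Alternative: one mono stream per colour (R, G, B)."""
    return rgb[0::3], rgb[1::3], rgb[2::3]


pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 10, 20, 30])  # 4 RGB pixels
wav = rgb_to_three_channel_wav(pixels)
print(wav_to_rgb(wav) == pixels, split_planes(pixels))
```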
johnsonlam
QUOTE(cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

That changing the power cord will improve sound is only TRUE for analog audio.

If you insist it's true, maybe your PC has bad RF or EC interference, or the ground of the old power cord is not connected, so the analog part of the sound card and the audio cable were coloured by noise.

If the shielding is good, changing the power cord on a PC should make no significant difference in sound quality.
ErikS
QUOTE(johnsonlam @ Nov 15 2003, 06:10 AM)
QUOTE(cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

That changing the power cord will improve sound is only TRUE for analog audio.

If the shielding is good, changing the power cord on a PC should make no significant difference in sound quality.

Care to explain why it would on analog audio (whatever that means...)?
Doctor
QUOTE(johnsonlam @ Nov 15 2003, 12:10 AM)
Hong Kong - International Joke Center

Hm, should have got it...
wkwai
QUOTE(NumLOCK @ Oct 11 2003, 03:25 PM)
JPEG performs very poorly on sound, because it's not continuous across 8-pixel boundaries. Thus you get discontinuities every 8 samples, basically adding a square wave to your sound :D

The more you compress, the more it becomes unusable.

You should definitely try JPEG2000 !!

Yes I agree.. JPEG uses a non-overlapped 2D DCT Block.. There would be "blocking effects".. the frame boundary noise would be very annoying.. The use of overlapped DCT would result in the boundary noise being spread across 2 different frames and below the perceptual masking level of the human hearing.
aspifox
Ouch. I did something similar (and, sadly, in full seriousness) for audio->jpg->audio around 1993, for a crude streaming-voice hack so I could shout obscenities at a friend over my little modem. It was an inetd-launched shell script(!) which used pipes, cjpeg and djpeg.

There were also two home-grown C filters in the pipeline, only one of which is probably of interest to forum readers; it (de/)arranged every 64 8-bit audio samples onto an 8x8 Hilbert curve in the greyscale bitmap, which made the 2d arrangement of the data much more jpeg-friendly.
Supacon
QUOTE(johnsonlam @ Nov 14 2003, 09:10 PM)
QUOTE(cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

That changing the power cord will improve sound is only TRUE for analog audio.


A system that I use for DJing once did actually have sound issues related to the power cable... er... you could think of it that way.

There was actually a ground loop because of the way that the audio equipment was plugged in, and it would feed a 60 Hz buzz through all the equipment that ended up in the sound, and it was very annoying. A temporary workaround was to break off the ground pin from the PC power cable and the mixer power cable.

After a lot of reading and rewiring stuff, I was able to plug every single component into the same outlet, to ensure that they were indeed sharing a common ground point (which must not have been the case before, resulting in the ground loop). The problem was almost completely solved after that.

So, as silly as it may sound, a power cord can affect the sound coming out of a computer. Of course, this would have absolutely nothing to do with the process of encoding music... uhm... last I checked that was an entirely digital process.
rc55
Release some code!

Some IDM freak will use it in a track, and then Mac will say it's amazing beautiful soundscapes of mad clickery! (maybe...)

But seriously, any form of creative distortion is cool by me, so share the code. :)

Ruairi
Woodinville
Well, JPEG is mostly a source coder, not a perceptual coder, and its perceptual tuning is for the eye.

MP3 is an audio coder, and gets its gain about equally from source and perceptual coding.

I can't imagine why one would do what the other does, let alone well.
Lyx
QUOTE(rc55 @ Mar 10 2005, 03:51 AM)
Some IDM freak will use it in a track, and then Mac will say it's amazing beautiful soundscapes of mad clickery! (maybe...)

LOL! You've read too many electronica reviews in recent times ;)

- Lyx
rjamorim
QUOTE(Lyx @ Mar 28 2005, 01:49 PM)
LOL! You've read too many electronica reviews in recent times ;)

More like he has been talking too much with Mac :-B
sven_Bent
Hmm, why not make the picture 1*XXXX so that the picture data is "1 dimensional" like audio? That would probably remove the one-second pre-/post-echo.

:-)

--edit--
Some words came out in Danish
rutra80
Audio isn't quite "1 dimensional" - it has depth (bit depth), and if it's not mono it also has parallel worlds (additional channels) ;)
I wonder what would happen if audio were stored in a way similar to lossy encoders (so the "dimensions" would be: time, frequency, and dB), so basically the picture would be a spectrogram. I guess the problem would be with converting such a spectrogram JPEG back to PCM audio...
deej_1977
This thread is still alive o_O? Guess April Fool's is here already.
XoR
I was disappointed that there are no images to view in this interesting topic :o

Fortunately, some time ago (2 years?) I had the brilliant idea of doing everything that is in this funny topic B)

"Audio -> JPG -> Audio" - this was a terrible experience. There was no way to make it sound right, though mono jpeg on mono 8-bit wav sounded best :lol:
"BMP -> MP3 -> BMP" - and here I still got some picture B)
user posted image <- original image that was used
user posted image <- MP3ed (I don't remember the bitrate, but it was low, maybe 64...) (the image is larger because I didn't want it to have any further jpeg artifacts :) )

I used some audio program and IrfanView's RAW plugin :lol:

What do you say about my colorful mp3 image? B) If I remember right, it is combined from 3 layers (Red layer + Blue ...) :lol: B) :lol:

As you can see, mp3 starts to change the picture before and after the mountain at the same distance. At high bitrates, from what I remember, this distance was smaller and the whole image looked more solid, though still with lots of horizontal artifacts ;)

Waiting for your pictures :)
rutra80
QUOTE(XoR @ Sep 15 2005, 12:28 AM)
What do you say about my colorful mp3 image? B)

I'd say that MP3 is a nice retouch effect (kinda like a wrecked VHS tape) ;)
I guess that this picture is a good example of MP3 quantization noise?
Anyway, the quality is surprisingly good! :)
HbG
I love this thread.
Shade[ST]
I no longer need photoshop to add noise + motion blur my images!!

How about doing the same on jpeg files?
HisInfernalMajesty
How exactly does one go about doing this (converting a JPG/BMP to WAV)? I've tried that program posted earlier (Bitmaps and Waves) and when I convert the picture wav back to a picture, I kinda get the picture, but it's severely distorted...
XoR
I think that with OGG at 64kbps those pictures of mine would look way better :)
Klyith
I call BS on XoR's picture. I tried doing an image -> audio -> image conversion myself and it was nothing like his. Using any lossy codec I got huge amounts of noise, color shifts, and other garbage. The image was recognizable in outlines and general shapes, but nothing else. The only exception was that I did the same process with flac as a control, and it worked perfectly.

I'm not sure what produced XoR's pic, but it wasn't mp3 or any other audio codec. If you want to see what I got, I uploaded them here:
http://home.earthlink.net/~klyith/forums/lossypic/
The original is 640x480, but I resized all the output ones by 50% to save space & bandwidth. They actually look better when smaller as well, because some of the noise is cut down.
rutra80
Did you compress planes of colour components separately like XoR did? If you compressed whole chunk at once then this is why your results differ so much.
Klyith
QUOTE(rutra80 @ Sep 15 2005, 11:53 PM)
Did you compress planes of colour components separately like XoR did? If you compressed whole chunk at once then this is why your results differ so much.

Yes, I did both: three separate files for RGB color and an interleaved-style raw file. They were not substantially different.

However, I was just messing with PSP and found it had more options for exporting raw than Photoshop. So I tried a planar (non-interleaved) raw file. That ended up being totally different. See here. It has odd problems, but there are areas where much better definition is preserved. Wtf?

I think maybe my methodology is flawed somehow. Here's what I'm doing:
1. Save as raw using
a) Photoshop - three files for separate channels
b) Photoshop - one interleaved raw file
c) PSP - one planar raw file
2. Import raw files as pcm using Audacity
3. Save as 16bit 44khz wav files
4. Convert to lossy
5. Convert back to wav (using foobar)
6. Strip wav headers using StripHdr.exe
7. Import raw file as image

I guess I'm not as dubious of Xor's pic anymore, but it still looks artificial to me. Hrmph. I need a more detailed explanation of how other people are generating their results.
Rotareneg
Here's an example of aoTuV beta 4 at -q 0, the raw file is 288 kB, the Vorbis only 46 kB. :)

user posted image

[edit]Actually, I take back the part about interleaving, it doesn't matter (just tried with it.) Just make sure you're processing the raw file as 8 bit all the time and it'll work fine apparently.[/edit]
Klyith
OK, suddenly I got it working perfectly. Sorry I doubted everyone.

I was screwing up something with the bitness when importing to audacity. It seems to make random guesses about the pcm when you do that... Anyways, the important thing is to use unsigned 8-bit. I was also confused by the options for endianness that seem to have no effect.
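
To illustrate why the signedness matters (this is a sketch of the general effect, not a claim about what Audacity did in this case): pixel bytes run from 0 to 255, so reading them as signed 8-bit wraps everything from 128 upwards to negative values and turns a smooth gradient into a huge jump. The function names are assumptions.

```python
def as_unsigned_8bit(raw: bytes):
    """Interpret bytes as unsigned 8-bit samples (0..255), like pixel values."""
    return list(raw)


def as_signed_8bit(raw: bytes):
    """Interpret the same bytes as signed 8-bit samples (-128..127)."""
    return [b - 256 if b >= 128 else b for b in raw]


def largest_step(samples):
    """Biggest difference between neighbouring samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


gradient = bytes(range(120, 136))   # a smooth grey ramp crossing mid-scale

print(as_unsigned_8bit(gradient))
print(as_signed_8bit(gradient))
print(largest_step(as_unsigned_8bit(gradient)),
      largest_step(as_signed_8bit(gradient)))
```

Read as unsigned, the ramp moves one level per sample; read as signed, it jumps from 127 straight to -128 in the middle, which a lossy codec then has to spend bits on.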

I was experimenting on a different pic and found that an image with vertical structures and lots of contrast works really well. You can actually see the pre/post-echo. I'll put up some images tomorrow to make up for my earlier stupidity.
XoR
I did it again :)

I must say I didn't compress all the layers separately. That was my idea of how I would do a color image before I really started doing this (2 years ago). Practice shows that it wasn't necessary :)

Now I have done some comparison of audio codecs on my 17" monitor with 24-bit RGB images, and I must say that what I heard is the same as what I can see B)

MPC - increases contrast and saturation of images :blink:
MP3 - on the contrary, decreases contrast a bit
OGG - does best of all the above and shows the original colors
WV - I see no difference from the original picture (best audio picture codec :] )
OFS - there aren't any differences either !!!!!!!!!!!!!!!!!

Later I will make some pictures for public view and give URLs :)