New Lossy Audio Codec, JPEG is not an ideal audio codec
cabbagerat
post Oct 11 2003, 22:29
Post #1





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.

So I wrote a shell script which does the following:
1) Takes a 10 second sample of an MP3 and converts it to 8bit 44100Hz raw PCM
2) Arranges the data into a square image and Jpegs it
3) Unjpegs it and converts back to raw PCM data
4) Creates a WAV from the raw sound

I used imagemagick and sox to perform all the necessary conversions.
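
For readers who want to follow step 2 without the shell tools, here is a minimal Python sketch of how 8-bit samples could be laid out row by row in a square image and read back. It is not the original script: the function names are made up for illustration, and padding with 128 assumes unsigned 8-bit PCM, where 128 is silence. The JPEG step itself is left out.

```python
import math

def samples_to_square(samples: bytes, pad: int = 128):
    """Pack 8-bit samples row by row into the smallest square that fits them."""
    n = len(samples)
    side = math.isqrt(n - 1) + 1 if n else 0          # ceil(sqrt(n))
    padded = samples + bytes([pad]) * (side * side - n)
    rows = [padded[r * side:(r + 1) * side] for r in range(side)]
    return rows, side

def square_to_samples(rows, n: int) -> bytes:
    """Undo samples_to_square: join the rows and drop the padding."""
    return b"".join(rows)[:n]

n = 10 * 44100                                        # 10 seconds at 44100 Hz
pcm = bytes((i * 7) % 256 for i in range(n))          # stand-in data, not real audio
rows, side = samples_to_square(pcm)
print(side, side * side - n)                          # 665 1225
```

With a 665-pixel-wide image, each image row holds about 15 ms of audio, so vertically adjacent pixels are 665 samples apart in time.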

Looking just at compression, JPEG performs very poorly compared to MP3. Obviously changing the JPEG quality factor made a big difference, but even at terrible quality the images were pretty large compared to the MP3.

We sat down, whipped out the abx program from the LAME source and very quickly decided that JPEG is not a great audio codec. At 95% quality the music was alright - similar quality to a 64kbps MP3. The music degraded quickly as we increased the compression. At 75% the music started sounding really horrible - with weird artifacts unlike anything I had heard before. The samples were more or less recognisable down to about 20% quality factor; any less and we couldn't tell Al Di Meola from Springbok Nude Girls.

Several conclusions can be drawn from this test:
- Procrastination leads people to do all sorts of insane things
- MP3, Vorbis and the rest do all sorts of magic unrelated to just dumping data
- JPEG's habit of dividing an image into 8x8 pixel blocks produces some very strange artifacts, including what sounded like pre- and post-echoes with up to a second delay

At this point some of the involved parties started blaming the fact that we were transcoding for the bad quality of the sound. Another student blamed my speaker cables. It was an interesting experiment. I was very surprised that the sound didn't come out completely mangled. :)


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
JohnV
post Oct 11 2003, 22:51
Post #2





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Why didn't you use the original WAV source? :)
I have absolutely no idea if it would have made a difference, but doesn't JPEG perform better with "smooth" rather than "dithered" data? Maybe the MP3 encoding makes the data more "dithered" (a very unscientific description, but I'm tired; in fact I don't know if I'm talking only BS), like it has some of the DC coeffs at 0...


--------------------
Juha Laaksonheimo
NumLOCK
post Oct 12 2003, 00:25
Post #3


Neutrino G-RSA developer


Group: Developer
Posts: 852
Joined: 8-May 02
From: Geneva
Member No.: 2002



JPEG performs very poorly on sound because it's not continuous across 8-pixel block boundaries. Thus you get discontinuities every 8 samples, basically adding a square wave to your sound :D

The more you compress, the more it becomes unusable.
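
The block-boundary effect can be reproduced in one dimension. The sketch below is a crude stand-in for JPEG, not JPEG itself: it transforms 8-sample blocks with a DCT, throws away everything except the lowest coefficients, and decodes. The function names and the `keep` parameter are made up for illustration.

```python
import math

N = 8  # block length, mirroring the 8-pixel edge of a JPEG block

def dct(block):
    """Unnormalised DCT-II of one N-sample block."""
    return [sum(x * math.cos(math.pi * (i + 0.5) * k / N)
                for i, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Exact inverse of dct() above."""
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / N)
                   for k in range(1, N))) * 2 / N
            for i in range(N)]

def crude_codec(signal, keep):
    """Per block, zero every coefficient with index >= keep, then decode.

    Assumes len(signal) is a multiple of N.
    """
    out = []
    for start in range(0, len(signal), N):
        coeffs = dct(signal[start:start + N])
        coeffs[keep:] = [0.0] * (N - keep)
        out.extend(idct(coeffs))
    return out

ramp = [float(i) for i in range(32)]   # smooth input: every step is 1
decoded = crude_codec(ramp, keep=1)    # keep only the DC term of each block
steps = [decoded[i + 1] - decoded[i] for i in range(len(decoded) - 1)]
print([round(s, 6) for s in steps])
```

On the smooth ramp, the decoded signal is flat inside each block and jumps by 8 at every block edge, which is the staircase (square-wave-like) error described above. Raising `keep` shrinks the jumps.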

You should definitely try JPEG2000 !!


--------------------
Try Leeloo Chat at http://leeloo.webhop.net
dreamliner77
post Oct 12 2003, 00:43
Post #4





Group: Members
Posts: 2150
Joined: 29-June 02
From: Boston
Member No.: 2427



There was a program mentioned at HA probably about a year ago that did the same thing (converted WAVs to JPGs and back). I forget the name of the program; I have it on my computer, but seeing that I'm at a buddy's house, I can't remember. Maybe someone else remembers what I'm talking about.


--------------------
"You can fight without ever winning, but never win without a fight." Neil Peart 'Resist'
danchr
post Oct 12 2003, 01:26
Post #5





Group: Members
Posts: 487
Joined: 6-April 03
From: Århus, Denmark
Member No.: 5861



Have you tried the other way around? Compressing a picture using MP3?
ShootThemLater
post Oct 12 2003, 02:43
Post #6





Group: Members
Posts: 8
Joined: 18-May 03
Member No.: 6700



Do you think the RIAA's searches include pictures? If so, could this be the future of trading music on P2P networks? I'm sure I'm not the first person to have this thought. Could you also use PNG?
phong
post Oct 12 2003, 02:53
Post #7





Group: Members
Posts: 346
Joined: 7-July 03
From: 15 & Ryan
Member No.: 7619



PNG would be lossless, but would have much poorer compression than any lossless audio codec. With JPEG compression, you could produce better compression by using a "blur" filter, but I imagine it would have the same (or similar) effect as a combination of a lowpass plus echoes added before and after displaced by a number of samples equal to the width of the image.


--------------------
I am *expanding!* It is so much *squishy* to *smell* you! *Campers* are the best! I have *anticipation* and then what? Better parties in *the middle* for sure.
http://www.phong.org/
rjamorim
post Oct 12 2003, 03:32
Post #8


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (dreamliner77 @ Oct 11 2003, 08:43 PM)
Maybe someone else remembers what i'm talking about.

I posted it in a News thread more than a year ago.

http://www.webcenter.ru/~vsoft/BitmapPlayer.htm


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
cabbagerat
post Oct 12 2003, 10:03
Post #9





Group: Members
Posts: 1018
Joined: 27-September 03
From: Cape Town
Member No.: 9042



danchr asked above whether I had tried compressing an image using MP3. I hadn't yet, but I decided to try it. Turns out, LAME --alt-preset standard isn't an all-bad image encoder. Compression is a bit disappointing, but the picture quality isn't bad at all.

I exported a picture I took of my dog to PPM, stripped the header, then converted it to WAV with sox. I ran LAME on it, then used sox to export the raw sound data. I needed to strip off a pile of bytes at the beginning (about 2 lines' worth), then re-add the PPM header.

LAME came in with an average bitrate of 130 kbps, which created a 340KB file from a 921KB original. Compared to JPEG, it's pretty poor compression, but a roughly 3:1 ratio isn't that bad. The output image is a little bit softer, with odd aliasing artifacts on fine details. The colour saturation also seems to have been increased.
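
As a sanity check, the figures hang together under a few assumptions the post does not state: that the image was 640x480 at 24 bits per pixel (921,600 bytes), that the pixel bytes were fed to LAME as 8-bit mono audio at 44100 Hz, and that "130" means 130 kbps.

```python
image_bytes = 640 * 480 * 3   # assumed image size: 921,600 bytes
sample_rate = 44100           # assumed: one byte per sample, mono
bitrate_bps = 130_000         # reported average bitrate, read as 130 kbps

duration_s = image_bytes / sample_rate      # how long the "audio" plays
mp3_bytes = duration_s * bitrate_bps / 8    # size implied by the bitrate
ratio = image_bytes / mp3_bytes

print(round(duration_s, 1), round(mp3_bytes / 1000), round(ratio, 2))
# 20.9 340 2.71
```

Under those assumptions the ratio depends only on the sample format and the bitrate (44100 x 8 / 130000), so the image content affects it only through the bitrate LAME's VBR mode settles on.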

Maybe with a carefully built image it will be possible to see stuff like pre-echo artifacts.

If anybody is interested you can get the input and output images from here:
Original Image
Output Image

Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.


--------------------
Simulate your radar: http://www.brooker.co.za/fers/
Ivan Dimkovic
post Oct 12 2003, 10:21
Post #10


Nero MPEG4 developer


Group: Developer
Posts: 1466
Joined: 22-September 01
Member No.: 8



QUOTE
While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.


That simply won't work well - because audio coders exploit the irrelevancy according to the human psychoacoustics, adding noise in frequency regions that are masked by outer-inner ear transfer and inner-ear processing.

Good audiovisual coders exploit the visual irrelevancy - so, you will end up with noise allocated in regions that do not correspond to psychovisual masking criteria.
sld
post Oct 12 2003, 11:04
Post #11





Group: Members
Posts: 1015
Joined: 4-March 03
From: Singapore
Member No.: 5312



QUOTE (cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?
Hanky
post Oct 12 2003, 13:40
Post #12





Group: Members (Donating)
Posts: 531
Joined: 18-November 01
From: The Netherlands
Member No.: 481



Not meaning to be offensive, but I know more pleasant ways to waste my time.
Joe Bloggs
post Oct 12 2003, 14:37
Post #13





Group: Members
Posts: 375
Joined: 29-September 01
Member No.: 55



QUOTE (sld @ Oct 12 2003, 06:04 PM)
QUOTE (cabbagerat @ Oct 12 2003, 05:03 PM)
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?

That really made me laugh :D
Pio2001
post Oct 12 2003, 19:27
Post #14


Moderator


Group: Super Moderator
Posts: 3936
Joined: 29-September 01
Member No.: 73



The artifacts in the dog picture are mostly horizontal lines. When the picture is converted to sound, is it scanned line after line?

It's too bad that the sound is one-dimensional while the picture is two-dimensional.
This experiment with JPEG compression will mostly show the effects of the descanning-filtering-rescanning. I think that any other filter, as long as it is a function of the neighbouring pixels (blur, artistic effects...), would have given the same kind of sound artifacts.
The "pre and post echoes up to one second delay" come from the fact that you "listen" to the picture line after line. When one dot is blurred, it spreads into the lines above and below it in the picture, which are converted into sound data playing long before or long after the central dot.

You should get a fine pre/post echo effect by applying a vertical motion blur to the picture instead of JPEG compression ;)
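
The vertical-blur idea is easy to check on a single click. This is a minimal sketch with made-up names: `width` is the assumed image width in pixels, and the blur is a plain 3-tap average over the pixel above, the pixel itself, and the pixel below.

```python
def vertical_blur(samples, width):
    """Average each sample with the ones directly above and below it in the image.

    In the row-by-row layout, "above" and "below" mean `width` samples
    earlier and later in the audio stream.
    """
    n = len(samples)
    out = []
    for i in range(n):
        neighbours = [samples[j] for j in (i - width, i, i + width) if 0 <= j < n]
        out.append(sum(neighbours) / len(neighbours))
    return out

width = 100
signal = [0.0] * 1000
signal[500] = 3.0                       # a single click
blurred = vertical_blur(signal, width)
echoes = [i for i, v in enumerate(blurred) if v != 0]
print(echoes)                           # [400, 500, 600]
```

The click is smeared into a pre-echo and a post-echo exactly one image row away on either side, so the echo delay scales with the image width.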
dreamliner77
post Oct 12 2003, 20:04
Post #15





Group: Members
Posts: 2150
Joined: 29-June 02
From: Boston
Member No.: 2427



Yes Roberto that was it. Now I know I'm not crazy. Well, not that crazy anyway.


--------------------
"You can fight without ever winning, but never win without a fight." Neil Peart 'Resist'
n68
post Oct 12 2003, 20:29
Post #16


yup..


Group: Banned
Posts: 715
Joined: 1-February 02
Member No.: 1225



Gday..

just read this.. and it reminds me of a little util/prog
called Camouflage.. it disguises the mp3 as a jpeg.

it was a huge thing among the streamload community
a couple of years back.. still in use as far as i can see.

and i can't hear any "damage" on them

there are a few of those progs, and camouflage
is the best one..

for those who want to try this out
http://www.freewaredownloads.de/cgi-bin/de...tail.cgi?ID=228
or do a google.
use a pic as a template.. the program adds the mp3 file
together with the pic in a jpeg container..
with an option to add a pwd. when you uncamouflage the file
you get the option to extract the mp3 or the pic..
the file becomes ca. 10Kb bigger..


:)

This post has been edited by n68: Oct 12 2003, 20:39
Mac
post Oct 13 2003, 12:04
Post #17





Group: Members
Posts: 650
Joined: 28-July 02
From: B'ham UK
Member No.: 2828



That's just 'gluing' two files together, not encoding sound as JPEG :) You are just hiding MP3 data inside a JPEG, from what I remember?


--------------------
< w o g o n e . c o m / l o l >
NeoRenegade
post Oct 13 2003, 12:30
Post #18





Group: Members
Posts: 723
Joined: 29-November 01
Member No.: 563



Yup. And from what I remember it's not limited to MP3 and JPEG. I think you could stuff pretty much anything in.
n68
post Oct 13 2003, 15:50
Post #19


yup..


Group: Banned
Posts: 715
Joined: 1-February 02
Member No.: 1225



Gday..

@Mac.. i believe i wrote "reminds me".. ;)
i am totally aware of the fact that camouflage just
writes a container with a different extension..
and does not encode/transcode the data.. :)
Eugene
post Oct 13 2003, 16:20
Post #20





Group: Members
Posts: 2
Joined: 13-October 03
Member No.: 9285



actually i tried the other way round

picture -> mp3 -> picture

i don't think i will spend too much time on that, but here is a short, easy proof of concept:

http://eugene.ath.cx/graphic2mp3/

ok ok... using mp3 on the raw rgb data would have been better than compressing a tga header and fixing up the resulting wav data with a new tga header, which sucks...

but hey, u can see the image !!!!!!!!

but why on earth is it turned around 180 degrees?!?!

anyways, was fun...

Eugene
Eugene
post Oct 13 2003, 16:30
Post #21





Group: Members
Posts: 2
Joined: 13-October 03
Member No.: 9285



ah, i forgot the sizes :-(

tga (wav) size: 5760 kb

jpg size before mp3 conversion (90%) : 420 kb

jpg size after mp3 conversion (90%) : 1460 kb

(as expected, more entropy in the decoded mp3 -> wav -> tga)

mp3 size : 992 kb (--alt-preset standard) 544 kb (--alt-preset 128)

(as expected, a compressor that knows about the data it operates on does better than a generic compressor or a compressor for such a different format)

Eugene
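
For comparison, the compression ratios implied by those sizes can be worked out directly. The numbers are taken from the post as written; the dictionary keys are just labels for this sketch.

```python
original_kb = 5760  # uncompressed tga (wav) size from the post

sizes_kb = {
    "jpg 90% before mp3": 420,
    "jpg 90% after mp3": 1460,
    "mp3 --alt-preset standard": 992,
    "mp3 --alt-preset 128": 544,
}

# Ratio of the uncompressed size to each compressed size.
ratios = {name: round(original_kb / kb, 1) for name, kb in sizes_kb.items()}
for name, r in ratios.items():
    print(f"{name}: {r}:1")
```

This gives about 13.7:1 for the direct JPEG against 5.8:1 and 10.6:1 for the two MP3 presets, and the JPEG of the MP3-decoded image drops to about 3.9:1.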
tuxp3
post Oct 31 2003, 02:19
Post #22





Group: Members
Posts: 1
Joined: 30-October 03
Member No.: 9560



why not try splitting each second of audio into a bitmap, then making the bitmaps into an avi and compressing it with xvid (1-pass, 100%)?
here's the catch: the bitmaps have to be nearly lossless copies of the original (~10 sec) wave after being rebuilt..... if someone could make the 10 bitmaps for me i could do the rest :)
just a crazy idea: using video codecs to store music.......(with minimal loss of data)
thankz, tuxp3
in short:
10sec mp3 -> raw pcm audio -> (10x) bmp pix -> 10-frame avi -> xvid 100% quality -> then back
(as little sound data loss as possible)

This post has been edited by tuxp3: Oct 31 2003, 02:31
Niknak
post Nov 14 2003, 23:28
Post #23





Group: Members
Posts: 22
Joined: 22-September 03
Member No.: 8954



The problem is that if you write the audio file to the uncompressed image one line at a time, left to right, then you're going to hit a JPEG block boundary every 8 pixels, and you get 8 different parts of the audio within each block.

I think the best bet would be to walk through the image buffer following a Hilbert curve - that way you will get the highest correlation between samples that are close to each other in the audio and pixels that are close to each other in the image. You'll need to pad your data with zeros to make the image dimensions a power of 2.
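
A minimal sketch of the Hilbert-curve layout, using the widely published iterative index-to-coordinate conversion. The names `d2xy` and `hilbert_layout` are illustrative, and the pad value of 128 assumes unsigned 8-bit audio, where 128 rather than 0 is silence.

```python
def d2xy(side, d):
    """Map position d along a Hilbert curve to (x, y) in a side x side grid.

    side must be a power of 2.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_layout(samples, side, pad=128):
    """Place samples along the curve; unused cells keep the pad value.

    Assumes len(samples) <= side * side.
    """
    grid = [[pad] * side for _ in range(side)]
    for d, sample in enumerate(samples):
        x, y = d2xy(side, d)
        grid[y][x] = sample
    return grid

print([d2xy(2, d) for d in range(4)])    # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Consecutive samples always land in neighbouring pixels, so an 8x8 block holds one contiguous 64-sample stretch of audio instead of 8 scattered runs. The cost is the padding: the 10-second, 441,000-sample clip from the first post would need a 1024x1024 image, since 512x512 is too small.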
Niknak
post Nov 14 2003, 23:36
Post #24





Group: Members
Posts: 22
Joined: 22-September 03
Member No.: 8954



Oh yeah, and do 8 bit audio and a monochrome image. I know JPEG only does colour images but you can convert to colour before compressing and back to mono after decompressing. The colour channels in the JPEG will compress down to virtually nothing.

Results will still be poor but will probably be the best you're going to get.

You can Google for a Hilbert Curve if you don't know what it is.
Doctor
post Nov 15 2003, 02:47
Post #25





Group: Members
Posts: 160
Joined: 16-January 03
Member No.: 4597



Also, if you want to feed a color picture to an audio codec, make sure to feed each color as a separate audio channel (24 bit -> 8-bit 3-channel). Should be relatively simple with a generic image editor, or if you can make the codec recognize 24 bits as three samples.
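
A short sketch of that channel split, assuming the pixel data is a flat buffer of interleaved R, G, B bytes with any file header already stripped. The function names are made up for illustration.

```python
def split_rgb(pixels: bytes):
    """Split interleaved RGBRGB... bytes into three 8-bit channels."""
    return pixels[0::3], pixels[1::3], pixels[2::3]

def merge_rgb(r: bytes, g: bytes, b: bytes) -> bytes:
    """Re-interleave three equal-length channels into RGBRGB... bytes."""
    out = bytearray(len(r) * 3)
    out[0::3], out[1::3], out[2::3] = r, g, b
    return bytes(out)

pixels = bytes([255, 0, 0,  0, 255, 0,  0, 0, 255,  10, 20, 30])  # four pixels
r, g, b = split_rgb(pixels)
print(list(r), list(g), list(b))
# [255, 0, 0, 10] [0, 255, 0, 20] [0, 0, 255, 30]
```

Each channel can then be written out as its own 8-bit audio channel, so the codec sees three smoothly varying signals rather than one stream that hops between colours on every byte.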
