New Lossy Audio Codec, JPEG is not an ideal audio codec |
Oct 11 2003, 22:29
Post
#1
|
|
![]() Group: Members Posts: 1018 Joined: 27-September 03 From: Cape Town Member No.: 9042 |
While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.
So I wrote a shell script which does the following:
1) Takes a 10 second sample of an MP3 and converts it to 8-bit 44100Hz raw PCM
2) Arranges the data into a square image and JPEGs it
3) Un-JPEGs it and converts back to raw PCM data
4) Creates a WAV from the raw sound
I used ImageMagick and sox to perform all the necessary conversions.
Looking just at compression, JPEG performs very poorly compared to MP3. Obviously changing the JPEG quality factor made a big difference, but even at terrible quality the images were pretty large compared to the MP3. We sat down, whipped out the abx program from the LAME source and very quickly decided that JPEG is not a great audio codec. At 95% quality the music was alright, similar in quality to a 64kbps MP3. The music degraded quickly as we increased the compression. At 75% the music started sounding really horrible, with weird artifacts unlike anything I had heard before. The samples were more or less recognisable down to about 20% quality factor; any less and we couldn't tell Al Dimeola from Springbok Nude Girls.
Several conclusions can be drawn from this test:
- Procrastination leads people to do all sorts of insane things
- MP3, Vorbis and the rest do all sorts of magic unrelated to just dumping data
- JPEG's habit of dividing an image into 8x8 pixel blocks produces some very strange artifacts, including what sounded like pre- and post-echoes with up to a second delay
At this point some of the involved parties started blaming the fact that we were transcoding for the bad quality of the sound. Another student blamed my speaker cables. It was an interesting experiment. I was very surprised that the sound didn't come out completely mangled.
-------------------- Simulate your radar: http://www.brooker.co.za/fers/
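Steps 2 and 3 of the script above (packing PCM into a square image and unpacking it again) can be sketched roughly as follows. This is not the original shell script, which is not shown in the thread; the function names, the choice of a binary PGM container and the padding value 128 (silence for unsigned 8-bit PCM) are all assumptions made for illustration. The JPEG step itself is left out, since it needs an external tool such as ImageMagick.

```python
import math

def pcm_to_square(samples: bytes, pad: int = 128):
    """Lay 8-bit PCM samples out row by row in the smallest square that holds them.

    pad=128 is an assumption: it is silence for unsigned 8-bit PCM.
    """
    if not samples:
        return 0, []
    side = math.isqrt(len(samples) - 1) + 1          # ceil(sqrt(n))
    padded = samples + bytes([pad]) * (side * side - len(samples))
    rows = [padded[r * side:(r + 1) * side] for r in range(side)]
    return side, rows

def square_to_pcm(rows, n_samples: int) -> bytes:
    """Read the image back row by row and drop the padding."""
    return b"".join(rows)[:n_samples]

def to_pgm(side: int, rows) -> bytes:
    """Wrap the rows in a binary PGM (P5) header so an image tool can read them."""
    return b"P5\n%d %d\n255\n" % (side, side) + b"".join(rows)

if __name__ == "__main__":
    ten_seconds = bytes(i % 256 for i in range(10 * 44100))  # dummy audio
    side, rows = pcm_to_square(ten_seconds)
    print("image side:", side)
    print("round trip ok:", square_to_pcm(rows, len(ten_seconds)) == ten_seconds)
```

Under these assumptions a 10 second, 44100Hz, 8-bit mono clip is 441000 samples, which needs a 665x665 image, so one image row corresponds to 665 consecutive samples.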
|
|
|
|
Oct 11 2003, 22:51
Post
#2
|
|
![]() Group: Developer Posts: 2797 Joined: 22-September 01 Member No.: 6 |
Why didn't you use original wav source?
I have absolutely no idea if it would have made a difference, but doesn't JPG perform better with "smooth" rather than "dithered" data? Maybe the MP3 encoding makes the data more "dithered" (very unscientific description, but I'm tired, in fact I don't know if I'm talking only BS), like it has some of the dc-coeffs 0... -------------------- Juha Laaksonheimo
|
|
|
|
Oct 12 2003, 00:25
Post
#3
|
|
|
Neutrino G-RSA developer Group: Developer Posts: 852 Joined: 8-May 02 From: Geneva Member No.: 2002 |
JPEG performs very poorly on sound, because it's not continuous across 8-pixel boundaries. Thus you get discontinuities every 8 samples, basically adding a square wave to your sound.
The more you compress, the more it becomes unusable. You should definitely try JPEG2000 !! -------------------- Try Leeloo Chat at http://leeloo.webhop.net
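The 8-sample discontinuity described in this post can be reproduced with a one-dimensional toy model. The sketch below is a simplification, not JPEG itself: real JPEG applies a 2-D DCT to 8x8 blocks with per-coefficient quantization tables, whereas this assumes a 1-D orthonormal DCT on runs of 8 samples and a single uniform quantization step. The names `dct8`, `idct8` and `blockwise_lossy` are made up for the example.

```python
import math

N = 8  # block length, matching the 8-pixel JPEG block edge

def dct8(block):
    """Orthonormal DCT-II of one block of N samples."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                               for n, x in enumerate(block)))
    return out

def idct8(coeffs):
    """Inverse of dct8."""
    out = []
    for n in range(N):
        s = math.sqrt(1 / N) * coeffs[0]
        s += sum(math.sqrt(2 / N) * coeffs[k] * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def blockwise_lossy(signal, step):
    """Quantize every block of N samples independently with one uniform step.

    len(signal) is assumed to be a multiple of N.
    """
    out = []
    for start in range(0, len(signal), N):
        coeffs = dct8(signal[start:start + N])
        out.extend(idct8([round(c / step) * step for c in coeffs]))
    return out

if __name__ == "__main__":
    ramp = [float(i) for i in range(32)]       # a perfectly smooth input
    coarse = blockwise_lossy(ramp, step=20.0)  # heavy "compression"
    jumps = [abs(b - a) for a, b in zip(coarse, coarse[1:])]
    for i, j in enumerate(jumps):
        print(i + 1, round(j, 3), "<- block boundary" if (i + 1) % N == 0 else "")
```

With a coarse step the smooth ramp turns into flat steps that change only at multiples of 8 samples, because each block is quantized without any knowledge of its neighbours.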
|
|
|
|
Oct 12 2003, 00:43
Post
#4
|
|
![]() Group: Members Posts: 2144 Joined: 29-June 02 From: Boston Member No.: 2427 |
There was a program mentioned at HA probably about a year ago that did the same thing (converted WAVs to JPGs and back). I forget the name of the program. I have it on my computer, but seeing that I'm at a buddy's house, I can't look it up. Maybe someone else remembers what I'm talking about.
-------------------- "You can fight without ever winning, but never win without a fight." Neil Peart 'Resist'
|
|
|
|
Oct 12 2003, 01:26
Post
#5
|
|
![]() Group: Members Posts: 487 Joined: 6-April 03 From: Århus, Denmark Member No.: 5861 |
Have you tried the other way around? Compressing a picture using MP3?
|
|
|
|
Oct 12 2003, 02:43
Post
#6
|
|
|
Group: Members Posts: 8 Joined: 18-May 03 Member No.: 6700 |
Do you think the RIAA's searches include pictures? If so, could this be the future of trading music on P2P networks? I'm sure I'm not the first person to have this thought. Could you also use PNG?
|
|
|
|
Oct 12 2003, 02:53
Post
#7
|
|
![]() Group: Members Posts: 346 Joined: 7-July 03 From: 15 & Ryan Member No.: 7619 |
PNG would be lossless, but would have much poorer compression than any lossless audio codec. With JPEG compression, you could produce better compression by using a "blur" filter, but I imagine it would have the same (or similar) effect as a combination of a lowpass plus echoes added before and after displaced by a number of samples equal to the width of the image.
-------------------- I am *expanding!* It is so much *squishy* to *smell* you! *Campers* are the best! I have *anticipation* and then what? Better parties in *the middle* for sure.
http://www.phong.org/ |
|
|
|
Oct 12 2003, 03:32
Post
#8
|
|
![]() Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81 |
QUOTE (dreamliner77 @ Oct 11 2003, 08:43 PM) Maybe someone else remembers what i'm talking about.
I posted it on a News thread, more than a year ago. http://www.webcenter.ru/~vsoft/BitmapPlayer.htm
-------------------- Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org |
|
|
|
Oct 12 2003, 10:03
Post
#9
|
|
![]() Group: Members Posts: 1018 Joined: 27-September 03 From: Cape Town Member No.: 9042 |
danchr asked above whether I had tried compressing an image using MP3. I hadn't yet, but I decided to try it. Turns out, LAME --alt-preset standard isn't an all-bad image encoder. Compression is a bit disappointing, but the picture quality isn't bad at all.
I exported a picture I took of my dog to PPM, stripped the header, then converted it to WAV with sox. I ran LAME on it, then used sox to export the raw sound data. I needed to strip off a pile of bytes at the beginning (about 2 lines' worth), then re-add the PPM header. LAME came in with an average bitrate of 130 kbps, which created a 340KB file from a 921KB original. Compared to JPEG, it's pretty poor compression, but a roughly 3:1 ratio isn't that bad. The output image is a little bit softer, with odd aliasing artifacts on fine details. The colour saturation also seems to have been increased. Maybe with a carefully built image it will be possible to see stuff like pre-echo artifacts. If anybody is interested you can get the input and output images from here: Original Image Output Image
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.
-------------------- Simulate your radar: http://www.brooker.co.za/fers/
|
|
|
|
Oct 12 2003, 10:21
Post
#10
|
|
|
Nero MPEG4 developer Group: Developer Posts: 1466 Joined: 22-September 01 Member No.: 8 |
QUOTE While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.
That simply won't work well, because audio coders exploit irrelevancy according to human psychoacoustics, adding noise in frequency regions that are masked by the outer-inner ear transfer and inner-ear processing. Good audiovisual coders exploit visual irrelevancy, so you will end up with noise allocated in regions that do not correspond to psychovisual masking criteria. |
|
|
|
Oct 12 2003, 11:04
Post
#11
|
|
![]() Group: Members Posts: 1015 Joined: 4-March 03 From: Singapore Member No.: 5312 |
QUOTE (cabbagerat @ Oct 12 2003, 05:03 PM) Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible. You aren't kidding... are you? |
|
|
|
Oct 12 2003, 13:40
Post
#12
|
|
|
Group: Members (Donating) Posts: 531 Joined: 18-November 01 From: The Netherlands Member No.: 481 |
Not meaning to be offensive, but I know more pleasant ways to waste my time
|
|
|
|
Oct 12 2003, 14:37
Post
#13
|
|
|
Group: Members Posts: 367 Joined: 29-September 01 Member No.: 55 |
QUOTE (sld @ Oct 12 2003, 06:04 PM) QUOTE (cabbagerat @ Oct 12 2003, 05:03 PM) Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible. You aren't kidding... are you? That really made me laugh |
|
|
|
Oct 12 2003, 19:27
Post
#14
|
|
|
Moderator Group: Super Moderator Posts: 3934 Joined: 29-September 01 Member No.: 73 |
The artifacts in the dog picture are mostly horizontal lines. When the picture is converted to sound, is it scanned line after line?
It's too bad that the sound is one-dimensional while the picture is two-dimensional. This experiment with JPEG compression will mostly show the effects of the descanning-filtering-rescanning. I think that any other filter, as long as it is a function of the neighbouring pixels (blur, artistic effects...), would have given the same kind of sound artifacts. The "pre and post echoes up to one second delay" come from the fact that you "listen" to the picture line after line. When one dot is blurred, it expands into the lines above and below it in the picture, which are converted into sound data playing long before or long after the central dot. You should get a fine pre/post echo effect by applying a vertical motion blur to the picture instead of JPEG compression. |
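The explanation in this post can be checked with a small sketch: put a single click into an otherwise silent image, blur it vertically, and see where the energy lands once the image is read back line after line. The 3-row box blur and the function names are assumptions chosen for illustration; they stand in for whatever a real filter or JPEG block would do.

```python
def vertical_blur(rows):
    """Average each pixel with the pixels directly above and below it.

    Rows at the top and bottom edge reuse themselves as the missing neighbour.
    """
    h = len(rows)
    out = []
    for y in range(h):
        above, below = rows[max(y - 1, 0)], rows[min(y + 1, h - 1)]
        out.append([(a + c + b) / 3 for a, c, b in zip(above, rows[y], below)])
    return out

def echo_positions(width, height, click_index):
    """Sample indices that carry energy after blurring one isolated click."""
    flat = [0.0] * (width * height)
    flat[click_index] = 255.0
    rows = [flat[r * width:(r + 1) * width] for r in range(height)]
    blurred = [v for row in vertical_blur(rows) for v in row]
    return [i for i, v in enumerate(blurred) if v > 0]

if __name__ == "__main__":
    width = 665          # assumed side of a square image holding 10 s of 8-bit 44.1kHz mono
    rate = 44100
    print(echo_positions(width, width, 100000))
    print("one row of delay in ms:", 1000 * width / rate)
```

The click reappears exactly one image width before and after its original position. Assuming a 665-pixel-wide image at 44100Hz, one row is about 15 ms, and pixels that share an 8-row JPEG block can be up to 7 rows apart, which is roughly 0.1 s.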
|
|
|
Oct 12 2003, 20:04
Post
#15
|
|
![]() Group: Members Posts: 2144 Joined: 29-June 02 From: Boston Member No.: 2427 |
Yes Roberto that was it. Now I know I'm not crazy. Well, not that crazy anyway.
-------------------- "You can fight without ever winning, but never win without a fight." Neil Peart 'Resist'
|
|
|
|
Oct 12 2003, 20:29
Post
#16
|
|
![]() yup.. Group: Banned Posts: 715 Joined: 1-February 02 Member No.: 1225 |
Gday..
Just read this, and it reminds me of a little utility called Camouflage. It disguises an MP3 as a JPEG. It was a huge thing among the Streamload community a couple of years back, and it is still in use as far as I can see. I can't hear any "damage" on the files. There are a few of those programs, and Camouflage is the best one. For those who want to try this out: http://www.freewaredownloads.de/cgi-bin/de...tail.cgi?ID=228 or do a Google search. You use a picture as a template; the program adds the MP3 file together with the picture in a JPEG container, with an option to add a password. When you uncamouflage the file, you get the option to extract the MP3 or the picture. The file becomes about 10KB bigger. This post has been edited by n68: Oct 12 2003, 20:39 |
|
|
|
Oct 13 2003, 12:04
Post
#17
|
|
![]() Group: Members Posts: 650 Joined: 28-July 02 From: B'ham UK Member No.: 2828 |
That's just 'glueing' two files together, not encoding sound as jpeg
-------------------- < w o g o n e . c o m / l o l >
|
|
|
|
Oct 13 2003, 12:30
Post
#18
|
|
![]() Group: Members Posts: 723 Joined: 29-November 01 Member No.: 563 |
Yup. And from what I remember it's not limited to MP3 and JPEG. I think you could stuff pretty much anything in.
|
|
|
|
Oct 13 2003, 15:50
Post
#19
|
|
![]() yup.. Group: Banned Posts: 715 Joined: 1-February 02 Member No.: 1225 |
Gday..
@Mac.. I believe I wrote "reminds me".. I am totally aware of the fact that Camouflage just writes a container with a different extension.. it does not encode/transcode the data.. |
|
|
|
Oct 13 2003, 16:20
Post
#20
|
|
|
Group: Members Posts: 2 Joined: 13-October 03 Member No.: 9285 |
actually i tried the other way round
picture -> mp3 -> picture
i don't think i will spend too much time on that, but here is a short, easy proof of concept: http://eugene.ath.cx/graphic2mp3/
ok ok... running mp3 on the raw rgb data would have been better than compressing a tga header and fixing up the resulting wav data with a new tga header... but hey, you can see the image !!!!!!!! but why on earth is it turned around 180 degrees?!?! anyways, was fun... Eugene |
|
|
|
Oct 13 2003, 16:30
Post
#21
|
|
|
Group: Members Posts: 2 Joined: 13-October 03 Member No.: 9285 |
ah, i forgot the sizes :-(
tga (wav) size: 5760 kb
jpg size before mp3 conversion (90%): 420 kb
jpg size after mp3 conversion (90%): 1460 kb (as expected, there is more entropy in the decoded mp3 -> wav -> tga)
mp3 size: 992 kb (--alt-preset standard), 544 kb (--alt-preset 128)
(as expected, a compressor does better when it knows about the data it operates on than a generic compressor or a compressor for such a different format)
Eugene |
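For comparison, the sizes quoted in this post work out to the following compression ratios against the 5760 kb uncompressed file. The dictionary keys are just labels invented for this sketch; the numbers are the ones from the post.

```python
# Sizes in kb, copied from the post above; the labels are made up for readability.
UNCOMPRESSED_KB = 5760
sizes_kb = {
    "jpg 90% (original image)": 420,
    "jpg 90% (after mp3 round trip)": 1460,
    "mp3 --alt-preset standard": 992,
    "mp3 --alt-preset 128": 544,
}

def ratios(uncompressed, compressed_sizes):
    """Return uncompressed/compressed for each entry."""
    return {name: uncompressed / size for name, size in compressed_sizes.items()}

if __name__ == "__main__":
    for name, ratio in ratios(UNCOMPRESSED_KB, sizes_kb).items():
        print(f"{name}: {ratio:.2f}:1")
```

So JPEG at 90% reaches roughly 13.7:1 on the original picture, while the two MP3 settings reach roughly 5.8:1 and 10.6:1 on the same data.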
|
|
|
Oct 31 2003, 02:19
Post
#22
|
|
|
Group: Members Posts: 1 Joined: 30-October 03 Member No.: 9560 |
why not try splitting up each second of audio into bitmaps, then making them into an avi and compressing them with xvid (1 pass, 100%)?
here's the catch: after being rebuilt, the bitmaps have to be nearly lossless copies of the original (~10 sec) wave..... if someone could make the 10 bitmaps for me i could do the rest. just a crazy idea, using video codecs to store music....... (with minimal loss of data) thankz, tuxp3
in short: 10 sec mp3 -> raw pcm audio -> (10x) bmp pix -> 10 frame avi -> xvid 100% quality -> then back (with as little sound data loss as possible) This post has been edited by tuxp3: Oct 31 2003, 02:31 |
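The "one bitmap per second" step of this idea could look something like the sketch below. It assumes 8-bit mono PCM at 44100Hz, in which case one second fits a 210x210 greyscale frame exactly (210 * 210 = 44100). The function names are hypothetical, and writing the frames to BMP, building the AVI and running XviD are not covered.

```python
import math

RATE = 44100
SIDE = math.isqrt(RATE)  # 210, because 210 * 210 == 44100

def pcm_to_frames(samples: bytes):
    """Split 8-bit mono PCM into one SIDE x SIDE greyscale frame per second.

    A short final second is padded with 0x80 (silence for unsigned 8-bit PCM,
    which is an assumption about the sample format).
    """
    frames = []
    for start in range(0, len(samples), RATE):
        second = samples[start:start + RATE].ljust(RATE, b"\x80")
        frames.append([second[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)])
    return frames

def frames_to_pcm(frames, n_samples: int) -> bytes:
    """Concatenate the frames row by row and drop any padding."""
    return b"".join(b"".join(rows) for rows in frames)[:n_samples]

if __name__ == "__main__":
    audio = bytes(i % 256 for i in range(10 * RATE))  # dummy 10 s clip
    frames = pcm_to_frames(audio)
    print(len(frames), "frames of", SIDE, "x", SIDE)
    print("round trip ok:", frames_to_pcm(frames, len(audio)) == audio)
```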
|
|
|
Nov 14 2003, 23:28
Post
#23
|
|
|
Group: Members Posts: 22 Joined: 22-September 03 Member No.: 8954 |
The problem is that if you write the audio file to the uncompressed image one line at a time, left to right, then you're going to hit a JPEG block boundary every 8 pixels and you get 8 different parts of the audio within each block.
I think the best bet would be to walk through the image buffer following a Hilbert curve. That way you will get the highest correlation between samples that are close to each other in the audio and pixels that are close to each other in the image. You'll need to pad your data with zeros to make the image dimensions a power of 2. |
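A Hilbert-curve layout along the lines suggested in this post might be sketched as follows. The index-to-coordinate conversion is the standard iterative Hilbert curve algorithm; the wrapper functions and their names are assumptions for illustration, and the zero padding follows the post.

```python
def hilbert_d2xy(side, d):
    """Map position d along a Hilbert curve to (x, y) in a side x side grid.

    side must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def audio_to_hilbert_image(samples, side, pad=0):
    """Place consecutive samples on consecutive points of the curve."""
    if side <= 0 or side & (side - 1):
        raise ValueError("side must be a power of two")
    if len(samples) > side * side:
        raise ValueError("image too small for this many samples")
    image = [[pad] * side for _ in range(side)]
    for d, value in enumerate(samples):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = value
    return image

def hilbert_image_to_audio(image, n_samples):
    """Walk the same curve to read the samples back."""
    side = len(image)
    out = []
    for d in range(n_samples):
        x, y = hilbert_d2xy(side, d)
        out.append(image[y][x])
    return out

if __name__ == "__main__":
    samples = list(range(50))
    image = audio_to_hilbert_image(samples, side=8)
    print("round trip ok:", hilbert_image_to_audio(image, len(samples)) == samples)
```

Consecutive samples always land on horizontally or vertically adjacent pixels. The cost is the padding: a 10 second, 44100Hz, 8-bit mono clip has 441000 samples, so it needs a 1024x1024 image, more than half of which is padding.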
|
|
|
Nov 14 2003, 23:36
Post
#24
|
|
|
Group: Members Posts: 22 Joined: 22-September 03 Member No.: 8954 |
Oh yeah, and do 8 bit audio and a monochrome image. I know JPEG only does colour images but you can convert to colour before compressing and back to mono after decompressing. The colour channels in the JPEG will compress down to virtually nothing.
Results will still be poor but will probably be the best you're going to get. You can Google for a Hilbert Curve if you don't know what it is. |
|
|
|
Nov 15 2003, 02:47
Post
#25
|
|
|
Group: Members Posts: 160 Joined: 16-January 03 Member No.: 4597 |
Also, if you want to feed a color picture to an audio codec, make sure to feed each color as a separate audio channel (24 bit -> 8-bit 3-channel). Should be relatively simple with a generic image editor, or if you can make the codec recognize 24 bits as three samples.
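Splitting interleaved 24-bit RGB into three 8-bit planes, as suggested in this post, takes only a couple of slices. The function names below are hypothetical, and the sketch assumes the pixel data is already raw interleaved R, G, B bytes with any image header stripped.

```python
def split_rgb(interleaved: bytes):
    """Turn RGBRGB... bytes into three separate 8-bit planes (R, G, B)."""
    if len(interleaved) % 3:
        raise ValueError("length must be a multiple of 3 bytes")
    return interleaved[0::3], interleaved[1::3], interleaved[2::3]

def merge_rgb(r: bytes, g: bytes, b: bytes) -> bytes:
    """Re-interleave three equal-length planes into RGBRGB... bytes."""
    if not (len(r) == len(g) == len(b)):
        raise ValueError("planes must have equal length")
    out = bytearray(len(r) * 3)
    out[0::3], out[1::3], out[2::3] = r, g, b
    return bytes(out)

if __name__ == "__main__":
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])  # one red, one green, one blue pixel
    r, g, b = split_rgb(pixels)
    print(list(r), list(g), list(b))
    print("round trip ok:", merge_rgb(r, g, b) == pixels)
```

As far as I know LAME only accepts mono or stereo input, so the three planes would probably have to be encoded as separate files, or fed to a codec that supports three or more channels.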
|
|
|
|