What is this low-frequency content?

2012-04-26 09:32:23

Hello everyone! New member here

I'm a student in media technology and I'm currently writing a paper on lossy audio compression. I was recently studying some spectral figures comparing an audio file in its "original" Wave format (ripped straight from a CD at 16-bit, 44.1 kHz) with two versions of the same file encoded as 128 kbps AAC and 128 kbps MP3. While randomly fiddling around with the scale of the spectral view in Adobe Audition, learning how to use the program, I stumbled across what seems to be low-frequency content appearing in the 128 kbps versions:

Reply #1 – 2012-04-26 11:19:20

It appears to occur mostly during transients. It's possible there's even clipping, which produces broadband, noise-like distortion. Try copying the files and then applying negative gain in the decoder (e.g. use foobar2000 with ReplayGain turned on in the converter dialogue).
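The point of the negative-gain suggestion can be sketched in a few lines of Python. This assumes the decoder produces floating-point samples where ±1.0 is full scale (a convention chosen here for illustration, not a description of foobar2000's internals). Anything above full scale clips when converted back to 16-bit, and scaling everything down by a few dB creates the headroom to avoid that. The sample values are invented.

```python
# Sketch: how negative gain creates headroom for a decoder's output.
# Assumption: decoded samples are floats where +/-1.0 is full scale.

def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples, gain_db):
    """Scale every sample by the given gain in dB."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]

def count_clipped(samples, full_scale=1.0):
    """Count samples that exceed full scale and would clip in fixed point."""
    return sum(1 for s in samples if abs(s) > full_scale)

# Made-up decoder output: lossy decoding can overshoot full scale slightly.
decoded = [0.2, 0.95, 1.04, -1.02, 0.5]
print(count_clipped(decoded))                    # 2
print(count_clipped(apply_gain(decoded, -3.0)))  # 0
```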

Reply #2 – 2012-04-26 14:08:20

I tried re-encoding the original Wave file with ReplayGain enabled in foobar2000 (I hope that's what you meant? Being primarily a Mac user, I'm not very familiar with foobar2000) and the noise is still there, although it seems to be altered and "shuffled around" quite a bit. It's still most prominent during transients, though.

Reply #3 – 2012-04-26 18:36:44

This is most probably quantization noise spreading in frequency. Hard to tell without the WAV files...
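To make "quantization noise" a bit more concrete, here is a minimal Python sketch of a uniform quantizer. The standard result is that rounding to a step size Δ adds error whose power is close to Δ²/12 when the signal is large relative to the step, so coarser steps (fewer bits) mean more noise. The random "coefficients" are stand-ins: a real encoder picks a step per band from its psychoacoustic model, and this sketch does not model where in frequency the noise ends up.

```python
import random

def quantize(x: float, step: float) -> float:
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)

def noise_power(values, step):
    """Mean squared error introduced by quantizing `values` with `step`."""
    errors = [quantize(v, step) - v for v in values]
    return sum(e * e for e in errors) / len(errors)

rng = random.Random(0)  # fixed seed so the run is repeatable
coeffs = [rng.uniform(-10.0, 10.0) for _ in range(50_000)]  # stand-in values
for step in (0.5, 1.0, 2.0):
    measured = noise_power(coeffs, step)
    print(f"step={step}: measured {measured:.4f}, step^2/12 = {step ** 2 / 12:.4f}")
```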

Reply #4 – 2012-04-26 20:13:09

Here is part of the original Wave file (cut due to file size; no other modifications made):

http://dl.dropbox.com/u/518147/El_Colibri_part.wav

Forgive me if I'm not fully understanding all of the inner workings of perceptual coding, but if the added noise in the coded files is indeed quantization noise, then surely the difference in the noise between MP3 and AAC is due to differences between the codecs, such as the psychoacoustic model used and tools like Temporal Noise Shaping?

I've been studying perceptual coding for quite some time now, but I find it to be such a huge subject and many parts are not that easy to comprehend. I'm finding it quite fascinating though.

Reply #5 – 2012-05-04 14:57:15

Seems like there's no decoder clipping. Having now seen the file, it only reaches full scale once, in one channel, and has plenty of headroom the rest of the time.
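A check like "only reaches full scale once in one channel" can be reproduced with Python's standard `wave` module. This is a sketch assuming a 16-bit PCM WAV; the function names are hypothetical, and peak level is expressed in dBFS relative to 32768.

```python
import math
import sys
import wave
from array import array

def read_pcm16(path):
    """Read a 16-bit PCM WAV file and return (channel_count, interleaved_samples)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = w.getnchannels()
        data = array("h")
        data.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    return channels, list(data)

def channel_report(interleaved, channels):
    """For each channel return (peak level in dBFS, number of samples at full scale)."""
    report = []
    for ch in range(channels):
        samples = interleaved[ch::channels]
        peak = max(abs(s) for s in samples)
        peak_db = 20 * math.log10(peak / 32768) if peak else float("-inf")
        at_full_scale = sum(1 for s in samples if s in (32767, -32768))
        report.append((peak_db, at_full_scale))
    return report
```

Usage would be along the lines of `channels, samples = read_pcm16("El_Colibri_part.wav")` followed by `print(channel_report(samples, channels))`.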

I just made an encoding using LAME 3.98.4 (not the latest version, I know):

Code:

lame -V5 filename.wav

This created filename.wav.mp3, which averaged about 144 kbps of variable-bitrate MP3. It's likely to use significantly higher bitrates at certain times (often transients) and lower bitrates at others. With so many picking sounds on the guitar strings, and possibly some timing difference in when these transients reach the left and right channels, I dare say this sample is more demanding of bitrate than most.
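Here is how a VBR average such as "about 144 kbps" comes about. Every MPEG-1 Layer III frame carries 1152 samples per channel, so at a fixed sample rate each frame lasts the same time and the overall average is simply total bits over total duration. The per-frame bitrates below are hypothetical, and padding bytes and tag overhead are ignored.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
SAMPLE_RATE = 44_100

def average_bitrate_kbps(frame_bitrates_kbps):
    """Average bitrate of a VBR stream, computed as total bits / total duration."""
    if not frame_bitrates_kbps:
        raise ValueError("need at least one frame")
    frame_seconds = SAMPLES_PER_FRAME / SAMPLE_RATE
    total_bits = sum(kbps * 1000 * frame_seconds for kbps in frame_bitrates_kbps)
    duration_s = len(frame_bitrates_kbps) * frame_seconds
    return total_bits / duration_s / 1000

# Hypothetical per-frame bitrates: higher on transients, lower elsewhere.
frames = [112, 128, 192, 256, 128, 112, 96, 128]
print(average_bitrate_kbps(frames))  # about 144
```

Because all frames have equal duration, this reduces to the plain mean of the frame bitrates. A few high-bitrate frames on transients can therefore pull the average well above the "typical" frame.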

I then decoded this using

Code:

lame --decode filename.wav.mp3

to create filename.wav.mp3.wav.

It also removed the timing offsets (encoder delay and padding) introduced by the encode-decode process, which most MP3 encoder-decoder pairs don't do.
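The trimming itself is simple once the offsets are known. This Python sketch assumes you already have the number of sample frames to drop at the start (encoder delay plus any decoder delay) and at the end (padding). LAME records its encoder delay and padding in its header tag, but reading that tag is not shown here, the function name is hypothetical, and the numbers in the example are made up.

```python
def trim_gapless(interleaved, channels, delay, padding):
    """Remove `delay` leading and `padding` trailing sample frames from decoded PCM.

    `delay` and `padding` are counted in sample frames (one sample per channel).
    """
    if delay < 0 or padding < 0:
        raise ValueError("delay and padding must be non-negative")
    start = delay * channels
    end = len(interleaved) - padding * channels
    if end < start:
        raise ValueError("delay + padding exceed the decoded length")
    return interleaved[start:end]

# Made-up stereo example: 2 frames of delay, 1 frame of padding around 2 real frames.
decoded = [0, 0, 0, 0, 1, 2, 3, 4, 0, 0]
print(trim_gapless(decoded, channels=2, delay=2, padding=1))  # [1, 2, 3, 4]
```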

I used an old version of Cool Edit 96 (the predecessor to Adobe Audition) to view the spectrogram, mainly because I remembered where to find the menu to set the spectral resolution to 2048 bands with a Blackman window.

As you can see in my capture image (not embeddable, so click the link), neither version exhibits significant content below 40 Hz (displaying the whole file, but only the lower portion of the frequency spectrum):

http://www.mediafire.com/i/?56sa66550vsxbjk
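Here is roughly what a spectrogram view computes for a single frame, sketched in Python: apply a Blackman window, take an FFT, and read off the power per bin. Bin spacing is sample_rate / frame_length, about 10.8 Hz for the 4096-sample frame used here. That frame length is an arbitrary choice; I haven't checked how Cool Edit's "2048 bands" setting maps to an FFT size. The helper `fraction_below` is a hypothetical way to put a number on "no significant content below 40 Hz". A plain recursive radix-2 FFT keeps the sketch dependency-free. With NumPy you would use `numpy.fft.rfft` and `numpy.blackman` instead.

```python
import cmath
import math

def blackman(n):
    """Classic symmetric Blackman window (0.42, 0.5, 0.08 coefficients)."""
    return [0.42 - 0.5 * math.cos(2 * math.pi * i / (n - 1))
            + 0.08 * math.cos(4 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def frame_spectrum(frame, sample_rate):
    """Return (bin frequencies in Hz, power per bin) for one windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, blackman(n))]
    spec = fft(windowed)
    half = n // 2 + 1  # keep 0 Hz .. Nyquist
    freqs = [k * sample_rate / n for k in range(half)]
    power = [abs(c) ** 2 for c in spec[:half]]
    return freqs, power

def fraction_below(frame, sample_rate, cutoff_hz):
    """Share of the frame's spectral power in bins below cutoff_hz."""
    freqs, power = frame_spectrum(frame, sample_rate)
    total = sum(power)
    low = sum(p for f, p in zip(freqs, power) if f < cutoff_hz)
    return low / total if total else 0.0
```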

I then repeated this using lame without the -V5 option, so it encodes to CBR 128 kbps (the same as the -b 128 option), and found the same low-frequency content you did.
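A rough idea of how few bits CBR 128 kbps leaves to work with: an MPEG-1 Layer III frame is 144 × bitrate / sample_rate bytes long (1152 samples × 1/8 byte per bit), with an extra padding byte on some frames so the average comes out right. The sketch below is a back-of-the-envelope calculation. The figure still includes the 4-byte header and the side information, so the bits actually available for audio data are somewhat fewer. It also ignores the bit reservoir, which lets the encoder borrow bits between frames.

```python
def frame_bytes(bitrate_bps, sample_rate, padded=False):
    """Length in bytes of one MPEG-1 Layer III frame (1152 samples per channel)."""
    return 144 * bitrate_bps // sample_rate + (1 if padded else 0)

def average_frame_bytes(bitrate_bps, sample_rate):
    """Average frame length once padded and unpadded frames are mixed."""
    return 144 * bitrate_bps / sample_rate

avg = average_frame_bytes(128_000, 44_100)
bits_per_sample = avg * 8 / (1152 * 2)  # stereo: 2304 samples per frame
print(frame_bytes(128_000, 44_100), frame_bytes(128_000, 44_100, padded=True))  # 417 418
print(round(avg, 2), round(bits_per_sample, 3))  # 417.96 1.451
```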

None of the areas involved were clipping (except maybe one towards the end of the sample). With so much picking during the piece, I'd imagine there are significant transients throughout, so it's plausible that the encoder switches to short blocks, which have poorer frequency resolution. In the case of CBR 128, it may not have enough bits available to encode accurately enough to avoid bleed-through, so it concentrates the available bits on the most important parts of the spectrum. But this is a very hand-waving guess at the causes.
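The trade-off behind the short-block guess can be put into numbers. In MPEG-1 Layer III, a long block splits each 576-sample granule into 576 frequency lines. In short-block mode the granule is covered by three transforms of 192 lines each. The Python sketch below assumes idealized, evenly spaced lines from 0 Hz to Nyquist and ignores the polyphase filterbank's own overlap and aliasing. Under that assumption, one short-block line spans about 115 Hz, so noise placed in the lowest line could plausibly show up below 40 Hz in a spectrogram. That is consistent with the guess above, not a proof of it.

```python
SAMPLE_RATE = 44_100

def line_spacing_hz(lines, sample_rate=SAMPLE_RATE):
    """Idealized width of one frequency line when 0..Nyquist is split into `lines` lines."""
    return (sample_rate / 2) / lines

def window_ms(samples, sample_rate=SAMPLE_RATE):
    """Duration of a window of `samples` PCM samples in milliseconds."""
    return 1000 * samples / sample_rate

long_res = line_spacing_hz(576)   # long block: 576 lines per granule
short_res = line_spacing_hz(192)  # each of the three short blocks: 192 lines
print(round(long_res, 2), round(short_res, 2))              # 38.28 114.84
# Approximate transform window lengths (50% overlap): 1152 vs 384 samples
print(round(window_ms(1152), 1), round(window_ms(384), 1))  # 26.1 8.7
```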

The important thing is not whether the spectrum looks identical but whether it sounds identical. Transients introduce temporal masking, so more distortion can go unnoticed for a short time after a transient (and a shorter time before it); it might be that you can't hear the difference. The best way to find out is to run an ABX test comparing the decoded MP3 or AAC to the original WAV. A spectrogram that looks great can sound awful (try the old BLADE encoder), and one that looks like a poor match can sound indistinguishable. LAME at -V5 or so will often look lacking in the treble but sound perfect thanks to its very well-tuned psychoacoustic model, which applies to the VBR modes but not to CBR.
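Reading an ABX result usually comes down to one question: how likely is a score at least this good from pure guessing? Each trial is a coin flip (p = 0.5) if you can't hear a difference, so the chance comes from the binomial distribution. A minimal Python sketch follows. The p < 0.05 threshold people often use is a convention, not something fixed by the test.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # 0.0384063720703125
```

So 12 out of 16 correct would happen by chance only about 4% of the time, while 8 out of 16 is exactly what guessing predicts.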

If you want to find out whether it's significant, forget 'measurement' and rule-of-thumb engineering specs (like the 20 Hz to 20 kHz rule of thumb for the range of human hearing) and use ABX to find out how it sounds to a human listener. That's absolutely necessary with psychoacoustic encoders when dealing with real music rather than test tones.

[edit: minor typos]
