Why are 128kbit/s MP3s usually 44,100 Hz?
Oct 14 2012, 01:44
Post #1
Group: Members Posts: 30 Joined: 14-October 12 Member No.: 103838
Since 128kbit/s MP3s usually have a low-pass filter at 16,000 Hz, and the Nyquist-Shannon theorem states that all frequencies under f/2 Hz can be fully described at a sampling rate of f Hz, why not encode at 32,000 Hz? Wouldn't it save more space?

Oct 14 2012, 01:53
Post #2
Group: Members Posts: 96 Joined: 23-July 03 Member No.: 7935
QUOTE
... Wouldn't it save more space?

Maybe. Maybe not. It depends. Try it with a few tracks and see. Remember to check whether your encoder uses a cutoff lower than 16 kHz when it encodes a signal with a 32 kHz sample rate.
-------------------- Regards,
Don Hills

Oct 14 2012, 04:03
Post #3
Group: Members Posts: 4129 Joined: 2-September 02 Member No.: 3264
QUOTE
Since 128kbit/s MP3s usually have a low-pass filter at 16,000 Hz, and the Nyquist-Shannon theorem states that all frequencies under f/2 Hz can be fully described at a sampling rate of f Hz, why not encode at 32,000 Hz? Wouldn't it save more space?

Since encoding happens in the frequency domain anyway, there isn't much savings. The encoder simply doesn't encode those frequencies, which is pretty close to downsampling but much easier to implement. That said, at very low bitrates LAME does downsample.

Oct 14 2012, 08:12
Post #4
Group: Members Posts: 2257 Joined: 9-October 05 From: Dormagen, Germany Member No.: 25015
Compression is more effective at 32 kHz, and the quality of tonal parts of the music improves. However, pre-echo issues get worse, and you don't necessarily have a 16 kHz lowpass when using 128 kbps MP3 (though staying below 16 kHz is perfectly adequate at this bitrate).
-------------------- lame3100i -V0.5+ --adbr_short 480

Oct 14 2012, 21:10
Post #5
Group: Members Posts: 349 Joined: 31-March 06 From: Houston, Texas Member No.: 29046
Indeed, many encoders *do* resample for low bit rates. I've never seen any that do it at 128 kb/s, but once you drop below 100 kb/s it's more common. I forgot exactly where it kicks in with LAME, but LAME does this if you set the quality low enough (V6 maybe? V7?)
-------------------- http://www.last.fm/user/sls/

Oct 15 2012, 03:02
Post #6
Group: Members Posts: 3080 Joined: 1-September 05 From: SE Pennsylvania Member No.: 24233
A 128 kbps encoding is the same size regardless of the sample rate.
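To spell out the arithmetic behind that: with constant bitrate, the stream size is just bitrate times duration, and the sample rate does not appear anywhere in the product. A minimal Python sketch, where `cbr_size_bytes` is a made-up helper name and ID3 tags and frame padding are ignored:

```python
def cbr_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate CBR stream size: bits per second times seconds, converted to bytes."""
    return bitrate_kbps * 1000 * duration_s / 8


# A 4-minute track at 128 kbps: the sample rate is not an input to the size.
for sample_rate_hz in (32000, 44100, 48000):
    size = cbr_size_bytes(128, 240)
    print(f"{sample_rate_hz} Hz: {size / 1e6:.2f} MB")
```

What changes with the sample rate is how those fixed bits are spent, not how many there are.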

Oct 15 2012, 05:02
Post #7
Group: Members Posts: 30 Joined: 14-October 12 Member No.: 103838
QUOTE
Since encoding happens in the frequency domain anyway, there isn't much savings.

Listen to these test files and tell me what you think (24 vs 48 kHz, both with an 11.5 kHz lowpass filter):
https://rapidshare.com/files/666351554/Baba...y24.11.5lpf.mp3
https://rapidshare.com/files/2623776446/Bab...y48.11.5lpf.mp3

QUOTE (halb27)
pre-echo issues get worse

If you ask me, the attack sounds as good in the 24 kHz file as in the 48 kHz file, and everything else sounds better. What do you make of it?

Oct 15 2012, 05:38
Post #8
Group: Members Posts: 4129 Joined: 2-September 02 Member No.: 3264
QUOTE
If you ask me, the attack sounds as good in the 24 kHz file as in the 48 kHz file, and everything else sounds better. What do you make of it?

I didn't want to deal with RapidShare, but I bet that at such a low bitrate the savings are more than worthwhile, since LAME recommends it. Probably not so much at 128k, though.

Oct 15 2012, 05:43
Post #9
Group: Members Posts: 2257 Joined: 9-October 05 From: Dormagen, Germany Member No.: 25015
If you listen to tonal problems at 128 kbps, you'll find that resampling to 32 kHz helps a lot.
-------------------- lame3100i -V0.5+ --adbr_short 480

Oct 16 2012, 18:07
Post #10
Group: Members Posts: 734 Joined: 17-September 06 Member No.: 35307
QUOTE
Listen to these test files and tell me what you think (24 vs 48 kHz, both with 11.5 kHz lpf):

QUOTE (halb27)
pre-echo issues get worse

If you ask me the attack sounds as good in the 24 kHz file as the 48 kHz, and everything else sounds better. What do you make of it?

Counterintuitively, that's a different situation from comparing 32 kHz with either 44.1 kHz or 48 kHz, and the attack should be expected to sound just as good!

MPEG-1 Layer 3 uses 1152-sample frames at 32, 44.1 or 48 kHz sampling rates.
MPEG-2 Layer 3 uses 576-sample frames at 16, 22.05 or 24 kHz sampling rates.

Thus, the short-block duration (in milliseconds) used to handle pre-echo and transients is the same for both 24 kHz and 48 kHz.

The full frame durations are:
36 ms for 16 or 32 kHz sampling rates
26 ms for 22.05 or 44.1 kHz sampling rates
24 ms for 24 or 48 kHz sampling rates

This duration can be divided into three short blocks of 192 samples each (each a third of the duration) to handle transients with greater time resolution, at the expense of worse frequency resolution for the same bitrate (a very high bitrate overcomes this, but you're talking about CBR).

So, to get the maximum 50% difference in short-block length, compare 32 kHz against 48 kHz, or 16 kHz against 24 kHz (or indeed 32 kHz (poorer short blocks) against 24 kHz (better short blocks)).

Conversely, as halb27 says, tonal problem samples are helped by the longer frame durations.
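The frame durations quoted above are simply samples per frame divided by the sampling rate. A small Python sketch of that arithmetic, using the frame sizes as stated in the post (`frame_duration_ms` is a hypothetical helper name, not anything from an encoder):

```python
# Samples per frame and sampling rates, as listed in the post above.
FRAME_LAYOUT = {
    "MPEG-1 Layer 3": (1152, (32000, 44100, 48000)),
    "MPEG-2 Layer 3": (576, (16000, 22050, 24000)),
}


def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


for name, (samples, rates) in FRAME_LAYOUT.items():
    for rate in rates:
        print(f"{name} at {rate} Hz: {frame_duration_ms(samples, rate):.1f} ms")
```

Halving both the frame size and the sampling rate leaves the frame duration unchanged, which is why each MPEG-2 rate pairs up with the MPEG-1 rate at twice its value.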

Oct 16 2012, 18:46
Post #11
LAME developer Group: Developer Posts: 761 Joined: 22-September 01 Member No.: 5
MPEG-1 Layer 3 frames consist of 2 granules of 576 samples each. So a long block at 32 kHz has a duration of 18 ms, and a short block 6 ms.
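As a quick check of those figures, here is a sketch in Python for the three MPEG-1 sampling rates. It assumes only what this thread states: a granule is 576 samples, and a granule is split into three short blocks. The names are made up for illustration.

```python
GRANULE_SAMPLES = 576          # per the post above: an MPEG-1 frame is 2 granules of 576 samples
SHORT_BLOCKS_PER_GRANULE = 3   # a granule is divided into three short blocks


def block_durations_ms(sample_rate_hz: int) -> tuple[float, float]:
    """Return (long block, short block) durations in milliseconds."""
    long_ms = 1000.0 * GRANULE_SAMPLES / sample_rate_hz
    return long_ms, long_ms / SHORT_BLOCKS_PER_GRANULE


for rate in (32000, 44100, 48000):
    long_ms, short_ms = block_durations_ms(rate)
    print(f"{rate} Hz: long block {long_ms:.2f} ms, short block {short_ms:.2f} ms")
```

At 32 kHz this gives 18 ms and 6 ms; at 48 kHz it gives 12 ms and 4 ms, so the short block at 32 kHz is 50% longer than at 48 kHz.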

Oct 16 2012, 19:39
Post #12
Group: Members Posts: 734 Joined: 17-September 06 Member No.: 35307
Oops, thanks Robert. If I could edit my post now, I would. I missed that step out, it's the granules, not the frames that are divided by three. Nonetheless, the relative difference in short-block lengths is still that it's 50% greater length at 16 or 32 kHz than it is at 24 or 48 kHz and that it's the same for both 24 and 48 kHz.

Oct 17 2012, 19:31
Post #13
Group: Members Posts: 30 Joined: 14-October 12 Member No.: 103838

Oct 17 2012, 21:48
Post #14
Group: Members Posts: 734 Joined: 17-September 06 Member No.: 35307
QUOTE
tonal problems
I'm not really sure what those are. Could you explain that for me please?

OK, let's do this rather thoroughly...

You can consider sounds to be of, I guess, three types: tonal, transient and continuous noise.

Tonal is like a whistle: a pure note, with a sharp frequency response and quite often overtones (harmonics) at multiples of the base frequency, which give the timbre or character of the instrument's note. Vocally, vowel sounds are tonal and can be sung. A chord (multiple notes played at once) is also tonal.

A transient is like a click, a cymbal or hi-hat hit, or the breathy or plucked onset of a note from an instrument. (As an aside: often the type of onset gives the human as much information about the instrument as the timbre, which is why, although first-generation synthesizers tried mainly to reproduce timbre and overtones, later generations improved onset transients for more realism.) In the frequency domain, most transients are spread over a wide spectrum (like noise), but in the time domain they are of short duration. Vocally, plosive consonant sounds and similar, like p, b, f, k, t, have a transient nature.

Continuous noise is largely uncorrelated to previous samples; it's an essentially random signal that has components over a broad frequency spectrum. While transients are noiselike in the frequency domain (a broad spectrum with little in the way of frequency peaks), they last only a short time. A brushed snare drum or tape hiss is a good example of continuous noise. Vocally, breath sounds such as sh, ss, ff, dh/th are continuous noiselike sounds.

So the word COSINE, for example:
starts with a sharp transient C with a clicking noise
the long O is a tonal, singable vowel
the S is noiselike and lasts longer than the transient C, without a particularly sharp onset
the I is a tonal, singable vowel
and the N is mostly tonal with a fairly abrupt but not really transient ending.
(the E is silent and modifies the vowel sound represented by the letter I)

To accurately match the frequency or pitch of a slowly varying tonal signal, a long block in a transform codec provides more points in the frequency domain, each representing a narrower frequency band. The frequency resolution can be said to be good. Because of the long duration, the time resolution is poor.

Imagine a grossly oversimplified example, pretty much plucked from thin air. If you imagine a bunch of frequency components in a tonal signal represented as decimal integers, in a long block lasting 12 ms a small selection of them (12 by coincidence only) might be:

CODE
 160Hz   240Hz   320Hz   400Hz   480Hz   560Hz   640Hz   720Hz   800Hz   880Hz   960Hz  1040Hz
    30     119     475     879    3049   10234    4520     960     214      53     178     422

If you need to encode them only to the nearest 100, say, you might then get:

CODE
 160Hz   240Hz   320Hz   400Hz   480Hz   560Hz   640Hz   720Hz   800Hz   880Hz   960Hz  1040Hz
     0     100     500     900    3000   10200    4500    1000     200     100     200     400

Since the units and tens digits are all zero, we only need to send the higher digits (hundreds, thousands, tens of thousands, etc.). This is similar to how we save bitrate in lossy encoding compared to lossless. This gives a pretty good match for the frequency and amplitude of that tone when reconstructed, which our psychoacoustic model tells us is indistinguishable from the original on this occasion.

In a short block, there might be only a third of the number of samples and a third of the number of frequency bins, each representing a 3-times wider frequency band but over a shorter time (e.g. 4 ms), so while the frequency resolution is poor, the time resolution to represent rapid changes in the signal is good.

The same bunch of frequencies over the same 12 ms is now divided into three short blocks, but instead of 12 frequency components in one long block, there are just 4 frequency components, each of which is three times broader in bandwidth, in each of three short-time blocks lasting 4 ms each.
CODE
       first 4ms        |       second 4ms       |       third 4ms
240Hz 480Hz 720Hz 960Hz | 240Hz 480Hz 720Hz 960Hz | 240Hz 480Hz 720Hz 960Hz
  375  3849  2022   111 |   208  4721  1898   218 |   142  5431  1468   102

If the psychoacoustic model has detected that there's a transient and requested a short block, it might well assume that the frequency spectrum is fairly broad, which is true for purely transient noiselike signals like hi-hat cymbals, and might calculate that rounding to the nearest 100, say, is enough:

CODE
       first 4ms        |       second 4ms       |       third 4ms
240Hz 480Hz 720Hz 960Hz | 240Hz 480Hz 720Hz 960Hz | 240Hz 480Hz 720Hz 960Hz
  400  3800  2000   100 |   200  4700  1900   200 |   100  5400  1500   100

However, there are cases where there is both a strong transient component and a strong tonal component. One example I've tested a few times is the problem sample Angels Fall First. This has a close microphone on the right-channel guitarist's pick, producing strong clicking sounds (transients) as each string is picked. The strings' notes are strongly tonal, and the first string continues to sound as the next string is picked.

My guess is that the click of the pick triggers a short block to capture the short-duration sound. However, the bandwidth of each frequency bin in these three short blocks is a good deal broader now, and if the same rounding accuracy (e.g. to the nearest 100 in the above example) is provided, it sounds as though the frequency or amplitude of the continuous tones from the strings that sound throughout is wavering slightly.

To encode both the short time of the transient and preserve the sharp frequency spectrum of the tonal part of the signal over the whole time, a very high bitrate is required to produce finer-than-usual rounding accuracy, so that these broad-bandwidth frequency bins still result in fine frequency precision.
This is a large part of what halb27's lame3.99.5z version does in the -Vn+ and -V0+eco settings when a short block is triggered, and I think it's why it solves the Angels Fall First problem sample.

(The fact that there's a trade-off between fine rounding accuracy and high time and frequency precision is a subtle mathematical point in the field of windowed overlapping Fourier transforms, which is too advanced to explain in this context. There's some hope that the new Opus codec's band-by-band time/frequency preference will allow some frequency ranges to encode tonal components at low bitrate with poor time resolution, while simultaneously providing good time resolution at low bitrate to other frequency bands.)
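For anyone who wants to play with the toy numbers above, the rounding step can be reproduced in a few lines of Python. This is only the "round to the nearest 100" illustration from this post, not how an MP3 encoder actually quantizes; `round_to_step` is a made-up helper.

```python
def round_to_step(values, step=100):
    """Round non-negative integers to the nearest multiple of step (halves round up)."""
    return [(v + step // 2) // step * step for v in values]


# One 12 ms long block: 12 narrow bins, 160 Hz to 1040 Hz in 80 Hz steps.
long_block = [30, 119, 475, 879, 3049, 10234, 4520, 960, 214, 53, 178, 422]

# Three 4 ms short blocks: 4 bins each (240, 480, 720, 960 Hz), three times wider.
short_blocks = [
    [375, 3849, 2022, 111],
    [208, 4721, 1898, 218],
    [142, 5431, 1468, 102],
]

print(round_to_step(long_block))
for block in short_blocks:
    print(round_to_step(block))
```

Calling `round_to_step` with a smaller `step` is the toy equivalent of spending more bits on finer rounding accuracy within each broad short-block bin.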

Oct 17 2012, 22:19
Post #15
Group: Members Posts: 2257 Joined: 9-October 05 From: Dormagen, Germany Member No.: 25015
QUOTE
I'm not really sure what those are. ...

As an example of a really ugly tonal problem, encode lead-voice using any VBR level you like and listen to the first 2 seconds.
(BTW there's hope for the future: robert gave me a pre-3.100 version for testing which greatly improves upon samples like this.)

This post has been edited by halb27: Oct 17 2012, 22:23
-------------------- lame3100i -V0.5+ --adbr_short 480

Oct 18 2012, 15:24
Post #16
Group: Members Posts: 452 Joined: 31-May 04 From: Czech Rep. Member No.: 14430
The problem is - how exactly do you resample a 44.1kHz signal to 32kHz so that you preserve everything up to 15999Hz perfectly?
-------------------- HD 238 Sansa Clip+ Vorbis q6; HD 380 Xonar DX FB2k FLAC

Oct 18 2012, 16:55
Post #17
Group: Members Posts: 2257 Joined: 9-October 05 From: Dormagen, Germany Member No.: 25015
Sure, you can't get the full 16 kHz bandwidth, only something like 15 kHz or a little more, if you want to avoid audible artifacts from resampling.
It's not a big difference, but it's a valid point.

This post has been edited by halb27: Oct 18 2012, 16:55
-------------------- lame3100i -V0.5+ --adbr_short 480
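To put rough numbers on that, here is a Python sketch assuming a rational (upsample, filter, downsample) resampler, which is one common approach and not necessarily what any particular encoder does. The 15 kHz passband edge is the figure from the post above; both helper names are made up.

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Reduced output/input rate ratio: upsample by the numerator, downsample by the denominator."""
    return Fraction(dst_hz, src_hz)


def transition_band_hz(dst_hz: int, passband_edge_hz: float) -> float:
    """Room the anti-alias filter has to roll off between the passband edge and the new Nyquist limit."""
    return dst_hz / 2 - passband_edge_hz


ratio = resample_ratio(44100, 32000)
print(f"upsample by {ratio.numerator}, downsample by {ratio.denominator}")
print(f"transition band: {transition_band_hz(32000, 15000):.0f} Hz")
```

A real filter cannot go from full passband to full stopband in zero hertz, so keeping everything right up to 15,999 Hz is not practical; giving up roughly the top 1 kHz leaves the filter some room.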