LAME applying 16kHz cutoff at 320K CBR

2009-10-04 12:19:12

I have a recording from cassette tape in FLAC. From the spectrogram I can see it has frequencies up to 21kHz, but when I encode it to MP3 (V0 or 320K) with LAME, there's a clear cut-off at 16kHz (except at the very end of the track, where there's no music, just tape hiss; there's no cut-off there).

I assume this is normal behaviour, but I'd like to know why it happens.

I can post a clip / spectrogram screenshots if needed.

Reply #1 – 2009-10-04 12:51:55

I don't know if that's normal. But if you recorded cassette tapes with home equipment, the frequency range probably doesn't even go above 16 kHz. So maybe it was just noise above 16 kHz that LAME cut off?

Reply #2 – 2009-10-04 12:56:43

It's the sfb21 problem.

Quote

The issue is simple: sfb21. Due to the compromised design of mp3 (search for sfb21 to read all about it), to maintain much information in the frequencies above 16kHz, you have to over-encode all lower frequencies, i.e. store them with greater accuracy than the psychoacoustic model believes is necessary [...] and hence the higher the bitrate.

So LAME can decide to discard frequencies above 16 kHz and spend more bits on more important frequencies.
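To put a number on "above 16 kHz": in MPEG-1 Layer III, a long-block granule holds 576 MDCT lines spanning 0 Hz to half the sample rate, grouped into 22 scalefactor bands, and only bands 0-20 get a scalefactor of their own. The sketch below assumes the standard 44.1 kHz long-block band table from the MPEG-1 audio spec and converts the edges of the last band (sfb21) to Hz; short blocks and other sample rates use different tables, which are not modelled here. The helper names are illustrative.

```python
# Long-block scalefactor band edges (in MDCT lines) for MPEG-1 Layer III at 44.1 kHz.
# Assumption: the standard 44.1 kHz table; other sample rates use other tables.
SFB_LONG_44K = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90,
                110, 134, 162, 196, 238, 288, 342, 418, 576]

LINES_PER_GRANULE = 576  # long-block MDCT lines covering 0 .. sample_rate / 2


def line_to_hz(line: int, sample_rate: int = 44100) -> float:
    """Nominal frequency at the lower edge of MDCT line `line`."""
    return line * (sample_rate / 2) / LINES_PER_GRANULE


def sfb_range_hz(sfb: int) -> tuple[float, float]:
    """Nominal (low, high) range in Hz of 44.1 kHz long-block scalefactor band `sfb`."""
    lo, hi = SFB_LONG_44K[sfb], SFB_LONG_44K[sfb + 1]
    return line_to_hz(lo), line_to_hz(hi)


lo, hi = sfb_range_hz(21)
print(f"sfb21 spans {lo:.1f} Hz to {hi:.1f} Hz")  # sfb21 spans 16001.6 Hz to 22050.0 Hz
```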

Reply #3 – 2009-10-04 13:30:55

Makes sense, thanks. There's no actual music above about 11kHz anyway, just noise.

Reply #4 – 2009-10-09 07:14:02

Quote from: lvqcl on 2009-10-04 12:56:43

It's the sfb21 problem.

Quote
The issue is simple: sfb21. Due to the compromised design of mp3 (search for sfb21 to read all about it), to maintain much information in the frequencies above 16kHz, you have to over-encode all lower frequencies, i.e. store them with greater accuracy than the psychoacoustic model believes is necessary [...] and hence the higher the bitrate.

So LAME can decide to discard frequencies above 16 kHz and spend more bits on more important frequencies.

Sorry to rake up a 5-day-old thread, but I suspect the mechanism cited is wrong.

The OP talked about using -V 0 and CBR 320kbps. The sfb21 problem is worked around at V0, and the extra bits are expended to encode as the psy-model deems necessary. Only at -V 3 and lower quality is the -Y switch used, which in any case doesn't discard high frequencies; it simply encodes them with coarser amplitude resolution. Only if they're below the lowest quantization level available at the time would they be completely eliminated, rather than rounded up or down in amplitude to the nearest available level.

What I think is really happening with this tape is that the psymodel is determining that the very small HF content is fully masked by the music. (ABX experiments over many years confirm that, in the presence of real music, a 16kHz lowpass is transparent for most people.) When the music stops, it's no longer masked. Also, possibly, the ATH is lowered in the absence of sound for a reasonable amount of time.

Reply #5 – 2009-10-09 10:34:53

Earlier versions (LAME 3.93, 3.92, 3.90, 3.88) do not cut frequencies at 16kHz. They keep the whole spectrum, up to 22.05 kHz.

Reply #6 – 2009-10-09 10:40:55

You can try -b320 -q0 -k -m s

Reply #7 – 2009-10-09 11:42:17

Why would you want to preserve frequencies you cannot hear anyway? If you want to preserve stuff you don't hear, you should consider a lossless medium instead.

Reply #8 – 2009-10-09 12:54:10

Quote from: UVSM on 2009-10-09 10:40:55

You can try -b320 -q0 -k -m s

I thought -k was disabled in 3.98.2?

And why would anyone choose -m s? This can only compromise quality.

Reply #9 – 2009-10-09 13:01:11

We've had this discussion countless times.

Use search.

In short:
1. Don't use custom parameters for MP3 encoding.
1b. And don't use ancient MP3 encoders.
2. Don't judge audio quality by looking at spectrograms.
3. Use lossless if in doubt.

Reply #10 – 2009-10-09 14:53:32

Quote from: odyssey on 2009-10-09 11:42:17

Why would you want to preserve frequencies you cannot hear anyway? If you want to preserve stuff you don't hear, you should consider a lossless medium instead.

I'm not complaining here. I'm not looking for a "workaround" either, just curious what's going on, that's all.

Reply #11 – 2009-10-10 07:00:03

What frontend was used to go from FLAC to MP3? What command line was used for LAME, and what version? My gutshot here is that if the transcode was downsampled to 32kHz, you'll just get a max of 16kHz.
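That guess rests on the Nyquist limit: PCM sampled at a rate fs can only represent frequencies up to fs/2. A trivial sketch of the arithmetic (the helper name is illustrative, and it says nothing about when LAME itself decides to resample, which isn't modelled here):

```python
def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency representable in PCM sampled at `sample_rate` Hz."""
    return sample_rate / 2


for fs in (32000, 44100, 48000):
    print(f"{fs} Hz sampling -> content up to {nyquist_hz(fs):.0f} Hz")
```

A 32 kHz transcode would cap everything at 16 kHz, tape hiss included, so hiss showing up above 16 kHz at the end of the track would rule this out.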

Reply #12 – 2009-10-10 11:32:17

Quote from: Rio on 2009-10-10 07:00:03

My gutshot here is that if the transcode was downsampled to 32kHz, you'll just get a max of 16kHz.

Quote from: PlazzTT on 2009-10-04 12:19:12

[...] there's a clear cut-off at 16kHz (except for at the very end of the track when there's no music, just tape hiss, there's no cut-off here).

Shot failed

The most probable reason is psychoacoustics (coupled with the fact that this is MP3). And the reason older releases of LAME did not do it is that this is in fact an improvement (getting more bits in other places).

Reply #13 – 2009-10-10 12:23:55

Quote from: PlazzTT on 2009-10-09 14:53:32

I'm not complaining here. I'm not looking for a "workaround" either, just curious what's going on, that's all.

It's good to be curious. You wonder why LAME throws out frequencies. The reasons are simple, as JAZ also mentioned: LAME is a LOSSY codec, meaning that it will try its best to preserve as much human-audible quality as possible for any given bitrate. Even 320kbit may not be sufficient for certain samples to be encoded without artifacts, so a lowpass that filters out frequencies you can't hear gives the encoder a better chance of encoding the audible content with fewer problems.

Anyone who wants to preserve inaudible frequencies (or sonics or other imaginable things) should use a lossless codec. The advantage of using a lossless codec is that you will have an archive of your music without ANY loss, and can create smaller encodings at any time, even when new codecs are released, without worrying about re-ripping CDs.

Reply #14 – 2009-10-11 04:08:06

It just also got me curious here. Why did a -V0 or 320 LAME encode filter frequencies above 16kHz? Even if LAME is a lossy codec, this should not be the case. A -V2 encode filters at 18.5kHz, -V0 at 20.5kHz, but this at 16kHz?!

Even if "the more important frequencies" are preserved, it is still mind-boggling. Was it really encoded with just "-V0 %s %d" or "-b 320 %s %d"?

Assuming that we don't know the source of the FLAC file (assuming it's not from cassette tape), LAME should still not be behaving this way.

"The most probable reason is psychoacoustics. (coupled with the fact that this is mp3). And the reason why older releases of LAME did not do it is because this is in fact an improvement (getting more bits in other places)."

Much as I strongly agree with [JAZ] regarding psychoacoustics and improvement from previous versions, it still really puzzles me why LAME in this case just got more bits below 16kHz instead of up to 20.5kHz. All of a sudden LAME recognized the audio file as coming from a cassette tape?!

Reply #15 – 2009-10-11 18:44:57

Quote from: Rio on 2009-10-11 04:08:06

Assuming that we don't know the source of the FLAC file (assuming it's not from cassette tape), LAME should still not be behaving this way.

...

Much as I strongly agree with [JAZ] regarding psychoacoustics and improvement from previous versions, it still really puzzles me why LAME in this case just got more bits below 16kHz instead of up to 20.5kHz. All of a sudden LAME recognized the audio file as coming from a cassette tape?!

This is a little lengthy, I'm afraid... We DO know that it's from a cassette tape source, so we can think about that and how LAME may behave in response to a typical digitized cassette signal when it isn't told that it's from tape or given any special command-line instructions other than -V0 or -b320.

Consider these three points then read on:

(a): Dolby NR may be used to dramatically lower the noise level below the inherent tape noise and to correct back to the mastering engineer's intended EQ (i.e. reverse the HF boost applied at manufacturing, which enables Dolby NR to be used without dulling the consumer's signal). When making a "correct" cassette transfer, Dolby "ought" to be turned on if it was on the original tape (as it would be for nearly any commercial release).

(b): Normal or Chrome cassette tapes (apart from expensive METAL tapes, perhaps) do not typically support signal content above about 15 or 16 kHz, thanks to the tape speed chosen and the size of the magnetic domains in those materials, and indeed many players (when set for normal or chrome) or their Dolby circuits may include a low-pass at this frequency to avoid passing additional noise (or bias tones) to people's speakers (especially sensitive tweeters).

(c): LAME uses both an ATH model (dynamically varying according to the average signal volume, if I recall correctly) and a masking model in its psychoacoustics. MP3 splits the audio into many sub-bands and encodes a scalefactor and a set of MDCT coefficients in each, and can choose the number of bits (hence the quantization noise) for each sub-band, that number of bits being available to each transform coefficient (effectively a coded frequency component) within the sub-band. The psychoacoustic model's aim is to ensure that, for each frequency, the quantization noise introduced by coarser coding remains below the masking threshold, and is thus inaudible. In any MP3 frame, some bands may contain no signal above the masking threshold (which may even be as low as the ATH, or absolute threshold of hearing) in that frequency range; these can be completely zeroed, which requires very few bits to encode.

I may be slightly oversimplifying the psychoacoustic model, but I don't think I've made any gross errors that should colour your understanding.
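A toy model of the band-zeroing decision described in (c): a band can be sent as all zeros when nothing in it rises above the larger of the masking threshold and the ATH. The `Band` class and all the dB figures below are hypothetical illustrations on an arbitrary dB scale, not LAME's actual data structures or thresholds.

```python
from dataclasses import dataclass


@dataclass
class Band:
    """Hypothetical per-band summary, used only for illustration."""
    name: str
    peak_db: float     # loudest component in the band
    masking_db: float  # masking threshold produced by other content
    ath_db: float      # absolute threshold of hearing for this band

    def allowed_noise_db(self) -> float:
        # Quantization noise is tolerated up to whichever threshold is higher.
        return max(self.masking_db, self.ath_db)

    def can_zero(self) -> bool:
        # If every component sits below the tolerated noise level, replacing the
        # whole band with zeros adds no error the model considers audible.
        return self.peak_db < self.allowed_noise_db()


# Hypothetical numbers: the same -70 dB hiss above 16 kHz, once under loud music
# (high masking threshold) and once in the silence at the end of the track.
during_music = Band("16 kHz+ hiss, music playing", peak_db=-70, masking_db=-55, ath_db=-80)
after_music = Band("16 kHz+ hiss, music stopped", peak_db=-70, masking_db=-95, ath_db=-85)

for band in (during_music, after_music):
    print(band.name, "-> zeroed" if band.can_zero() else "-> encoded")
```

With these made-up figures the hiss band is zeroed while the music plays and kept once it stops, which is the pattern the OP saw in the spectrogram.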

So the overall system we're looking at is something like the following, and will quite probably feature all three aspects above, or at least (b) and (c) if Dolby NR is turned off in the consumer cassette player.

main studio master -> cassette master (Dolby boost & LPF probably applied) -> consumer cassette tape
-> consumer cassette player -> Dolby NR circuit (reverses Dolby boost thereby removing much hiss introduced by the tape material)
-> soundcard ADC to create PCM -> PCM digital audio (encoded losslessly as FLAC)
-> LAME MP3 encoder -> MP3 file -> PCM lossy audio being played back or analyzed via spectrogram.

We start with a full audio spectrum signal on the studio master with very low noise.

By the end of the first line (consumer cassette tape) we have a reduced signal bandwidth of about 15-16 kHz at most, I'd suggest. In addition to the desired signal we have noise arising from the magnetic materials of the cassette tape.

By the end of the second line (the Dolby NR circuit output), we have the same 15-16 kHz of signal bandwidth plus some tape noise reduced by the Dolby circuit, plus some broadband noise at a low level from the electrical circuits used.

Essentially the same signal bandwidth, noise bandwidth and noise amplitude will be present in the FLAC file's audio at the end of the third line.

On the fourth line, the LAME encoder will only filter at about 19 or 20kHz, thus preserving the 15-16 kHz signal bandwidth and most of the noise bandwidth before analysis. The resulting MP3 file represents the audio in time frames, within which LAME stores what amount to frequency coefficients, quantizing them as coarsely as it believes will be inaudible. The coarseness to which it rounds each coefficient is able to vary across the frequency spectrum by virtue of using sub-bands, each of which can have its own overall gain and bit-depth (which determine the maximum amplitude and the relative precision, i.e. the quantization noise). Imagine a coefficient with an amplitude of 7, while LAME decides that rounding to the nearest multiple of 32 is acceptable given the calculated minimum masking level or absolute threshold of hearing in that sub-band. That will round the coefficient down to zero every time. It will still do so if the allowed coarseness is 16, but might round up to 8 if the allowed coarseness of quantization is 8, for example.
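The rounding example above, as a sketch. It uses a plain uniform quantizer, which is a simplification (Layer III actually applies a 3/4-power compression to magnitudes before rounding), but the zeroing effect is the same: once the step size is large enough, a small coefficient simply disappears. The `quantize` helper is illustrative.

```python
def quantize(value: float, step: float) -> float:
    """Round `value` to the nearest multiple of `step` (uniform quantizer).

    Python's round() sends exact .5 ties to the even neighbour.
    """
    return float(round(value / step) * step)


# A coefficient of amplitude 7 under three allowed step sizes.
for step in (32, 16, 8):
    print(f"step {step:>2}: 7 -> {quantize(7, step)}")
# step 32: 7 -> 0.0
# step 16: 7 -> 0.0
# step  8: 7 -> 8.0
```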

Looking at spectrogram views of my own tape transfers in years gone by (from a rather good Walkman WM36 with Dolby B), usually in CoolEdit96 or 2000 (now sold as Adobe Audition), I've noticed some content above 17kHz. In reality, though, it's in very dark colours, meaning it's probably 60-80 dB below full scale, with a subtle tone at around 19 kHz if I recall correctly. I don't recall anything like snares, cymbal hits or similar transients spiking above 16kHz at all when I was careful to avoid digital clipping, which is indicative of low-pass filtering of the intended signal. Most of the hiss I can hear on those Dolby B tape transfers in quiet inter-track sections is surely in the 10-16 kHz range or thereabouts (and probably lower too), and I hear no hiss while reasonably loud music is playing.

If indeed the noise left after Dolby B NR is something like 60-80 dB below full scale, I'd expect any music averaging perhaps -20 dBFS to -14 dBFS to provide enough masking to make all the remaining noise inaudible except in very quiet passages, fade-outs and intentional silences (lead-in/out and inter-track). That's my experience of listening quite carefully to discern noise, including on one lightly instrumented duet ballad, mastered intentionally quietly and placed next to a full-on psychedelic rock cover version, which I tried to denoise sensitively and listened to closely over many repeats.
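The dB arithmetic behind that expectation, using the post's own rough figures (music averaging -20 to -14 dBFS, residual noise at -80 to -60 dBFS). The helper names are illustrative; the 16-bit figure is the usual 20*log10(2**16) estimate of the ratio between full scale and one quantization step.

```python
import math


def db_to_amplitude(db: float) -> float:
    """Linear amplitude relative to full scale for a level given in dBFS."""
    return 10 ** (db / 20)


def pcm_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step for `bits`-bit PCM, in dB."""
    return 20 * math.log10(2 ** bits)


music_dbfs = (-20, -14)  # rough average music level quoted in the post
noise_dbfs = (-80, -60)  # rough residual noise level after Dolby B, from the post

min_margin = music_dbfs[0] - noise_dbfs[1]  # quietest music vs loudest noise
max_margin = music_dbfs[1] - noise_dbfs[0]  # loudest music vs quietest noise
print(f"music sits {min_margin} to {max_margin} dB above the noise")  # 40 to 66 dB
print(f"-60 dBFS is {db_to_amplitude(-60):.4f} of full scale")      # 0.0010
print(f"16-bit PCM spans about {pcm_range_db(16):.1f} dB")          # 96.3
```

Even the worst case leaves the music 40 dB above the residual noise, while that noise is still well within what 16-bit PCM can capture.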

I'd have thought LAME's excellent psy-model would likely draw roughly the same conclusion about where the noise is audible, and may indeed mark the 16kHz+ sub-bands as containing no audible content (all hiss, since most cassettes have no signal content above 16kHz, only tape noise) during even the quiet passages as well as the loud ones. Only where the intended signal (that on the studio master) is virtually or completely silent, and noise is all that's left, is there likely to be too little low- and mid-frequency content for LAME to be sure that the 16kHz+ sub-band contains nothing above the masking level or ATH; in that case it will encode the band at non-zero amplitude. This may be partly due to (by my recollection) a dynamically varying ATH model, which assumes you might be playing the audio with more amplification during extended periods of low signal level (as some people will, when trying to discern words or notes in a deep fade-out, or when using automatic gain control or a similar DSP like foo_dsp_vlevel with suitably aggressive settings). It may also be due simply to the roughly white noise spectrum, in which high frequencies are as prominent as lower ones and so are not entirely eliminated by frequency-masking effects.

In summary, cassette tape with decent NR is a special case by virtue of both the low noise (but still detectable in 16-bit 44.1kHz PCM) and the restricted signal bandwidth (about 15-16 kHz, say). This may in many cases cause LAME to exhibit just the effects described while behaving exactly as it should in considering human hearing and masking effects without having applied a filter itself. The boundary of the nearest subband to the original signal's low-pass filter may cause the noise to vanish totally and look like a sharp and continuous digital low-pass filter applied by LAME. This apparent effect may then vanish when the genuine signal falls to zero (or to about the same as the noise floor at any rate) and reveal at least some low-level noise content above 16 kHz (however coarsely quantized the 16 kHz+ band might be).

It may be, in fact, that loud transients in the signal (hi-hat, cymbal, snare drum, clicking noises) will show up on the spectrogram as bright vertical lines that actually stop short of the apparent 16kHz cut-off introduced by LAME, thereby indicating that the true musical signal was indeed lowpass-filtered at slightly less than 16 kHz, but within the MP3 sub-band that ends at 16 kHz. A close look at a spectrogram of the original FLACs might reveal this more clearly.
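One rough way to check this on the source, rather than eyeballing a spectrogram, is to measure what fraction of each short frame's energy lies above 16 kHz. The sketch below assumes the FLAC has already been decoded to a list of float samples (decoding isn't shown) and uses a naive DFT with a rectangular window for clarity rather than speed; `band_energy_fraction` is a hypothetical helper, not part of any tool mentioned in the thread. A real analysis would use a windowed FFT over many frames.

```python
import cmath
import math


def band_energy_fraction(frame: list[float], sample_rate: int, cutoff_hz: float) -> float:
    """Fraction of the frame's one-sided spectral energy at or above `cutoff_hz`.

    Naive O(N^2) DFT over bins 0..N/2 with a rectangular window: fine for a
    few hundred samples, far too slow for whole tracks.
    """
    n = len(frame)
    total = above = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= cutoff_hz:
            above += power
    return above / total if total > 0 else 0.0


# Synthetic test frame: a strong tone near 1 kHz plus a weak one near 19 kHz,
# both placed exactly on DFT bins so there is no spectral leakage.
fs, n = 44100, 256
frame = [math.sin(2 * math.pi * 6 * t / n) + 0.1 * math.sin(2 * math.pi * 110 * t / n)
         for t in range(n)]
print(f"{band_energy_fraction(frame, fs, 16000):.4f}")  # about 0.0099
```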

Reply #16 – 2009-10-11 19:08:26

Quote from: Rio on 2009-10-11 04:08:06

It just also got me curious here. Why did a -V0 or 320 LAME encode filter frequencies above 16kHz? Even if LAME is a lossy codec, this should not be the case. A -V2 encode filters at 18.5kHz, -V0 at 20.5, but this at 16kHz?!

Can you ABX a difference? If you can then I think your notion of "should" might carry weight; otherwise...

Reply #17 – 2009-10-12 00:41:58

@Dynamic: Thank you for the detailed explanation on cassette tape to MP3. It has become clearer to me when it comes to cassette tape content.

@greynol: I'd ABX if I had said "20.5kHz sounds better than 16kHz". Otherwise...

Reply #18 – 2009-10-12 20:24:22

Your evasion noted; your baseless notion of "should" ignored.

Reply #19 – 2009-10-13 01:23:50

When talking about behavior (both the encoder and your personal behavior), there are expectations, therefore the "should" qualifier.

LAME at CBR 320 is expected to filter at 20.5kHz, therefore it "should" filter at 20.5kHz. If it does not, one would think there is something wrong; that's why the OP asked the question, and that's why I also asked it.

JAZ and Dynamic were kind and patient enough to shed light on things, while you try to burden me by throwing ABX at me.

If I make a quality claim, you can expect that I "should" ABX.

I did not make any quality claims in this thread, ergo you should not expect that I "should" ABX.

Therefore, I did not evade.

Reply #20 – 2009-10-13 09:45:34

Quote from: Rio on 2009-10-13 01:23:50

LAME at CBR 320 is expected to filter at 20.5kHz

Expected?

Reply #21 – 2009-10-13 09:50:33

Yeah, why would you ever expect a lossy encoder to encode inaudible frequencies? Anyone who expects such a thing has miserably misunderstood the concept of lossy encoders.

Reply #22 – 2009-10-13 11:45:49

Check under the hood while LAME is running... I think it's called the polyphase filter...

Reply #23 – 2009-10-13 12:55:19

The polyphase filter is just the first step in the process. Frequencies that pass the polyphase filter are still not guaranteed to be included.
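For scale: the MPEG-1 polyphase filterbank splits the signal into 32 equal-width subbands, and Layer III then splits each subband into 18 MDCT lines for long blocks (32 x 18 = 576). The sketch below uses nominal, non-overlapping band edges (the real filters overlap somewhat) to show where 16 kHz falls at 44.1 kHz; the helper names are illustrative.

```python
SUBBANDS = 32                # polyphase filterbank outputs in MPEG-1 audio
MDCT_LINES_PER_SUBBAND = 18  # Layer III long blocks: 32 * 18 = 576 lines


def subband_width_hz(sample_rate: int) -> float:
    """Nominal width of each polyphase subband."""
    return (sample_rate / 2) / SUBBANDS


def subband_index(freq_hz: float, sample_rate: int = 44100) -> int:
    """0-based polyphase subband whose nominal range contains `freq_hz`."""
    return int(freq_hz // subband_width_hz(sample_rate))


print(f"{subband_width_hz(44100):.2f} Hz per subband")  # 689.06 Hz per subband
print("16 kHz falls in subband", subband_index(16000))  # 16 kHz falls in subband 23
```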

Reply #24 – 2009-10-13 14:38:34

Thanks for the very interesting answers. I'll upload a small FLAC clip later so you can see what I mean.

(Again, I'm not complaining. I can't ABX the LAME encode even at V2, so it's not a problem; some people seem to be interested in the sample, though.)
