LAME applying 16kHz cutoff at 320K CBR
Reply #15 – 2009-10-11 18:44:57
Assuming that we don't know the source of the FLAC file (assuming it's not from cassette tape), LAME should still not be behaving this way. ... Much as I strongly agree with [JAZ] regarding psychoacoustics and improvement from previous versions, it still really puzzles me why LAME in this case just got more bits below 16kHz instead of up to 20.5kHz. All of a sudden LAME recognized the audio file as coming from a cassette tape?!
This is a little lengthy, I'm afraid...
We DO know that it's from a cassette tape source, so we can think about that and about how LAME may behave in response to a typical digitized cassette signal when it isn't told that it's from tape or given any special command-line instructions other than -V0 or -b320. Consider these three points, then read on:
(a): Dolby NR may be used to dramatically lower the noise level below the inherent tape noise and to restore the EQ intended by the mastering engineer (i.e. reverse the HF boost applied at manufacturing, which enables Dolby NR to be used without dulling the consumer's signal). When making a "correct" cassette transfer, Dolby "ought" to be turned on if it was used on the original tape (as it would be for nearly any commercial release).
(b): Normal or Chrome cassette tapes (apart from expensive METAL tapes, perhaps) do not typically support signal content above about 15 or 16 kHz, thanks to the tape speed chosen and the size of the magnetic domains in those materials. Indeed, many players set for normal or chrome, or their Dolby circuits, may include a low-pass at this frequency to avoid passing additional noise (or bias tones) to people's speakers (esp. sensitive tweeters).
(c): LAME uses both an ATH model (dynamically varying according to the average signal volume, if I recall correctly) and a masking model in its psychoacoustics.
MP3 splits the audio into many sub-bands and encodes a scalefactor and a set of MDCT coefficients in each. It can choose the number of bits (hence the quantization noise) for each sub-band, that number of bits being available to each transform coefficient (effectively a coded frequency component) within the sub-band. The psychoacoustic model's aim is to ensure that, at each frequency, the quantization noise introduced by coding more coarsely remains below the masking threshold and is thus inaudible. For any MP3 frame it may be that some bands contain no signal above the masking threshold (which may even be as low as the ATH, or absolute threshold of hearing, in that frequency range); those bands can be completely zeroed, which requires very few bits to encode. I may be slightly oversimplifying the psychoacoustic model, but I don't think I've made any gross errors that should colour your understanding.
So the overall system we're looking at is something like the following, and will quite probably feature all three aspects above, or at least (b) and (c) if Dolby NR is turned off in the consumer cassette player:
main studio master -> cassette master (Dolby boost & LPF probably applied) -> consumer cassette tape
-> consumer cassette player -> Dolby NR circuit (reverses Dolby boost, thereby removing much hiss introduced by the tape material)
-> soundcard ADC to create PCM -> PCM digital audio (encoded losslessly as FLAC)
-> LAME MP3 encoder -> MP3 file -> PCM lossy audio being played back or analyzed via spectrogram.
We start with a full audio spectrum signal on the studio master with very low noise. By the end of the first line (consumer cassette tape) we have a signal bandwidth reduced to about 15-16 kHz at most, I'd suggest. In addition to the desired signal we have noise arising from the magnetic materials of the cassette tape.
By the end of the second line (Dolby NR circuit output), we have the same 15-16 kHz of signal bandwidth, plus some tape noise reduced by the Dolby circuit, plus some broadband noise at a low level from the electrical circuits used. Essentially the same signal bandwidth, noise bandwidth and noise amplitude will be present in the FLAC file's audio at the end of the third line. On the fourth line, the LAME encoder will only filter at about 19 or 20 kHz, thus preserving the 15-16 kHz signal bandwidth and most of the noise bandwidth before analysis.
The resulting MP3 file represents the audio in time frames within which LAME stores what amount to frequency coefficients, quantizing as coarsely as it believes will be inaudible. The coarseness to which it rounds each coefficient can vary across the frequency spectrum by virtue of using sub-bands, each of which can have its own overall gain and bit-depth (which together determine the maximum amplitude and the relative precision, the rounding error being the quantization noise). Imagine a coefficient with an amplitude of 7, while LAME decides that rounding to the nearest multiple of 32 is acceptable given the calculated minimum masking level or absolute threshold of hearing in that sub-band. That will round the coefficient down to zero every time. It will still do so if the allowed coarseness is 16, but it will round up to 8 if the allowed coarseness of quantization is 8, for example.
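To make the rounding example concrete, here is a minimal sketch of uniform rounding to the nearest multiple of a step size. The function name `quantize` is just my own label, and real MP3 quantization is non-uniform (a power law is applied before rounding), so treat this purely as an illustration of the "amplitude 7" example, not of LAME's actual code.

```python
def quantize(coefficient, step):
    """Round a coefficient to the nearest multiple of step (uniform rounding).

    Illustration only: MP3 itself quantizes non-uniformly.
    """
    return step * round(coefficient / step)


# The coefficient of amplitude 7 from the example, at three allowed coarseness levels.
results = {step: quantize(7, step) for step in (32, 16, 8)}
print(results)  # {32: 0, 16: 0, 8: 8}
```

At steps of 32 and 16 the coefficient vanishes entirely, which is how a band holding only low-level hiss can end up coded as silence.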
Looking at spectrogram views of my own tape transfers (from a rather good Walkman WM36 with Dolby B) in years gone by, usually in CoolEdit96 or 2000 (now sold as Adobe Audition), I've noticed some content above 17 kHz. In reality, though, it's in very dark colours, meaning it's probably 60-80 dB below full scale, with a subtle tone at around 19 kHz if I recall correctly. I don't recall anything like snares, cymbal hits or similar transients spiking above 16 kHz at all when I was careful to avoid digital clipping, which is indicative of low-pass filtering of the intended signal.
Most of the hiss I can hear on those Dolby B tape transfers in quiet inter-track sections is surely in the 10-16 kHz range or thereabouts (and probably lower too), and I hear no hiss while reasonably loud music is playing. If indeed the noise left after Dolby B NR is something like 60-80 dB below full scale, I'd expect any music averaging perhaps -20 dBFS to -14 dBFS to provide enough masking to make all the remaining noise inaudible except in very quiet passages, fade-outs and intentional silences (lead-in/out and inter-track).
That's my experience of listening quite carefully to discern noise, including one lightly instrumented duet ballad track mastered intentionally quietly next to a full-on psychedelic rock cover version, which I tried to denoise sensitively and listened to closely over many repeats.
I'd have thought LAME's excellent psy-model would likely draw roughly the same conclusion about where the noise is audible, and may indeed mark the 16 kHz+ sub-bands as containing no audible content (all hiss, since most cassettes have no signal content above 16 kHz, only tape noise) during even the quiet passages as well as the loud ones.
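The level figures quoted above (music averaging -20 to -14 dBFS, residual noise at roughly -60 to -80 dBFS) can be turned into a simple level gap. This is only arithmetic on those rough figures, with variable names of my own choosing. A level gap is not a masking calculation in itself, since real masking also depends on how close in frequency the masker and the noise are.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


# Rough figures from the post, in dB relative to full scale (dBFS).
MUSIC_DBFS = (-20, -14)
NOISE_DBFS = (-80, -60)

# Worst case: quietest music against loudest noise. Best case: the opposite.
smallest_gap_db = min(MUSIC_DBFS) - max(NOISE_DBFS)
largest_gap_db = max(MUSIC_DBFS) - min(NOISE_DBFS)

for gap in (smallest_gap_db, largest_gap_db):
    print(f"{gap} dB gap = amplitude ratio of about {db_to_amplitude_ratio(gap):.0f}:1")
```

Even in the worst case the music sits 40 dB (100 times in amplitude) above the hiss, which fits with hiss only being audible in fades and silences.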
Only where the intended signal (that on the studio master) is virtually or completely silent, and noise is all that's left, is it likely that there's not enough low- and mid-frequency content for LAME to be sure that the 16 kHz+ sub-band contains nothing above the masking level or ATH level, so only then will it encode that band at non-zero amplitude. This may be partly due to using (by my recollection) a dynamically varying ATH model, which assumes you might be playing the audio with more amplification during extended periods of low signal level (which some people will, when trying to discern words or notes in a deep fade-out, or when using an automatic gain control or a similar DSP like foo_dsp_vlevel with suitably aggressive settings). It may also be due simply to the relatively white noise spectrum, in which high frequencies are as prominent as lower frequencies and are not eliminated entirely by frequency masking effects.
In summary, cassette tape with decent NR is a special case by virtue of both the low noise (but still detectable in 16-bit 44.1 kHz PCM) and the restricted signal bandwidth (about 15-16 kHz, say). This may in many cases cause LAME to exhibit just the effects described while behaving exactly as it should in considering human hearing and masking effects, without having applied a filter itself. The boundary of the sub-band nearest to the original signal's low-pass filter may cause the noise to vanish totally and look like a sharp and continuous digital low-pass filter applied by LAME. This apparent effect may then vanish when the genuine signal falls to zero (or to about the same level as the noise floor, at any rate) and reveal at least some low-level noise content above 16 kHz (however coarsely quantized the 16 kHz+ band might be).
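As a rough check on where such a sub-band boundary falls, the sketch below converts scalefactor band edges to Hz. The index table is the one I recall for 44.1 kHz long blocks in MPEG-1 Layer III (576 frequency lines per granule); I'm quoting it from memory, so please verify it against the standard before relying on it. The names `SFB_LONG_44100`, `line_to_hz` and `band_edges_hz` are my own.

```python
SAMPLE_RATE = 44100
MDCT_LINES = 576  # frequency lines per granule for long blocks

# Scalefactor band start indices for 44.1 kHz long blocks, quoted from memory
# (assumption: check against the MPEG-1 Layer III tables).
SFB_LONG_44100 = [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90,
                  110, 134, 162, 196, 238, 288, 342, 418, 576]


def line_to_hz(line):
    """Frequency in Hz at the lower edge of a given MDCT line."""
    return line * (SAMPLE_RATE / 2) / MDCT_LINES


def band_edges_hz():
    """Return (low, high) edges in Hz for each band in the table."""
    return [(line_to_hz(lo), line_to_hz(hi))
            for lo, hi in zip(SFB_LONG_44100, SFB_LONG_44100[1:])]


edges = band_edges_hz()
for low, high in edges[-2:]:
    print(f"{low:.1f} Hz to {high:.1f} Hz")
```

With that table, the second-to-last band ends and the last one begins at about 16.0 kHz, so a band that gets zeroed from there upwards would look just like a 16 kHz low-pass on a spectrogram.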
It may be, in fact, that loud transients in the signal (hi-hat, cymbal, snare drum, clicking noises) will show up on the spectrogram as bright vertical lines that actually stop short of the apparent 16kHz cut-off introduced by LAME, thereby indicating that the true musical signal was indeed lowpass-filtered at slightly less than 16 kHz, but within the MP3 sub-band that ends at 16 kHz. A close look at a spectrogram of the original FLACs might reveal this more clearly.
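For anyone wanting to check this on the original FLACs, here is a minimal sketch of the measurement: take one short frame around a transient, get the level of each frequency bin, and find the highest bin above some threshold. Decoding the FLAC is not shown; a synthetic frame stands in for it, with tones at levels I made up to resemble a tape transfer (content up to 15 kHz plus a faint 19 kHz tone). The function names and thresholds are hypothetical, and the plain DFT is slow but needs nothing beyond the standard library.

```python
import cmath
import math

SAMPLE_RATE = 44100
FRAME_LEN = 441  # 10 ms frame, so DFT bins are exactly 100 Hz apart


def dft_levels_db(frame):
    """Level of each bin up to Nyquist, in dB relative to a full-scale sine."""
    n = len(frame)
    levels = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        amplitude = 2 * abs(acc) / n
        levels.append(20 * math.log10(max(amplitude, 1e-12)))
    return levels


def highest_active_hz(levels_db, bin_hz, threshold_db):
    """Highest frequency whose level exceeds threshold_db, or None."""
    active = [k for k, level in enumerate(levels_db) if level > threshold_db]
    return active[-1] * bin_hz if active else None


# Synthetic stand-in for a frame from a tape transfer: (frequency in Hz, level in dBFS).
TONES = [(1000, -12), (8000, -20), (15000, -30), (19000, -75)]
frame = [sum(10 ** (db / 20) * math.sin(2 * math.pi * hz * i / SAMPLE_RATE)
             for hz, db in TONES)
         for i in range(FRAME_LEN)]

levels = dft_levels_db(frame)
bin_hz = SAMPLE_RATE / FRAME_LEN
print(highest_active_hz(levels, bin_hz, -50))  # musical content only
print(highest_active_hz(levels, bin_hz, -90))  # including very faint content
```

With a threshold of -50 dB only the "music" counts and the answer stops at 15 kHz; lowering it to -90 dB picks up the faint 19 kHz tone, much like the dark colours high up in a spectrogram.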