Full Version: LAME applying 16kHz cutoff at 320K CBR
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - Tech
PlazzTT
I have a recording from cassette tape in FLAC. From the spectral I can see it has frequencies up to 21kHz, but when I encode it to MP3 (V0 or 320K) with LAME, there's a clear cut-off at 16kHz (except for at the very end of the track when there's no music, just tape hiss, there's no cut-off here).

I assume this is normal behaviour, I'd like to know why this happens.

I can post a clip / spectral screenshots if needed.
Big_Berny
I don't know if that's normal. But if you recorded cassette tapes with home equipment the frequency range probably doesn't even go over 16 kHz. So maybe it was just noise over 16 kHz which LAME cut-off?
lvqcl
It's the sfb21 problem.

QUOTE (2Bdecided)
The issue is simple: sfb21. Due to the compromised design of mp3 (search for sfb21 to read all about it), to maintain much information in the frequencies above 16kHz, you have to over encode all lower frequencies - i.e. store them with greater accuracy than the psychoacoustic model believes is necessary [...] and hence the higher the bitrate.


So LAME can decide to discard frequencies above 16 kHz and spend more bits on more important frequencies.
PlazzTT
Makes sense, thanks. There's no actual music above about 11kHz anyway, just noise.
Dynamic
QUOTE (lvqcl @ Oct 4 2009, 12:56) *
It's the sfb21 problem.

QUOTE (2Bdecided)
The issue is simple: sfb21. Due to the compromised design of mp3 (search for sfb21 to read all about it), to maintain much information in the frequencies above 16kHz, you have to over encode all lower frequencies - i.e. store them with greater accuracy than the psychoacoustic model believes is necessary [...] and hence the higher the bitrate.


So LAME can decide to discard frequencies above 16 kHz and spend more bits on more important frequencies.


Sorry to rake up a 5-day-old thread, but I suspect the mechanism cited is wrong.

The OP talked about using -V 0 and CBR 320kbps. The sfb21 problem is worked around at V0 and the extra bits are expended to encode as the psy-model deems necessary. Only at -V 3 and lower quality is the -Y switch used, which in any case doesn't discard high frequencies; it simply encodes them with coarser amplitude resolution. Only if they're below the lowest quantization level available at the time would they then be completely eliminated rather than rounded up or rounded down in amplitude to the nearest available level.

What I think is really happening with this tape is that the psymodel is determining that the very small HF content is fully masked by the music. (ABX experiments over many years confirm that in the presence of real music, a 16kHz lowpass is transparent for most people.) When the music stops, it's no longer masked. Also, possibly, the ATH thresholds are lowered in the absence of sound for a reasonable amount of time.
UVSM
Earlier LAME versions (3.93, 3.92, 3.90, 3.88) do not cut frequencies at 16 kHz. They keep the whole spectrum, up to 22.05 kHz.
UVSM
You can try -b320 -q0 -k -m s
odyssey
Why would you want to preserve frequencies you cannot hear anyway? If you want to preserve stuff you don't hear, you should consider a lossless medium instead.
pdq
QUOTE (UVSM @ Oct 9 2009, 05:40) *
You can try -b320 -q0 -k -m s

I thought -k was disabled in 3.98.2?

And why would anyone choose -m s? This can only compromise quality.
odyssey
We've had this discussion countless times.

Use search.

In short:
1. Don't use parameters for mp3-encodings.
1b. And don't use ancient mp3-encoders.
2. Don't judge audio-quality by looking at spectrograms
3. Use lossless if in doubt
PlazzTT
QUOTE (odyssey @ Oct 9 2009, 11:42) *
Why would you want to preserve frequencies you cannot hear anyway? If you want to preserve stuff you don't hear, you should consider a lossless medium instead.


I'm not complaining here. I'm not looking for a "workaround" either, just curious what's going on, that's all :)
Rio
What frontend was used to go from FLAC to MP3? What command line was used for LAME, and which version? My gut feeling here is that if the transcode was downsampled to 32kHz then you'll just get a max of 16kHz.
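The arithmetic behind that guess is just the Nyquist limit: a sampled signal can only carry content up to half its sample rate. A minimal sketch, with a function name made up purely for illustration:

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2.0


# Common sample rates and the top of their representable spectrum.
for rate in (32000, 44100, 48000):
    print(f"{rate} Hz sampling -> content up to {nyquist_hz(rate):.0f} Hz")
```

So a 32 kHz output would top out at 16 kHz for the whole file, whereas 44.1 kHz allows up to 22.05 kHz.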
[JAZ]
QUOTE (Rio @ Oct 10 2009, 08:00) *
my gutshot here is if the transcode was downsampled to 32kHz then you'll just get a max of 16kHz.


QUOTE (PlazzTT @ Oct 4 2009, 13:19) *
[...] there's a clear cut-off at 16kHz (except for at the very end of the track when there's no music, just tape hiss, there's no cut-off here).



Shot failed ;)

The most probable reason is psychoacoustics. (coupled with the fact that this is mp3). And the reason why older releases of LAME did not do it is because this is in fact an improvement (getting more bits in other places).

odyssey
QUOTE (PlazzTT @ Oct 9 2009, 15:53) *
I'm not complaining here. I'm not looking for a "workaround" either, just curious what's going on, that's all :)

It's good to be curious. You wonder why LAME throws out frequencies. The reason is simple, as JAZ also mentioned: LAME is a LOSSY codec, meaning that it will try its best to preserve as much human-audible quality as possible for any given bitrate. Even 320kbit may not be sufficient for certain samples to be encoded without artifacts, so lowpassing away frequencies you can't hear is a very good way to give the encoder a better chance of encoding the audible content with fewer problems.

Anyone who wants to preserve inaudible frequencies (or sonics or other imaginable things) should use a lossless codec. The advantage of using a lossless codec is that you will have an archive of your music without ANY loss, and can at any time create smaller encodings, even when new codecs are released, without worrying about re-ripping CDs.
Rio
It just also got me curious here. Why did a -V0 or 320 LAME encode filter frequencies above 16kHz? Even if LAME is a lossy codec, this should not be the case. A -V2 encode filters at 18.5kHz, -V0 at 20.5, but this at 16kHz?!

Even if "the more important frequencies" are preserved, it is still mind boggling. Was it really encoded with just "-V0 %s %d" or "-b 320 %s %d"?

Assuming that we don't know the source of the FLAC file (assuming it's not from cassette tape), LAME should still not be behaving this way.

"The most probable reason is psychoacoustics. (coupled with the fact that this is mp3). And the reason why older releases of LAME did not do it is because this is in fact an improvement (getting more bits in other places)."

Much as I strongly agree with [JAZ] regarding psychoacoustics and improvement from previous versions, it still really puzzles me why LAME in this case just got more bits below 16kHz instead of up to 20.5kHz. All of a sudden LAME recognized the audio file as coming from a cassette tape?!
Dynamic
QUOTE (Rio @ Oct 11 2009, 04:08) *
Assuming that we don't know the source of the FLAC file (assuming it's not from cassette tape), LAME should still not be behaving this way.

...

Much as I strongly agree with [JAZ] regarding psychoacoustics and improvement from previous versions, it still really puzzles me why LAME in this case just got more bits below 16kHz instead of up to 20.5kHz. All of a sudden LAME recognized the audio file as coming from a cassette tape?!


This is a little lengthy, I'm afraid... We DO know that it's from a cassette tape source, so we can think about that and how LAME may behave in response to a typical digitized cassette signal when it isn't told that it's from tape or given any special commandline instructions other than -V0 or -b320.

Consider these three points then read on:

(a): Dolby NR may be used to dramatically lower noise level below inherent tape noise and to correct back to the intended EQ of the mastering engineer (i.e. reverse the HF boost applied at manufacturing which enables Dolby NR to be used without dulling the consumer's signal). When making a "correct" cassette transfer Dolby "ought" to be turned on if it was on the original tape (as it would be for nearly any commercial release).

(b): Normal or Chrome Cassette tapes (apart from expensive METAL tapes, perhaps) do not typically support signal content to any more than about 15 or 16 kHz thanks to the tape speed chosen and the size of magnetic domains in those materials, and indeed many players set for normal or chrome or their Dolby circuits may include a low-pass at this frequency to avoid passing additional noise (or bias tones) to people's speakers (esp sensitive tweeters).

(c): LAME uses both an ATH model (dynamically varying according to the average signal volume, if I recall correctly) and a masking model in its psychoacoustics. MP3 splits the audio into many sub-bands and encodes a scalefactor and a set of MDCT coefficients in each, and can choose the number of bits (hence quantization noise) for each sub-band, that number of bits being available to each transform coefficient (effectively a coded frequency component) within the subband. The psychoacoustic model's aim is to ensure that for each frequency, the quantization noise introduced by coding more coarsely remains below the masking threshold, and thus inaudible. For any MP3 frame it may be that some bands contain no signal above the masking threshold (which may even be as low as the ATH or absolute threshold of hearing) in that frequency range and can be completely zeroed, which requires very few bits to encode.

I may be slightly oversimplifying the psychoacoustic model, but I don't think I've made any gross errors that should colour your understanding.
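The "zero a band when nothing in it rises above the threshold" idea can be sketched as below. This is only a toy model under stated assumptions: the band levels and thresholds are numbers invented for illustration, the function name is hypothetical, and LAME's real psychoacoustic model is far more involved.

```python
def bands_to_zero(band_levels_db, thresholds_db):
    """Return the indices of bands whose level lies below the audibility threshold.

    Both arguments are per-band values in dB relative to full scale (illustrative only).
    """
    return [
        index
        for index, (level, threshold) in enumerate(zip(band_levels_db, thresholds_db))
        if level < threshold
    ]


# Hypothetical frame with music playing: the last band holds only faint hiss,
# and masking from the music raises the thresholds.
music_levels = [-20.0, -25.0, -40.0, -75.0]
music_thresholds = [-50.0, -50.0, -55.0, -60.0]
print(bands_to_zero(music_levels, music_thresholds))   # [3]

# Hypothetical frame during silence: nothing masks the hiss, so the thresholds
# fall back towards much lower (made-up) levels and no band is dropped.
quiet_levels = [-80.0, -80.0, -78.0, -75.0]
quiet_thresholds = [-95.0, -95.0, -92.0, -85.0]
print(bands_to_zero(quiet_levels, quiet_thresholds))   # []
```

With these made-up figures the top band is dropped only while the music is loud enough to mask it, which is the pattern described in the first post.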

So the overall system we're looking at is something like the following, and will quite probably feature all three aspects above, or at least (b) and (c) if Dolby NR is turned off in the consumer cassette player.

main studio master -> cassette master (Dolby boost & LPF probably applied) -> consumer cassette tape
-> consumer cassette player -> Dolby NR circuit (reverses Dolby boost thereby removing much hiss introduced by the tape material)
-> soundcard ADC to create PCM -> PCM digital audio (encoded as losslessly as FLAC)
-> LAME MP3 encoder -> MP3 file -> PCM lossy audio being played back or analyzed via spectrogram.

We start with a full audio spectrum signal on the studio master with very low noise.

By the end of the first line (consumer cassette tape) we have a reduced signal bandwidth of about 15-16 kHz at most, I'd suggest. In addition to the desired signal we have noise arising from the magnetic materials of the cassette tape.

By the end of the second line Dolby NR circuit output, we have the same 15-16 kHz of signal bandwidth plus some tape noise reduced by the Dolby circuit, plus some broad band noise at a low level from the electrical circuits used.

Essentially the same signal bandwidth, noise bandwidth and noise amplitude will be present in the FLAC file's audio at the end of the third line.

On the fourth line, the LAME encoder will only filter at about 19 or 20kHz, thus preserving the 15-16 kHz signal bandwidth and most of the noise bandwidth before analysis. The resulting MP3 file represents the audio in time frames within which LAME stores what amount to frequency coefficients, quantizing as coarsely as it believes will be inaudible. The coarseness to which it rounds each coefficient is able to vary across the frequency spectrum by virtue of using sub-bands, each of which can have its own overall gain and bit-depth (which determines the maximum amplitude and relative precision, known as quantization noise). Imagine a coefficient with amplitude of 7, while LAME decides that rounding to the nearest multiple of 32 is acceptable given the calculated minimum masking level or absolute threshold of hearing in that sub-band. That will round the coefficient down to zero every time. It will still do so if the allowed coarseness reaches 16, but might round up to 8 if the allowed coarseness of quantization is 8, for example.
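The rounding example can be checked with a few lines of Python. This is a sketch of plain uniform rounding to the nearest multiple of a step size, not LAME's actual quantizer (which is non-uniform); the function name is made up for illustration.

```python
import math


def quantize(value, step):
    """Round a non-negative value to the nearest multiple of step (halves round up)."""
    return math.floor(value / step + 0.5) * step


# A coefficient of amplitude 7 under progressively finer quantization steps.
for step in (32, 16, 8):
    print(f"step {step:2d}: 7 -> {quantize(7, step)}")
# step 32: 7 -> 0
# step 16: 7 -> 0
# step  8: 7 -> 8
```

The coefficient only survives once the step is fine enough that 7 lies closer to one step than to zero.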



Looking at spectrogram views of my own tape transfers (from a rather good Walkman WM36 with Dolby B) in years gone by, usually on CoolEdit96 or 2000 (now sold as Adobe Audition), I've noticed some content above 17kHz, but in reality, it's in very dark colours, meaning it's probably 60-80 dB below full scale with a subtle tone at around 19 kHz if I recall correctly, and I don't recall anything like snares or cymbal hits or similar transients spiking above 16kHz at all when I was careful to avoid digital clipping, which is indicative of the low-pass filtering of the intended signal. Most of the hiss I can hear on those Dolby B tape transfers in quiet inter-track sections is surely in the 10-16 kHz range or thereabouts (and probably lower too), and I hear no hiss while reasonably loud music is playing.

If indeed the noise left after Dolby B NR is something like 60-80 dB below full scale, I'd expect any music averaging perhaps -20 dBFS to -14 dBFS to provide enough masking to make all the noise left after Dolby B NR inaudible except for very quiet passages, fade-outs and intentional silences (lead-in/out and inter-track). That's my experience of listening quite carefully to discern noise, including one lightly instrumented duet ballad track mastered intentionally quietly adjacent to a full-on psychedelic rock cover version, which I tried to denoise sensitively, and listened closely to over many repeats.
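To put rough numbers on that gap, here is a small sketch of the dB arithmetic. It only computes the level difference between music and noise. Whether that difference is enough for masking depends on frequency and on the psychoacoustic model, which this does not attempt to represent; the function names are made up.

```python
def level_margin_db(music_dbfs, noise_dbfs):
    """Difference in dB between the music level and the noise level."""
    return music_dbfs - noise_dbfs


def amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


# Smallest and largest gaps implied by the figures quoted above.
smallest = level_margin_db(-20, -60)   # quiet-ish music, loudest noise estimate
largest = level_margin_db(-14, -80)    # louder music, quietest noise estimate
print(smallest, round(amplitude_ratio(smallest)))
print(largest, round(amplitude_ratio(largest)))
```

On those figures the music sits 40 to 66 dB above the residual noise, i.e. roughly a hundred to two thousand times larger in amplitude.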

I'd have thought LAME's excellent psy-model would likely draw roughly the same conclusion about where the noise is audible, and may indeed mark the 16kHz+ sub-bands as containing no audible content (all hiss, since most cassettes have no signal content above 16kHz, only tape noise) during even the quiet passages as well as the loud ones. Only where the intended signal (that on the studio master) is virtually or completely silent, and noise is all that's left, is it likely that there's not enough low-and-mid-frequency content for LAME to be sure that the 16kHz+ sub-band contains nothing above the masking level or ATH level and so will then encode it at non-zero amplitude. This may be partly due to using (by my recollection) a dynamically-varying ATH model, which assumes you might be playing it with more amplification during extended periods of low signal level (which some people will, when trying to discern words or notes in the deep fade out or when using an automatic gain control or a similar DSP like foo_dsp_vlevel with suitably aggressive settings). It may also be due simply to the relatively white noise spectrum, where high frequencies are as prominent as lower frequencies and are therefore not eliminated entirely by frequency masking effects.

In summary, cassette tape with decent NR is a special case by virtue of both the low noise (but still detectable in 16-bit 44.1kHz PCM) and the restricted signal bandwidth (about 15-16 kHz, say). This may in many cases cause LAME to exhibit just the effects described while behaving exactly as it should in considering human hearing and masking effects without having applied a filter itself. The boundary of the nearest subband to the original signal's low-pass filter may cause the noise to vanish totally and look like a sharp and continuous digital low-pass filter applied by LAME. This apparent effect may then vanish when the genuine signal falls to zero (or to about the same as the noise floor at any rate) and reveal at least some low-level noise content above 16 kHz (however coarsely quantized the 16 kHz+ band might be).

It may be, in fact, that loud transients in the signal (hi-hat, cymbal, snare drum, clicking noises) will show up on the spectrogram as bright vertical lines that actually stop short of the apparent 16kHz cut-off introduced by LAME, thereby indicating that the true musical signal was indeed lowpass-filtered at slightly less than 16 kHz, but within the MP3 sub-band that ends at 16 kHz. A close look at a spectrogram of the original FLACs might reveal this more clearly.
greynol
QUOTE (Rio @ Oct 10 2009, 20:08) *
It just also got me curious here. Why did a -V0 or 320 LAME encode filter frequencies above 16kHz? Even if LAME is a lossy codec, this should not be the case. A -V2 encode filters at 18.5kHz, -V0 at 20.5, but this at 16kHz?!

Can you ABX a difference? If you can then I think your notion of "should" might carry weight; otherwise...
Rio
@Dynamic: Thank you for the detailed explanation on cassette tape to MP3. It has become clearer to me when it comes to cassette tape content.

@greynol: I'll ABX if I said "20.5kHz sounds better than 16kHz". Otherwise...
greynol
Your evasion noted; your baseless notion of "should" ignored.
Rio
When talking about behavior (both the encoder and your personal behavior), there are expectations, therefore the "should" qualifier.

LAME at CBR 320 is expected to filter at 20.5kHz, therefore it "should" filter at 20.5kHz. If it does not, one would think there is something wrong, that's why the OP asked the question, that's why I also asked it.

JAZ and Dynamic were kind and patient enough to enlighten things, while you try to burden me by throwing ABX at me.

If I make a quality claim, you expect me that I "should" ABX.

I did not make any quality claims in this thread, ergo expect me that I "should" not ABX.

Therefore, I did not evade. :lalala:
greynol
QUOTE (Rio @ Oct 12 2009, 17:23) *
LAME at CBR 320 is expected to filter at 20.5kHz

Expected?
odyssey
Yeah, why would you ever expect a lossy encoder to encode inaudible frequencies? Anyone who expects such thing has miserably misunderstood the concept of lossy encoders.
Rio
check under the hood while LAME is running... I think it's called polyphase filter...
pdq
Polyphase filter is just the first step in the process. Frequencies that pass the polyphase filter are still not guaranteed to be included.
PlazzTT
Thanks for very interesting answers, I'll upload a small FLAC clip later so you can see what I mean.

(Again, I'm not complaining. I can't ABX the LAME encode even at V2, so it's not a problem; some people seem to be interested in the sample though.)
rentzu
Above 16 kHz is a very small part of the highest audible octave. Most people probably can't even hear it. In many cases, filtering out these frequencies will make many recordings sound better, even if not accurate... if you even notice.

Anyway, FLAC ftw.
Rio
@pdq I agree with your post, especially after the responses from [JAZ] and Dynamic

@greynol LAME is expected to filter out frequencies above 20.5kHz @ 320kbps since its default lowpass is set at 20.5, and to an extent keep frequencies up to 20.5kHz. Shouldn't it? (Is that what was called begging the question?)

I guess you're trying to throw the flak with your responses by answering a question with another question, when in fact an inquirer expects a declarative answer rather than another question. The ABX thing was a quick one, but I wouldn't buy it. Begging the question? Much as that argument discusses logical fallacy, what I'm more concerned with is the logic behind the "if you can ABX the diff...otherwise" question. The motive behind your question is actually clear and is valid, but it is a wrong response to the question at hand, and maybe a right answer to another question.

@odyssey I wonder about the question put to me: "why ever expect a lossy encoder to encode inaudible frequencies?" Who was expecting? Certainly it wasn't me. Is it the LAME devs, because they set the lowpass of 320kbps at 20.5?

Why not add the --lowpass 16000? (Then someone would reply, leave the switches alone) Now that's something getting everyone in circles.
pdq
The LAME devs tried to make reasonable tradeoffs between available bits and lowpass frequency in such a way that for most people the loss in accuracy due to frequency cutoff is approximately equal to the loss in accuracy due to limited bits. Thus the more bits you have available, the higher frequencies you will get.

This doesn't mean that these are the optimum tradeoffs for everyone. Personally, anything above 12 kHz would be wasted on me because I can't hear that high. In fact, I would venture to say that there are many people in my situation, where a lower lowpass than the default would be a better choice.

That said, the next time I rip my collection, it will be to lossless, because I can now easily afford the disc space.
PlazzTT
Here's a 7 second clip from the end of the track in question: http://www.hydrogenaudio.org/forums/index....showtopic=75486 (24-bit FLAC).
greynol
QUOTE (Rio @ Oct 13 2009, 07:57) *
Shouldn't it?

Not necessarily, no. In this case the answer is clearly no, it should not.

QUOTE (Rio @ Oct 13 2009, 07:57) *
Is that what was called begging the question?

Your stating that it should as fact is an erroneous assumption that is begging the question.

Perhaps when you design your own mp3 codec you can make sure that it spends bytes on data that people will not hear. Hopefully it won't come at the expense of compromising parts that people can hear. Perhaps you'll get nice spectral plots like Blade without the glaring artifacts which often exist even at 320kbits with that codec. In the meantime I'd not second guess the LAME developers unless you can cite specific examples where a lack of high frequencies that you feel should exist in the encode is responsible for it not giving transparent results. Well scratch that, you can second guess them all you like, just don't expect that people will simply take your word as gospel.

QUOTE (Rio @ Oct 13 2009, 07:57) *
The motive behind your question is actually clear and is valid, but it is a wrong response to the question at hand, and maybe a right answer to another question.

...as if you had the right response to the question at hand. :rolleyes:
greynol
QUOTE (PlazzTT @ Oct 15 2009, 19:49) *
Here's a 7 second clip from the end of the track in question: http://www.hydrogenaudio.org/forums/index....showtopic=75486 (24-bit FLAC).

Thanks for the clip. I really wouldn't be surprised by the output you're getting. I think the explanations cited by JAZ and Dynamic contain the answers you seek.
PlazzTT
QUOTE (greynol @ Oct 16 2009, 05:20) *
QUOTE (PlazzTT @ Oct 15 2009, 19:49) *
Here's a 7 second clip from the end of the track in question: http://www.hydrogenaudio.org/forums/index....showtopic=75486 (24-bit FLAC).

Thanks for the clip. I really wouldn't be surprised by the output you're getting. I think the explanations cited by JAZ and Dynamic contain the answers you seek.


Yep, thanks for the answers.

I just thought it was interesting.
Snash
QUOTE (pdq @ Oct 13 2009, 11:42) *
...Personally, anything above 12 kHz would be wasted on me because I can't hear that high. In fact, I would venture to say that there are many people in my situation, where a lower lowpass than the default would be a better choice...


Is there a way the user can "lower the lowpass" when using LAME? This could be great if the result is that I can then select more compression and not lose any audible quality that I'd notice. (I suppose my kids would then think all my music sounds "flat").

If there is not a lowpass setting, then is there a tool to run the audio through prior to LAME?
pdq
QUOTE (Snash @ Oct 16 2009, 10:06) *
QUOTE (pdq @ Oct 13 2009, 11:42) *
...Personally, anything above 12 kHz would be wasted on me because I can't hear that high. In fact, I would venture to say that there are many people in my situation, where a lower lowpass than the default would be a better choice...


Is there a way the user can "lower the lowpass" when using LAME? This could be great if the result is that I can then select more compression and not lose any audible quality that I'd notice. (I suppose my kids would then think all my music sounds "flat").

If there is not a lowpass setting, then is there a tool to run the audio through prior to LAME?

--lowpass <frequency in Hz>
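For example, a full command line could be assembled like this. It is a sketch only: the file names and the -V 2 / 16000 Hz values are placeholders chosen for illustration, and the helper function is hypothetical. It just builds the argument list and prints it, without running LAME.

```python
def lame_command(src, dst, quality=2, lowpass_hz=16000):
    """Build an argument list for a VBR encode with an explicit lowpass."""
    return ["lame", "-V", str(quality), "--lowpass", str(lowpass_hz), src, dst]


# Placeholder file names; pass the list to subprocess.run() to actually encode.
print(" ".join(lame_command("input.wav", "output.mp3")))
# lame -V 2 --lowpass 16000 input.wav output.mp3
```

Whether a lower lowpass really lets you pick a smaller -V setting without audible loss is something only your own listening tests can tell you.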
Rio
QUOTE (greynol)
...as if you had the right response to the question at hand. :rolleyes:

Well, me not having the right response does not exonerate you of your wrong "can you ABX" response as well. I may have a wrong answer, but not rude like yours.
odyssey
QUOTE (Rio @ Oct 16 2009, 17:34) *
Well, me not having the right response does not exonerate you of your wrong "can you ABX" response as well. I may have a wrong answer, but not rude like yours.

Well for once, I don't think greynol provided a rude answer. The answer was already given on the first page of this thread - You didn't understand that, so he had to cut into pieces what the consequences would be for a developer of a lossy codec. The blade example was great. AFAIR it doesn't have a psy-model at all, and thus your spectrograms should look nice - Give it a go.
greynol
Rio, the "can you ABX" was pointed at you, young padawan, not the OP. You seem to think that you know better than the LAME developers and that the codec is somehow broken. Provide some evidence to that effect before telling people how things ought to work. If you feel that I'm being rude, so be it.
chrizoo
QUOTE (PlazzTT @ Oct 4 2009, 11:19) *
I have a recording from cassette tape in FLAC. From the spectral I can see it has frequencies up to 21kHz, but when I encode it to MP3 (V0 or 320K) with LAME, there's a clear cut-off at 16kHz (except for at the very end of the track when there's no music, just tape hiss, there's no cut-off here).

I assume this is normal behaviour, I'd like to know why this happens.

I can post a clip / spectral screenshots if needed.


I got this from Mike Giacomelli, if that's of any help here:

http://wiki.hydrogenaudio.org/index.php?ti...cal_information
Rio
@chrizoo, that table is not updated (last version was 3.97, not applicable for 3.98.x), plus the fact that the lowpass filters there for -V1 and -V2 settings are identical (I think it was a typo since they should have different settings). The discrepancy started at -V5, which should not be 16538 Hz - 17071 Hz.
chrizoo
QUOTE (Rio @ Oct 20 2009, 07:58) *
@chrizoo, that table is not updated (last version was 3.97, not applicable for 3.98.x), plus the fact that the lowpass filters there for -V1 and -V2 settings are identical (I think it was a typo since they should have different settings). The discrepancy started at -V5, which should not be 16538 Hz - 17071 Hz.

thanks for pointing that out. Do you have a link to up-to-date material you could post here ? Or the correct values ?
Rio
QUOTE (chrizoo @ Oct 20 2009, 15:29) *
thanks for pointing that out. Do you have a link to up-to-date material you could post here ? Or the correct values ?

I'm sorry I don't have a link for the updated values. It so happened that while LAME is encoding, I noticed the lowpass filter in the DOS box is different than that in the posted table (for LAME 3.97 on the affected -Vx setting).

LAME 3.98.x has different lowpass settings from its 3.97 -Vx counterparts. They are somewhat higher, so you may need to adjust the -V setting (e.g. -V5 in LAME 3.97 is approximated by -V5.7 in 3.98.x to get the same filter at 16kHz).