Lame resamples at 64 kbps
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - General
knackers
I'm using Lame v3.96.1, built by moi on Windows.

I'm converting standard CD wav files using "lame -h -b 64" and Lame insists upon resampling from 44.1 to 24 kHz. If I attempt to override using "-s", still no joy. If I let Lame use its default 128 kbps, it leaves the sampling at 44.1.

Can anyone explain this, and tell me how to force 44.1 kHz?

Why would I want 64 kbps? The spoken word, which is still not dead. Thanks for any advice.
clintb
Question: Why do you need a 44.1 kHz sampling rate for voice/speech?
knackers
QUOTE (clintb @ Dec 3 2005, 07:37 PM)
Question: Why do you need a 44.1 kHz sampling rate for voice/speech?

Short answer: a somewhat knowledgeable friend told me that the standard for audiobooks is 44.1 kHz and 64 kbps, and I wanted to fit with that.

I am not an expert in audio encoding, but in addition to meeting the convention, my thinking is:
1. The kbps determines the file size.
2. I need smallish file size, hence 64 kbps.
3. Why not encode using the existing 44.1 if that will give even a tiny improvement?
4. I have read that sampling is best at multiples of 11, so 24 kHz irked me.
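For a constant-bitrate file, point 1 holds: size is roughly bitrate × duration ÷ 8, and the sample rate does not appear anywhere in that formula, so resampling to 24 kHz does not make a 64 kbps file any smaller. A minimal Python sketch (the function name is made up, and ID3 tags are ignored, so real files differ slightly):

```python
def mp3_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a CBR MP3 stream: bits per second * seconds / 8.

    The sample rate does not appear: at a fixed bitrate, 24 kHz and
    44.1 kHz files of the same length come out the same size.
    ID3 tags are ignored.
    """
    return bitrate_kbps * 1000 * duration_s / 8


if __name__ == "__main__":
    one_hour = 3600
    for kbps in (64, 128):
        megabytes = mp3_size_bytes(kbps, one_hour) / 1_000_000
        print(f"{kbps} kbps: about {megabytes:.1f} MB per hour")
```

At 64 kbps this works out to about 28.8 MB per hour of audio, whatever the sample rate.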

Please enlighten me if I am mistaken! Thanks.
Cyaneyes
Lame wouldn't do the resampling for no reason. At that bitrate, 24 kHz audio is easier to handle than the original 44.1. The lowpass is going to be lower than 12 kHz anyway, so a higher sampling rate will not improve quality, and would just be wasting precious bits.
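The argument rests on the Nyquist limit: a sample rate can only represent frequencies up to half its value. A small Python sketch, using an assumed 11 kHz lowpass purely for illustration (the lowpass LAME actually picks depends on version and settings):

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at this sample rate (half of it)."""
    return sample_rate_hz / 2


def unused_band_hz(lowpass_hz: float, sample_rate_hz: float) -> float:
    """Width of the band between the lowpass and the Nyquist limit.

    Nothing survives the lowpass in this band, so it carries no signal.
    Returns 0 if the lowpass is at or above the Nyquist limit.
    """
    return max(0.0, nyquist_hz(sample_rate_hz) - lowpass_hz)


if __name__ == "__main__":
    LOWPASS_HZ = 11_000  # assumed value for illustration, not LAME's actual setting
    for fs in (24_000, 44_100):
        print(f"{fs} Hz: Nyquist {nyquist_hz(fs):.0f} Hz, "
              f"empty band above lowpass {unused_band_hz(LOWPASS_HZ, fs):.0f} Hz")
```

With that assumed lowpass, 44.1 kHz sampling leaves over 11 kHz of spectrum that holds nothing.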
Mike Giacomelli
QUOTE (knackers @ Dec 3 2005, 07:27 PM)
QUOTE (clintb @ Dec 3 2005, 07:37 PM)
Question: Why do you need a 44.1 kHz sampling rate for voice/speech?

Short answer: a somewhat knowledgeable friend told me that the standard for audiobooks is 44.1 kHz and 64 kbps, and I wanted to fit with that.

There's a standard for encoding audiobooks? Why?

QUOTE
I am not an expert in audio encoding, but in addition to meeting the convention, my thinking is:
1. The kbps determines the file size.
2. I need smallish file size, hence 64 kbps.
3. Why not encode using the existing 44.1 if that will give even a tiny improvement?
4. I have read that sampling is best at multiples of 11, so 24 kHz irked me.

Please enlighten me if I am mistaken! Thanks.


Regarding 3, you're assuming that a higher sampling rate is better. That's only true if you have enough bitrate to encode it properly, and something in those higher frequencies worth encoding. Spoken words are generally < 5 kHz. A 44.1 kHz sampling rate is tremendously higher than required. Worse, it will waste your limited bitrate.
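One rough way to picture the "wasted bitrate" point is to divide the bitrate by the number of samples per second the encoder has to describe. This is only an intuition: MP3 allocates bits per frequency band, not evenly per sample, and the function below is a hypothetical helper for this sketch.

```python
def bits_per_sample(bitrate_kbps: float, sample_rate_hz: float, channels: int) -> float:
    """Average coded bits available per input sample (a crude budget, not how MP3 allocates bits)."""
    return bitrate_kbps * 1000 / (sample_rate_hz * channels)


if __name__ == "__main__":
    for fs, ch in ((44_100, 2), (24_000, 2), (24_000, 1)):
        print(f"64 kbps, {fs} Hz, {ch} ch: {bits_per_sample(64, fs, ch):.2f} bits/sample")
```

At 64 kbps the crude budget is about 0.73 bits per sample for 44.1 kHz stereo, 1.33 for 24 kHz stereo and 2.67 for 24 kHz mono.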

Regarding 4, there's no reason to worry about that. However, given that 24 kHz is probably higher than you need, 22 or 11 kHz might be a better choice. I'm not sure how well Lame works at those sampling rates, though, because I've never tried it. It might be worth searching here for more info.
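On the "multiples of 11" worry: MP3 supports only three groups of sample rates, and 24 kHz is one of the standard MPEG-2 rates (half of 48 kHz), just as 22.05 kHz is half of 44.1 kHz. A Python sketch of that table (MPEG-2.5 is an unofficial extension that not every player handles; the helper names are illustrative):

```python
MP3_SAMPLE_RATES = {
    "MPEG-1": (32_000, 44_100, 48_000),
    "MPEG-2": (16_000, 22_050, 24_000),
    "MPEG-2.5": (8_000, 11_025, 12_000),  # unofficial extension
}


def mpeg_version(sample_rate_hz: int) -> str:
    """Return which MPEG audio version defines this sample rate."""
    for version, rates in MP3_SAMPLE_RATES.items():
        if sample_rate_hz in rates:
            return version
    raise ValueError(f"{sample_rate_hz} Hz is not a valid MP3 sample rate")


def divides(base_hz: int, rate_hz: int) -> bool:
    """True if rate_hz is base_hz divided by a whole number (e.g. 44100 / 2 = 22050)."""
    return base_hz % rate_hz == 0


if __name__ == "__main__":
    for rate in (11_025, 22_050, 24_000):
        family = "44.1 kHz" if divides(44_100, rate) else "48 kHz"
        print(f"{rate} Hz: {mpeg_version(rate)}, {family} family")
```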

Edit: brackets
ErikS
You would most likely get a better result if you encode the material as mono. Add the "-mm" switch to your command line. If it still resamples, also try adding "--resample 44".
ErikS
Btw, the "-s" switch only tells lame the frequency of the input file - not what you want in your output. Only use this when encoding from raw files or if you otherwise have problematic wav files which don't have the sampling frequency set properly in the header.

CODE
   -s sfreq        sampling frequency of input file (kHz) - default 44.1 kHz
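For an ordinary WAV file the input sample rate is already stored in the header, which is why -s is normally unnecessary; headerless raw PCM has no such field. Python's standard wave module can read the header value. In this sketch the in-memory silent file exists only to keep the example self-contained, and the helper names are made up:

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Read the sample rate from a WAV header (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def make_silent_wav(sample_rate_hz: int, frames: int = 100) -> io.BytesIO:
    """Build a short 16-bit stereo WAV of silence in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00" * frames * 2 * 2)  # frames * channels * bytes per sample
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(wav_sample_rate(make_silent_wav(44_100)))  # 44100
```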
abasher
I think "--resample 22" would make more sense for you. That's what all the audiobooks I have use. It wouldn't hurt the quality, since voice is at most up to 3.5 kHz, well covered by the 11.025 kHz range of that sampling rate. It would also probably be more compatible with players than 24 kHz, and easier for LAME to resample.

No need to go 44.1, as mentioned by others.
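The "easier to resample" point can be made concrete. A generic rational resampler upsamples by L and then downsamples by M, where L/M is the reduced ratio of output to input rate: 44.1 to 22.05 kHz is a clean 1/2, while 44.1 to 24 kHz needs 80/147. This Python sketch is a general illustration and does not describe LAME's own resampler:

```python
from fractions import Fraction


def resample_factors(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Return (L, M): upsample by L, then downsample by M, to go from src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)  # automatically reduced to lowest terms
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    for dst in (22_050, 24_000):
        up, down = resample_factors(44_100, dst)
        print(f"44100 -> {dst}: up {up}, down {down}")
```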
ErikS
QUOTE (abasher @ Dec 4 2005, 02:40 PM)
since voice is at most up to 3.5 kHz

Where have you got this info from?

I thought 3.5 kHz was the limit below which you should keep everything, or else the voice would be severely distorted. As far as I recall, it did not say that sounds produced by the human voice can't go above 3.5 kHz. Fricatives should produce quite a lot of energy at frequencies much higher than that... And I'm pretty sure I could back that up with ABX tests, if only I had some voice samples here...
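For reference, an ABX result is usually judged by how likely the score would be from pure guessing. A minimal Python sketch of that one-sided binomial calculation (the function name is illustrative):

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials` by guessing (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


if __name__ == "__main__":
    print(f"12/16 correct: p = {abx_p_value(12, 16):.4f}")  # p = 0.0384
```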
Alex B
The original question lacks some information. It was not mentioned what the source exactly contains. Is it just plain speech or does it contain music or sound effects too? Is it mono (two identical channels in a stereo wave file) or is it stereo (with more or less channel separation between the channels)? If it is stereo with perhaps meaningless channel separation, would it be fine to encode it in mono mode? Also, the purpose of the encoding was not specified. How is it going to be used? Is it going to be listened to personally, or is it going to be distributed and need to meet specifications stated elsewhere?
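Whether a stereo WAV is really mono (two identical channels) can be checked directly. A Python sketch using only the standard library, assuming 16-bit PCM input (the function name is hypothetical); it reads in chunks and stops at the first frame where left and right differ:

```python
import array
import wave


def is_dual_mono(source, chunk_frames: int = 65_536) -> bool:
    """True if a 16-bit stereo WAV has bit-identical left and right channels."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        while True:
            data = wav.readframes(chunk_frames)
            if not data:
                return True
            samples = array.array("h", data)  # interleaved L, R, L, R, ...
            # Byte order does not matter here: both channels share it, so equality is preserved.
            if samples[0::2] != samples[1::2]:
                return False
```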

As mentioned before, some switches change LAME's behavior. I would recommend trying a few different switches and testing the audio quality by listening to the files. The intended usage should also be tested, for example how a portable player handles the files.

Here are some possible switch combinations:

-b 64 -h --resample 44
As said before, this switch would keep the original 44.1 kHz sample rate. If the file has stereo content with channel separation, the overall audio quality is likely to be lower than without the resample switch.

-b 64 -h -m m
This switches to mono encoding. The encoder will not change the original 44.1 kHz sample rate.
(It "assumes" the mono files have more available space for audio data at the same bitrate, which is not exactly true if the original stereo wave file is actually 2x mono because LAME uses the joint stereo mode by default and can effectively combine the two identical channels automatically.)

-V8, -V9, -V8 -m m and -V9 -m m
Low bitrate VBR switches, perhaps worth trying. Again, the -m m switch makes the files mono.

--abr 64 or --abr 64 -m m
ABR mode. I would try these first when seeking the best quality/size ratio for speech. If this produces a bitrate lower than 64 kbps with speech (would that be unwanted?), the value can be changed. For example, --abr 89 -m m is a valid switch.
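The switch sets above are easy to compare side by side on one chapter. A Python sketch that only builds the LAME command lines (lame [switches] input output); the labels, the output naming scheme and the file name chapter01.wav are hypothetical, and actually running the encoder is left commented out:

```python
import shlex
from pathlib import Path

# Switch sets from the list above; the labels are made up for output file names.
CANDIDATES = {
    "cbr64_resample44": "-b 64 -h --resample 44",
    "cbr64_mono": "-b 64 -h -m m",
    "v8": "-V8",
    "v9": "-V9",
    "v8_mono": "-V8 -m m",
    "v9_mono": "-V9 -m m",
    "abr64": "--abr 64",
    "abr64_mono": "--abr 64 -m m",
}


def build_commands(wav_path: str, lame: str = "lame") -> list[list[str]]:
    """Return one argument list per candidate, writing e.g. chapter01_v8.mp3 next to the input."""
    src = Path(wav_path)
    commands = []
    for label, switches in CANDIDATES.items():
        out = src.with_name(f"{src.stem}_{label}.mp3")
        commands.append([lame, *shlex.split(switches), str(src), str(out)])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("chapter01.wav"):
        print(shlex.join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment (and import subprocess) to encode
```

Keeping the switches as plain strings makes it easy to add another variant, such as a different --abr value, before listening to the results.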

Edit: a couple of typos
abasher
QUOTE (ErikS @ Dec 4 2005, 02:35 PM)
QUOTE (abasher @ Dec 4 2005, 02:40 PM)
since voice is at most up to 3.5 kHz

Where have you got this info from?

I thought 3.5 kHz was the limit below which you should keep everything, or else the voice would be severely distorted. As far as I recall, it did not say that sounds produced by the human voice can't go above 3.5 kHz. Fricatives should produce quite a lot of energy at frequencies much higher than that... And I'm pretty sure I could back that up with ABX tests, if only I had some voice samples here...

Yeah, you're right. I put it the wrong way. Should have said "where human voice is mostly kept within".
Higher frequencies are indeed produced, but not a lot, when it comes to regular talking.

But you would agree that the human voice fits snugly into the 11.025 kHz range, right? Especially when it's not singing, just talk.
knackers
Guys,

Thanks a million to all of you. I read all the responses carefully, and now understand things well enough. I think I will accept the Lame defaults such as 24 kHz sampling. When I said that 44.1 was "standard" for audiobooks I only meant that I was told that it was a convention. But I am now skeptical and will check with my source...

QUOTE (Alex B @ Dec 4 2005, 08:16 AM)
The original question lacks some information. It was not mentioned what the source exactly contains. Is it just plain speech or does it contain music or sound effects too?

The usual sort of dilemma. Mostly just a mono voice, which might span 20 regular CDs. However, right now I'm listening to a reading from the BBC that has some stereo music and I'd hate to lose any quality...

QUOTE (Alex B @ Dec 4 2005, 08:16 AM)
Here are some possible switch combinations:

Tried 'em. I think I'll use "--abr 64 -h" and leave well-enough alone with the 24 kHz sampling.

Thanks again.
ErikS
QUOTE (abasher @ Dec 4 2005, 04:35 PM)
But you would agree that the human voice fits snugly into the 11.025 kHz range, right? Especially when it's not singing, just talk.

I agree that you can transmit voice over a channel with 3.5 kHz bandwidth. The telephone is one example of that. Most of the time you can hear a voice clearly enough over the telephone, but I would also say that it's pretty far from good quality. Imho, a bandwidth cap anywhere up to ~10 kHz is easily detected. That would of course translate to a sampling frequency above 20 kHz according to the Shannon/Nyquist theorem. Voice band-limited somewhere above 10 kHz may also be detectable, but since I can't try it right now I'd better not swear to it. :)
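The Shannon/Nyquist point cuts both ways: content above half the sample rate is not simply dropped, it folds back to a lower frequency, which is why a lowpass has to come before any downsampling. A small Python sketch (helper names are illustrative) showing that a 12 kHz tone sampled at 22.05 kHz yields exactly the same samples as a 10.05 kHz tone:

```python
import math


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at fs_hz (folded into 0..fs/2)."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


def sampled_cosine(f_hz: float, fs_hz: float, n: int) -> list[float]:
    """First n samples of cos(2*pi*f*t) taken at rate fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]


if __name__ == "__main__":
    fs = 22_050
    alias = alias_frequency(12_000, fs)
    identical = all(
        math.isclose(a, b, abs_tol=1e-9)
        for a, b in zip(sampled_cosine(12_000, fs, 64), sampled_cosine(alias, fs, 64))
    )
    print(alias, identical)  # 10050 True
```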
Daijoubu
How about using a codec more suited for voice or low bandwidth?
Speex/Vorbis/(HE)AAC