david_rones
Sep 1 2002, 10:39
Hi.
I need some help with a project converting PowerPoint narrations from .wav to .mp3. Basically, I'd like to know the best settings to use in LAME for speech.
I'd use something like Speex, but I ultimately need to stream the files via Flash as .swf files, and .swf only supports .mp3. Also, .swf files do not support joint stereo or VBR. I will be targeting 24kbps/22050Hz/Mono, because many users will be listening to the .swf's streamed via dial-up modems.
So given those parameters, I have two questions. First, what PCM setting is best to start with when we do the narration recordings on the PC? I don't want to capture at too high a quality, because disk space may be at a premium, especially since captured streams can run up to an hour in length. So the smallest format I can get away with is preferred.
Second, what are the best LAME switches to use, given that this will be speech/voice-only audio (and considering the .swf limitations I stated above)? Again, I'm thinking I want to encode to 24kbps/22050Hz/Mono, but I really don't know what to set the other options to.
Thanks so much!
David
rjamorim
Sep 1 2002, 11:00
If you are going to encode at 22.05 kHz, I suggest you use an FhG encoder. Lame is only optimized for 44.1 kHz encodings.
david_rones
Sep 1 2002, 11:08
QUOTE(rjamorim @ Sep 1 2002 - 09:00 AM)
I suggest you use some FhG encoder
Yes, but this will be an app that we distribute, and we would like to stay open-source to avoid license fees.
Portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements? Speed requirements?
24kbps mp3 will be MPEG-2 layer III.
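A small Python sketch of why 24 kbps forces MPEG-2: the tables below are the standard Layer III sample rates and bitrates per MPEG version (MPEG-2.5 is Fraunhofer's unofficial low-rate extension). The function names are illustrative, and whether a given player, Flash included, accepts every valid pair is a separate question this does not answer.

```python
# Sample rates (Hz) defined for each MPEG audio version.
SAMPLE_RATES = {
    "MPEG-1": (32000, 44100, 48000),
    "MPEG-2": (16000, 22050, 24000),
    "MPEG-2.5": (8000, 11025, 12000),  # unofficial Fraunhofer extension
}

# Layer III bitrates (kbps), excluding the "free format" index.
L3_BITRATES = {
    "MPEG-1": (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
    "MPEG-2": (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160),
}
# MPEG-2.5 reuses the MPEG-2 Layer III bitrate table.
L3_BITRATES["MPEG-2.5"] = L3_BITRATES["MPEG-2"]


def mpeg_version(sample_rate_hz: int) -> str:
    """Return the MPEG version implied by an output sample rate."""
    for version, rates in SAMPLE_RATES.items():
        if sample_rate_hz in rates:
            return version
    raise ValueError(f"{sample_rate_hz} Hz is not a valid MP3 sample rate")


def is_valid_layer3(sample_rate_hz: int, bitrate_kbps: int) -> bool:
    """True if this sample rate / CBR bitrate pair exists in Layer III."""
    return bitrate_kbps in L3_BITRATES[mpeg_version(sample_rate_hz)]


for rate, kbps in [(22050, 24), (11025, 16), (11025, 8), (32000, 32), (44100, 24)]:
    print(rate, kbps, mpeg_version(rate), is_valid_layer3(rate, kbps))
```

24 kbps is missing from the MPEG-1 table (its lowest bitrate is 32 kbps), so a 24 kbps file has to use one of the MPEG-2 or MPEG-2.5 sample rates.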
Well, there really isn't that much to tweak given the restrictions: CBR/mono/24kbps.
Basic switch would be something like:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7
I'll quickly test a few more settings...
With --lowpass 8 you pretty much need --nspsytune, otherwise it starts to sound too distorted to me.
So, maybe:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7 --athtype 2
Another option I found pretty good is:
lame -a -h -b 24 --resample 22 --lowpass 7 -X 1 --athtype 2
The nspsytune line above is a bit more muffled. The one below (default gpsycho) is maybe a bit clearer but has some high-frequency swishing. I guess it's a matter of taste..
Using "higher quality" quantization noise shaping (-q0 or -q1) only made it worse; -q0 in particular is known to be broken in Lame 3.90-3.92. Both the nspsytune and gpsycho lines use the -X1 quantization noise measurement method (the nspsytune default), which gives better results than gpsycho's default (-X0).
I guess it also depends on the source. I used 44 kHz stereo speech.
[noticed that you couldn't use speex, so edited that away]
rjamorim
Sep 1 2002, 12:59
QUOTE(JohnV @ Sep 1 2002 - 03:36 PM)
stereo/mono? portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements?
Check out also speex. It's open source speech codec.
http://speex.sourceforge.net/
Check out Windows binaries from rarewares
http://www.inf.ufpr.br/~rja00/ (of course down atm).
Winamp alpha speex plugin:
http://www.saunalahti.fi/~cse/Speex/in_speex.zip
http://www.saunalahti.fi/~cse/Speex/in_speex_src.zip
You can get speex binaries temporarily from here:
http://audio.ciara.us/rarewares/speexbundle.zip
Let's hope I stop being a lazy guy and finish setting up the mirror.
david_rones
Sep 1 2002, 15:00
John,
Thanks for your detailed response. And you are right, .swf supports only MPEG-2 layer III.
I wonder how much of a hit the sound will take if we capture the original speech at a lower quality, say 22 kHz, 16-bit, mono? I have to consider the available disk space on the user's local machine for this project.
What are your thoughts about the right capture quality setting? I'm trying to balance the size of the original PCM file against the quality of the source before encoding.
One other thing to add: there tends to be a lot of background "hiss" in these microphone recordings. Sorry, this newbie doesn't know the technical name for it. Is there any setting in particular that will help with that?
David
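The disk-space side of the capture question is plain arithmetic: raw PCM size is sample rate × bytes per sample × channels × seconds. A minimal Python sketch (the function name is illustrative; the WAV header is ignored):

```python
def pcm_bytes(sample_rate_hz: int, bits: int, channels: int, seconds: float) -> int:
    """Raw PCM data size in bytes (the small WAV header is ignored)."""
    return int(sample_rate_hz * (bits // 8) * channels * seconds)


ONE_HOUR = 3600
for rate, bits, channels in [(44100, 16, 2), (22050, 16, 1), (11025, 16, 1)]:
    mib = pcm_bytes(rate, bits, channels, ONE_HOUR) / 2**20
    print(f"{rate} Hz, {bits}-bit, {channels} ch: {mib:.1f} MiB per hour")
```

An hour at 22050 Hz/16-bit/mono is 158,760,000 bytes (about 151 MiB), a quarter of CD-quality stereo; the 24 kbps MP3 made from it is about 10.3 MiB.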
Hmm, I don't believe there's much quality loss if you capture at 22 kHz/16-bit/mono and encode.
I can't say for sure though.. If you put a few short sample .wavs online, I could do some testing.
It could also help to tweak the settings, especially if there is a lot of background noise and you are not going to remove it with 3rd party software first. Some settings will definitely sound better than others with lots of background noise...
Delirium
Sep 1 2002, 20:57
"Noise removal" algorithms tend to be fairly complex, but if you do some searching you may find some open-source implementations (if something like that isn't too CPU-intensive to use in your app). Most are essentially glorified dynamic bandpass filters -- they subdivide the spectrum into small frequency ranges, look for the ones which "look like hiss" (I'm not entirely sure how this is recognized; perhaps too constant of a sound) and then filter out that region of the spectrum. If you're only using one computer/mic combo for the encoding, you can simplify this process by recording some silence and spectrum-analyzing it to find out where the hiss is concentrated and then just filter out that frequency range. If you need to work on arbitrary computer/mic setups, you'll have to do the more in-depth dynamic analysis though to figure out at runtime where the hiss is located.
kennedyb4
Sep 1 2002, 21:23
That's a great idea. You could also try pre-processing the wavs with the more complex routines of Cool Edit or another good noise reduction programme.
There is absolutely nothing to waste at 24kbps.
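The simpler fixed-setup approach Delirium describes above (record some silence, spectrum-analyze it, find where the hiss sits) can be sketched in plain Python. This assumes the silence recording is already loaded as mono float samples in [-1, 1]; the function names, frame size and band count are illustrative, and a naive DFT is used only to stay dependency-free (an FFT library would be used in practice). It only measures the hiss; it does not filter anything.

```python
import math


def band_power(samples, sample_rate, frame_size=256, n_bands=8):
    """Average spectral power per equal-width band of a noise-only recording."""
    band_width = (sample_rate / 2) / n_bands
    # Hann window to reduce leakage between frequency bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    power = [0.0] * n_bands
    frames = 0
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        for k in range(1, frame_size // 2):  # skip the DC bin
            re = sum(x * math.cos(2 * math.pi * k * n / frame_size)
                     for n, x in enumerate(frame))
            im = sum(x * math.sin(2 * math.pi * k * n / frame_size)
                     for n, x in enumerate(frame))
            band = min(int(k * sample_rate / frame_size / band_width), n_bands - 1)
            power[band] += re * re + im * im
        frames += 1
    return [p / frames for p in power] if frames else power


def loudest_band(power, sample_rate):
    """Frequency range (Hz) of the band holding the most noise energy."""
    width = (sample_rate / 2) / len(power)
    i = max(range(len(power)), key=lambda b: power[b])
    return i * width, (i + 1) * width
```

The reported band is a candidate for a fixed filter on that one computer/mic combination; hiss that overlaps the speech band still needs the dynamic approach.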
Gabriel
Sep 2 2002, 02:03
--alt-preset 24 -m m
QUOTE(Gabriel @ Sep 2 2002 - 11:03 AM)
--alt-preset 24 -m m
Hmm, to comply with the requirements it would have to be 22 kHz/24 kbps CBR.
This sounds pretty decent:
--alt-preset cbr 24 -a --resample 22 --lowpass 7
Otherwise it's exactly the same nspsytune line I mentioned before, but it adds --ns-bass -3. That increases the quality, so the line above is better than my first nspsytune suggestion. IMO the default lowpass of --alt-preset 24 (4 kHz) sounds pretty muffled, even for speech. The -m m and -a switches do the same thing (downmix to mono).
PatchWorKs
Sep 2 2002, 06:35
My Vorbis files at 22 kHz mono sound better if I preprocess them with Soundprobe: DC offset removal, resampling (22 kHz, mono), expander, normalization... try it yourself!
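Two of those preprocessing steps, DC offset removal and peak normalization, are simple enough to sketch in Python. This assumes mono float samples in [-1, 1]; the function names and the -1 dBFS target are illustrative choices, and the expander and resampling steps done in Soundprobe are not shown.

```python
def remove_dc_offset(samples):
    """Subtract the mean so the waveform is centred on zero."""
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


def normalize_peak(samples, target=0.89):
    """Scale so the largest absolute sample equals target (0.89 is about -1 dBFS)."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target / peak
    return [x * gain for x in samples]


cleaned = normalize_peak(remove_dc_offset([0.12, 0.02, 0.22, 0.12]))
print(cleaned)
```

The DC offset is removed first because an offset adds to the measured peak and would otherwise eat into the headroom normalization works with.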
david_rones
Oct 1 2002, 16:15
Hi again.
This was really great advice, and as such, we are using the following settings as our "main" settings for our application:
--alt-preset cbr 24 -a --resample 22 --lowpass 7
I'm hoping you can help me a little more.
We record the .WAVs at 16-bit, 22050 Hz, mono. We never have control over the microphone or the environment, as any user can use the software.
We want to be able to offer an even lower quality/lower bandwidth setting as well, for users who view the presentations over very low bandwidth connections. Our options given the limitations of the Flash MP3 format are:

So I have two questions:
1) Which bitrate/frequency can we go to that gives the best reduction in size vs. the main setting above, while still producing a relatively decent sounding result? (Based on the chart, it looks like 16kbps/11025Hz/Mono is our best choice, with a 33% reduction in required bandwidth. The next step down (8/11/Mono) sounds really bad, unless anyone has some good settings to try.)
2) And what would be the best corresponding settings to use?
If you could help with this, I'd be very grateful.

I'm happy to email you a sample .wav file if you like. (But it is just speech, so if you want to just use your own...either way)

Thanks again!
David
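The 33% figure checks out: for CBR, size depends only on bitrate and duration (not on mono vs. stereo). A short Python sketch (the function name is illustrative; ID3 tags and the .swf wrapper are ignored):

```python
def cbr_bytes(bitrate_kbps: float, seconds: float) -> float:
    """CBR MP3 size in bytes; MP3 'kbps' means 1000 bits per second."""
    return bitrate_kbps * 1000 * seconds / 8


MAIN_KBPS = 24
for kbps in (24, 16, 8):
    saving = (MAIN_KBPS - kbps) / MAIN_KBPS
    mib_per_hour = cbr_bytes(kbps, 3600) / 2**20
    print(f"{kbps} kbps: {saving:.0%} smaller than {MAIN_KBPS} kbps, "
          f"{mib_per_hour:.1f} MiB/hour")
```

Going from 24 to 16 kbps saves one third of the bandwidth; 8 kbps would save two thirds.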
Ok, 8kbps is too low for Lame..
This is a line which was the best with my speech samples:
--alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z
Yes, some people might wonder why I use -Z (here, noise shaping type 1). The bitrate is so low that what makes sense at somewhat higher bitrates does not apply at extremely low bitrates. Using -Z made my female speech sample in particular sound better.
david_rones
Oct 3 2002, 13:09
> --alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z
Awesome! Thanks so much!
David
Well, gotta say that I like this even better:
-b 16 -a --resample 11 --lowpass 5 --athtype 2
It's considerably less noisy, but a bit more metallic. So that's my best line so far..
david_rones
Oct 3 2002, 21:41
Ok. Now I'm gonna change the requirement completely. Now we want to add a higher quality setting above our main setting. So to review, our main setting is at:
--alt-preset cbr 24 -a --resample 22 --lowpass 7
And we capture the .wav at 16-bit, 22050 Hz, mono. So if we looked at 32, 40, and 48 kbps (based on the chart above of supported .swf MP3 formats), where do we get the most bang for our bandwidth buck, and what settings work best there?
Thanks as always!
David
Heh, I just noticed that you can still improve the 24kbps quality quite nicely by adding -Z to the 24kbps line.
Until this thread I had never actually tested anything at such a low bitrate, so it's surprising to see how some features work in exactly the opposite way to what one might expect..
These are my recommendations so far for:
24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z
16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
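For an app like David's, the two recommended lines can be wrapped so the encoder is called with an argument list rather than a shell string. This is a sketch under assumptions: a `lame` executable on PATH, and the switches copied verbatim from the recommendations above, which target the LAME 3.9x releases discussed in this thread (later releases changed some switches, so check `lame --longhelp` on your build). The profile names and helper functions are hypothetical.

```python
import shutil
import subprocess

# Switches copied verbatim from the recommendations above (LAME 3.9x era).
PROFILES = {
    "speech24": ["--alt-preset", "cbr", "24", "-a", "--resample", "22",
                 "--lowpass", "7", "-Z"],
    "speech16": ["-b", "16", "-a", "--resample", "11", "--lowpass", "5",
                 "--athtype", "2", "-X3"],
}


def lame_command(profile: str, wav_in: str, mp3_out: str, lame: str = "lame") -> list:
    """Build the argument list; passing a list avoids shell quoting issues."""
    return [lame, *PROFILES[profile], wav_in, mp3_out]


def encode(profile: str, wav_in: str, mp3_out: str) -> None:
    """Run LAME; raises CalledProcessError if it exits with an error."""
    lame = shutil.which("lame")
    if lame is None:
        raise FileNotFoundError("no 'lame' executable found on PATH")
    subprocess.run(lame_command(profile, wav_in, mp3_out, lame), check=True)
```

Usage would be something like `encode("speech16", "narration.wav", "narration_16k.mp3")`, with hypothetical file names.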
Should these low bitrate settings be added to the 'List of recommended LAME settings' thread once agreed upon?
There's really nothing below 80 kbps in that list.
BTW, could these tweaks be integrated into the (alt-)presets, to make the statement '--alt-preset (cbr) xx gives the best quality at bitrate xx' true for all bitrates, including the low end?
Note that I've only tested mono speech here.. It could be that music needs a lower lowpass in order to sound even half decent.
And I hope some other people test these as well, in order to verify or improve on my findings.
takehiro
Oct 4 2002, 11:54
Just FYI: LAME versions before 3.93 have a bug in pre-echo prevention in mono mode.
I recommend using the latest LAME if you want to use mono.
PS. I think the last remaining problem in 3.93 is --preset fast standard.
QUOTE(takehiro @ Oct 4 2002 - 08:54 PM)
just FYI: LAME before 3.93 has a bug on preecho-prevention when mono mode.
I recommend you to use the latest LAME, if you want to use mono.
Hmm, yeah, Lame 3.93a is marginally better here with very sharp syllables. Overall, considering the speech quality, the improvement is very minor.
Of course, with music and at somewhat higher quality settings, this is a more important issue.
QUOTE(takehiro @ Oct 4 2002 - 10:54 AM)
PS. I think the last problem on the 3.93 is --preset fast standard.
Sorry for the off topic post, but if the rest of the developers are waiting for me to fix the fast presets in 3.93 before releasing, I suggest you just go ahead and release now. What I'd prefer would happen is that the fast settings are just disabled in 3.93 with a notice that if people want to use them, that they should use 3.92 instead. Right now, I'm just so busy.. I don't really have time to work on LAME at the moment.
david_rones
Oct 4 2002, 15:58
QUOTE
These are my recommendations so far for:
24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z
16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
Thank you, we're getting this into our code today. Now, could I hit you up for the best speech settings at 32, 40, and 48 kbps (all 22050 Hz and mono)? I promise we'll be done, and both my boss and I will be very grateful!
B)
DerEber
Oct 25 2002, 11:53
One thing I do to get speech files as small as possible is to apply a noise gate. Depending on how aggressively you use it, you can get files a lot smaller by setting all the pauses between words to zero.
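A crude frame-based version of such a gate, sketched in Python under the assumption of mono float samples in [-1, 1]; the threshold and frame length are illustrative, and a real gate adds attack/hold/release times so word onsets are not chopped off. Note that the size saving shows up with VBR/ABR: with CBR (as the Flash requirement here forces) the size is fixed by the bitrate, so the gate mainly removes audible hiss in the pauses.

```python
import math


def noise_gate(samples, sample_rate, threshold_db=-45.0, frame_ms=20):
    """Zero every frame whose RMS level is below threshold_db (dBFS)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        out.extend(frame if rms >= threshold else [0.0] * len(frame))
    return out
```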
david_rones
Nov 5 2002, 09:40
So I have a client who is hell-bent on having their speech encoded with LAME at 8 kbps. Again, since we are going to Flash, it needs to be 8 kbps, 11 kHz, mono.
JohnV or anyone else, would love for you to experiment with the best settings at these parameters. I know it won't be pretty, but I'm sure on this board we'll be able to find the best possible setting.
Thanks!
David
Well, to tell you the truth, 8kbps is almost hopeless..
Simple line after some testing:
-b 8 -a --resample 11 --lowpass 4.0
I really couldn't get it noticeably better than that with any switch tweaking... maybe somebody else can try?
Artemis3
Nov 9 2002, 12:22
8 kbps MP3 is not realistically possible; no matter what you do, it sounds too ugly. The best I could do was 16 kbps at 8 kHz (allowing the full 4 kHz audio bandwidth).
Maybe you could try --lowpass 3 or lower, and still keep the sample rate at 11 kHz.
Vorbis at least handles it better, but Speex is the right tool for this. Speex will have to be supported in hardware in the near future for this kind of use (speech recording/mini tape replacement).
pantheranddawg
Nov 9 2002, 15:05
JohnV,
Just wanted to add my thanks for your work in testing these low bitrates for speech. I'll definitely use the 16kb/s line for audiobooks, lectures, etc. that I had previously been encoding with lame at 48kb/s. I think Speex at ~9-10 kb/s compares favorably with your setting and I may go that route eventually, but for current hardware compatibility, this is excellent.
QUOTE(pantheranddawg @ Nov 9 2002 - 11:05 PM)
JohnV,
Just wanted to add my thanks for your work in testing these low bitrates for speech. I'll definitely use the 16kb/s line for audiobooks, lectures, etc. that I had previously been encoding with lame at 48kb/s. I think Speex at ~9-10 kb/s compares favorably with your setting and I may go that route eventually, but for current hardware compatibility, this is excellent.
Well, you will get even better results if you use those settings with --abr instead of CBR coding. CBR was only used because David needed it for the Flash implementation.

24kbps speech:
--alt-preset 24 -a --resample 22 --lowpass 7 -Z
16kbps speech:
--abr 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
Also the result should be a bit better, if you use Takehiro's 3.94a:
http://static.hydrogenaudio.org/extra/LAME...-394-alpha2.zip
edit: fixed link to 3.94a and added abr-lines
Besides, I asked the developers of hoeren.zeit.de: they are using a Fraunhofer codec with 24 kbps, 16 kHz (stream) / 96 kbps, 96 kHz (download), both mono.
I'm no pro at all, but the following command line also worked well:
-b 24 -m m -h --abr 24 -B 64 --resample 16 --lowpass 12 -a --nspsytune --highpass 0.06 --highpass-width 0.1 --athtype 2
But I still like JohnV's command line better (smaller file size, slightly lower quality):
--abr 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
Some questions:
- do I need -m m when I'm using -a ?
- Do most portable MP3-Player accept files with 16kbps ABR and 11Khz (or similar...) - or must I stick to mpeg1 layer3 to stay compatible?
- What settings would you recommend for the strongest MPEG-1 Layer III compression (I guess 32 kHz, 32 kbps)?
- Is there a great difference between LAME and a low-bitrate-optimized encoder?
QUOTE
24kbps speech:
--alt-preset 24 -a --resample 22 --lowpass 7 -Z
16kbps speech:
--abr 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
- Why are you using '--alt-preset' for 24 and '--abr' for 16kbps ?
Thanks, .lu
QUOTE(.lu @ Jul 14 2003, 02:11 AM)
Some questions:
OK, it's been some time since I took part in this discussion, but I'll try to answer.
QUOTE
- do I need -m m when I'm using -a ?
No, you can use either -m m or -a, but -a is obviously a bit shorter.
QUOTE
- Do most portable MP3-Player accept files with 16kbps ABR and 11Khz (or similar...) - or must I stick to mpeg1 layer3 to stay compatible?
Hmm.. I'm afraid to say anything certain to this. I'd guess that most portables support this.
QUOTE
- What settings would you recomend for the strongest mpeg1, Layer III compression (I guess 32Khz, 32kbps)?
I haven't tested 32kbps at all.. so can't say.
QUOTE
- Is there a great difference between Lame an a low-bitrate-optimized Encoder?
Well, lame is not considered especially good at low bitrates. FhG encoders may do better.
QUOTE
QUOTE
24kbps speech:
--alt-preset 24 -a --resample 22 --lowpass 7 -Z
16kbps speech:
--abr 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
- Why are you using '--alt-preset' for 24 and '--abr' for 16kbps ?
IIRC I liked GPsycho better at 16kbps. --alt-preset uses the NSPsytune model.
QUOTE
Ok, it's some time since I attended this discussion
I have to admit, I chose a really old one ...
QUOTE
portable MP3-Player
Finally, I've found a list of MP3 players with some basic technical information (playable bitrates, VBR):
Reinhard Hofmann: Portable MP3 Players (engl. | ger.)
This might be of a little help, but if anybody has more experience, I would be grateful. Is mono a problem? Do MP3 files get larger if I use joint stereo but the source *.wav is mono?
QUOTE
Well, lame is not considered especially good at low bitrates. FhG encoders may do better.
I've heard about that, but is it a great difference? Are there different FhG encoders? I've read about Fastencc (not the Radium hack). Would you suggest that one? How much does it cost?
Sure, it is exciting to push speech compression to its limit. Nevertheless, this is quite theoretical in my case: for high compression I have to use a stream-compatible MP3 file (so no VBR/ABR is possible). Currently I'm using LAME 3.90.3 with the command line '-b 24 -q1 -c -a --resample 16 --lowpass 8 --nspsytune'. This is doing all right, but maybe it can still be improved?!
Now I want to make a second MP3 file that sounds much better (none of the 24 kbps / 16 kHz files really sounds wonderful B) ) but is still acceptable for normal modem/ISDN users (with 56K it should not take (much) longer to download than its playtime). And they should easily be able to listen to the files with a portable player, or burn them as an audio CD...
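The "download no longer than the playtime" condition reduces to a ratio: download time divided by playtime equals file bitrate divided by effective link throughput. A Python sketch; the throughput figures are assumptions (real 56K modem downloads usually run well below 56 kbps, so ~44 kbps is assumed here; one ISDN B channel is 64 kbps), and protocol overhead is ignored.

```python
# Effective throughputs (kbps); the modem figure is an assumption, real lines vary.
LINKS_KBPS = {"56K modem (assumed ~44 kbps)": 44.0, "ISDN, one B channel": 64.0}


def download_vs_playtime(bitrate_kbps: float, link_kbps: float) -> float:
    """Download time divided by playtime; <= 1.0 means it keeps up with playback."""
    return bitrate_kbps / link_kbps


for name, link in LINKS_KBPS.items():
    for kbps in (24, 32, 56, 64):
        ratio = download_vs_playtime(kbps, link)
        print(f"{name}: a {kbps} kbps file takes {ratio:.2f}x its playtime to download")
```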
Currently, I'm playing around with some 64 or 56 kbps ABR files... And that's most important to me: what command line would you (all out there) prefer? What about
-b 32 -B 160 -a --abr 64 -F --resample 32 --lowpass 16 -h -c ?
(this is still MPEG-1).
OK, that's quite a lot for now.
thanx, .lu
(voice testfiles @ www.tnt.uni-hannover.de/project/mpeg/audio/sqam/ )
getID3()
Jul 14 2003, 13:52
QUOTE(.lu @ Jul 14 2003, 10:36 AM)
Is mono a problem? Do Mp3-Files get larger if I use JointStereo, but the source *.wav is mono?
For CBR/ABR the bitrate is pretty much fixed, so size won't change, but any kind of stereo, joint or not, takes more space to describe than mono, even if both channels are identical, so for VBR size will increase for fixed quality, and for CBR quality will decrease for fixed filesize. Although testing with LAME seems to indicate it's smarter than that: even if you use -m j on a mono source file, it outputs a mono MP3. Other MP3 encoders may allow you to create joint-stereo output from a mono input.
h.tuehn
Aug 11 2003, 20:25
This topic often gets referenced (or at least Dibrom references it) as the definitive statement on voice-only encoding. It seems, however, entirely focused on the lowest possible kbps settings and the best settings given the bandwidth limitations of dial-up modems.
So, I'm wondering about more ideal/transparent settings:
1. What are the best settings if you don't have to be concerned about modem speeds? What do you want to encode with if you simply want to stick a 4 CD audiobook onto 1 CD for your MP3 player and want near transparency? For the purposes of this topic, the content is strictly vocal and mono, though it may be a singing voice at times (such as an opera singer reading Hamlet). For this, imagine it's your favorite singer reading your favorite book and there is occasionally singing, whispering, sighs, etc. and backup voices for the main characters.
2. There's always talk of resampling in previous discussions as opposed to leaving it at the original sample rate and lowpassing. With a CD source, is the quality better, for example, at resample 22.05 lowpass 11 than 44.1 and 11? What's the point of the resampling if you can just lowpass (since neither size and encoding speed improve and some decoders may even have more difficulty with non 44.1 rates)?
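Part of question 2 is plain arithmetic: the output sample rate caps the audio bandwidth at half that rate (the Nyquist frequency), so a lowpass at or above it does nothing beyond what resampling already does. A Python sketch over resample/lowpass pairs taken from this thread; reading the thread's "22" and "11" as the 22.05 and 11.025 kHz MPEG rates is an assumption, and the function name is illustrative. Whether LAME sounds better at 44.1 kHz with a lowpass than resampled is a listening question this does not settle.

```python
def lowpass_is_effective(resample_khz: float, lowpass_khz: float) -> bool:
    """A lowpass only does anything if it sits below the Nyquist frequency."""
    return lowpass_khz < resample_khz / 2


# (output rate, lowpass) pairs from command lines in this thread.
for rate, lowpass in [(22.05, 7), (11.025, 5), (16, 12), (24, 12), (22.05, 11)]:
    state = "effective" if lowpass_is_effective(rate, lowpass) else "redundant"
    print(f"--resample {rate} --lowpass {lowpass}: "
          f"Nyquist {rate / 2:g} kHz, lowpass {state}")
```

For example, `--resample 16 --lowpass 12` cannot keep anything above 8 kHz, so the lowpass there has no effect.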
hi h.tuehn,
perhaps my following test can partly help you:
I encoded a voice file from SQAM and a self-made one in two ways:
1. 32 kHz, 80 kbps, CBR, mono with FhG's MP3Enc (3.1 Demo, download 218 KB)
2. 32 kHz, 64 kbps, CBR, mono with LAME
With LAME I decoded the MP3s back to WAV and burned them to an audio CD (Nero). Four people (non-experts) tried to identify the more heavily compressed one (using excellent speakers). They really struggled; sometimes they 'guessed' right.
To sum up: if you want to compress 4 CDs of WAVs (with voice, whispering and singing) onto 1 CD of MP3s, I think you are far above the point where file size becomes a problem. I guess you would do well with the standard presets you also use for normal music. What do the others think (as I'm only guessing)? .lu
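A quick capacity check supports that: divide the disc capacity by the playtime to get the highest average bitrate that fits. The Python sketch below assumes four completely full 74-minute audio CDs and a 700 MiB CD-R, and ignores filesystem overhead; the function names are illustrative.

```python
def max_average_kbps(capacity_mib: float, minutes: float) -> float:
    """Highest average bitrate (1 kbps = 1000 bit/s) that fits the playtime."""
    return capacity_mib * 2**20 * 8 / (minutes * 60) / 1000


def mp3_size_mib(average_kbps: float, minutes: float) -> float:
    """Size of an MP3 with the given average bitrate and playtime."""
    return average_kbps * 1000 * minutes * 60 / 8 / 2**20


PLAYTIME_MIN = 4 * 74  # assumption: four completely full 74-minute audio CDs
print(f"Max average bitrate for 700 MiB: {max_average_kbps(700, PLAYTIME_MIN):.0f} kbps")
print(f"Size at 64 kbps: {mp3_size_mib(64, PLAYTIME_MIN):.1f} MiB")
```

Roughly 330 kbps would still fit, so even music-grade presets leave plenty of room, and a 64 kbps speech encode of the same material takes about 136 MiB.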
Old Nick
Aug 28 2003, 14:06
QUOTE
What's the point of the resampling if you can just lowpass (since neither size and encoding speed improve and some decoders may even have more difficulty with non 44.1 rates)?
I don't know. People keep saying that Lame is optimized for 44.1 kHz and that you shouldn't use other sampling rates, so it seems logical to just lowpass instead of resampling.
I've tried encoding the "Lord of the Rings" audiobook using this command line:
--alt-preset standard -a --lowpass 10 -b 32
The average bitrate is around 60-70 kbps and it sounds great, so I'll be using this command line with all my audiobooks from now on.
QUOTE
QUOTE
- Do most portable MP3-Player accept files with 16kbps ABR and 11Khz (or similar...) - or must I stick to mpeg1 layer3 to stay compatible?
Hmm.. I'm afraid to say anything certain to this. I'd guess that most portables support this.
I tried a lot: Nearly all current MP3-Players DO play 16kbps ABR and 11Khz (or similar...) mp3-files.
PS:
Still looking for best lame settings @ 80 kbps...
QUOTE(JohnV @ Nov 9 2002, 10:43 PM)
--alt-preset 24 -a --resample 22 --lowpass 7 -Z
Hi JohnV
I tried this line out and it didn't work???
DavidHart
Dec 16 2003, 19:48
Sounds silly but has anybody tried:
--alt-preset voice
I did and it didn't sound too bad but I'm a newbie
DavidHart:
--alt-preset voice
is equal to
--resample 24 --lowpass 12 --noshort yes -mm -b56
(Source: lame.exe --preset longhelp)
It works all right; most of the command lines discussed here were for less than the 56 kbps used by --alt-preset voice.
DavidHart
Dec 18 2003, 11:47
.lu,
Entering
QUOTE
C:\WINDOWS\lame>lame.exe --preset longhelp
gave me:
QUOTE
LAME version 3.90.3 MMX (http://www.mp3dev.org/)
Error: You did not enter a valid profile and/or options with --preset
Available profiles are:
<fast> standard
<fast> extreme
insane
<cbr> (ABR Mode) - The ABR Mode is implied. To use it,
simply specify a bitrate. For example:
"--preset 185" activates this
preset and uses 185 as an average kbps.
Some examples:
or "C:\WINDOWS\LAME\LAME.EXE --preset fast standard <input file> <output file>"
or "C:\WINDOWS\LAME\LAME.EXE --preset cbr 192 <input file> <output file>"
or "C:\WINDOWS\LAME\LAME.EXE --preset 172 <input file> <output file>"
or "C:\WINDOWS\LAME\LAME.EXE --preset extreme <input file> <output file>"
For further information try: "C:\WINDOWS\LAME\LAME.EXE --preset help"
C:\WINDOWS\lame>
what am I doing wrong or is it simply the version of lame that I'm using?