Lame Settings For Speech?
david_rones
post Sep 1 2002, 17:39
Post #1





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



Hi.

I need some help with a project that converts PowerPoint narrations from .wav to .mp3. Basically, I'd like to know the best settings to use in LAME for speech.

I'd use something like Speex, but ultimately I need to stream the files via Flash as .swf files, and .swf only supports .mp3. Also, .swf files do not support joint stereo or VBR. I will be targeting 24kbps/22050Hz/mono, since many users will be listening to the .swf files streamed over dial-up modems.

So given those parameters, I have two questions. First, what PCM setting is the best to start with when we do the narration recordings on the PC? I don't want to capture at too high a quality, because disk space may be at somewhat of a premium, especially since captured streams can run up to an hour in length. So the smallest I can get away with is preferred.

Second, what are the best LAME switches to use, given that this will only be speech/voice-based audio (and also considering the limitations of .swf I stated above)? Again, I'm thinking I want to encode to 24kbps/22050Hz/mono, but I really don't know what to set the other options to.

Thanks so much!

David
rjamorim
post Sep 1 2002, 18:00
Post #2


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



If you are going to encode to 22050 Hz, I suggest you use some FhG encoder. Lame is only optimized for 44100 Hz encodings.


--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
david_rones
post Sep 1 2002, 18:08
Post #3





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



QUOTE (rjamorim @ Sep 1 2002 - 09:00 AM)
I suggest you use some FhG encoder

Yes, but this will be an app that we distribute, and we would like to stay open-source to avoid license fees.
JohnV
post Sep 1 2002, 19:36
Post #4





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements? Speed requirements?

24kbps mp3 will be MPEG-2 layer III.

Well, there really isn't that much to tweak given the restrictions: cbr/mono/24kbps.

Basic switch would be something like:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7

I'll try testing a few more settings quickly...

With --lowpass 8 you pretty much need --nspsytune, otherwise it starts to sound too distorted to me.

So, maybe:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7 --athtype 2
Another option I found pretty good is:
lame -a -h -b 24 --resample 22 --lowpass 7 -X 1 --athtype 2

The nspsytune line is a bit more muffled. The second line (default gpsycho) is maybe a bit clearer but has some higher-frequency swishing. I guess it's a matter of taste.

Using "higher quality" quantization noise shaping (-q0 or -q1) just made it worse, and -q0 in particular is known to be broken in Lame 3.90-3.92. Both the nspsytune and gpsycho lines use the -X1 quantization noise measurement method (nspsytune uses it by default), which gives better results than gpsycho's default (-X0).

I guess it also depends on the source. I used 44khz/stereo speech.

[noticed that you couldn't use speex, so edited that away]


--------------------
Juha Laaksonheimo
rjamorim
post Sep 1 2002, 19:59
Post #5


Rarewares admin


Group: Members
Posts: 7515
Joined: 30-September 01
From: Brazil
Member No.: 81



QUOTE (JohnV @ Sep 1 2002 - 03:36 PM)
stereo/mono? portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements?

Check out also speex. It's open source speech codec.
http://speex.sourceforge.net/
Check out Windows binaries from rarewares http://www.inf.ufpr.br/~rja00/ (of course down atm).
Winamp alpha speex plugin:
http://www.saunalahti.fi/~cse/Speex/in_speex.zip
http://www.saunalahti.fi/~cse/Speex/in_speex_src.zip

You can get speex binaries temporarily from here:

http://audio.ciara.us/rarewares/speexbundle.zip

Let's hope I stop being a lazy guy and finish setting up the mirror.


david_rones
post Sep 1 2002, 22:00
Post #6





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



John,

Thanks for your detailed response. And you are right, .swf supports only MPEG-2 layer III.

I wonder how much of a hit the sound will take capturing the original speech at a lower quality, say 22khz, 16bit, mono? I have to consider the user's available disk space on their local machine for this project.

What are your thoughts about the right capture quality setting? I'm attempting to balance original PCM file size with quality of source before encoding.
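To put rough numbers on the disk-space side of that trade-off, here is a small sketch that computes the raw PCM size of a one-hour recording (the upper bound mentioned in the first post). The 44100 Hz rows are illustrative comparison points, not formats anyone in the thread proposed, and the 44-byte WAV header is ignored.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of raw (uncompressed) PCM audio, excluding the WAV header."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

ONE_HOUR = 60 * 60  # seconds

# Candidate capture formats. Only 22050 Hz / 16-bit / mono comes from the
# thread; the 44100 Hz rows are assumptions added for comparison.
formats = [
    ("44100 Hz, 16-bit, stereo", 44100, 16, 2),
    ("44100 Hz, 16-bit, mono", 44100, 16, 1),
    ("22050 Hz, 16-bit, mono", 22050, 16, 1),
]

for label, rate, bits, channels in formats:
    size = pcm_bytes(rate, bits, channels, ONE_HOUR)
    print(f"{label}: {size / 1_000_000:.1f} MB per hour")
```

Halving the sample rate or dropping to mono each halves the file size, so 22050 Hz mono needs a quarter of the space of 44100 Hz stereo.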

One other thing to add: there tends to be a lot of background "hiss" in these microphone recordings. Sorry that this newbie doesn't know the technical name for this. :) Any setting in particular that will help with that?

David
JohnV
post Sep 1 2002, 23:44
Post #7





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Hmm, I don't believe there's much quality loss if you capture with 22khz/16bit/mono and encode.
Can't say for sure though. If you put a few short sample .wavs online, I could do some testing.

It could also help to tweak the settings, especially if there is a lot of background noise and you are not going to run any noise removal with 3rd party software first. Some settings will definitely sound better than others with lots of background noise...


Delirium
post Sep 2 2002, 03:57
Post #8





Group: Members
Posts: 300
Joined: 3-January 02
From: Claremont, CA, USA
Member No.: 891



"Noise removal" algorithms tend to be fairly complex, but if you do some searching you may find some open-source implementations (if something like that isn't too CPU-intensive to use in your app). Most are essentially glorified dynamic bandpass filters -- they subdivide the spectrum into small frequency ranges, look for the ones which "look like hiss" (I'm not entirely sure how this is recognized; perhaps too constant of a sound) and then filter out that region of the spectrum. If you're only using one computer/mic combo for the encoding, you can simplify this process by recording some silence and spectrum-analyzing it to find out where the hiss is concentrated and then just filter out that frequency range. If you need to work on arbitrary computer/mic setups, you'll have to do the more in-depth dynamic analysis though to figure out at runtime where the hiss is located.
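As a rough illustration of the "record some silence and spectrum-analyze it" step, the sketch below sums spectral energy into fixed-width bands with a naive DFT and reports the band that holds the most energy. Everything here is an assumption made for the example: the function names, the 500 Hz band width, and the synthetic "silence" signal, which uses a 3000 Hz tone as a stand-in for hiss (real hiss is broadband). A real implementation would use an FFT library and average over many frames.

```python
import cmath
import math

def band_energies(samples, sample_rate, band_hz):
    """Sum spectral energy into bands of width band_hz using a naive DFT."""
    n = len(samples)
    n_bands = (sample_rate // 2) // band_hz + 1
    energies = [0.0] * n_bands
    for k in range(1, n // 2):  # skip DC, stay below Nyquist
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        band = (k * sample_rate) // (n * band_hz)  # bin frequency -> band index
        energies[band] += abs(coeff) ** 2
    return energies

def noisiest_band(samples, sample_rate, band_hz):
    """Return (low_hz, high_hz) of the band holding the most energy."""
    energies = band_energies(samples, sample_rate, band_hz)
    band = max(range(len(energies)), key=energies.__getitem__)
    return band * band_hz, (band + 1) * band_hz

SAMPLE_RATE = 22050
N = 441  # 20 ms of audio; DFT bins are 50 Hz apart

# Synthetic recording of "silence": a strong component at 3000 Hz (stand-in
# for the hiss) plus a much weaker one at 500 Hz.
silence = [
    0.2 * math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE)
    + 0.02 * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
    for i in range(N)
]

low, high = noisiest_band(silence, SAMPLE_RATE, 500)
print(f"Most noise energy lies between {low} and {high} Hz")
```

The band it reports would then be the candidate range to filter out for that particular computer/mic combination.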
kennedyb4
post Sep 2 2002, 04:23
Post #9





Group: Members
Posts: 715
Joined: 3-October 01
Member No.: 180



That's a great idea. You could also try pre-processing the wavs with the more complex routines of Cool Edit or another good noise reduction program.

There is absolutely nothing to waste at 24kbps. :(
Gabriel
post Sep 2 2002, 09:03
Post #10


LAME developer


Group: Developer
Posts: 2950
Joined: 1-October 01
From: Nanterre, France
Member No.: 138



--alt-preset 24 -m m
JohnV
post Sep 2 2002, 11:51
Post #11





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



QUOTE (Gabriel @ Sep 2 2002 - 11:03 AM)
--alt-preset 24 -m m

Hmm, in order to comply with the requirements it would have to be 22khz/24kbps cbr.
This sounds pretty decent:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

It's otherwise exactly the same as the nspsytune line I mentioned before, but it adds --ns-bass -3. That increases the quality, so the above line is better than my first nspsytune suggestion. Imo the alt-preset 24's default lowpass (4khz) sounds pretty muffled, even for speech. The -m m and -a switches do the same thing here (downmix to mono).


PatchWorKs
post Sep 2 2002, 13:35
Post #12





Group: Members
Posts: 497
Joined: 2-October 01
Member No.: 168



My Vorbis files @ 22 kHz mono sound better if I preprocess them with Soundprobe: DC offset, resample [22, mono], expander, normalization... try it yourself!
david_rones
post Oct 1 2002, 23:15
Post #13





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



Hi again.

This was really great advice, and as such, we are using the following settings as our "main" settings for our application:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

I'm hoping you can help me a little more.

We record the .wav files at 16bit 22050Hz mono. We never have control over the microphone or the environment, as any user can use the software.

We want to be able to offer an even lower quality/lower bandwidth setting as well, for those users who view the presentations on very low bandwidth connections. Our options given the limitation of the Flash MP3 format are:



So I have two questions:

1) Which bit rate/frequency can we go to that will give us the best reduction in size vs. the main setting above, while still producing a relatively decent sounding result? (Based on the chart, it looks like 16kbps/11025Hz/mono is our best choice, with a 33% reduction in required bandwidth. The next step down (8/11/mono) sounds really bad, unless anyone has some good settings to try.)
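The 33% figure follows directly from the two CBR bitrates. A quick sketch of the arithmetic (the helper name is made up for this example):

```python
def reduction_percent(main_kbps, lower_kbps):
    """Bandwidth saved by dropping from main_kbps to lower_kbps, in percent."""
    return 100.0 * (main_kbps - lower_kbps) / main_kbps

MAIN_KBPS = 24  # the "main" setting from this thread

for kbps in (16, 8):
    saved = reduction_percent(MAIN_KBPS, kbps)
    print(f"{kbps} kbps: {saved:.1f}% less bandwidth than {MAIN_KBPS} kbps")
```

Because these are CBR streams, the same percentages apply to file size for a recording of a given length.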

2) And what would be the best corresponding settings to use?

If you could help with this, I'd really be very grateful. :)

I'm happy to email you a sample .wav file if you like. (But it is just speech, so if you want to just use your own...either way) :)

Thanks again!

David
JohnV
post Oct 3 2002, 19:40
Post #14





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Ok, 8kbps is too low for Lame..

This is a line which was the best with my speech samples:
--alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z

Yes, some people might wonder why I use -Z (here, noise shaping type 1). The bitrate is so low that what is logical at somewhat higher bitrates does not apply at extremely low bitrates. Using -Z made my female speech sample in particular sound better.


david_rones
post Oct 3 2002, 20:09
Post #15





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



> --alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z

Awesome! Thanks so much!

David
JohnV
post Oct 3 2002, 22:12
Post #16





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Well, gotta say that I like this even better:
-b 16 -a --resample 11 --lowpass 5 --athtype 2

It's considerably less noisy, but a bit more metallic. So that's my best line so far.. :P


david_rones
post Oct 4 2002, 04:41
Post #17





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



Ok. Now I'm gonna change the requirement completely. Now we want to add a higher quality setting above our main setting. So to review, our main setting is at:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

And we capture the .wav at 16bit 22050Hz mono. So if we looked at 32, 40, and 48kbps (based on the chart above of supported .swf mp3 formats), where do we get the most bang for our bandwidth buck, and what settings work best there?

Thanks as always!

David
JohnV
post Oct 4 2002, 13:43
Post #18





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Heh, I just noticed that you can still increase the 24kbps quality quite nicely if you add -Z to the 24kbps line.
Until this thread I had never actually tested anything at this low a bitrate, so it's surprising to notice how some features behave in totally the opposite way to what one might think.

These are my recommendations so far for:

24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z

16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
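For an application that shells out to the encoder, the two recommendations could be wired up roughly as follows. This is only a sketch: the switch strings are copied verbatim from the lines above, while the function name, the file names, and the assumption that a `lame` executable of that era is on the PATH and accepts these switches are all hypothetical.

```python
import shlex

# Switch strings copied from the recommendations in this thread.
SPEECH_SWITCHES = {
    24: "--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z",
    16: "-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3",
}

def lame_command(bitrate_kbps, wav_path, mp3_path, lame_exe="lame"):
    """Build the argument list: lame [switches] input.wav output.mp3."""
    switches = shlex.split(SPEECH_SWITCHES[bitrate_kbps])
    return [lame_exe, *switches, wav_path, mp3_path]

# Hypothetical file names, for illustration only.
cmd = lame_command(24, "narration.wav", "narration.mp3")
print(" ".join(shlex.quote(part) for part in cmd))

# To actually encode, pass the list to subprocess.run(cmd, check=True).
```

Keeping the switches in one table makes it easy to add further bitrates later without touching the calling code.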


Hanky
post Oct 4 2002, 14:28
Post #19





Group: Members (Donating)
Posts: 531
Joined: 18-November 01
From: The Netherlands
Member No.: 481



Should these low bitrate settings be added to the 'List of recommended LAME settings' thread once agreed upon?
There's really nothing below 80 kbps in that list.
BTW, can these tweaks be integrated into the (alt-)presets, to make the statement '--alt-preset (CBR) xx gives the best quality at bitrate xx' true for all bitrates, including the low end?
JohnV
post Oct 4 2002, 14:58
Post #20





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



Notice that I've only tested mono speech here. It could be that music needs a lower lowpass in order to sound even half decent.
And I hope that some other people try testing these too, in order to verify or improve on my findings.


takehiro
post Oct 4 2002, 18:54
Post #21


LAME developer


Group: Developer
Posts: 74
Joined: 18-May 02
From: Japan
Member No.: 2067



Just FYI: LAME before 3.93 has a bug in pre-echo prevention in mono mode.
I recommend using the latest LAME if you want to use mono.

PS. I think the last remaining problem in 3.93 is --preset fast standard.


--------------------
May the source be with you! // Takehiro TOMINAGA
JohnV
post Oct 4 2002, 20:48
Post #22





Group: Developer
Posts: 2797
Joined: 22-September 01
Member No.: 6



QUOTE (takehiro @ Oct 4 2002 - 08:54 PM)
Just FYI: LAME before 3.93 has a bug in pre-echo prevention in mono mode.
I recommend using the latest LAME if you want to use mono.

Hmm, yeah, Lame 3.93a is marginally better here with very sharp syllables. Overall, considering the speech quality, the improvement is very minor.

Of course with music coding, and at somewhat higher quality, this is a more important issue.


Dibrom
post Oct 4 2002, 21:26
Post #23


Founder (In Absentia)


Group: Admin
Posts: 2938
Joined: 26-August 02
From: Portland, OR
Member No.: 1



QUOTE (takehiro @ Oct 4 2002 - 10:54 AM)
PS. I think the last remaining problem in 3.93 is --preset fast standard.

Sorry for the off-topic post, but if the rest of the developers are waiting for me to fix the fast presets in 3.93 before releasing, I suggest you just go ahead and release now. What I'd prefer is that the fast settings simply be disabled in 3.93, with a notice that people who want to use them should use 3.92 instead. Right now I'm just so busy that I don't really have time to work on LAME.


--------------------
I do not read PM. Please contact another admin if you need help.
david_rones
post Oct 4 2002, 22:58
Post #24





Group: Members
Posts: 8
Joined: 1-September 02
Member No.: 3261



QUOTE
These are my recommendations so far for:
24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z

16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3


Thank you. We're getting this into our code today. Now, could I hit you up for the best speech settings at 32, 40, and 48kbps (all 22050Hz and mono)? I promise we'll be done, and both my boss and I will be very grateful!
B)
DerEber
post Oct 25 2002, 18:53
Post #25





Group: Members
Posts: 12
Joined: 25-October 02
Member No.: 3624



One thing I do to get speech files as small as possible is to apply a noise gate. Depending on how aggressively you use it, you can get files a lot smaller by setting all the pauses between words to zero.
:P
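A very simple frame-based gate along those lines might look like the sketch below. The threshold, frame length, and sample values are made-up illustrations. A production gate would typically add attack/release smoothing so that word onsets and tails are not clipped.

```python
def noise_gate(samples, threshold, frame_len):
    """Zero every frame whose peak absolute amplitude is below threshold."""
    gated = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if max(abs(s) for s in frame) < threshold:
            frame = [0] * len(frame)  # treat the whole frame as a pause
        gated.extend(frame)
    return gated

# Toy 16-bit samples: a "word", a noisy pause, then another "word".
signal = [9000, -8500, 7000, -6000, 40, -35, 20, -50, 8000, -7500, 6500, -9000]
print(noise_gate(signal, threshold=200, frame_len=4))
```

Only the middle frame, whose peak stays under the threshold, is silenced. The louder frames pass through unchanged.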