Full Version: encoding the spoken word - settings?
ardea
I have a fair number of CD audio books, and also many language samples (conversations on audio CDs) for teaching purposes. Most appear to be in stereo, but with a central audio image. I want to encode these for portable player / car use. Smaller file size would be helpful. Apart from encoding to mono, can anyone suggest space-saving parameters for speech-only files?

In asking for your help I am assuming portable, clear speech may be less demanding of bits than portable music, for which I use --alt-preset medium with modified v3.90.3, as per the 'recommended' page.

I see v3.96.1 has a 'voice' parameter...?

Thanks in anticipation

esa372
QUOTE(ardea @ Jun 30 2005, 07:28 AM)
Apart from encoding to mono, can anyone suggest space saving parameters for speech only files?
I recently did a project similar to this. It took a little trial and error, but the settings I ended up with seem to be the best for me.

I started with 16-bit mono WAV files at 22050Hz and converted them to MP3 using LAME 3.96.1 with the following parameters:

-V3 --vbr-new --lowpass 8

These settings create an MP3 file with a bit-rate around 48kbps and an 8kHz low-pass filter, which seems fine for speech.

A typical 45-minute speech shrinks from ~115 MB (WAV) to ~15 MB (MP3) in about 35 seconds on my computer (P4 2.8 GHz, 1 GB RAM, Windows XP).
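
As a rough sanity check on those sizes, the arithmetic can be sketched in Python. The function names are made up for illustration, the WAV header is ignored, and 48 kbps is only the approximate average bitrate reported above, so the MP3 size is an estimate.

```python
def wav_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of the raw PCM payload in a WAV file (header ignored)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


def mp3_size_bytes(avg_bitrate_kbps, seconds):
    """Approximate MP3 size from an average bitrate (1 kbps = 1000 bit/s)."""
    return avg_bitrate_kbps * 1000 * seconds // 8


duration = 45 * 60  # a 45 minute speech, in seconds
wav = wav_size_bytes(22050, 16, 1, duration)  # 16-bit mono at 22050 Hz
mp3 = mp3_size_bytes(48, duration)            # assumed ~48 kbps average

print(f"WAV: {wav / 2**20:.1f} MiB")
print(f"MP3: {mp3 / 2**20:.1f} MiB")
```

This comes out at roughly 114 MiB for the WAV and 15 MiB for the MP3, which is in line with the figures quoted above.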

Hope this helps...

~esa
a_aa
Two useful links:

http://www.hydrogenaudio.org/forums/index.php?showtopic=3270

http://www.hydrogenaudio.org/forums/index....showtopic=23282


My favourite, if you require really low bitrates, is: --alt-preset 24 -a --resample 22 --lowpass 7

These should give you more than enough options.
Sunhillow
As many portable mp3 players do not support sample rates below 32 kHz (mine does not even support 32 kHz), you should encode some snippets with different sample rates and test them.
I got quite good results using 3.96.1 and 3.97 (alpha!) with -V7 --resample 44100.
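
One way to set up such a test is to generate one command line per sample rate and encode the same short snippet with each. The Python sketch below only builds the argument lists and does not run LAME. The file name snippet.wav, the function name and the list of rates are placeholders, and writing the --resample values in kHz is an assumption based on LAME's usage text (the post above passes 44100, so check what your build accepts).

```python
from pathlib import Path


def build_test_commands(wav_path, rates_khz, quality="-V7"):
    """Return one LAME argument list per sample rate (nothing is executed)."""
    src = Path(wav_path)
    commands = []
    for rate in rates_khz:
        # Put the rate in the output name so the files are easy to tell apart.
        out = src.with_name(f"{src.stem}_{rate}kHz.mp3")
        commands.append(["lame", quality, "--resample", rate, str(src), str(out)])
    return commands


# Hypothetical snippet and candidate rates; edit these for your own player.
commands = build_test_commands("snippet.wav", ["22.05", "32", "44.1"])
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run once LAME is on the PATH. Then copy the resulting files to the player and see which ones it accepts.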
a_aa
I might be wrong, but --resample 44100 has no effect if you rip a normal audio book CD - it IS 44.1 kHz already... I also think that standard -V7 should actually DOWNsample to 32 kHz without any switches.

Furthermore, you should consider using -a (mono), as it is unlikely that there is any stereo content worth preserving on an audio book. If you downsample to 22 kHz, you should also be aware of the usefulness of a lowpass at 11 kHz or lower.
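
The reason for the lowpass advice is that a given sample rate can only carry frequencies up to half its value (the Nyquist limit). Here is a small Python sketch that checks a lowpass setting against a sample rate. The helper names are hypothetical, and the 12 kHz case is an invented counter-example rather than a setting anyone here suggested.

```python
def nyquist_khz(sample_rate_khz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_khz / 2


def lowpass_fits(sample_rate_khz, lowpass_khz):
    """True if the lowpass cutoff lies at or below the Nyquist limit."""
    return lowpass_khz <= nyquist_khz(sample_rate_khz)


# (sample rate, lowpass) pairs in kHz; the last one is a made-up bad case.
for rate, lowpass in [(22.05, 8), (22.05, 11), (22.05, 12)]:
    verdict = "ok" if lowpass_fits(rate, lowpass) else "above Nyquist"
    print(f"{rate} kHz with lowpass {lowpass} kHz: {verdict}")
```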

I really support your idea of testing the first files on the relevant hardware before making GBs of potentially unplayable files, and as you say - there are limitations here!

Sunhillow
QUOTE(a_aa @ Jun 30 2005, 08:48 PM)
I might be wrong, but --resample 44100 has no effect if you rip a normal audio book CD - it IS 44.1 kHz already... I also think that standard -V7 should actually DOWNsample to 32 kHz without any switches.

Yes, -V7 does downsample to 32 kHz, but my player only accepts 44.1 or 48 kHz. That's why I need the resample switch.
Especially when listening with headphones, I don't like mono encodings. Spatial reverberation, which needs stereo, makes the whole thing sound more natural.
a_aa
Sorry, Sunhillow, I didn't read your post properly and mixed up bitrate and sample rate. Given the limitation of your hardware and your preference for stereo, your setting seems OK to me now...
But your hardware isn't actually an MP3 player (as I'm sure you're aware), since it doesn't comply with the MP3 standard...

So, ardea, I think you should test which sample rates and bitrates your hardware supports. Then you will have to decide on quality vs file size, in other words mono vs stereo, lowpass filter/cutoff, and the chosen bitrate or quality setting. Good luck!

Edit: PS - the --voice setting is also available in a 3.90.3 compile. I think it gives mono, a 24 kHz sample rate and a 12 kHz lowpass, and results in something like 56 kbps bitrate.
ardea
Thanks for your helpful suggestions and links. I had tried a search on just about every word for 'verbals' except 'speech'!

There's plenty here to start me off in the right direction. LAME is powerful software, supported by many hours of skilled labour - mere thanks does not seem to do this justice! However, for the aging erudite tyro like me, it can appear rather esoteric. With a large selection to choose from, I guess anticipatory knob twiddling can be fun (there, I've said it), but reading too much about it can cause the innocent to miss the on switch for the oscillating grommet widget (if you see what I mean)

Thanks again!