I have been fiddling around with Lame a bit, and find that the presets for speech / spoken word work well for high-quality speech (56k) and medium-quality speech (25k). In these examples, your_podcast.wav is an uncompressed wav file consisting mainly of spoken word.
To make a high-bandwidth (56k) podcast:
lame --preset voice your_podcast.wav your_podcast.mp3
This makes an excellent-sounding mp3, which even sounds good with music in the background.
Alas, an mp3 at this bitrate cannot be listened to in real time over dialup. To work around this, we can encode it at around 25 kbps:
lame --preset sw your_podcast.wav your_podcast.mp3
"sw", I presume, stands for shortwave. This won't be quite as clear as the above 56k mp3, but will be perfectly listenable (even with music in the background), and has the advantage of being downloadable in real-time over dialup (as long as the user is able to dial in at a fast speed).
Lame also has a preset for encoding at around 16k, which has a lot of compression artifacts. I don't recommend this for anything besides spoken voice without any background music:
lame --preset phone your_podcast.wav your_podcast.mp3
Now, I have been fiddling around with Lame, and found a setting which is even more compact for just spoken word, and gives telephone-quality audio (even for audio with background music, to boot). The sound of an mp3 encoded this way reminds me of those old 1950s transistor radios. The incantation is:
lame --abr 12 -a --resample 11 --lowpass 2.5 --highpass .2 -B 16 your_podcast.wav podcast.mp3
The bitrate of something encoded with this is between 12 kbps and 13 kbps. The audio doesn't sound very good, but is still comprehensible. This is offset by the fact that an hour of audio takes less than six megabytes, and can be downloaded in real time over dialup with enough bandwidth to spare to let the user surf the web or whatnot.
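As a rough check of those figures, here is a minimal shell sketch. The function names mp3_size_bytes and download_seconds are made up for illustration (they are not part of Lame), and the 28800 bits/s link speed is only an assumed example of a dialup connection. Protocol overhead is ignored, so real downloads will be somewhat slower.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lame) for checking the figures above.

# Size in bytes of audio at a given average bitrate.
# $1 = bitrate in kbps (1 kbit = 1000 bits), $2 = duration in seconds
mp3_size_bytes() {
    echo $(( $1 * 1000 * $2 / 8 ))
}

# Seconds needed to download that audio over a link.
# $1 = bitrate in kbps, $2 = duration in seconds, $3 = link speed in bits/s
# Assumption: protocol overhead is ignored, so real downloads are slower.
download_seconds() {
    echo $(( $1 * 1000 * $2 / $3 ))
}

mp3_size_bytes 13 3600          # prints 5850000 (under six megabytes)
download_seconds 13 3600 28800  # prints 1625 (well under the 3600 s of audio)
```

At the assumed link speed, an hour of audio arrives in under half an hour, which is where the spare bandwidth comes from.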
In English, the settings mean:
--abr 12: "An average bitrate of 12kbps" (Lame actually makes a file a little over 12kbps in size; --abr is a lowball figure at low bitrates)
-a: Mono audio (this actually isn't needed; Lame knows to downmix to mono at this bitrate)
--resample 11: Lame doesn't know to downsample to a sampling rate lower than 16 kHz; this forces Lame to downsample to 11 kHz
--lowpass 2.5: A low-pass filter with a cutoff of 2.5 kHz. In other words, no audio above 2.5 kHz is let through. This muffles the sound a bit, but mostly eliminates the metallic sound of low-bitrate mp3s.
--highpass .2: A high-pass filter with a cutoff of 200 Hz. This gets rid of the lower frequencies so there are fewer frequencies to encode; the human brain knows how to reconstruct the lower frequencies.
-B 16: Do not have any frames larger than 16 kbps. This reduces the size of the file by about 10%-20%. While it makes sense to have larger frames when encoding music without any audible artifacts, it doesn't make sense when trying to make a spoken voice file as small as possible.
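To avoid retyping the incantation for every episode, it could be wrapped in a small shell function. This is only a sketch: encode_lowrate and the LAME variable are names invented here, and the output name is assumed to be the input name with .wav replaced by .mp3. The Lame options themselves are exactly the ones listed above.

```shell
#!/bin/sh
# encode_lowrate is a hypothetical wrapper name, not something Lame provides.
# Set LAME=echo to print the arguments instead of encoding (a dry run).
encode_lowrate() {
    in=$1
    out=${in%.wav}.mp3    # your_podcast.wav -> your_podcast.mp3
    ${LAME:-lame} --abr 12 -a --resample 11 --lowpass 2.5 --highpass .2 \
        -B 16 "$in" "$out"
}

# Encode every wav file named on the command line.
for f in "$@"; do
    encode_lowrate "$f"
done
```

Saved under a name of your choosing and run with a list of wav files, this would encode each one with the same settings.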
--
Lame has a bug where it cuts off the last second or so of audio; while not annoying when encoding music, it is very annoying when encoding audio at low bitrates. The amount of audio cut off depends on the bitrate of the mp3; more audio is cut off at lower bitrates. I work around this bug by adding 1.25 seconds of silence at the end of a file before encoding it; this can (with a bit of trouble) be automated:
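# Convert to headerless raw PCM (assumes a 44.1 kHz, 16-bit, stereo source)
sox file.wav file.raw
rm file.wav
# 5 x 44100 zero bytes = 1.25 seconds at 176400 bytes per second
dd bs=44100 count=5 if=/dev/zero of=silence.raw
cat silence.raw >> file.raw
rm silence.raw
# Newer SoX releases may want "-b 16 -e signed-integer" in place of "-w -s"
sox -r 44100 -c 2 -w -s file.raw file.wav
rm file.raw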
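The dd figures above only give 1.25 seconds for 44.1 kHz, 16-bit, stereo audio. For other formats the byte count changes; a minimal sketch of the arithmetic follows, where silence_bytes is a made-up helper name and the padding length is given in milliseconds to stay within integer shell arithmetic.

```shell
#!/bin/sh
# silence_bytes is a hypothetical helper, not part of sox or Lame.
# $1 = sample rate in Hz, $2 = channels, $3 = bytes per sample,
# $4 = length of silence in milliseconds
silence_bytes() {
    echo $(( $1 * $2 * $3 * $4 / 1000 ))
}

silence_bytes 44100 2 2 1250   # prints 220500, the same as bs=44100 count=5
silence_bytes 22050 1 2 1250   # prints 55125 for 22.05 kHz, 16-bit mono
```

The result can be handed to dd as a single block, for example bs=220500 count=1.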
- Sam
