Full Version: Audiobook Encoding From CD
Hydrogenaudio Forums > Lossy Audio Compression > MP3 > MP3 - Tech
Pamel
After searching around the forums for a while, I came up with some settings for encoding audiobooks. Unfortunately, when I tried them, the results were not at all what I expected, so I'm looking for help to understand them and refine my options.

The HA Wiki says that --abr is best for the lower bitrates/voice, and that "--preset voice" maps to "--abr 56 -mm".

Most of the forum posts about this are pretty old and focus either on older options, or on options to work around bugs in older versions. There are also several suggestions about hardware limitations that require specific settings. Still, they almost all seem to suggest VBR where possible. There are also various --lowpass, mono, and resample suggestions.

I am using an MP3 player, and I have yet to run into any limitation on the files it will play, so that isn't a concern to me. However, it will only play MP3, no Vorbis, Speex, etc. My audiobooks are just people talking, so they are pretty simple. As it is a single person, I'm perfectly fine with mono. The time to encode is not a concern to me, and they are coming from CD so the starting sample rate is 44.1kHz. I want the audio to be as close to the audio on the CD as possible, where distinguishing the two is difficult under normal circumstances, but with the file size as small as possible (within that constraint).

I downloaded and installed foobar2000_0.9.4.2.exe and the 3.97 LAME binary from Rarewares. CDs were ripped using fb2k straight into LAME. I used the ABX utility in fb2k to confirm what I heard easily with my ears.

I used a base LAME command line of:
-S --noreplaygain -V 3 --vbr-new --lowpass 8 -mm --resample 22.05 -q 0 - %d

This resulted in a somewhat muffled sound. The sibilance in the man's voice, such as on the letter "S", was much less 'hissy', and I noticed it right away. I also tried the --abr option listed in the wiki, but it sounded the same.

I found that the resample option and the lowpass option could each create that muffled sound on its own. If I used neither, the muffling went away. I did not try varying their values, though. I also found that after removing those options, I could not tell the difference between -V0 and -V6. I found that I could use this command line:
-S --noreplaygain -V6 --vbr-new -mm -q 0 - %d

And I can't tell the difference between that and the CD; strangely, it is even almost the same bitrate (57 kbps versus 55 kbps from my original command line). I'm looking for ways to improve this though. Unfortunately the number of permutations of the various switches is too great for me to just try them all, so I need suggestions of what should work best.
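
To put those two bitrates in perspective, here is a rough sketch (not from the thread) that converts an average bitrate into megabytes per hour of audio. The function name is made up for illustration, and the result ignores ID3 tags and other overhead, so treat it as an estimate only.

```python
def mp3_size_mb(bitrate_kbps: float, hours: float) -> float:
    """Approximate size in MB (10**6 bytes) of audio at a given average bitrate."""
    bits = bitrate_kbps * 1000 * hours * 3600  # kbit/s -> total bits
    return bits / 8 / 1_000_000                # bits -> bytes -> MB


# The two average bitrates reported above, for one hour of audio.
for kbps in (55, 57):
    print(f"{kbps} kbps: {mp3_size_mb(kbps, 1.0):.2f} MB per hour")
```

At these rates the difference between the two command lines is under 1 MB per hour, so the choice between them can be made on sound quality alone.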

Here is a sample of audio from my CD (4MB). You could see what I mean in the first 15 seconds of that.

Also, is there a way to just set a quality setting? I know with Vorbis you have the option of setting a specific quality, and it will encode to that quality no matter what the resulting bitrate.
Megaman
Look at this:

http://www.hydrogenaudio.org/forums/index.php?showtopic=5716

Personal answer from glen was:

Thank you!
I tried your settings for voice encoding in fastencc.exe:
-dm -hq -br 64000
and they give very nice results, almost perfect compared to the original.


Probably there's something better nowadays, but fastencc.exe worked pretty well 4 years ago.
AndyH-ha
I've done hundreds of hours' worth of spoken audio this way. People who have listened to the results find no fault in clarity, pleasantness, or ease of listening. One has to listen very closely to find any difference from the original audio CD.

-V 8 --vbr-new --resample 22 --lowpass 11 --noreplaygain
2Bdecided
-m m -V 2 --vbr-new

should already give comparatively low bitrates on speech.

Just increase the V value (decrease the quality) as you want.

Throw in a --resample if you want to lower the quality still further, but note that lower quality -V settings resample anyway.

AndyH-ha - I think the --lowpass in your command line is redundant. Lame already cuts well below nyquist at V2, never mind at V8!!!!

Cheers,
David.
Pamel
After a little more testing, I managed to narrow down where I can tell the difference. I started with this as the base:
-V6 --vbr-new -mm -q 0 --noreplaygain

As mentioned before, I can't tell the difference between this and the CD. Then I adjusted only the V setting to 7, 8, and 9. Making sure with ABX testing, 7 was indistinguishable for me, and 8 was distinguishable 100%, but it was a pretty small difference. 9 was an obviously different sound. According to this chart, which is for LAME 3.95.1 but I assume still holds for 3.97, these are the auto adjustments:
CODE

Switch            target (kbps)  lowpass (Hz)  resample (Hz)
-V 6 --vbr-new         115          16000          -
-V 7 --vbr-new         100          14900        32000
-V 8 --vbr-new          85          12500        32000
-V 9 --vbr-new          65          10000        24000


So I can't tell the difference if the audio is --resampled to 32kHz, but as I mentioned in my first post I can tell easily at 22.05kHz. I also can't tell a --lowpass 14900, but I can at 12500.

To test a little further, I compared standard -V8 to -V8 --lowpass 10 and the difference was very pronounced to me. So it appears that my threshold where I can't tell a difference on this sample is about --resample 32 --lowpass 15.
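
For anyone checking ABX results like the ones above, a minimal sketch of the usual significance check follows: the chance of scoring at least that well by pure guessing, assuming each trial is an independent 50/50 guess. The trial counts are hypothetical, since the number of trials run in these tests is not given, and the function name is illustrative only.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` right by guessing."""
    # Sum the upper tail of a binomial distribution with p = 0.5 per trial.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical sessions of 10 trials each (counts are assumptions, not thread data).
print(abx_p_value(10, 10))  # 10 of 10 correct
print(abx_p_value(7, 10))   # 7 of 10 correct
```

A perfect 10 of 10 works out to roughly a 0.1% chance of being luck, while 7 of 10 is about 17%, which is why a "100% distinguishable" result only means much once enough trials have been run.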

I guess that leaves three questions:

1. Was there a problem with my methodologies or conclusions?
2. Were there some other settings that I should have tried?
3. Why does the HA Wiki suggest --abr for speech if not a single person in the forums suggests it?
AndyH-ha
Actually, I produce exactly what I want in the WAV file before I give it over to encoding: 22050Hz mono, which of course has the Nyquist limit of 11025Hz. I use the switches to prevent LAME from making any changes. Results might be comparable if I just let LAME have at it, but I know what I want, and I like the results.

Doing this, when I decode the mp3, I can see that the higher frequencies (i.e. 11kHz) have not been reduced significantly.
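
The relationship between sample rate, Nyquist limit, and lowpass cutoff discussed here can be sketched as below. This is only an illustration: the helper names are made up, the values are the ones quoted in this thread, and "--resample 22" is assumed to mean 22050 Hz, matching the WAV files described above.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


def lowpass_is_effective(lowpass_hz: float, sample_rate_hz: float) -> bool:
    """True if the cutoff sits below the Nyquist limit, so it removes content."""
    return lowpass_hz < nyquist_hz(sample_rate_hz)


# (lowpass Hz, sample rate Hz) pairs taken from settings quoted in the thread.
settings = {
    "--lowpass 8 --resample 22.05": (8000, 22050),
    "--lowpass 11 --resample 22": (11000, 22050),  # assumes 22 means 22050 Hz
    "-V 9 --vbr-new (chart values)": (10000, 24000),
}

for name, (lowpass, rate) in settings.items():
    print(f"{name}: Nyquist {nyquist_hz(rate):.0f} Hz, lowpass {lowpass} Hz, "
          f"cuts below Nyquist: {lowpass_is_effective(lowpass, rate)}")
```

With a cutoff of 11000 Hz against a Nyquist limit of 11025 Hz, almost nothing is removed beyond what the 22050 Hz sample rate already excludes, whereas --lowpass 8 removes a further 3 kHz or so.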
2Bdecided
If you don't want lame to do the filtering that it believes is beneficial, you should/could use -k.

Cheers,
David.