Best tweaks for encoding speech with Vorbis |
Oct 20 2008, 17:26
Post
#1
|
|
|
Group: Members Posts: 2 Joined: 14-April 08 Member No.: 52781 |
The title says it already: I want to encode a lot of spoken messages (mainly just human speech, mono) with Vorbis. I want to go as low as possible in bitrate, but still not suffer too much from the artifacts of lossy compression.
My question is: has anybody found a relatively optimal set of oggenc tweaks that gives nice-sounding audio at a low bitrate without obvious lossy artifacts? In the past I have typically used 48 kbps compression, something like this:

CODE
oggenc -o out.ogg --bitrate=48 --downmix src.wav

It yields sound that is better than MP3 (in my opinion; I may be wrong, since there are so many more MP3 encoders now), but the more I listen, the more I notice a strange "echo" here and there, especially on rich sounds like the American "are". The "echo" is somewhat like the "robot" voice in movies. I can upload a sample Vorbis stream to point that out (please let me know how to upload it; I am new to this forum).

I have been using oggenc version 1.0.2 as provided by Ubuntu 7.04. The original stream has a 44.1 kHz sampling rate.

I tried a simple tweak: I compiled aoTuV beta 5.5 (b5.5_20080330) and used its shared libraries in place of the stock libvorbis by invoking oggenc through this Bourne shell script:

CODE
#!/bin/sh
export LD_LIBRARY_PATH=/usr/local/aotuv-b5.5_20080330/lib
exec oggenc "$@"

Still, the artifact is there.

As another attempt, I tried reducing the sampling rate with "ssrc" before invoking oggenc. Here is what I got from oggenc-ing the resampled streams:

CODE
Encoding speech: TEST 04
Subdir: /data1/wirawan/test/vorbis/speech04
Sample: pet_30.flac
The original filename was cut from LS Peter radio message #30 (1 minute length).
                         Sample  Bitrate                     File size
Filename                 rate    Nominal  Avg     Inflation  Actual   Inflation
                         (kHz)   (kbps)   (kbps)  (%)        (bytes)  (%)
16khz/oggenc-32kbps.ogg  16      32       29.81   -6.85      226969   -41.41
16khz/oggenc-48kbps.ogg  16      48       38.84   -19.09     294674   -23.93
16khz/oggenc-64kbps.ogg  16      64       48.52   -24.18     367650   -5.1
16khz/oggenc-80kbps.ogg  16      80       61.93   -22.59     468206   20.86
22khz/oggenc-32kbps.ogg  22      32       39.03   21.96      296110   -23.56
22khz/oggenc-48kbps.ogg  22      48       59.89   24.76      452540   16.82
22khz/oggenc-64kbps.ogg  22      64       75.60   18.12      570709   47.32
22khz/oggenc-80kbps.ogg  22      80       91.83   14.79      692450   78.74
32khz/oggenc-32kbps.ogg  32      32       37.83   18.22      287232   -25.86     Very robotic
32khz/oggenc-48kbps.ogg  32      48       55.48   15.59      419620   8.32       OK, but second man's voice is not great
32khz/oggenc-64kbps.ogg  32      64       65.38   2.16       493593   27.41
32khz/oggenc-80kbps.ogg  32      80       74.31   -7.12      560914   44.79
44khz/oggenc-32kbps.ogg  44      32       37.65   17.64      285854   -26.21
44khz/oggenc-48kbps.ogg  44      48       51.18   6.64       387396   Baseline
44khz/oggenc-64kbps.ogg  44      64       63.96   -0.06      482875   24.65
44khz/oggenc-80kbps.ogg  44      80       70.87   -11.41     534853   38.06

Bitrate inflation is the percentage by which the average kbps exceeds the nominal (target) kbps.
File size inflation is measured against the "baseline" 44khz/48kbps encoding.

Interesting! At the lower sampling frequencies (22 and 32 kHz), the file is actually larger than at 44 kHz for the same nominal bitrate (at 48, 64, and 80 kbps). Now this could be a topic of its own, but my main question remains: how do I optimize the compression-versus-quality trade-off?

For your notes, this may be relevant: the original audio may not come directly from a raw source (I mean, recorded directly, or from a faithful CD-quality recording). In the case above, it actually comes from a high-quality MP3 mono stream (which I guess is 80kbps mono stream).
The Linux "file" utility yields the following information for the original file (the filename is different, but the files are of the same kind):

CODE
/data1/wirawan/test/vorbis/speech04 $ file /d/temp/ls/luk/Luke_01.mp3
/d/temp/ls/luk/Luke_01.mp3: MPEG ADTS, layer III, v1, 160 kBits, 44.1 kHz, Monaural

Any help and pointers will be appreciated. Unfortunately I don't have time to study this matter deeply, so it is best to go straight to the point and leave the deeper explanations (web pages, wiki) as a "side note".

Wirawan
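For anyone who wants to reproduce the two inflation columns of the table in the post above, here is a minimal shell sketch. The helper name `pct_change` is made up for illustration; the inputs are the rounded figures from the 16khz/oggenc-32kbps.ogg row and the 44khz/48kbps baseline size. The table's -6.85 was presumably computed from an unrounded average bitrate, so the rounded inputs give a slightly different last digit.

```shell
#!/bin/sh
# pct_change VALUE REFERENCE
# Prints the percentage by which VALUE exceeds REFERENCE (negative if smaller).
# awk is used because POSIX sh arithmetic is integer-only.
pct_change() {
    awk -v v="$1" -v r="$2" 'BEGIN { printf "%.2f\n", (v / r - 1) * 100 }'
}

# Bitrate inflation: average kbps against nominal kbps.
pct_change 29.81 32          # prints -6.84

# File size inflation: bytes against the 387396-byte baseline file.
pct_change 226969 387396     # prints -41.41
```

A negative value means the encoder undershot the target (or the file is smaller than the baseline); a positive value means it overshot.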
|
|
|
Oct 20 2008, 21:38
Post
#2
|
|
|
Group: Members Posts: 116 Joined: 28-September 04 From: Germany Member No.: 17360 |
Is there a special reason why you want to use Vorbis?
Speex http://www.speex.org/ is specifically designed for voice recordings. http://en.wikipedia.org/wiki/Speex |
|
|
|
Oct 21 2008, 02:46
Post
#3
|
|
|
Group: Members Posts: 2 Joined: 14-April 08 Member No.: 52781 |
I did try speex a little bit, but I did not find it very satisfactory. Probably I wasn't trying seriously. Another problem, as many other members have already pointed out, is that speex is not widely available on systems other than "computer". It is not yet supported on small hardware like portable audio players. I want to create Ogg files that can be played on computers and portable audio players alike.
|
|
|
|
Oct 21 2008, 05:02
Post
#4
|
|
![]() Group: Members Posts: 1593 Joined: 24-March 02 From: Revere, MA Member No.: 1607 |
QUOTE
I did try speex a little bit, but I did not find it very satisfactory. Probably I wasn't trying seriously.

Did you try ultra-wideband mode? Speex also has echo cancellation.

QUOTE
Another problem, as many other members have already pointed out, is that speex is not widely available on systems other than "computer".

It is supported by the Rockbox open-source firmware, which is used on many DAPs. Take a look at the website: http://www.rockbox.org/twiki/bin/view/Main/WhyRockbox

This post has been edited by HotshotGG: Oct 21 2008, 05:08
--------------------
College student/IT Assistant
|
|
|
|
Oct 21 2008, 22:28
Post
#5
|
|
![]() Group: Members Posts: 143 Joined: 29-December 05 Member No.: 26719 |
If you're still open to the idea of using mp3 for your application, try LAME. I find the following parameters to provide amazingly small files that are transparent for me:
CODE
lame -V8 -m m --resample 24

If you have the time, try it and let us know what you think.
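The parameters above omit the input and output filenames. A minimal sketch of applying them to a whole directory of WAV files could look like the following; the helper names `mp3_name` and `encode_speech` are made up for illustration, and the script assumes `lame` is on the PATH and that the sources are `.wav` files in the current directory.

```shell
#!/bin/sh
# mp3_name FILE.wav
# Prints the output name: the .wav suffix replaced by .mp3.
mp3_name() {
    printf '%s\n' "${1%.wav}.mp3"
}

# encode_speech FILE.wav
# Encodes one file with the settings from the post:
# VBR quality 8, mono, resampled to 24 kHz.
encode_speech() {
    lame -V8 -m m --resample 24 "$1" "$(mp3_name "$1")"
}

# Encode every WAV file in the current directory.
for f in ./*.wav; do
    [ -e "$f" ] || continue   # skip the literal pattern when no WAV files exist
    encode_speech "$f"
done
```

Quoting `"$1"` and `"$f"` keeps filenames with spaces intact, which matters for recordings named after their titles.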
|
|
|
Dec 2 2008, 16:48
Post
#6
|
|
![]() Group: Members (Donating) Posts: 1448 Joined: 11-February 03 From: Vermont Member No.: 4955 |
FWIW, I have some Vorbis files. I don't recall the options used, but they show as mono, 44.1 kHz sampling, 30 kbps.
They play OK in the DBpoweramp player and on my Rockbox Sansa, but won't play in foobar2000 or winamp. If I recall correctly, when I first started playing with mono, DBpoweramp played it back at double speed (as if it split the available mono samples between the L and R channels), but Spoon fixed it promptly when I reported the problem.
|
|
|