Full Version: ogg Vorbis and Speech
Hydrogenaudio Forums > Lossy Audio Compression > Ogg Vorbis > Ogg Vorbis - General
AlexanderTG
Hi

What's the best quality setting for speech encoding with Ogg Vorbis?

Thanks

Ax
SebastianG
Best quality?
-q10
AlexanderTG
How about with the smallest file size possible, but still being able to clearly hear the spoken words? I am not interested in transparency at all, as long as I can clearly hear every word, and the file size should be as small as possible.

Ax
pepoluan
You really should try it yourself.

Since you want it to be as small as possible, I recommend starting at -q 1 and working down to -q -2.

Note: Negative -q values are supported only on aoTuV versions.
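
A minimal sketch of that "try it yourself" sweep, assuming the encoder is called `oggenc2` and accepts `-o` for the output name; `speech.wav` is a hypothetical input file. It only builds and prints one command per quality level so the results can be compared by ear afterwards.

```python
import shlex

def sweep_commands(infile, qualities=(1, 0, -1, -2), encoder="oggenc2"):
    """Return one encoder command (as an argument list) per quality level."""
    base = infile.rsplit(".", 1)[0]
    commands = []
    for q in qualities:
        # The output name encodes the quality level, e.g. speech_q-1.ogg
        outfile = f"{base}_q{q}.ogg"
        commands.append([encoder, "-q", str(q), "-o", outfile, infile])
    return commands

for cmd in sweep_commands("speech.wav"):
    print(shlex.join(cmd))
```

Whether the lowest levels are accepted depends on the encoder build, so the list of qualities may need trimming.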
kjoonlee
-1 is in Xiph Vorbis.
AlexanderTG
I use the following with lame (mp3) for mono speech recording:
--alt-preset cbr 24 -a --resample 22 --lowpass 7

Is it possible to do something similar with Ogg Vorbis?
gameplaya15143
oggenc2 -q -2 --resample 22050 --advanced-encode-option lowpass_frequency=7

^^ that *might* be similar. I forget what the downmix option is... you'll have to use oggenc's help for that one
de Mon
Just use the -q switch: -2 to save space, or about 0 to 1 to get encodings that are NOT annoying (I assume you are not going to ABX hard samples and just want sound without hissing and gurgling). Nothing more.

Any additional switches (such as lowpass and resample) would be useless. Just let the encoder do its job. It is tuned quite well even for this kind of sound (thanks to Aoyumi).
kennedyb4
You may wish to look at this

http://www.speex.org/

It is optimized for speech.
pieterdewever
I'm using the following setting for this:
-q -2 --resample 32000 --downmix
(IIRC, should be something like that anyway)
Granted, resampling to 32000 Hz and downmixing to mono won't help much, but they do help. After all, it doesn't have to pass any ABX-testing, it only has to be understandable and obvious-artifact-free. Main goal is small filesize. Well, this works for me.
AlexanderTG
Thanks for the info, will try them out as soon as I can.

I would love to use speex but for some reason I can't seem to create a wrapper for it!?

I have been able to create a wrapper program for oggenc, so have decided to go with oggenc if I can get a decent encoding setting for speech.

Ax
mixminus1
QUOTE(de Mon @ Apr 13 2006, 03:23 PM) *

Any additional switches (such as lowpass and resample) would be useless. Just let the encoder do its job. It is tuned quite well even for this kind of sound (thanks to Aoyumi).


This is nonsense. Vorbis is *not* optimized for speech at low bitrates - the amount of metallic/swishing artifacts on solo spoken word at -q -2 (or -q -1) with aoTuVb4.51 makes it unlistenable, aside from dramatically altering the sound of the person's voice. Applying a lowpass (at around 6-8 kHz), in particular, helps tremendously in reducing artifacts. Downmixing to mono also helps, as does resampling to 22.050 kHz.

Settings that ended up working quite well in OggDropXPd were:

-q 0
Lowpass @ 6000
Downmix to mono
Resample to 22.050 with Medium quality

This gave an average bitrate of around 25 kbps. There are still some noticeable artifacts (the severity of which varies depending on the voice), but intelligibility is excellent, and the basic character of the person's voice is retained. Setting -q to -1 pushed the bitrate down to around 19 kbps, and while artifacts did increase, intelligibility was still good.

Equivalent command line in oggenc:

-q 0 --resample 22050 -S 1 --downmix --advanced-encode-option lowpass_frequency=6 infile.wav

(Note that "-S 1" selects the Medium resampling algorithm...not sure what the default is if you don't specify.)
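
For anyone driving the encoder from a wrapper, here is a rough sketch that assembles the same options as an argument list. It assumes the executable is named `oggenc2` and that the flags behave as described in this thread; `infile.wav` is a placeholder. The encoder is only launched if both the executable and the input file are actually present.

```python
import os
import shutil
import subprocess

def speech_encode_args(infile, quality=0, sample_rate=22050,
                       lowpass_khz=6, resampler=1, encoder="oggenc2"):
    """Build the argument list for the low-bitrate speech settings quoted above."""
    return [
        encoder,
        "-q", str(quality),
        "--resample", str(sample_rate),
        "-S", str(resampler),            # resampler quality, per the note above
        "--downmix",                     # stereo to mono
        "--advanced-encode-option", f"lowpass_frequency={lowpass_khz}",
        infile,
    ]

args = speech_encode_args("infile.wav")
print(" ".join(args))

# Run only when the encoder is on PATH and the input file exists.
if shutil.which(args[0]) and os.path.exists(args[-1]):
    subprocess.run(args, check=True)
```

Passing `quality=-1` gives the lower-bitrate variant mentioned above.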

Just noticed that this is pretty similar to what gameplaya recommended, so consider this some positive affirmation. ;)
pepoluan
QUOTE(mixminus1 @ Apr 15 2006, 12:36 AM) *
-q 0 --resample 22050 -S 1 --downmix --advanced-encode-option lowpass_frequency=6 infile.wav
Hmm... this is quite new to me. If I use -q -1 (and the rest of your command line), will it also retain the speaker's voice characteristics?

Oh, and yes, I think downmixing to mono will definitely help.
mixminus1
QUOTE(pepoluan @ Apr 14 2006, 10:54 AM) *

Hmm... this is quite new to me. If I use -q -1 (and the rest of your command line), will it also retain the speaker's voice characteristics?

At -q -1, it becomes more dependent on the voice itself. -q 0 is quite a bit more..."robust" I guess would be the best word - it maintains a fairly consistent level of quality with a wide variety of voices under a variety of recording conditions (close-miked with no room tone, distant mic with lots of room reverb, etc.).
AlexanderTG
Thanks for the info.

Looks like most people here are recommending oggenc2. I went to rarewares.org and have noticed 3 different versions of oggenc2.82. Which one would be best for speech encoding? Or is there a more optimised version which I should be using?

Thanks

Ax
mixminus1
QUOTE(AlexanderTG @ Apr 14 2006, 01:30 PM) *

Thanks for the info.

Looks like most people here are recommending oggenc2. I went to rarewares.org and have noticed 3 different versions of oggenc2.82. Which one would be best for speech encoding? Or is there a more optimised version which I should be using?

Thanks

Ax

aoTuVb4.51 is widely considered (at least on HA :) ) to be the highest-quality version of Vorbis currently available, particularly at low bitrates, so you would want either "Oggenc2.82 using aoTuVb4.51" from Rarewares, or the optimized Lancer version available from Blacksword's page. The Lancer versions are highly optimized compiles of the latest version of the aoTuV code - they encode approximately twice as fast.
foxyshadis
QUOTE(AlexanderTG @ Apr 14 2006, 07:51 AM) *

Thanks for the info, will try them out as soon as I can.

I would love to use speex but for some reason I can't seem to create a wrapper for it!?

I have been able to create a wrapper program for oggenc, so have decided to go with oggenc if I can get a decent encoding setting for speech.

Ax

What kind of wrapper are you looking for? Just a command line encoder? Or an encoder DLL/DirectShow decoder/Winamp plugin/all of the above? :P There are a lot of different wrappers, and several are available at http://rarewares.org/others.html
AlexanderTG
I'm looking for almost anything that can expose its properties on the .NET platform. As I have not been able to find such a thing yet, I decided to create my own wrapper. It's not fully functional yet, but it's getting there! :D
de Mon
QUOTE(mixminus1 @ Apr 14 2006, 09:36 AM) *

QUOTE(de Mon @ Apr 13 2006, 03:23 PM) *

Any additional switches (such as lowpass and resample) would be useless. Just let the encoder do its job. It is tuned quite well even for this kind of sound (thanks to Aoyumi).


This is nonsense. Vorbis is *not* optimized for speech at low bitrates - the amount of metallic/swishing artifacts on solo spoken word at -q -2 (or -q -1) with aoTuVb4.51 makes it unlistenable, aside from dramatically altering the sound of the person's voice. Applying a lowpass (at around 6-8 kHz), in particular, helps tremendously in reducing artifacts. Downmixing to mono also helps, as does resampling to 22.050 kHz.

Settings that ended up working quite well in OggDropXPd were:

-q 0
Lowpass @ 6000
Downmix to mono
Resample to 22.050 with Medium quality

This gave an average bitrate of around 25 kbps. There are still some noticeable artifacts (the severity of which varies depending on the voice), but intelligibility is excellent, and the basic character of the person's voice is retained. Setting -q to -1 pushed the bitrate down to around 19 kbps, and while artifacts did increase, intelligibility was still good.

Equivalent command line in oggenc:

-q 0 --resample 22050 -S 1 --downmix --advanced-encode-option lowpass_frequency=6 infile.wav

(Note that "-S 1" selects the Medium resampling algorithm...not sure what the default is if you don't specify.)

Just noticed that this is pretty similar to what gameplaya recommended, so consider this some positive affirmation. ;)


I am talking about TRANSPARENT speech ONLY recordings. How are you going to get transparent encodings of speech with your settings?
By the way, we have Aoyumi's opinion:
http://www.hydrogenaudio.org/forums/index....ndpost&p=374733
http://www.hydrogenaudio.org/forums/index....ndpost&p=375111

kjoonlee
If you do a lowpass at 6kHz, there's no point in using a sampling rate higher than 12kHz, is there?
Garf
QUOTE(kjoonlee @ Apr 15 2006, 01:08 PM) *
If you do a lowpass at 6kHz, there's no point in using a sampling rate higher than 12kHz, is there?


Well, yes, there is: time resolution is still better with a higher sampling rate. (This only matters for a transform codec, but Vorbis is one.)
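
A small worked example of both points, as an illustration only. The Nyquist part is standard; the block length of 1024 samples is a hypothetical figure chosen for the arithmetic, and it assumes the block length in samples stays the same at both rates, which may not hold for the modes an encoder actually selects.

```python
def min_sample_rate(lowpass_hz):
    """Nyquist: the sampling rate must be at least twice the highest frequency kept."""
    return 2 * lowpass_hz

def block_duration_ms(block_samples, sample_rate_hz):
    """Time spanned by one transform block of block_samples samples."""
    return 1000.0 * block_samples / sample_rate_hz

# A 6 kHz lowpass needs at least a 12 kHz sampling rate.
print(min_sample_rate(6000))

BLOCK = 1024  # hypothetical block length in samples, for illustration only
for rate in (12000, 22050):
    print(rate, "Hz:", round(block_duration_ms(BLOCK, rate), 1), "ms per block")
```

Under that assumption, the same block covers a shorter stretch of time at 22050 Hz than at 12000 Hz, which is the finer time resolution referred to above.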
HotshotGG
Vorbis shouldn't even be used for Speech. That's why Speex was developed. I see no sense in using these super tweaks. I personally would just settle for -q 0 and leave it at that though. ;)
mixminus1
QUOTE(de Mon @ Apr 15 2006, 03:58 AM) *

I am talking about TRANSPARENT speech ONLY recordings.


...but the original poster isn't - from post #3 in this thread:

QUOTE
How about with the smallest file size possible, but still being able to clearly hear the spoken words? I am not interested in transparency at all, as long as I can clearly hear every word, and the file size should be as small as possible.

Scroll bars are your friend...
AlexanderTG
QUOTE(HotshotGG @ Apr 15 2006, 04:34 PM) *

Vorbis shouldn't even be used for Speech. That's why Speex was developed. I see no sense in using these super tweaks. I personally would just settle for -q 0 and leave it at that though. ;)


I did originally want to use speex, but I can't figure out how to capture its percentage complete when encoding, which is something I HAVE been able to do with oggenc2.exe

Thanks for the suggestion though!

Ax
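
For reference, capturing "percentage complete" from a command line encoder usually comes down to scanning its console output for percent values. The sketch below does only that parsing step. The sample text is an assumed progress format (updates separated by carriage returns), not captured from a real run, and which output stream a given encoder uses for progress would need to be checked.

```python
import re

# Matches numbers such as "12.5%" or "100 %".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")

def parse_progress(text):
    """Extract every percentage value found in a chunk of encoder output."""
    return [float(value) for value in PERCENT_RE.findall(text)]

# Assumed sample output, for illustration only.
sample = "\r\t[ 12.5%] [ 0m07s remaining] |\r\t[ 50.0%] [ 0m04s remaining] /\r\t[100.0%]"
print(parse_progress(sample))
```

A wrapper would read the encoder's output incrementally, pass each chunk to `parse_progress`, and report the last value in the returned list.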
slks
This is about as low as you can go:

CODE
-q -2 --resample 11025


In my test file, the result was about 14 kbps. The result didn't sound nice, but the spoken words were still intelligible.

You could resample down to 8 kHz, which would give you a bitrate of 8-10 kbps, but I found that this made the words harder to understand, as the "s" sounds were cut off.

You might need an aoTuV encoder for these settings, I'm not sure if vanilla libvorbis supports -q -2 or not.

And this is all assuming you don't care about artifacts, only intelligibility. You can go up to -q -1 or -q 0, and raise the sampling rate to 22 or 32kHz to get better quality.

edit: yes, sampling rate.
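
To put the bitrates mentioned in this thread into file sizes, here is a quick back-of-the-envelope calculation. It treats the quoted figures as average bitrates, uses decimal megabytes, and ignores container overhead, so the results are rough estimates only.

```python
def size_mb(bitrate_kbps, seconds):
    """Approximate size in MB (10**6 bytes) for an average bitrate in kbit/s."""
    return bitrate_kbps * seconds / 8 / 1000

ONE_HOUR = 3600
for kbps in (25, 19, 14):
    print(kbps, "kbps ->", round(size_mb(kbps, ONE_HOUR), 2), "MB per hour")
```

For an hour of speech this works out to roughly 11 MB at 25 kbps and a little over 6 MB at 14 kbps.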
Firon
You mean raise the sampling rate to 22 or 32 kHz, because that's one hell of a lowpass.
And yes, you do need to use aoTuV or Lancer, vanilla libvorbis only goes to -1.
kjoonlee
QUOTE(AlexanderTG @ Apr 16 2006, 07:36 PM) *

QUOTE(HotshotGG @ Apr 15 2006, 04:34 PM) *

Vorbis shouldn't even be used for Speech. That's why Speex was developed. I see no sense in using these super tweaks. I personally would just settle for -q 0 and leave it at that though. ;)


I did originally want to use speex, but I can't figure out how to capture its percentage complete when encoding, which is something I HAVE been able to do with oggenc2.exe

Thanks for the suggestion though!

Ax

Why not change the speex encoder to print the percentage?
AlexanderTG
QUOTE(kjoonlee @ Apr 17 2006, 06:46 AM) *

Why not change the speex encoder to print the percentage?


How do you do that? Do you mean change the source code for speex? If yes, then I can't do that as I don't know C++ :)

Thanks

Ax
kjoonlee
Yes, that's what I mean.

It's in C, so if you know C, not C++, then you have a chance.
AlexanderTG
Looks like I have no chance then, as I read that C is nowhere near as friendly as C++! Anyone else up for the challenge?
junglemike
QUOTE
Vorbis shouldn't even be used for Speech. That's why Speex was developed. I see no sense in using these super tweaks. I personally would just settle for -q 0 and leave it at that though.

I listen to audio books on an iRiver player, which can only understand mp3 or ogg (and ugly wma) - but not speex.
jarsonic
Can Speex be inserted into an Ogg container?


edit: OK, never mind. Apparently it already is in an Ogg container. If it's not, feel free to correct me. :)
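
One way to check what an .ogg file actually contains is to look at the first page: an Ogg page starts with "OggS", and the first packet of a Vorbis stream begins with the byte 0x01 followed by "vorbis", while a Speex header begins with "Speex" padded with three spaces. The sketch below assumes a single logical stream and only inspects the first page; `fake_first_page` is a hypothetical helper that builds test data (with a zeroed checksum), not a valid file.

```python
def sniff_ogg_codec(data):
    """Guess the codec from the first page of an Ogg stream."""
    if len(data) < 27 or data[:4] != b"OggS":
        return "not ogg"
    n_segments = data[26]             # number of entries in the segment table
    payload = data[27 + n_segments:]  # packet data follows the segment table
    if payload.startswith(b"\x01vorbis"):
        return "vorbis"
    if payload.startswith(b"Speex   "):
        return "speex"
    return "unknown"

def fake_first_page(packet):
    """Build a minimal first page around packet (test data only, checksum zeroed)."""
    # capture pattern, version, header type, then granule/serial/sequence/CRC as zeros
    header = b"OggS" + bytes([0, 2]) + bytes(8 + 4 + 4 + 4)
    return header + bytes([1, len(packet)]) + packet

print(sniff_ogg_codec(fake_first_page(b"\x01vorbis" + bytes(23))))
print(sniff_ogg_codec(fake_first_page(b"Speex   " + bytes(72))))
```

For a real file, reading the first few hundred bytes in binary mode and passing them to `sniff_ogg_codec` should be enough.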