Full Version: Encoding voice and speech
gusdabo
Hi everyone,

I did a search but haven't found what I need.
Here is the situation: I'd like to encode speech and funny sketches. I'd like them to take up as little space as possible. What should I do?
I have heard about reducing the frequency: what exactly does that do (what are the benefits and the losses)?

Thank you very much

Gus
john33
QUOTE(gusdabo @ Feb 4 2003 - 09:05 PM)
Hi everyone,

I did a search but haven't found what I need.
Here is the situation: I'd like to encode speech and funny sketches. I'd like them to take up as little space as possible. What should I do?
I have heard about reducing the frequency: what exactly does that do (what are the benefits and the losses)?

Thank you very much

Gus

If you're really only dealing with speech, you should take a look at using 'speex'. You can start by visiting http://www.speex.org/ and browsing. There are encoders, decoders and there's a Winamp 2.x plugin as well.
fenterbug
Speaking of encoding speech, I ripped my new Stone Sour CD and noticed that the last track (13... Omega) was just a spoken poem. So I thought I'd get fancy and encode it using Speex. I get a nasty metallic hum that's quite loud. I ripped with EAC, which produces 44.1 kHz audio. I tried resampling to 35 kHz, which Speex should like better, but could not lose the buzz. Have I missed something obvious?
gusdabo
QUOTE(john33 @ Feb 4 2003 - 11:20 PM)
If you're really only dealing with speech, you should take a look at using 'speex'. You can start by visiting http://www.speex.org/ and browsing. There are encoders, decoders and there's a Winamp 2.x plugin as well.

Yes, I've heard about Speex...
But I will buy a portable device as soon as one of these hard-disk-based players can decode Ogg Vorbis, so I'd like to put the files on it and ensure maximum compatibility.

So, which settings would be best with Vorbis?
fenterbug
Personally, I use -q 6 because, according to posts on this mb, that's where transparent stereo coupling kicks in. It produces a file slightly smaller than lame -aps, on average. The most important question, though, is how good does it sound to you, and how much storage space are you willing/able to spare for your collection?
DickD
I think Speex and other speech codecs like GSM will never sound perfect. They aim for maximum compression by modelling the human vocal tract, targeting intelligibility at very low bitrates, usually with very low latency (encoding delay) for telecommunications, rather than trying to make the encoding audibly transparent.

Over a long listen this can be seriously annoying. You may be after something more intelligible and hopefully compatible with Ogg hardware players (though they'd need to support low bitrates).

Take a look at this message on the Vorbis mailing list to find some suitable settings for Ogg Vorbis, which was better than practically anything else tested, including speex.

To summarise:

oggenc --downmix --resample 8000 -q -1.00

was very good - about 10-11 kbps. Some artifacts on applause, but very intelligible. (Follow the link to the website for an example)

oggenc --downmix --resample 11000 -q -0.60

retained the full bandwidth of the recording mentioned, eliminated the artifacts, and came to about 20 kbps.
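
For anyone wanting to apply the two settings above to a whole folder of WAV files, here is a minimal Python sketch. It uses only the oggenc options quoted above plus -o for the output name. The preset labels, the function names and the folder layout are assumptions made up for illustration, and it assumes oggenc is installed and on the PATH.

```python
import subprocess
from pathlib import Path

# Presets copied from the two command lines above; the keys
# ("smallest", "full_band") are hypothetical labels for this sketch.
PRESETS = {
    "smallest": {"rate": 8000, "quality": -1.00},
    "full_band": {"rate": 11000, "quality": -0.60},
}


def build_oggenc_args(wav_path, preset):
    """Return the oggenc argument list for one WAV file (nothing is run)."""
    settings = PRESETS[preset]
    out_path = Path(wav_path).with_suffix(".ogg")
    return [
        "oggenc",
        "--downmix",                          # stereo -> mono
        "--resample", str(settings["rate"]),  # target sample rate in Hz
        "-q", f"{settings['quality']:.2f}",   # Vorbis quality level
        "-o", str(out_path),                  # output file next to the input
        str(wav_path),
    ]


def encode_folder(folder, preset="smallest"):
    """Encode every .wav in `folder`; assumes oggenc is on the PATH."""
    for wav in sorted(Path(folder).glob("*.wav")):
        subprocess.run(build_oggenc_args(wav, preset), check=True)
```

Calling build_oggenc_args("sketch.wav", "smallest") only builds the command, so the arguments can be checked before anything is encoded; encode_folder() is what actually runs oggenc.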

Having tried these settings, I'd say they're both very good for this sort of material. If you want to retain higher frequencies, you could probably use something like -q -0.60 with --resample 22000, for roughly 30 kbps.
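
To put those sample rates and bitrates in perspective, the arithmetic can be sketched in Python. The bitrates are the approximate figures quoted above (10.5 is assumed as the midpoint of "10-11 kbps", and the 30 kbps figure is only the rough guess given for 22000 Hz). The one-hour duration and the function names are assumptions for illustration. The bandwidth figure is the Nyquist limit, half the sample rate, which is what gets lost when resampling down: anything above it is gone from the file.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def size_mb(bitrate_kbps, seconds):
    """Approximate size in megabytes (10**6 bytes), ignoring container overhead."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


# (resample rate in Hz, approximate bitrate in kbps) as quoted in the post
SETTINGS = [(8000, 10.5), (11000, 20), (22000, 30)]

for rate, kbps in SETTINGS:
    print(f"{rate} Hz: audio up to {nyquist_hz(rate):.0f} Hz, "
          f"about {size_mb(kbps, 3600):.1f} MB per hour")
```

So the trade-off is smaller files against lost high frequencies: the 8000 Hz setting keeps only content below 4 kHz, while 22000 Hz keeps content up to 11 kHz at roughly three times the size.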

Dick Darlington
jmvalin
QUOTE(DickD @ Feb 5 2003 - 11:49 AM)
I think Speex and other speech codecs like GSM will never sound perfect. They aim for maximum compression by modelling the human vocal tract, targeting intelligibility at very low bitrates, usually with very low latency (encoding delay) for telecommunications, rather than trying to make the encoding audibly transparent.

You've obviously never tried Speex at 24.6 kbps in narrowband (8 kHz) or 42 kbps in wideband (16 kHz). I bet you can't tell the difference for anything other than music.