speech codecs overview
Hydrogenaudio Forums > Lossy Audio Compression > Speech Codecs
byvaly
Hello,

Could someone be so kind as to fill in this speech codecs overview, please?

IMHO, it would be a G R E A T help for all forum readers. I'll start filling it in
with the only one I know, Speex. Posts from jmvalin and John33 would be especially
appreciated.

Speex http://www.speex.org/ open source
CELP
HVXC
PureVoice
VoxWare
Twin VQ
LPC

Thank You!
jmvalin
QUOTE (byvaly @ Oct 13 2003, 04:16 AM)
Speex http://www.speex.org/ open source
CELP
HVXC
PureVoice
VoxWare
Twin VQ
LPC

CELP (Code Excited Linear Prediction) is not a codec itself, but a widely used speech coding technique. Speex uses CELP, and many other codecs use either CELP or a variant of it (ACELP being the most common).
I don't know anything about HVXC, PureVoice and VoxWare.
AFAIK, Twin VQ is for audio/music, not speech.
LPC (Linear Prediction Coefficients/Linear Predictive Coding) again is not a codec but a (general) technique. There's a (military) standard called LPC10 (LPC vocoder), though. It runs at 2.4 kbps and the quality is very bad.
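To make the "linear prediction" part of CELP and LPC a bit more concrete, here is a rough sketch of how a coder can estimate prediction coefficients for one frame: autocorrelation followed by the textbook Levinson-Durbin recursion. This is an illustration under my own assumptions, not code from Speex or LPC10. The function names are made up, the order of 10 is just a common choice for 8 kHz speech, and real codecs add windowing, bandwidth expansion and coefficient quantization (usually as LSPs), all left out here. The residual is what is left after prediction; in CELP, that residual is what the codebook search tries to match.

```python
import math

def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] (biased autocorrelation) of a list of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find a[1..order] so that x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, prediction_error_energy); a[0] is a placeholder set to 1.0.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:          # signal already perfectly predicted
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err           # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err

def residual(frame, a):
    """Prediction error e[n] = x[n] - sum_k a[k] x[n-k] (zeros before the frame)."""
    order = len(a) - 1
    return [frame[n] - sum(a[k] * frame[n - k]
                           for k in range(1, order + 1) if n - k >= 0)
            for n in range(len(frame))]

if __name__ == "__main__":
    # One 20 ms frame at 8 kHz (160 samples) of a synthetic two-tone "voice".
    frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(160)]
    a, err = levinson_durbin(autocorrelation(frame, 10), 10)
    e = residual(frame, a)
    print("signal energy:", sum(x * x for x in frame))
    print("residual energy:", sum(x * x for x in e))
```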
rjamorim
QUOTE (byvaly @ Oct 13 2003, 06:16 AM)
VoxWare

Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound"

QUOTE
Twin VQ


Like jmvalin already posted, TwinVQ is not a speech codec.

Also, you can add to the list:

-GSM (several variants, actually)
-G.729 (lots of people are raving about it as the next best thing)
-DSP Group True Speech

There are others, but these are the ones I can remember off the top of my head.

I'm still planning to start a speech codec test by November (that is, if jmvalin is still interested in helping me out with this one ;) )
getID3()
QUOTE (rjamorim @ Oct 13 2003, 04:03 PM)
Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound"

This intrigues me. Can you explain that further?
I remember way back (early 1997) I used to use Voxware for some speech encodings and it seemed pretty good at the time (not having much to compare it to) but RealAudio v3 came out later that year and eclipsed the quality so I switched.
rjamorim
QUOTE (getID3() @ Oct 13 2003, 08:34 PM)
This intrigues me. Can you explain that further?

Finding information about MetaVoice (VoxWare's voice compression technology) these days is a nightmare. VoxWare itself has stopped marketing its compression technology and tools like ToolVox, TeleVox, etc.

I saved this page from a mirror a long time ago, I can't find that mirror's location anymore:
http://www.rjamorim.com/rrw/metavoice/metavoice.html

To summarize things, MetaVoice models the components of the human voice ("resonance, pitch, timbre, timing, and character") and stores this modelling inside the stream, instead of just compressing the speech like most codecs do.

Don't forget to click the "Compression Analogy" link, it's quite funny.

Regards;

Roberto.
getID3()
@rjamorim
Thanks for the link on Metavoice! I'd always wondered how you could mess with the playback speed so easily with minimal side effects on the quality of the output.
jmvalin
QUOTE (byvaly @ Oct 13 2003, 04:16 AM)
Could someone be so kind as to fill in this speech codecs overview, please?

IMHO, it would be a G R E A T help for all forum readers. I'll start filling it in
with the only one I know, Speex. Posts from jmvalin and John33 would be especially
appreciated.

In case it is of interest to someone, these are the speech codecs I know about:

ITU-T:
G.711 PCM 64 kbps (u-law, A-law)
G.721 ADPCM
G.722 ADPCM 48-64 kbps wideband codec
G.722.1 wideband codec by PictureTel
G.722.2 ACELP multi-rate wideband codec (aka AMR-WB), targeted at cell phones
G.723 ADPCM (superseded by G.726, along with G.721)
G.723.1 ACELP 5.3 kbps / MP-MLQ 6.3 kbps (used mostly in VoIP)
G.726 ADPCM
G.728 LD-CELP 16 kbps (low-delay CELP)
G.729 CS-ACELP 8 kbps (used mostly in VoIP)
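G.711 is the simplest entry in the list: each 8 kHz sample is companded to 8 bits, so 8000 samples/s x 8 bits = 64 kbps. Below is a sketch of the μ-law variant in the style of the widely circulated public-domain g711.c routines (bias of 132, clip at 32635, 8 segments with a 4-bit mantissa). I believe it matches that common implementation, but it has not been checked against the ITU-T test vectors, so treat it as illustrative. A-law uses a different segment layout and is not shown.

```python
BIAS = 0x84      # 132: added before the segment search, as in the classic g711.c
CLIP = 32635     # largest magnitude that still fits in 15 bits after adding BIAS

def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1     # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F  # 4 bits inside the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # bits are stored inverted

def ulaw_to_linear(byte: int) -> int:
    """Decode an 8-bit mu-law byte to a 16-bit signed sample (segment midpoint)."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude

if __name__ == "__main__":
    for s in (0, 100, -100, 1000, 30000):
        code = linear_to_ulaw(s)
        print(f"{s:6d} -> 0x{code:02X} -> {ulaw_to_linear(code)}")
```

The quantization step doubles with each segment, so quiet samples are reproduced to within a few units while loud ones can be off by a few hundred; that logarithmic spacing is the whole point of companding.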

GSM:
GSM-FR (full rate) (RPE-LTP) 13.2 kbps "old" GSM codec for which there's a free implementation, used in many free VoIP apps. This is what most people call the "GSM codec"
GSM-HR (half rate) (VSELP) low bit-rate GSM codec (used in GSM cell phones)
GSM-EFR (ACELP) ~12 kbps. Latest GSM codec with much better quality than GSM-FR (no free implementation though)

Misc:
AMR-NB 4.7-12 kbps collection of narrowband codecs with the ability to switch bit-rate depending on the error rate, used for cell phones
IS-54 VSELP 8 kbps codec used in TDMA cell phones
iLBC 13-15 kbps by GIPS. Free license but not open-source.
DoD MELP 2.4 kbps military standard with decent quality at very low bit-rate
LPC10 2.4 kbps military standard with very poor quality
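Most of the bit-rates above fall out of simple "bits per frame / frame length" arithmetic, which also explains why GSM-FR gets quoted as both 13 and 13.2 kbps. The frame sizes below are figures as I understand them from the respective specs (GSM-FR: 260 speech bits per 20 ms, which the free libgsm packs into 33-byte frames; iLBC: 304 bits per 20 ms or 400 bits per 30 ms; LPC10 and MELP: 54 bits per 22.5 ms), so double-check them before quoting.

```python
from fractions import Fraction

# (label, bits per frame, frame length in ms): assumed figures, see the lead-in.
FRAMES = [
    ("GSM-FR speech bits", 260, 20),
    ("GSM-FR libgsm 33-byte frame", 33 * 8, 20),
    ("iLBC 20 ms mode", 304, 20),
    ("iLBC 30 ms mode", 400, 30),
    ("LPC10", 54, Fraction(45, 2)),
    ("MELP", 54, Fraction(45, 2)),
]

def bitrate_kbps(bits_per_frame, frame_ms):
    """Bits per millisecond is the same number as kilobits per second."""
    return Fraction(bits_per_frame) / Fraction(frame_ms)

if __name__ == "__main__":
    for name, bits, ms in FRAMES:
        print(f"{name}: {float(bitrate_kbps(bits, ms)):.2f} kbps")
```

Exact fractions are used so that rates like 400 bits / 30 ms come out as 40/3 instead of a rounded float.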

For a comparison of these codecs with Speex in terms of features (no quality tests yet), see this

This is the meaning of the acronyms:
LPC: Linear Prediction Coefficients / Linear Predictive Coding
CELP: Code Excited Linear Prediction
ACELP: Algebraic CELP
CS-ACELP: Conjugate Structure - ACELP
LD-CELP: Low-Delay CELP
RPE-LTP: Regular Pulse Excitation - Long-Term Prediction
AMR-NB/WB: Adaptive Multi-Rate (narrowband, wideband)
VSELP: Vector-Sum Excited Linear Prediction
MELP: Mixed Excitation Linear Prediction
PCM: Pulse Code Modulation
ADPCM: Adaptive Differential Pulse Code Modulation
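Since ADPCM shows up several times in the list (G.721, G.722, G.726), here is a toy version of the idea: predict each sample from the previous reconstructed one, send a small code for the difference, and adapt the quantizer step size to how big the recent codes were. This is a hypothetical sketch, not bit-compatible with G.726 (which uses a pole-zero predictor and its own adaptation rules) or with the IMA/DVI ADPCM variant; the step multipliers and limits are invented for illustration.

```python
# HYPOTHETICAL step multipliers, indexed by |code| (0..8): shrink the step after
# small codes, grow it after large ones. Loosely Jayant-style; values are invented.
_STEP_MULT = [0.9, 0.9, 0.9, 1.0, 1.2, 1.6, 2.0, 2.4, 2.4]

def _update(predicted, step, code, step_min, step_max):
    """Shared state update, so the decoder tracks the encoder exactly."""
    predicted = max(-32768, min(32767, predicted + code * step))
    step = max(step_min, min(step_max, int(step * _STEP_MULT[abs(code)])))
    return predicted, step

def adpcm_encode(samples, step_init=16, step_min=4, step_max=8192):
    """Toy 4-bit ADPCM encoder: each code is a signed int in [-8, 7]."""
    codes, predicted, step = [], 0, step_init
    for x in samples:
        # Predict "next sample = last reconstructed sample"; quantize the difference.
        code = max(-8, min(7, round((x - predicted) / step)))
        codes.append(code)
        predicted, step = _update(predicted, step, code, step_min, step_max)
    return codes

def adpcm_decode(codes, step_init=16, step_min=4, step_max=8192):
    """Rebuild samples by replaying the same predictor and step updates."""
    out, predicted, step = [], 0, step_init
    for code in codes:
        predicted, step = _update(predicted, step, code, step_min, step_max)
        out.append(predicted)
    return out

if __name__ == "__main__":
    # Triangle wave, 4 bits per sample instead of 16.
    x = [abs((n * 100) % 8000 - 4000) for n in range(400)]
    y = adpcm_decode(adpcm_encode(x))
    print("mean abs error:", sum(abs(a - b) for a, b in zip(x, y)) / len(x))
```

The key design point is that the encoder updates its state from the *quantized* code, never from the original sample, so the decoder can stay in lockstep without any side information.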
magic75
Some additions and minor corrections...
AMR-NB => 4.75-12.2 kbps, 8 codecs.
AMR-WB => 6.6-23.85 kbps, 9 codecs (listed above as G.722.2 ACELP)
AMR codecs are used in GSM, so they could have been mentioned under that heading as well, but they are also used in other cellular standards. I think NB is the same type as GSM-EFR, i.e. ACELP. NB cuts off somewhere around 3-4 kHz, and WB somewhere around 7 kHz. The quality difference is huge...
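A small sketch of the AMR numbers: each mode's bit-rate times the 20 ms frame length gives the bits per frame, and the sampling rate (8 kHz for NB, 16 kHz for WB) caps the audio bandwidth at half that, which is why NB tops out below 4 kHz and WB can reach about 7 kHz. The `pick_mode` function is purely hypothetical: in real AMR, mode changes are requested through in-band signaling driven by the receiver and network, and this only illustrates the idea of spending fewer bits on speech when the channel is poor.

```python
# Eight AMR-NB modes (sampled at 8 kHz) and nine AMR-WB modes (sampled at 16 kHz).
AMR_NB_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
AMR_WB_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]
FRAME_MS = 20  # both AMR variants use 20 ms frames

def bits_per_frame(kbps: float, frame_ms: int = FRAME_MS) -> int:
    """kbit/s multiplied by ms gives bits."""
    return round(kbps * frame_ms)

def nyquist_hz(sample_rate_hz: int) -> int:
    """Upper bound on audio bandwidth; real codecs filter somewhat below it."""
    return sample_rate_hz // 2

def pick_mode(modes_kbps, budget_kbps: float) -> float:
    """HYPOTHETICAL selector: highest mode that fits the budget, else the lowest."""
    fitting = [m for m in modes_kbps if m <= budget_kbps]
    return max(fitting) if fitting else min(modes_kbps)

if __name__ == "__main__":
    for kbps in AMR_NB_KBPS:
        print(f"AMR-NB {kbps:5.2f} kbps -> {bits_per_frame(kbps)} bits/frame")
    print("NB Nyquist:", nyquist_hz(8000), "Hz; WB Nyquist:", nyquist_hz(16000), "Hz")
    print("Budget 8.0 kbps ->", pick_mode(AMR_NB_KBPS, 8.0), "kbps mode")
```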

GSM-FR => 13 kbps
GSM-HR => 5.6 kbps (not 100% sure)
GSM-EFR => 12.2 kbps
I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly fewer bits on the radio interface, and those bits are instead used to protect the data from errors more effectively. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.
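The "fewer speech bits, more protection bits" trade-off can be put in numbers with a back-of-the-envelope sketch. It assumes the GSM full-rate traffic channel carries 456 coded bits per 20 ms frame (22.8 kbit/s gross) and uses 260 speech bits per frame for FR and 244 for EFR. The real EFR and AMR channel coding spends the spare capacity in specific ways (CRCs, convolutional codes, puncturing) that this ignores; it only shows the size of the budget.

```python
GROSS_BITS_PER_FRAME = 456  # assumed: GSM full-rate channel, 22.8 kbit/s over 20 ms

def redundancy(source_bits_per_frame: int, gross_bits: int = GROSS_BITS_PER_FRAME) -> int:
    """Bits per frame left over for error protection (channel coding)."""
    if source_bits_per_frame > gross_bits:
        raise ValueError("speech bit-rate exceeds the channel bit-rate")
    return gross_bits - source_bits_per_frame

if __name__ == "__main__":
    for name, bits in [("GSM-FR", 260), ("GSM-EFR", 244), ("AMR-NB 4.75", 95)]:
        spare = redundancy(bits)
        print(f"{name}: {bits} speech bits, {spare} protection bits "
              f"({spare / GROSS_BITS_PER_FRAME:.0%} of the channel)")
```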
byvaly
Thank you all! I didn't expect such a huuuuge response!

I would especially like to thank jmvalin and Skymmer for their comprehensive help!
jmvalin
QUOTE (magic75 @ Oct 14 2003, 02:18 AM)
I think NB is the same type as GSM-EFR, i.e. ACELP. NB cuts off somewhere around 3-4 kHz, and WB somewhere around 7 kHz. The quality difference is huge...

[...]

I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly fewer bits on the radio interface, and those bits are instead used to protect the data from errors more effectively. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.

AFAIK, GSM-EFR is the same (or very similar) as the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.
magic75
QUOTE (jmvalin @ Oct 14 2003, 07:41 AM)
AFAIK, GSM-EFR is the same (or very similar) as the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.

I haven't done any listening myself; I just picked up what I heard from colleagues, so you are probably right. Anyway, if the difference between the speech codecs themselves is huge, then the difference due to improved error protection on the radio interface is even bigger. (getting a bit OT here, sorry..)

GSM-EFR and AMR-NB 12.2 are almost the same; only minor corrections were made in AMR-NB 12.2.
Bushman
At last, it looks like I've found someone with knowledge :)

I have a UMAX DV camera that uses the Intel Corporation dvi_adpcm (0x0011) codec.

I have found a lot of threads about it but can't find the codec anywhere.

Please tell me you know where I can get hold of it.
wkwai
QUOTE (rjamorim @ Oct 13 2003, 03:03 PM)
Like jmvalin already posted, TwinVQ is not a speech codec.

Twin VQ may not qualify as a speech coder, but at 8 kbps with an 8 kHz sampling rate, speech with background music, such as what you often hear on radio broadcasts, can be encoded very well. In fact, it would outperform many LPC-based speech codecs.

However, the analysis window length of Twin VQ is about 4096 samples, and the need for window length switching causes unacceptable encoder delays, too long for real-time communications. For non-real-time applications, however, Twin VQ would even outperform AAC at those bitrates.
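To put the window-length point in numbers: just collecting one 4096-sample window at 8 kHz takes about half a second, before any lookahead, processing or network delay is added. The sketch below compares that with a 20 ms (160-sample) frame, as used by GSM and AMR, and with the 5-sample vectors of G.728 LD-CELP. It counts buffering time only, so treat the results as rough lower bounds, not full algorithmic delays.

```python
def buffering_delay_ms(window_samples: int, sample_rate_hz: int) -> float:
    """Time needed just to collect one analysis window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate_hz

EXAMPLES = [
    ("4096-sample transform window", 4096, 8000),
    ("20 ms speech frame (160 samples)", 160, 8000),
    ("G.728 LD-CELP 5-sample vector", 5, 8000),
]

if __name__ == "__main__":
    for label, n, fs in EXAMPLES:
        print(f"{label}: {buffering_delay_ms(n, fs):.3f} ms")
```

For comparison, a commonly cited comfort limit for one-way delay in conversation (ITU-T G.114) is on the order of 150 ms, so the long window alone already blows the budget.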
tepples
Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?
jmvalin
QUOTE (tepples @ Jan 22 2004, 10:18 PM)
Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?

I can only give an idea for Speex. The amount of CPU required depends a lot on the sampling rate and bit-rate used. For 8 kHz/8 kbps, I can encode in real-time on my Pentium-M with about 2-3% CPU. With the fixed-point port, I've been able to do real-time encoding+decoding with about 50% CPU on a 140 MHz StrongARM.
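A figure like "2-3% CPU for real-time" is a real-time fraction: CPU time spent encoding divided by the duration of audio encoded (0.03 means one second of audio costs about 30 ms of CPU). Below is a minimal harness for measuring that for any per-frame encoder callable. `placeholder_encoder` is a hypothetical stand-in, not Speex; to measure a real codec you would wrap its encoder, through whatever binding you use, in a function that takes one frame. `time.process_time` counts CPU time for the whole process, so run it on an otherwise idle interpreter.

```python
import math
import time

def realtime_cpu_fraction(encode_frame, sample_rate=8000, frame_size=160, seconds=5.0):
    """CPU seconds spent per second of audio encoded (0.03 ~ '3% CPU')."""
    n_frames = int(seconds * sample_rate / frame_size)
    # Synthetic 440 Hz test tone, prepared up front so it isn't timed.
    frames = [[int(8000 * math.sin(2 * math.pi * 440 * (i * frame_size + n) / sample_rate))
               for n in range(frame_size)]
              for i in range(n_frames)]
    start = time.process_time()
    for frame in frames:
        encode_frame(frame)
    elapsed = time.process_time() - start
    audio_seconds = n_frames * frame_size / sample_rate
    return elapsed / audio_seconds

def placeholder_encoder(frame):
    """HYPOTHETICAL stand-in: not a real codec, just some per-frame work to time."""
    return sum(x * x for x in frame)

if __name__ == "__main__":
    print(f"{realtime_cpu_fraction(placeholder_encoder):.2%} of one core")
```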