speech codecs overview
Oct 13 2003, 10:16
Post #1
Group: Members Posts: 21 Joined: 13-October 03 Member No.: 9279
Hello,
could someone be so kind as to fill in this speech codec overview, please? IMHO, it would be a G R E A T help for all forum readers. I'll start by filling in the only one I know, Speex. Posts from jmvalin and John33 would be especially appreciated.

Speex - http://www.speex.org/ - open source
CELP
HVXC
PureVoice
VoxWare
Twin VQ
LPC

Thank you!
Oct 13 2003, 23:12
Post #2
Xiph.org Speex developer Group: Developer Posts: 303 Joined: 21-August 02 Member No.: 3134
QUOTE (byvaly @ Oct 13 2003, 04:16 AM)
CELP (Code Excited Linear Prediction) is not a codec itself, but a widely used speech coding technique. Speex uses CELP, and many other codecs use either CELP or a variant of it (ACELP being the most common).

I don't know anything about HVXC, PureVoice and VoxWare.

AFAIK, Twin VQ is for audio/music and not speech.

LPC (Linear Prediction Coefficients/Linear Predictive Coding) again is not a codec but a (general) technique. There's a (military) standard called LPC10 (LPC vocoder), though. It runs at 2.4 kbps and the quality is very bad.
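To make the point that LPC is an analysis technique rather than a codec, here is a minimal Python sketch of the step most LPC/CELP coders share: estimating prediction coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion, then computing the prediction residual. It is a textbook illustration, not code from Speex or LPC10; windowing, coefficient quantisation and everything else a real codec does are left out, and the function names are hypothetical.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..order (no windowing here)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order], where x[n] is predicted
    as sum(a[k] * x[n - k]). Returns (coefficients, final error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - prediction; a predictive coder
    quantises this (plus the coefficients) instead of x itself."""
    p = len(coeffs)
    return [
        x[n] - sum(coeffs[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        for n in range(len(x))
    ]


if __name__ == "__main__":
    # A decaying exponential is perfectly predictable with one coefficient.
    frame = [0.9 ** n for n in range(200)]
    coeffs, _ = levinson_durbin(autocorrelation(frame, 2), 2)
    print([round(c, 6) for c in coeffs])  # close to [0.9, 0.0]
```

Once the coefficients are known, the residual is tiny compared to the signal, which is what makes predictive coding pay off at low bit-rates.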
Oct 14 2003, 00:03
Post #3
Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81
QUOTE (byvaly @ Oct 13 2003, 06:16 AM) VoxWare
Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound".

QUOTE Twin VQ
Like jmvalin already posted, TwinVQ is not a speech codec.

Also, you can add to the list:
- GSM (several variants, actually)
- G.729 (lots of people are raving about it as the next best thing)
- DSP Group TrueSpeech

There are others, but these are the ones I can remember off the top of my head. I'm still planning on a vocodec test to be started by November (that is, if jmvalin is still interested in helping me out with this one).
--------------------
Get up-to-date binaries of Lame, AAC, Vorbis and much more at RareWares:
http://www.rarewares.org
Oct 14 2003, 00:34
Post #4
getID3() developer Group: Developer Posts: 252 Joined: 20-September 02 From: Kingston, ON Member No.: 3413
QUOTE (rjamorim @ Oct 13 2003, 04:03 PM) Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound"
This intrigues me. Can you explain that further? I remember that way back (early 1997) I used Voxware for some speech encodings, and it seemed pretty good at the time (not having much to compare it to), but RealAudio v3 came out later that year and eclipsed it in quality, so I switched.
--------------------
getID3() = PHP audio & video metadata parser: http://getid3.sourceforge.net
Current version: v1.7.0 (released January 19, 2004)
Oct 14 2003, 01:22
Post #5
Rarewares admin Group: Members Posts: 7515 Joined: 30-September 01 From: Brazil Member No.: 81
QUOTE (getID3() @ Oct 13 2003, 08:34 PM) This intrigues me. Can you explain that further?
Finding information about MetaVoice (VoxWare's voice compression technology) these days is a nightmare. VoxWare itself has stopped marketing its compression technology and tools like ToolVox, TeleVox, etc.

I saved this page from a mirror a long time ago; I can't find that mirror's location anymore:
http://www.rjamorim.com/rrw/metavoice/metavoice.html

To summarize, MetaVoice models the components of the human voice ("resonance, pitch, timbre, timing, and character") and stores this model inside the stream, instead of just compressing the speech like most codecs do.

Don't forget to click the "Compression Analogy" link, it's quite funny.

Regards;
Roberto.
This post has been edited by rjamorim: Dec 7 2003, 23:44
Oct 14 2003, 01:25
Post #6
Group: Members Posts: 109 Joined: 11-June 03 Member No.: 7132
Here are some links that may help you:
http://www.speech.cs.cmu.edu/comp.speech/S...peechlinks.html
http://www.phon.ucl.ac.uk/home/shl10/andy2/&wei.htm
http://fife.speech.cs.cmu.edu/comp.speech/
--------------------
Gabber, Jazz and IDM
|
Oct 14 2003, 01:40
Post #7
getID3() developer Group: Developer Posts: 252 Joined: 20-September 02 From: Kingston, ON Member No.: 3413
@rjamorim
Thanks for the link on Metavoice! I'd always wondered how you could mess with the playback speed so easily with minimal side effects on the quality of the output.
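A toy Python sketch of why a parametric stream like the one Roberto describes makes speed changes cheap: if the stream stores per-frame parameters (here just a hypothetical pitch and amplitude pair, nothing like MetaVoice's real format, which isn't documented here) instead of waveform samples, the decoder can stretch or shrink each frame's duration without touching the pitch. Playing raw samples back faster, by contrast, raises the pitch along with the speed.

```python
import math

FRAME_SECONDS = 0.02  # hypothetical 20 ms parameter frames


def synthesize(frames, rate=8000, speed=1.0):
    """Render (pitch_hz, amplitude) frames as a sine with continuous phase.
    `speed` only changes how many samples each frame lasts, so the pitch
    stays the same while the duration scales by 1 / speed."""
    samples_per_frame = int(round(FRAME_SECONDS * rate / speed))
    out, phase = [], 0.0
    for pitch_hz, amplitude in frames:
        step = 2.0 * math.pi * pitch_hz / rate
        for _ in range(samples_per_frame):
            out.append(amplitude * math.sin(phase))
            phase += step
    return out


def speed_up_samples(samples, speed):
    """Naive speed change on decoded samples: skip through them faster.
    Duration shrinks, but every frequency (pitch) rises by `speed`."""
    return [samples[int(i * speed)] for i in range(int(len(samples) / speed))]


def upward_zero_crossings(samples):
    """Count negative-to-non-negative transitions, about one per pitch period."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)


if __name__ == "__main__":
    frames = [(200.0, 0.5)] * 50  # one second of a steady 200 Hz "voice"
    slow = synthesize(frames, speed=0.5)  # two seconds, still ~200 Hz
    print(len(slow), upward_zero_crossings(slow) / 2.0)
```

A real vocoder would drive an excitation through a vocal-tract filter rather than emit a bare sine, but the reason time-scaling is cheap is the same: duration is just a parameter.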
Oct 14 2003, 06:37
Post #8
Xiph.org Speex developer Group: Developer Posts: 303 Joined: 21-August 02 Member No.: 3134
QUOTE (byvaly @ Oct 13 2003, 04:16 AM) could be someone so kind and fill in this speech codecs overview, please? IMHO, it would be G R E A T help for all forum readers. I'll start fill in with the only one i know, the Speex. Especially Jmvalin's and John33's posts would be appreciated.
In case it is of interest to anyone, these are the speech codecs I know about:

ITU-T:
G.711 PCM 64 kbps (u-law, A-law)
G.721 ADPCM
G.722 ADPCM 48-64 kbps wideband codec
G.722.1 wideband codec by PictureTel
G.722.2 ACELP multi-rate wideband codec (aka AMR-WB), targeted at cell phones
G.723 ADPCM (renamed to G.726 I think)
G.723.1 ACELP 5.3 kbps, MP-MLQ 6.3 kbps (used mostly in VoIP)
G.726 ADPCM
G.728 LD-CELP 16 kbps (low-delay CELP)
G.729 CS-ACELP 8 kbps (used mostly in VoIP)

GSM:
GSM-FR (full rate) (RPE-LTP) 13.2 kbps: the "old" GSM codec, for which there's a free implementation used in many free VoIP apps. This is what most people call the "GSM codec".
GSM-HR (half rate) (VSELP?): low bit-rate GSM codec (used in GSM cell phones)
GSM-EFR (ACELP) ~12 kbps: latest GSM codec, with much better quality than GSM-FR (no free implementation, though)

Misc:
AMR-NB 4.7-12 kbps: collection of narrowband codecs with the possibility to switch bit-rate depending on error rate, used for cell phones
IS-54 VSELP 8 kbps: codec used in TDMA cell phones
iLBC 13-15 kbps by GIPS. Free license but not open-source.
DoD MELP 2.4 kbps: military standard with decent quality at very low bit-rate
LPC10 2.4 kbps: military standard with very poor quality

For a comparison of these codecs with Speex in terms of features (no quality tests yet), see this

This is the meaning of the acronyms:
LPC: Linear Prediction Coefficients / Linear Predictive Coding
CELP: Code Excited Linear Prediction
ACELP: Algebraic CELP
CS-ACELP: Conjugate Structure - ACELP
LD-CELP: Low-Delay CELP
MP-MLQ: Multi-Pulse Maximum Likelihood Quantization
RPE-LTP: Regular Pulse Excitation - Long-Term Prediction
AMR-NB/WB: Adaptive Multi-Rate (narrowband, wideband)
VSELP: Vector-Sum Excited Linear Prediction
MELP: Mixed Excitation Linear Prediction
PCM: Pulse Code Modulation
ADPCM: Adaptive Differential Pulse Code Modulation
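Since G.711 heads the list, here's a minimal Python sketch of its u-law companding (16-bit linear PCM to 8-bit codes and back). It follows the bias-and-segment approach of the widely circulated public-domain g711.c (bias 0x84, clip at 32635); treat it as an illustration and check it against the ITU-T G.711 text before relying on it. A-law uses different segment rules and isn't shown.

```python
BIAS = 0x84   # 132: shifts magnitudes so segment boundaries fall on powers of two
CLIP = 32635  # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one signed 16-bit PCM sample as an 8-bit u-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is the position of the highest set bit, bits 7..14.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law codes are transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code):
    """Decode an 8-bit u-law code back to a signed 16-bit PCM value."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

For instance, 1000 encodes to 0xCE and decodes to 988: in that segment the step size is 64, so an error of 12 is within half a step. Small samples get fine steps and loud ones coarse steps, which is how 8 bits per sample at 8 kHz gives the 64 kbps listed above.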
Oct 14 2003, 08:18
Post #9
Group: Members Posts: 511 Joined: 2-December 02 Member No.: 3959
Some additions and minor corrections...
AMR-NB => 4.75-12.2 kbps, 8 codecs.
AMR-WB => 6.6-23.85 kbps, 9 codecs (mentioned as G.722.2 ACELP)
AMR codecs are used in GSM, so they could have been mentioned under that heading as well. But they are also used in other cellular standards. I think NB is the same type as GSM-EFR, i.e. ACELP. NB cuts off somewhere at 3-4 kHz, and WB somewhere at 7 kHz. The quality difference is huge...

GSM-FR => 13 kbps
GSM-HR => 5.6 kbps (not 100% sure)
GSM-EFR => 12.2 kbps

I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly fewer bits on the radio interface, and those bits are instead used to protect the data from errors better. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.
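The 13 vs 12.2 kbps figures follow directly from the frame sizes: these codecs produce one frame every 20 ms, so bit-rate = bits per frame / frame duration. A quick Python check; the per-frame bit counts below (260 for GSM-FR, 244 for GSM-EFR and AMR-NB 12.2, 95 for AMR-NB 4.75) are the usual figures for these codecs but are listed here as assumptions worth verifying against the 3GPP specs.

```python
FRAME_MS = 20  # GSM-FR, GSM-EFR and AMR-NB all use 20 ms speech frames

# Bits per speech frame (assumed values, see the note above).
FRAME_BITS = {
    "GSM-FR": 260,
    "GSM-EFR": 244,
    "AMR-NB 4.75": 95,
    "AMR-NB 12.2": 244,
}


def kbps(bits_per_frame, frame_ms=FRAME_MS):
    """Bit-rate in kbit/s: bits per millisecond equals kilobits per second."""
    return bits_per_frame / frame_ms


if __name__ == "__main__":
    for name, bits in FRAME_BITS.items():
        print(f"{name}: {kbps(bits)} kbps")
```

EFR and the AMR-NB 12.2 mode landing on the same 244-bit frame is consistent with them being essentially the same codec.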
Oct 14 2003, 11:23
Post #10
Group: Members Posts: 21 Joined: 13-October 03 Member No.: 9279
Thank you all! I didn't expect such a huuuuge response!
I would especially like to thank jmvalin and Skymmer for their comprehensive help!
Oct 14 2003, 16:41
Post #11
Xiph.org Speex developer Group: Developer Posts: 303 Joined: 21-August 02 Member No.: 3134
QUOTE (magic75 @ Oct 14 2003, 02:18 AM) I think NB is the same type as GSM-EFR, i.e ACELP. NB cuts of somewhere at 3-4 kHz, and WB somewhere at 7 kHz. The quality difference is huge... [...] I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly less bits on the radio interface. Bits that instead are used to protect the data from errors in a better way. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.
AFAIK, GSM-EFR is the same as (or very similar to) the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.
Oct 15 2003, 08:09
Post #12
Group: Members Posts: 511 Joined: 2-December 02 Member No.: 3959
QUOTE (jmvalin @ Oct 14 2003, 07:41 AM) AFAIK, GSM-EFR is the same (or very similar) as the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.
I haven't done any listening myself, just picked up what I have heard from colleagues, so you are probably right. Anyway, if the difference between the speech codecs themselves is huge, then the difference due to improved error protection on the radio interface is even bigger. (Getting a bit OT here, sorry..) GSM-EFR and AMR-NB 12.2 are almost the same; only minor corrections were made in AMR-NB 12.2.
Oct 20 2003, 23:00
Post #13
Group: Members Posts: 3 Joined: 20-October 03 Member No.: 9397
At last it looks like I found someone with knowledge.
I have a UMAX DV camera that uses the dvi_adpcm (0x0011) Intel Corporation codec. I have found a lot of threads all about it but can't find the codec anywhere. Please tell me you know where I can get hold of it.
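For reference, format tag 0x0011 is the DVI/IMA ADPCM format, which packs each sample into a 4-bit code. Below is a minimal Python sketch of the core IMA ADPCM decoding step, using the commonly published IMA step-size and index tables. It is an illustration only: it ignores the WAV block layout (each block starts with a header holding the initial predictor and step index, and nibble order and stereo interleaving also matter), so it won't read the camera's files as-is, and `ima_decode` is a hypothetical helper, not part of any library.

```python
# Quantiser step sizes, indexed 0..88 (standard IMA ADPCM table).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]

# Step-index adjustment for the 3 magnitude bits of each 4-bit code.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def ima_decode(codes, predictor=0, index=0):
    """Decode 4-bit IMA ADPCM codes (ints 0..15) into 16-bit samples."""
    out = []
    for code in codes:
        step = STEP_TABLE[index]
        # Reconstruct the difference from the three magnitude bits.
        diff = step >> 3
        if code & 4:
            diff += step
        if code & 2:
            diff += step >> 1
        if code & 1:
            diff += step >> 2
        # Bit 3 is the sign of the difference.
        predictor = predictor - diff if code & 8 else predictor + diff
        predictor = max(-32768, min(32767, predictor))
        # Adapt the step size: big codes grow it, small codes shrink it.
        index = max(0, min(88, index + INDEX_TABLE[code & 7]))
        out.append(predictor)
    return out
```

The adaptive step size is what the "A" in ADPCM refers to: each code both nudges the predicted sample and tells the decoder how large the next step should be.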
Dec 2 2003, 09:30
Post #14
MPEG4 AAC developer Group: Developer Posts: 398 Joined: 1-June 03 Member No.: 6943
QUOTE (rjamorim @ Oct 13 2003, 03:03 PM) Like jmvalin already posted, TwinVQ is not a speech codec.
TwinVQ may not qualify as a speech coder, but at 8 kbps with an 8 kHz sampling rate it can encode speech with background music, such as you often hear on radio broadcasts, very well. In fact, it would outperform many LPC-based speech codecs. However, the analysis window length of TwinVQ is about 4096 samples, and the need for window length switching caused unacceptable encoder delays, too long for real-time communications. For non-real-time applications, though, TwinVQ would even outperform AAC at those bitrates.
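To put the delay argument in numbers: an encoder has to buffer at least one full analysis window before it can produce output, so window length divided by sampling rate gives a lower bound on algorithmic delay (overlap and look-ahead only add to it). A small Python helper, using the ~4096-sample TwinVQ window mentioned above plus, for comparison, the usual 160-sample (20 ms) GSM-FR frame and the 5-sample vector of G.728 LD-CELP:

```python
def buffering_delay_ms(window_samples, rate_hz):
    """Minimum time (ms) needed just to collect one analysis window."""
    return 1000.0 * window_samples / rate_hz


if __name__ == "__main__":
    for name, samples in [("TwinVQ window", 4096),
                          ("GSM-FR frame", 160),
                          ("G.728 vector", 5)]:
        print(f"{name}: {buffering_delay_ms(samples, 8000)} ms at 8 kHz")
```

At 8 kHz that's about half a second for TwinVQ, against 20 ms for GSM-FR and 0.625 ms for G.728, which is why a window that long is a poor fit for two-way conversation.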
Jan 23 2004, 04:18
Post #15
Group: Members Posts: 3 Joined: 23-January 04 Member No.: 11452
Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?
|
Jan 24 2004, 08:07
Post #16
Xiph.org Speex developer Group: Developer Posts: 303 Joined: 21-August 02 Member No.: 3134
QUOTE (tepples @ Jan 22 2004, 10:18 PM) Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?
I can only give an idea for Speex. The amount of CPU required depends a lot on the sampling rate and bit-rate used. For 8 kHz/8 kbps, I can encode in real time on my Pentium M with about 2-3% CPU. With the fixed-point port, I've been able to do real-time encoding+decoding with about 50% CPU on a 140 MHz StrongARM.
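Figures like "2-3% CPU in real time" are easiest to read as a ratio: CPU time spent processing divided by the duration of the audio processed. A rough Python sketch of how one might measure that for any frame-based encoder; `process_frame` stands in for whatever encoder call is being timed (the `sum` placeholder below is hypothetical, not a real codec), and results depend heavily on the machine, sampling rate and bit-rate, as noted above.

```python
import time


def realtime_cpu_fraction(process_frame, frames, frame_seconds):
    """CPU seconds used / seconds of audio processed.
    0.025 means roughly 2.5% of one CPU; above 1.0 means slower than real time."""
    if not frames:
        raise ValueError("need at least one frame to time")
    start = time.process_time()
    for frame in frames:
        process_frame(frame)
    used = time.process_time() - start
    return used / (len(frames) * frame_seconds)


if __name__ == "__main__":
    # Placeholder "encoder": sums the samples of each 20 ms frame at 8 kHz.
    frames = [[0] * 160 for _ in range(500)]  # 10 seconds of silence
    print(f"{100 * realtime_cpu_fraction(sum, frames, 0.020):.3f}% CPU")
```

Using process time rather than wall-clock time keeps other programs on the machine from inflating the figure.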