Speex wideband mode: how does Speex break up WB/UWB speech?
audio_geek
post Jan 9 2006, 14:52
Post #1





Group: Members
Posts: 71
Joined: 17-August 05
From: Bangalore
Member No.: 23959



Hi,
Can someone tell me how Speex breaks up wideband and ultra-wideband speech into low and high bands?
I understand that it breaks WB speech (16 kHz) into two bands, i.e. 0-4 kHz (low) and 4-8 kHz (high).

Can somebody tell me how it breaks up UWB speech (32 kHz)?
Does it break UWB speech into three bands, i.e. 0-4 kHz (low), 4-8 kHz (high) and 8-16 kHz (high)?

Please reply ASAP.

Bye
audio_geek
post Jan 9 2006, 15:01
Post #2





If what I wrote above is true, then please let me know how the LSFs are quantized.
It quantizes the low-band LSFs with the lsp_quant_nb() function.
What about the two high bands?
Does it quantize both high bands with lsp_quant_high(), or are there two separate functions for the WB and UWB parts?
SebastianG
post Jan 9 2006, 15:14
Post #3





Group: Developer
Posts: 1317
Joined: 20-March 04
From: Göttingen (DE)
Member No.: 12875



Answer to the first post:

It's a two-channel, critically sampled subband filter bank using a QMF (quadrature mirror filter).
Roughly speaking: you filter the signal twice (in parallel, once with a low-pass filter and once with a high-pass filter) to get two signals, each containing half of the spectrum. The generalized Nyquist theorem tells us we only need a sampling rate of at least twice the signal's bandwidth (the band does not have to start at 0), so we can subsample the two filtered versions. For every two original samples we get one subband sample in each of the two bands.

Of course it's neither possible nor desirable to implement a "perfect brickwall" filter, so the subsampling introduces aliasing in both subbands. If special care has been taken during the filter design, this aliasing cancels itself almost perfectly during the inverse operation.

This is done twice on the 32 kHz signal:
1x 32 kHz signal ---split---> 2x 16 kHz signals ---split lower band---> 2x 8 kHz + 1x 16 kHz signal
(BTW: ATRAC3 does this three times.)
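The split-subsample-reconstruct idea can be sketched with the shortest possible filter pair, the 2-tap Haar QMF. (Speex's real QMF uses much longer kernels; the function names here are made up for illustration.)

```python
import math

SQRT2 = math.sqrt(2.0)

def qmf_analysis(x):
    """Split x (even length) into low and high subbands, each half length.
    The 2-tap Haar pair is the simplest orthogonal QMF."""
    low  = [(x[2*i] + x[2*i+1]) / SQRT2 for i in range(len(x) // 2)]
    high = [(x[2*i] - x[2*i+1]) / SQRT2 for i in range(len(x) // 2)]
    return low, high

def qmf_synthesis(low, high):
    """Inverse operation: recombine the two subbands to the full rate."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x

# A tiny 16 kHz 'wideband' frame of 8 samples -> two bands of 4 samples each
frame = [0.1, 0.5, -0.3, 0.2, 0.7, -0.1, 0.0, 0.4]
low, high = qmf_analysis(frame)
rec = qmf_synthesis(low, high)

# The aliasing introduced by the 2:1 subsampling cancels on synthesis
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(frame, rec))
```

For this 2-tap pair the cancellation is exact; longer linear-phase QMF kernels only approximate it, as described above.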

BTW2: There are different classes of filters with different properties, though they're usually not easy to design:
QMF -> near-orthogonal mapping, linear-phase filters, no perfect reconstruction possible (though the error can be made as small as you want at the cost of longer filter kernels)
orthogonal wavelets -> orthogonal mapping, non-linear-phase filters, perfect reconstruction, nice amplitude responses for short filters
bi-orthogonal wavelets -> NOT orthogonal, linear-phase filters, perfect reconstruction

(Apparently you can't have all the goodies at once.)


Sebi

jmvalin
post Jan 13 2006, 00:12
Post #4


Xiph.org Speex developer


Group: Developer
Posts: 473
Joined: 21-August 02
Member No.: 3134



QUOTE (audio_geek @ Jan 9 2006, 10:52 PM)
Can somebody tell me how it breaks up UWB speech (32 kHz)?
Does it break UWB speech into three bands, i.e. 0-4 kHz (low), 4-8 kHz (high) and 8-16 kHz (high)?


For ultra-wideband (0-16 kHz, 32 kHz sampling), I first split the band into wideband (0-8 kHz) and a "very high band" (8-16 kHz). Then the wideband part is itself split into a low (0-4 kHz) and a high (4-8 kHz) band, so there are three bands in total, encoded separately. In practice, only the rough spectral shape of the very high band (8-16 kHz) is encoded, at 1.8 kbps.
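The two-stage tree split can be sketched as follows, using a toy 2-tap Haar pair as a stand-in for Speex's real QMF (the names and signal are illustrative only):

```python
import math

def qmf_split(x):
    """One critically sampled two-band split (toy 2-tap Haar pair)."""
    s = math.sqrt(2.0)
    low  = [(x[2*i] + x[2*i+1]) / s for i in range(len(x) // 2)]
    high = [(x[2*i] - x[2*i+1]) / s for i in range(len(x) // 2)]
    return low, high

# 20 ms at 32 kHz sampling = 640 samples of ultra-wideband input
uwb = [math.sin(0.01 * n) for n in range(640)]

# First split: wideband (0-8 kHz) and "very high band" (8-16 kHz)
wideband, very_high = qmf_split(uwb)

# Second split, applied to the wideband part only: low (0-4 kHz) and high (4-8 kHz)
low, high = qmf_split(wideband)

# Three bands, still critically sampled overall: 160 + 160 + 320 = 640 samples
assert len(low) == len(high) == 160 and len(very_high) == 320
```

Note the split is applied only to the lower branch at each stage, so the tree is unbalanced: the very high band keeps half the total samples but, as stated above, gets only a coarse spectral-shape encoding.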
audio_geek
post Jan 13 2006, 08:30
Post #5





Basically I wanted to know how Speex breaks up WB/UWB speech, and you answered it well.

Now I have some more questions:
1) I want to play around with the LPC order of the encoder. I tried encoding and decoding audio (44.1 kHz, 16 bits) with Speex and it gives quite a good waveform on reconstruction (mind you, I am interested in the waveform's closeness to the original waveform), so I want to change the order and see the results. Please tell me how I can change the LPC order and do some testing.

2) Please describe how I can change the frame size for Speex. Also tell me how it should be changed to get a better waveform for CD-quality audio (I know that Speex is not optimized for music).

3) What I understand is that Speex uses LPC order 8 for the high bands (for WB/UWB). Is that true? Why doesn't it use the same order, i.e. 10?
SebastianG
post Jan 13 2006, 16:24
Post #6





What makes one 'waveform' better than another compared to the original 'waveform'? You know, Speex uses an error weighting that's supposed to sound nice. How do you measure 'waveform closeness'?

Changing LPC orders also requires training/designing new codebooks and accounting for them in the encoder/decoder!

Why should Speex use order-10 LPC filters for the higher bands? The spectral envelope doesn't need to be that accurate (in terms of frequency resolution) at higher frequencies, and 8 seems like an okay choice, if not a bit too high -- 6 may also work fine.

Sebi
audio_geek
post Jan 23 2006, 06:53
Post #7





The higher order I was talking about is with respect to audio signals. You might not need it for speech, but for general audio I think it would be helpful.

Regarding closeness of the waveform, what I mean is a smaller error/residual signal, which I then encode with entropy coding to get a losslessly compressed bitstream.

Bye
SebastianG
post Jan 23 2006, 09:19
Post #8





So that's what you're trying to achieve? Let me tell you why this is a bad idea:

- You need to ensure that your Speex decoder behaves deterministically on every platform you want to support, which is really hard work. (Different implementations of the Speex decoder probably produce slightly different outputs. The floating-point decoder gives you the best approximation of the original, I suppose; the integer decoder works deterministically. If you don't account for this, the whole thing won't be lossless.)

- Encoding a signal as the sum of two signals (quantized + error) is usually a bad idea. Liebchen (the main guy behind MPEG-4 ALS) tried that in his diploma thesis (LTAC). I wonder why he ditched it and worked on LPAC afterwards ... ;-)

- I really see no advantage of your idea over the standard FLAC/WavPack/MPEG-4 ALS/LPAC/... approach. These also use short-term decorrelation filters like Speex and code the prediction residual losslessly (except for WavPack hybrid mode).

- Regarding "waveform closeness": obviously there's a trade-off between closeness and bitrate, but it's also worth thinking about the color of the quantization noise you are willing to allow. That's what I meant by weighting. The reason Speex sounds good is that it uses a special error weighting. But that weighting will also make your error/residual signal correlated, which you should account for/exploit. Simply entropy-coding the residual's samples as if they came from a memoryless source won't do the job.
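The last point can be illustrated with a toy measurement: for a correlated residual, even the simplest model with memory (first-order differencing) beats treating the samples as a memoryless source. (A sketch only; the signal and helper name are made up and this is not Speex code.)

```python
import math
from collections import Counter

def entropy_bits_per_sample(symbols):
    """Empirical memoryless (order-0) entropy of a symbol sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A strongly correlated integer 'residual' (slowly varying)
residual = [round(10 * math.sin(0.05 * n)) for n in range(1000)]

# First-order differences: the simplest way to exploit that correlation
diffs = [b - a for a, b in zip(residual, residual[1:])]

# Coding the differences takes fewer bits per sample than coding the
# samples directly as if they came from a memoryless source
assert entropy_bits_per_sample(diffs) < entropy_bits_per_sample(residual)
```

Real lossless coders go further than plain differencing (adaptive LPC plus Rice/arithmetic coding), but the principle is the same: model the memory first, then entropy-code what's left.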


Sebi

SebastianG
post Jan 26 2006, 14:10
Post #9





QUOTE (audio_geek @ Jan 23 2006, 06:53 AM)
The higher order I was talking about is with respect to audio signals. You might not need it for speech, but for general audio I think it would be helpful.

By the way:
Higher orders also come with the risk of singularities/instabilities due to finite-precision arithmetic; LPC analysis may be very badly conditioned for large orders. IMHO, LPC analysis/synthesis should stay where it's currently used --- for narrowband speech (or temporal noise shaping, which also uses low-order filters).

Sebi
