Full Version: General Properties of Audio
audio_geek
Hi, can someone answer the following questions, or suggest some links where I can find the answers?
1. In speech coding we typically use frames of around 20 ms for prediction. What is a suitable frame length (in time) for 44.1 kHz audio? (Hereafter "audio" refers to 44.1 kHz, 16-bit, CD-quality audio.)
2. Unlike speech, audio contains more than one frequency at a time, so how do we predict pitch for it? Do we need to use a very small frame length over which the pitch does not change?
3. Speech follows a characteristic spectral pattern: a plot of the speech spectral envelope shows 3 peaks of decreasing magnitude. Does audio follow any such pattern that helps in prediction using FIR linear prediction?
4. In LPC we model the vocal tract as part of speech source modeling. Do we model the tube of the musical instrument when we use an LPC model for audio?
5. What other properties of audio are used in linear prediction models for music?

P.S.: I understand that most audio coders use DCT/MDCT-based algorithms, but I am interested in linear prediction of audio, which is used (at least) in lossless audio coding, as in FLAC.

---
audio_geek
I know that the questions I asked are very basic, so you could also just suggest links/papers/tutorials for this.

---
Sunhillow
Hi,

I think there are some flaws.
Of course there are more than 3 spectral lines, and they do not necessarily decrease in amplitude. The envelope (look up "formant" in Wikipedia) is controlled by all parts of the mouth, especially the tongue, lips and jaw.
Unvoiced sounds like "s" are more like pink noise going through the formant filter of the mouth.
audio_geek
Yes, it's true that there are more than 3 spectral lines. I think it's 3 lines below 3 kHz.
Anyway,
I am interested to know more about spectral analysis of audio signals. What are the properties of an audio signal that enable us to use LPC on it as well?
Do we model the source there too? I mean the instrument that is used for the music?
Isn't more than one instrument being used at a time? So how exactly do we justify using LPC for audio signals?
SebastianG
Recall that in LPC you usually do something like this:

encoder:
predicted = sample_{n-1} * lpc1 + sample_{n-2} * lpc2 + ...
residual = sample_{n} - predicted
encode(residual)

decoder:
predicted = sample_{n-1} * lpc1 + sample_{n-2} * lpc2 + ...
residual = decode()
sample_{n} = predicted + residual
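
The encoder/decoder loops above can be sketched as runnable Python. This is only an illustration: the function names, the test signal and the fixed coefficients `[2, -1]` are assumptions chosen for the example, not something taken from a particular codec. With integer coefficients and integer samples the round trip is exact, which is the property lossless coding relies on.

```python
def lpc_encode(samples, coeffs):
    """Return residuals: each sample minus its prediction from earlier samples."""
    residual = []
    for n, s in enumerate(samples):
        # predicted = sample[n-1]*lpc1 + sample[n-2]*lpc2 + ...
        # (samples before the start of the signal are treated as zero)
        predicted = sum(c * samples[n - 1 - k]
                        for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(s - predicted)
    return residual


def lpc_decode(residual, coeffs):
    """Rebuild the samples by adding each residual to the same prediction."""
    samples = []
    for n, r in enumerate(residual):
        predicted = sum(c * samples[n - 1 - k]
                        for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        samples.append(predicted + r)
    return samples


# Hypothetical smooth test signal and a fixed order-2 predictor
# (predicted = 2*sample[n-1] - sample[n-2], i.e. linear extrapolation).
signal = [0, 3, 6, 9, 12, 14, 15, 15, 14, 12]
coeffs = [2, -1]

residual = lpc_encode(signal, coeffs)
decoded = lpc_decode(residual, coeffs)

print(residual)            # small values compared with the signal itself
print(decoded == signal)   # the decoder recreates the input exactly
```

The residual values are much smaller than the samples, so an entropy coder needs fewer bits for them.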

Now, realize that the encoder practically applies an FIR filter to the signal to produce the residual, and the decoder applies an all-pole IIR filter to the residual to recreate your original.

"Analysis" (encoder):
A(z) = 1 - lpc1*z^{-1} - lpc2*z^{-2} - ...

"Synthesis" (decoder):
S(z) = 1 / A(z)
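
To make the FIR/IIR distinction concrete, here is a small sketch (the helper names and the single coefficient 0.5 are assumptions for illustration). The synthesis filter S(z) feeds back its own past outputs, so its impulse response never ends; running that response through A(z) gives back the unit impulse, which is what S(z) = 1 / A(z) means in the time domain.

```python
def analysis_filter(x, lpc):
    """FIR filter A(z) = 1 - lpc[0]*z^-1 - lpc[1]*z^-2 - ... (uses past inputs only)."""
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * x[n - k]
        y.append(acc)
    return y


def synthesis_filter(e, lpc):
    """All-pole IIR filter S(z) = 1 / A(z) (feeds back its own past outputs)."""
    y = []
    for n in range(len(e)):
        acc = e[n]
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 7
lpc = [0.5]  # hypothetical one-tap predictor

h_synth = synthesis_filter(impulse, lpc)  # decays geometrically, never reaches zero
h_back = analysis_filter(h_synth, lpc)    # A(z) undoes S(z): unit impulse again

print(h_synth)
print(h_back)
```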

Usually the LPC analysis filter's goal is to reduce the residual's energy. In practice the analysis filter reduces the inter-sample correlation so that the residual is orthogonal to the prediction. Recall that a signal with uncorrelated samples is "white" (flat power distribution in the frequency domain).

What does this mean now?
The LPC analysis filter flattens the PSD (power spectral density) of your signal ("whitening"), while the synthesis filter restores the original PSD ("coloring" the residual).
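
A rough way to see the whitening effect without computing a spectrum is to compare the correlation between neighbouring samples before and after the analysis filter. The sketch below is an assumption-laden toy: the test signal is synthetic first-order autoregressive noise, the predictor has a single coefficient fitted by least squares, and all names are made up for the example.

```python
import random

random.seed(1)
N = 5000

# Hypothetical "colored" test signal: first-order autoregressive noise.
x = [0.0]
for _ in range(N - 1):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))

# Least-squares order-1 predictor: lpc1 = sum(x[n]*x[n-1]) / sum(x[n-1]^2).
r0 = sum(x[n - 1] ** 2 for n in range(1, N))
r1 = sum(x[n] * x[n - 1] for n in range(1, N))
lpc1 = r1 / r0

# Analysis filter A(z) = 1 - lpc1*z^-1 applied to the signal.
residual = [x[n] - lpc1 * x[n - 1] for n in range(1, N)]


def lag1_correlation(s):
    """Normalized correlation between neighbouring samples."""
    num = sum(s[n] * s[n - 1] for n in range(1, len(s)))
    den = sum(v * v for v in s)
    return num / den


def energy(s):
    """Sum of squared samples."""
    return sum(v * v for v in s)


signal_corr = lag1_correlation(x)
residual_corr = lag1_correlation(residual)

print("lag-1 correlation, signal:  ", signal_corr)
print("lag-1 correlation, residual:", residual_corr)
print("energy ratio residual/signal:", energy(residual) / energy(x[1:]))
```

The signal's neighbouring samples are strongly correlated, while the residual's are close to uncorrelated and carry less energy.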

Application for speech coding:
Any voiced speech segment can be modeled by a filtered buzzer (pulse train). The buzzer's harmonics all have the same amplitude before the filtering. The filter (LPC synthesis) is used to color the signal so it'll sound like a vowel of your choice. Non-voiced sounds can be modeled by white noise that passes through a filter. In either case you can calculate the filter via an LPC analysis. Check data-compression.com/speech.shtml
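
The buzzer-plus-filter model can be sketched as follows. Everything specific here is an assumption for illustration: the 8 kHz sample rate, the 100 Hz pitch, and a single two-pole resonance near 700 Hz standing in for a real vocal-tract filter (which would normally have around ten or more coefficients obtained from LPC analysis).

```python
import math
import random


def synthesize(excitation, lpc):
    """All-pole LPC synthesis filter: out[n] = e[n] + lpc1*out[n-1] + lpc2*out[n-2] + ..."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


SAMPLE_RATE = 8000   # Hz, assumed
PITCH_PERIOD = 80    # samples, i.e. a 100 Hz buzzer

# Hypothetical single resonance ("formant") near 700 Hz with pole radius 0.9.
radius = 0.9
theta = 2 * math.pi * 700 / SAMPLE_RATE
lpc = [2 * radius * math.cos(theta), -radius * radius]

# Voiced: pulse train (flat harmonics) colored by the synthesis filter.
buzz = [1.0 if n % PITCH_PERIOD == 0 else 0.0 for n in range(800)]
voiced = synthesize(buzz, lpc)

# Unvoiced: white noise through the same filter.
random.seed(0)
hiss = [random.gauss(0.0, 1.0) for _ in range(800)]
unvoiced = synthesize(hiss, lpc)

# After the start-up transient, the voiced output repeats every pitch period.
drift = max(abs(voiced[n] - voiced[n - PITCH_PERIOD]) for n in range(400, 800))
print(drift)
```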


HTH,
Seb
audio_geek
QUOTE (SebastianG @ May 17 2006, 15:16) *
Application for speech coding:
Any voiced speech segment can be modeled by a filtered buzzer (pulse train). The buzzer's harmonics all have the same amplitude before the filtering. The filter (LPC synthesis) is used to color the signal so it'll sound like a vowel of your choice. Non-voiced sounds can be modeled by white noise that passes through a filter. In either case you can calculate the filter via an LPC analysis. Check data-compression.com/speech.shtml


HTH,
Seb


I totally agree with you and understand what you have said. My question is whether an audio signal (i.e. a CD-quality music signal and NOT speech) also follows the same principle as speech. My doubt is that music is produced with instruments, and many instruments at a time, each having a different pitch. So, will the spectral distribution of music be similar to that of speech?
How do you have a unique pitch at a given point in time in music? Or do you take a small frame and assume that the pitch is invariant for a very short duration?
Please reply asap.

---
SebastianG
QUOTE (audio_geek @ May 21 2006, 00:35) *
I totally agree with you and understand what you have said. My question is whether an audio signal (i.e. a CD-quality music signal and NOT speech) also follows the same principle as speech.

The paragraph about parametric speech coding was just an example. LPC analysis (whitening) and synthesis (coloring) do work on music as well, of course.

QUOTE (audio_geek @ May 21 2006, 00:35) *
My doubt is that music is produced with instruments, and many instruments at a time, each having a different pitch. So, will the spectral distribution of music be similar to that of speech?

LPC still works in this case, but pitch prediction (as it's done in Speex) doesn't, because the residual -- the sum of many periodic signals (at different frequencies) -- won't usually be periodic and thus isn't easily predictable in the time domain. (Big surprise!)
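
A toy illustration of why a single pitch lag stops working for mixtures. Note the simplifications: the search below is a generic one-tap long-term predictor (not Speex's actual implementation), it runs on the signals directly rather than on an LPC residual, and the periods 100 and 137 samples and the 50..250 lag range are arbitrary assumptions. The mixture is strictly periodic only at lag 13700, far outside any practical pitch search range.

```python
import math


def best_pitch_lag(signal, min_lag, max_lag):
    """Return (lag, normalized correlation) for the best one-tap long-term predictor."""
    best = (min_lag, -1.0)
    cur = signal[max_lag:]  # fixed analysis window for every candidate lag
    for lag in range(min_lag, max_lag + 1):
        past = signal[max_lag - lag:len(signal) - lag]  # same window, delayed by lag
        num = sum(a * b for a, b in zip(cur, past))
        den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
        corr = num / den
        if corr > best[1]:
            best = (lag, corr)
    return best


N = 2000
tone_a = [math.sin(2 * math.pi * n / 100) for n in range(N)]  # period 100 samples
tone_b = [math.sin(2 * math.pi * n / 137) for n in range(N)]  # period 137 samples
mixture = [a + b for a, b in zip(tone_a, tone_b)]

single_lag, single_corr = best_pitch_lag(tone_a, 50, 250)
mix_lag, mix_corr = best_pitch_lag(mixture, 50, 250)

print("single tone:", single_lag, single_corr)  # a lag matching the period predicts almost perfectly
print("mixture:    ", mix_lag, mix_corr)        # no lag in range predicts the mixture well
```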

Sebi
audio_geek
QUOTE (SebastianG @ May 22 2006, 13:00) *

LPC still works in this case, but pitch prediction (as it's done in Speex) doesn't, because the residual -- the sum of many periodic signals (at different frequencies) -- won't usually be periodic and thus isn't easily predictable in the time domain. (Big surprise!)

Sebi


If LP analysis does not work well for audio, how does lossless audio coding work? The first step of lossless audio coding is LP analysis. It produces a residual signal, which is then entropy encoded.
SebastianG
QUOTE (audio_geek @ May 22 2006, 18:17) *
If LP analysis does not work well for audio, how does lossless audio coding work?


You got me wrong.

Sebi