Full Version: Algorithm used in vocal removers?
Hydrogenaudio Forums > Hydrogenaudio Forum > General Audio
Joe Bloggs
I know the programs and plugins you can use for vocal removal; what I want to know is how they work their magic.

Actually, what I have in mind is the inverse of vocal removal, that is, taking away everything except the voice in the middle. I may be undertaking an undergraduate final-year project that aims to recreate in machines the abilities of human auditory attention: being able to pick up and follow a discussion even against a background full of noise and other conversations that you've decided you don't want to hear.

What I have in mind is to phase-shift and amplify the L/R audio channels for each speaker so that each speaker is effectively speaking from centre stage on its own channel, then apply a 'vocal retainment' algorithm to remove everything except the voice in the 'centre'.
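
A rough sketch of the "move one speaker to centre stage" step, assuming a single dominant speaker, an integer-sample delay between the channels, and silence at both ends of the clip. The name `steer_to_centre` is made up for illustration, and cross-correlation is only one possible way to estimate the delay; nothing in this thread says an existing tool works this way.

```python
import numpy as np

def steer_to_centre(left, right):
    """Delay and scale `right` so one dominant source lines up with `left`.

    Assumptions (not from the thread): equal-length mono arrays, an
    integer-sample delay, and silent edges so a circular shift is harmless.
    """
    # Cross-correlate; the peak position gives the lag of `right` behind `left`.
    corr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(corr)) - (len(left) - 1)  # > 0 means right is late
    aligned = np.roll(right, -lag)                # undo the delay (circular)
    # Match the overall level of the two channels.
    gain = np.sqrt(np.sum(left ** 2) / np.sum(aligned ** 2))
    return left, gain * aligned, lag
```

After this step the chosen speaker is (ideally) identical in both channels, which is the condition any centre-keeping step would rely on.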
pop3smtp23
I would recommend using an FFT and then band-passing the vocal band. You could also record the background noise beforehand and subtract it from the signal to make the band-pass work better.
In a few words: Get Background -> FFT -> Original.sub(FFT) -> Vocal.
If you can record the noise really well, you could also use an IIR filter algorithm. (Theory: treat the vocal as the original signal and the music as noise applied to it. Then do exactly the same thing in reverse, trying to interpolate the noise signal with a Taylor interpolation.)
For a quick and easy effect, though, I would just band-pass the vocal band. Unfortunately the vocal will come out distorted this way; for better, more "exact" results, you should use the IIR approach.
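
A minimal sketch of the two FFT ideas above, under some loud assumptions: the 300 to 3400 Hz "vocal band" is just the classic telephone band picked for illustration, the whole clip is transformed in one go rather than frame by frame, and both function names are hypothetical. The IIR approach is not shown.

```python
import numpy as np

def fft_bandpass(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every FFT bin outside [low_hz, high_hz], then transform back.

    The default band edges are an assumption, not a value from the thread.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))

def subtract_noise_spectrum(signal, noise):
    """Crude spectral subtraction using a separately recorded noise clip.

    Subtracts the noise magnitude from the signal magnitude per bin
    (clamped at zero) and keeps the phase of the noisy signal.
    """
    spectrum = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise, n=len(signal)))
    clean_mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    return np.fft.irfft(clean_mag * np.exp(1j * np.angle(spectrum)), n=len(signal))
```

Both functions only help when the unwanted sound sits outside the vocal band or matches the recorded noise; music that overlaps the voice in frequency passes straight through.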
madah
"Voice Removal" in Cool Edit or a Winamp DSP plugin removes all sound that is the same in both channels. Inverting one channel (a 180-degree phase shift) and mixing it with the other accomplishes that.

I'm also interested in "Keep Voice Only" because I've tried many times to do it with Cool Edit, with no success...
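
The centre-cancellation trick described above can be sketched in a few lines. The signals here are synthetic stand-ins (a "voice" identical in both channels and a "guitar" panned hard left), and `remove_centre` is an illustrative name, not a function from any of the programs mentioned.

```python
import numpy as np

def remove_centre(left, right):
    """Invert one channel and mix it with the other.

    Anything identical in both channels cancels; the 0.5 is only a level
    choice for this sketch.
    """
    return 0.5 * (left - right)

# Assumed test signals: one second at 48 kHz.
t = np.arange(48000) / 48000.0
voice = np.sin(2 * np.pi * 220.0 * t)    # centred: same in both channels
guitar = np.sin(2 * np.pi * 330.0 * t)   # panned hard left
left = voice + guitar
right = voice.copy()

karaoke = remove_centre(left, right)     # voice cancels, guitar remains
```

Note that the result is mono: the stereo image of whatever survives is lost.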
Pio2001
Removing the voice inverts one channel and mixes it with the other.
To keep the voice only, you should subtract the voiceless version (mono) from the original version converted to mono. Doesn't it work?
Joe Bloggs
So simple??? This might be easier than I thought...

What about what pop3smtp23 said? I suppose that's the kind of stuff you have to do if you only have one channel to work with?
Joe Bloggs
QUOTE
To keep the voice only, you should subtract the voiceless version (mono) from the original version converted to mono. Doesn't it work?


How the heck do you downmix the voiceless version into mono??

The left channel is L-R
The right channel is R-L
mono would be L-R+R-L = nothing
experimentally proven :(

I guess it's not so simple after all...
daniel
In an editing program:
M/S matrix (ch1 = r+l, ch2 = r-l)
Remove the side channel
M/S matrix (ch1 = (r+l)+0, ch2 = (r+l)-0)
You get only centre-stage sounds.
Is this correct?
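
A small sketch of the M/S steps above, for anyone who wants to try them on arrays. The 0.5 scaling and the l-r sign for the side channel are conventions chosen here so that the round trip is exact; the function names are illustrative.

```python
import numpy as np

def ms_encode(left, right):
    """L/R -> mid/side."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """mid/side -> L/R."""
    return mid + side, mid - side

def keep_mid_only(left, right):
    """Encode to M/S, zero the side channel, decode back to L/R."""
    mid, _ = ms_encode(left, right)
    return ms_decode(mid, np.zeros_like(mid))
```

With the side channel zeroed, both outputs equal (l+r)/2, i.e. the plain mono downmix. Centred sounds come through at full level, but anything panned to one side is still there at half level, so this narrows the image rather than isolating the centre.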
Pio2001
QUOTE
Originally posted by Joe Bloggs


How the heck do you downmix the voiceless version into mono??

The left channel is L-R
The right channel is R-L
mono would be L-R+R-L = nothing
experimentally proven :(

I guess it's not so simple after all...


Well, then it's just mono with inverted phase; invert one channel to get pure mono.
In practice, just keep one of the two channels to work with. They are the same; only the sign is inverted.
Joe Bloggs
OK, suppose you just take one channel: L-R or R-L.

The original downmixed to mono would be L+R.
Result:
L+R-(L-R) = 2R or L+R-(R-L) = 2L

Don't see how that would do anything...
Pio2001
No, because the mono is L/2+R/2. But it doesn't work: I realized that in the version without voice, R and L are mixed with opposite signs. If one is cancelled, the other can't be.
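
A quick numerical check of why the subtraction fails, using random arrays as stand-ins for a centred voice and for material unique to each side (all variable names are made up for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
centre = rng.standard_normal(1000)       # stands in for the voice
left_only = rng.standard_normal(1000)    # material only in the left channel
right_only = rng.standard_normal(1000)   # material only in the right channel

left = centre + left_only
right = centre + right_only

mono = 0.5 * (left + right)        # L/2 + R/2
voiceless = 0.5 * (left - right)   # the "vocal removed" signal

difference = mono - voiceless      # works out to the original right channel
total = mono + voiceless           # works out to the original left channel
```

Whichever sign is used, the result is just one of the original channels, voice and accompaniment together, which matches the 2R and 2L results worked out earlier in the thread up to a scale factor.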

Well, just use a Dolby Surround decoder and keep the centre channel, then! ;)