It was simple, really (why things don't work, that is)
To remove vocals, you put L-R in the left channel and R-L in the right channel (where -L and -R denote the inverted waveforms of the left and right channels). Anything mixed equally into both channels, which is usually the lead vocal, cancels out.
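Here is a quick check of this on a few made-up samples. It is a minimal Python sketch that assumes each channel is a plain list of sample values. remove_vocals and the toy "vocal"/"guitar" signals are hypothetical names used only for illustration.

```python
def remove_vocals(left, right):
    """Center-channel cancellation: left becomes L-R, right becomes R-L.

    Anything mixed identically into both channels (typically the lead
    vocal) cancels. Anything panned off-center survives.
    """
    new_left = [l - r for l, r in zip(left, right)]
    new_right = [r - l for l, r in zip(left, right)]
    return new_left, new_right


# Toy stereo mix: a "vocal" that is identical in both channels, plus an
# "instrument" that is only in the left channel.
vocal = [0.5, -0.25, 0.75, 0.0]
guitar = [0.125, 0.25, -0.5, 1.0]
left = [v + g for v, g in zip(vocal, guitar)]
right = vocal[:]

out_left, out_right = remove_vocals(left, right)
print(out_left)   # [0.125, 0.25, -0.5, 1.0]   the guitar, vocal cancelled
print(out_right)  # [-0.125, -0.25, 0.5, -1.0] the inverted guitar
```

The vocal is gone, but the "vocal-removed" right channel is only the left-side instrument turned upside down. That is where the trouble below comes from.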
If you subtract that from the original recording, you get
left channel L-(L-R) = R
right channel R-(R-L) = L
which is just the original recording with the left and right channels swapped
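The same arithmetic on two short made-up channels looks like this. It is a minimal sketch that assumes channels are plain lists of samples. remove_vocals and subtract_from_original are hypothetical helper names.

```python
def remove_vocals(left, right):
    # Left output is L-R, right output is R-L.
    return ([l - r for l, r in zip(left, right)],
            [r - l for l, r in zip(left, right)])


def subtract_from_original(left, right):
    """Original minus vocal-removed, channel by channel:
    left:  L - (L - R) = R
    right: R - (R - L) = L
    """
    removed_left, removed_right = remove_vocals(left, right)
    return ([l - x for l, x in zip(left, removed_left)],
            [r - x for r, x in zip(right, removed_right)])


left = [1, 2, 3]
right = [10, 20, 30]
print(subtract_from_original(left, right))  # ([10, 20, 30], [1, 2, 3])
```

The subtraction just hands back each channel in the other channel's slot. You get nothing that looks like an isolated vocal.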
If you try downmixing the vocal-removed track to mono, you get
(L-R) + (R-L) = nothing!
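The mono downmix can be checked the same way. This sketch uses the same assumptions: hypothetical helper names, channels as lists of samples, and the downmix taken as a plain L+R sum.

```python
def remove_vocals(left, right):
    # Left output is L-R, right output is R-L.
    return ([l - r for l, r in zip(left, right)],
            [r - l for l, r in zip(left, right)])


def downmix_to_mono(left, right):
    # Plain sum of the two channels. Many tools halve this, but the
    # scale factor does not change the result here.
    return [l + r for l, r in zip(left, right)]


left = [3, -1, 4, 1]
right = [5, 9, -2, 6]
print(downmix_to_mono(*remove_vocals(left, right)))  # [0, 0, 0, 0]
```

The two channels are exact negatives of each other, so they cancel sample for sample whatever the original audio was.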
It seems that pop3smtp23 saw through the problem beforehand and started talking about frequency domain analysis, just as the professor did...
I guess I'm just being impatient. I'm meeting the professor tomorrow, and he'll probably tell me all about how he intends to go about it, but I'd like to hear some ideas from people here, if it's not too off-topic...
Of course, if you are trying to pick one voice out of many voices, you can't use the 'record the noise' trick...
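For contrast, here is roughly what the "record the noise" trick does when it does apply. I'm assuming it means spectral subtraction: take the spectrum of a noise-only recording and subtract its magnitude bin by bin from the noisy signal's spectrum, keeping the noisy phase. The sketch is minimal and single-frame. It uses a naive pure-Python DFT, with no windowing or overlapping frames (real tools work on short overlapping frames). It also idealizes the noise sample as exactly the noise in the recording. dft, idft and spectral_subtract are hypothetical helper names.

```python
import cmath
import math


def dft(x):
    # Naive O(n^2) discrete Fourier transform.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    # Inverse of dft() above.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(noisy, noise_sample):
    """Subtract the noise magnitude spectrum bin by bin, keep the noisy phase."""
    X = dft(noisy)
    N = dft(noise_sample)
    cleaned = []
    for xk, nk in zip(X, N):
        mag = max(abs(xk) - abs(nk), 0.0)  # clamp: no negative magnitudes
        cleaned.append(cmath.rect(mag, cmath.phase(xk)))
    return [v.real for v in idft(cleaned)]


n = 8
tone = [math.cos(2 * math.pi * t / n) for t in range(n)]  # the wanted signal
hum = [(-1.0) ** t for t in range(n)]                      # steady "noise"
noisy = [a + b for a, b in zip(tone, hum)]
restored = spectral_subtract(noisy, hum)
```

In this toy case the tone and the hum occupy different frequency bins, so the subtraction recovers the tone almost exactly. With many voices there is no noise-only recording to subtract. The other voices also change constantly and overlap the wanted voice in frequency, so a fixed noise spectrum can't describe them.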