tangent:
For audio coding, I suspect transform coding is here to stay. Improvements are always on the horizon, but I guess the same applies to subband coding.
fragtal:
I read it in the Vorbis specs. It says that Vorbis I uses MDCT and Vorbis II will use wavelets.
AutumnRain:
I'm not much of an audio coding person, since I come from an image coding background, so I'm only talking about this from a general signal processing perspective. Fourier-based transform coding represents your signal as a sum of continuous waves such as sines and cosines. They stretch forever in time but each has one definite frequency (sort of like a tone or pitch). Hence transform coders are good at accurately representing the frequencies in your music, since a sinusoid has very compact support in the frequency domain.
But because sine and cosine waves stretch forever in time, they suck a bit at representing transients in music, like sharp attacks. I mean, if you have some silence and then a few sharp attacks at specific moments, a transform coder (theoretically) needs a tremendously large amount of information to reconstruct each sharp attack accurately at its specific time. When the high-frequency coefficients are quantised coarsely, the resulting errors get smeared across the entire frame, and most of you would know this as 'pre-echo'. To reduce this 'inability' of a transform coder, we shrink the frame size to make the signal more local in time, which corresponds to an increase in bitrate, so to speak.
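To make the pre-echo argument concrete, here is a rough sketch, not how any real codec does it. It assumes a plain orthonormal DCT-II on non-overlapping frames (real coders such as Vorbis use an overlapping MDCT), a uniform quantiser, and a made-up test signal of silence followed by a single click. The names `code_in_frames`, `step` and the frame lengths are all hypothetical choices for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for t in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out

def quantise(coeffs, step):
    """Uniform quantiser: snap every coefficient to a multiple of step."""
    return [step * round(v / step) for v in coeffs]

def code_in_frames(x, frame_len, step):
    """Transform, quantise and reconstruct each non-overlapping frame."""
    y = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        y.extend(idct(quantise(dct(frame), step)))
    return y

# Assumed test signal: 48 samples of silence, then one sharp click.
signal = [0.0] * 64
signal[48] = 1.0
step = 0.1  # deliberately coarse quantiser step

long_frame = code_in_frames(signal, 64, step)    # one 64-sample frame
short_frames = code_in_frames(signal, 16, step)  # four 16-sample frames

# Largest reconstructed value in the part that should be pure silence.
pre_echo_long = max(abs(v) for v in long_frame[:48])
pre_echo_short = max(abs(v) for v in short_frames[:48])

print("pre-echo with one long frame:", pre_echo_long)
print("pre-echo with short frames:  ", pre_echo_short)
```

With the long frame, the quantisation error leaks into the silence before the click. With the short frames, the three frames before the click contain nothing but zeros, so they come back as exact silence and the damage stays inside the frame that holds the click. The sketch does not model the price of short frames, which is coarser frequency resolution and more side information.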
With wavelet coders (I'll refrain from speaking from the subband coding perspective), the idea is to represent your signal (audio) as a sum of weighted basis signals which are compact in time (not extending forever like a sine wave) as well as in frequency. They are sort of an 'in-between' method, I guess. The low-frequency part of the audio (the trends) is decomposed into weighted scaling functions, while the high-frequency residue (the transients) is represented by weighted wavelet functions. When you look at a wavelet, you notice just how compact it is in time, which is why wavelets are so well suited to representing sharp transients accurately. As Martin Vetterli said in a recent lecture of his, wavelets function as 'singularity detectors'.

Hence less information is needed to represent that sharp peak, and less information (fewer wavelet coefficients) means a lower bitrate for the same fidelity. This is the single biggest advantage of subband/wavelet coders over transform coders, which is why I came to the view that Vorbis will never be as good as MPC. Even though you can add intelligent transient detection to Vorbis to reduce the frame size and code transients better, what you gain in fidelity you lose in bitrate, which is why Vorbis files blow out on tracks with lots of sharp transients. So even if Vorbis matches MPC in quality, which is very likely, it will surely lose out on file size. Sadly, it is an intrinsic property of transform coding.
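The coefficient-count argument can be checked on a toy signal. This is only a sketch under assumptions that are not from the post: it uses the Haar wavelet (the simplest one, chosen here only because it fits in a few lines), an orthonormal DCT-II as a stand-in for the transform coder, and the same hypothetical silence-plus-click signal. The helper `count_significant` and its tolerance are made up for illustration.

```python
import math

def haar(x):
    """Full multi-level orthonormal Haar transform (len(x) must be a power of 2).

    Returns the single coarsest scaling coefficient followed by the
    wavelet (detail) coefficients, ordered from coarse to fine.
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        s = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]  # trend
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]  # detail
        details = d + details
        approx = s
    return approx + details

def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def count_significant(coeffs, tol=1e-9):
    """Number of coefficients that are not (numerically) zero."""
    return sum(1 for v in coeffs if abs(v) > tol)

# Assumed test signal: 48 samples of silence, then one sharp click.
signal = [0.0] * 64
signal[48] = 1.0

haar_count = count_significant(haar(signal))
dct_count = count_significant(dct(signal))

print("non-zero Haar coefficients:", haar_count)
print("non-zero DCT coefficients: ", dct_count)
```

For this click the Haar transform needs only 7 non-zero coefficients (one detail coefficient per level plus the final scaling coefficient), while the DCT spreads the click over all 64. A lone click is the friendliest possible case for wavelets, though. A steady tone would tip the comparison the other way, which is the frequency-domain strength of transform coders described earlier.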
Hopefully I explained that as simply and as accurately as I can. Please feel free to correct any mistakes in the explanation or the concepts if you notice them.