I understand that most psychoacoustic-based transform coders have difficulty coding signals like speech or music clips with strong vocals. In my listening tests, I find that these clips seem to lose some of their "original" quality.

One possible explanation is the mismatch between the masking threshold, which is calculated over a long block, and a signal that changes rapidly in time. Switching to short blocks isn't a good solution, as it involves too much block switching. In AAC there is the TNS (Temporal Noise Shaping) tool, which flattens the temporal envelope and provides a better match between the masking threshold and the quantization noise.
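To make the TNS idea a bit more concrete, here is a rough sketch of the principle as I understand it: a frame with a strong temporal envelope (e.g. a transient) has transform coefficients that are highly predictable across frequency, so a linear predictor run along the spectrum gets a large prediction gain. This is only an illustration under my own assumptions, not the AAC algorithm itself: it uses a plain DCT-II as a stand-in for the MDCT, a made-up frame length of 64, an arbitrary predictor order of 4, and the function names are hypothetical.

```python
import math
import random


def dct2(x):
    """Unnormalised DCT-II, used here only as a stand-in for the codec's MDCT."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: returns (predictor polynomial a, residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def tns_prediction_gain(frame, order=4):
    """Gain of a linear predictor run across frequency (hypothetical helper)."""
    spec = dct2(frame)
    r = autocorr(spec, order)
    _, err = levinson(r, order)
    return r[0] / err


N = 64  # assumed frame length, for illustration only

# A frame with a sharp temporal envelope: a single click.
transient = [0.0] * N
transient[20] = 1.0

# A frame with a flat temporal envelope: stationary white noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

print("transient gain:", tns_prediction_gain(transient))
print("noise gain:", tns_prediction_gain(noise))
```

The click should come out with a far higher gain than the noise frame. My understanding is that a high gain like this is what tells the encoder TNS is worth switching on for a frame, and that filtering the spectrum with the predictor is what makes the quantization noise follow the temporal envelope after decoding.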

Still, it is NOT good enough. The vocals sound a little flat, sometimes like someone singing with a "blocked nose"! Could this be a pitch-related problem?

I wonder whether the LTP (Long Term Prediction) tool would provide even better modelling of these kinds of signals.