Basically, the codec doesn't estimate how much distortion can be tolerated on each audio sample. Instead, it estimates how much of that distortion will be remembered by the listener after a short amount of time.
One strange property of this is that small sections (a few tenths of a second) are typically easy to ABX against the original!
A longer section, from ~4 seconds upward, can no longer be ABXed. In other words, the precise impression caused by the section of sound is very accurately reproduced. This is (psycho^2)acoustics rather than psychoacoustics.
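To make "ABXable" concrete: an ABX result is usually judged by how likely the score would be from pure guessing. The sketch below computes that one-sided binomial p-value. The function name and the example scores are hypothetical and are not results from the tests described in this post.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    by pure guessing (each trial is a 50/50 coin flip)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count all outcomes with `correct` or more right answers.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Hypothetical scores, for illustration only.
short_clip = abx_p_value(15, 16)  # listener is almost always right
long_clip = abx_p_value(9, 16)    # listener is close to chance

print(f"short clip: p = {short_clip:.5f}")
print(f"long clip:  p = {long_clip:.5f}")
```

A small p-value (conventionally below 0.05) means the listener really can tell the two apart; a value near 0.5 means the result is indistinguishable from guessing.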
In practice, this approach allows very deep cuts in precision and very rough quantization, even in the mid-range of music, all without any perceptible loss!
The codec is completely hybrid: 256 subbands using a DWT (discrete wavelet transform) for near-static signals, and pure time-domain processing (not even subbanding!) for near-perfect transient handling.
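How the codec decides between the two paths is not described here. As a rough illustration only, a common generic approach is to look for a sudden energy rise inside a frame and route such frames to the time-domain path. Everything in this sketch (function names, the 8 sub-blocks, the ratio threshold) is an assumption and not the codec's actual logic.

```python
import math

def subblock_energies(frame, n_blocks=8):
    """Split a frame into equal sub-blocks and return the energy of each."""
    size = len(frame) // n_blocks
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_blocks)]

def choose_path(frame, n_blocks=8, ratio_threshold=8.0, floor=1e-9):
    """Return "time-domain" if energy jumps sharply between neighbouring
    sub-blocks (a likely transient), otherwise "dwt"."""
    energies = subblock_energies(frame, n_blocks)
    for prev, cur in zip(energies, energies[1:]):
        # `floor` keeps the comparison meaningful after digital silence.
        if cur > ratio_threshold * (prev + floor):
            return "time-domain"
    return "dwt"

# A steady 440 Hz tone sampled at 48 kHz: near-static signal.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(1024)]
# Silence followed by a single click: a hard transient.
click = [0.0] * 512 + [1.0] + [0.0] * 511

print(choose_path(tone))   # dwt
print(choose_path(click))  # time-domain
```

This detector only reacts to onsets (energy rising), which is where block-transform codecs usually suffer most from pre-echo.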
The bit reduction is huge, even during transients, thanks to the vector rotated-wavelet coding.
As of today, I reach transparent quality (or very close to it; I have only tested ~75 samples) at around 43 kbps average for 48 kHz, 16-bit stereo. A 5.1-channel encoding brings that to ~58 kbps with perceptual channel coupling enabled.
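To put those bitrates in perspective, here is the arithmetic against uncompressed PCM. The helper name is made up for this example, and the 5.1 figure assumes all six channels (including the LFE) are stored at the full 48 kHz, 16-bit rate.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

stereo_pcm = pcm_kbps(48_000, 16, 2)    # 1536.0 kbps
surround_pcm = pcm_kbps(48_000, 16, 6)  # 4608.0 kbps (assumes 6 full channels)

stereo_ratio = stereo_pcm / 43          # roughly 35.7 : 1
surround_ratio = surround_pcm / 58      # roughly 79.4 : 1

# Average bits spent per audio sample, per channel, in the stereo case.
bits_per_sample = 43_000 / (48_000 * 2)  # roughly 0.45 bit

print(f"stereo: {stereo_pcm:.0f} kbps -> 43 kbps, {stereo_ratio:.1f}:1")
print(f"5.1:    {surround_pcm:.0f} kbps -> 58 kbps, {surround_ratio:.1f}:1")
print(f"stereo bits per sample: {bits_per_sample:.3f}")
```

In other words, the stereo figure works out to well under half a bit per sample on average.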
The astonishing thing is that entropy coding is disabled for now. With it enabled, we can expect near-CD-quality stereo, perceptually approaching an 85 dB SNR, at GSM bitrates.
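For a sense of scale, the textbook rule for uniform quantization of a full-scale sine is SNR ≈ 6.02·N + 1.76 dB for N bits. The sketch below inverts it for 85 dB. It also assumes that "GSM bitrates" means the 13 kbps of the GSM full-rate speech codec, which is my reading and is not stated above. The 85 dB here is a perceptual figure rather than a measured one, so treat this purely as a rough comparison.

```python
def snr_db_for_bits(bits):
    """Theoretical SNR of an ideal uniform quantizer on a full-scale sine."""
    return 6.02 * bits + 1.76

def bits_for_snr_db(snr_db):
    """Inverse of the rule above: effective bits for a given SNR."""
    return (snr_db - 1.76) / 6.02

cd_snr = snr_db_for_bits(16)          # about 98.1 dB for 16-bit PCM
effective_bits = bits_for_snr_db(85)  # about 13.8 bits

# Assumption: "GSM bitrates" = 13 kbps (GSM full-rate speech codec).
gsm_kbps = 13
ratio_vs_pcm = (48_000 * 16 * 2 / 1000) / gsm_kbps  # about 118 : 1

print(f"16-bit PCM: {cd_snr:.1f} dB")
print(f"85 dB is roughly {effective_bits:.1f} effective bits")
print(f"48 kHz stereo PCM at 13 kbps: {ratio_vs_pcm:.0f}:1")
```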
For the moment, no lowpass filters are used. I don't think they would bring much of an advantage, though.
Edit: after the first moment of euphoria, a bit more clarity was added.