The lossy encoding process can introduce clipping:
Let's say the maximum value is 1. If we have a sample at 0.95, it can come out as 1.05 after the requantization step.
We know that this introduces clipping once decoded.
But from the encoder's point of view, this quantization was harmless. The problem is that the clipping happens in the time domain once decoded, whereas the encoder judged the quantization error in the frequency domain.
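To make the 0.95 to 1.05 case concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: a 4-point orthonormal DCT stands in for the codec's real transform, and STEP is a hypothetical quantizer step picked so that rounding goes upward. None of these names come from an actual encoder.

```python
import math

def dct(x):
    """Orthonormal DCT-II, standing in for the codec's transform (assumption)."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * coeffs[k]
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]

STEP = 0.7  # hypothetical quantizer step, chosen so the DC term rounds upward

signal = [0.95, 0.95, 0.95, 0.95]                      # every sample below full scale
coeffs = dct(signal)                                   # DC coefficient is about 1.9
quantized = [round(c / STEP) * STEP for c in coeffs]   # DC becomes about 2.1
decoded = idct(quantized)                              # every sample is now about 1.05
clipped = [max(-1.0, min(1.0, s)) for s in decoded]    # time-domain clipping to [-1, 1]

print(max(signal), max(decoded), max(clipped))
```

The quantization error on the DC coefficient is well within one step, which is all the encoder checks, yet every decoded sample lands above the maximum value and gets clipped.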
Imagine that the decoder clipped in the frequency domain instead. On decoding, the 1.05 value would be truncated to 1 (the maximum value). This is still correct with respect to the required precision computed by the encoder.
Once transformed into the time domain, the sample would be at the maximum value but not above it, so there would be no time-domain clipping.
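Here is a rough sketch of what I mean, in plain Python. The interpretation is my own assumption: "clipping in the frequency domain" is taken as limiting each coefficient to the largest magnitude that a full-scale input could produce. The 4-point orthonormal DCT and the helper coeff_limit are hypothetical, not taken from any real decoder. Only the DC coefficient is non-zero in this toy case, so it does not show how the idea behaves on arbitrary signals.

```python
import math

N = 4  # toy block size (assumption)

def scale(k):
    """Normalisation factor of the orthonormal DCT for coefficient k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

def idct(coeffs):
    """Orthonormal inverse DCT (DCT-III) back to the time domain."""
    return [
        sum(scale(k) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / N)
            for k in range(N))
        for i in range(N)
    ]

def coeff_limit(k):
    """Largest |coefficient k| that any input with samples in [-1, 1] can produce."""
    return scale(k) * sum(abs(math.cos(math.pi * (i + 0.5) * k / N)) for i in range(N))

# Requantized coefficients: the DC term overshoots (2.0 is its full-scale value).
quantized = [2.1, 0.0, 0.0, 0.0]

# Frequency-domain clipping: limit each coefficient before the inverse transform.
limited = [max(-coeff_limit(k), min(coeff_limit(k), c)) for k, c in enumerate(quantized)]

peak_plain = max(abs(s) for s in idct(quantized))    # about 1.05, would clip in the time domain
peak_limited = max(abs(s) for s in idct(limited))    # about 1.0, nothing left to clip

print(peak_plain, peak_limited)
```

The DC coefficient moves from 2.1 to 2.0, a change smaller than the quantization error the encoder already accepted, and the decoded samples end up at the maximum value instead of above it.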
Conclusion: I think that clipping is harmless in the frequency domain, so the decoder should handle it at that stage.