QUOTE
I'm not an expert in this field, so I can't say that's wrong, but it seems off the top of my head that there should be at least some cases in which a forward predictor (even a perfect one) is less than optimal. Other sources of redundancy could be found by other things, such as taking future samples into account (I think?).
It's kind of a chicken-and-egg problem... if your goal is to predict future data, you cannot "cheat" and exploit the future data for that purpose, because the decoder has to make the same prediction and it hasn't decoded that data yet.

In terms of compression ratio, forward predictor + arithmetic coding has no hard limit. If your predictor is very good, there are certain kinds of data (sequences of zeroes, etc.) where you can literally compress a kilobyte into less than a bit! I have already done it.
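To put a rough number on that: with an ideal arithmetic coder, a symbol the predictor gave probability p costs about -log2(p) bits. Here is a small sketch of that arithmetic for a kilobyte of zero bits. The function name and the probabilities are just mine for illustration, and a real coder adds a few bits of overhead to flush its state, so take the figures as the theoretical cost only.

```python
import math

def ideal_code_length_bits(n_symbols: int, p_correct: float) -> float:
    """Ideal arithmetic-coding cost, in bits, of n_symbols when the
    predictor assigns probability p_correct to each symbol that occurs."""
    return -n_symbols * math.log2(p_correct)

KILOBYTE_BITS = 1024 * 8  # one kilobyte, coded bit by bit

# Hypothetical predictor confidences, from "no idea" to "almost certain".
for p in (0.5, 0.99, 0.99999):
    cost = ideal_code_length_bits(KILOBYTE_BITS, p)
    print(f"p = {p}: about {cost:.3f} bits")
```

At p = 0.5 nothing is gained (8192 bits in, 8192 bits out), and the total only drops below one bit once the predictor is more than about 99.99% sure of every single bit.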
In practice, however, the predictor has to take a risk. You guess the next bit. If you take no risk, you gain (or lose) nothing. If you bet and guess right, you gain a fraction of a bit. Otherwise you lose space. The sum of all this determines the overall compression ratio.
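The bet can be sketched the same way, again assuming an ideal coder where a bit predicted with probability p costs -log2(p) bits instead of the 1 bit it takes raw. The helper name below is made up for the example.

```python
import math

def bits_saved(p_guess: float, guess_was_right: bool) -> float:
    """Bits saved (positive) or lost (negative) on one bit, compared with
    storing it raw, when the predictor bet p_guess on its guess."""
    # Probability the predictor had assigned to the bit that really occurred.
    p_actual = p_guess if guess_was_right else 1.0 - p_guess
    return 1.0 + math.log2(p_actual)

for p in (0.5, 0.75, 0.9):
    print(f"bet {p}: right -> {bits_saved(p, True):+.3f} bits, "
          f"wrong -> {bits_saved(p, False):+.3f} bits")
```

A bet of 0.5 is the "no risk" case: zero gain either way. The stronger the bet, the closer a correct guess gets to saving the full bit, but a wrong guess then costs more than one extra bit, so the predictor only wins overall if it is right about as often as it claims to be.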
The other way is to take the whole data as one chunk, but then it's not called "prediction"... and when you do it, you produce one "chunk" which must be decompressed as a whole.