Wouldn't the best test be to feed the encoder random data at varying bit depths and channel counts, and repeat that in an automated fashion?
Do you mean trying to encode white noise? Sure, the compression ratio would be very poor, but at least it could be a somewhat useful test, I think.
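To make the idea concrete, here is a minimal Python sketch of such an automated loop. It is only an illustration: `identity_codec` is a hypothetical stand-in for "encode, then decode" and would have to be replaced by a wrapper around the real encoder and decoder, and the channel counts, bit depths and lengths it draws from are arbitrary choices, not a statement of what any particular encoder accepts.

```python
import random


def random_pcm(rng, channels, bits, frames):
    """Return interleaved signed PCM samples drawn uniformly over the full range."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return [rng.randint(lo, hi) for _ in range(frames * channels)]


def identity_codec(samples, channels, bits):
    """Hypothetical placeholder for encode-then-decode; swap in the real codec."""
    return list(samples)


def fuzz_round_trip(codec, trials=100, seed=0):
    """Run random round trips and return the parameters of every failing case."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    failures = []
    for _ in range(trials):
        channels = rng.randint(1, 8)       # assumed range, adjust as needed
        bits = rng.choice([8, 16, 24])     # assumed set of bit depths
        frames = rng.randint(1, 4096)
        samples = random_pcm(rng, channels, bits, frames)
        # A lossless codec must return exactly the samples it was given.
        if codec(samples, channels, bits) != samples:
            failures.append((channels, bits, frames))
    return failures
```

Calling `fuzz_round_trip(identity_codec)` returns an empty list; with a real codec plugged in, any entries in the list are the channel count, bit depth and length of inputs that did not survive the round trip.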
I think a number of test samples consisting of random data would be useful. Tests with noise biased towards both high and low frequencies would be interesting, especially high-frequency noise, as (I guess) this would defeat the prediction fairly effectively. Something like this (in MATLAB or Octave):
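% In Octave, run "pkg load signal" first: fir2 and filtfilt are in the signal package
x=rand(1,1000)*2-1; %Uniform white noise between -1 and 1
b=fir2(32, [0 0.5 0.7 1], [0.2 0.2 0.9 1]); %Design a FIR filter which rejects low frequencies somewhat
y=filtfilt(b, 1, x); %Zero-phase filtering; b is applied twice, so its magnitude response is squared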
A full set of tests on random data would not prove that FLAC is correct, but it would be useful evidence. Tests could include white, pink and blue noise, as well as noise with an unusual amplitude distribution, such as a Rayleigh distribution.
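As a rough illustration of a couple of these signal types, here is a small Python sketch using only the standard library. The function names and the default `sigma` are my own choices. Note that the first-difference filter is only a crude way to bias noise towards high frequencies: it tilts the spectrum by about +6 dB per octave, which is steeper than true blue noise (+3 dB per octave), and pink noise is not covered here at all.

```python
import math
import random


def white(rng, n):
    """Uniform white noise in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def rayleigh(rng, n, sigma=0.25):
    """Rayleigh-distributed samples via inverse-transform sampling."""
    # For u uniform in [0, 1), sigma * sqrt(-2 * ln(1 - u)) is Rayleigh(sigma).
    return [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
            for _ in range(n)]


def first_difference(x):
    """Crude high-pass: y[n] = (x[n+1] - x[n]) / 2, one sample shorter than x."""
    return [(b - a) / 2.0 for a, b in zip(x, x[1:])]


def to_pcm(x, bits=16):
    """Scale floats to signed integers of the given width, clipping overshoot."""
    full = (1 << (bits - 1)) - 1
    return [max(-full - 1, min(full, round(v * full))) for v in x]
```

For example, `to_pcm(first_difference(white(random.Random(0), 1000)))` gives 999 integer samples of high-frequency-biased noise that could be written out and fed to the encoder. Rayleigh samples are never negative and are unbounded above, so `to_pcm` will clip the occasional large value.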