First things first:
Spectrograms and numerical differences between waveforms
are not good or reliable data for evaluating the quality of lossy codecs.
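As a minimal illustration of why a numerical difference can mislead, the sketch below compares a 1 kHz tone with the same tone delayed by 5 samples (about 0.1 ms at 44.1 kHz). A shift that small is generally not heard as a change on its own, yet a plain sample-by-sample signal-to-noise ratio scores it at only about 3 dB, a figure that would normally suggest heavy distortion. The helper names `snr_db` and `sine` are hypothetical, written for this example only.

```python
import math

def snr_db(reference, test):
    """Signal-to-noise ratio of `test` against `reference`, treating the
    sample-by-sample difference between them as 'noise'."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, test))
    if noise_power == 0:
        return math.inf  # the two signals are numerically identical
    return 10 * math.log10(signal_power / noise_power)

def sine(freq_hz, n_samples, rate_hz=44100, delay_samples=0):
    """A unit-amplitude sine, optionally delayed by a whole number of samples."""
    return [math.sin(2 * math.pi * freq_hz * (n - delay_samples) / rate_hz)
            for n in range(n_samples)]

original = sine(1000, 44100)                  # 1 second of a 1 kHz tone
shifted = sine(1000, 44100, delay_samples=5)  # same tone, ~0.11 ms later

print(f"identical: {snr_db(original, original):.1f} dB")
print(f"shifted:   {snr_db(original, shifted):.1f} dB")
```

The shifted tone sounds the same, but by this measure it looks badly damaged; the reverse can also happen, where a small numerical difference is clearly audible.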
Next:
The usual way to evaluate lossy codecs is with human-conducted listening tests (ABX and ABC-HR are two such methods). This involves having many different people test and rate many different samples.
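ABX results are commonly checked against pure guessing with a one-sided binomial test: each trial is a 50/50 guess if the listener hears no difference, so the question is how likely the observed score would be by chance. The sketch below assumes that model; the function name `abx_p_value` is hypothetical, and the 0.05 threshold mentioned afterwards is only a common convention, not a fixed rule.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right answers out of
    `trials` ABX trials by pure guessing (each trial a 50/50 chance)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count every outcome at least as good as the observed one.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Example: a listener identifies X correctly in 12 of 16 trials.
p = abx_p_value(12, 16)
print(f"p = {p:.3f}")
```

A result of 12 out of 16 gives p of roughly 0.04, so under a 0.05 threshold it would usually be taken as evidence that the listener really heard a difference.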
Finally:
Some programs try to evaluate codecs in a way that approximates the results of human-conducted tests.
EAQUAL is one such program.
These programs are limited by the fact that they analyze the audio with the same or very similar methods that the lossy codecs themselves use, so in the end it is a cat-and-mouse game over which side gets the better results.