I know that the general rule of thumb for judging the quality of different encoders and/or encoder settings is to do your own listening tests until you can no longer pick out the encoded files from the originals. While this seems to make sense, I have never felt totally comfortable with this method.
The reason I feel this way is probably the way I am used to judging the quality of video encodes. With video, I usually start at a certain bitrate (e.g. 1200) and keep increasing it until I stop seeing any substantial increase in quality. But then, being the anal type that I am, I usually pause the movie on suspect frames and/or zoom in to look for artifacts, to see whether the last bitrate increase bought enough quality to justify the larger file size, even though by that point I technically have to pause or zoom in to notice the difference at all.
However, when it comes to judging differences in audio quality between various bitrates/settings, I feel completely overwhelmed, because in most music there are so many sounds and frequencies going off at the same time, and there is no easy way to pause a "frame" of the song and zoom in on it for comparison. Even if there were missing notes, missing frequencies, artifacts or whatever in parts of the song, it seems like they would be very difficult to pick up on with so many other frequencies bombarding you at the same time.
In an attempt to make it easier to pick up on differences, I've been using the Filter Toolbox in Nero Wave Editor, setting the band pass filter to subranges of the frequency spectrum and comparing each subrange one at a time (e.g. 0-250, 250-500, 15000-22050, etc.). This has made a HUGE difference in my ability to pick up on quality differences (even between alt-preset standard and alt-preset extreme, and even between different versions of LAME). For the most part, I've found that the quality differences become most obvious as you get to the higher frequencies.
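For anyone who wants to try the same band-by-band idea without Nero, here is a rough sketch in Python of what I mean by isolating one subrange at a time. It is only an approximation under my own assumptions: I don't know what filter Nero Wave Editor uses internally, so this uses a single second-order band-pass section (the standard "Audio EQ Cookbook" biquad), which has much gentler slopes than a steep editor filter. The names `bandpass`, `rms`, `sine` and `BANDS` are made up for this example, and the test tone stands in for real decoded audio.

```python
import math

# Example subranges in Hz, taken from the ranges mentioned above.
BANDS = [(0, 250), (250, 500), (15000, 22050)]

def bandpass_coeffs(low_hz, high_hz, sample_rate):
    """Biquad band-pass coefficients (Audio EQ Cookbook, 0 dB peak gain).

    Assumption: the band is centred on the arithmetic midpoint of its
    edges, with Q = centre / bandwidth. This is a gentle filter, not a
    brick wall.
    """
    center = (low_hz + high_hz) / 2.0
    if not 0.0 < center < sample_rate / 2.0:
        raise ValueError("band centre must lie below the Nyquist frequency")
    q = center / (high_hz - low_hz)
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def bandpass(samples, low_hz, high_hz, sample_rate=44100):
    """Filter a list of float samples, keeping mostly low_hz..high_hz."""
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(low_hz, high_hz, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def sine(freq_hz, seconds, sample_rate=44100):
    """Full-scale sine test tone, a stand-in for real decoded audio."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    tone = sine(375.0, 0.5)  # sits in the middle of the 250-500 band
    for low, high in BANDS:
        level = rms(bandpass(tone, low, high))
        print(f"{low}-{high} Hz: RMS {level:.3f}")
```

To compare an original against an encode this way, you would decode both to raw samples, run each through `bandpass` with the same band, and listen to (or measure) the two filtered results side by side.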
Before I get too carried away with this way of testing: is it a valid way to test? It seems to me that it should be as valid as a "normal" listening test (i.e. listening to the whole frequency range of the song at once), since you eventually cover the whole frequency range anyway, just little by little rather than all at once. It certainly makes the tests much easier to perform and makes me feel much more confident in the results.