I think Speex and other speech codecs like GSM will never sound perfect. They aim for maximum compression by modelling the human vocal tract, which gets intelligible speech at very low bitrates, usually with very low latency (encoding delay) for telecommunications, but they make no attempt to be audibly transparent.
For a long listen this can be seriously annoying. You may be after something more intelligible and hopefully compatible with Ogg hardware players (though they'd need to support low bitrates).
Take a look at this message on the Vorbis mailing list to find some suitable settings for Ogg Vorbis, which did better than practically anything else tested, including Speex.
To summarise:
oggenc --downmix --resample 8000 -q -1.00
was very good - about 10-11 kbps. Some artifacts on applause, but very intelligible. (Follow the link to the website for an example.)
oggenc --downmix --resample 11000 -q -0.60
retained the full bandwidth of the recording mentioned, eliminated the artifacts and came in at about 20 kbps.
Having tried these settings, I'd say they're both very good for this sort of material. If you want to retain higher frequencies, you could probably use around -q -0.60 with --resample 22000 for something around 30 kbps.
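If it helps, here is a rough sketch of how those three settings could be wrapped up for batch use. The function names, the preset labels (low, mid, high) and the file names are made up for illustration, and the 22000 Hz line is only my guess from above, not something from the mailing list test. The size estimate just turns the approximate bitrates quoted here into kilobytes, assuming 1 kbps = 1000 bits per second.

```shell
#!/bin/sh
# Hypothetical helper: map a preset label to the oggenc options discussed above.
# The labels low/mid/high are invented for this sketch.
speech_preset_args() {
    case "$1" in
        low)  echo "--downmix --resample 8000 -q -1.00" ;;   # about 10-11 kbps
        mid)  echo "--downmix --resample 11000 -q -0.60" ;;  # about 20 kbps
        high) echo "--downmix --resample 22000 -q -0.60" ;;  # untested guess, around 30 kbps
        *)    echo "unknown preset: $1" >&2; return 1 ;;
    esac
}

# Rough file size in kilobytes: bitrate in kbps times duration in seconds, over 8.
estimate_kb() {
    echo $(( $1 * $2 / 8 ))
}

# Example usage (input.wav and output.ogg are placeholder names):
#   oggenc $(speech_preset_args low) -o output.ogg input.wav
```

By that estimate an hour of speech at 10 kbps comes to roughly 4.5 MB, and about 9 MB at 20 kbps, so even the higher setting stays small for long recordings.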
Dick Darlington