The switches I used were: -O3 -unroll -ip -xW -march=pentium4 -static-libcxa
Just did a quick benchmark comparing the GCC-compiled version with my ICC version:
GCC 3.2 oggenc: 26 seconds
ICC 7.1 oggenc: 20 seconds
So ICC shaved off 6 seconds, roughly 23% of the run time.
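For anyone who wants to compare their own numbers, here is a quick sketch of the arithmetic behind that comparison. The two timings are the ones from my single run above; the variable names are just for illustration, and one run each is not a rigorous benchmark.

```python
# Timings from the quick benchmark above, in seconds (one run each).
gcc_seconds = 26  # GCC 3.2 oggenc
icc_seconds = 20  # ICC 7.1 oggenc

saved = gcc_seconds - icc_seconds          # absolute time saved
reduction_pct = 100 * saved / gcc_seconds  # share of the GCC run time saved
speedup = gcc_seconds / icc_seconds        # how many times faster the ICC build is

print(f"saved {saved} s, {reduction_pct:.1f}% less time, {speedup:.2f}x speedup")
# prints: saved 6 s, 23.1% less time, 1.30x speedup
```

Swap in your own timings to see how your build compares.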
EDIT: It supports FLAC files now, and I replaced the 'buggy' GT3b2 with the original 1.0.1.