you cannot compare image processing with audio processing.
in a typical "sound reproduction system" there are so many components that alter phase and introduce frequency-dependent delays that it really does not matter if some IIR filter adds a bit more.
of course it always depends on where you want to use that filter, how much attenuation per octave you want (or how small a "Q" when it comes to EQs) and some other parameters.
e.g. an IIR brickwall-filter for CD-audio should be absolutely no problem.
it can sometimes make sense to use a linear-phase FIR filter, but most of the time the pre-ringing produced by such a filter is more disturbing than the frequency-dependent delay of a minimum-phase (e.g. IIR) filter.
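to make the pre-ringing point concrete, here is a rough sketch (my own illustration, not taken from any particular product). it builds a hamming-windowed-sinc lowpass as a stand-in for "some linear-phase FIR" and a single lowpass biquad (RBJ cookbook formulas) as a stand-in for "some IIR", then counts how often each impulse response changes sign before its main peak. the tap count, cutoff and Q are arbitrary assumptions, and all function names are made up for this example.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase lowpass FIR: Hamming-windowed sinc.
    cutoff is in cycles per sample (0 .. 0.5); num_taps should be odd."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def biquad_lowpass_impulse(cutoff, q, length):
    """Impulse response of one lowpass biquad (RBJ cookbook coefficients)."""
    w0 = 2 * math.pi * cutoff
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0 = (1 - math.cos(w0)) / 2 / a0
    b1 = (1 - math.cos(w0)) / a0
    b2 = b0
    a1 = -2 * math.cos(w0) / a0
    a2 = (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def sign_changes_before_peak(h):
    """Number of sign changes in h up to (and including) its largest sample."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    return sum(1 for i in range(1, peak + 1) if h[i - 1] * h[i] < 0)

fir = windowed_sinc_lowpass(101, 0.1)
iir = biquad_lowpass_impulse(0.1, math.sqrt(0.5), 101)
print("linear-phase FIR, sign changes before the peak:", sign_changes_before_peak(fir))
print("IIR biquad, sign changes before the peak:", sign_changes_before_peak(iir))
```

the FIR taps are symmetric around the centre tap, so whatever ringing follows the peak also precedes it. the biquad only rises towards its peak and does all of its ringing afterwards.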
most DSP studio-monitors that use FIR filters to compensate for the IR of the drivers and to act as crossovers have a recommended minimum-phase setting, because in linear-phase mode the spot where the pre-ringing artifacts played by the different drivers cancel each other out would be very small.
many digital equalizers also use IIR filters because they need far less computing power.
imho the downside of IIR filters is that they need high floating-point precision to compute accurately - but then again, you can always use a minimum-phase FIR filter to get (nearly) the same result.
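a small sketch of what i mean by the precision problem (again just an illustration under my own assumptions: RBJ cookbook lowpass biquad, 48 kHz sample rate, 20 Hz and 1 kHz as example frequencies, helper names invented here). it rounds the two feedback coefficients to 32-bit floats and checks where the pole pair ends up compared to the 64-bit version.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lowpass_denominator(f0, fs, q):
    """Normalised feedback coefficients a1, a2 of a lowpass biquad (RBJ cookbook)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return -2 * math.cos(w0) / a0, (1 - alpha) / a0

def pole_frequency(a1, a2, fs):
    """Frequency in Hz of the complex pole pair of 1 + a1*z^-1 + a2*z^-2."""
    radius = math.sqrt(a2)
    angle = math.acos(-a1 / (2 * radius))
    return angle * fs / (2 * math.pi)

fs = 48000.0
for f0 in (20.0, 1000.0):
    a1, a2 = lowpass_denominator(f0, fs, q=math.sqrt(0.5))
    exact = pole_frequency(a1, a2, fs)
    rounded = pole_frequency(to_float32(a1), to_float32(a2), fs)
    error = abs(rounded - exact) / exact
    print(f"f0 = {f0:6.0f} Hz: pole at {exact:.4f} Hz, "
          f"with float32 coefficients {rounded:.4f} Hz (relative error {error:.2e})")
```

the pole frequency is not the same thing as the cutoff, but it shows the effect: for low frequencies the poles sit very close to z = 1, so the information about the filter frequency lives in the last few bits of a1 and a2. this only looks at coefficient rounding; rounding of the filter state while it runs comes on top of that.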
so - i know the problems with pre-ringing - it's simply audible when it's too long, but i do not really know of any problems with (moderate) phase-shifts or frequency-dependent delays...
so if you have some information i don't, let me know.

bye,
--hustbaer