Listening to my mp3gained tracks on my Karma, I noticed that one track was almost painful to my ears and made me turn the volume down a few steps, because of a loud, mid-frequency wind instrument. (I can try to find out exactly which track it was, and which instrument.)

That got me thinking about the replaygain algorithm, and now I wonder...
At the moment, as the first step of its computation, replaygain computes local RMS averages of the samples after a loudness-weighting filter (an equal-loudness filter, similar in spirit to A-weighting).
I think it would make sense (and lead to more useful normalization) if this first stage were replaced with the following:
. compute the local sound spectrum, with some window (i.e. a short-time FFT).
. multiply this spectrum by an A-weighting curve.
. convolve the resulting spectrum with a window function covering a narrow frequency band, say 1/6th of an octave (in short, average the spectrum over that bandwidth).
. finally, take the max of the resulting averages.
Then take all those local values and process them the same way as now (i.e. take the 95th percentile, and so on).
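
To make the steps above concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under my own assumptions, not how replaygain is implemented: the function names, the 4096-sample Hann frames with 50% overlap, the 20 Hz lower limit for band centers, and the use of the standard analog A-weighting formula are all choices made for the example. It also sums the power inside each 1/6-octave band (band power) rather than averaging it, since the point is to find the band that carries the most power on its own.

```python
import numpy as np

def a_weight_power(freqs_hz):
    """A-weighting as a linear power gain (standard analog formula, ~1.0 at 1 kHz)."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    amplitude_gain = 1.2589 * num / den  # 1.2589 ~ +2 dB, normalizes 1 kHz to 0 dB
    return amplitude_gain ** 2

def band_peak_per_frame(samples, rate, frame_len=4096, hop=2048, band_octaves=1 / 6):
    """For each frame, return the largest A-weighted power found in any single band."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
    weight = a_weight_power(freqs)
    # One candidate band per FFT bin above 20 Hz (assumed limit), centered on that bin.
    half = 2.0 ** (band_octaves / 2.0)
    centers = freqs[freqs >= 20.0]
    lo = np.searchsorted(freqs, centers / half, side="left")
    hi = np.searchsorted(freqs, centers * half, side="right")
    # Scale so that the sum over all bins approximates the mean square of the frame.
    scale = 2.0 / (frame_len * np.sum(window ** 2))
    peaks = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        spectrum = np.fft.rfft(samples[start:start + frame_len] * window)
        power = scale * np.abs(spectrum) ** 2 * weight
        # Cumulative sum lets us get every band's power with one subtraction.
        csum = np.concatenate(([0.0], np.cumsum(power)))
        peaks.append((csum[hi] - csum[lo]).max())
    return np.array(peaks)

def band_peak_loudness_db(samples, rate, percentile=95.0):
    """Summarize the per-frame band peaks with a percentile, as is done now for RMS.

    Needs at least one full frame of non-silent audio.
    """
    peaks = band_peak_per_frame(samples, rate)
    return 10.0 * np.log10(np.percentile(peaks, percentile))

# Two test signals with the same RMS (0.1): a 1 kHz tone and white noise.
rate = 44100
t = np.arange(2 * rate) / rate
tone = 0.1 * np.sqrt(2.0) * np.sin(2.0 * np.pi * 1000.0 * t)
noise = np.random.default_rng(0).normal(0.0, 0.1, size=t.size)
print("tone :", band_peak_loudness_db(tone, rate), "dB")
print("noise:", band_peak_loudness_db(noise, rate), "dB")
```

With this measure the tone should score clearly higher than the noise, even though both have the same total RMS power, because all of the tone's power falls inside one band.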

I think it physically makes sense to normalize based on the frequency band with the maximum power taken on its own, rather than summed with all the other frequencies, since this is roughly the component of the sound that excites one group of receptors in the ear more than all the others, and leads to the perception that the sound is too loud.

Has this been considered? Has anyone tested and compared the perceived volume of a pure tone vs. white noise of the same RMS power?

Maybe my idea wouldn't give a good account of overall volume, but it could still be more accurate at telling when a track starts being "too loud", and replaygain could make use of both the total RMS power and this kind of frequency-concentrated power measure...
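
One possible way to use both numbers together, sketched in Python with NumPy: compute the total power of a frame and the power in its strongest 1/6-octave band, and look at their ratio. The function names are made up for the example, loudness weighting is left out to keep it short, and how the ratio would actually feed into the gain is left open.

```python
import numpy as np

def total_and_band_peak_power(frame, rate, band_octaves=1 / 6):
    """Return (total power, power in the strongest band) for one frame, unweighted."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    window = np.hanning(n)
    spectrum = np.fft.rfft(frame * window)
    # Scale so that the sum over all bins approximates the mean square of the frame.
    power = 2.0 * np.abs(spectrum) ** 2 / (n * np.sum(window ** 2))
    power[0] = 0.0  # ignore DC
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    csum = np.concatenate(([0.0], np.cumsum(power)))
    half = 2.0 ** (band_octaves / 2.0)
    centers = freqs[freqs >= 20.0]  # assumed lower limit for band centers
    lo = np.searchsorted(freqs, centers / half, side="left")
    hi = np.searchsorted(freqs, centers * half, side="right")
    return power.sum(), (csum[hi] - csum[lo]).max()

def concentration_db(frame, rate):
    """Strongest-band power relative to total power, in dB (0 dB = all in one band).

    Assumes the frame is not silent.
    """
    total, peak = total_and_band_peak_power(frame, rate)
    return 10.0 * np.log10(peak / total)

rate = 44100
t = np.arange(4096) / rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
noise = np.random.default_rng(0).normal(0.0, 1.0, size=t.size)
print("tone :", concentration_db(tone, rate), "dB")
print("noise:", concentration_db(noise, rate), "dB")
```

A ratio close to 0 dB means nearly all the power sits in one narrow band (the case I suspect sounds "too loud"), while broadband material gives a clearly negative value.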

Any thoughts?