I've cobbled together a time-domain algorithm so far, which is fairly simple. I'm assuming a 96 kHz sample rate here; if you're in 44.1 land you'll need to change a few numbers around.
- Highpass the signal at ~10 kHz. I use an FIR filter for this, although I suppose any decently designed filter would work. Passband tolerances can be very wide, as long as the stopband rejection is extremely high.
- Divide the signal into 10-sample chunks. (I.e., whatever highpass cutoff you set in step 1, block up the data so that the block length is not smaller than the period of the cutoff frequency. At 96 kHz, one period of 10 kHz is 9.6 samples, hence 10.)
- Calculate the RMS average of each block's samples, yielding an RMS amplitude plot at ~0.1 ms resolution.
- Run a highpass FIR filter over the RMS signal. The goal here is to sum up a potential transient's signal, plus the signal around it if it straddles more than one RMS sample, and subtract the average amplitude of the neighboring samples to obtain an estimate of the real transient amplitude.
- Threshold the filtered RMS signal to select amplitudes greater than 0.
- Sum the thresholded RMS samples together, and divide by the signal duration in seconds, to obtain a figure of merit representing the total transient energy across the waveform.
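For anyone who wants to poke at it, here is a rough NumPy sketch of the steps above. It is not my exact code: the windowed-sinc design, the 255-tap length, the 31-block neighborhood, and the "delta minus moving average" kernel for the RMS highpass are all assumptions picked for illustration, and the function names are made up.

```python
import numpy as np

def highpass_fir(cutoff_hz, fs, numtaps=255):
    """Windowed-sinc highpass built by spectral inversion (numtaps must be odd).
    Assumption: a Blackman window stands in for 'extremely high' stopband rejection."""
    n = np.arange(numtaps) - (numtaps - 1) / 2
    fc = cutoff_hz / fs                        # cutoff in cycles per sample
    lowpass = 2 * fc * np.sinc(2 * fc * n) * np.blackman(numtaps)
    lowpass /= lowpass.sum()                   # unity gain at DC
    highpass = -lowpass
    highpass[(numtaps - 1) // 2] += 1.0        # delta minus lowpass
    return highpass

def block_rms(x, block_len):
    """RMS of consecutive non-overlapping blocks; a trailing partial block is dropped."""
    n_blocks = len(x) // block_len
    blocks = x[:n_blocks * block_len].reshape(n_blocks, block_len)
    return np.sqrt(np.mean(blocks ** 2, axis=1))

def transient_score(x, fs=96_000, cutoff_hz=10_000, block_len=10, neighborhood=31):
    """Figure of merit: summed positive part of the highpassed RMS, per second."""
    # Step 1: highpass the audio.
    hp = np.convolve(x, highpass_fir(cutoff_hz, fs), mode="same")
    # Steps 2-3: block up the data and take the RMS of each block.
    rms = block_rms(hp, block_len)
    # Step 4: highpass the RMS signal. Assumption: subtracting a moving
    # average over `neighborhood` blocks (which must be odd) is good enough here.
    kernel = -np.ones(neighborhood) / neighborhood
    kernel[neighborhood // 2] += 1.0
    filtered = np.convolve(rms, kernel, mode="same")
    # Step 5: keep only amplitudes greater than 0.
    positive = np.clip(filtered, 0.0, None)
    # Step 6: sum and normalize by the duration in seconds.
    return positive.sum() / (len(x) / fs)

if __name__ == "__main__":
    fs = 96_000
    t = np.arange(fs) / fs
    clean = 0.5 * np.sin(2 * np.pi * 1000 * t)    # stand-in for musical content
    clicky = clean.copy()
    clicky[[10_000, 30_000, 50_000, 70_000, 90_000]] += 0.5   # five synthetic ticks
    print("clean :", transient_score(clean, fs))
    print("clicky:", transient_score(clicky, fs))
```

The synthetic ticks in the demo are just single-sample impulses, so treat it as a sanity check rather than a real test; the clicky signal should score clearly higher than the clean one.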
At 10 kHz, most of the energy is either electrical noise, or pops/ticks, or mistracking in the case of vinyl (which can generate astonishingly high harmonics!). With a cutoff any lower, it seems that too much musical material creeps into the energy measurements, which raises the average background energy level and so reduces the sensitivity.
Another benefit of such a high cutoff is that the resulting block length is small enough to detect transients that are only a short space apart (perhaps 1 ms). Depending on the length of the FIR highpass in #4 (the one run over the RMS signal), legitimate pops and ticks could be filtered out if they occur often enough, since closely spaced ticks raise the neighboring average that gets subtracted.
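If you're not at 96 kHz, the block-length rule (block no shorter than one period of the cutoff) works out as in this small sketch. The helper name is made up for illustration.

```python
import math

def block_len_for(fs, cutoff_hz):
    """Smallest whole number of samples covering one period of the cutoff."""
    return math.ceil(fs / cutoff_hz)

for fs in (96_000, 44_100):
    n = block_len_for(fs, 10_000)
    # Time resolution of the resulting RMS signal, in milliseconds.
    print(fs, n, round(1000 * n / fs, 3))
```

So at 96 kHz you get 10-sample blocks, and at 44.1 kHz you'd get 5-sample blocks, both a bit over 0.1 ms.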
Has anybody else done anything similar to this, and can document their algorithm? Any suggestions on testing this properly?
