[Martel, I have not tried to find out exactly how SSRC performs.]
Based on the contents of this thread up till now, I'd be inclined to prefer a 48KHz sampling rate over 44.1KHz, as it gives more margin for error with only a relatively slight (roughly 9%) increase in raw file size.
[A range of 48KHz sound cards could be used for the playback and the precise characteristics of the filter would not be all that critical. Similarly the recording could be made with a range of recording devices, without undue concern about the filter characteristics.]
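To put rough numbers on the "margin for error" and the file size increase, here is a small sketch. The 20 kHz passband edge is my own assumption (a conventional figure for the top of the audible band), not something established in this thread, and the function names are just illustrative.

```python
# Compare raw data rate and filter headroom for two sampling rates.
# AUDIBLE_LIMIT_HZ is an assumption (a conventional 20 kHz figure),
# not a value taken from the thread.
AUDIBLE_LIMIT_HZ = 20_000


def size_increase_percent(old_rate_hz, new_rate_hz):
    """Percentage growth in raw PCM size at equal bit depth, channels and duration."""
    return (new_rate_hz / old_rate_hz - 1.0) * 100.0


def transition_band_hz(rate_hz, passband_edge_hz=AUDIBLE_LIMIT_HZ):
    """Room between the passband edge and the Nyquist frequency (rate / 2)."""
    return rate_hz / 2.0 - passband_edge_hz


if __name__ == "__main__":
    print(f"size increase: {size_increase_percent(44_100, 48_000):.1f}%")
    for rate in (44_100, 48_000):
        print(f"{rate} Hz: {transition_band_hz(rate):.0f} Hz of transition band")
```

Under that assumption the filter gets about 4 kHz to roll off in at 48KHz against about 2 kHz at 44.1KHz, which is one way of reading why the exact filter characteristics matter less.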
But there is another concern that is sometimes raised, beyond mere frequency response. It is a concern about relative timing and phase.
Is it good enough to shoehorn everything into a strict timing regimen of say 48000 samples a second, if some waveforms are slightly out of phase with each other, as captured by different microphones?
Arguably, if 96KHz is used, any natural or artificial reverberation can be richer, as the instantaneous wave cancellations are subtly recorded and reproduced with less constraint from the sampling time grid. For example, the volume level of different recorded tracks could be changed when creating a new mix, and this could generate a whole new set of complex phase additions and cancellations, arguably more complex than if 48KHz had been used when recording.
Put another way, if an analogue source is captured simultaneously at 48KHz by two soundcards that are not locked in phase with each other, one card may be triggered by its sampling oscillator to take its sample* as much as 1/96000 sec after the other. In such a case, will the played back sound be perceptibly different in an A-B comparison? This could be similar to comparing the sound from two microphones placed a distance apart equal to the distance sound travels in 1/96000 sec. At 25 degrees Celsius, sound travels at about 346 m/s. In 1/96000 sec, it would travel about 3.6 mm, or a bit over a third of a centimetre.
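The arithmetic above can be checked with a short sketch. The 346 m/s figure is the one quoted in this post. The function names are hypothetical, and the phase calculation is my own addition, treating the offset as a pure time delay applied to a steady tone.

```python
SPEED_OF_SOUND_M_S = 346.0  # figure used in the post for 25 degrees Celsius


def max_offset_seconds(rate_hz):
    """Worst-case offset between two unlocked sample clocks: half a sample period."""
    return 1.0 / (2.0 * rate_hz)


def equivalent_distance_mm(offset_s):
    """Distance sound travels during the offset, in millimetres."""
    return SPEED_OF_SOUND_M_S * offset_s * 1000.0


def phase_shift_degrees(freq_hz, offset_s):
    """Phase shift a pure time delay causes at a given frequency."""
    return 360.0 * freq_hz * offset_s


if __name__ == "__main__":
    offset = max_offset_seconds(48_000)  # 1/96000 sec
    print(f"distance: {equivalent_distance_mm(offset):.2f} mm")
    for freq in (1_000, 10_000, 20_000):
        print(f"{freq} Hz: {phase_shift_degrees(freq, offset):.2f} degrees")
```

The same 1/96000 sec offset is a small phase shift at 1 kHz (3.75 degrees) but a much larger one near the top of the band (75 degrees at 20 kHz), so any audible effect would presumably depend on the frequency content.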
A similar small difference due to sampling phase could also apply if downsampling a 192KHz recording to 48KHz. There will be 4 samples at 192KHz for every 1 at 48KHz. What if a 192KHz recording has 2 samples shaved off the start of it? If it is then converted to 48KHz it will give a slightly different result compared with a version that has not been shaved being converted to 48KHz. Subtraction of the two conversions will leave a small residue. But will the two conversions sound different to the ear in an A-B comparison?
Even if they do sound different, is this not comparable with the difference we experience if we move our head back by a third of a centimetre [not when listening to headphones]? A practically negligible difference?
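The shaved versus unshaved 192KHz comparison can be tried numerically. This is only a minimal sketch under stated assumptions: the source is a single full-scale sine tone, and the "conversion" simply keeps every 4th sample with no low-pass filter, which is only safe because the test tone is far below 24 kHz. A real converter such as SSRC filters first, and I have not tested it. All names here are hypothetical.

```python
import math

SOURCE_RATE_HZ = 192_000
DECIMATION = 4  # 192 kHz -> 48 kHz


def sine(freq_hz, n_samples, rate_hz=SOURCE_RATE_HZ):
    """Full-scale sine tone sampled at rate_hz."""
    return [math.sin(2.0 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]


def naive_decimate(samples, factor=DECIMATION):
    """Keep every `factor`-th sample. ASSUMPTION: no anti-alias filter;
    acceptable only because the test tone is well below 24 kHz."""
    return samples[::factor]


def residue_peak(freq_hz, shave=2, n_samples=SOURCE_RATE_HZ // 10):
    """Peak absolute difference between the two 48 kHz conversions."""
    original = sine(freq_hz, n_samples)
    unshaved = naive_decimate(original)
    shaved = naive_decimate(original[shave:])
    n = min(len(unshaved), len(shaved))
    return max(abs(unshaved[i] - shaved[i]) for i in range(n))


if __name__ == "__main__":
    for freq in (1_000, 10_000):
        peak = residue_peak(freq)
        print(f"{freq} Hz: residue peak {peak:.4f} ({20.0 * math.log10(peak):.1f} dB)")
```

For a full-scale tone the residue peak should come out near 2·sin(π·f/96000), so about 0.065 at 1 kHz, and it grows with frequency. Note that the residue only measures the time shift between the two versions. It says nothing by itself about whether either version, heard on its own, sounds any different.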
Are there any situations where it could make a material difference to the listening experience if the sound is captured at 48KHz and not, say, 96KHz?
_______________________
* Even with oversampling, there is subsequent decimation/averaging. After all of the processing, there exists but one sample value per channel, for each arbitrarily selected period of 1/48000 sec.
