Full Version: Resampling down to 44.1KHz
Hydrogenaudio Forums > Hydrogenaudio Forum > Uploads
MLXXX
[Martel, I have not tried to find out exactly how SSRC performs.]

Based on the contents of this thread up till now, I'd be inclined to prefer a 48KHz sampling rate over 44.1KHz, as it gives more margin for error with only a relatively slight (less than 10%) increase in raw file size.
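
As a quick check of the "less than 10%" figure, here is a minimal sketch. The 16-bit stereo defaults are assumptions for illustration only (the post does not state them), and they cancel out of the ratio anyway.

```python
def raw_pcm_bytes_per_second(sample_rate_hz, bits=16, channels=2):
    """Raw (uncompressed) PCM data rate; bits and channels are assumed defaults."""
    return sample_rate_hz * bits * channels // 8

# Relative growth in raw file size when moving from 44.1 kHz to 48 kHz.
increase = raw_pcm_bytes_per_second(48000) / raw_pcm_bytes_per_second(44100) - 1.0
print(f"{increase:.1%}")  # 8.8%
```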

[A range of 48KHz sound cards could be used for the playback and the precise characteristics of the filter would not be all that critical. Similarly the recording could be made with a range of recording devices, without undue concern about the filter characteristics.]

But there is another concern that is sometimes raised, beyond mere frequency response. It is a concern about relative timing and phase.

Is it good enough to shoehorn everything into a strict timing regimen of say 48000 samples a second, if some waveforms are slightly out of phase with each other, as captured by different microphones?

Arguably if 96KHz is used, any natural or artificial reverberation can be richer as the instantaneous wave cancellations are subtly recorded and reproduced without the constraint of a time structure (e.g. the volume level of different recorded tracks could be changed when creating a new mix and this could generate a whole new set of complex phase additions and cancellations, arguably more complex than if 48KHz had been used when recording).

Put another way, if an analogue source is captured simultaneously at 48KHz by two soundcards that are not locked in phase with each other, one card may be triggered by its sampling oscillator to take its sample* as much as 1/96000th sec after the other. In such a case, will the played back sound be perceptibly different in an A B comparison? This could be similar to comparing the sound from two microphones placed a distance apart equal to the distance sound travels in 1/96000th second. At 25 degrees Celsius, sound travels at about 346m/s. In 1/96000 sec, it would travel about 3.6mm, or a bit over a third of a centimetre.
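
The distance figure above can be reproduced with a few lines. This is only a sketch of the arithmetic in the post; the function name is hypothetical and the speed of sound is the 346 m/s value quoted for 25 degrees Celsius.

```python
# Speed of sound quoted in the post for 25 degrees Celsius.
SPEED_OF_SOUND_M_PER_S = 346.0

def offset_distance_mm(delay_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance in millimetres that sound travels during delay_s seconds."""
    return speed * delay_s * 1000.0

half_sample_at_48k = 1.0 / 96000.0   # half of one 48 kHz sample period
print(round(offset_distance_mm(half_sample_at_48k), 1))  # 3.6
```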

A similar small difference due to sampling phase could also apply if downsampling a 192KHz recording to 48KHz. There will be 4 samples at 192KHz for every 1 at 48KHz. What if a 192KHz recording has 2 samples shaved off the start of it? If it is then converted to 48KHz it will give a slightly different result compared with a version that has not been shaved being converted to 48KHz. Subtraction of the two conversions will leave a small residue. But will the two conversions sound different to the ear in an A-B comparison?

Even if they do sound different, is this not comparable with the difference we experience if we move our head back by a third of a centimetre [not when listening to headphones]? A practically negligible difference?

Are there any situations where it could make a material difference to the listening experience if the sound is captured at 48KHz and not, say, 96KHz?

_______________________

* Even with oversampling, there is subsequent decimation/averaging. After all of the processing, there exists but one sample value per channel, for each arbitrarily selected period of 1/48000 sec.
2Bdecided
To me, it sounds like you still haven't read the relevant threads in the FAQ. The subject of timing issues, or rather the lack of them, is quite well covered.


Also, IIRC there were cheap converters that did left then right, but I think we're talking decades ago, and they were quite rare. AFAIK no one is using them now in anything like a high quality application.

You can check for this fault quite trivially by recording or playing back the same thing on both channels. Impulses are an ideal test signal.


Even a sub-sample interchannel delay would cause audible high frequency loss if the output was combined to mono, which is one good reason why they are avoided. The other is that there is little reason to introduce them!

Cheers,
David.
pdq
QUOTE(MLXXX @ May 14 2008, 09:34) *

Put another way, if an analogue source is captured simultaneously at 48KHz by two soundcards that are not locked in phase with each other, one card may be triggered by its sampling oscillator to take its sample* as much as 1/96000th sec after the other. In such a case, will the played back sound be perceptibly different in an A B comparison? This could be similar to comparing the sound from two microphones placed a distance apart equal to the distance sound travels in 1/96000th second. At 25 degrees Celsius, sound travels at about 346m/s. In 1/96000 sec, it would travel about 3.6mm, or a bit over a third of a centimetre.

If the two soundcards have clock frequencies that differ by only 0.001% (10 ppm) then the phase shift between them will reach 1/96000 second after only one second and will increase by this amount every second.
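
A rough check of that drift figure, as a sketch only (the function names are hypothetical): a 10 ppm mismatch accumulates 10 microseconds of offset per second, and 1/96000 s is about 10.4 microseconds, so the half-sample offset is reached after roughly one second.

```python
def drift_per_second(ppm):
    """Timing offset in seconds accumulated per second of recording for a given clock mismatch."""
    return ppm * 1e-6

def seconds_to_reach(offset_s, ppm):
    """Recording time needed for two free-running clocks to drift apart by offset_s."""
    return offset_s / drift_per_second(ppm)

half_sample = 1.0 / 96000.0
print(round(seconds_to_reach(half_sample, 10.0), 2))  # 1.04
```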
MLXXX
2Bdecided, I have looked through the FAQs but there is not a lot that seems conclusive. Several times in my searches I came across this interesting report by yourself from 5 years ago, which seems quite relevant to the current topic:-


QUOTE(2Bdecided @ May 21 2003, 02:35) *
... The next day, while the demo was being run for the Nth time, I was at the back of the room talking with someone. Suddenly, I heard a difference as the source switched. I was surprised, having failed to hear a difference the previous day listening in the sweet spot. I listened as it switched again, and heard it switch back – ah ha, it must have just gone analogue / digital / analogue. I kept listening – I couldn’t hear the difference next time it switched.

I went to the middle/back of the room, and listened through the next demo. Without being told, I could pick out 44.1 and 48kHz. The difference was more obvious back from the sweet spot than in the sweet spot itself. More importantly, the difference wasn’t what I (or the other people who failed to hear it) had been listening for. It didn’t make any difference to the frequency response at all, or to the clarity of the high frequencies.

What 44.1kHz and 48kHz did do was to make the sound slightly less realistic, like the difference between a good and bad CD player. If the lower sampling rate had any defined “quality” it was a glassy kind of sound – I’d heard that word associated with CD before and thought it was complete rubbish – but now I actually heard the difference, I understood exactly what people had meant.

The change from 44.1 or 48kHz to analogue to 96kHz slightly increased the depth of the sound stage. I’d been listening to the amazing demo 1 for 2 days, so it was hardly an impressive difference, but it was still there.

If you’re counting, that’s only two blind detections – once when I wasn’t even listening, and again when I went back to the middle of the room to check – I confirmed which had been which with Kevin afterwards – “The next to last one was 40something, wasn’t it?”


You can say many things about this. You could say it was just luck, but I don’t think it was – I wasn’t even listening for the difference because (having listened the previous day) I didn’t think there was one to hear! You can say that I was hearing sonic deficiencies in the equipment. Well, maybe. That may be what the whole 44.1/96k debate is based on. All I can say is that, if there are sonic deficiencies in this equipment (I think the dCS boxes are around 5k each, and are used in many recording studios) then there isn’t much hope for the rest of us!

What you could say, with some justification, is that the “character” of 44.1 was more obvious outside the sweet spot, so maybe it’s not such a big issue. That’s probably true – except that maybe I was just listening for the wrong thing when I was “in the sweet spot”. Maybe I had to stop listening to the Hi-Fi, and start listening to the music and the performance to hear what was happening.

What is significant is that the 44.1kHz version wasn’t just different from the 96k and analogue version, it was *worse*. As the analogue was the master, any difference would be bad news, but for it to be subjectively worse makes matters even, well, worse!

I was upset to think how much recorded music only exists as a 44.1kHz or 48kHz sampled digital master tape. I discussed the subjective imperfections (the improved depth and realism of the 96k version) with Kevin, and he agreed. He was surprised that I’d noticed it that day, but couldn’t even hear anything wrong with 32kHz the previous day! I asked him what he heard with 16-bit (we’d been using 24-bit all along) and DSD. He said 16-bit was even worse – it made the whole sound “grungy”, and that DSD sounded nice, but added its own signature. “You can tell when you’re playing DSD through this system – the room heats up ;)” he said – I looked at the huge amps, and could believe it.

One thing I should note: I didn’t think the analogue master was particularly good quality. It was a gorgeous recording, but it had obvious flaws – e.g. background noise, and some audible edits. Also, I didn’t hear any difference between analogue, 96k and 192k. I can’t explain why 44.1kHz and 48kHz sounded worse, but they did. No one responsible for the demo had any reason to rig the results, and I played with enough of the equipment to know that everything was above board and fair, even though some of the cables we used might not have met with audiophile approval. ...


QUOTE(pdq @ May 15 2008, 00:36) *

If the two soundcards have clock frequencies that differ by only 0.001% (10 ppm) then the phase shift between them will reach 1/96000 second after only one second and will increase by this amount every second.

I think this goes towards explaining why it is good practice to have a master synchronising signal, if more than one sound card is used for a recording.
cabbagerat
2Bdecided is right - you need to do some background reading. I will try to answer your questions as best I can.

QUOTE(MLXXX @ May 14 2008, 05:34) *

But there is another concern that is sometimes raised, beyond mere frequency response. It is a concern about relative timing and phase.
If the waves are slightly out of phase with each other, they will be captured slightly out of phase. The ability to distinguish two phases in a sampled waveform is not directly limited by the sample rate - the SNR comes into play, too. This has been covered before in a number of threads. A recent thread on time resolution in PCM has all the answers.
QUOTE(MLXXX @ May 14 2008, 05:34) *

Arguably if 96KHz is used, any natural or artificial reverberation can be richer as the instantaneous wave cancellations are subtly recorded and reproduced without the constraint of a time structure (e.g. the volume level of different recorded tracks could be changed when creating a new mix and this could generate a whole new set of complex phase additions and cancellations, arguably more complex than if 48KHz had been used when recording).
You could argue that, but you would be wrong. No matter how "complex", "rich" or "nuanced" a signal is, it can still be described by its bandwidth and SNR.
QUOTE(MLXXX @ May 14 2008, 05:34) *

Put another way, if an analogue source is captured simultaneously at 48KHz by two soundcards that are not locked in phase with each other, one card may be triggered by its sampling oscillator to take its sample* as much as 1/96000th sec after the other.
Yeah, and probably will be. Quartz clocks suck at long term stability - so you are going to be sampling at different instants. It's not a bad assumption that, given two arbitrary clocks at 96kHz the difference between them will be distributed evenly across 1/96000th of a second.
QUOTE(MLXXX @ May 14 2008, 05:34) *

In such a case, will the played back sound be perceptibly different in an A B comparison?
No, because the output of reconstruction will be the same in both cases, given a bandlimited signal. Kotelnikov's original paper (one of the first in the field) actually discusses this, and it can be proven without difficulty. There is a small theoretical problem with the turn on condition (the beginning of time), but this can be ignored in audio.
QUOTE(MLXXX @ May 14 2008, 05:34) *

This could be similar to comparing the sound from two microphones placed a distance apart equal to the distance sound travels in 1/96000th second. At 25 degrees Celsius, sound travels at about 346m/s. In 1/96000 sec, it would travel about 3.6mm, or a bit over a third of a centimetre.
Back in the mists of time, some radar signal processing was done with things called "acoustic delay lines" which worked in exactly this way. It worked amazingly well, for the time.

QUOTE(MLXXX @ May 14 2008, 05:34) *

A similar small difference due to sampling phase could also apply if downsampling a 192KHz recording to 48KHz. There will be 4 samples at 192KHz for every 1 at 48KHz. What if a 192KHz recording has 2 samples shaved off the start of it? If it is then converted to 48KHz it will give a slightly different result compared with a version that has not been shaved being converted to 48KHz. Subtraction of the two conversions will leave a small residue. But will the two conversions sound different to the ear in an A-B comparison?
Blindly subtracting one digital signal from another isn't a good idea for just this reason. The two downsampled versions will differ by a "group delay", which you can correct digitally, in analogue, or by moving your speakers back a few millimetres. After reconstruction, the two signals will be identical. There can be a slight difference at turn on, but after that they'll be the same.

QUOTE(MLXXX @ May 14 2008, 05:34) *

Even if they do sound different, is this not comparable with the difference we experience if we move our head back by a third of a centimetre [not when listening to headphones]. A practically negligible difference?

Yes. Take the function f(x) = cos(x) u(x), where u(x) is zero for negative x and 1 for positive x. Start sampling at time zero, and at time 0+1/96000. When you have those samples, reconstruct the original wave. Notice that they will be different at the beginning. After this turn-on period they will be the same. Due to the antialiasing filter, this example is a little more subtle than that, even - but they will still be different as the whole process has to be causal. Does it matter in the real world? No.
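
The steady-state part of that claim can be tried numerically. The sketch below is not the cos(x) u(x) example itself: it assumes a periodic test signal so that the turn-on period never arises and the reconstruction can be computed exactly with a periodic sinc (Dirichlet) kernel. The test tone, the period length N and all function names are assumptions made for illustration.

```python
import math

N = 31          # samples per period; odd so the kernel below is exact
FS = 48000.0    # sample rate in Hz

def signal(t):
    """Periodic band-limited test tone: harmonics 3 and 11 of FS/N (both below FS/2)."""
    f0 = FS / N
    return (math.sin(2 * math.pi * 3 * f0 * t + 0.4)
            + 0.5 * math.cos(2 * math.pi * 11 * f0 * t - 1.1))

def kernel(u):
    """Periodic sinc (Dirichlet) kernel; u is measured in sample periods."""
    den = N * math.sin(math.pi * u / N)
    if abs(den) < 1e-12:
        return 1.0
    return math.sin(math.pi * u) / den

def sample(offset):
    """One period of samples taken at (n + offset) / FS."""
    return [signal((n + offset) / FS) for n in range(N)]

def reconstruct(samples, offset, t):
    """Rebuild the waveform at time t from samples taken with the given offset."""
    u = t * FS
    return sum(s * kernel(u - n - offset) for n, s in enumerate(samples))

card_a = sample(0.0)   # first card
card_b = sample(0.5)   # second card, triggered 1/96000 s later

# Compare both reconstructions with the true waveform between sample instants.
times = [(k + 0.23) / (5 * FS) for k in range(5 * N)]
err_a = max(abs(reconstruct(card_a, 0.0, t) - signal(t)) for t in times)
err_b = max(abs(reconstruct(card_b, 0.5, t) - signal(t)) for t in times)
sample_gap = max(abs(a - b) for a, b in zip(card_a, card_b))
print(sample_gap, err_a, err_b)
```

The stored sample values differ substantially between the two cards, yet both reconstructions agree with the original waveform to within floating-point rounding.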

QUOTE(MLXXX @ May 14 2008, 05:34) *

Are there any situations where it could make a material difference to the listening experience if the sound is captured at 48KHz and not, say, 96KHz?
In an ideal world, no. With real hardware, I don't know.
2Bdecided
QUOTE(MLXXX @ May 14 2008, 15:52) *
2Bdecided, I have looked through the FAQs but there is not a lot that seems conclusive. Several times in my searches I came across this interesting report by yourself from 5 years ago, which seems quite relevant to the current topic:-
Oh, I stand by that report (though I agree with the criticisms in the same thread).

It was subsequent reading, research, and experiments that cleared up (for me) most of the issues that you are working through. They are non-issues (at least in theory).


There are only two slightly credible explanations: human ears don't quite work in the way we think, or the well known and understood imperfections in real equipment combine together to create audible differences.

There is an even more relevant point: no one has ABXed CD vs anything else, except by using seriously faulty equipment or by turning the volume up so high on near-silent passages that "normal" recordings would deafen you.

Cheers,
David.
MLXXX
Thanks cabbagerat; your specific explanations in response to my post are appreciated.

QUOTE(2Bdecided @ May 15 2008, 01:17) *

There are only two slightly credible explanations: human ears don't quite work in the way we think, or the well known and understood imperfections in real equipment combine together to create audible differences.

There is an even more relevant point: no one has ABXed CD vs anything else, except by using seriously faulty equipment or by turning the volume up so high on near-silent passages that "normal" recordings would deafen you.

It's relatively easy equipment-wise to test 24 bits against a dither to 16 bits because you have exactly the same timing of the samples, and can use the same sound card for playback, operating with the same filter; whether reproducing 24 bits, 16 bits dithered, or a truncation to 16 bits. [I have done this myself with my own equipment at home.]

It's much harder to compare 96KHz as against 44.1KHz, and any differences that were heard could be ascribed to deficiencies in the equipment. I assume that is how you might now primarily explain that report of your own listening experience in 2003, at different sample rates.

But I wonder whether there are any recent tests with highly evolved equipment that have concentrated on the 44.1 vs 48 vs 96+ issue with audio clips designed to highlight differences.

I could imagine that if six violinists played in front of a microphone each and the sound was mixed in analogue the result would be quite complex. Alternatively, if each of the six sources were separately converted to digital at just 44.1KHz and mixed digitally with the other violins each at 44.1KHz, the result seems likely to be different, compared with sampling each at say 96KHz and mixing; even if the final mixdown of the 96KHz sources were at 44.1KHz.

People may ask 'why bother to use a separate ADC for each microphone?': just mix in an analogue mixer. Well as technology advances, ADCs are becoming quite cheap and it may be an attractive proposition to fit out a microphone with its own ADC (and perhaps some sort of wireless data link) and dispense with any analogue mixer.

There may be other recording situations that would be more demanding and have greater potential to be affected by phase differences.

If we really are sure that 96KHz is of no benefit now, are recording engineers using it just in case it may make a difference with loudspeakers of the future; or is the use of 96KHz driven by (i) flawed technical assumptions, and/or (ii) a market demand fostered by advertising hype?
Martel
QUOTE(MLXXX @ May 14 2008, 17:51) *

But I wonder whether there are any recent tests with highly evolved equipment that have concentrated on the 44.1 vs 48 vs 96+ issue with samples designed to highlight differences.

Well, those tests would merely prove/deny the equipment's ability to play back those samplerates. I guess there are some tests of CD players versus SACD on some audiophile pages. Since the differences between the formats are theoretically negligible, the real difference should lie only in playback equipment quality (or different mastering of CD and SACD version, so beware).
QUOTE(MLXXX @ May 14 2008, 17:51) *

I could imagine that if six violinists played in front of a microphone each and the sound was mixed in analogue the result would be quite complex. If each of the six sources were separately converted to digital at just 44.1KHz and mixed digitally with the other violins each at 44.1KHz, the result seems likely to be different, compared with sampling each at say 96KHz and mixing; even if the final mixdown is to 44.1KHz.

I think there's no theoretical reason why it should be different. Theoretically, you should be able to do filtering, analog-to-digital conversion, resampling and mixing in arbitrary order and get the same result. Practically, there is a preferred order of those since equipment is not ideal (linear, unlimited dynamic range etc.) and the effort is to minimize the overall distortion. Just look at the resampling results of those software resamplers. It is all about lowpass filtering and most of the resamplers fail at that utterly. It is problematic to properly design an analogue antialiasing filter for a 44kHz ADC, so a 96kHz one is a much better choice.
QUOTE(MLXXX @ May 14 2008, 17:51) *

Of course we could ask 'why bother to use a separate ADC for each microphone?': just mix in an analogue mixer.

Because it is not practical to have a million ADCs in a studio and have to mix a million different tracks in software.
QUOTE(MLXXX @ May 14 2008, 17:51) *

If we really are sure that 96KHz is of no benefit now, are recording engineers using it just in case it may make a difference with loudspeakers of the future; or is the use of 96KHz driven by (i) flawed technical assumptions, and/or (ii) a market demand fostered by advertising hype?

96kHz ADCs are less likely to be plagued by the analogue antialiasing filter that they need to include. You may (relatively) easily design something like the SSRC's lowpass in software, but it is virtually impossible using an analogue circuit.
Kees de Visser
QUOTE(MLXXX @ May 15 2008, 02:51) *
If we really are sure that 96KHz is of no benefit now, are recording engineers using it just in case it may make a difference with loudspeakers of the future; or is the use of 96KHz driven by (i) flawed technical assumptions, and/or (ii) a market demand fostered by advertising hype?
On a recording budget the difference between using 44.1 and 96 kHz (or higher) is really benign these days. Since there seems no evidence that using 44.1 gives better results there is very little reason not to use 96 kHz or higher as a production format.
There seems to be anecdotal evidence that some plug-ins perform (sound) better at a 96 kHz rate. A possible explanation is that the code has been optimized for that rate and not for 44.1. This "shouldn't" be a reason to record at 96, but it's probably the most practical workflow.
2Bdecided
QUOTE(MLXXX @ May 15 2008, 02:51) *
I could imagine that if six violinists played in front of a microphone each and the sound was mixed in analogue the result would be quite complex. Alternatively, if each of the six sources were separately converted to digital at just 44.1KHz and mixed digitally with the other violins each at 44.1KHz, the result seems likely to be different, compared with sampling each at say 96KHz and mixing; even if the final mixdown of the 96KHz sources were at 44.1KHz.
Let's think this through. Firstly, nothing samples at 44.1kHz in the 21st century - ADCs are always oversampled. So what you have is at least 352.8kHz resampled to 44.1kHz, vs at least 384kHz resampled to 96kHz resampled to 44.1kHz. The mixing is not the only (or even the main) difference here. It's bad experimental practice to introduce multiple variables: You should compare sample rates and associated resampling, or mixing - not both at once.


Here is a comparison which at least has analogue vs digital mixing (and the inevitable circuit differences) as the only variable:

Situation 1 = 6 ADCs, 96kHz, resample to 44.1kHz, mix signals
Situation 2 = mix signals, 1 ADC, 96kHz, resample to 44.1kHz

The problem with this experiment in practice is that the digital gains could be matched perfectly, whereas the analogue gains could not. Still, let us forget that for a moment. Let us assume we can do a perfect summation in both digital and analogue, use unity gain for each, and not clip. Let us make the equations easier by simply having two violin players yielding two microphone feeds, x and y. Let us denote the function of the ADC and the resampling by f. Let us denote simple summation by +.

Situation 1: 2 ADCs, digital mixing
final output = f(x) + f(y)

Situation 2: analogue mixing, 1 ADC
final output = f(x+y)


Then the question becomes simple, because the very definition of a linear system (in this case, system f) is that these two situations yield an identical result for any value of x and y. In reality, we would put limits on x and y and say that the system was linear within these limits (no use considering levels that would blow up the equipment!).

So, if x and y are sensible voltages from real microphones, is f a linear system? Let's pull it apart and check each part in turn, since a concatenation of linear systems is by definition also linear.

ADC:
0. the buffer amplifier might(!) be linear
1. low pass filtering is linear
2. straight quantisation is not linear - so we won't use that!
2a. dithered quantisation is still not linear, but breaks down into a linear-on-average system, and a noise source
3. Nyquist sampling is linear, but that assumes a perfect filter
3a. non Nyquist sampling creates aliases - however, this is linear distortion, so is still linear
Resampling: conceptualised as a resample up to a common multiple, filtering, and decimation to the desired rate
4. adding zero samples to pad the sample rate to the desired one is linear
5. low pass filtering is linear
6. throwing away samples is linear
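
For the resampling steps 4 to 6, the "common multiple" can be worked out directly. This is only a sketch of the arithmetic behind that conceptual model; the function name is hypothetical, and real resamplers generally avoid computing anything at the intermediate rate.

```python
from math import gcd

def rational_resample_plan(rate_in, rate_out):
    """Return (zero-padding factor, decimation factor, common intermediate rate in Hz)."""
    g = gcd(rate_in, rate_out)
    up = rate_out // g       # step 4: pad with zeros up to the common rate
    down = rate_in // g      # step 6: keep one sample in every `down`
    return up, down, rate_in * up

print(rational_resample_plan(96000, 44100))   # (147, 320, 14112000)
print(rational_resample_plan(192000, 48000))  # (1, 4, 192000)
```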

The only part which may be mathematically non linear is the dithered quantisation, and that can be arbitrarily good based on the bit depth - which you already seem unconcerned by.
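
The "linear-on-average" behaviour of dithered quantisation can be illustrated with a small experiment. This is a sketch under assumptions: a quantiser step of 1, triangular (TPDF) dither spanning plus or minus one step, a constant input of 0.3 and a fixed random seed are all chosen here for illustration and do not come from the post.

```python
import random

def quantise(value, step=1.0):
    """Plain rounding to the nearest quantiser step."""
    return step * round(value / step)

def quantise_with_tpdf_dither(value, rng, step=1.0):
    """Add triangular dither (sum of two uniform values) before rounding."""
    dither = ((rng.random() - 0.5) + (rng.random() - 0.5)) * step
    return quantise(value + dither, step)

rng = random.Random(1234)   # fixed seed so the run is repeatable
level = 0.3                 # constant input sitting between two steps
trials = 20000

plain = quantise(level)     # undithered: always lands on the same step
dithered_mean = sum(quantise_with_tpdf_dither(level, rng)
                    for _ in range(trials)) / trials
print(plain, dithered_mean)
```

Without dither the 0.3 is simply lost; with dither each individual output is still a whole step, but the average tracks the input, with the remainder appearing as noise.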


To summarise, the systems involved are linear, and it doesn't make any difference whether you have 6 ADCs and a digital mixer, or an analogue mixer followed by 1 ADC. All these superfine details that you are imagining are perfectly captured (to within the parameters of the system, namely bandwidth and noise floor) - whichever way around you do it.

Non linearities (e.g. that first buffer amplifier) would break this - but they'd also introduce signals that weren't supposed to be there anyway! Depending on where in the chain you introduced non-linearities, either version could be closer to the "correct" version.

Cheers,
David.
Kees de Visser
QUOTE(Martel @ May 15 2008, 09:46) *
96kHz ADCs are less likely to be plagued by analog antialiasing filter, which they need to include. You may (relatively) easily design something like the SSRC's lowpass in software but it is virtually impossible using analogue circuit.
That's why almost all modern ADCs use oversampling and digital filtering. I think it's the need for low latency that restricts the complexity of digital filtering in recording equipment.
Martel
QUOTE(Kees de Visser @ May 15 2008, 02:37) *

QUOTE(Martel @ May 15 2008, 09:46) *
96kHz ADCs are less likely to be plagued by analog antialiasing filter, which they need to include. You may (relatively) easily design something like the SSRC's lowpass in software but it is virtually impossible using analogue circuit.
That's why almost all modern ADCs use oversampling and digital filtering. I think it's the need for low latency that restricts the complexity of digital filtering in recording equipment.

Oh, sorry, I completely forgot that they are mostly based on delta-sigma. I must have been outside the audio territory for far too long. :(
But I guess the claim about (lowpass) filtering quality and its impact still holds, be it digital or analogue. :)
MLXXX
QUOTE(cabbagerat @ May 15 2008, 00:57) *

No, because the output of reconstruction will be the same in both cases, given a bandlimited signal. Kotelnikov's original paper (one of the first in the field) actually discusses this, and it can be proven without difficulty. There is a small theoretical problem with the turn on condition (the beginning of time), but this can be ignored in audio.

I've noticed in several other threads in other forums that when a "What about different phases?" question is raised, it is dealt with by reference to steady waveforms and Nyquist. The argument goes that you can represent waveforms accurately with sampling at twice the maximum frequency of the Fourier series for a particular source. What is not dealt with is how well sampling represents the interactions between waveforms from independent sources with continuously varying phase relationships. (I imagine I would not be in a position to understand a detailed mathematical explanation anyway!) Perhaps my query does seek to explore the "turn on condition".

QUOTE(2Bdecided @ May 15 2008, 20:34) *

To summarise, the systems involved are linear, and it doesn't make any difference whether you have 6 ADCs and a digital mixer, or an analogue mixer followed by 1 ADC.


I do not understand the beginning of the explanation as these formulae appear to anticipate the conclusion reached:-

Situation 1: 2 ADCs, digital mixing
final output = f(x) + f(y)

Situation 2: analogue mixing, 1 ADC
final output = f(x+y)


They seem to be declarations that a sampled output resolves to the same thing as an analogue output, for bandlimited input.

A 96KHz extract

I have always found combined strings a good test for audio equipment. I have come across a recording of an orchestra playing The Earth Overture by Kosuke Yamashita.

The format is 7.1 channel 96KHz 24-bit linear PCM. (The Blu-ray reference disc has been released by Q-TEC.)

The audio quality is very good. I found that when I converted a short extract to 48KHz with Audition 3, the quality was reduced slightly (at least as played back by my AVR). In contrast, many other recordings I have experimented with have revealed no apparent (to me) audible differences when downsampled to 48KHz.

The 48KHz version is not quite as smooth sounding. I find this noticeable in the harmony between the string sections. With the 96KHz version, the sounds blend such that the strings taking the lower part are less noticeable. I'll upload a 9 second extract in this post if possible.

Now I imagine 2Bdecided and many others will assume my playback equipment is responsible for the difference, and that is distinctly possible; but it is also possible that a conversion to 48KHz of this particular recording will impair it.

ABXing was not easy. Loudspeakers revealed the differences (not my headphones). Here are my results:

foo_abx 1.3.1 report
foobar2000 v0.9.5.1
2008/05/18 22:33:06

File A: C:\Users\Public\earthsong_9seconds.wav
File B: C:\Users\Public\earthsong_9secondsAuditionConvertedto48KHz.wav

22:33:06 : Test started.
22:35:11 : 01/01 50.0%
23:01:19 : 02/02 25.0%
23:02:28 : 03/03 12.5%
23:03:18 : 04/04 6.3%
23:03:37 : 05/05 3.1%
23:03:44 : Test finished.

----------
Total: 5/5 (3.1%)
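
The percentages in the log are the probability of doing at least that well by pure guessing. A minimal sketch of that calculation follows; the function name is hypothetical and this is not foo_abx's own code.

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of at least `correct` right answers out of `trials` when guessing 50/50."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(5, 5))  # 0.03125
```

With only five trials, even a perfect score leaves about a 1 in 32 chance that it came from guessing.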
Martel
A 44 kHz digital waveform PERFECTLY describes ANY signal (or mixture of signals), including phase, from 0 to 22049 Hz, if you do not consider distortion caused by finite number of amplitude quantization steps.
Just looking at the waveform, you might get suspicious about accuracy at frequencies near the Nyquist one, since the signal hardly gets 3-4 samples per period. Try zooming in the waveform in Cool Edit up to the sub-sample accuracy. There you will see some interpolated points between actual samples. These are calculated solely by upsampling. No information is lost, you may recalculate the "missing" samples any time. This "upsampling" also happens naturally in DAC upon conversion to continuous-time domain.
cabbagerat
QUOTE(MLXXX @ May 18 2008, 07:29) *

I've noticed in several other threads in other forums that when a "What about different phases?" question is raised, it is dealt with by reference to steady waveforms and Nyquist. The argument goes that you can represent waveforms accurately with sampling at twice the maximum frequency of the Fourier series for a particular source. The question is not dealt with of the quality of representing interactions between waveforms from independent sources with continuously varying phase relationships. (I imagine I would not be in a position to understand a detailed mathematical explanation anyway!) Perhaps my query does seek to explore the "turn on condition".
You need to read some of the background theory, because I am not sure I can explain this clearly in a forum post. Essentially, the idea is that the sum of two bandlimited signals is a bandlimited signal. Therefore, in an ideal (no quantization, no clipping) system, if x would be properly sampled, and y would be properly sampled, then x+y will be properly sampled. With clipping and quantization, this becomes a little more grey, because (as detailed in 2Bdecided's post) we can't really assume the system is linear any more - but it's probably close enough. But the matter remains, there are no bandlimited signals whose "continuously varying phase relationships" cannot be captured by a sampled system - within the limits of the system SNR. It might seem logical that there are, but there really aren't.

As for the turn on condition - this is the question of, if your first discrete sample is sample x[0] of x(0), then what do you assume x[-1] to be during the reconstruction process? There is a mathematically correct way of doing it, and the way it's done in real systems.
QUOTE(MLXXX @ May 18 2008, 07:29) *

QUOTE(2Bdecided @ May 15 2008, 20:34) *

To summarise, the systems involved are linear, and it doesn't make any difference whether you have 6 ADCs and a digital mixer, or an analogue mixer followed by 1 ADC.


I do not understand the beginning of the explanation as these formulae appear to anticipate the conclusion reached:-

Situation 1: 2 ADCs, digital mixing
final output = f(x) + f(y)

Situation 2: analogue mixing, 1 ADC
final output = f(x+y)


They seem to be declarations that a sampled output resolves to the same thing as an analogue output, for bandlimited input.
Yes, as 2Bdecided said in his (excellent) post - the process is for the most part linear. If f(x) is a linear function - then f(x+y) = f(x)+f(y) and f(ax) = af(x) for constant a. The post goes on to develop an argument for why the sampling process can reasonably be considered linear - hence these relationships hold. Obviously this only holds below clipping and above the noise floor - but it is a fair enough assumption about *reasonable* signals.
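
A minimal sketch of that point in Python (not from the original posts; the 48 kHz rate, the two test tones and the helper names `sample` and `clip` are assumptions made for illustration). It compares "2 ADCs, digital mixing" with "analogue mixing, 1 ADC" for an ideal sampler, then repeats the comparison with a hard clipper in the path to show where the linearity assumption stops holding.

```python
import math

FS = 48000  # sample rate in Hz (assumed for this sketch)

def sample(signal, n_samples, fs=FS):
    """Ideal sampler: evaluate a continuous-time function at t = n / fs."""
    return [signal(n / fs) for n in range(n_samples)]

def clip(samples, limit=1.0):
    """Hard clipper, a deliberately non-linear stage."""
    return [max(-limit, min(limit, s)) for s in samples]

def x(t):
    return 0.8 * math.sin(2 * math.pi * 1000 * t)

def y(t):
    return 0.8 * math.sin(2 * math.pi * 1500 * t + 0.3)

def x_plus_y(t):
    return x(t) + y(t)

N = 480  # 10 ms of audio
digital_mix = [a + b for a, b in zip(sample(x, N), sample(y, N))]  # f(x) + f(y)
analogue_mix = sample(x_plus_y, N)                                  # f(x + y)

# Ideal sampling is additive: the two mixes agree to rounding error.
max_diff_linear = max(abs(a - b) for a, b in zip(digital_mix, analogue_mix))

# Clipping is not additive: clip(x) + clip(y) differs from clip(x + y)
# wherever the sum exceeds full scale.
clipped_then_mixed = [a + b for a, b in zip(clip(sample(x, N)), clip(sample(y, N)))]
mixed_then_clipped = clip(analogue_mix)
max_diff_clipped = max(abs(a - b) for a, b in zip(clipped_then_mixed, mixed_then_clipped))

print(f"ideal sampler: max |f(x)+f(y) - f(x+y)| = {max_diff_linear:.2e}")
print(f"with clipping: max difference           = {max_diff_clipped:.3f}")
```

Each tone on its own stays below full scale, so the clipper only bites once the two are summed, which is why the second comparison disagrees while the first does not.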

Please read his post again.
MLXXX
Rereading the post leaves me with the same impression. The conclusion of 2Bdecided's (excellent) post appears to flow from the mathematical basis it establishes at the beginning.

I note that in the analogue domain the sources to be mixed are not as severely bandlimited as they end up being when converted to the digital domain (assuming use of microphones that respond to frequencies exceeding 22050Hz, and assuming the use of a nominal digital sampling rate of 44.1KHz).

This difference between the bandwidths of the analogue and digital mixing processes must, I presume, be contemplated in the equations used at the beginning of the presentation, and must be considered to have no ultimate impact.

QUOTE(Martel @ May 19 2008, 04:25) *
Try zooming in the waveform in Cool Edit up to the sub-sample accuracy.
With the particular sample clip, the 96Khz and 48KHz waveforms (at a given elapsed time from the start of the clip) often differ dramatically, presumably as there is so much content above 24KHz in the 96KHz version.

But I can see that if a continuous high frequency sine wave not far below the Nyquist limit were being sampled one could verify performance near the Nyquist limit by inspection of the Cool Edit produced waveform graphs, and this would be an interesting exercise. The waveform would approximate a sine wave, possibly with a bit of phase delay introduced by digital filtering. I guess the phase delay could be observed by generating a waveform at 10.5Khz with a weak 2nd harmonic and observing the [average] displacement of the zero crossing of the 21Khz component relative to the zero crossing of the fundamental, though I've never tried this.


[Will upload my sample clip if possible within the next 24 hours.]
pdq
Let me see if I can provide an analog-domain equivalent to what we are discussing (and somebody correct me if I'm wrong).

Let's say that you start with some waveform, and then you add a 22.05 kHz sine wave to it. Now lowpass the result to 22049 Hz.

You will now have one of two things. Either the original waveform had no content above 22049 Hz, in which case you have back the original waveform, no matter how complex it was; or else the original waveform had content above 22049 Hz, in which case you now have intermodulation products between the original waveform and the 22.05 kHz sine wave.

When you translate this to A/D conversion followed by D/A conversion and bandwidth limiting the result is exactly the same except for clipping and quantization.


Apparently this only applies if you are multiplying by a 22050 Hz sine wave.
2Bdecided
QUOTE(pdq @ May 19 2008, 16:10) *

Let me see if I can provide an analog-domain equivalent to what we are discussing (and somebody correct me if I'm wrong).

Let's say that you start with some waveform, and then you add a 22.05 kHz sine wave to it. Now lowpass the result to 22049 Hz.

You will now have one of two things. Either the original waveform had no content above 22049 Hz, in which case you have back the original waveform, no matter how complex it was; or else the original waveform had content above 22049 Hz, in which case you now have intermodulation products between the original waveform and the 22.05 kHz sine wave.
Why would you have intermodulation products? Is this analogue circuit broken or something?

As long as everything is working, and you choose a sensible filter (let's say 20kHz) you won't know whether you added a 22.05kHz sine wave before filtering, or not. It won't interact within anything, and it'll be gone after you filter.

Cheers,
David.
greynol
Key word here is product.

Simply summing two signals will not result in intermodulation.
pdq
I could be wrong about this, but I thought that when you sum two frequencies the waveform is the same as if you had the sum and the difference of the two frequencies, but when you filter out the sum of the frequencies then you are left with the difference, which is an intermodulation product.
greynol
You have to multiply the two signals or subject them to some other non-linear process during the summation in order to get sum and difference frequencies.
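
A small Python sketch of the difference (not from the original post; the 96 kHz rate, the 10 ms length and the helper `amplitude_at` are assumptions chosen so that every frequency of interest lands exactly on a DFT bin). It measures the level at the difference, the two original, and the sum frequencies for a summed pair and for a multiplied pair of 19 kHz and 20 kHz tones.

```python
import cmath
import math

FS = 96000        # sample rate in Hz (assumed; high enough that 39 kHz does not alias)
N = 960           # 10 ms of audio, so DFT bins fall every 100 Hz
F1, F2 = 19000, 20000

def tone(freq):
    return [math.sin(2 * math.pi * freq * n / FS) for n in range(N)]

def amplitude_at(samples, freq):
    """Amplitude of the sinusoidal component at `freq`, from a single DFT bin."""
    k = round(freq * N / FS)
    acc = sum(s * cmath.exp(-2j * math.pi * k * n / N) for n, s in enumerate(samples))
    return 2 * abs(acc) / N

a, b = tone(F1), tone(F2)
summed = [p + q for p, q in zip(a, b)]       # linear mixing
multiplied = [p * q for p, q in zip(a, b)]   # non-linear: the two signals are multiplied

for name, sig in (("sum", summed), ("product", multiplied)):
    levels = {f: amplitude_at(sig, f) for f in (F2 - F1, F1, F2, F1 + F2)}
    print(name, {f: round(v, 3) for f, v in levels.items()})
```

The summed signal contains only the 19 kHz and 20 kHz components; the 1 kHz and 39 kHz components appear only in the product.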
pdq
Sorry, post corrected.
MLXXX
QUOTE(Martel @ May 19 2008, 04:25) *

A 44 kHz digital waveform PERFECTLY describes ANY signal (or mixture of signals), including phase, from 0 to 22049 Hz, if you do not consider distortion caused by finite number of amplitude quantization steps.
Just looking at the waveform, you might get suspicious about accuracy at frequencies near the Nyquist one, since the signal hardly gets 3-4 samples per period. Try zooming in the waveform in Cool Edit up to the sub-sample accuracy. There you will see some interpolated points between actual samples. These are calculated solely by upsampling. No information is lost, you may recalculate the "missing" samples any time. This "upsampling" also happens naturally in DAC upon conversion to continuous-time domain.

I do get suspicious when I look at a digital mixdown of 19KHz and 20Khz sinewaves that were created at 44.1KHz. There are so few sample points and yet as you say cooledit manages to create a realistic graphical interpolation (with this relatively simple waveform).

In contrast, when I look at 19KHz and 20KHz sinewaves created at 96KHz and mixed digitally in cooledit, there are so many more sample points in the mixdown that sophisticated interpolation would not be necessary: you could simply join the dots with a most basic form of integration (a resistor and capacitor). The undulations in overall amplitude at a rate of 1KHz appear to be relatively smooth, at this higher sampling rate. I could readily imagine this undulating signal surviving, despite the addition of other high frequency signals into the digital mix each needing to be 'interpolated'.
greynol
Reconstruction using a sinc pulse at every sample is perfect (ignoring quantization error and possible distortion at the edges) so long as the original signal is BW limited to half the sample rate. I am pretty sure this is exactly what cool edit and adobe audition are doing with their graphical representation. The software isn't Spice; it doesn't care about resistors and capacitors.

This is all that needs to be said. The number of sample points used is extraneous and therefore irrelevant.
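
For anyone who wants to try it, here is a rough Python sketch of sinc reconstruction applied to the 19 kHz + 20 kHz mix at 44.1 kHz (not from the original post, and not a claim about how Cool Edit or Audition is implemented; the window of 1000 samples either side and the names `signal`, `sinc` and `reconstruct` are assumptions). The ideal sum runs over infinitely many samples, so truncating it to a finite window leaves a small residual error that shrinks as the window grows.

```python
import math

FS = 44100     # sample rate in Hz
HALF = 1000    # samples kept either side of t = 0 (assumed window size)

def signal(t):
    """Test signal from the thread: equal-level 19 kHz and 20 kHz sine waves."""
    return 0.5 * math.sin(2 * math.pi * 19000 * t) + 0.5 * math.sin(2 * math.pi * 20000 * t)

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

# Samples x[n] = signal(n / FS) for n = -HALF .. HALF.
indices = range(-HALF, HALF + 1)
samples = [signal(n / FS) for n in indices]

def reconstruct(t):
    """Whittaker-Shannon interpolation: one sinc pulse per sample, summed."""
    return sum(x * sinc(t * FS - n) for n, x in zip(indices, samples))

# Evaluate at ten points per sample interval near the middle of the window
# and compare with the true continuous-time signal.
worst = 0.0
for step in range(-50, 51):
    t = (step / 10) / FS
    worst = max(worst, abs(reconstruct(t) - signal(t)))

print(f"worst-case reconstruction error near the centre: {worst:.5f}")
```

With barely more than two samples per cycle, the values between the samples still come out close to the true waveform, limited here only by the truncated window.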
Martel
QUOTE(MLXXX @ May 19 2008, 09:11) *

I do get suspicious when I look at a digital mixdown of 19KHz and 20Khz sinewaves that were created at 44.1KHz. There are so few sample points and yet as you say cooledit manages to create a realistic graphical interpolation (with this relatively simple waveform).
There's really no reason to get suspicious as there is EXACTLY ONE WAY to fill in the missing samples; there's NO ambiguity. You insert an arbitrary number of zero-valued samples between the actual samples, then apply a digital lowpass filter that eliminates any frequencies at and above the original Nyquist frequency. Well, the results may vary depending on the filter design quality but the principle is the same. If you have top quality filters, you are able to almost perfectly reconstruct any signal present in a 44.1kHz digital waveform, when going into the continuous-time domain (analogue signal). And this holds vice-versa as well (going from analogue to digital), as pointed out in my previous post.
QUOTE(MLXXX @ May 19 2008, 09:11) *

In contrast, when I look at 19KHz and 20KHz sinewaves created at 96KHz and mixed digitally in cooledit, there are so many more sample points in the mixdown that sophisticated interpolation would not be necessary: you could simply join the dots with a most basic form of integration (a resistor and capacitor). The undulations in overall amplitude at a rate of 1KHz appear to be relatively smooth, at this higher sampling rate. I could readily imagine this undulating signal surviving, despite the addition of other high frequency signals into the digital mix each needing to be 'interpolated'.

There is no "sophisticated" interpolation involved. I do not call lowpass filtering a sophisticated method. Well, perhaps the filter design itself might be "sophisticated" but the reconstruction process is not.
The samples that are present in the 96kHz wave and not in the 44.1kHz one are simply redundant and bring no additional information at all since they can be easily (and almost perfectly, considering the digital filtering limits) recalculated.
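
The procedure of inserting zero ("null") samples and then lowpass filtering can be sketched in a few lines of Python (an illustration only, not SSRC's or any sound card's actual filter; the 8 kHz input rate, the factor of 4, the 129-tap Blackman-windowed sinc and the names `lowpass_taps` and `upsample` are all assumptions, chosen so the pure-Python loop stays quick).

```python
import math

FS_IN = 8000     # original sample rate in Hz (assumed)
FACTOR = 4       # integer upsampling factor, giving 32 kHz
TAPS = 129       # odd FIR length (assumed)

def lowpass_taps(num_taps, cutoff):
    """Blackman-windowed sinc lowpass; `cutoff` is a fraction of the output sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        m = i - mid
        ideal = 2 * cutoff if m == 0 else math.sin(2 * math.pi * cutoff * m) / (math.pi * m)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * i / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * i / (num_taps - 1)))
        taps.append(ideal * window)
    return taps

def upsample(samples, factor, taps):
    # Step 1: insert (factor - 1) zero samples after every input sample.
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    # Step 2: lowpass at the original Nyquist frequency; gain `factor` restores the level.
    delay = (len(taps) - 1) // 2
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = n + delay - k
            if 0 <= j < len(stuffed):
                acc += h * stuffed[j]
        out.append(factor * acc)
    return out

tone_hz = 1000
coarse = [math.sin(2 * math.pi * tone_hz * n / FS_IN) for n in range(200)]
fine = upsample(coarse, FACTOR, lowpass_taps(TAPS, cutoff=0.5 / FACTOR))

# Compare with the same tone sampled directly at the higher rate,
# skipping the filter's start-up and tail regions.
fs_out = FS_IN * FACTOR
error = max(abs(fine[n] - math.sin(2 * math.pi * tone_hz * n / fs_out))
            for n in range(TAPS, len(fine) - TAPS))
print(f"max error away from the edges: {error:.5f}")
```

The recalculated in-between samples match a direct higher-rate sampling of the same tone to within the filter's accuracy, which is the sense in which the extra samples carry no new information.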
2Bdecided
QUOTE(2Bdecided @ May 14 2008, 14:54) *
To me, it sounds like you still haven't read the relevant threads in the FAQ.
...the subject of there being very few sample points per cycle of a high frequency waveform is covered well in them.


This thread is like deja vu!

Cheers,
David.
MLXXX
QUOTE(2Bdecided @ May 9 2008, 19:19) *

So that's one positive ABX result traced to faulty/poor equipment. I just need to convince MLXXX to look in the same direction, and we might get a sane conclusion to this discussion. ;)

Just for the record, I find this type of wording mildly offensive, despite the habitual 'Cheers' tag.

QUOTE(MLXXX @ May 19 2008, 21:06) *

[Will upload my sample clip if possible within the next 24 hours.]
There is an unresolved issue over whether the particular audio clip would meet Forum guidelines. It may be that I will be unable to upload the extract as a test clip. In that case, I guess I'll have to try to find another one that at least prima facie sounds different at 48KHz rather than 96KHz. If it can be established the difference is simply due to deficiencies in the playback chain then so be it. However that might still be a significant result if playback equipment generally available cannot play back well at 48KHz, despite theory indicating 48KHz should be sufficient.
cabbagerat
QUOTE(MLXXX @ May 20 2008, 14:51) *

However that might still be a significant result if playback equipment generally available cannot play back well at 48KHz, despite theory indicating 48KHz should be sufficient.
A more likely conclusion is that the equipment works fine at 48kHz, and adds additional distortion when playing material with content above 20kHz. While it would be difficult to rate which one works "better", good recordings of the sounds as played will answer the question of which is more accurate. It wouldn't surprise me if the 48kHz version were more accurate.
greynol
If we're talking sound cards, the problem is with playback at 44.1kHz whether it be through the analog out or the digital out.

Does Creative Labs even make a card that doesn't re-sample when fed a 44.1kHz signal???

Is it me or is this thread becoming increasingly tedious?
MLXXX
This thread started with the downsampling to 44.1KHz question. But if downsampling to even 48KHz is a problem, 44.1KHz would be even more so.

If anyone can upload a clip of 96KHz audio that is apparently impaired when downsampled even to 48KHz, that might momentarily rescue this thread from tedium, for some participants anyway. :)
SebastianG
QUOTE(MLXXX @ May 19 2008, 19:11) *

I do get suspicious when I look at a digital mixdown of 19KHz and 20Khz sinewaves that were created at 44.1KHz. There are so few sample points and yet as you say cooledit manages to create a realistic graphical interpolation (with this relatively simple waveform).

In contrast, when I look at 19KHz and 20KHz sinewaves created at 96KHz and mixed digitally in cooledit, there are so many more sample points in the mixdown that sophisticated interpolation would not be necessary

Sounds like you still think that downmixing makes reconstruction somewhat harder. As 2B already pointed out all of the following operations are linear:
(1) sampling
(2) mixing
(3) reconstruction
It follows that
CODE

reconstruct(sample(x)) + reconstruct(sample(y))
= reconstruct(sample(x) + sample(y))
= reconstruct(sample(x + y))


QUOTE(MLXXX @ May 19 2008, 19:11) *

[at a higher sampling rate] you could simply join the dots

So? Relevance? Seriously. Grab a good DSP book that explains sampling and reconstruction. Everything that's been said here has been said many many times before.

QUOTE

This thread started with the downsampling to 44.1KHz question. But if downsampling to even 48KHz is a problem, 44.1KHz would be even more so.

There's no problem with downsampling to 44.1 kHz or 48 kHz (in theory). Reconstruction is also not a problem (in theory). There are simply soundcards out there that manage to screw up reconstruction. That's about it.
MLXXX
SebastianG, thanks for taking the time to restate this, with precision. However to me it was a side issue. I only mentioned it in response to a post of Martel's [#64]. I may at some stage in my life try to immerse myself in the mathematics that you and others obviously understand so well.

At this point I would like to cut to the chase and ascertain whether there exists a section of a recording of music that people claim is impaired when downsampled to even 48KHz.

If no-one can provide such a clip, and if theory explains why this is so, then I can rest easy when purchasing material that is at 48KHz rather than 96KHz. And I could make my own recordings of musical performances with confidence at only 48KHz.
SebastianG
QUOTE(MLXXX @ May 21 2008, 11:28) *

At this point I would like to cut to the chase and ascertain whether there exists a section of a recording of music that people claim is impaired when downsampled to even 48KHz.

If no-one can provide such a clip, and if theory explains why this is so, then I can rest easy when purchasing material that is at 48KHz rather than 96KHz. And I could make my own recordings of musical performances with confidence at only 48KHz.

By "impaired" you probably meant "perceptually different".

I can't think of any reason why it should be perceptually different given a good reconstruction of both versions (48kHz versus 96kHz) simply because our human ears don't pick up ultrasonics -- at least to the best of our knowledge. This is fairly easy to test with pure tones (sine oscillator). But when people try to verify this with "normal music" instead, possible reconstruction errors (aliasing, or nonlinear distortions which lead to intermodulation) might make ultrasonic frequencies indirectly audible by polluting the audible spectrum. So, when people succeed in ABXing 48kHz versus 96kHz, it's very likely that something's wrong with the whole reconstruction process (from digital to air pressure). In that case, the reconstructed 48kHz version could even be closer to the "original" in terms of perception.

IIRC, this is what 2B has said already.

Cheers,
SG
2Bdecided
QUOTE(MLXXX @ May 20 2008, 23:51) *

QUOTE(2Bdecided @ May 9 2008, 19:19) *

So that's one positive ABX result traced to faulty/poor equipment. I just need to convince MLXXX to look in the same direction, and we might get a sane conclusion to this discussion. ;)

Just for the record, I find this type of wording mildly offensive, despite the habitual 'Cheers' tag.
MLXXX, there have been posts where I have been too harsh with you. I apologise. Please let me try to explain where I am coming from, and where my frustration at this discussion comes from!

A lot of people over the years have arrived at Hydrogenaudio, stated they have almost no knowledge or understanding of the subject, yet they feel sure that they have discovered some problem with some aspect of audio that has been missed by the entire audio industry.

The icing on the cake is that the "problem" is probably due to faulty hardware or software, but rather than investigating this possibility to rule it in or out, they prefer to embark on a discussion which implies "every mathematician and engineer who ever proved how this works was an idiot" or more simply "Nyquist was wrong", while saying they have no interest in acquiring the knowledge and understanding that they lack.


That's the offensive part - the "I know nothing about this, but I'm sure all these engineers and mathematicians were wrong". The people probably don't know enough about it to realise that's what they're implying - but that's exactly what they're implying. That's what's offensive - "engineering, science, maths, theory - pah - load of junk - the output of my Creative soundcard proves it's all wrong!". Actually, stated like that, it's not offensive, just funny.

You can find such threads in the FAQ; I hoped you'd see the parallels between them, and your own.



The thing that worries me the most is the kind of rubbish that plagues boards like Audio Asylum, where any problems that might exist in audio will never be solved because there's no acceptance of the basic science behind it. I really don't want to see that kind of thing at Hydrogenaudio.

Cheers,
David.
pdq
@MLXXX: You seem to want someone to prove to you that there is never a problem when downsampling 96 kHz to 48 kHz/44.1 kHz, but such a thing cannot be proven. It is only possible to prove that something does exist, not that it doesn't exist.

However, consider this. You have come to the HydrogenAudio forum, the place where all of the top experts in the field post regularly. The combined experience of these folks is hundreds if not thousands of years. So far nobody has come forward to say that they know of a case where downsampling resulted in an audible difference where it was not eventually shown to be a hardware problem. Can't you accept this as sufficient evidence that your worries are not justified?
MLXXX
2B, yes I can understand that some of my posts may have irritated you and others for the reason that they may have created the impression that I thought that sophisticated mathematics can readily be disproven with some simple tests in a domestic environment with unsophisticated equipment. That is not where I am coming from.

I do not for a moment question classic Nyquist Shannon concepts. However it is not readily apparent to me how those concepts apply precisely to the human listening experience. It seems to be accepted that the upper frequency continuous sinewave response limit of the human ear (up to about 20KHz) is the relevant bandwidth limit. I have to accept on blind faith that that is all that is relevant and sufficient for the human listening experience.


I've recently spent many hours reading quite a few threads on the sampling rate topic and I would have to say the HA threads are way above the average standard for myself as a reader with an interest in the science (even if I do not fully understand it) as well as broader subjective comments.

Significantly, in no threads have I seen any upload of a file with a higher sample rate such as 96KHz claimed to be reduced in perceptible quality if played back after downsampling to a lower rate such as 48Khz.

That seems extraordinary to me. If 96KHz can sound better as is claimed in so many threads, where is the concrete illustration of the claim???

The absence of such uploads to me significantly weakens the credibility of those who claim superiority of 96KHz per se. [I exclude matters such as the fact that certain DSP operations may be more easily and accurately implemented with some software at 96KHz, e.g. a graphic equalizer function.]

QUOTE(pdq @ May 21 2008, 21:37) *

However, consider this. You have come to the HydrogenAudio forum, the place where all of the top experts in the field post regularly. The combined experience of these folks is hundreds if not thousands of years. So far nobody has come forward to say that they know of a case where downsampling resulted in an audible difference where it was not eventually shown to be a hardware problem. Can't you accept this as sufficient evidence that your worries are not justified?

I have only just read this post of yours pdq, so am adding the following as an edit.

Indeed I think this is the thing. If no-one can identify an instance where 96Khz sampled music sounds superior to 48Khz to the human ear, that really is damning for the 96KHz proponents.
Martel
QUOTE(MLXXX @ May 21 2008, 03:58) *

2B, yes I can understand that some of my posts may have irritated you and others for the reason that they may have created the impression that I thought that sophisticated mathematics can readily be disproven with some simple tests in a domestic environment with unsophisticated equipment. That is not where I am coming from.

I do not for a moment question classic Nyquist Shannon concepts. However it is not readily apparent to me how those concepts apply precisely to the human listening experience. It seems to be accepted that the upper frequency continuous sinewave response limit of the human ear (up to about 20KHz) is the relevant bandwidth limit. I have to accept on blind faith that that is all that is relevant and sufficient for the human listening experience.


I've recently spent many hours reading quite a few threads on the sampling rate topic and I would have to say the HA threads are way above the average standard for myself as a reader with an interest in the science (even if I do not fully understand it) as well as broader subjective comments.

Significantly, in no threads have I seen any upload of a file with a higher sample rate such as 96KHz claimed to be reduced in perceptible quality if played back after downsampling to a lower rate such as 48Khz.

That seems extraordinary to me. If 96KHz can sound better as is claimed in so many threads, where is the concrete illustration of the claim???

The absence of such uploads to me significantly weakens the credibility of those who claim superiority of 96KHz per se. [I exclude matters such as the fact that certain DSP operations may be more easily and accurately implemented with some software at 96KHz, e.g. a graphic equalizer function.]

QUOTE(pdq @ May 21 2008, 21:37) *

However, consider this. You have come to the HydrogenAudio forum, the place where all of the top experts in the field post regularly. The combined experience of these folks is hundreds if not thousands of years. So far nobody has come forward to say that they know of a case where downsampling resulted in an audible difference where it was not eventually shown to be a hardware problem. Can't you accept this as sufficient evidence that your worries are not justified?

I have only just read this post of yours pdq, so am adding the following as an edit.

Indeed I think this is the thing. If no-one can identify an instance where 96Khz sampled music sounds superior to 48Khz to the human ear, that really is damning for the 96KHz proponents.

Please, let this end already... :)
Someone might claim superiority of 96 kHz over 44.1 kHz but in reality, this is mostly NOT based upon capabilities of the format itself, only upon lame implementation of the playback chain, unfounded rumors or a general feeling that GREATER = BETTER. There is absolutely no guarantee that 96 kHz equipment playing 96 kHz material will sound better than 44.1 kHz equipment. There is, perhaps, just a higher probability that equipment playing 44.1 kHz content will screw something up because of poor/cheap playback chain design. So by going 96 kHz you are more likely to avoid (audible) issues caused by poor filter design.
If you do not trust your hardware, please go ahead and convert everything to 96kHz using SSRC, so you may find peace having "enough" samples per signal period. :D
Kees de Visser
QUOTE(MLXXX @ May 21 2008, 12:58) *
If no-one can identify an instance where 96Khz sampled music sounds superior to 48Khz to the human ear, that really is damning for the 96KHz proponents.
I applaud your persistence, especially in this forum full of sceptics. Let's continue the search for a killer sample where the difference is obvious. I would be happy to record and host some (free) samples up to 24/192 kHz. Any suggestions ?
QUOTE(pdq @ May 21 2008, 21:37) *
However, consider this. You have come to the HydrogenAudio forum, the place where all of the top experts in the field post regularly. The combined experience of these folks is hundreds if not thousands of years. So far nobody has come forward to say that they know of a case where downsampling resulted in an audible difference where it was not eventually shown to be a hardware problem. Can't you accept this as sufficient evidence that your worries are not justified?
Mind you, not "all of the top experts" are HA members. I find this kind of reasoning rather deceiving and even intimidating. Can't we just encourage curious people like MLXXX to perform tests and discuss the best ways to do so ? Thousands of audio professionals are moving to hi-res audio. They could all be wrong and wasting money and bandwidth. It can also be a motivation to search for (not necessarily perceptual) reasons why they prefer hi-res audio.
pdq
QUOTE(MLXXX @ May 21 2008, 07:58) *

It seems to be accepted that the upper frequency continuous sinewave response limit of the human ear (up to about 20KHz) is the relevant bandwidth limit. I have to accept on blind faith that that is all that is relevant and sufficient for the human listening experience.

Just one more correction and then I think we can lay this topic to rest.

The relevant bandwidth limit is not the ability to hear continuous sinewaves, unless one is in the habit of listening to high frequency sinewaves. The ability to hear high frequencies in real music, even very synthetic music, is significantly lower. Being able to hear the difference after music has been lowpassed at about 16 to 17 kHz is actually quite rare, although I think there have been some verified cases. Recently someone with admittedly very unusual hearing claimed to hear much higher, but I don't recall that this was ever verified.
MLXXX
QUOTE(Kees de Visser @ May 21 2008, 22:53) *

Let's continue the search for a killer sample where the difference is obvious. I would be happy to record and host some (free) samples up to 24/192 kHz. Any suggestions ?
Thx for your kind remarks.

I suspect that even with a killer sample, the effect might not be all that obvious.

The only suggestion I have and it is one that would only apply where a large number of string players were available (and perhaps playing at a very high standard!) is a recording made with extended range microphone(s) of the violin section of an orchestra.*

As an easier alternative, perhaps people who have in their possession some high definition recordings [recent era; 96Khz+] might be inclined to downsample one or two tracks to 48Khz [44.1Khz could be problematic for other reasons as has been mentioned in this thread] and compare the listening experience to the original sample rate.

There was one website I encountered (Mytek Digital) which had samples of different analogue to digital converters operating at the same sampling rate (192KHz) that had apparently processed the same performance of music. The website invited visitors to compare the digital versions. I could hear differences between the ADCs (which rather surprised me), but I could not hear any differences from converting the sampling from the various ADCs down to 48Khz using Audition 3. The music genre was jazz.

___________

* In the recording I referred to in post #63, the harmony between string sections was sweeter and more fluid on my AVR at 96KHz than at 48KHz. The effect was very subtle and very possibly due to hardware issues, but of a handful of high definition recordings I have evaluated it is the only one where I found I could hear a difference. Subjectively it was similar to the difference between 24 bits and 24 bits truncated to 16 bits, i.e. a very subtle difference to do with the smoothness of the sound.
2Bdecided
QUOTE(Kees de Visser @ May 21 2008, 13:53) *
Mind you, not "all of the top experts" are HA members.
Clearly not, but I think you'd be amazed at some of the people who are (anonymously). You can catch more of them on various mailing lists, should you want to.

QUOTE
I find this kind of reasoning rather deceiving and even intimidating. Can't we just encourage curious people like MLXXX to perform tests and discuss the best ways to do so ? Thousands of audio professionals are moving to hi-res audio. They could all be wrong and wasting money and bandwidth. It can also be a motivation to search for (not necessarily perceptual) reasons why they prefer hi-res audio.
I don't discourage the investigation - investigation is good. There are, however, clear caveats we can't ignore - otherwise it's a pretty meaningless investigation.

As has been said, we've done that part to death now, so I shall shut up on that.

I think the other samples on PCABX are a good place to start.
http://64.41.69.21/technical/sample_rates/index.htm
Pity there isn't a closely mic'd trumpet.

Cheers,
David.
krabapple
QUOTE(MLXXX @ May 21 2008, 05:28) *

SebastianG, thanks for taking the time to restate this, with precision. However to me it was a side issue. I only mentioned it in response to a post of Martel's [#64]. I may at some stage in my life try to immerse myself in the mathematics that you and others obviously understand so well.

At this point I would like to cut to the chase and ascertain whether there exists a section of a recording of music that people claim is impaired when downsampled to even 48KHz.

If no-one can provide such a clip, and if theory explains why this is so, then I can rest easy when purchasing material that is at 48KHz rather than 96KHz. And I could make my own recordings of musical performances with confidence at only 48KHz.



For recording, why not just split the diff and record at 88.2/24bit? That's an even multiple SR of 44.1, computationally a snap if you need to downsample to CD rate. And it's well above even the ~60kHz 'safety' rate proposed by Lavry and others for surmounting any real or theoretical problems with suboptimal antialias and anti-image filters. At 88.2 you should have no pangs of anxiety (even though I think it's way overkill).
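
To put numbers on the "computationally a snap" remark, here is a short Python sketch (the helper name `resample_ratio` is made up for illustration). It reduces each rate conversion to its simplest up/down ratio, which is what a rational resampler works from: upsample by the first number, lowpass, then keep every Nth sample where N is the second number.

```python
from fractions import Fraction

TARGET = 44100  # CD sample rate in Hz

def resample_ratio(source_hz, target_hz=TARGET):
    """Return (up, down): the reduced integer ratio target_hz / source_hz."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator

for source in (88200, 96000, 48000):
    up, down = resample_ratio(source)
    print(f"{source} Hz -> {TARGET} Hz: up by {up}, down by {down}")

# 88200 Hz -> 44100 Hz: up by 1, down by 2
# 96000 Hz -> 44100 Hz: up by 147, down by 320
# 48000 Hz -> 44100 Hz: up by 147, down by 160
```

From 88.2 kHz the conversion is a lowpass followed by dropping every other sample, whereas 96 kHz and 48 kHz both need the awkward factor of 147.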


QUOTE
I do not for a moment question classic Nyquist Shannon concepts. However it is not readily apparent to me how those concepts apply precisely to the human listening experience. It seems to be accepted that the upper frequency continuous sinewave response limit of the human ear (up to about 20KHz) is the relevant bandwidth limit. I have to accept on blind faith that that is all that is relevant and sufficient for the human listening experience.



First, my impression is that understanding the maths behind DSP is really the only way to *truly* understand what's going on (which I do not claim I do). As you see some aspects of DSP really are counterintuitive on their face....like the 'few samples at high frequencies' thing.

Second, every attempt so far to argue for the physiological need for higher sample rates in order to produce realistic audio founders at the blind test stage. Thus proponents have to resort to arguments like: it's a hypersonic effect that is only detectable by brain imaging! (though the 'effect' curiously seems to last much longer than the stimulus, and requires custom-made playback gear) or, some musical instruments have lots of energy above 20kHz! (and some visible light sources have lots of energy in the UV or infrared ranges...so?) or, what about bone conduction?! (what about it? it's a vibration effect that requires the source to be very close to the body). The only argument with any solid foundation is: 44.1 puts the onus on engineers to make their brickwall filters very good indeed, or to use oversampling, because at 44.1 the cutoff frequency (22.05 kHz) is so close to the audible limit. So shoddy implementation at recording or playback could lead to audible artifacts.

You don't have to accept on 'blind faith' that the ear's passband for sounds transmitted through air extends 'only' up to the mid-20's at very best, these numbers weren't pulled from thin air, there is a scientific literature on psychoacoustics and the physiology of audition dating back a century.

QUOTE
The absence of such uploads to me significantly weakens the credibility of those who claim superiority of 96KHz per se


Well, no kidding! I don't know where you get the impression that '96 kHz per se is superior' is the consensus on HA.org. I'd say it's quite the opposite. Of course, once we travel beyond the confines of 'the village' here, and out into the woods of other 'audiophile' forums, then we start to see claims that have more foundation in belief than evidence.
krabapple
QUOTE(MLXXX @ May 21 2008, 09:52) *

QUOTE(Kees de Visser @ May 21 2008, 22:53) *

Let's continue the search for a killer sample where the difference is obvious. I would be happy to record and host some (free) samples up to 24/192 kHz. Any suggestions ?
Thx for your kind remarks.


As an easier alternative, perhaps people who have in their possession some high definition recordings [recent era; 96Khz+] might be inclined to downsample one or two tracks to 48Khz [44.1Khz could be problematic for other reasons as has been mentioned in this thread] and compare the listening experience to the original sample rate.



Here is a site that claims to offer the same sample recorded in 96/24 and 44/16.


http://www.soundkeeperrecordings.com/format.htm


(FWIW, the engineer, Barry Diament, spouts a considerable amount of audio woo on Hoffman's board, so take that under advisement)
MLXXX
QUOTE(krabapple @ May 22 2008, 01:40) *

You don't have to accept on 'blind faith' that the ear's passband for sounds transmitted through air extends 'only' up to the mid-20's at very best, these numbers weren't pulled from thin air, there is a scientific literature on psychoacoustics and the physiology of audition dating back a century.

Thx for your various comments krabapple. On this particular aspect, the point I was trying to make is that although it can be said based on decades of testing that the human ear has a bandwidth of around 20KHz when tested with continuous tones, I am obliged to accept on blind faith that that is all that is required as the bandwidth of a digital reconstruction process.

To many people, the two bandwidths are equivalent and no further analysis is necessary.

I am hesitant as real life audio sources can start and stop abruptly and asynchronously. We have a perception of the direction of a sound source as well as its pitch and tonal quality. All of this strikes me as very complex. It is not clear to me (but has to be accepted with blind faith) that because our ears can only hear a continuous tone up to about 20KHz, a bandwidth of 20KHz in the electronics is sufficient for recording and reproducing music.
sld
Don't you think that it is futile to try to force a point home without backing it up with the necessary objective testing data as well as some semblance of knowledge of the mathematics behind longitudinal waveforms and the human ear's response to them?
krabapple
QUOTE(MLXXX @ May 21 2008, 12:23) *

QUOTE(krabapple @ May 22 2008, 01:40) *

You don't have to accept on 'blind faith' that the ear's passband for sounds transmitted through air extends 'only' up to the mid-20's at very best, these numbers weren't pulled from thin air, there is a scientific literature on psychoacoustics and the physiology of audition dating back a century.

Thx for your various comments krabapple. On this particular aspect, the point I was trying to make is that although it can be said based on decades of testing that the human ear has a bandwidth of around 20KHz when tested with continuous tones, I am obliged to accept on blind faith that that is all that is required as the bandwidth of a digital reconstruction process.

To many people, the two bandwidths are equivalent and no further analysis is necessary.


Well, if anything, 'test tones' can be *more* useful for discriminating differences than complex samples like music, where psychoacoustic masking effects kick in.


QUOTE
I am hesitant as real life audio sources can start and stop abruptly and asynchronously. We have a perception of the direction of a sound source as well as its pitch and tonal quality. All of this strikes me as very complex. It is not clear to me (but has to be accepted with blind faith) that because our ears can only hear a continuous tone up to about 20KHz, a bandwidth of 20KHz in the electronics is sufficient for recording and reproducing music.


Again, the documented upper limit of hearing is more like 24, not 20, kHz, but this is for children and exceptional adults. Typically adult hearing's increasingly degraded from 16 kHz on up, and even at best in our youth, we are always more sensitive to some ranges than others. That's the way our hearing works, and it makes good evolutionary sense for us to be more sensitive to midrange (speech, vocalization) than to what bats hear. By contrast, the delivered response of CD audio is essentially *flat* to about 20.

The 'it strikes me as complex' is close to an argument from personal incredulity, and the answer is: more reading about how digital audio *works*. The arguments about transients ('abrupt starts and stops'), phase ('asynchronicity') and directionality have been done to death and tend to devolve back to one side refusing to accept the science and the maths 'on faith', though they haven't really grasped the science and the maths in the first place.






2Bdecided
QUOTE(MLXXX @ May 21 2008, 17:23) *
Thx for your various comments krabapple. On this particular aspect, the point I was trying to make is that although it can be said based on decades of testing that the human ear has a bandwidth of around 20KHz when tested with continuous tones, I am obliged to accept on blind faith that that is all that is required as the bandwidth of a digital reconstruction process.

To many people, the two bandwidths are equivalent and no further analysis is necessary.

I am hesitant as real life audio sources can start and stop abruptly and asynchronously. We have a perception of the direction of a sound source as well as its pitch and tonal quality. All of this strikes me as very complex. It is not clear to me (but has to be accepted with blind faith) that because our ears can only hear a continuous tone up to about 20KHz, a bandwidth of 20KHz in the electronics is sufficient for recording and reproducing music.
You're doing it again - you're assuming that, in the entire history of psychoacoustics, no one tried low pass filtering impulses to check the limit that way; no one tried manipulating interaural level, time, and frequency to determine the effects; and no one tried recording audio at high bandwidth, and checked for the audibility of various low pass filters.

krabapple put it succinctly...
QUOTE
The 'it strikes me as complex' is close to an argument from personal incredulity, and the answer is: more reading about how digital audio *works*.

...though you're moving into psychoacoustics now. A very fascinating field. Try these:

http://www.amazon.co.uk/Introduction-Psych...e/dp/0125056281
(the £5 link is a bargain!)
http://www.amazon.co.uk/Psychoacoustics-Mo...595/ref=ed_oe_h
http://www.amazon.co.uk/Hearing-Handbook-P...2980&sr=1-1

These are probably superseded by more modern publications - it's 10 years since I read them.

Finally, this is far away from what you're looking for, but it's so great at explaining the in-band limits of human hearing wrt audio coding that I had to include it...

http://www.ece.rochester.edu/~gsharma/SPS_...AudioCoding.pdf

Cheers,
David.

P.S. EDIT: Sampling Theory
http://groups.google.com/group/comp.dsp/ms...hl=en&fwc=1
Forget the maths if you want - the conclusions themselves are interesting. You'll find some of them quoted in this thread. Doubting them is about as useful as doubting that 2+2=4.
MLXXX
QUOTE(cabbagerat @ May 21 2008, 17:00) *

QUOTE(MLXXX @ May 20 2008, 14:51) *

However that might still be a significant result if playback equipment generally available cannot play back well at 48KHz, despite theory indicating 48KHz should be sufficient.
A more likely conclusion is that the equipment works fine at 48kHz, and adds additional distortion when playing material with content above 20kHz. While it would be difficult to rate which one works "better", good recordings of the sounds as played will answer the question of which is more accurate. It wouldn't surprise me if the 48kHz version were more accurate.

Thanks Cabbagerat. It seems that even if a particular file did seem to sound better when played at one sample rate than another, using a particular playback chain, there could be any number of possible reasons for that outcome.

QUOTE(2Bdecided @ May 22 2008, 20:49) *

You're doing it again - you're assuming that, in the entire history of psychoacoustics, no one tried low pass filtering impulses to check the limit that way; no one tried manipulating interaural level, time, and frequency to determine the effects; and no one tried recording audio at high bandwidth, and checked for the audibility of various low pass filters.

2B, I have for years assumed such tests would have been done.

_____________________


Perhaps this thread has reached a natural end, unless there are any actual audio clips at around 96KHz or more that have been identified (and can be linked to, or uploaded ) that appear to sound better at the higher sampling rate than when downsampled to 48Khz. Though if anyone has the courage to identify such a clip, they should be ready for their claim to be challenged!

Thanks again for the various helpful comments,
MLXXX
Martel
QUOTE(MLXXX @ May 22 2008, 06:24) *

Perhaps this thread has reached a natural end, unless there are any actual audio clips at around 96KHz or more that have been identified (and can be linked to, or uploaded ) that appear to sound better at the higher sampling rate than when downsampled to 48Khz. Though if anyone has the courage to identify such a clip, they should be ready for their claim to be challenged!

Man, this is not about courage, this would be about breaking the human body limits!
And, please, rule out the sampling rate from your considerations, a 96kHz waveform lowpassed at 24 kHz bears exactly the same information as the same waveform downsampled to 48 kHz (if downsampled ideally).
If I were you, I would first "investigate" the possibility of identifying a 24 kHz lowpass filter. Try and start a new thread.
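The "same information" claim can be checked on a toy signal. This is only a sketch under idealised assumptions: the signal is periodic, already contains nothing at or above 24 kHz, and the rebuild uses a naive DFT with zero-padding (trigonometric interpolation), not a real-world resampler. The tone frequencies, amplitudes and phases are arbitrary choices.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

FS = 96_000
N = 64  # one period at 96 kHz; DFT bin spacing is 96000 / 64 = 1500 Hz

# Arbitrary tones (frequency, amplitude, phase), all below 24 kHz.
tones = [(3000, 1.0, 0.3), (12000, 0.5, 1.1), (21000, 0.25, 2.0)]
x96 = [sum(a * math.sin(2 * math.pi * f * t / FS + p) for f, a, p in tones)
       for t in range(N)]

# "Ideal" downsampling to 48 kHz: the signal is already band-limited,
# so keeping every second sample causes no aliasing.
x48 = x96[::2]

# Rebuild the 96 kHz samples from the 48 kHz ones by zero-padding the spectrum.
M = len(x48)
X48 = dft(x48)
X96 = [0j] * N
for k in range(M // 2):           # DC and positive frequencies
    X96[k] = 2 * X48[k]
for k in range(M // 2 + 1, M):    # negative frequencies
    X96[N - M + k] = 2 * X48[k]
rebuilt = [v.real for v in idft(X96)]

max_err = max(abs(a - b) for a, b in zip(x96, rebuilt))
print(f"largest difference between original and rebuilt samples: {max_err:.2e}")
```

The samples thrown away by the decimation come back to within floating-point rounding, which is the sense in which nothing was lost.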
MLXXX
QUOTE(Martel @ May 23 2008, 17:08) *

If I were you, I would first "investigate" the possibility of identifying a 24 kHz lowpass filter. Try and start a new thread.
Martel, I've already spent a lot of time in this thread and what I am about to write is relevant to what has gone before.

1st test:
File 1: Audition 3 used to generate a tone of 8333Hz at -20dB @ 192KHz sample rate (single channel).
File 2: Audition 3 used to generate a third harmonic of 8333Hz at -20dB (i.e. a tone at 24999Hz) @ 192KHz.
File 3 created (single channel): file 1 + file 2, @ 192KHz

When I attempted to ABX file 1 against file 3, a problem arose: the tweeter, in trying to handle the 24999Hz tone, was not able to reproduce the 8333Hz tone at full amplitude.

[With a microphone at 1m from the tweeter and using an oscilloscope connected to the output of the analogue mixer, the peak to peak voltage was slightly less when playing file 3 compared with file 1. The waveform shape was different as well.]

After temporarily reducing the amplitude of file 1 by a small amount, the files still sounded different when an ABX was attempted.

However I was concerned that the tweeter might be creating spurious effects, so I changed the experimental setup.

2nd test:
Stereo file A created with file 1 (8333Hz) as the left channel and file 2 (24999Hz) as the right channel.
Stereo file B created with file 1 (8333Hz) as the left channel and zero signal for the right channel.

Playback volume of the left speaker was tested with the microphone 1 metre in front of the tweeter and feeding the oscilloscope. Amplitude of the waveform from the left speaker remained constant whether or not the right channel was playing, i.e. whether file A or file B was played.

At a reasonable listening distance, A and B sounded different (file A seemed louder and a little richer).

I was concerned that the separation of the speakers was so great it was creating a sound field full of peaks and troughs. The wavelength of 8333Hz is only a little over 4 centimetres.
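The wavelength figure is simple arithmetic. A minimal sketch, assuming the roughly 346 m/s speed of sound at 25 degrees Celsius quoted earlier in the thread (the constant and function names are just for illustration):

```python
SPEED_OF_SOUND_M_S = 346.0  # assumed: approximate value at 25 degrees Celsius

def wavelength_cm(freq_hz, c=SPEED_OF_SOUND_M_S):
    """Wavelength in centimetres for a tone of freq_hz in air."""
    return 100.0 * c / freq_hz

for f in (8333, 24999):
    print(f"{f} Hz: {wavelength_cm(f):.2f} cm")
```

That gives about 4.15 cm for the fundamental and about 1.38 cm for the third harmonic, so speakers placed any normal distance apart are many wavelengths apart at both frequencies.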

3rd test:
In the interest of science, I moved the front left and front right home theatre speaker enclosures so their sides were touching, and played files A and B in an endless loop.

Even at a distance of 8 metres on axis from the speakers there were very noticeable nodes in the sound field. As in test 2, file A seemed louder and a little richer. This was clearcut. (However it was important not to move as the loop played.)

As a type of control, I created a file 3L, which had the contents of file 3 in the right channel, and nothing in the left channel. When this was played, the sound field was full of nodes. This was to be expected. Our living room is not an anechoic chamber.

Also, with the speakers adjacent, I positioned the microphone about 1.5 metres away and observed the oscilloscope. The waveform was not perfect but it was very different to a sine wave when one speaker was reproducing the 8333Hz tone and the other the 3rd harmonic.

Conclusions:

Although the third harmonic of a tone at 8333Hz cannot be heard when played by itself (i.e. as a tone of 24999Hz) by adult human beings, it can have an impact on the human listening experience, when the fundamental frequency is also being reproduced by a loudspeaker system in a home environment.

If the 24999Hz tone is absent, the listening experience can be different. Subjectively (for me) it is slightly less rich. Also I found that when the harmonic was present, I perceived the pitch as sounding slightly sharper if my ears were fresh, but flatter if I had been ABXing for a while. [This certainly didn't assist the ABX process!]

The effect was subtle.

Some audio cannot be downsampled to 44.1KHz, or even 48KHz, without affecting the perceived sound.

Equipment used:

Software: Audition 3, Cooledit 2, foobar
AVR driven from PC with coaxial SPDIF at 192Khz.
Medium price hi-fi speakers [Magnat "Vintage 350", rated 20Hz - 35KHz]
Rode NT1-A microphone
Behringer analogue mixer
Dated oscilloscope

*******************

As it is late, I will not attempt to upload any of the test files. They are quite easy to generate using cooledit or audition, anyway. [Edit: Stereo test files are now at post #105.]
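For anyone without Cool Edit or Audition, something along these lines should produce equivalent stereo files with only the Python standard library. It is a sketch, not the exact procedure used above: 16-bit PCM without dither is an assumption (the bit depth of the original files is not stated), and the output file names in the comments are hypothetical.

```python
import math
import struct
import wave

RATE = 192_000                # sample rate used for the test files
AMPLITUDE = 10 ** (-20 / 20)  # -20 dB relative to full scale, i.e. 0.1
FULL_SCALE = 32767            # assumption: 16-bit PCM, no dither

def stereo_frames(left_hz, right_hz, seconds):
    """Interleaved 16-bit stereo frames; pass None for a silent channel."""
    def sample(freq_hz, n):
        if freq_hz is None:
            return 0
        return round(FULL_SCALE * AMPLITUDE
                     * math.sin(2 * math.pi * freq_hz * n / RATE))
    out = bytearray()
    for n in range(round(seconds * RATE)):
        out += struct.pack("<hh", sample(left_hz, n), sample(right_hz, n))
    return bytes(out)

def write_wav(path, frames):
    """Write interleaved 16-bit stereo frames to a WAV file at RATE."""
    with wave.open(path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(frames)

# File A: 8333 Hz left, 24999 Hz right. File B: 8333 Hz left, silence right.
file_a = stereo_frames(8333, 24999, 0.1)
file_b = stereo_frames(8333, None, 0.1)
# write_wav("file_a_192k.wav", file_a)  # hypothetical file names
# write_wav("file_b_192k.wav", file_b)
```

Increase the duration argument for something long enough to loop; 0.1 s is used here only to keep the example quick.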

I imagine that these results are no surprise to many readers, but will surprise some others.

How this type of experiment relates to the proposition that an audio bandwidth of around 20KHz is sufficient for the human listening experience I will leave to others to comment on, if they so wish.

I note that by sending the third harmonic through a separate amplifier, I avoided the issue of intermodulation distortion in the amplifier and the speakers [though not any possible IMD in my own hearing]. I listened at what I'd term a 'moderate' level, certainly not a loud level for listening to music. The 24999Hz waveform when displayed on the oscilloscope looked quite smooth (a sinusoid) when only it was being played. Similarly when only the 8333Hz waveform was played, there was a smooth sinusoid. However when the combined waveform was played through one speaker [or when played with two adjacent speakers each taking a separate frequency], the shape of the waveform altered on the oscilloscope, and the quality of the sound changed slightly for my ears.
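On the IMD point, it may help to see where intermodulation products of these two tones would land if any stage in the chain were nonlinear. The sketch below only enumerates the frequencies |m*f1 + n*f2| for second- and third-order terms; it makes no claim about whether, or how strongly, any of them were present in this setup. The function name is made up for the example.

```python
from itertools import product

F1, F2 = 8333, 24999  # the two test tones in Hz

def intermod_products(f1, f2, max_order=3):
    """Sorted frequencies |m*f1 + n*f2| with m, n nonzero and |m|+|n| <= max_order."""
    found = set()
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        if m == 0 or n == 0 or abs(m) + abs(n) > max_order:
            continue
        found.add(abs(m * f1 + n * f2))
    return sorted(found)

products_hz = intermod_products(F1, F2)
audible_hz = [f for f in products_hz if f < 20_000]
print("all products:", products_hz)
print("below 20 kHz:", audible_hz)
```

Because the second tone is an exact third harmonic, two of the products fall below 20 kHz: one at 16666 Hz and one exactly on the 8333 Hz fundamental.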


ABX results:

foo_abx 1.3.1 report
foobar2000 v0.9.5.1
2008/05/26 00:20:21

File A: \\star8\shareddocs\sineplus3rdharmonic\8333inleft&8333_3rdharmonicinright@192.wav
File B: \\star8\shareddocs\sineplus3rdharmonic\8333inleftnothinginright@192.wav

00:20:21 : Test started.
00:20:40 : 01/01 50.0%
00:20:52 : 02/02 25.0%
00:23:49 : 03/03 12.5%
00:43:17 : 04/04 6.3%
00:43:57 : 05/05 3.1%
00:44:12 : Test finished.

----------
Total: 5/5 (3.1%)
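The percentages in the log are the probability of doing at least that well by pure guessing. A minimal sketch of that calculation, assuming each trial is an independent 50/50 guess (the function name is illustrative, not part of foo_abx):

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of getting `correct` or more right out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for n in range(1, 6):
    print(f"{n}/{n}: {100 * abx_p_value(n, n):.3f}%")
```

For 5/5 this is 1 in 32, i.e. 3.125%, which the log shows rounded to 3.1%. With only five trials a single wrong answer would have pushed the figure to 6/32, nearly 19%.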