I've been trying tigre's 24/96 test proposed in this thread, and also discussed at Afterdawn. High definition stuff is also discussed here, here, and samples are here, but yeah, we've got a listening test thread now, so might as well use it...
My equipment is an M-Audio Revolution 7.1 feeding straight to Sennheiser HD 200 headphones. I downloaded Lovely_1.wv and used foobar2000 to do resampling, replaygaining, and ABXing. At first I was using waveOut, but then I retested them all using Kernel Streaming.
Anyway, I can ABX (with less than 1% chance of guessing):
[24/96] vs. [24/96->16/44.1->24/96] (slow resampling, dither)
[24/96] vs. [24/96->24/44.1->24/96] (slow resampling)
[24/96] vs. [24/96->16/96->24/96] (dither)
My results varied a bit, but all were significant. The first test I did, I was not expecting to hear any difference, so I was very careful, and got 12/12. Since then I've had 12/12s, 11/12s, a 10/10, and an 8/8 (got interrupted but still a valid result, and it was only a retest...)
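For reference, the chance of reaching a given ABX score by pure guessing is a one-sided binomial tail. Here is a minimal Python sketch, assuming every trial is an independent 50/50 guess; the function name `abx_p_value` is made up for illustration and is not part of foobar2000's ABX tool.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the binomial tail: C(n, k) / 2^n for k = correct..trials.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# Scores reported above.
for correct, trials in [(12, 12), (11, 12), (10, 10), (8, 8)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4%}")
```

Under that assumption all four scores come out below 1%. The figure only means what it says if the number of trials was fixed beforehand and no sessions were discarded.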
The most consistently hearable difference for me is when I listen between 5.2 and 7.2 seconds. Some sort of drum gets hit at about 5.7s. The high definition one is somehow more convincing. Today I was thinking of the good one as a push and the bad one as a pull, but yeah that's not a very helpful description..
I'm also hearing other differences, but it's hard to know whether I'm being tipped off by something while focussing on something else, or even what the actual difference is in objective terms.
So, what could be wrong? What else would be worth testing? I was thinking of noise shaping the output maybe...
I'm not that keen to do a huge amount of retesting with every possible combination, but if someone thinks of something important I'll be sure to check it.
Thanks for the effort, listen. I hope this will encourage others to perform the test too - and some knowledgeable people around here to share their ideas about 'what could be wrong'.
I have tried to reproduce your findings, but never got better than 4/4, then results got messed up. I need to take breaks after a few trials but haven't had enough patience yet.
I just tried another ABX test.
This time it was 96kHz with a lowpass around 21-22kHz, against 44.1kHz.
I tried this for two reasons.
- To ensure that really high frequency content was not messing stuff up and causing an audible difference lower down in the spectrum. For example my headphones probably don't respond well to things much above the audible spectrum. Low passing eliminates the possibility that the high-def one actually sounds worse because of my equipment.
- Because I think the difference I'm hearing is something other than really high frequency sound. In fact, at the volume I'm listening at, I wouldn't be surprised at all if I couldn't even hear to 18kHz. Come to think of it, I don't even know if I can hear that high anyway.
Anyway, I got 11/12... good enough... and not hugely difficult (I must be getting used to it). I heard exactly the same difference as I did without the low-pass. Well, that's what I think at least. The bit of percussion that I mentioned above seems more defined and convincing, and it just sits better. It sounds slightly more like someone playing it, rather than just a recording of the sound it can make.
I shouldn't forget though, that filters aren't perfect.
I'm guessing nobody knows a foolproof way of low-passing that I could use (not resampling to 44.1)
edit: what a demented winkie..
Interesting. Maybe we're getting closer to tracking it down. What kind of lowpass did you use for your last test?
Can you please run the sample I attach (one second of silence with a single click) through this lowpass and post it here (if you don't have the possibility to upload, use the upload forum or tell me, and I'll PM you my email addy).
Hi tigre,
I'm not sure why I didn't check this before, but I just discovered that none of the low-pass methods I used are transparent. Even though I think they still sound better than 44.1, to prove that it's better (not just different) I guess I would need to use a low-passed file that I can't tell apart from the original 96.
I don't know a huge amount about these filters, so please suggest a better way if you know. I'm using Audition (same as CoolEdit), and so far I've used the 'Butterworth' filter to make three files. One has a very steep rolloff starting at 21K, and practically disappearing just over 22K. Then I tried with a slow rolloff starting at 19K, with a fair bit of sound remaining past 22K. And I also tried one somewhere in between. None of them was transparent.
I don't want to start any lower than 19K, and I don't want to let in very much above 22K, because that would defeat the purpose of the test. What can I do?
Such rate of success in ABX makes the results a little bit suspicious, in my opinion. Suspicious mostly about something going wrong during the file generation process or the ABX procedure, possibly the RG process. But I can't say for sure, maybe it's all ok.
One of the things that looks strange is that you can easily ABX 16/96 vs. 24/96, that is, a change just in bitdepth. This is the first time I've seen something similar, here or at any other forum I know, and this can't be reasonably explained by any kind of poor transducer (headphone) or amp performance, mostly intermodulation. So I think more tests should be carried out to find out what's really going on.
I still haven't looked at the mentioned test files or verified the actual high frequency content and noise floor at the parts that are ABXable. When I find some time and feel like working on this, I will look into what I just mentioned, and generate some controlled test files over these parts, changing just bitdepth and lowpass, to see if you can ABX them.
About the Audition Butterworth lowpass filter: it distorts phase at frequencies somewhat below the cutoff point. Better to use a Chebyshev type 2 filter, or even better the FFT filter with as many as 1024 points and Blackman windowing.
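To illustrate why a windowed FIR design avoids the phase problem, here is a rough Python sketch of a Blackman-windowed sinc low-pass. This is not Audition's FFT filter, only a stand-in for the same general idea: the taps are symmetric, so the filter has linear phase (a constant delay at all frequencies). The 21 kHz cutoff, the 1023-tap length and both function names are assumptions chosen for illustration.

```python
import cmath
import math

def blackman_sinc_lowpass(cutoff_hz, sample_rate_hz, num_taps=1023):
    """Linear-phase FIR low-pass: ideal sinc response shaped by a Blackman window."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps for a symmetric filter")
    fc = cutoff_hz / sample_rate_hz          # cutoff as a fraction of the sample rate
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    gain = sum(taps)                         # normalize to unity gain at DC
    return [t / gain for t in taps]

def gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude response of the filter at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    h = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(abs(h), 1e-12))

taps = blackman_sinc_lowpass(21000, 96000)
for f in (1000, 19000, 21000, 23000, 30000):
    print(f"{f:>6} Hz: {gain_db(taps, f, 96000):8.2f} dB")
```

With these settings the response should stay essentially flat at 19 kHz and be strongly attenuated by 23 kHz. A longer filter narrows the transition band at the cost of more pre- and post-ringing in time.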
Also, if you feel like it, try disabling RG on foobar ABX tool, and verify you are not using any DSPs when generating test files. Try also using flat dither instead of noiseshaping dither, and fast resampling when downsampling.
In any case, it's good to know of your experience and results.
(I wish I had already fixed WinABX for 24-bit playback in W2K and XP. I'm on the way, but not done it yet, sorry)
Ok, finally I got to generate the test files.
Download this 1.8M zip:
http://www.kikeg.arrakis.es/various/lovely_test.zip
It includes a flac file, flac decoder and SSRC. Extract to a folder, and execute (click) the 'generate.bat' file. 4 files will be generated:
- A: lovely_short.wav: original
- B1: lovely_16bit_dshaped.wav: dithered to 16 bit using noiseshaping dither, then back to 24 bit.
- B2: lovely_lowpass.wav: resampled to 44.1 KHz, then back to 96 KHz, all at 24 bit.
- B3: lovely_16bit_dflat.wav: dithered to 16 bit using flat dither, then back to 24 bit.
Edit: the flac file goes from approx. 4.2 sec to 8.2 sec. of original lovely_1.wv file.
Now, try ABXing the original from any of the other 3. Don't use RG since it's not needed at all. Please post the total number of trials and correct identifications.
Edit: of the two 16-bit converted files, try anyone you wish. I'd try the other varying dither alternative only if I could ABX the first dither option tried.
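For readers unfamiliar with what "dithered to 16 bit, then back to 24 bit" involves, here is a rough Python sketch. It is not SSRC's code: it assumes plain triangular (TPDF) dither with a flat spectrum and no noise shaping, and the function name and the use of signed integer samples are illustrative only.

```python
import random

def dither_to_16_and_back(samples_24, seed=0):
    """Requantize signed 24-bit integer samples to 16 bits with flat TPDF
    dither, then scale back to the 24-bit range (low 8 bits become zero)."""
    rng = random.Random(seed)
    step = 256  # one 16-bit LSB expressed in 24-bit units (2**8)
    out = []
    for s in samples_24:
        # TPDF dither: sum of two uniform values, spanning +/-1 LSB of the 16-bit grid.
        d = (rng.random() - 0.5) + (rng.random() - 0.5)
        q = round(s / step + d)              # nearest 16-bit level
        q = max(-32768, min(32767, q))       # clip to the 16-bit range
        out.append(q * step)                 # back to 24-bit scale
    return out

quiet = [5, -130, 300, 1000, -70000]
print(dither_to_16_and_back(quiet))
```

The output file is still nominally 24-bit, but every sample sits on the 16-bit grid, so the only difference from the original is the added dither and quantization noise.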
2Bdecided
Jan 7 2004, 04:26
QUOTE(KikeG @ Jan 7 2004, 09:52 AM)
Don't use RG since it's not needed at all.
That's very good advice! I don't know why foobar2k suggests using ReplayGain when ABXing. It's useful if you're comparing a codec which has intentionally (or unintentionally) scaled the audio, and it will probably prevent clipping - but otherwise it's a bad idea when ABXing! If you're sure that neither of the above can happen, then there is simply no need to ReplayGain in an ABX test, and it only adds another chance for error. You need to avoid all possible errors in a 16-bit vs 24-bit test!
Replay Gain cannot possibly help here if you're doing things correctly, but there's always the chance that it might scale one file fractionally differently from another, which introduces an extra variable that you don't want.
(If you're ABXing codecs which have (or might have) changed the volume of the file, then of course ReplayGain is very useful, but that's a different matter).
Cheers,
David.
Now, when you have tried the test files at my previous post and want to try the true test at issue, download the bat file at
http://www.kikeg.arrakis.es/various/generate2.bat , put it in the same folder as the previous test, and execute it. It will generate 2 additional files to try to ABX from the original:
- B4: lovely_downs_dflat.wav: resampled to 16/44.1 and back to 24/96, using flat dither.
- B5: lovely_downs_dath.wav: resampled to 16/44.1 and back to 24/96, using soft ATH noiseshaping dither.
Try one of the files first. I'd try the other varying dither alternative only if I could ABX the first dither option tried.
Thanks for the input, I'll start working through your files soon...
I didn't mean to give the impression that 16bit vs. 24bit was easy, although there was one time I tried in the middle of the night (very relaxed, and no highway noise, background music etc..) when it was easy for a couple of minutes. Usually it is more difficult, in fact yesterday I couldn't seem to do it at all (with the same files and settings that I could tell apart before). Maybe I was just impatient..
Also, I've discovered that my left ear is a bit blocked, and hears a couple of dB less than my right. Yesterday it 'popped', which is a relief to know it's not deafness, but everything in the headphones sounded quite different. Now it's back to how it was before. It feels like a blockage starting at my nose (if that makes sense)... in fact even when I swallow, I feel it more on the right, or maybe that's just what I'm hearing.
Yeah, replaygain... it seemed like foobar wouldn't let me ABX unless they were replaygained, but now I see there is an option to turn it off. I hope that's not all I was hearing..
Pio2001
Jan 8 2004, 17:31
I tried to ABX KikeG's files : lovely_short vs lovely_downs_dath
I thought I could hear the difference. I was carefully performing the ABX sessions...
After 5 sessions I made a pause and looked at my results : 3/5. Since I decided to go in 8, this is a failure.
WindowsXP, Wave Out, Marian Marc 2 soundcard, Sennheiser HD-600 headphones.
The PC was running in one room, and I was listening in the next room, with the PC picture displayed on a screen by a very silent videoprojector in low lamp mode.
EDIT : no cable extension was used, the Sennheiser cable was just long enough to run under the door from one room to the next, with the mouse, keyboard and video extension cables
Ok, I had a session last night and got some results. I didn't spend all night trying for high-scores, these are more like 'first concentrated attempt once I figured out the difference' scores.
First I tested 16 bit, flat dither:
The spot I had found the best for 16 bit testing was at 9s (in lovely_1), but doesn't matter, I used another guitar chord, at 2.1s in lovely_short.
This was not easy, and I only recorded a 10/12. I might re-do this at some stage to try for a better result, just to make sure.
Then 44.1KHz:
Using the tambourine at 3.1s (in short file), I was able to work through this in just a few minutes. I made a couple of stupid mistakes and didn't want to record a 10/12, so I took it up to 14/16. I also loaded up Audition and discovered that at the volume level I'm working with, 18KHz by itself is completely inaudible to me.
16 bit, noiseshaping dither:
I recorded that this was a dead giveaway. Sorry I didn't write down what part I listened to, but I would guess it was either 2.1s again, or my main spot around 3s. I got it to 8/8 without problems, but suddenly lost it completely, so I took a break.
lovely_downs_dflat:
This was easy, sounded like a lowpass (listening to the tambourine at 3.1s).
8/8, pretty hard to miss.
lovely_downs_dath:
I found this sample very confusing. I'm not in the habit of regularly comparing X and Y to A, I usually just compare X and Y and say that the best one is A. Well, this time, I got nine wrong in a row! It seems that listening to the ath shaped file makes the original sound bad... I kept choosing downs_dath as the good one. Then I realised I should compare to A, and I eventually ended up with a 12/14, listening to the tambourine. The 'lowpass' problem was not as easy to pick up here though, or maybe I was just getting tired by this stage.
Well, this was not meant to be a dither test, but it's still interesting.
For any sceptics, all I can say is that nobody knows who I am, so I don't really have anything to gain. If I was a well known sound engineer it would be different.
Anyway, I'm sure that more people can hear it than they realise. Maybe I will write an ABX guide...
Pio, have you tried kernel streaming? It was suggested to me because of waveout possibly not getting through windows untainted.
I'm quite envious of your setup... even my computer fan is really starting to irritate me now.
Pio2001
Jan 9 2004, 05:22
My soundcard doesn't support kernel streaming. But I checked long ago that the PCabx playback was bit perfect. Anyway it is bit perfect in Winamp with wave out.
Continuum
Jan 9 2004, 07:59
OT:
QUOTE(Pio2001 @ Jan 9 2004, 12:31 AM)
...a very silent videoprojector...
Unbelievable! What is this prodigy of engineering?!
Very interesting...
Listen, could you try another test more, with alternative processing algorithms?
Download this 1.5M file:
http://www.kikeg.arrakis.es/various/lovely_test2.zip
Extract its contents to the same folder as my previous test files, and execute (click) the 'generate3.bat' file. 3 new files will be generated:
- B6: lovely_lowpass2.wav: a different lowpass.
- B7: lovely_dith2.wav: a different bitdepth reduction type.
- B8: lovely_lowpass2_dith2.wav: lowpass and bitdepth reduction simultaneously.
Again, try to ABX them from the 'lovely_short.wav' original of my previous test files.
Now, could you please verify and notify us that RG and DSP (except maybe volume control) in foobar2000 are disabled? Also, do you use something to control output volume? If so, what is it? Foobar volume control, or Revo control panel? If so, what are the settings?
Thanks for the testing.
QUOTE(listen @ Jan 9 2004, 03:13 AM)
lovely_downs_dath:
I found this sample very confusing. I'm not in the habit of regularly comparing X and Y to A, I usually just compare X and Y and say that the best one is A. Well, this time, I got nine wrong in a row! It seems that listening to the ath shaped file makes the original sound bad... I kept choosing downs_dath as the good one. Then I realised I should compare to A, and I eventually ended up with a 12/14, listening to the tambourine
How can you end up with 12/14 if you got the first nine wrong?
Please mention _ALL_ ABX results, not just the ones you like. Otherwise these tests are worthless.
listen
Jan 10 2004, 17:57
Sure I will try the next batch, and double check foobar when I do. About volume, yes I use just the master volume on Revo control panel. It's set on about 3/4. Sensaura mode is not on, and the sample rate selector shows 96000.
Might be a bit longer this time, I'm trying the tests on other people too.. although a result from somewhere else would be better, just to show it's not my setup going wrong.
Garf,
the difference I hear is subtle, and most of the time I have to play around with the files for a while before I can rely on what I think I'm hearing. Once I figure it out, of course I use the reset button before trying for a good result. I don't see how a result of say 10 in a row is worthless in any context, assuming I haven't made hundreds (or thousands) of attempts before it.
Continuum
Jan 11 2004, 01:43
QUOTE(listen @ Jan 11 2004, 12:57 AM)
the difference I hear is subtle, and most of the time I have to play around with the files for a while before I can rely on what I think I'm hearing. Once I figure it out, of course I use the reset button before trying for a good result.
If you do this every time, i.e. resetting before the true test starts, and never count the earlier trials, then it is no problem. But if you choose to reset based on your previous score, the results will lose some of their statistical significance.
QUOTE(Continuum @ Jan 11 2004, 09:43 AM)
If you do this everytime, i.e. resetting before the true test starts,
Another problem is determining when 'the true test' starts.
In any case, you should always count all trials. Even if you were just trying at first you will still get a significant result, provided you're really hearing a difference in the later trials. A score like 35/50 may not look as impressive as 10/10 but it's still significant!
Throwing out results is a big no-no in a sensitive test like this, and can very easily flaw the results.
PS. Reading the comments above, it seems to me that you did do more tests and those didn't give significant results. You must mention this! If you do 6 tests and 1 comes back significant, the overall result isn't necessarily valid with the same degree of confidence!
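Both points in this post can be put into rough numbers. The Python sketch below assumes every trial is an independent 50/50 guess and that sessions are independent of each other; the function names and the 11/12 threshold are hypothetical choices for illustration.

```python
from math import comb

def tail_probability(correct, trials):
    """Chance of at least `correct` hits in `trials` fair 50/50 guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def chance_of_any_success(correct, trials, sessions):
    """Chance that at least one of `sessions` independent guessing sessions
    reaches `correct`/`trials` or better."""
    p_single = tail_probability(correct, trials)
    return 1 - (1 - p_single) ** sessions

print(f"35/50 in one session:       {tail_probability(35, 50):.4f}")
print(f"11/12 in one session:       {tail_probability(11, 12):.4f}")
print(f"11/12 in any of 6 sessions: {chance_of_any_success(11, 12, 6):.4f}")
```

A long run such as 35/50 stays well below 1% under guessing, while the chance of seeing at least one 11/12 grows almost in proportion to the number of sessions attempted, which is why unreported sessions matter.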
Continuum
Jan 11 2004, 04:35
QUOTE(Garf @ Jan 11 2004, 10:51 AM)
Another problem is determining when 'the true test' starts.
Doesn't matter, as long as he resets the counter before taking the test.
(...and there is only one true test)
Some more stuff to test with:
http://sjeng.org/ftp/Orig.wv
http://sjeng.org/ftp/NoTrunc.wv
http://sjeng.org/ftp/Trunc16.wv
First one is the original, padded with 2 secs of silence to either edge (to prevent edge artifacts).
Second one is the original, resampled to 44.1, and then back to 96k, in full 32 bit float precision with my own resampling filter.
Third one is the original, resampled to 44.1, truncated to 16 bits, and then again upsampled to 96k at 24 bit precision.
The resampling filter should have better quality than SSRC slow mode. If you can ABX the first against the second, I don't know what the heck could be wrong.
listen
Jan 13 2004, 20:38
Just thought I should say that I'm leaving town tomorrow, and will be away from my computer for at least six weeks. I've been busy recently, and haven't had much chance for listening.. I did try your most recent files KikeG, but without success yet. I will give them some more time in March though. So far, I have no interesting results for lovely_dith2, and while lovely_lowpass2 is pointing my way a bit, it could just be luck. Still, I need to sit down and really concentrate before I discount them completely. Garf, I will try yours in March too.
So far, I've also tried the test on about 10 people, who were unable to hear anything. But the other day, one of my friends got a 9/12 on lovely_16bit_dflat. That's not a great result, but it was the only result, and also the first time she had listened to it. It took more than an hour, so for anyone still trying it, don't just start guessing and give up after 10 minutes...
I think the most important thing to remember when testing these files is that the same file will gradually sound different as you listen to it more and more. So you can't just recognise a certain problem straight away every time you listen. A good way to test is to decide on which file (X or Y) you think is better, and compare it to the other. Listen to it a few times in a row and then switch to the other one. Take note of how much worse the second one was. Then repeat it a few times. After this, swap the files over.. that is, decide that actually the other file is the good one. Then repeat the process. After that, you might want to swap back again. How long it takes probably depends on a lot of things, but eventually you will notice that one of them takes a bigger boost from your imagination than the other. Or you might notice that one of them seems reluctant to be the bad one. You also should listen to A occasionally, to keep your perspective right. Once you begin to notice some consistency in all the little hints that you pick up from this process there is a very good chance that you will get it right.
Hey tigre, where does this sample come from? I'm really getting into it now, and I think I'll buy the DVD(?) if it's available.
listen
Mar 30 2004, 21:57
I got motivated by a thread I saw the other day... and it's still March, just..
I tested lovely_short against lovely_lowpass2 last night. I had a single attempt, listening to the last percussion sound in the file. 11/12. I also checked it out with a spectrogram, and was surprised to see frequencies represented right up to 29KHz! Since I can't hear a lone sine-wave even at 18KHz, I would say this might suggest there is more to hearing than we think..
Still no results for lovely_dith2.
I'm very busy this year, but if dith2 is important I can spend some more time with it... it seems more worthwhile testing files that I do get results for though.
listen, thanks for still spending time on this.
To answer your question from January: the samples are from this Chesky DVD.
IIRC you weren't able to ABX 24/96 vs. 16/96 so far (I've read through the thread again, but maybe I've missed it), so it would be interesting to perform some how-high-can-you-hear test. If you don't have the necessary software, I can create some samples for this if you want.
QUOTE(listen @ Mar 31 2004, 05:57 AM)
I had a single attempt, listening to the last percussion sound in the file. 11/12.
I just noticed this. I hope this doesn't mean that you've performed multiple ABX sessions before and only reported successful results. In this case you need to add all results (e.g. 7/12 + 11/12 + 8/12 = 26/36) and the p-value must be calculated from the total score. If such "cherry-picking" is involved the p-value of the "successful" attempt is not statistically valid.
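The pooling described here is easy to reproduce. A minimal Python sketch, assuming independent 50/50 guesses under the null hypothesis (the function name `pooled_p_value` is made up for illustration):

```python
from math import comb

def pooled_p_value(sessions):
    """Add up (correct, trials) pairs from every session, then compute the
    one-sided binomial p-value of the total under pure guessing."""
    correct = sum(c for c, _ in sessions)
    trials = sum(t for _, t in sessions)
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return correct, trials, tail / 2 ** trials

# The example from the post: three sessions, only one of which looks good on its own.
correct, trials, p = pooled_p_value([(7, 12), (11, 12), (8, 12)])
print(f"{correct}/{trials}: p = {p:.4f}")
```

For this example the pooled 26/36 comes out a little under 0.6%, compared with about 0.3% for the 11/12 session taken alone, so pooling weakens the result here without erasing it.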
2Bdecided
Mar 31 2004, 03:51
QUOTE(listen @ Jan 10 2004, 11:57 PM)
About volume, yes I use just the master volume on Revo control panel. It's set on about 3/4.
Is this an analogue or digital control?
If it's like the typical M-audio controls, it's digital.
This means it will be scaling and re-dithering or truncating the signal. At a given bit-depth, over twice the dither or noise power will be falling into the audio band for 44.1kHz sampled material compared with 96kHz sampled material (since the dither noise or truncation artefacts will spread across the entire sampled bandwidth).
So, if the card dithers, there will be more noise at 44.1kHz than at 96kHz. If the card doesn't dither, then there'll be more distortion at 44.1kHz than at 96kHz, and that distortion will alias down, and hence be inharmonic.
Even if both samples are at 96kHz (as in most of these tests), a sample with energy above 22kHz may have a slightly different spectrum below 22kHz after truncation than a sample without anything above 22kHz.
This may or may not be an issue, but it seems it should be avoided.
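The "over twice" figure follows from simple arithmetic if the noise is treated as spectrally flat. A small Python sketch, assuming white noise of the same total power at both sample rates and taking 20 kHz as the edge of the audio band (both are assumptions, and the function name is illustrative):

```python
import math

def in_band_noise_fraction(sample_rate_hz, band_hz=20000.0):
    """Fraction of spectrally flat (white) quantization or dither noise power
    that falls between 0 Hz and `band_hz`."""
    nyquist = sample_rate_hz / 2
    return min(band_hz, nyquist) / nyquist

f44 = in_band_noise_fraction(44100)
f96 = in_band_noise_fraction(96000)
ratio = f44 / f96
print(f"44.1 kHz: {f44:.3f} of the noise power is below 20 kHz")
print(f"96 kHz:   {f96:.3f} of the noise power is below 20 kHz")
print(f"ratio: {ratio:.2f}x, or {10 * math.log10(ratio):.1f} dB")
```

The ratio is just 96/44.1, a bit more than a factor of two (roughly 3 dB), whichever band edge is chosen below 22.05 kHz.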
Another interesting test will be to ABX dithered silence against a 20kHz high-pass filtered version.
Of course, the "picking" of ABX results needs to be clarified first.
Cheers,
David.
Please forgive my ignorance as a complete newbie to both ABX and statistics (and also I am aware that this is pushing slightly off-topic). But I am interested in the thinking behind simply adding together the results of multiple ABX sessions.
As an example, say I run three sessions, each of which I decide in advance will be 11 steps. And the results are 5/11, 5/11, 11/11. If I add these results I get 21/33 which indicates that I could not differentiate between the samples reliably at all. It is the same as I would get from 7/11, 7/11, 7/11.
But for me the first set of results does seem more interesting than the 7/11 set. Perhaps I was tired the day I did the two 5/11 sessions. Or was just getting used to what to listen out for. The fact is that in one session I was able to identify the samples in every case. I can see that this does not make things as clear cut as if I was able to ABX 11/11 first time. So it makes sense to have to provide the results of every session. But to me, adding them together seems to remove significance from the results.
I suppose one answer to this is to say that if I have now learnt to differentiate between the two samples then I can carry on doing more ABX sessions and adding the results and eventually if I keep getting 11/11 then the probability value will become low enough to show I can indeed differentiate them. But I could imagine I would tire after a while and perhaps not be able to sustain this for the amount of sessions needed.
Is there any statistical way of taking into account the split of results over different sessions? Or is my understanding of this simply invalid from a statistical point of view?
listen
Mar 31 2004, 05:36
Well yes you pretty much sum it up perfectly in my view phwip. I understand completely what you mean tigre, and I certainly wouldn't perform a whole bunch of tests to get one good result. I do often mess around with the sample for a while first though, because it takes time to learn and concentrate. It's a complete waste of time to start counting results when I still can't hear or even imagine what the difference is. I'm quite confident with my results though, because every now and then I can hear the difference quite easily (for just a few listens at a time), and in this case I always guess correctly. It's all the rest of the time that's tiring and tedious.
Tigre, I think I've had a few good results for 16 vs. 24bit, and there was one time that it seemed really easy, but that hasn't happened again... I'm much more interested in sampling rate anyway. But I should keep testing both. What do you mean by how high I can hear? It sounds like frequency, but you are talking about 16 vs. 24bit??
2Bdecided, yes, I was just thinking about the volume control before.. I'm not really that keen to test on full volume because it's so loud. I wish I had a good amp....
phwip: imagine you throw a coin 10 times. The probability to get 10 times heads in a row is 1/2^10, so most likely this won't happen. If you repeat this again and again, you'll get 10 times heads in a row sooner or later for sure. It's too hard for me to give a detailed explanation in English why adding the scores and calculating the p-value from the sum works, but in fact the "probability to reach this result by guessing" is the same for 5/11, 5/11, 11/11 and 3x 7/11 (besides that you would probably give up ABXing the same position focusing on the same problem after having scored 5/11 2 times).
Of course if you ABX a different position and/or focussing on a different problem, you don't have to add the scores of previous attempts (at least IMO, if you want to be uber-correct you'd have to).
QUOTE(listen @ Mar 31 2004, 01:36 PM)
Well yes you pretty much sum it up perfectly in my view phwip. I understand completely what you mean tigre, and I certainly wouldn't perform a whole bunch of tests to get one good result. I do often mess around with the sample for a while first though, because it takes time to learn and concentrate. It's a complete waste of time to start counting results when I still can't hear or even imagine what the difference is. I'm quite confident with my results though, because every now and then I can hear the difference quite easily (for just a few listens at a time), and in this case I always guess correctly. It's all the rest of the time that's tiring and tedious.
Sounds to me like (at least most of) your results are statistically valid.
QUOTE
But I should keep testing both. What do you mean by how high I can hear? It sounds like frequency, but you are talking about 16 vs. 24bit??
I'm talking about frequency. It would be interesting if you weren't able to hear the frequencies themselves (i.e. pure sine tones) but could hear a lowpass at the same frequency. This could be regarded as evidence for the theory that the ear's "amplifier" (there's been a thread with details about this, but I can't find it right now) can be triggered by frequencies that aren't audible themselves, so these frequencies would change the sound audibly.
OTOH 2Bdecided's idea needs to be checked first before jumping to such a conclusion. Related to this: Doesn't your soundcard have 2 sliders to control volume, one in the digital domain ("master volume" or similar that will be bypassed by kernel streaming), the other controlling the analog amplification after the DAC (maybe I confuse some things here).
I hate statistics:
The probability to get 11 correct flips in a row is approx 0.05% (1/2^11).
The probability to get 11 correct flips in a row after max 30 tries is max 1%.
The probability to get 11 correct flips in a row after max 50 tries is max 2%.
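Figures of this kind can be checked with a short dynamic-programming sketch in Python. It assumes that "tries" means individual 50/50 flips and asks how likely it is that a streak of 11 correct answers shows up somewhere within them; the function name `prob_run` is made up for illustration.

```python
def prob_run(run_length, flips):
    """Probability that `flips` fair 50/50 guesses contain at least one streak
    of `run_length` correct answers in a row."""
    # state[i] = probability of currently being on a streak of i correct
    # answers without having completed a full streak so far.
    state = [0.0] * run_length
    state[0] = 1.0
    done = 0.0
    for _ in range(flips):
        new = [0.0] * run_length
        for streak, p in enumerate(state):
            new[0] += p / 2                 # wrong answer resets the streak
            if streak + 1 == run_length:
                done += p / 2               # streak completed
            else:
                new[streak + 1] += p / 2    # streak grows by one
        state = new
    return done

for n in (11, 30, 50):
    print(f"{n} flips: {prob_run(11, n):.4%}")
```

With these assumptions the results are roughly 0.05%, 0.5% and 1%, so the 1% and 2% bounds quoted above hold with some room to spare.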
Pio2001
Mar 31 2004, 12:39
It is perfectly valid, and recommended, to train oneself before starting the real ABX test.
What must not be done is deciding if we are "aware" after the tests. For example, if I'm not sure to succeed, I try a round of 8 ABX, and look at my score. Then I must not say, if I get 16/16, that I succeeded, because it was a training ! It would allow me to throw out as many failures as I want, because "they were just trainings". If I succeed during a training, I must necessarily do it again, for real, to get a valid result.
In short, it is necessary to decide before starting, if the result will be thrown out, or kept. Then we must hold on to what was decided. If we get 16/16 while we decided to throw it away, let's throw it away ! If we get tired, can't concentrate, and get 5/16 while it was decided to keep the result, then keep it.
In the last case, the mistake was to go on with the test once tiredness set in. Once the difference can't be heard anymore, the test must be paused.
Only give answers about which you are certain. That's the key. When in doubt, don't click. Keep the current score, close the program, and get some rest.
Thanks Pio2001, that makes things much clearer for me. While I understand the mathematics in Tigre's coin example, I do think that the simplistic solution of adding together all the results only makes sense there because we are analysing a simple scenario (tossing a coin), where there are unlikely to be external factors that influence the result and vary over time.
For audio ABXing there are many additional factors such as tiredness, boredom, familiarity with the sample, peripheral noise, etc, which may need to be taken into account. However, I agree that if you are able to include or exclude sessions, as long as you decide before that session starts, and if you take breaks during a session if you feel it is necessary, these together should hopefully limit the effect of these other factors.
listen
Mar 31 2004, 15:44
I've got a master volume, and also faders for left and right channels. They all change the volume with Kernel Streaming...
For the low-pass, the problem is, you don't know if you're hearing the absence of high frequencies, or the effects of the filter itself. Right?
listen
Mar 31 2004, 22:39
Thanks for the stats sshd..
QUOTE(sshd @ Mar 31 2004, 05:33 AM)
The probability to get 11 correct flips in a row after max 50 tries is max 2%.
Well, then, even considering a hypothetical scenario where I had tried every sample 50 times in total, that's still a very low probability, once you consider how many of these results I've had.. I'm not at all ignorant of statistics, but it seems that most of this thread is an endless re-justification of results. It's reasonable for people to query things they can't reproduce themselves, but I'm certain it's not the results that are at fault.
The volume issue sounds like a much more likely cause.
Maybe I can listen with earplugs..
Ok, I've been quite absent for some time here (this is a very time-consuming hobby, and now I want to do other things), but now I'd just want to add a few things, which are the reason for the last test samples I posted.
According to published specs, listen's headphones shouldn't have any significant response at the cutoff frequency of the lovely_lowpass2 sample (IIRC they are rated up to 21 KHz).
My attempt to explain this and previous results is that ultrasonic information may be becoming audible due to intermodulation somewhere in the listening chain, causing very low level products at audible frequencies, together with what seem to be listen's exceptionally good low-level hearing abilities, and also his isolating DJ headphones. However, this is no more than the most reasonable explanation I can find for the results.
Also, according to published specs and my own measurements of the Revo soundcard, the noisefloor of the lovely_dith2 sample should be quite below what the card hardware can resolve, even more so taking into account that he was not listening at full volume. Maybe that's the reason why you can't get a consistent ABX result here.
As a side note, M-Audio control panel attenuation is always performed digitally at 24-bit resolution or more, so quantization distortion should not ever be an issue here. A poorer effective dynamic range would.
I don't know when I will write again, I just exposed more information so that you all have more to think about.
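The dynamic-range argument above can be put into rough numbers with the textbook figure for an ideal N-bit quantizer driven by a full-scale sine, SNR ≈ 6.02·N + 1.76 dB. This is only a sketch: the 20 dB attenuation is an arbitrary illustrative value, not a setting reported in this thread, and the function names are made up.

```python
import math

def ideal_snr_db(bits):
    """Theoretical SNR of an ideal quantizer for a full-scale sine wave."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)

def bits_left_after_attenuation(bits, attenuation_db):
    """Each ~6.02 dB of digital attenuation costs about one bit of resolution."""
    return bits - attenuation_db / (20 * math.log10(2))

print(f"16-bit: {ideal_snr_db(16):.1f} dB, 24-bit: {ideal_snr_db(24):.1f} dB")
# Hypothetical example: 20 dB of attenuation applied in a 24-bit path.
print(f"bits left: {bits_left_after_attenuation(24, 20.0):.1f}")
```

With these assumed numbers a 24-bit path still has more than 20 bits of resolution left after attenuation, which fits the point that the analog noise floor of the card, rather than the digital volume control, is the likely limit.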
Thanks for your answer, KikeG. I've been thinking about similar hardware-related problems as possible reasons too, but for some reason I forgot to post about it.
So I've got 2 questions:
1. To listen: Would you be ready (and do you have the equipment) to do some loopback tests, i.e. connect your headphone amp to your soundcard's linein (maybe it would be good to use some one-input-two-outputs adaptor to connect the headphones at the same time) and record some test signals to find out more about KikeG's ideas?
2. To KikeG, 2Bdecided: What kind of test signals would be good to detect intermodulation distortion (and other problems that could be caused by equipment)? Probably sine sweeps and single-click impulses (at different sampling rates) - but maybe something else additionally like combinations of several low and high frequency sine waves (or sweeps)?
Pio2001
Apr 2 2004, 13:56
When you add two sinusoids of different frequencies F1 and F2, if there is some intermodulation between them, then the frequencies F1-F2 and F1+F2 appear in addition.
You can choose 14 kHz and 18 kHz. A new tone should appear at 4 kHz.
The classic intermodulation experiment shows that this happens very easily even in high end gear, but disappears as soon as you play one tone on one speaker and the other tone in the other speaker, letting the frequencies add themselves in the room.
Recording in a loopback configuration might not show it, since most of the distortion happens in the transducer (headphone or speaker). One should use a microphone to detect it.
That's why, when possible, ultrasonic experiments are run with super tweeters amplified separately and entirely dedicated to the ultrasonic content, so that no intermodulation occurs.
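The 14 kHz + 18 kHz example can be simulated in a few lines. This is a minimal sketch, not a model of any real headphone: the 96 kHz sample rate, the tone amplitudes and the toy nonlinearity y = x + 0.1·x² are all assumptions chosen for illustration, and `tone_level` is a made-up helper.

```python
import math

RATE = 96000   # sample rate in Hz (assumed)
N = 960        # 10 ms, so every tone below fits a whole number of cycles

def tone_level(signal, freq):
    """Amplitude of the component at `freq`, from a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * i / RATE) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / RATE) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

# Two tones at 14 kHz and 18 kHz, each with amplitude 0.5.
clean = [0.5 * math.sin(2 * math.pi * 14000 * i / RATE)
         + 0.5 * math.sin(2 * math.pi * 18000 * i / RATE) for i in range(N)]

# Toy nonlinearity standing in for a transducer: y = x + 0.1 * x**2 (assumed).
distorted = [x + 0.1 * x * x for x in clean]

for name, sig in (("clean", clean), ("distorted", distorted)):
    print(name, " 4 kHz:", round(tone_level(sig, 4000), 4),
          " 32 kHz:", round(tone_level(sig, 32000), 4))
```

The clean sum contains nothing at 4 kHz or 32 kHz, while the distorted version shows both the difference tone (F2-F1) and the sum tone (F1+F2), each at amplitude 0.025 with these assumed values.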
Hi tigre.. I don't have a separate headphone amp, but I guess I can plug line-in to line-out if I don't monitor the signal.. or Pio, should I try to record my headphones with a mic?

Hmmm..
KikeG's ideas.. he's saying that the true 96 kHz file actually sounds worse.. I think we speculated about that on the Afterdawn forums, but there's no way of really knowing which sample I think is better, because it's subjective. If someone can tell me some tests to do I'll try, but I can't really think of where to start myself.
Well, if the difference frequencies appear mostly in the headphones, then it would be logical to say that either:
-the headphones are reproducing frequencies >29kHz, or
-there isn't any intermodulation distortion caused by >29kHz content.
Pio2001
Apr 6 2004, 04:47
What do you mean? If the 4 kHz frequency of our example is audible only in headphones, it means that only headphones suffer from intermodulation distortion, since the 4 kHz tone is the distortion.
listen
Apr 14 2004, 16:31
Oh. .
No, I was thinking of KikeG's speculation, not your 18-14kHz example..
listen
May 13 2004, 00:30
Well.. I've been busy again, but I see I haven't missed anything
I'm completely unclear about what I need to do next.. perhaps my equipment is not suitable.. maybe we shouldn't be listening to hi-res formats at all with today's speakers.
I ran a loopback test, just for the sake of completeness:
CODE
RMAA: M-Audio Revolution, 32bit(float), 96kHz.
Frequency Response, dB: +0.12, -0.06
Noise Level, dBA: -93.9
Dynamic Range, dBA: 91.2
THD, %: 0.0048
Intermodulation, %: 0.016
Stereo Crosstalk, dB: -94.9
I also spent some time listening to Seaside Rendezvous (Queen), and The Thin Line (Queensrÿche). Well, I wasn't very surprised that I couldn't hear any difference between 96 and 44.1kHz, because:
a) they are not exactly great sounding recordings, and
b) I could only get one channel off the dvd (right channel would only give me noise)
But then I thought, well, if my headphones are causing audible problems because of the high frequencies, why can't I notice it here? There certainly is a plethora of high frequencies in these two files.
(Actually I'm suspicious that the top half of the frequencies are just a mirror image of the bottom half, hmm..) Hearing lovely_1 after these files was like listening to chocolate melting (dark chocolate)
I was wondering, too.. In case I am just a lucky person, how many trials would be needed for me to have a fairly good chance of getting a total of say twenty 12/12s over the whole test period if I was just guessing.. maybe sshd could tell me?
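A rough way to put numbers on that question, assuming each session is 12 independent trials with a 50% chance per trial, and reading "a fairly good chance" as the point where twenty 12/12 sessions would be expected on average (`p_at_least` is a made-up helper name):

```python
from math import comb

def p_at_least(correct, trials):
    """Chance of getting `correct` or more right out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

p_perfect = p_at_least(12, 12)      # one 12/12 session
sessions_needed = 20 / p_perfect    # sessions for an average of twenty 12/12s
print(f"p(12/12) = {p_perfect:.6f}")
print(f"p(11/12 or better) = {p_at_least(11, 12):.6f}")
print(f"sessions for twenty 12/12s on average: {sessions_needed:.0f}")
print(f"individual trials: {sessions_needed * 12:.0f}")
```

Under these assumptions a single 12/12 has a 1 in 4096 chance, so a guesser would need on the order of 80,000 sessions (close to a million individual trials) before twenty perfect scores became likely.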
listen
May 13 2004, 19:11
I thought I should clarify this 'result picking' issue with some things I might have forgotten to mention earlier.
My initial tests took me many hours.. I even spent a whole afternoon on the very first test, going away and coming back, making sure to choose only when I was certain. There is no way I would do this 5 times and choose the best one, it would take days and drive me crazy..
Then there were my filtered files, which I was pleased to find much easier, but which turned out to be a waste of time..
Then KikeG's first batch of test files.. Well, I only tried these files once, except for the one I stuffed up. When I realised why I was getting it wrong every time I reset and started again. One or two of the files were very easy, but in total I spent several hours one evening working through these files. Again, I'm really not interested in multiple attempts on this sort of time scale.
And when I said recently that I had a single attempt at the 29kHz low-pass file, I was intending to clarify that result, not imply that the other tests were any different..
Anyway, sorry if I have been a little irritable over this issue. It seems I neglected to mention any of this (without realising), which made it hard for me to understand why there was so much agnosticism over the results.
So, I'm keen to do whatever tests are needed here (some more specific loopback tests?).. Or, if it's all going to be a waste of time, please somebody recommend me a good pair of headphones that are rated up to 48kHz and I can start over.
Pio2001
May 14 2004, 04:18
It's been a long time since I read this thread, but as far as I remember, this is the point. The main issue with high definition formats, after all, is the frequency response of the speakers / headphones.
The next time someone tries to sell me a DVD-A or an SACD player, I'll ask if the 100 kHz super speakers come with it.
listen
May 20 2004, 05:50
So, quite sincerely, if my Sennheisers are not suitable for this test, what headphones should I buy? Even if all of the above has been a waste of time, I would prefer to prove myself wrong than to just forget about it...
-listen
Pio2001
May 20 2004, 09:09
I don't know, but for me, it was not a waste of time. It led to interesting discussions. And now, we know better the pitfalls that appear in this kind of listening test. It is interesting to note that Oohashi's experiment carefully avoided all these pitfalls (Physically double blind test, bi-amplification, listening tests with the ultrasonic content alone, microphone recordings of the ultrasonic content at the listening position, same lowpass filter for the lowpassed version and the full version...).
If only it could be repeated by an independent team...
QUOTE
It is interesting to note that Oohashi's experiment carefully avoided all these pitfalls (Physically double blind test, bi-amplification, listening tests with the ultrasonic content alone, microphone recordings of the ultrasonic content at the listening position, same lowpass filter for the lowpassed version and the full version...).
If only it could be repeated by an independent team...
I am alerting you to the issues with Oohashi's experiment. First, he never achieved significant positive results in any standard, accepted listening test. His tests only found positive significance in unconscious brain activity, not in an actual audibility test. His results in this regard are still suspect to me.
Second, after this paper, NHK Laboratories performed a controlled listening test in response:
NHK Laboratories Note No. 486, "Perceptual Discrimination between Musical Sounds with and without Very High Frequency Components", Toshiyuki Nishiguchi, Kimio Hamasaki, Masakazu Iwaki, and Akio Ando
http://www.nhk.or.jp/strl/publica/labnote/lab486.html
Third, Oohashi's attempts to criticize the original reference test from 1978 don't seem to be warranted, especially considering the later NHK test. Perhaps you should read the original peer-reviewed paper, which still stands as the JAES standard:
JAES, "Which Bandwidth Is Necessary for Optimal Sound Transmission?", G. Plenge, H. Jakubowski, and P. Schone
-Chris
Pio2001
Jun 1 2004, 18:11
Thank you for the link, WmAx, Very interesting.
In Oohashi's paper however, a high significance level was shown not only in the brain activity, but also in the subjective evaluation of sound quality by the subjects, as reported in table 2 of the version linked above.
This alone can't be considered as a scientific proof until the result is confirmed. It is just a "piece of proof". I was not aware that another team had reproduced the experiment and failed to achieve any positive result.
I wonder if the test of Oohashi et al. was flawed, or if there was something more in it that allowed it to succeed.
The first difference is the protocol : similar to ABC/HR in the one that failed, but without ranking the samples, just telling if they are different, and an A-B-A playback followed by a binary quality evaluation in the one that succeeded (soft/hard, balanced / unbalanced etc).
The duration might have been different. In the successful test, the samples were always played for 30 seconds. The duration is not mentioned in the other test.
The material was different too. I'd like to perform a spectrum analysis of a raw gamelan recording (the instrument recorded in the test that succeeded, and that was not present in the one that failed), but with a short analysis window. The overall analysis doesn't show any special high frequency content that would be missing in the other test, but since the gamelan is a percussive instrument (on metal), I wonder if it is possible for the high frequency content to be concentrated during the attacks only. This way, it would be very powerful at some given times, but the average power on the whole sample would not represent faithfully the instant HF level that is present during the attacks.
If it is the case, drawing a spectrogram with shorter analysis windows would show shorter but more powerful HF bursts, as long as the bursts are shorter than the window itself.
I tried with the only CD I have featuring Gamelan (Akira soundtrack, track 4 - Tetsuo), but it didn't show such a behaviour. However, this movie soundtrack is heavily processed, and the Gamelan sound might have been tampered with.
We could say also that a 10 ms analysis window (4096 samples in 44100 Hz) represents best the human hearing, but I don't think that it is a valid argument. Since we study the hypothesis of inaudible sounds possibly intermodulating in the audible range, the process is necessarily nonlinear, and the relevance of this 10 ms window might not stand in these conditions.
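The effect of the window length on a short high-frequency burst can be illustrated with a synthetic signal. This is a simplified sketch, not a real spectrogram: the signal is silence plus one burst of a 24 kHz tone, so the mean power in each window stands in for the HF level. The sample rate, window sizes and burst length are all assumed values.

```python
import math

RATE = 96000          # assumed sample rate in Hz
LONG = 4096           # long analysis window, in samples
SHORT = 64            # short analysis window, in samples

# Synthetic "attack": silence with one 64-sample burst of a 24 kHz tone.
signal = [0.0] * LONG
for i in range(1024, 1024 + SHORT):
    signal[i] = math.sin(2 * math.pi * 24000 * i / RATE)

def mean_power(samples):
    """Average of the squared sample values over one window."""
    return sum(s * s for s in samples) / len(samples)

long_power = mean_power(signal)
short_powers = [mean_power(signal[start:start + SHORT])
                for start in range(0, LONG, SHORT)]
peak_short = max(short_powers)

print(f"average over long window: {long_power:.5f}")
print(f"peak over short windows:  {peak_short:.5f}")
print(f"ratio in dB: {10 * math.log10(peak_short / long_power):.1f}")
```

Here the short windows report a burst about 18 dB stronger than the long-window average, because the long window dilutes the burst with silence. As a side note on the figures above, 4096 samples at 44100 Hz last about 93 ms, while a 10 ms window would be about 441 samples.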
QUOTE(Pio2001 @ Jun 1 2004, 04:11 PM)
I wonder if the test of Oohashi et al. was flawed, or if there was something more in it that allowed it to succeed.
A significant issue is that Oohashi was not able to achieve positive results with LCS compared to baseline. However, he was able to achieve positive results with FRS vs. HCS. This is not logical, and I cannot conclude that his results have any validity in this circumstance. I suspect a distortion component is responsible. Perhaps a compounded IMD, created by the combining acoustic sources (the individual speakers)?
I realize they used individual speakers in order to reduce IMD components. However, the main effect of this is to prevent IMD that results from transducer non-linearities, or IMD/Doppler artifacts due to the simultaneous radiation of broadband sound directly from the same moving diaphragm, since the direct pistonic behaviour will impede/react upon the various frequency pressurizations across the distributed band. However, intermodulation artifacts are created by two discrete acoustic sources, too. In the same way, if Bob hums at one frequency and Jon hums at a slightly different one, audible modulations will result as these pressure waves combine/react. I wonder if the pre-recorded signals in this case, when re-assembled, created further IMD artifacts compared to the low-passed-only signal.
Oohashi brings up the issue of intermodulation distortion, but does not proceed to actually measure the system to explain this illogical result. At least, he did not disclose such an investigation in this paper. If you are aware of a report detailing this specific issue of compounded IMD products, I would like to read it. I may be wrong on this account, but the lack of positive results with LCS only raises more questions. If the high frequency content is directly exciting ANYTHING in a human, then why were no positive results achievable when it was isolated? What did cause the positive results when HF was added to the high cut?
Addressing your comment:
"We could say also that a 10 ms analysis window (4096 samples in 44100 Hz) represents best the human hearing, but I don't think that it is a valid argument. Since we study the hypothesis of inaudible sounds possibly intermodulating in the audible range, the process is necessarily nonlinear, and the relevance of this 10 ms window might not stand in these conditions."
If the standard 44.1 kHz sample rate represents the human auditory range, then how can this be logical? If the original source has audible IMD components (I'm sure many do) as a result of inaudible and audible frequencies reacting, then the audible components/modulations will still reside within the audible band. These will be recorded faithfully, since the artifacts are created before recording. Maybe I did not understand you?
-Chris