Full Version: Document about listening test conduction
rjamorim
Hello, people.

These last few months I have been working on a guide to help newcomers find their way around conducting listening tests. Hopefully it'll help spark interest in people who are still wondering whether to conduct their own tests.

http://www.rarewares.org/rja/ListeningTest.pdf

It's still not "officially released", so I'd like to ask you guys for suggestions on improvements and corrections, or just general comments on how you like it.

Thank-you, and I hope you enjoy reading it.

Best regards;

Roberto.
rjamorim
Oops. I'm sorry, the version that was available there is outdated.

If you downloaded it already, please redownload. The current version is the correct one. Thanks.
ff123
QUOTE
Here is a list of places you should consider announcing your test at:
• Hydrogenaudio, of course
• rec.audio.opinion Usenet group
...


I once got reported to my ISP when I announced a new version of abc/hr at rec.audio.opinion, for violating the group's charter.

Announce listening tests there at your peril.

Other random thoughts:

Listener Training
It would be nice to have a section on listener training. In one of my tests, I had prospective listeners download a small training package before the main test started. Perhaps the instructions for acquiring the main test could be embedded in the training package.

sample01
A note on listener psychology: they will tend to download and listen to sample01 first, and then decide whether they want to continue based on their experience on that first sample. I know that's what I do ;-) Ideally, there would be some sort of randomizer which assigns different music to each of the samples dynamically, but that would require some way to sort things out in the end. Barring that, I would try to make sample01 as friendly as possible.
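
A rough sketch of how such a randomizer might look, assuming each listener has some unique ID (the function names and the ID scheme are made up for illustration, they are not part of any existing test tool). Seeding the shuffle with the listener ID makes the mapping reproducible, so the conductor can sort things out in the end:

```python
import random


def assign_samples(track_names, listener_id):
    """Return a {sample label: track name} mapping shuffled per listener.

    Seeding with the (hypothetical) listener ID makes the shuffle
    repeatable, so the same mapping can be rebuilt when results come in.
    """
    rng = random.Random(listener_id)
    shuffled = list(track_names)
    rng.shuffle(shuffled)
    return {f"sample{i:02d}": track for i, track in enumerate(shuffled, start=1)}


def restore_order(results_by_label, track_names, listener_id):
    """Translate results keyed by sample label back to results keyed by track."""
    mapping = assign_samples(track_names, listener_id)
    return {mapping[label]: score for label, score in results_by_label.items()}
```

With this, different listeners would most likely get different music as sample01, and the conductor only needs the listener ID to undo the shuffle.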

Keep the ball rolling
Try to keep the discussion thread going during the test to keep interest up.

sample durations
The samples should be about the same duration. The idea is that the average bitrate of the sample set depends on the individual sample durations as well as their difficulty -- the longer the sample, the more it affects the overall bitrate.
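
To make the arithmetic concrete, here is a small sketch with made-up numbers. It assumes the overall bitrate is total bits divided by total duration, which amounts to a duration-weighted average of the per-sample bitrates:

```python
def overall_bitrate_kbps(samples):
    """samples: list of (duration_seconds, bitrate_kbps) pairs.

    Overall bitrate = total kilobits / total seconds, i.e. each sample's
    bitrate is weighted by its duration.
    """
    total_kilobits = sum(duration * bitrate for duration, bitrate in samples)
    total_seconds = sum(duration for duration, _ in samples)
    return total_kilobits / total_seconds


# Made-up example: a long easy sample and a short difficult one.
samples = [(30.0, 100.0), (10.0, 200.0)]
weighted = overall_bitrate_kbps(samples)
unweighted = sum(bitrate for _, bitrate in samples) / len(samples)
print(weighted, unweighted)
```

The long sample pulls the overall figure toward its own bitrate, so the weighted value ends up below the plain mean of the two bitrates. With equal durations the two numbers would coincide.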

sample bitrate distribution
For a vbr codec, I think the distribution of bitrates in the small sample set (eg., 20 samples), supposing you draw a histogram of it, should resemble the distribution of a large sample set chosen from a wide variety of music.
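
One possible way to check this, sketched under my own assumptions (the 16 kbps bin width and the distance measure are arbitrary choices, not something taken from any test): bin the per-track bitrates of both sets into normalized histograms and measure how far apart they are.

```python
from collections import Counter


def bitrate_histogram(bitrates_kbps, bin_width=16):
    """Normalized histogram: {bin start in kbps: fraction of tracks in that bin}."""
    counts = Counter(int(b // bin_width) * bin_width for b in bitrates_kbps)
    total = len(bitrates_kbps)
    return {bin_start: n / total for bin_start, n in sorted(counts.items())}


def histogram_distance(hist_a, hist_b):
    """Total variation distance: 0.0 means identical shapes, 1.0 means no overlap."""
    bins = set(hist_a) | set(hist_b)
    return 0.5 * sum(abs(hist_a.get(b, 0.0) - hist_b.get(b, 0.0)) for b in bins)
```

Usage would be something like histogram_distance(bitrate_histogram(test_set), bitrate_histogram(large_set)), where both lists hold per-track average bitrates from the same vbr setting. A small value suggests the test set resembles the large collection.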

ff123
ff123
Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output. The config file should start the sample such that the 1st second is not included in the listening test.
ErikS
QUOTE(ff123 @ Nov 20 2005, 05:06 AM)
Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output.  The config file should start the sample such that the 1st second is not included in the listening test.

Well, isn't that a flaw in the encoder which should be allowed to affect the result negatively? A more extreme analogue: Some codecs have problems during sharp attacks. One should take care to eliminate such attacks from all test samples.
rjamorim
Very big thanks for the usual awesome help, ff123 :)
ff123
QUOTE(ErikS @ Nov 19 2005, 07:26 PM)
QUOTE(ff123 @ Nov 20 2005, 05:06 AM)
Eliminate 1st second of encoder output
I seem to remember some codecs having problems during the 1st second or so of the output.  The config file should start the sample such that the 1st second is not included in the listening test.

Well, isn't that a flaw in the encoder which should be allowed to affect the result negatively? A more extreme analogue: Some codecs have problems during sharp attacks. One should take care to eliminate such attacks from all test samples.

Except that 99.5% of the time, you won't be listening to that 1st second. It's unfair to feature such a fault in every sample.
ff123
sample content
The sample should be as homogeneous as reasonably possible, otherwise the listener may have difficulty rating a codec (e.g., in the first part codec A was better, but in the last part codec B was better).
ErikS
And now that I read through the whole document, I can only say: well done. I hope it will help to bring forward a successor that will pick up the testing business again.

Just one question: what does the title you write after your name on the front page, PITA, mean? "Pain in the ass" is the first thing that comes to my mind... <_<
rjamorim
QUOTE(ErikS @ Nov 20 2005, 01:48 AM)
Just one question: what does the title you write after your name on the front page, PITA, mean? "Pain in the ass" is the first thing that comes to my mind... <_<

That's precisely it!

I looked at the articles you find around the web, and most of them have the author's name followed by fancy acronyms, like "John Doe, PhD". Since I'm no PhD or anything like that, PITA is what comes closest :P

Obviously, I will remove it from the final version...
dreamliner77
Not PITA for those that know...
Gabriel
1st second removal:
This is not because of problems in codecs, but to allow realistic behaviour. With real encoding, encoders might adapt themselves to the content. In real tracks, this adaptation can happen progressively as the track starts, so the beginning of an extract from a track is not representative of how that same part would be encoded inside the full-length track.
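
As a rough illustration of the idea: cut the extract with about one extra second at the front, encode and decode it, then drop that lead-in before presenting it to listeners. The helper below is only a sketch. It assumes the decoded audio is a plain list of mono samples and that any decoder delay has already been compensated; the function name is made up.

```python
def trim_lead_in(pcm_samples, sample_rate_hz, lead_in_seconds=1.0):
    """Drop the first lead_in_seconds of decoded audio.

    The encoder still sees the lead-in and can adapt to the content,
    but listeners never hear that part.
    """
    skip = int(round(sample_rate_hz * lead_in_seconds))
    return pcm_samples[skip:]
```

In an actual test the same effect would more likely come from the start offset in the test tool's config, as suggested earlier in the thread, rather than from editing the decoded files.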
MaB_fr
Completely off topic, but I must do it:

QUOTE
rjamorim: At low bitrates nobody is interested,
but the results are easy to obtain
rjamorim: At high bitrates everyone is interested,
but you practically can't obtain usable results
ff123: s/bitrates/beauty and s/results/fucks


This, indeed, is SYSTEMIC ! ;)

For the rest, it's very very good !

Things like the high-bitrate question have always seemed like a paradox to me... I always thought you have to shift from "how good this sounds" to "how big the file size is"... because, as you said, most people won't notice a difference...

But then it's not a listening test anymore...

MaB_fr
Jan S.
Here are some random thoughts and nitpicking - hopefully some of them are actually useful...
  • Mention bias perhaps - not just placebo.
  • QUOTE
    To counter the claims of the subjectivists, the objectivists created a method to reliably compare two audio signals called ABX.
    Wouldn't this sentence strictly be saying that the audio signals are called ABX and not the method?
  • QUOTE
    Still on the samples subject: avoid the obvious choice of problem
    samples (samples that trip codecs producing very nasty artifacts) like Kalifornia,
    Castanets and IDM stuff because their artifacts are easily detectable, and
    therefore less fatiguing for your listeners.
    Maybe you don't want to mention sample names that only people that have been around for years will know... dunno. Perhaps it just adds to the confusion.
  • QUOTE
    <nostalgia>
    Stuff like this makes it seem unserious IMO. I think the anecdotes should be left out if I understand what you want with this paper...
  • Add an Index perhaps
  • I think a more comprehensive explanation of what you actually do in the ABX would be good.. what buttons do what. How you ABX and rate. That part was very unclear to me.
  • More excel guide! You can't just use the normal x,y-graphs so if you want to make it easy for people you should add steps for the graph creation. I think this is a bigger issue than you suggest in the paper.
  • Where to get programs
  • Link to ITU doc
ff123
QUOTE
To calculate error margins, you must use ff123’s statistical analysis tool
from the command prompt. Run it as:

friedman -tp resultsXX.txt

and it’ll print to screen the analysis done on that results table. If you want
Friedman to save the analysis to a file, use output redirection:

friedman -tp resultsXX.txt > analysisXX.txt


Since the time you ran your early tests, I made the parametric Tukey's HSD option available on the web-based tool and made it the default:

http://ff123.net/friedman/stats.html
krabapple
The document could still use a bit of proofreading, e.g., I see 'conduce' used where I believe you mean 'conduct'. I'll help you out with that if you like.
NoXFeR
Compliments on a good document!

Suggestion: Place links and references at the end of the document. To doom9, HA, programs' homepages, etc...
rjamorim
Hello.

I would like to apologize for not producing an updated version of this document yet. I am now facing finals and papers, besides working 6 hours per day at Siemens. If that wasn't enough, I am helping Sebastian with his listening test and creating a new site design for LAME.

I guarantee you all your comments are being taken into account, and I hope to be able to release a new Work In Progress soon.

Thank-you very much.

Best regards;

Roberto.
pepoluan
Hey Roberto, any update?

Care to put some in here:

http://wiki.hydrogenaudio.org/index.php?ti...listening_tests

rjamorim
OMG! A new version at last!


QUOTE(ff123 @ Nov 20 2005, 00:01) *

I once got reported to my ISP when I announced a new version of abc/hr at rec.audio.opinion, for violating the group's charter.

Announce listening tests there at your peril.


Added a warning there.

QUOTE
Other random thoughts:

Listener Training
It would be nice to have a section on listener training. In one of my tests, I had prospective listeners download a small training package before the main test started. Perhaps the instructions for acquiring the main test could be embedded in the training package.

sample01
A note on listener psychology: they will tend to download and listen to sample01 first, and then decide whether they want to continue based on their experience on that first sample. I know that's what I do ;-) Ideally, there would be some sort of randomizer which assigns different music to each of the samples dynamically, but that would require some way to sort things out in the end. Barring that, I would try to make sample01 as friendly as possible.

Keep the ball rolling
Try to keep the discussion thread going during the test to keep interest up.

sample durations
The samples should be about the same duration. The idea is that the average bitrate of the sample set depends on the individual sample durations as well as their difficulty -- the longer the sample, the more it affects the overall bitrate.

sample bitrate distribution
For a vbr codec, I think the distribution of bitrates in the small sample set (eg., 20 samples), supposing you draw a histogram of it, should resemble the distribution of a large sample set chosen from a wide variety of music.

ff123


Added all of these. Thank-you very much!

QUOTE(Gabriel @ Nov 20 2005, 07:20) *

1st second removal:
This is not because of problems in codecs, but to allow realistic behaviour. With real encoding, encoders might adapt themselves to the content. In real tracks, this adaptation can happen progressively as the track starts, so the beginning of an extract from a track is not representative of how that same part would be encoded inside the full-length track.


Added it. thanks!

QUOTE(Jan S. @ Nov 20 2005, 13:29) *

Here are some random thoughts and nitpicking - hopefully some of them are actually useful...
Mention bias perhaps - not just placebo.


Done

QUOTE
Wouldn't this sentence strictly be saying that the audio signals are called ABX and not the method?


Good point! I removed the ambiguity.

QUOTE
Maybe you don't want to mention sample names that only people that have been around for years will know... dunno. Perhaps it just adds to the confusion.

Done

QUOTE
Stuff like this makes it seem unserious IMO. I think the anecdotes should be left out if I understand what you want with this paper...

Bummer :P

OK, removed it :)

QUOTE
Add an Index perhaps

Done

QUOTE
I think a more comprehensive explanation of what you actually do in the ABX would be good.. what buttons do what. How you ABX and rate. That part was very unclear to me.


Well, I think that part belongs more in the listener training part. Remember, that document is for test conductors, not test participants.

QUOTE
More excel guide! You can't just use the normal x,y-graphs so if you want to make it easy for people you should add steps for the graph creation. I think this is a bigger issue than you suggest in the paper.

Augh. Maybe later. Guiding step-by-step in Excel is quite the pain :B

QUOTE
Where to get programs

Done (most of it)

QUOTE
Link to ITU doc

Done

Thank-you very much for all your suggestions, Jan!

Moving on...

QUOTE(krabapple @ Nov 20 2005, 19:43) *
The document could still use a bit of proofreading, e.g., I see 'conduce' used where I believe you mean 'conduct'. I'll help you out with that if you like.


Yes, please! All feedback related to grammar (and everything else, really) is welcome!

QUOTE(NoXFeR @ Nov 21 2005, 22:03) *
Suggestion: Place links and references at the end of the document. To doom9, HA, programs' homepages, etc...


Added them as footnotes.

QUOTE(pepoluan @ Dec 8 2006, 08:38) *

Hey Roberto, any update?

Care to put some in here:

http://wiki.hydrogenaudio.org/index.php?ti...listening_tests


To be quite honest, I'm not too fond of the idea of wikifying it. I want to keep responsibility for and authorship of this document, so that people can easily come to me if they need help. If wikified, both responsibility and authorship get diluted...


Anyway, the new version is already uploaded, at the same location. Please download, read and send in your comments!

I promise I'll try to respond to the comments faster this time :P
rjamorim
New version up, fixed several small errors spotted by Sebastian Mares.
Gabriel
Perhaps I should document the GnuPlot way to produce graphs (way easier than with Excel or OOorg)...
rjamorim
QUOTE(Gabriel @ Dec 9 2006, 12:57) *
Perhaps I should document the GnuPlot way to produce graphs (way easier than with Excel or OOorg)...


I would be very grateful :)
ff123
About running friedman.exe

"friedman -tp", which selects Tukey's parametric analysis, is statistically more "proper" than "friedman -a", which selects the ANOVA analysis with a Fisher LSD. The former corrects for multiple codec comparisons, while the latter does not.

I've also made the Tukey's HSD the default analysis on my web page.
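
To give a feel for why the correction matters, here is a back-of-the-envelope sketch. It makes the simplifying assumption that the pairwise comparisons are independent (they are not, strictly speaking), so treat the numbers as illustration only. The function names are made up and have nothing to do with friedman.exe.

```python
from math import comb


def pairwise_comparisons(num_codecs):
    """Number of distinct codec pairs that get compared."""
    return comb(num_codecs, 2)


def familywise_error_rate(num_codecs, alpha=0.05):
    """Chance of at least one false positive across all uncorrected pairwise
    comparisons, assuming (as a simplification) that they are independent."""
    m = pairwise_comparisons(num_codecs)
    return 1.0 - (1.0 - alpha) ** m


for n in (2, 4, 6):
    print(n, pairwise_comparisons(n), round(familywise_error_rate(n), 3))
```

With six codecs there are already 15 pairs, and under that assumption the chance of at least one spurious "significant" difference at an uncorrected 5% level is better than even. Tukey's HSD is designed to keep the overall rate near the chosen level instead.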

ff123
rjamorim
QUOTE(ff123 @ Dec 10 2006, 02:08) *
About running friedman.exe

"friedman -tp", which selects Tukey's parametric analysis, is statistically more "proper" than "friedman -a", which selects the ANOVA analysis with a Fisher LSD. The former corrects for multiple codec comparisons, while the latter does not.

I've also made the Tukey's HSD the default analysis on my web page.

ff123


Ah, thanks. Fixed that on the document.

I have already read parts of the comments in friedman.c to try to figure out the difference between all those modes, but because of the considerable number of statistical terms (and I don't know much about statistics), I couldn't understand everything.