Are my ears broken?
greynol
post Mar 14 2010, 22:23
Post #51

QUOTE (solive @ Mar 14 2010, 13:40) *
So what criteria do you use in defining the performance of a standard playback system for testing codecs?

That's a good question, one for which I don't have an answer. I believe people should conduct DBTs on their own equipment set up in the most typical fashion for them. While I think public listening tests are useful and can provide a general baseline for people, I put more emphasis on personal listening tests. In the case of public listening tests, I think there should be some type of control over playback hardware and environments, but I don't think it's a make or break situation. If people really want to know what codec and settings to use, they should perform their own personal tests rather than rely on results from public tests.

My initial point to which you objected was to address the audiophile myth that audiophile grade components are required to most easily distinguish lossless from lossy. The people who perpetuate this myth are usually the very same people who think they can tell night and day differences between lossy and lossless but have never conducted a well-controlled double blind test.
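
A side note on what a well-controlled test buys you: an ABX trial is a forced choice against a 50% guessing rate, so any score can be checked for statistical significance with a one-sided binomial test. A minimal sketch in Python (the 12-of-16 score is hypothetical):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value: the chance of scoring `correct` or better by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session: 12 correct answers out of 16 trials.
print(f"p = {abx_p_value(12, 16):.4f}")  # p = 0.0384 -- guessing is an unlikely explanation
```

A common convention is to require p < 0.05 before treating a result as evidence of an audible difference.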


solive
post Mar 15 2010, 00:02
Post #52

QUOTE (greynol @ Mar 14 2010, 14:23) *
That's a good question, one for which I don't have an answer. [...] My initial point to which you objected was to address the audiophile myth that audiophile grade components are required to most easily distinguish lossless from lossy. [...]


Thanks. I don't disagree with you regarding the myth of needing audiophile grade components to hear codec artifacts. The problem I had with the term "audiophile component" is that, as defined, it's too vague and meaningless in terms of implied performance. My experience is that the term audiophile is often misused and only implies "high price tag," which is not necessarily a good indicator of a component's sound quality and reliability.

I think we agree that frequency response aberrations, distortion/noise, room reflections, and listener training/hearing/aptitude are all nuisance variables that can influence the results of codec listening tests. High-priced audio components don't necessarily help control these variables, and may in fact contribute to the problem.

If these nuisance variables are not well-controlled in public tests, then you may expect to get different results from different sites/people, and it may be difficult to reach consensus about the quality of the codec.


Cheers
Sean
Audio Musings

Notat
post Mar 16 2010, 01:04
Post #53

QUOTE (solive @ Mar 14 2010, 17:02) *
If these nuisance variables are not well-controlled in public tests, then you may expect to get different results from different sites/people, and it may be difficult to reach consensus about the quality of the codec.

I hope by consensus you mean that one skilled listener can overrule your more mediocre participants. At higher bit rates you should expect results to diverge as people reach the limits of their hearing abilities at different points.
solive
post Mar 16 2010, 05:37
Post #54

QUOTE (Notat @ Mar 15 2010, 17:04) *
I hope by consensus you mean that one skilled listener can overrule your more mediocre participants. At higher bit rates you should expect results to diverge as people reach the limits of their hearing abilities at different points.


By consensus, I meant that you should get better agreement in test results (based on the number of positive ABX responses, or similar MOS or MUSHRA ratings) from different sites/listeners when all variables are well controlled.

If the nuisance variables are not well controlled, you would expect to get less agreement and more noise in the test results.

Cheers
Sean
Audio Musings
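
One way to make "agreement across sites" concrete for graded ratings is to treat the site as a factor and ask whether between-site variation exceeds ordinary listener-to-listener noise. A rough sketch with made-up MUSHRA scores (all numbers hypothetical):

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical MUSHRA scores for one codec condition from three test sites.
site_a = np.array([82, 78, 85, 80, 79])
site_b = np.array([81, 84, 77, 83, 80])
site_c = np.array([60, 65, 58, 62, 64])  # e.g. a site with a poor playback chain

# One-way ANOVA: does "site" explain more variance than listener noise does?
f_stat, p = f_oneway(site_a, site_b, site_c)
print(f"F = {f_stat:.1f}, p = {p:.2g}")  # a small p flags a site effect to investigate
```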

Notat
post Mar 16 2010, 15:45
Post #55

QUOTE (solive @ Mar 15 2010, 22:37) *
By consensus, I meant that you should get better agreement in test results (based on the number of positive ABX responses, or similar MOS or MUSHRA ratings) from different sites/listeners when all variables are well controlled.

If the nuisance variables are not well controlled, you would expect to get less agreement and more noise in the test results.

There are other things that can cause noise in test results. Don't you also expect to get more noise in the results as you approach transparency? For a codec that is nearly transparent, you'll have some listeners who can reliably distinguish the difference and some who cannot. If you've removed every nuisance variable you can think of, you can assume that noise is not due to those variables, but you don't definitively know what variables you have not eliminated. When someone fails to distinguish a difference, you don't know if that is due to a limitation of the test or a limitation of the listener. When someone is able to reliably distinguish, in a properly calibrated and executed ABX, you know that it is due to the codec.

In your analysis of results, you need to ignore the noise of those who could not distinguish and concentrate on those who could. This is not a consensus process as you've described it. Building consensus typically involves persuasion. We know that listeners are easily persuaded. We don't want to persuade anyone. We want to know what they're hearing.

If you're testing for transparency, you might actually want to leave some of the "nuisance variables" in, as they make or break distinguishability for some listeners. Some examples: you will most likely degrade listening performance if you require subjects to listen on headphones, or prohibit them from adjusting their listening position to their liking. Audiophiles and recording engineers will tell you that they can't do their best listening on an unfamiliar system. There is some evidence indicating that acoustic reflections in the listening space enhance the audibility of timebase errors.

So, in answer to a question you asked earlier, I'm going with the gist of greynol's answer: you want to do codec tests with a variety of listeners in a variety of realistic listening scenarios. Give your listeners latitude to attack the challenge creatively. Results may look noisier, and you'll have more work to do in back-end analysis, but you're casting a wider net, and that's what you need to do at this point to advance the art.
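
One way to run the back-end analysis described above is to score each listener's ABX run separately and keep only those whose results are unlikely under guessing, rather than averaging everyone together. A sketch with hypothetical per-listener scores (reusing the binomial test from the earlier sketch):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value: the chance of scoring `correct` or better by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical per-listener ABX results: name -> (correct, trials).
results = {"anna": (15, 16), "ben": (9, 16), "chris": (13, 16)}

# Keep listeners whose scores are unlikely under guessing (alpha = 0.05).
discriminators = {name: score for name, score in results.items()
                  if abx_p_value(*score) < 0.05}
print(discriminators)  # {'anna': (15, 16), 'chris': (13, 16)}
```

With many listeners you would also want a multiple-comparisons correction (e.g. Bonferroni), since a few lucky guessers are expected by chance alone.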
greynol
post Mar 16 2010, 18:31
Post #56

My point was that for any given individual, personal tests should take precedence over public tests and that those personal tests be conducted with the equipment and levels (volume, eq) that the listener typically uses. I do agree with all that you've said, though.


solive
post Mar 17 2010, 07:17
Post #57

QUOTE
There are other things that can cause noise in test results. Don't you also expect to get more noise in the results as you approach transparency?...

Yes, agreed.
QUOTE
This is not a consensus process as you've described it. Building consensus typically involves persuasion. We know that listeners are easily persuaded. We don't want to persuade anyone. We want to know what they're hearing.


I'm not talking about persuading listeners but rather convincing ourselves (i.e. scientists) that the results from the CODEC tests are valid and can be explained by differences in the CODECS and not some uncontrolled variable.

QUOTE
If you're testing for transparency, you might actually want to leave some of the "nuisance variables" in, as they make or break distinguishability for some listeners. Some examples: you will most likely degrade listening performance if you require subjects to listen on headphones, or prohibit them from adjusting their listening position to their liking. Audiophiles and recording engineers will tell you that they can't do their best listening on an unfamiliar system. There is some evidence indicating that acoustic reflections in the listening space enhance the audibility of timebase errors.


I agree with you 100% that we want to know what the listeners are hearing perceptually, but let's make sure that the signals delivered to their ears are physically well-defined and controlled so we can a) learn something about the psychoacoustics of the codecs and b) ensure the experiment is repeatable.

QUOTE
So, in answer to a question you asked earlier, I'm going with the gist of greynol's answer: you want to do codec tests with a variety of listeners in a variety of realistic listening scenarios. Give your listeners latitude to attack the challenge creatively. Results may look noisier, and you'll have more work to do in back-end analysis, but you're casting a wider net, and that's what you need to do at this point to advance the art.


Again, I have no problem with this (note: I do similar experiments to study psychoacoustic interactions between different loudspeakers, room acoustics, and trained vs. naive listeners) as long as this is part of the experimental design and analysis. Otherwise you are missing a great opportunity to better understand how these variables (trained vs. untrained listeners, loudspeakers versus headphones, room acoustics, different program, etc.) influence the perception of CODECS. This is my main point.

If you are already doing this in the public tests, then please forgive me for stating the obvious.

Cheers
Sean
Audio Musings

TapeHissOrchestra
post Mar 17 2010, 08:25
Post #58

So I just did the audiocheck.net frequency test that was supplied earlier in this thread, and I can hear the sweep right from the beginning, when it's at 22 kHz. Also, as it goes down, it seems higher and louder to me and actually hurts my ears. I can see why it would hurt my ears ascending, because as the frequency moves more into my audible range it would get louder, I think? How is this possible, though? I'm aware that nobody can hear up to 22 kHz, and especially not me, considering I did a test the other day with a pure sine wave and could only hear up to 16.5 kHz. What am I doing wrong? Sorry if I should've started another thread; this is my first time posting on this site, btw.
TapeHissOrchestra
post Mar 17 2010, 08:27
Post #59



Okay, and now I can only hear up to 19 kHz. This is very confusing!

Edit: Never mind, I just replicated what I heard the first time at 22 kHz with the aliasing test. Looks like I need a new soundcard.
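
For anyone hitting the same symptom: a soundcard or resampler that aliases folds content above half its sample rate back down into the audible band, so a sweep near 22 kHz can produce a loud alias moving in the opposite direction. A rough numpy sketch of the folding arithmetic (the 32 kHz playback rate is purely illustrative, not the poster's actual hardware):

```python
import numpy as np

fs = 32000    # illustrative playback rate of a misbehaving soundcard
f_in = 22000  # frequency the test file actually contains

# Sampling cannot represent anything above fs/2; the tone folds to a mirror image.
f_alias = abs(f_in - fs * round(f_in / fs))
print(f"{f_in} Hz played at {fs} Hz sounds like {f_alias} Hz")  # -> 10000 Hz

# One second of the 22 kHz tone sampled at fs is, sample for sample, a
# (phase-inverted) 10 kHz sine -- well inside the audible range.
t = np.arange(fs) / fs
tone = 0.5 * np.sin(2 * np.pi * f_in * t)
```

With these illustrative numbers, a sweep descending from 22 kHz toward 16 kHz would produce an alias rising from 10 kHz toward 16 kHz, matching the "higher and louder as it goes down" impression.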

Notat
post Mar 17 2010, 15:39
Post #60

QUOTE (solive @ Mar 17 2010, 00:17) *
I'm not talking about persuading listeners but rather convincing ourselves (i.e. scientists) that the results from the CODEC tests are valid and can be explained by differences in the CODECS and not some uncontrolled variable.

Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally.

You would definitely want to tighten things if you need to move from determining whether there's a transparency problem to determining why there's a problem.

And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.
pdq
post Mar 17 2010, 15:58
Post #61

QUOTE (Notat @ Mar 17 2010, 10:39) *
Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally.

"Affects both A and B" can be quite different from "affects both A and B equally". It may be that a particular setup emphasizes (or suppresses) one kind of artifact more than another, for example.
Notat
post Mar 17 2010, 16:46
Post #62

I would say that such transforms are a valid part of the test as long as it remains a realistic listening environment. It is not cheating to test the envelope of how people listen. Some people do crank the tone controls. Only spoilers listen to L-R.
greynol
post Mar 17 2010, 22:23
Post #63

QUOTE (Notat @ Mar 17 2010, 07:39) *
And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.

I think Sean might mean private tests conducted during development, in which case he makes a very excellent point.

EDIT: I don't mean to give the wrong impression here, his points are excellent regardless.

solive
post Mar 17 2010, 22:47
Post #64

QUOTE (Notat @ Mar 17 2010, 07:39) *
Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally. [...]


Of course I agree with you that, for any given ABX test setup, the nuisance variables are being held constant for both A and B.

Sorry if I wasn't clear: my concern about controlling nuisance variables arises from pooling the results of public codec tests (using either ABX, ABC/HR, or MUSHRA methods) that are being conducted at multiple sites using different playback setups and listeners of unknown quality (e.g. hearing, training, ability). When you start pooling the results from these different tests, unless the different playback setups are well-defined and accounted for in the design and statistical analysis, you risk increased systematic errors/biases and may come to erroneous conclusions based on invalid results.


Cheers
Sean
Audio Musings
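
To illustrate the pooling concern: before merging ABX counts from several sites, one can test whether the per-site detection rates are even consistent with a single underlying rate. A sketch using a chi-square test of homogeneity on hypothetical per-site tallies:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical ABX tallies per site: columns are [correct, incorrect].
sites = np.array([
    [40, 24],  # site 1
    [38, 26],  # site 2
    [12, 52],  # site 3 -- a very different playback chain or listener pool?
])

# Chi-square test of homogeneity: are the sites' detection rates plausibly equal?
chi2, p, dof, expected = chi2_contingency(sites)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
# A small p means the sites disagree systematically; pooling their raw counts
# would mask a site effect instead of averaging out random noise.
```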


solive
post Mar 17 2010, 22:59
Post #65

QUOTE (greynol @ Mar 17 2010, 14:23) *
I think Sean might mean private tests conducted during development, in which case he makes a very excellent point.


Yes, I guess that's what I mean -- a private test, like the kind used for product development and validation, and the sort of CODEC tests published by standards groups like the ITU, where tests are conducted at different sites throughout the world using a standard setup.

I'm quite ignorant about how Hydrogenaudio organizes public listening tests, what their purpose is, and how the results are used. Is this strictly for hobbyists, or do companies actually use the results to help tweak the performance of their CODECS?

Perhaps you've published a document on recommended practices for public CODEC tests somewhere on HA that answers all these questions?

Thanks!

Cheers
Sean


X-Fi6
post Mar 31 2010, 05:46
Post #66

QUOTE (kornchild2002 @ Dec 16 2009, 22:59) *
You really should conduct blind ABX tests though, as sighted tests are flawed by the placebo effect.
Though you're correct, it's not all that hard to identify what codec was used by listening to the artifacts, at least at lower bitrates.

pdq
post Mar 31 2010, 11:30
Post #67

QUOTE (X-Fi6 @ Mar 31 2010, 00:46) *
Though you're correct, it's not all that hard to identify what codec was used by listening to the artifacts, at least at lower bitrates.

This is not an issue in ABX tests, since the only thing being tested is whether the compressed file can be distinguished from the original.

In ABC/HR testing, on the other hand, there certainly can be a bias based on recognising the codec, but then ABC/HR is asking for people's preferences, which can extend to factors beyond simple audible quality.
