double blind testing...
Posted by: ken c on 13 November 2003
well, i just thought that the issue of dbt is interesting enough in its own right not to be polluted by mains issues.
to kick this off, for my own selfish reasons really:
(a) what IS double blind testing?
(b) has anyone got any documented evidence of where it has been used successfully and to draw important conclusions in the area of Hifi and music? what was the objective and what were the conclusions? please point me to a reference...
(c) has anyone got any documented evidence of where it has been used successfully and to draw important conclusions in other areas? again, please provide references.
(d) just to nail down things some more. if i wanted to use DBT to see if there is any difference between a live band and a recording, how would such a test be conducted? would i have to make sure that the live band played behind a screen? what other DBT type controls would i put in place?
(e) does anyone know if manufacturers of hifi use this method to confirm actual differences between major designs? after all they have a vested interest in knowing that differences are really there and better still, that these differences are "improvements"...
well, i guess that will do for starters...
enjoy
ken
Posted on: 13 November 2003 by Mekon
(a) A test where the experimenter and participant are naive to the manipulation that is taking place.
(b) Markus Sauer has a contact whose PhD used double blind testing. IIRC, he gave me a link to a Stereophile article he wrote that mentioned it, which, if you dig around, might turn up. In the meantime, this is interesting.
(c) I use a double blind procedure when conducting my health interventions. Sadly I can't give you a reference, as it is currently under review. Fingers crossed, I should hear back this month. Anyway, any experimental design in psychology should use it. Off the top of my head, the original Festinger and Carlsmith (Festinger, L. & Carlsmith, J. (1959) Cognitive Consequences of Forced Compliance. Journal of Abnormal and Social Psychology, Vol. 58, pp. 203–210) work on cognitive dissonance is a nice lab-based study where both the experimenter and participants were blind to which condition they were in.
(d) Just ensure that ceteris paribus assumptions are adhered to, i.e. absolutely everything that could conceivably affect the results should be equal between conditions (e.g. temperature, SPL, etc.), other than the manipulation you are interested in.
(e) Dunno, but from what I've heard from Richard Dane, Naim do on cables, at least.
Posted on: 13 November 2003 by ken c
mekon, many thanks for the response and the references...
enjoy
ken
Posted on: 13 November 2003 by Mekon
You need to be more imaginative, Ross. For instance, evaluation periods aren't fixed to how long a participant can be expected to stay in a lab, are they? With some predictive studies and a bit of pilot testing, it shouldn't be hard to come up with some measures that are sensitive to differences that predict long-term satisfaction.
Posted on: 13 November 2003 by Mekon
There isn't even a research question on the table, so it is a bit premature coming up with a methodology.
It looks like you are thinking in terms of a repeated measures design, which could present some problems, but none that are insurmountable. However, my initial inclination given the sort of question I'd anticipate (as well as for reasons of validity) would be for a between groups design, with measures taken over a number of timepoints, but obviously it would depend on the research question.
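To make the between groups idea a bit more concrete, here is a minimal sketch of random allocation, assuming a hypothetical pool of ten listeners and two hypothetical conditions labelled cable_A and cable_B. None of the names or the seed come from an actual study; the only point is that a script, not a person, decides who hears what.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into two groups of (near) equal size."""
    rng = random.Random(seed)      # own generator, so a seed makes the split reproducible
    shuffled = list(participants)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Group labels are hypothetical placeholders for the two conditions.
    return {"cable_A": shuffled[:half], "cable_B": shuffled[half:]}

listeners = [f"listener_{i}" for i in range(1, 11)]
groups = assign_groups(listeners, seed=2003)
for name, members in groups.items():
    print(name, members)
```

Each group would then live with its own condition and fill in the same measures at each timepoint, so nobody ever has to compare A with B directly.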
Posted on: 13 November 2003 by matthewr
Ross,
I would make the following points.
1. Most advocates of what we might call "controversial" effects do not say these are long term, subtle, difficult to detect effects. Usually it's quite the opposite and they are very keen to tell you how amazingly large and dramatically obvious such differences are. There are enough people with this view that you can't really get away, IMHO, with the idea that ABX has nothing to offer, since the effects are claimed to be so obvious. It's a bit like people who advocate herbal medicines claiming there is no point in trialling them, as it's obvious that lots of people have had their depression cured with St John's Wort.
2. Double blind doesn't mandate swapping of component A and B in the manner you imply. Rather it's a procedural thing for reasons of experimental rigor and elimination of bias and error. So clinical trials are double blind but you only give half the group the drug (i.e. it's not like you do a month on potential cure and then a month on placebo). Such experimental designs would be ideal for investigating hi-fi issues if you, say, designed a "Musical Enjoyment/Appreciation" questionnaire to determine the outcome. Such things are obviously prohibitively expensive for an area as relatively obscure and unimportant as high end hi-fi.
3. If the effects you describe are subtle and long term it raises questions about VFM and how most people choose them in a 2 hour session at the dealers or perhaps a home dem over the weekend. Surely the world would be full of people rejecting cable upgrades if ABX testing were not at least partially effective?
Matthew
Posted on: 13 November 2003 by Mekon
You can test whatever measures and model are arrived at in a regression analysis, given suitable outcome measures. I suggest there are ways of operationalising 'satisfaction' that can be measured objectively, such as non-attendance of dealer's open days, or a music to hifi spending ratio.
Not that I have any great problem with decent self-report measures, given their proven potential for accounting for behaviour.
Also, just because ANOVA was 'designed' to investigate agriculture, is its application limited to field trials? The history of a research tool doesn't have to have an impact upon the ecological validity of a design it is used in.
Posted on: 13 November 2003 by ken c
phew..., all very interesting.
let's do a thought experiment that involves this question i asked earlier.
(d) just to nail down things some more. if i wanted to use DBT to see if there is any difference between a live band and a recording, how would such a test be conducted? would i have to make sure that the live band played behind a screen? what other DBT type controls would i put in place?
another question -- has any of you folks ever been involved in a DBT? if so, let us know how it went and what you felt about the whole exercise.
also what would be wrong with a blind (i.e. SBT -- single blind testing) in the context of hifi?
enjoy
ken
Posted on: 13 November 2003 by Mekon
Single blind is open to experimenter effects (e.g. unintentionally being more meticulous in setting up the system you think is better) and demand characteristics (e.g. dealer dem foot tapping).
Posted on: 13 November 2003 by syd
quote:
Originally posted by AndrewThomas:
quote:
Originally posted by ken c:
another question -- has any of you folks ever been involved in a DBT? if so, let us know how it went and what you felt about the whole exercise.
Let me answer that in a different way: let's suppose that you have faith in the idea that there is a difference between cables/stands/spurs/etc and that you have a social circle of like minded people who "understand". Now, imagine being tested and presented with a certificate that says "After proper testing XXX cannot tell the difference between cables". Well, you can see the problem, right? It would be humiliating to find out that you can't tell the difference, after believing you can and perhaps preaching the Gospel.. My God what about the ridicule of my friends.
For instance, I posted a comment on this forum that I couldn't tell the difference between a $20 Ikea table and $900 of Hutter to be told by a _naim dealer_ that I was deaf.. Now, I don't take this forum seriously, so the "abuse" didn't bother me in the least.
Other people have more to lose.
My point is, some double-blind tests can have a dramatic impact on your beliefs (or your belief in your ability) which you might not want to discover. Other similar tests have no associated psychological cost and people would be more willing to participate..
Andrew
You didn't answer the question in a different way. You didn't answer it at all. Have you taken part in DBTs or not?
BTW I haven't taken part but I would like to.
Yours in Music
Syd
Posted on: 13 November 2003 by TomK
Double blind tests are obviously a normal part of the scientific world and are probably quite valid when we're talking about entry level hifi - yes I can consistently hear more bass from this source etc. However, as many have already said, higher end audio requires more subtle criteria. My best test when I've made a major change is "what time do I get to bed when I have a listening session". The answer for me, after all the changes I've made this year, is "hours later than before". When I play a CD now I have to be prepared to get to bed late. Some stuff I can quantify, e.g. I don't remember hearing that third rhythm guitar etc, but it's mostly a cumulative effect which keeps me stuck to the couch, mesmerised by the new sounds I'm hearing. New layers in Pet Sounds, instruments I'd not picked up before from the Beatles, Bruce Springsteen's voice now sounding completely unstrained etc. Hope this makes sense as the hifi has kept me up late!
Posted on: 13 November 2003 by syd
quote:
Originally posted by AndrewThomas:
quote:
Originally posted by syd:
Have you taken part in DBTs or not?
Yes. It's ironic that one of the tests was on "long term" vs. "short term" memory which Andrew Weekes was building his new theory of audio cognition on. Unfortunately, I can't say too much about the tests I've been involved in because they were paid for by my employer (it's why I didn't answer the question directly in the first place).
That's a pity as you are probably the one person who goes on about them the most on this forum but can't or won't tell us the results. How much can you tell us then? Was music in any shape or form included in it, anything to do with sound, or was it just pictures or objects? It would be fascinating to know about the methodology, the numbers of people involved etc.
Yours in Music
Syd
Posted on: 13 November 2003 by Steve Toy
quote:
1. Most advocates of what we might call "controvertial" effects do not say these are long term subtle difficult to detect effects. Usually its quite the opposite and they are very keen to tell you how amazingly large and dramatically obvious such differences are.
The hyperbole/exaggerated claims used to describe perceived ABX differences is often more pronounced during any kind of blind testing than with hands-on testing for the simple reason that with a blind test, the listener is going to feel more than slightly apprehensive about not being able to reach the "right" conclusion without the aid of their eyes. So when distinguishing between A and B turns out not to be that difficult, the emotional satisfaction derived from this successful distinction tends to predispose the listener to exaggeration.
Regards,
Steve.
Posted on: 13 November 2003 by Steve Toy
The answer to the problem I outlined above regarding exaggerated claims by blind listeners would be to let them have a hands-on go first before being subject to the ordeals of the blind or double blind test. This would build confidence and make them approach the hands-on test more honestly and enable them to avoid the use of exaggeration later in the blind testing.
Regards,
Steve.
Posted on: 13 November 2003 by Steve Toy
quote:
How long does it take you to form an opinion on a cable?
With the Omiga i/c I received last week it took me just a few minutes to pick out that it did leading edges better than the other one previously sent to me, as well as my Anthem. It took an hour or so to notice more detail in the higher frequencies, and several days before I noticed better vocal presence and separation.
I believe that spot-the-difference tests using your ears are actually easier than with your eyes.
How many spot-the-difference puzzles using two near-identical photos have you done where they tell you that there are, say, eight differences and you have to circle them with a pen?
Now let's make it more difficult with the visual puzzle:
1) Remove the circling with the pen - you have to give a verbal description instead.
2) Don't state how many differences there are in the two pics.
3) Impose a time limit of, say, ten minutes to complete the task.
The auditive task will be easier to complete but harder to verify.
The visual task will be extremely difficult but the results will be easier to check.
Blind auditive testing makes the checking easier but the task itself more difficult and less accurate regarding actual findings from the part of the listener.
The issue of absolutism versus subjectivism will always prevail regarding listening whether the tests are hands-on or blind.
Regards,
Steve.
[This message was edited by Steven Toy on FRIDAY 14 November 2003 at 06:30.]
Posted on: 13 November 2003 by Steve Toy
quote:
In either test, can I choose the music or is there some piece that you would prefer to make it easier for you?
In these tests I'm not interested in your subjective opinion of which is best, but whether you can tell me which is in use.
After lengthy hands-on tests using different pieces of music I could then go on with confidence to undertake a blind test provided I chose the music.
Regards,
Steve.
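For a "which one is in use" test like the one quoted above, the scoring comes down to simple arithmetic: how likely is a given number of correct identifications if the listener is only guessing? A minimal sketch, assuming each trial is an independent 50/50 guess (the function name and the 12-out-of-16 figures are made up for illustration, not results from any actual test):

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by pure guessing.

    Assumes every trial is an independent coin flip (p = 0.5), i.e. the
    listener genuinely cannot tell which item is in use.
    """
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical session: 12 correct identifications in 16 trials.
print(f"{abx_p_value(12, 16):.3f}")  # roughly 0.038
```

So 12 out of 16 would turn up by luck a little under 4% of the time, whereas 9 or 10 out of 16 is well within what guessing produces. Choosing your own music, as suggested above, doesn't change the sums.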
Posted on: 13 November 2003 by Laurie Saunders
With something that is subjective, I suspect that there is a natural hurdle to overcome, viz we tend to dislike the unfamiliar, whether "better" or "worse". Trying to achieve absolute objectivity in areas that intrinsically have a significant subjective element is, I believe, largely futile. We can agree or disagree on particular characteristics, but not so easily on which is "better". I can't stand milk or sugar in coffee. Others have the reverse opinion, though the presence or absence of milk and sugar is fairly easy to identify.
When a colleague who has significant appropriate experience recommends a particular bottle of wine, do you insist on "double blind testing"?......or simply try it yourself?
This is why a lot of the so called "double blind tests" carried out by Hi-Fi journals using "experts" are, IMHO, largely irrelevant for me. I make my own judgements and choices. I suggest you have the courage to do the same, and pay little heed to others' opinions
Laurie S
Posted on: 14 November 2003 by Laurie Saunders
Andrew, whether people listen to my opinions is indeed their choice. Many have listened to my SUGGESTIONS and then gone on to form their own opinions of their veracity. Many have actually reported gains as a result of reading my postings. Can you give an example of another individual whose quality of life you have helped move forward? I very much doubt it. It would appear that your net contribution seems to be limited to nothing more than attempting to display how clever (you think) you are.
Laurie S
Posted on: 14 November 2003 by Nigel Cavendish
I suspect many people on this forum would be afraid to take part in any tests where they did not know the provenance of the source because they might discover that they cannot distinguish seriously expensive kit from cheaper stuff.
And I would also say that any kit that takes months of audition to be distinguished from any other is probably not worthy of consideration particularly if it comes at a premium price.
cheers
Nigel
Posted on: 14 November 2003 by matthewr
Andrew has improved my quality of life by posting intelligently on the subject of science in the face of overwhelming and often aggressive opposition.
Obviously the actual increase in quality is relatively small but all my friends agree it's at least as big as the differences between 6mm and 10mm mains spurs.
Matthew
Posted on: 14 November 2003 by Laurie Saunders
Matthew...that is your OPINION....which I would grant you are perfectly entitled to. Your use of the word "intelligent" in this context is meant, I agree, in the narrowest possible, self-serving (i.e. useless) sense. Perhaps we ought to carry out double blind (in the metaphorical sense) testing on Andrew T's postings to see if their efficacy can be proven scientifically
For my part their value is not easy for me to discern. I would need strong persuasion by a seasoned expert, such as your good self, of any merit in them at all
laurie s
[This message was edited by Laurie Saunders on FRIDAY 14 November 2003 at 12:35.]
[This message was edited by Laurie Saunders on FRIDAY 14 November 2003 at 12:40.]
Posted on: 14 November 2003 by syd
quote:
Originally posted by AndrewThomas:
quote:
Originally posted by syd:
That's a pity as you are probably the one person who goes on about them the most on this forum but can't or won't tell us the results. How much can you tell us then. Was music in any shape or form included in it, anything to do with sound or was it just pictures or objects. It would be fascinating to know about the methodology, the numbers of people involved etc.
You'll find plenty of information on results, methodology, numbers etc, etc on this topic if you pursue the two references I gave above.
Andrew, are these two references available over the net or will I need to purchase them? Also, did you take part in them?
Yours in Music
Syd
Posted on: 14 November 2003 by Mekon
Syd, I presume you are specifically interested in experiences of DBTs related to Hi-Fi, right? If you are interested in what it's like taking part in DBTs generally, I have participated in a whole load of them. PM me if you are interested.
Posted on: 14 November 2003 by matthewr
Laurie,
Surely if nothing else Andrew's sceptical stance keeps you on your toes, encourages you to further critically appraise your investigations and, at least potentially, improves the quality of your advice by increasing the likelihood that it is correct.
Also if it turns out that your advice is not correct, Andrew's scepticism may well have saved some people a few hundred pounds and a whole lot of hassle by discouraging them from installing multiple high spec spurs.
All of which is the point of peer review and a dose of healthy scepticism.
Matthew
Posted on: 14 November 2003 by Laurie Saunders
Matthew, with all due respect the points you raise could not be more wrong......how do Andrew's comments keep me "on my toes"?
.........and as far as "encouraging me to further appraise my investigations".....
Matthew.....don't pretend to be so naive.
The only thing that Andrew's postings encourage me to do is to write the sort of responses to them that you see here
Andrew's stance is utterly indefensible IMHO
His postings reek of the sort of inward looking intellectual self-indulgence that I encounter all too frequently, and find increasingly tiresome as I get older
As I have said again and again, and you well know this Matthew.... others are free to try or not, and draw their own conclusions and then proceed accordingly, as they choose. I simply report my own perception of improvements after making some modifications. I do have some (sketchy) theories to try to explain the effects I perceive, but these are really not of overwhelming importance for me
Laurie S
Posted on: 14 November 2003 by ken c
quote:
Originally posted by Nigel Cavendish:
I suspect many people on this forum would be afraid to take part in any tests where they did not know the provenance of the source because they might discover that they cannot distinguish seriously expensive kit from cheaper stuff.
And I would also say that any kit that takes months of audition to be distinguished from any other is probably not worthy of consideration particularly if it comes at a premium price.
cheers
Nigel
nigel, i would be very interested (definitely not AFRAID) to take part in a DBT involving different equipment, provided the test involved actual listening (what else, i hear you ask?). the higher the difference in price, the higher my motivation. however, a proper DBT appears to be very time consuming to do properly -- and seems to require at least 2 people to administer.
is there a "poor man's version" of DBT that we can all try? using say SWMBO as administrator? just to bring it to the domain of a practical everyday technique?? right now, the methodology seems rather complicated... yes?
my gut feel is that DBT will probably not produce results that are very different from normal tests that we all currently use, with all the so called statistical bias, noise, etc.
i agree that if an expensive piece of kit takes long to establish its superiority, then some explanations are called for. for example, listener is deaf, or it's too cold, or listener in a bad mood, or equipment not set up properly, or room acoustics not right, or mains supply of poor quality, or, dare i suggest it, lack of multiple spurs...!!!
enjoy
ken
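One possible shape for a "poor man's version", sketched under some loud assumptions: a helper swaps the kit out of sight following a schedule produced by a script, then keeps out of the room while the listener writes down "A" or "B" for each trial. That only approximates a double blind, since the helper knows the schedule and has to avoid giving anything away. The function names, the seed, and the 16-trial length are all hypothetical.

```python
import random

def make_schedule(n_trials, seed=None):
    """Random sequence of 'A'/'B' deciding which item is connected on each trial."""
    rng = random.Random(seed)  # a script picks the order, not the helper
    return [rng.choice("AB") for _ in range(n_trials)]

def score(schedule, answers):
    """Count the trials where the listener's written answer matches the schedule."""
    return sum(actual == guess for actual, guess in zip(schedule, answers))

schedule = make_schedule(16, seed=42)  # hypothetical: printed for the helper only

# Made-up answer sheet for illustration: a listener who writes "A" every time.
answers = ["A"] * 16
print(score(schedule, answers), "correct out of", len(schedule))
```

The number right can then be compared with what guessing alone would give: with 16 trials, pure guessing averages 8.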