Pain RCTs commonly sacrifice validity for reliability

Randomised controlled trials (RCTs) are one of the most important tools we have for understanding what might work in healthcare. They have limitations, and to understand this kind of evidence is to grapple with them. The key thing all RCTs are aiming for is validity: does the study show what we think it shows? There is one flaw, one I have never seen recognised, that seems to sacrifice validity for reliability. Reliability in research just means that the study is repeatable and would likely get the same result again. Reliability without validity is like having a thing that works every time but is useless.

The typical experimental design in studies that explore how an intervention affects pain looks like this. Recruit participants with a common pain or pathology, say lower back pain, and randomise them into one of two groups. Group A, the experimental group, is given the intervention you want to test, e.g. ABT (a therapy I invented in another blog), and group B, the control group, is given a placebo, a sham, or no treatment.
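
If it helps to picture the allocation step, here is a tiny sketch of simple randomisation in Python. The participant IDs and group sizes are invented for illustration; real trials often use more elaborate schemes such as block or stratified randomisation.

```python
# Minimal sketch of simple random allocation to two groups (hypothetical IDs).
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 invented participants
random.shuffle(participants)                        # the R in RCT

group_a = participants[:10]   # experimental group, e.g. ABT
group_b = participants[10:]   # control group: placebo, sham, or no treatment

print("Group A:", group_a)
print("Group B:", group_b)
```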

 

You measure their pain.  Well, you don’t measure pain, because you can’t.  There is no objective measure, no blood markers, no clear neural correlate, so you ask: ‘How painful is your back?’.  In order to gather useful data that you can analyse, this has to be a number, hence the visual analogue scale (VAS).  This is a 100mm horizontal line on a piece of paper, the leftmost point meaning ‘no pain’ and the rightmost ‘the worst imaginable pain’.  The participant marks the line, the researcher measures the mark’s distance from the left end, and voilà, a pain score out of 100.  Reminder: this is a measure of self-reported levels of pain, not pain itself; it’s subjective, it’s a behaviour. This is OK, it’s probably the best we have.

 

The method goes like this. Measure VAS scores in both groups before the intervention, giving you baseline scores, and once again after the intervention. This gets you the before and after scores for both groups, giving you everything you need to calculate the means, confidence intervals, significance and effect sizes that determine whether the intervention performs better than a placebo/nothing/usual care, and by how much.
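
To make that concrete, here is a minimal sketch of that conventional analysis in Python (using NumPy and SciPy, with invented VAS numbers rather than real trial data): change scores in each group, the between-group difference with a 95% confidence interval, an independent t-test and Cohen’s d.

```python
# Sketch of the conventional change-score analysis on invented VAS data (0-100).
import numpy as np
from scipy import stats

# Hypothetical baseline and post-intervention VAS scores for the two groups.
abt_pre   = np.array([62, 55, 70, 48, 66, 59, 73, 51])
abt_post  = np.array([41, 38, 52, 30, 45, 40, 55, 33])
sham_pre  = np.array([60, 58, 67, 50, 64, 57, 71, 49])
sham_post = np.array([50, 47, 58, 39, 53, 46, 60, 40])

# Pre-to-post change in each group (negative = less reported pain).
abt_change  = abt_post - abt_pre
sham_change = sham_post - sham_pre

# Independent t-test: did reported pain change more with ABT than with sham?
t, p = stats.ttest_ind(abt_change, sham_change)

# Between-group difference in mean change, 95% CI and Cohen's d.
n1, n2 = len(abt_change), len(sham_change)
diff = abt_change.mean() - sham_change.mean()
pooled_sd = np.sqrt(((n1 - 1) * abt_change.var(ddof=1) +
                     (n2 - 1) * sham_change.var(ddof=1)) / (n1 + n2 - 2))
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
tcrit = stats.t.ppf(0.975, n1 + n2 - 2)
ci = (diff - tcrit * se, diff + tcrit * se)
d = diff / pooled_sd

print(f"difference in mean change = {diff:.1f} (95% CI {ci[0]:.1f} to {ci[1]:.1f})")
print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")
```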

 

There is an issue here.  You only need to gather the post-intervention scores.  Each group will produce a distribution of scores, and the usual statistical test for a difference between two distributions, which the t-test does very well, gives you your answer.  We only care about the DIFFERENCES between the two groups.  We wouldn’t rely on within-group changes anyway; if there were no control/comparator group we would call it a very poor study.
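
For contrast, here is the post-scores-only version the paragraph above argues for, again with invented numbers: a single independent t-test on the two groups’ post-intervention scores gives you the between-group comparison directly.

```python
# Sketch of the post-intervention-only analysis: one between-group comparison.
import numpy as np
from scipy import stats

# Hypothetical post-intervention VAS scores; no baselines collected.
abt_post  = np.array([41, 38, 52, 30, 45, 40, 55, 33])
sham_post = np.array([50, 47, 58, 39, 53, 46, 60, 40])

# Independent t-test: is reported pain lower after ABT than after sham?
t, p = stats.ttest_ind(abt_post, sham_post)
diff = abt_post.mean() - sham_post.mean()
print(f"difference in mean post-intervention VAS = {diff:.1f}, t = {t:.2f}, p = {p:.3f}")
```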

 

But… the baseline scores might be different! 

…you may shout.  Indeed, but the group allocation is, and must always be, randomised (this puts the R in RCT).  So long as participants are randomly allocated, any differences between the groups at baseline are by definition due to sampling error alone. The statistics we use are designed to account for sampling error, so if a statistically significant baseline difference is found then it is merely a type I error.
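
A toy simulation illustrates this. Under the simplifying assumption that every participant is drawn from the same population, randomise them into two groups over and over and test the baseline difference each time: ‘significant’ baseline differences turn up at roughly the nominal 5% rate, which is exactly what a type I error means.

```python
# Toy simulation: random allocation from one population, repeated many times.
# Any 'significant' baseline difference can only be a type I error, and it
# appears at roughly the nominal 5% rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_per_group = 10_000, 30
false_positives = 0

for _ in range(n_trials):
    # Everyone comes from the same population (same true mean VAS of 60),
    # then is split at random into two equal groups.
    baseline = rng.normal(loc=60, scale=15, size=2 * n_per_group)
    rng.shuffle(baseline)
    group_a, group_b = baseline[:n_per_group], baseline[n_per_group:]
    _, p = stats.ttest_ind(group_a, group_b)
    false_positives += p < 0.05

print(f"'significant' baseline differences: {false_positives / n_trials:.1%}")  # ~5%
```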

 

Why not just take baseline scores anyway? 

Sounds reasonable; after all, you’ve got your sample and more data must be better.  The answer is ‘no’, because there is no gain and there are risks, risks that psychologists have been considering for decades.


Likely large but unclear problems

 

The famous Hawthorne Effect has been commonplace in psychology books since the 1950s.  The short version of the story is that a series of studies was carried out in the 1920s at the Hawthorne Works, an electrical factory near Chicago, to improve productivity.  This was done by changing the brightness of the ambient lighting in the factory.  They found the changes brought about increased productivity again and again… then, when the studies ended, the improvements vanished.  While the studies themselves had some serious methodological problems and the data have proved somewhat elusive since, an idea caught on.  Given its ubiquity across 50+ years of psychology textbooks there is considerable variability in its definition, and it may be as simple as this: novelty decreases boredom and increases production (Chiesa & Hobbs, 2008).  Perhaps, then, the Hawthorne Effect is really just academic folklore, yet a potentially more useful concept emerged from it: the idea that simply knowing you are part of a study will affect how you behave, without you even being conscious of it.  For example, well-being and post-op knee pain (VAS) were better in participants who knew they were in a longitudinal study (De Amici et al 2000).


Demand Characteristics

When I first trained in massage we would spend a lot of time examining each other’s postures, something I’m pleased to say I’ve long since left by the wayside. What I did learn is that when you’re having your posture assessed, standing normally and naturally becomes elusive. We become deliberate and unusually upright.

Demand characteristics are any cues in a study that might give the game away as to what the study is really about and cause participants to behave differently: to please the researchers, to avoid looking silly, who knows.  Given how seriously this threatens the validity of any given study, people do think about it.

Critical evaluation of demand characteristics appears to come predominantly from a number of helpful systematic reviews led by Jim McCambridge, Chair in Addictive Behaviours and Public Health at the University of York.  These reviews showed that in non-laboratory settings there are insufficient studies to justify use of the term demand characteristics (McCambridge et al 2012).  Answering their own call for further review of laboratory research, they determined that the size of the effect is unclear and the construct not yet properly defined, concluding: “Awareness of being observed or having behavior assessed engenders beliefs about researcher expectations. Conformity and social desirability considerations then lead behavior to change in line with these expectations”.  In an opinion piece, McCambridge et al (2014b) warned that “Researchers may too readily overlook the extent to which research studies are unusual contexts, and that people may react in unexpected ways to what we invite them to do, introducing a range of biases”, and also suggested the need for “further study of whether, when, how, how much, and for whom research participation may impact on behavior or other study outcomes” (McCambridge et al 2014a).

 

What we may surmise is that there is a lack of clarity over how research studies affect the behaviours of those taking part, but that the effects are serious nonetheless.

 

Randomised controlled trials have great strengths and, at the same time, limitations that should be understood.  A crucial one is ecological validity: the extent to which any given study can represent the real world, and thus the extent to which its conclusions can reasonably be generalised to a population. Put simply, a research setting is a highly unusual place to be, so serious consideration must be given to how this affects the people in it.

 

Back to the baseline scores

 

Taking baseline scores gives the game away.  It provides cues to the nature of the study and risks its validity, probably in the direction of seeing an improvement.  Demand characteristics, or something strongly resembling them, do not affect objective measures (there may be exceptions), just the subjective ones, like self-reported levels of pain.

“Good afternoon, thank you for taking part in this study.  How painful is your back right now?”

Baseline scores sacrifice validity for reliability.  They introduce risk, and we don’t even need them because we only care about the between-group differences. Now for the good news: you can overcome this problem by not taking baseline scores.

 

Thanks for getting this far.

Tris

Thanks to my mate Dr Graham Smith, psychologist at the University of Northampton, for chatting this over with lots of beer.

 

Chiesa, M. and Hobbs, S. (2008) Making sense of social research: how useful is the Hawthorne Effect? European Journal of Social Psychology, 38: 67-74

De Amici, D., Klersy, C., Ramajoli, F., Brustia, L., and Politi, P. (2000) Impact of the Hawthorne Effect in a Longitudinal Clinical Study: The Case of Anesthesia. Controlled Clinical Trials, 21(2), pp 103-114

McCambridge, J., de Bruin, M., and Witton, J. (2012) The effects of demand characteristics on research participant behaviours in non-laboratory settings: a systematic review. PLoS One, 7(6): e39116

McCambridge, J., Kypri, K., and Elbourne, D. R. (2014b) Research participation effects: a skeleton in the methodological cupboard. Journal of Clinical Epidemiology, 67(8), pp 845-849

McCambridge, J., Witton, J., and Elbourne, D. R. (2014a) Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), pp 267-277
