Philosophical, scientific and statistical background to evidence based medicine
Mr James W Fairley BSc MBBS FRCS MS
Consultant ENT Surgeon
Last updated 16 June 2009 (illustration added – The Double Blind Surgeon)
© 1990 – 2019 JW Fairley
- Popperian logic
- Probability & statistics
- Ignorance & truth
- Randomised Controlled Trials
- The null hypothesis
- Why RCTs?
- The double blind surgeon
- Why Statistics? RCT of parachutes
- Conflicts with practice
Thesis submitted for the degree of Master of Surgery, University of London, 1993.
Popperian logic and Evidence Based Medicine
The same philosophy that lies behind advances in scientific knowledge in general forms the basis of evidence based medicine (EBM).
Karl Popper described the principle of falsifiability in his book “The logic of scientific discovery”, published in English in 1959 as a translation of the original German edition of 1934. The principle starts from the observation that, if you want to make scientific knowledge, the first thing you need is an idea. An idea of what might be true is a theory. The statement
“I believe that x is true”
is not, as it stands, a scientific theory, because it cannot be tested by experiment. A scientist must state a theory in such a way that it can generate a hypothesis. A hypothesis is a statement that can be proved false by an experiment. Falsifiable hypotheses take the form of an “if – then” statement –
“if x is true … then y should happen”.
You (or someone else – or preferably several independent people) then carry out experiments, to see whether y does or does not happen.
If y does happen, the theory is supported – for the moment. It is not proven true, but it has not been falsified. If y does not happen, the theory has been falsified, we know it is wrong, and the theory must be changed.
- A theory that cannot generate falsifiable hypotheses is not scientific, no matter how strong your belief in it.
- Contrary to what most people think, scientists do not spend their time proving their theories true; they are trying to prove them false.
- A theory that has been tested by lots of experiments and never been proved false is a strong scientific theory, but it is not proven to be true.
- We never reach that stage; we can only say that it is probably true, because it has not yet been falsified.
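The asymmetry between falsifying and proving can be written in the standard notation of propositional logic. This is a sketch in textbook notation rather than Popper's own wording: x stands for the theory, y for the predicted observation, and the `\nvdash` symbol assumes the `amssymb` package.

```latex
\[
\begin{aligned}
\text{Falsification (valid):}\quad   & (x \Rightarrow y),\ \neg y \ \vdash\ \neg x \\
\text{Confirmation (invalid):}\quad  & (x \Rightarrow y),\ y \ \nvdash\ x
\end{aligned}
\]
```

The first line is modus tollens: if y fails to happen, x is refuted. The second line is the fallacy of affirming the consequent: seeing y happen does not establish x, which is why a theory that survives testing is supported but never proven.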
Statistical probability theory: Are there really lies, damned lies and statistics?
All statistical tests of probability also take the form of an “if – then” statement. The “if” part is one of the assumptions underlying the test. There are often other assumptions, for example that the different factors that may influence the result act independently of one another. So,
- You should never believe the results of a statistical test without knowing what assumptions were made in applying it.
- It is technically incorrect to say that “you can prove anything with statistics”.
- Statistics don’t lie, people lie. Mostly, it comes down to a question of trust.
In fact, you can never prove anything to be true with statistics. You just reach a known level of probability, and even that known level of probability is based on assumptions that always have to be made in designing the hypothesis.
Ignorance and Truth
This fundamental tenet of a scientific statement means that,
within science, we never really know anything.
The same conclusion has been reached by many religious thinkers, especially in the Hindu and Buddhist traditions.
- What is different about the scientific statement, as against the religious belief, is that it is falsifiable.
- It is worded in such a way that it can be disproved.
- To establish Truth, however, is a matter of belief and faith, and is beyond science.
When Popper published these ideas, they went against the generally accepted view within science, which was that
- you build a theory based on observations
- collect further observations to support your theory
- end up with scientifically based knowledge which is Truth.
The idea that scientific statements are true is still by far and away the most widely accepted view, even within the scientific community.
Very few lay persons – including doctors who have been “converted” to EBM – are comfortable with the combination of wildly creative theorizing and profound skepticism underlying Popperian logic. Misunderstandings abound.
Principle of falsifiability underlies the Randomised Controlled Trial
The principle of falsifiability forms the foundation of the randomised controlled trial (RCT). All RCTs are designed around a null hypothesis.
The null hypothesis
The null hypothesis is a sub-set of the Popperian falsifiable scientific statement. In establishing whether or not treatment x is beneficial, we have to compare it with something else – treatment y. Let us say that treatment x is a new pill for condition a, treatment y is the currently accepted treatment. The null hypothesis takes the form of
"the outcome of treatment x is the same as treatment y".
We then design an experiment to try and disprove (falsify) the null hypothesis. That experiment is the randomised controlled clinical trial.
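As a rough illustration of what "trying to falsify the null hypothesis" looks like in numbers, the sketch below uses Fisher's exact test. That test is one common choice for comparing two groups with a yes/no outcome; it is not a method named in this article. The function name and the example counts are hypothetical. The "if" part is explicit: if the outcome of treatment x is the same as treatment y, then how probable is a split of successes at least as lopsided as the one observed?

```python
from math import comb


def fisher_exact_two_sided(success_x, n_x, success_y, n_y):
    """Two-sided Fisher exact p-value for a 2x2 table (illustrative sketch).

    Assumes the null hypothesis: the outcome of treatment x is the same as
    treatment y, so the successes are spread between the arms by chance alone.
    """
    total_success = success_x + success_y
    denom = comb(n_x + n_y, total_success)

    def prob(k):
        # Probability that exactly k of the successes land in arm x under the null
        return comb(n_x, k) * comb(n_y, total_success - k) / denom

    p_observed = prob(success_x)
    lowest = max(0, total_success - n_y)
    highest = min(n_x, total_success)
    # Add up every possible table that is no more probable than the observed one
    p_value = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Invented counts: 15 of 20 patients improve on x, 8 of 20 improve on y
p = fisher_exact_two_sided(15, 20, 8, 20)
```

A small value of `p` means the observed result would be unlikely if the null hypothesis held, so the null hypothesis is falsified to a known degree of probability. It does not prove that treatment x is better.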
Why do you need a randomised controlled trial to show x is better than y?
The trial has to be randomised to avoid bias, and blinded to control for the placebo effect.
- Bias would occur if the doctor or company advocating treatment x were allowed to choose which patients received it.
- There might be a tendency to select patients who were thought likely to do better for the new treatment.
- The doctor might somehow impress on the patients receiving the new treatment that they should do better.
- This could induce a placebo effect which could make them feel better about the treatment.
- That is why the randomised controlled trial should also be done "double blind"
- Neither the patient nor the doctor should know which treatment is being given.
The double blind randomised controlled trial is where the treatment is chosen at random, and neither the patient nor the doctor knows which treatment has been given. It is regarded as the highest quality method for obtaining robust and reliable evidence about the effectiveness of treatments. The well conducted, high methodological quality RCT sits at the top of the hierarchy of evidence based medicine.
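The mechanics of random allocation and blinding can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any real trial unit works: the function name, the pack-code format and the idea of a key held by a third party are all hypothetical.

```python
import random


def allocate(patient_ids, seed=None):
    """Randomly allocate patients to treatment 'x' or 'y' (hypothetical helper).

    Returns two dictionaries:
      pack_codes - what the doctor and patient see (reveals nothing about the arm)
      sealed_key - the true allocation, held by a third party until the trial ends
    """
    rng = random.Random(seed)
    n = len(patient_ids)
    # Equal-sized arms (one extra in 'y' if the number of patients is odd)
    arms = ["x"] * (n // 2) + ["y"] * (n - n // 2)
    rng.shuffle(arms)  # chance, not the doctor, decides who gets which treatment
    sealed_key = dict(zip(patient_ids, arms))
    # Pack codes follow enrolment order only, so they carry no hint of the arm
    pack_codes = {
        pid: f"PACK-{i:03d}" for i, pid in enumerate(patient_ids, start=1)
    }
    return pack_codes, sealed_key


pack_codes, sealed_key = allocate(["p1", "p2", "p3", "p4", "p5", "p6"], seed=1)
```

The doctor hands out packs by code; only the holder of `sealed_key` can say which treatment each pack contains. For a pill this is straightforward. For an operation it is not.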
The double blind surgeon
The reader will now perhaps begin to see the flaws and difficulties that come into designing such a “high methodological quality” trial for surgical treatments – who wants a blindfolded surgeon operating on them?
Wanted: outcome measures, dead or alive
What is more, the choice of outcome measures plays a crucial part in the results. The development of the RCT model in medicine was based largely on drug trials in otherwise fatal conditions – especially respiratory infections such as pneumonia and tuberculosis. The outcome measure was simple – the patient was either alive or dead. But the bulk of modern surgical interventions are not to avoid death, they are to improve the quality of life. Choice of outcome measures is subjective and is invariably influenced by the sponsors of the trial.
How to choose an outcome measure: Crutches for broken legs
Health insurers and governments funding health expenditure worldwide are looking to EBM to cut expenditure on self limiting conditions. They might save money by not paying for crutches for patients with broken legs. How about an RCT of crutches? Of course, the patients denied crutches would not be able to walk for a while, but, once the leg had healed, and certainly by one year, they should be walking again. By choosing an outcome measure
“ability to walk one year following the injury”
and comparing patients randomly allocated either to receive or not receive crutches, the trial would probably conclude “no evidence of benefit” from crutches in the treatment of broken leg, a self limiting condition. But surely no one would take such a trial seriously.
Well, look at the outcome measures chosen in trials of grommet insertion, sponsored by the UK Government, for children with hearing loss due to glue ear. Following grommet insertion, most children get a dramatic improvement in hearing. The average grommet lasts nine months, during which hearing remains good. Once the grommets come out, a minority will get further glue ear. Meanwhile, a large proportion of the children who did not receive grommets will slowly clear the fluid and their hearing will improve. Those who don’t are often given grommets anyway, but the results are reported on the basis of “intention to treat” – so the benefit accrues to the non-treatment group. The trials report hearing results at one and two years, when most of the grommets have fallen out. Dramatic and consistent short term improvements are ignored in the conclusions.
These examples illustrate how the choice of outcome measures can be manipulated to favour the answer a trial sponsor wishes to get.
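The crutches example can be made concrete with a toy model. Everything in it is invented for illustration: the twelve-week healing time is an assumption, not clinical data, and the function names are hypothetical. The point is only that the same two groups look identical on one outcome measure and very different on another.

```python
HEALING_WEEKS = 12  # assumed healing time, purely illustrative


def mobile(week, has_crutches):
    """Can the patient get about in the given week after the injury?"""
    return has_crutches or week >= HEALING_WEEKS


def able_to_walk_at_one_year(has_crutches):
    """Outcome measure A: ability to get about one year after the injury."""
    return mobile(52, has_crutches)


def mobile_weeks_in_first_year(has_crutches):
    """Outcome measure B: number of weeks of mobility during the first year."""
    return sum(mobile(week, has_crutches) for week in range(52))


# Measure A: both groups score the same, so the trial reports "no benefit"
same_at_one_year = able_to_walk_at_one_year(True) == able_to_walk_at_one_year(False)

# Measure B: the group without crutches loses the whole healing period
weeks_lost = mobile_weeks_in_first_year(True) - mobile_weeks_in_first_year(False)
```

In this toy model `same_at_one_year` is true, while `weeks_lost` equals the assumed healing time. Which of the two measures is reported decides the conclusion.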
When do we need statistics? A randomised controlled trial of parachutes
To judge the results of a trial, we usually (but not always) need statistics. If, let us say, we were conducting a randomised controlled trial of the effectiveness of parachutes on survival when jumping out of an aeroplane at 10,000 feet, we would have the following null hypothesis:
“if parachutes are ineffective, then the mortality rate will be the same, whether or not the parachute is worn”
When the first randomly assigned participant without a parachute hit the ground at terminal velocity, we might decide that we didn’t need any statistics, perhaps not even a trial, to decide this question.
No evidence base for much of surgical practice
Now you may say the parachute trial is an extreme example, but when surgeons are told that there is “no evidence base” for the majority of their work, it is because they don’t need a trial to tell them that controlling that bleeding artery is the right thing to do.
Effect size and time interval
In controlling a bleeding artery, the effect size is large, and the time interval between intervention and observable result is very short.
Skill, training and judgement are needed to achieve the result, and none of these is amenable to a double blind randomised controlled trial.
The RCT of bleeding arteries, like the RCT of parachutes, will never, ever, be done. If someone were foolish enough to look for RCT evidence for the benefit of controlling a bleeding artery, fail to find it, and then publish that fact, Archie Cochrane would turn in his grave. He was a practising doctor, who served his time burying his tuberculous patients as a prisoner in the Second World War.
The sort of cases where statistics are needed are where the effect size is small, and the time interval between intervention and result is long – like most drug trials. That is what RCTs were designed for, and that is what they are good at. The model can, sometimes, be applied to surgical interventions, but it is very, very difficult. That is no reason not to try, but the absence of RCT evidence is to be expected in much of surgical practice.
When we talk about strong evidence, what we essentially mean is that
- the null hypothesis has been falsified – to a known degree of probability
- we have an estimate of the effect size.
- Strong evidence is not the same as a big important effect.
- You can get strong evidence by having lots of patients in your trial, even though the size of the effect is small.
- Strong evidence does not mean good medicine.
- Neither does absence of strong evidence mean bad medicine.
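The difference between strength of evidence and size of effect can be shown with a simple calculation. The sketch below uses a two-proportion z-test with a normal approximation, which is one textbook choice and not a method named in this article. The function name and all the patient counts are invented for illustration.

```python
from math import erfc, sqrt


def two_proportion_p_value(success_x, n_x, success_y, n_y):
    """Two-sided p-value for a difference in success rates (illustrative sketch).

    Assumes the null hypothesis that x and y have the same success rate, and
    that the groups are large enough for the normal approximation to be fair.
    Does not handle the degenerate case where every patient has the same outcome.
    """
    rate_x = success_x / n_x
    rate_y = success_y / n_y
    pooled = (success_x + success_y) / (n_x + n_y)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_x + 1 / n_y))
    z = (rate_x - rate_y) / standard_error
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the normal distribution
    return erfc(abs(z) / sqrt(2))


# Tiny effect, huge trial: 51% versus 50% success, 100,000 patients per arm
p_small_effect = two_proportion_p_value(51_000, 100_000, 50_000, 100_000)

# Big effect, tiny trial: 70% versus 40% success, 10 patients per arm
p_large_effect = two_proportion_p_value(7, 10, 4, 10)
```

With these invented numbers the one percentage point difference gives a p-value well below 0.001, while the thirty point difference gives a p-value above 0.05. The first is strong evidence of a small effect; the second is weak evidence of what may be a large one.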
Popperian logic conflicts with practical application of science
Popper’s work jars uncomfortably with the small minority of practical applied scientists who take the trouble to read it. It is especially irksome to those in the applied branches of science such as engineering and medicine.
- An engineer has to know whether or not it is true that, if he builds his suspension bridge with steel of such-and-such a composition to such-and-such a design, it will stay up and won’t collapse when the hurricane comes.
- A surgeon has to know that he can stitch up a wound in this way and it will heal up.
There is a mismatch between the need for practical confidence and the acknowledgement of theoretical uncertainty. This lies at the root of much discontent in the current application of evidence based medicine to surgical practice. The difficulty occurs when those ignorant of the complex theoretical background to EBM seek out the answers to questions which cannot always be answered in a scientific way.
Unfortunately, zealots in the cause of EBM have published, primarily on the Cochrane website but also in other arenas, the results of reviews which conclude there is “no evidence” for treatment x.
If one reads the detail, the main cause of there being no evidence is that there are no published trials using the randomised controlled trial methodology.
- The absence of this particular sub-type of evidence does not mean that treatment x does not work, it just means we don’t know whether or not it works.
- More accurately, it means we can’t put a statistical figure on how ignorant we are of whether or not treatment x works.
- But then we don’t really know anything about anything anyway…
This, of course, is too subtle for most people who actually need to know whether or not treatment x works. Those responsible for prioritising resource allocation are not always philosophers of science. They are easily misled by the published evidence, and they further simplify the conclusions. They begin to entertain the illusion that, if there really is no evidence, then they must know as much – or as little – as those experienced in the field. They are then apt to make ill informed pronouncements and bad decisions, on matters about which they are ill educated and ill equipped to judge.
Popper KR 1959 The logic of scientific discovery. Hutchinson & Co, London. 8th Impression 1975. ISBN 0 09 111721 6
Cochrane AL 1972 Effectiveness and Efficiency: Random Reflections on Health Services. Facsimile Edn, additional contributions Silagy C, Chalmers I, 1999. RSM Press. ISBN 1 85315 394 X
Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB 2000 Evidence Based Medicine: How to practice and teach EBM. 2nd Edn. Churchill Livingstone, London. ISBN 0 443 06240 4