Not finding a difference doesn’t prove equivalence

January 28, 2014
Filed under Acute Med, All Updates, EMS, Resus


The recent LINC trial was a randomised controlled trial comparing a mechanical chest compression device (LUCAS) with manual CPR(1). “No significant difference” was found for any of the main outcome measures considered.

So do you think the LINC trial demonstrated that mechanical CPR using the LUCAS device is equivalent, or at least not inferior, to manual CPR?

This was an interesting and important trial for those of us who manage prehospital cardiac arrest patients. In some social media discussions, it appears to have been interpreted by some as evidence that they are equivalent resuscitative techniques or that LUCAS is not inferior to manual CPR.


However, unless you see a p-value less than 0.05 in the table above (issues of multiple hypothesis testing aside), no evidence of anything was demonstrated: not of a difference, and certainly not of equivalence. When faced with 2-sided p-values >5%, investigators often conclude that there is “no difference” between the treatments, leading readers to assume that the treatments are equivalent. A better conclusion is that there is “no evidence” of a difference between treatments (see the opinion piece by Sackett, 2004(2)). To determine whether treatments are equivalent, equivalence must be tested directly.
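The distinction is easy to see with a confidence interval. As a rough sketch (using made-up counts from a hypothetical small trial, not the LINC data), the following computes a Wald confidence interval for a risk difference: the interval includes zero, so there is “no significant difference”, yet it is far too wide to support any claim of equivalence.

```python
import math

def risk_difference_ci(x1, n1, x2, n2, z=1.96):
    """Risk difference (p1 - p2) with a Wald 95% confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Made-up counts for a small, underpowered trial: 9/100 vs 7/100 survivors.
diff, lo, hi = risk_difference_ci(9, 100, 7, 100)
print(f"difference = {diff:+.3f}, 95% CI ({lo:+.3f}, {hi:+.3f})")
# The CI comfortably includes 0 ("no significant difference"), yet it
# also spans differences of several percentage points in either
# direction -- far too wide to support any claim of equivalence.
```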

How can we test for equivalence?
First, we must define equivalence. It is crucial that this definition is provided a priori, i.e. defined before the data are examined. As the focus of the LINC study was on superiority, the investigators did not offer an a priori definition of equivalence. However, the CIRC study(3), conducted some time earlier and similar in design, did. (That study examined an alternative mechanical CPR device, the Zoll AutoPulse.)

When establishing equivalence between treatments, instead of the more customary null hypothesis of no difference between treatments, the hypothesis that the true difference is equal to a specified ‘delta’ (δ) is tested (4).

To analyse the LINC results to look for equivalence, we can derive our delta values from the CIRC study, which as we’ve said did offer an a priori definition of equivalence. For the purpose of illustration, we will use the risk-difference stopping boundaries calculated for the CIRC study, rather than the odds ratio based equivalence margins, on the grounds of greater simplicity and clinical appropriateness. Therefore, we set our equivalence margins at -δ = -1.4% and δ = 1.6%, meaning that if LUCAS fares no worse than manual CPR by more than 1.4% and no better by more than 1.6%, we will consider the two techniques equally efficacious. Thus, we will declare equivalence between LUCAS and manual CPR if the 2-sided 95% CI for the treatment difference lies entirely within -1.4% and 1.6%, and noninferiority if the one-sided 97.5% CI for the treatment difference (equivalent to the lower limit of the two-sided 95% CI) lies above -1.4%(5).
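The margin check itself is mechanical once the confidence interval is in hand. A minimal sketch, assuming the margins above and using illustrative counts (roughly LINC-sized arms, but NOT the published LINC figures):

```python
import math

# Equivalence margins taken from the discussion above: LUCAS may be no
# worse than manual CPR by 1.4% and no better by 1.6%.
LOWER_MARGIN, UPPER_MARGIN = -0.014, 0.016

def verdict(x_lucas, n_lucas, x_manual, n_manual, z=1.96):
    """Classify a risk difference (LUCAS - manual) against the margins.

    Uses a Wald CI for the difference of two proportions.
    """
    p1, p2 = x_lucas / n_lucas, x_manual / n_manual
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n_lucas + p2 * (1 - p2) / n_manual)
    lo, hi = diff - z * se, diff + z * se
    # Equivalence: the whole 2-sided 95% CI sits inside the margins.
    equivalent = LOWER_MARGIN < lo and hi < UPPER_MARGIN
    # Noninferiority: the lower limit (one-sided 97.5% bound) clears -delta.
    noninferior = lo > LOWER_MARGIN
    return diff, (lo, hi), equivalent, noninferior

# Hypothetical placeholder counts, chosen only to illustrate the logic:
diff, (lo, hi), eq, ni = verdict(306, 1300, 305, 1300)
print(f"diff {diff:+.4f}, CI ({lo:+.4f}, {hi:+.4f}), "
      f"equivalent={eq}, noninferior={ni}")
```

With arms of this size the interval is still wide enough to cross -δ, so neither equivalence nor noninferiority can be declared for these illustrative counts, even though the point estimate of the difference is near zero.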

These concepts and how they differ from a traditional comparison are more readily appreciated graphically (Fig. 1).

Figure 1. Two one-sided test procedure and the equivalence margin in equivalence/noninferiority testing between LUCAS and manual CPR

1a. A traditional comparative study, such as the LINC trial, yields confidence intervals that encompass 0 and therefore provide no evidence of a difference.


1b. Using equivalence margins (-δ and δ) derived from a similar study (CIRC), we show that the LINC trial does not demonstrate that LUCAS and manual CPR are equally efficacious, since the 95% CIs do not lie completely within the equivalence margins.

1c. The one-sided CI lies above -δ for some outcomes, allowing us to declare non-inferiority on those measures.

The presentation of the LINC trial’s results shows no evidence of a difference in outcomes between mechanical and manual CPR, which is not the same as showing they are equivalent or that mechanical CPR is non-inferior. However, if we re-examine their data using equivalence margins (-δ, δ) derived from a similar study (CIRC), there is some evidence that the LUCAS device is not inferior to manual CPR (but not necessarily equivalent) with respect to longer term good neurological outcome.


1. Rubertsson S, Lindgren E, Smekal D, et al. Mechanical Chest Compressions and Simultaneous Defibrillation vs Conventional Cardiopulmonary Resuscitation in Out-of-Hospital Cardiac Arrest
JAMA. 2014 Jan 1;311(1):53-61

2. Sackett D. Superiority trials, non-inferiority trials, and prisoners of the 2-sided null hypothesis
Evid Based Med 2004;9:38-39 [Open Access]

3. Lerner EB, Persse D, Souders CM, et al. Design of the Circulation Improving Resuscitation Care (CIRC) Trial: a new state of the art design for out-of-hospital cardiac arrest research
Resuscitation. 2011 Mar;82(3):294-9

4. Dunnett CW, Gent M. Significance testing to establish equivalence between treatments, with special reference to data in the form of 2X2 tables. Biometrics. 1977 Dec;33(4):593-602

5. Piaggio G, Elbourne DR, Pocock SJ, et al. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594-604. [Open Access]


3 Responses to “Not finding a difference doesn’t prove equivalence”

  1. brooks walsh on January 29th, 2014 01:58

    I appreciate the fuller explanation of the LINC trial, and how to understand the results. I’m not sure how I would boil this down for EMS agencies, nursing staff, not to mention those physicians who are less statistically inclined. Most folks just want to know “Is LUCAS better, or about the same?” They don’t care about the finer points; they want to know if they should spend the $$$ for the machine.

    So, could your explanation here help inform that discussion?

  2. Ryan James MD on January 29th, 2014 15:48

    Fantastic review guys! And a great explanation of noninferiority stats. For all those who are too busy to read the full paper (as I was), WikEM has a nice summary of

  3. DocXology on February 6th, 2014 09:59

    Thank you for distinguishing ‘equivalence’ from ‘non-inferiority’.

    In real-world terms, would there be good reasons to devise a test to prove an intervention is ‘equivalent’ rather than ‘non-inferior’ or ‘superior’?

    If any new treatment has proposed benefits in cost, ease-of-administration, accessibility, convenience etc, AND was shown to be at least ‘non-inferior’, if not ‘superior’ wouldn’t that possibly lend some weight to adopting it?

    I note that if you were to extend the delta to around 3% for ‘non-inferiority’ in your graph, it would encompass all the endpoints that were recorded.

    In the case of a costly automated device, it could be argued that better coverage of a region can be achieved with the same pre-hospital workforce because of the ability to deploy single rescuers.