Published by De Gruyter, September 6, 2023

Bland and Altman agreement method: to plot differences against means or differences against standard? An endless tale?

  • Bruno Mario Cesana and Paolo Antonelli

Abstract

Objectives

In the Bland and Altman analysis of agreement studies, there is some controversy over whether “to plot the differences between the standard/actual measurement method and the test/new measurement method against their mean” or “to plot the differences against the standard method”. Of course, this is not just a “graphic dispute”, since the graphical choice implies a regression model used to test the proportional and systematic biases.

Methods

We reviewed two relevant papers advocating plotting the differences against the standard and outlined their pitfalls, taking into account the underlying statistical methodology. Furthermore, we considered the conditions (the correlation between the two measurement methods and the ratio of their variances) under which the correlation coefficient and the regression slope between differences and means, or between differences and standard, differ from zero.

Results

We have shown the situations in which the regression slope and the correlation coefficient, calculated either from the differences and means according to the Bland and Altman approach or from the differences and the standard, are closer to zero, giving the minimum possible spurious proportional error between the two methods.

Conclusions

We highlighted how the calculation of the expected values of the correlation coefficients and, above all, of the regression slope can be very useful for choosing the statistical model in the context of an agreement study between two measurement methods. Finally, we outlined some recommendations for understanding the real possibility of carrying out agreement or calibration studies.


Corresponding author: Bruno Mario Cesana, MD, Retired Associate Professor of Medical Statistics, Department of Clinical Sciences and Community Health, Unit of Medical Statistics, Biometry and Bioinformatics “Giulio A. Maccacaro”, University of Milan, Via Giovanni Celoria 22, 20133 Milan, Italy, Phone: 02503 20854/02503 20855, E-mail:

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Competing interests: The authors state no conflict of interest.

  5. Research funding: None declared.

  6. Data availability: SAS and/or R code for the calculations shown in the paper is available on request.

Appendix: Simulation study of Hopkins' paper [9]

Hopkins [9] first simulated a “True Value” (V1, say), Gaussian distributed as $G(\mu_V = 50;\ \sigma_V^2 = 169)$, with the variance expressing only the biological variability. Then, by adding a measurement error, he calculated:

  1. Y1 = β0Y⋅V1 + α0Y + ε0Y, Gaussian distributed as $G(\beta_{0Y}\mu_V + \alpha_{0Y};\ \beta_{0Y}^2\sigma_V^2 + \sigma_{0Y}^2)$, with $\beta_{0Y} = 1$, $\alpha_{0Y} = 0$, and $\sigma_{0Y}^2 = 9$; and

  2. X1 = β0X⋅V1 + α0X + ε0X, Gaussian distributed as $G(\beta_{0X}\mu_V + \alpha_{0X};\ \beta_{0X}^2\sigma_V^2 + \sigma_{0X}^2)$, with $\beta_{0X} = 30$, $\alpha_{0X} = 100$, and $\sigma_{0X}^2 = 40{,}000$.

Note that above we used Y1 and X1 instead of Hopkins' unusual notation of Y-Criterion (first step) and X-Practical, respectively [9].

Then, Hopkins [9] performed a non-standard “calibration study” between Y1 and X1 by fitting an OLS regression, instead of the appropriate Deming regression [11], and obtained the OLS slope (b1, say) and intercept (a1, say). Note that the expected value of b1 is the covariance between Y1 and X1, equal to $\beta_{0Y}\beta_{0X}\sigma_V^2$, divided by the variance of X1, $\sigma_X^2 = \beta_{0X}^2\sigma_V^2 + \sigma_{0X}^2$; the expected value of the intercept (a1) can then be calculated.
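As a minimal sketch of this calculation, the snippet below evaluates the large-sample expected OLS slope and intercept from the simulation parameters listed above. It assumes that the variance of X1 is the total variance given in its distribution (β0X²σV² + σ0X²) and that the expected intercept is the mean of Y1 minus b1 times the mean of X1; the function and variable names are illustrative and do not come from Hopkins' spreadsheet or from the authors' SAS/R code.

```python
# Simulation parameters as reported in the appendix for Hopkins [9].
mu_v, var_v = 50.0, 169.0                        # "True Value": mean, biological variance
beta_y, alpha_y, var_ey = 1.0, 0.0, 9.0          # Y1: slope, intercept, error variance
beta_x, alpha_x, var_ex = 30.0, 100.0, 40_000.0  # X1: slope, intercept, error variance


def expected_ols(beta_y, alpha_y, beta_x, alpha_x, var_ex, mu_v, var_v):
    """Large-sample expected slope and intercept of the OLS regression of Y1 on X1.

    Assumption: Var(X1) is the total variance beta_x**2 * var_v + var_ex.
    The error variance of Y1 does not enter the slope or the intercept.
    """
    cov_yx = beta_y * beta_x * var_v          # covariance between Y1 and X1
    var_x = beta_x ** 2 * var_v + var_ex      # total variance of X1
    mean_y = beta_y * mu_v + alpha_y
    mean_x = beta_x * mu_v + alpha_x
    slope = cov_yx / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


b1, a1 = expected_ols(beta_y, alpha_y, beta_x, alpha_x, var_ex, mu_v, var_v)
print(f"expected b1 = {b1:.6f}, expected a1 = {a1:.4f}")
```

With these parameters the slope is 5070/192100, roughly 0.026, which illustrates how strongly the large error variance of X1 attenuates the OLS slope relative to 1/β0X.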

In the second step, Hopkins [9] simulated a new sample of “True Value” (V2, say) and calculated from V2 a second pair of variables (Y2 and X2, say, to avoid Hopkins' definitions [9] of “Criterion Y”, already used in Step 1, and of “Hidden X” in the Excel® spreadsheet), following the method reported above. Then, using the coefficients (b1 and a1) of the calibration line obtained previously, Hopkins [9] calculated a variable, called “Practical-Y”, as “Practical-Y” = b1⋅X2 + a1, distributed as $G(b_1\mu_X + a_1;\ b_1^2\sigma_X^2)$, which shows, as expected, a perfect regression (slope = 1 and intercept = 0) with Y1, used in the calibration regression. Furthermore, it must be stressed that “Practical-Y” has a lower expected variance than Y1 (about 3/4, from Hopkins' simulation parameters [9]). This leads to the surprising result of a measurement method more precise than the one used to calculate the calibration equation and, in addition, to an expected regression slope (and correlation coefficient) of the DM analysis different from zero.
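The two-step procedure described above can be sketched as a small Monte Carlo simulation. This is an illustrative reconstruction from the parameters quoted in this appendix, not Hopkins' Excel spreadsheet: the sample size, the seed, the helper names and the sign convention for the difference (“Practical-Y” minus Y2) are all assumptions.

```python
import random


def simulate(n, rng, mu_v=50.0, sd_v=13.0):
    """Draw n true values V and return the derived lists (Y, X)."""
    ys, xs = [], []
    for _ in range(n):
        v = rng.gauss(mu_v, sd_v)                              # sigma_V^2 = 169
        ys.append(1.0 * v + 0.0 + rng.gauss(0.0, 3.0))         # sigma_0Y^2 = 9
        xs.append(30.0 * v + 100.0 + rng.gauss(0.0, 200.0))    # sigma_0X^2 = 40,000
    return ys, xs


def mean(a):
    return sum(a) / len(a)


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)


def ols(y, x):
    """OLS slope and intercept of y regressed on x."""
    slope = cov(y, x) / cov(x, x)
    return slope, mean(y) - slope * mean(x)


rng = random.Random(2023)   # arbitrary seed, for reproducibility only
n = 50_000                  # arbitrary sample size

# Step 1: "calibration" of Y1 on X1 by OLS.
y1, x1 = simulate(n, rng)
b1, a1 = ols(y1, x1)

# Step 2: new sample, then Practical-Y from the step-1 coefficients.
y2, x2 = simulate(n, rng)
practical_y = [b1 * x + a1 for x in x2]

# Variance of Practical-Y relative to the criterion, and the DM regression.
var_ratio = cov(practical_y, practical_y) / cov(y2, y2)
diffs = [p - y for p, y in zip(practical_y, y2)]
means = [(p + y) / 2.0 for p, y in zip(practical_y, y2)]
dm_slope, dm_intercept = ols(diffs, means)

print(f"b1 = {b1:.5f}, variance ratio = {var_ratio:.3f}, DM slope = {dm_slope:.3f}")
```

With a large sample, the variance ratio should settle near 3/4 and the DM slope should be clearly negative, in line with the expected values discussed in this appendix.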

Indeed, from the parameters of Hopkins' simulation [9] it is possible to calculate an expected correlation coefficient between Y2 and “Practical-Y” of 0.867030 which, together with the expected values of their variances of 178 and 133.784606, respectively, gives for the DM (differences against means) approach a correlation coefficient of −0.2763283 and a regression coefficient of −0.152629. For the DS (differences against standard) approach, the corresponding values are 0.000165 and 0.000949, respectively.
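The DM figures quoted above follow from the standard variance and covariance identities for a difference D = A − B and a mean M = (A + B)/2. The sketch below assumes that D is taken as “Practical-Y” minus Y2 (the opposite convention only flips the signs); the function name is illustrative.

```python
import math


def dm_expected(var_a, var_b, rho):
    """Expected correlation and OLS slope of D = A - B against M = (A + B) / 2.

    var_a, var_b: variances of the two measurements; rho: their correlation.
    """
    cov_ab = rho * math.sqrt(var_a * var_b)
    cov_dm = (var_a - var_b) / 2.0                  # Cov(D, M)
    var_d = var_a + var_b - 2.0 * cov_ab            # Var(D)
    var_m = (var_a + var_b + 2.0 * cov_ab) / 4.0    # Var(M)
    corr = cov_dm / math.sqrt(var_d * var_m)
    slope = cov_dm / var_m
    return corr, slope


# Values quoted in the text: A = "Practical-Y", B = Y2.
r_dm, slope_dm = dm_expected(var_a=133.784606, var_b=178.0, rho=0.867030)
print(f"DM correlation = {r_dm:.4f}, DM slope = {slope_dm:.4f}")
```

Evaluated with the rounded inputs above, the function returns values of about −0.276 and −0.153, which agree with the quoted DM figures to within rounding. Both quantities vanish only when the two variances are equal, which is why the variance ratio matters for the choice of plot.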

However, it must be noted that the values selected by Hopkins [9] always yield progressively lower slopes and correlation coefficients for the DS approach, with the consequent claim that the DS approach gives less biased results than the B&A DM proposal [1, 2].

It should be noted that the final steps of Hopkins' paper [9] are in accordance with the sentence in Krouwer's paper [8]: “Then, in order to gain regulatory approval, the manufacturer will repeat the study but this time as a method comparison study (e.g. using the commercial assay with its built in calibration equations).” and with some requirements from regulatory and scientific authorities also reported in the paper of Stevens et al. [17].

References

1. Altman, DG, Bland, JM. Measurement in medicine: the analysis of method comparison studies. J R Stat Soc – Ser D Statistician 1983;32:307–17. https://doi.org/10.2307/2987937.

2. Bland, JM, Altman, DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;8476:307–10. https://doi.org/10.1016/s0140-6736(86)90837-8.

3. Hauck, WW, Anderson, S. New statistical procedure for testing equivalence in two-group comparative bioavailability trials. J Pharmacokinet Biopharm 1984;12:83–91. https://doi.org/10.1007/bf01063612.

4. Shieh, G. Assessing individual equivalence in parallel group and crossover designs: exact test and sample size procedures. PLoS One 2022;17:e0269128. https://doi.org/10.1371/journal.pone.0269128.

5. Bland, JM. Website. https://www-users.york.ac.uk/~mb55/meas/sizemeth.htm [Accessed 28 Jul 2023].

6. Cesana, BM, Antonelli, P. Agreement analysis: further statistical insights. Ophthalmic Physiol Opt 2012;32:436–40. https://doi.org/10.1111/j.1475-1313.2012.00916.x.

7. CLSI. Measurement procedure comparison and bias estimation using patient samples, 3rd ed. CLSI guideline EP09c. Wayne, PA: Clinical and Laboratory Standard Institute; 2018.

8. Krouwer, JS. Letter to the editor: why Bland–Altman plots should use S, not (Y + X)/2 when X is a reference method. Stat Med 2008;27:778–80. https://doi.org/10.1002/sim.3086.

9. Hopkins, WG. Bias in Bland–Altman but not regression validity analyses. Sports Med 2004;8:42–6.

10. Fuller, WA. Measurement error models. New York: Wiley; 1987. https://doi.org/10.1002/9780470316665.

11. Deming, WE. Statistical adjustment of data. New York: Wiley; 1943.

12. Bland, JM, Altman, DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 1995;346:1085–7. https://doi.org/10.1016/s0140-6736(95)91748-9.

13. Bland, JM, Altman, DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135–60. https://doi.org/10.1177/096228029900800204.

14. Ferraro, S, Biganzoli, G, Bussetti, M, Castaldi, S, Biganzoli, EM, Plebani, M. Managing the impact of inter-method bias of prostate specific antigen assays on biopsy referral: the key to move towards precision health in prostate cancer management. Clin Chem Lab Med 2023;61:142–53. https://doi.org/10.1515/cclm-2022-0874.

15. Choudhary, PK, Nagaraja, HN. Measuring agreement models, methods, and applications. Hoboken, NJ: John Wiley & Sons, Inc.; 2017:25 p. https://doi.org/10.1002/9781118553282.

16. Mansournia, MA, Waters, R, Nazemipour, M, Bland, M, Altman, DG. Bland-Altman methods for comparing methods of measurement and response to criticisms. Global Epidemiol 2021;3:100045. https://doi.org/10.1016/j.gloepi.2020.100045.

17. Stevens, NT, Steiner, SH, MacKay, RJ. Assessing agreement between two measurement systems: an alternative to the limits of agreement approach. Stat Methods Med Res 2017;26:2487–504. https://doi.org/10.1177/0962280215601133.

18. Stevens, NT, Steiner, SH, MacKay, RJ. Comparing heteroscedastic measurement systems with the probability of agreement. Stat Methods Med Res 2018;27:3420–35. https://doi.org/10.1177/0962280217702540.

19. Taffè, P. Effective plots to assess bias and precision in method comparison studies. Stat Methods Med Res 2018;27:1650–60. https://doi.org/10.1177/0962280216666667.

20. Nawarathna, LS, Choudhary, PK. A heteroscedastic measurement error model for method comparison data with replicate measurements. Stat Med 2015;34:1242–58. https://doi.org/10.1002/sim.6424.

21. Taffé, P, Peng, M, Stagg, V, Williamson, T. biasplot: a package to effective plots to assess bias and precision in method comparison studies. Stata J 2017;17:208–21. https://doi.org/10.1177/1536867x1701700111.

22. Taffé, P, Peng, M, Stagg, V, Williamson, T. MethodCompare: an R package to assess bias and precision in method comparison studies. Stat Methods Med Res 2018;28:1–9. https://doi.org/10.1177/0962280218759693.

23. Taffé, P, Halfon, P, Halfon, M. A new statistical methodology overcame the defects of the Bland & Altman method. J Clin Epidemiol 2020;124:1–21. https://doi.org/10.1016/j.jclinepi.2020.03.018.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2023-0306).


Received: 2023-03-24
Accepted: 2023-08-20
Published Online: 2023-09-06
Published in Print: 2024-01-26

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 9.5.2024 from https://www.degruyter.com/document/doi/10.1515/cclm-2023-0306/html