Statistics in diagnostic medicine

Peter Schlattmann

doi:10.1515/cclm-2022-0225

Publicly Available Published by De Gruyter March 31, 2022

Statistics in diagnostic medicine

Peter Schlattmann

From the journal Clinical Chemistry and Laboratory Medicine (CCLM)

https://doi.org/10.1515/cclm-2022-0225

Abstract

This tutorial gives an introduction into statistical methods for diagnostic medicine. The validity of a diagnostic test can be assessed using sensitivity and specificity which are defined for a binary diagnostic test with known reference or gold standard. As an example we use Procalcitonin with a cut off value ≥ 0.5 g/L as a test and Sepsis-2 criteria as a reference standard for the diagnosis of sepsis. Next likelihood ratios are introduced which combine the information given by sensitivity and specificity. For these measures the construction of confidence intervals is demonstrated. Then, we introduce predictive values using Bayes’ theorem. Predictive values are sometimes difficult to communicate. This can be improved using natural frequencies which are applied to our example. Procalcitonin is actually a continuous biomarker, hence we introduce the use of receiver operator curves (ROC) and the area under the curve (AUC). Finally we discuss sample size estimation for diagnostic studies. In order to show how to apply these concepts in practice we explain how to use the freely available software R.

Keywords: likelihood ratio; predictive values; receiver operator curve; sample size estimation; sensitivity; software R; specificity

Motivating example

Worldwide, sepsis and its sequelae still remain a frequent cause of acute illness and death in patients with community and nosocomial acquired infections [1]. Sepsis may be seen as systemic inflammatory response due to infection. However, a gold standard for the proof of infection is missing. Depending on prior antibiotic therapy, bacteremia is found only in approximately 30% of patients with sepsis. Furthermore, early clinical signs of sepsis, like fever, tachycardia, and leucocytosis, are unspecific and overlap with signs also seen in a multitude of systemic inflammatory response syndromes (SIRS) in the absence of infection, especially in surgical patients. Other signs, such as arterial hypotension, thrombocytopenia, or elevated lactate levels indicate, too late, the progression to organ dysfunction [2]. Thus, delay in diagnosis and treatment of sepsis causes increased mortality.

In sepsis numerous humoral and cellular systems are activated, followed by a release of a multitude of mediators and other molecules that mediate the host response to infection. Several potential diagnostic indicators measured in the bloodstream have been evaluated for their clinical ability to assess the diagnosis and severity of sepsis. One of these, the 116 amino acid polypeptide procalcitonin (PCT) is frequently used when it comes to identify bacterial infections.

In this tutorial we will use Procalcitonin as an example for the use of statistics in diagnostic medicine using data from a study by Ljungstroem et al. (2017) [3]. Also we will show how to use the freely available software R [4] for the necessary calculations.

Sensitivity and specificity of a diagnostic test

Sensitivity and specificity are key parameters when evaluating the validity of a binary diagnostic test [5] which requires knowledge of a reference or gold standard which denotes the disease status D ⁺ if sepsis is present and D ⁻ otherwise. The potential outcomes of a 2 × 2 table showing the disease status D in the columns and test results (T) in the rows are shown in Table 1. The establishment of such a reference standard is difficult when diagnosing sepsis [6, 7]. For our example we will apply the Sepsis-2 criteria as used by Ljungstroem et al. (2017) i.e. verified bacterial infection and systemic inflammatory response syndrome (SIRS). As diagnostic test we apply PCT ≥0.5 g/L indicating a positive test result (T ⁺) and values <0.5 g/L indicating a negative test result (T ⁻).

Table 1:

Potential outcomes of a diagnostic test.

	Disease present D ⁺	Disease absent D ⁻
Test positive T ⁺	True positive (TP)	False positive (FP)
Test negative T ⁻	False negative (FN)	True negative (TN)

When evaluating a diagnostic test a population of diseased persons and a population of healthy individuals is considered. Since no test is perfect a 2 × 2 table is constructed as shown in Table 1 which shows the potential outcomes of a diagnostic test. In a perfect world a diagnostic test would identify all diseased person as ill. That is, we would have only true positives (TP). In our example a patient who suffers from sepsis according to the Sepsis-2 criteria and the corresponding PCT-value of that patient is ≥ 0.5 g/L denotes a true positive result. From Table 1 we can define the sensitivity (sens) of our diagnostic test $sens = TP TP + FN$ . This implies the probability to identify diseased persons correctly using a PCT level with at least 0.5 g/L as diagnostic test.

Likewise, a non-diseased person should be identified correctly as well. This leads to the specificity (spec) of a diagnostic test given by $spec = TN TN + FP$ .

Based on Table 2 we obtain 296 true positives (TP) and a total of 667 diseased persons (TP + FN) according to Sepsis-2 criteria. Thus, the sensitivity is given as $296 667 = 0 . 44 = 44 %$ . Likewise the specificity is, given as the ratio of 664 true negatives (TN) with a total of 870 healthy persons. This leads to a specificity of $641 870 = 0 . 74 = 74 %$ .

Table 2:

Procalcitonin (PCT) and Sepsis-2 with cut off value 0.5 g/L.

	Sepsis-2⁺	Sepsis-2⁻	Total
PCT ≥0.5 g/L	296	229	525
PCT <0.5 g/L	371	641	1,012
Total	667	870	1,537

In order to quantify statistical uncertainty sensitivity and specificity should be reported together with a confidence interval. Usually a 95% interval is applied. Statistically speaking sensitvity and specificity are proportions and confidence intervals can be constructed accordingly. The estimate of a proportion p is given as $p ˆ = k n$ , k=number of events and n=total number. The corresponding standard error σ is given by

(1)

$σ p = p ˆ ( 1 − p ˆ ) n$

Then a 95% confidence interval (CI) is given by

(2)

$95 % CI = p ˆ ± 1.96 σ p$

For sensitivity a 95% CI is constructed as follows with $p ˆ = TP TP + FN$

(3)

$95 % CI = sens ± 1.96 sens ( 1 − sens ) TP + FN$

For our data we obtain sens = $296 667 = 0 . 44$ . The 95% CI is given by $0.44 ± 1.96 0.44 ( 1 − 0.44 ) 667 = ( 0.41 , 0.48 )$ . For the specificity equal to 0.74 we obtain a 95% CI (0.71, 0.77). There a are several ways to construct a confidence interval for a binomial proportion with different statistical properties [8].

Likelihood ratios

In order to create an overall measure of diagnostic performance frequently likelihood ratios are considered [9]. These have the advantage that they combine the information obtained from sensitivity and specificiy. One way to do this is the positive likelihood ratio $L R + = sensitivity 1 − specificity$ . The LR⁺ summarizes how many times more likely patients with the disease are to have that particular result than patients without the disease. More formally this is the ratio of the proportion of true positives divided by the proportion of false positives. A LR⁺>1 indicates that the test result is associated with the presence of the disease. For our data we obtain LR⁺=0.44/(1–0.74)=1.70. What does this mean? According to Jaeschke et al. (1994) [10] a LR⁺≥10 would be conclusive. The likelihood ratio equal to 1.70 observed here thus adds little information.

Again the corresponding uncertainty should be addressed using 95% confidence intervals. Formally the LR⁺ is a ratio of binomial proportions [11] where a confidence interval can be constructed on the scale of the natural logarithm. On the log scale the variance of the LR⁺ is given by

(4)

$var l o g ( L R + ) = σ L R + 2 = 1 TP + 1 TP + FN + 1 FP + 1 FP + TN$

For our example we obtain $σ L R + 2 = 1 296 + 1 296 + 371 + 1 229 + 1 229 + 641 = 0 . 0051$ .

Then a 95% CI is given by

(5)

$L R + ± 1.96 *exp ( σ L R + )$

For our example a 95% CI is obtained as $1.70 ± 1.96 * exp ( 0.0051 ) = ( 1.47 , 1.95 )$ .

Positive and negative predictive values

Bayes’ theorem

Sensitivity and specificity are used to describe the validity of a diagnostic test. These quantities can be expressed as conditional probabilities: $sensitivity = P ( T + | D + ) = TP TP + FN$ . That is the probability that a diseased person will have a positive test result. $The specificity equals = P ( T − | D − ) = TN TN + FP$ . This is the probability that a healthy person will have a negative test result. From a clinical perspective the important question is whether a positive test indicates a diseased person. This is the positive predictive value $PPV = P ( D + | T + ) = TP [ TP + FP ]$ . This probability is obtained using Bayes’s theorem. In order to apply Bayes’ theorem we need to define the prior probability of disease which is given by the prevalence $pre = P D + = TP + FN TP + FN + FP + TN$ .

(6)

$P D + | T + = P D + * P T + | D + P D + * P ( T + | D + ) + P D - * P T + | D -$

The PPV can be expressed in terms of sensitivity (sens) and specificity (spec):

(7)

$PPV = pre * sens pre* sens + ( 1 − pre ) * ( 1 − spec )$

For our data we obtain a prevalence $pre = 667 1537 = 0.43 = 43 %$ . Plugging this into the formula leads to

(8)

$PPV = 0.43 * 0.44 0.43 * 0.44 + ( 1 − 0.43 ) * ( 1 − 0.74 ) = 0.57 = 57 %$

Thus, our diagnostic PCT test does little to improve our prior knowledge which is given by the prevalence equal to 43%. Performing the test tells us that the probability that the patient suffers from sepsis given that the PCT value is larger than 0.5 g/L is 57%.

As Figure 1 shows the PPV depends on the prevalence of disease as well on sensitivity and specificity combined as LR⁺. The PPV increases with increasing prevalence. Also a larger positive likelihood ratio leads to higher predictive values.

Figure 1:

Bayes theorem: relationship between positive predictive value and prevalence for various values of LR⁺.

Fagan plot

A Fagan plot [12] uses the pretest (prior) probability together with the LR⁺ and is a graphical tool for estimating how much the result of a diagnostic test changes the probability that a patient has a disease. The rationale for a Fagan plot is given by a formulation of Bayes’s theorem as follows: posterior ∝ prior × likelihood ratio. Again, this describes how our prior knowledge is improved by applying a diagnostic test and leads to the posterior knowlegde after performing the test. In order to create a Fagan nomogramm we need to apply odds.

Odds are defined as odds = $p 1 − p$ where p is a probability. If we have a fair coin the odds for head are $0.5 1 − 0.5 = 1$ meaning that head and tail are equally likely. The prior odds in our example are given by $0.43 1 − 0.43$ =0.75. Formulating Bayes’s theorem in terms of odds leads to $p o s t e r i o r o d d s = p r i o r o d d s × l i k e l i h o o d r a t i o$ . In our example we obtain:

$posterior odds = prior odds × likelihood ratio$

$= 0.75 × 1.70$

$= 1.28$

Odds can be transformed into probabilities using the following formula $p = odds 1 + odds$ . For our data we obtain the $PPV = 1.28 ( 1 + 1.28 ) = 0.56$ .

A Fagan plot works on the log scale. Thus we use

(9)

$log ( posterior odds ) = log ( prior odds ) + log ( L R + )$

Thus taking logs creates a linear equation. Hence, a Fagan plot consists of a vertical axis on the left with the pretest probability, an axis in the middle representing the LR⁺ and a vertical line showing the PPV. By connecting the pretest probability and the LR⁺ the post test probability is obtained. Please note, that although the labels on the left and right are written in terms of probability, the tick marks are spaced at the log odds scale. The Fagan plot of our example is shown in Figure 2.

Figure 2:

Fagan plot: relationship between prevalence, likelihood ratio and positive predictive value.

The PPV tells us the probability whether a patient with positive test result really has the disease. The negative predictive value (NPV) is the probability that a person with a negative test result is really healthy. That is $NPV = P ( D − | T − ) = TN [ TN + FN ]$ . In our example we obtain a NPV=641/1012=0.63.

Communicating predictive probabilities

Looking at Eq. (7) the use of Bayes’s appears tedious, sometimes hard to understand and difficult to communicate. Thus, Hoffrage and Gigerenzer [13] introduced the concept of natural frequencies. Traditionally medical doctors are told: The prevalence of sepsis is 43%, the sensitivity is 44% and the specificity is 74%. Please tell me the probability that the patient suffers from sepsis. This is error prone especially if the disease of interest is rare and sensitivity and specificity of the test are high [14].

Thus, we proceed as follows. Assume that we have 1,000 subjects where 434 suffer from sepsis (prevalence 43%). Of these 190 will be test positive (sensitivity=44%). Of the 570 patients without sepsis 148 will be false positive (specificity=74%). Then, the PPV equals $191 191 + 147 = 0.565$ . This means that of 100 patients with a positive test 57 suffer from sepsis and 43 are false positive. The application of natural frequencies is also shown in Figure 3.

Figure 3:

Application of natural frequencies to calculate predictive values.

Receiver operator curves

Until now we have assumed that we are dealing with a binary diagnostic test. By using a cut off 0.5 g/L we have transformed the continuous marker Procalcitonin into a binary test. Obviously, other cut-off values could be used. For example we could apply a cut off value ≥ 2.0 g/L. Then we would obtain Table 3. This leads to a sensitivity $sens = 176 667 = 0.26$ and a specificity $spec = 771 870 = 0 . 89$ . As a result increasing the cut off value form 0.5 g/L to 2.0 g/L led to a decreased sensitivity and an increased specificity. Looking at descriptive statistics of the Procalcitonin data we observe a median PCT value equal to 0.2 g/L with a minimum equal to 0.01 g/L and a maximum of 200 g/L. Obviously, we could use any value between minimum and maximum as a cut off value and calculate the corresponding sensitivity and specificity.

Table 3:

Procalcitonin (PCT) and Sepsis-2 with cut off value 2.0 ng/mL.

	Sepsis-2⁺	Sepsis-2⁻	Total
PCT ≥2.0 g/L	176	99	275
PCT <2.0 g/L	491	771	1,262
Total	667	870	1,537

This is done when we create a receiver operator curve (ROC) [15] which is obtained by calculating the sensitivity and specificity of every observed data value and plotting sensitivity against 1-specificity. A test that perfectly discriminates between the two groups would yield a “curve” that coincided with the left and top sides of the plot since we would not have any false negative (FN) are false positive (FP) values. A useless test would give a straight line from the bottom left corner to the top right. This implies that a true positive and a false positive test result are equally likely.

The performance of the test can be assessed by using the area under the receiver operating characteristic curve (AUC). This area may be interpreted as the probability that a random person with the disease has a higher value of the measurement than a random person without the disease. A perfect test would have an AUC=1 and a useless test has an AUC=0.5. For our example we obtain an AUC=0.64 with 95% CI (0.61, 0.67). A good review for the construction of confidence intervals for the AUC is given by Cho et al. (2018) [16]. In conclusion Procalcitonin offers moderate diagnostic discrimination at best as shown in Figure 4.

Figure 4:

Procaclition as biomarker for sepsis: receiver operator curve.

However, after having determined that a test provides good discrimination the best cut off point for clinical needs has to be chosen. One possible approach is given by maximizing the sum of the sensitivity and specificity. This leads to the so called Youden Index J=sens + spec − 1. Hence for each possible cut off value J is calculated and the value which leads to a maximum of J is chosen. For the data at hand we obtain a cut off value equal to 0.175 g/L with a sensitivity of 0.65 and a specificity of 0.56.

An approach which is just data driven is not helpful because also the clinical situation needs to taken into account. Schuetz et al. (2019) [17] for example “refined the established PCT algorithms by incorporating severity of illness and probability of bacterial infection and reducing the fixed cut-offs to only one for mild to moderate and one for severe disease 0.25 g/L and 0.5 g/L, respectively”.

Sample size estimation

Like in therapeutic clinical trials sample size estimation should be performed for diagnostic studies. Knottnerus and Muris (2003) [18] present the whole strategy needed for the development of diagnostic tests. This involves the selection of cases and controls and ensuring that a correct reference standard is defined. From a statistical point of view the allowable Type I and Type II errors, the primary outcome of interest together with a relevant effect size and is variability need to be defined in advance. Formulas and tables for the planning of binary tests may be found in the paper by Flahault et al. (2003) [19].

For continuous biomarkers sample size estimation can be based on the AUC of the ROC curve. Let us consider a phase I diagnostic study where we want to determine whether the new diagnostic test has any ability to discriminate diseased patients from healthy controls. Then the null hypothesis is that the AUC equals 0.5 vs. the alternative hypothesis that the AUC is ≠0.5. Formulas for sample size estimation may be found in Obuchowski et al. (2004, page 1123 Eqs. (2) and (3)) [20].

Let us assume that our new biomarker performs better than Procalcitonin with an AUC=0.7. We accept a Type I error of 5% (two-sided) and we want a power of 90%. Then we need 41 cases and 41 controls.

Using R for the calculations

The freely statistical available package R [4] may be used to perform the necessary calculations for our example. The package can be obtained at https://cran.r-project.org. A useful integrated software environment is given by by RStudio https://www.rstudio.com/. Using RStudio R scripts can easily be used to run the respective R commands. Data and a R script for our example are given in the supplementary material.

Importing and manipulating data

The data from our example are read from an Excel csv file and stored as an object named “diag.data”. The command “read.csv2” reads Excel files in .csv format. First comes the name of the file. Next “header=T” implies that the first line of the file contains the variable names. Finally “na.string=. ‘means missing values are indicated by to “.”

diag.data<-read.csv2(“pct_example.csv”,header=T,na.string=“.”)

This object “diag.data” contains the data and can be modified. Here, the data column “Procalcitonin” contains the biomarker values in g/L. In a first step we create a new binary indicator named “PCT” which is a new column of our data. This indicator takes the value “1” if the Procalcitonin level is ≥ 0.5 g/L and 0 otherwise. Then, we attach value labels. First, we declare the variable as a “factor” and assign the value labels.

## Apply the widely used cutoff value 0.5 g/L

diag.data$PCT<-ifelse(diag.data$Procalcitonin<0.5,0,1)

# Define the variable PCT as a factor

diag.data$PCT<-as.factor(diag.data$PCT)

# assign value labels levels(diag.data$PCT)<-c(‘<0.5 g/L’,‘>=0.5 g/L’)

attach(diag.data)

The command “attach” provides access to the individual elements of the data object “diag.data”. Now we can perform basic descriptive statistics using the command “summary”. For the variable “Proacalcitonin” we obtain

summary(Procalcitonin)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.010 0.060 0.200 3.376 1.070 200.000

Calculating diagnostic parameters

In the next step we create a 2 × 2 table with our diagnostic test variable “PCT” vs. “sepsis2”. We exclude missing values indicated by “.” and change the ordering of the table and obtain

table05<-table(PCT,sepsis2,exclude=“.”)[2:1,2:1]
table05
sepsis2
PCT Yes No
>=0.5 g/L 296 229
<0.5 g/L 371 641

The package “epiR” is used to calculate sensitivity specificity etc. First we need to install the package (only once) and then load its functionality using the command “library”. Finally we submit our table named “table05” to the function “epi.tests” which calculates sensitivity etc.

install.packages("epiR"}
library(epiR)
epi.tests(table05)
This gives the (shortened) result
Point estimates and 95% CIs.
--------------------------------

`True prevalence *`	`0.43 (0.41, 0.46)`
`Sensitivity *`	`0.44 (0.41, 0.48)`
`Specificity *`	`0.74 (0.71, 0.77)`
`Positive predictive value *`	`0.56 (0.52, 0.61)`
`Negative predictive value *`	`0.63 (0.60, 0.66)`
`Positive likelihood ratio`	`1.69 (1.47, 1.94)`
`Negative likelihood ratio`	`0.75 (0.70, 0.82)`

Construction of a Fagan plot

A Fagan plot is constructed using the library “Teachingdemos” and submitting a prevalence of 0.43 and LR⁺=1.7 to the function “fagan.plot”.

library(TeachingDemos)
fagan.plot(0.43,1.7)

The result is shown in Figure 2.

Receiver operator curves

In the next step we use Procalcitonin as a continuous biomarker and construct a ROC-curve and calculate the AUC together with a 95% CI using the package pROC [21].

library(pROC)
cut1<-roc(sepsis2∼Procalcitonin,data=diag.data, percent=F,print.auc=T,ci=T) print(cut1)
Data: Procalcitonin in 870 controls (sepsis No) < 667 cases (sepsis Yes).
Area under the curve: 0.6407
95% CI: 0.613–0.6683 (DeLong)

In the next step we obtain the optimal cut off value based on Youden’s index J and plot the ROC curve

coords(cut1, “best”, “threshold”,best.method=“youden”) plot.roc(sepsisProcalcitonin, print.auc=T,ci=T,data=pct,percent=F,legacy.axes=T,grid=T)

The ROC curve is shown in Figure 4 and the cut off value is given by

threshold specificity sensitivity
0.175 0.562069 0.6521739

Sample size estimation

If we want to estimate the necessary sample size for a diagnostic Phase I study assuming an AUC = 0.7, 90% power and two sided significance level of 5% we can use:

library(pROC)
power.roc.test(auc=0.70, sig.level=0.05, power=0.90, alternative=“two.sided”)
One ROC curve power calculation
ncases = 40.21369
ncontrols = 40.21369
auc = 0.7
sig.level = 0.05
power = 0.9

Hence we would include 82 subjects into our study.

Corresponding author: Peter Schlattmann, Institute of Medical Statistics, Computer and Data Sciences Jena Bachstr. 18, 07743 Jena, Germany, E-mail: peter.schlattmann@med.uni-jena.de

Acknowledgements

I would like to thank Dr. Ljungstroem [3] and colleagues for the allowance to use the data from their study as an example for this article.

Research funding: None declared.
Author contributions: Single author statement.
Competing interests: Author states no conflict of interest.
Informed consent: Not applicable.
Ethical approval: Not applicable.

References

1. Fleischmann, C, Scherag, A, Adhikari, NKJ, Hartog, CS, Tsaganos, T, Schlattmann, P, et al.. Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am J Respir Crit Care Med 2016;193:259–72, https://doi.org/10.1164/rccm.201504-0781oc.Search in Google Scholar

2. Wacker, C, Prkno, A, Brunkhorst, FM, Schlattmann, P. Procalcitonin as a diagnostic marker for sepsis: a systematic review and meta-analysis. Lancet Infect Dis 2013;13:426–35, https://doi.org/10.1016/s1473-3099(12)70323-7.Search in Google Scholar

3. Ljungström, L, Pernestig, AK, Jacobsson, G, Andersson, R, Usener, B, Tilevik, D. Diagnostic accuracy of procalcitonin, neutrophil-lymphocyte count ratio, C-reactive protein, and lactate in patients with suspected bacterial sepsis. PLoS One 2017;12:e0181704, https://doi.org/10.1371/journal.pone.0181704.Search in Google Scholar PubMed PubMed Central

4. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2019. Available from: https://www.R-project.org/.Search in Google Scholar

5. Altman, DG, Bland, JM. Statistics Notes: diagnostic tests 1: sensitivity and specificity. BMJ 1994;308:1552, https://doi.org/10.1136/bmj.308.6943.1552.Search in Google Scholar PubMed PubMed Central

6. Levy, MM, Fink, MP, Marshall, JC, Abraham, E, Angus, D, Cook, D, et al.. 2001 SCCM/ESICM/ACCP/ATS/SIS international sepsis definitions conference. Crit Care Med 2003;31:1250–6, https://doi.org/10.1097/01.CCM.0000050454.01978.3B.Search in Google Scholar PubMed

7. Singer, M, Deutschman, CS, Seymour, CW, Shankar-Hari, M, Annane, D, Bauer, M, et al.. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 2016;315:801, https://doi.org/10.1001/jama.2016.0287.Search in Google Scholar PubMed PubMed Central

8. Agresti, A, Coull, BA. Approximate is better than “exact” for interval estimation of binomial proportions. Am Statistician 1998;52:119–26, https://doi.org/10.2307/2685469.Search in Google Scholar

9. Deeks, JJ, Altman, DG. Diagnostic tests 4: likelihood ratios. BMJ 2004;329:168–9, https://doi.org/10.1136/bmj.329.7458.168.Search in Google Scholar PubMed PubMed Central

10. Jaeschke, R. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA: J Am Med Assoc 1994;271:703–7, https://doi.org/10.1001/jama.271.9.703.Search in Google Scholar PubMed

11. Koopman, PAR. Confidence intervals for the ratio of two binomial proportions. Biometrics 1984;40:513–7, https://doi.org/10.2307/2531405.Search in Google Scholar

12. Fagan, T. Nomogram for Bayes’s theorem. N Engl J Med 1975;293:257, https://doi.org/10.1056/NEJM197507312930513.Search in Google Scholar

13. Gigerenzer, G, Hoffrage, U. How to improve Bayesian reasoning without instruction – frequency formats [journal article]. Psychol Rev 1995;102:684–704, https://doi.org/10.1037/0033-295x.102.4.684.Search in Google Scholar

14. Gigerenzer, G What are natural frequencies? BMJ (Clinical research ed). 2011;343:d6386, https://doi.org/10.1136/bmj.d6386.Search in Google Scholar

15. Altman, DG, Bland, JM. Statistics Notes: diagnostic tests 3: receiver operating characteristic plots. BMJ 1994;309:188, https://doi.org/10.1136/bmj.309.6948.188.Search in Google Scholar

16. Cho, H, Matthews, GJ, Harel, O. Confidence intervals for the area under the receiver operating characteristic curve in the presence of ignorable missing data. Int Stat Rev 2018;87:152–77, https://doi.org/10.1111/insr.12277.Search in Google Scholar

17. Schuetz, P, Beishuizen, A, Broyles, M, Ferrer, R, Gavazzi, G, Gluck, EH, et al.. Procalcitonin (PCT)-guided antibiotic stewardship: an international experts consensus on optimized clinical use. Clin Chem Lab Med 2019;57:1308–18, https://doi.org/10.1515/cclm-2018-1181.Search in Google Scholar

18. Knottnerus, JA, Muris, JW. Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol 2003;56:1118–28, https://doi.org/10.1016/s0895-4356(03)00206-3.Search in Google Scholar

19. Flahault, A, Cadilhac, M, Thomas, G. Sample size calculation should be performed for design accuracy in diagnostic test studies. J Clin Epidemiol 2005;58:859–62, https://doi.org/10.1016/j.jclinepi.2004.12.009.Search in Google Scholar PubMed

20. Obuchowski, NA, Lieber, ML, Wians, FH. ROC curves in clinical chemistry: uses, misuses, and possible solutions. Clin Chem 2004;50:1118–25, https://doi.org/10.1373/clinchem.2004.031823.Search in Google Scholar PubMed

21. Robin, X, Turck, N, Hainard, A, Tiberti, N, Lisacek, F, Sanchez, JC, et al.. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf 2011;12:77, https://doi.org/10.1186/1471-2105-12-77.Search in Google Scholar PubMed PubMed Central

Received: 2022-03-10

Accepted: 2022-03-10

Published Online: 2022-03-31

Published in Print: 2022-05-25

Statistics in diagnostic medicine

Abstract

Motivating example

Sensitivity and specificity of a diagnostic test

Likelihood ratios

Positive and negative predictive values

Bayes’ theorem

Fagan plot

Communicating predictive probabilities

Receiver operator curves

Sample size estimation

Using R for the calculations

Importing and manipulating data

Calculating diagnostic parameters

Construction of a Fagan plot

Receiver operator curves

Sample size estimation

Acknowledgements

References

Journal and Issue

Articles in the same Issue