Published by De Gruyter, August 4, 2023

A guide to conducting systematic reviews of clinical laboratory tests

  • Andrew C. Don-Wauchope, Karina Rodriguez-Capote, Ramy Samir Assaad, Seema Bhargava and Annalise E. Zemlin

Abstract

Clinical laboratory professionals have an instrumental role in supporting clinical decision making with the optimal use of laboratory testing for screening, risk stratification, diagnostic, prognostic, treatment selection and monitoring of different states of health and disease. Delivering evidence-based laboratory medicine relies on review of available data and literature. The information derived supports many national policies to improve patient care through clinical practice guidelines or best practice recommendations. The quality, validity and bias of this literature are variable. Hence, there is a need to collate similar studies and data and analyse them critically. Systematic review, thus, becomes the most important source of evidence. A systematic review, unlike a scoping or narrative review, requires a thorough understanding of the procedure involved and a stepwise methodology. There are nuances that need some consideration for laboratory medicine systematic reviews. The purpose of this article is to describe the process of performing a systematic review in the field of laboratory medicine, describing the available methodologies, tools and software packages that can be used to facilitate this process.

Introduction

Clinical laboratory professionals have an instrumental role in supporting clinical decision making with the optimal use of laboratory testing for screening, risk stratification, diagnostic, prognostic, treatment selection, and monitoring of different states of health and disease. The quality and consistency of medical practice are increasingly important topics. This is reflected in many national policies to improve patient care using clinical practice guidelines or best practice recommendations. The development of these guidelines requires synthesis of the different scientific studies that build the evidence for clinical practice. A systematic review is an established part of guideline development and is an integral part of understanding the evidence that supports the clinical use of a test [1]. Implementation of laboratory test results into clinical practice requires appraisal of the available evidence. Systematic attempts to record observations in a reproducible, unbiased fashion markedly increase the confidence one can have in knowledge about patient prognosis, the value of diagnostic tests and the efficacy of treatment [2]. The goal of evidence-based laboratory medicine is to improve clinical outcomes in patient care while ensuring the effective use of healthcare resources [3]. The development of the GRADE system [4] for evaluating evidence is an important part of the process for a systematic review, and the application of GRADE scoring to evidence-based laboratory medicine is a key component of clinical practice guideline development.

The scope of laboratory medicine includes a wide range of tests from different pathology disciplines and a variety of reasons for using each test [5]. Diagnostic test accuracy (DTA) should be evaluated for a question related to the specific purpose of a test [6], for example, its efficiency in confirming or excluding a disease, or in stratifying prognosis or treatment response. The guidance given in this review cannot be comprehensive for all applications of laboratory medicine and, thus, focuses on aspects that are more common. This will enable a wider application across the spectrum of laboratory medicine disciplines.

Firstly, the use of the test can be described under a few categories. Diagnostic tests should be evaluated for diagnostic accuracy for the condition(s) where clinical utility in support of making diagnostic decisions is expected. An example is the use of natriuretic peptides for heart failure diagnosis [7, 8]. Monitoring tests can be evaluated for use in the management or monitoring of a disease or process. A well-known example is HbA1c in diabetes management [9]. Predictive tests can be used to predict a variety of outcomes of therapy or other interventions, e.g. companion diagnostics to predict disease-free survival in colon cancer [10]. For early disease diagnosis, the concept of screening tests is applied, e.g. Human Papilloma Virus detection or cytology of cervical samples for early detection of cervical cancer [11]. Tests for the prediction of clinical outcomes are used on their own or in combination with other factors, e.g. the Framingham Risk Score to predict cardiovascular outcome [12].

Secondly, a comparison between two tests is often made to determine which test is better at determining the diagnosis or desired clinical outcome. One of the tests may be accepted as the standard, but there are other possibilities. For example, the test can be evaluated against a defined outcome, often considered a higher level of evidence [13]. In these situations, comparison between tests is not a requirement. Tissue tests from anatomical pathology are often the gold standard for diagnostic comparison in systematic reviews that are not specifically looking at laboratory diagnostic testing. However, the diagnostic accuracy of these gold standard tests has not always been confirmed in outcome studies.

Even though there are several reviews on the topic of how to perform a systematic review for diagnostic testing [6, 14], [15], [16], [17], there is a gap in this form of evidence to support the use of laboratory medicine testing for many tests and diseases. Thus, laboratory professionals should be seeking opportunities to improve the evidence synthesis.

The Cochrane Collaboration formed a DTA working group in the late 1990s and started publishing on-line guides in 1996. The Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests has provided guidance on doing systematic reviews of diagnostic tests on the Cochrane website [18]. A review of the systematic reviews of diagnostic tests published in 2000 pointed out that many systematic reviews did not meet the specified standards [19]. The quality of systematic reviews published since then has been inconsistent and many have not used appropriate methodology [20]; however, some improvement has been noted since the publication of the Standards for Reporting Diagnostic accuracy studies (STARD) guidelines [21].

A helpful development in the systematic review field is the availability of software packages that assist in the process of a systematic review. These include tools to support searching the literature, screening the literature and analysing the data [22]. A buyer's guide to commercial software packages is available at https://www.evidencepartners.com/resources/guides-white-papers/buyers-guide-to-systematic-review-software [23]. There are several websites with tools that are useful in this area. University libraries may have a resource page similar to the one at University College London (https://library-guides.ucl.ac.uk/systematic-reviews/software) [24]. The Systematic Review Toolbox is an on-line catalogue that provides a number of resources (http://systematicreviewtools.com/index.php) [25].

This review is aimed at providing the basic information required to perform a systematic review in laboratory medicine. A recent publication described the process of systematic review in 24 steps [26] and we have adapted this approach to the clinical laboratory and cross-referenced the Cochrane handbook for systematic reviews of diagnostic test accuracy [27].

Setting the scenario for the systematic review

Define the research question

Ideas for research questions arise from many sources, including inquiries received by the clinical laboratory professional. Transforming the idea into a good research question is the first and most critical step when performing a systematic review. Poorly framed questions make it difficult to design the search strategy, decide on inclusion and exclusion criteria, or draw appropriate conclusions from the data. To formulate a good question, a structured framework such as the Population, Intervention (or Index test), Comparator, Outcome, Time and Setting (PICOTS) criteria (Figure 1) should be used [28], [29], [30]. There are several other mnemonics which can be used in place of PICOTS, e.g. SPIDER [31] and SPICE [32], which are better suited to research questions dealing with qualitative analysis, or ECLIPSE [33], which is suited to health policy/management information. A well-structured question using a framework such as PICOTS keeps the focus on the question at hand and helps to prevent diluting the information with interesting, but only semi-related, studies [34].

Figure 1: Graphical description of the systematic review process.

Before initiating a detailed literature search of the defined research question, it is pertinent to ascertain that this research question has not already been successfully dealt with. A preliminary search of the literature will provide an idea of the amount, scope and depth of the literature pertaining to the question. Critically appraising any previously conducted systematic reviews against the defined question will allow the team to establish whether an update or repeat evaluation is required. As of 2002, the National Center for Biotechnology Information (PubMed) database has a systematic reviews search filter so that citations for this type of article can be more easily identified (filter terms meta-analysis and systematic review under type of article). The Cochrane Collaboration [35] has many diagnostic accuracy systematic reviews in its database [36]. Using federated search engines such as ACCESSSS (https://www.accessss.org/) [37] and the Trip Medical Database (https://www.tripdatabase.com/) [38] will help identify systematic reviews related to the specific question. The PROSPERO platform (https://www.crd.york.ac.uk/prospero/) is one registry of systematic reviews that are in progress [39].
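For teams comfortable with scripting, this preliminary check can be partly automated. The following minimal sketch, offered as an illustration only, queries the NCBI E-utilities esearch endpoint to count PubMed records tagged as systematic reviews for a topic; the topic string is hypothetical, and any real search should be refined with a librarian.

```python
# Illustrative sketch: count PubMed systematic reviews on a topic via the
# NCBI E-utilities esearch endpoint. The topic below is hypothetical.
import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def count_systematic_reviews(topic: str) -> int:
    """Return the number of PubMed records tagged as systematic reviews."""
    term = f"({topic}) AND systematic review[Publication Type]"
    url = BASE + "?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json"}
    )
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])

if __name__ == "__main__":
    print(count_systematic_reviews("procalcitonin AND sepsis"))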

If the question has not already been answered, it should now be refined with the knowledge gained from the initial literature search. The refined question should clearly identify the population, the diagnostic test investigated, the reference or comparator test, and the outcome of interest [40]. Other aspects relevant to testing (pre-analytical and post-analytical), such as the timing and the setting of the test, should be considered.

The question may address etiology, diagnosis, prognosis, treatment, or prevention. Some possible areas to focus a question are:

  1. analytical performance of a test e.g. impact of test imprecision or timing on outcome

  2. mode of delivery of a test e.g. use of point-of-care testing in reducing time of diagnosis

  3. use of a test for diagnosis e.g. rule in or rule out

  4. use of a test for prognosis i.e. predicting clinical outcomes

  5. use of a test for treatment selection i.e. personalised medicine

  6. use of a test for treatment optimisation and compliance

Table 1 provides some examples of PICOTS criteria in a selection of systematic reviews. In one example, a pathology group was interested in the use of imprint cytology to examine the diagnostic performance of sentinel node(s) in women with breast cancer, compared to paraffin section examination, to identify women at risk of metastatic breast cancer [41]. They set Population, Intervention (or Index test), Comparator and Outcome (PICO) criteria (see Table 1, row 8). The question formulated was: In patients undergoing breast biopsy, what is the diagnostic performance of imprint cytology compared to paraffin embedded sections for the diagnosis of metastatic disease? [41]

Table 1:

Descriptions from some examples of systematic reviews of the different population, index test, comparator test, outcome, timing and setting.

Smedemark et al. [44]: P: people presenting with symptoms of acute respiratory infections; I: CRP (POCT)-guided antibiotic therapy; C: guideline-based antibiotic administration; O: (1) participant recovery at seven-day follow-up, (2) participant mortality at 28-day follow-up; T: at presentation; S: primary care.

Haraka et al. [76]: P: people being investigated for tuberculosis; I: Xpert MTB/RIF; C: sputum smear microscopy; O: (1) successful treatment outcome, (2) all-cause mortality; T: with sputum collection.

Colli et al. [77]: P: adults with chronic liver disease; I: AFP (cut-off 20 ng/mL), AFP (cut-off 200 ng/mL), US abdomen, AFP + US; O: diagnosis of hepatocellular carcinoma; S: secondary or tertiary care, or surveillance.

Andriolo et al. [42]: P: adults with a diagnosis of sepsis, severe sepsis or septic shock; I: PCT (at least one measurement); C: three possible groups: (i) standard methods to diagnose and stage sepsis, (ii) serum PCT evaluation or PCT-guided therapy, (iii) other biomarkers (e.g. CRP, interleukins, etc.); O: (1) mortality at up to 28 days, (2) time receiving antimicrobial therapy, (3) change in antimicrobial regimen (broad to narrow spectrum); S: intensive care.

Nagar et al. [43]: P: infants ≥34 weeks of gestational age receiving phototherapy or in the post-phototherapy phase; I: transcutaneous bilirubin (TcB); C: total serum bilirubin (TSB); O: agreement statistic between TcB and TSB measurements, provided either as a correlation coefficient or as the mean and standard deviation of the absolute differences in bilirubin values by the two methods; T: during phototherapy or in the post-phototherapy phase.

Sethi et al. [78]: P: patients suspected of ACS; I: s/hsT I/T; C: conventional troponin; O: diagnosis of ACS or AMI; T: at the time of presentation.

Matthaiou et al. [79]: P: critically ill adults with suspected or proven sepsis; I: PCT-guided antibiotic therapy; C: empirical or guideline-based antibiotic administration; O: (1) duration of antibiotic therapy for first episode of infection, (2) mortality at 28 days; T: not defined; S: intensive care.

Tew et al. [41]: P: patients undergoing examination of sentinel nodes during breast biopsy; I: imprint cytology; C: paraffin-embedded sections; O: diagnosis of metastatic disease.

  1. P, population; I, intervention; C, comparator; O, outcome; T, timing; S, setting; CRP, C-reactive protein; POCT, point of care test; MTB, mycobacterium tuberculosis; RIF, resistance to rifampin; AFP, alpha fetoprotein; US, ultrasound; ACS, acute coronary syndrome; s/hsT, sensitive/high sensitivity troponin; AMI, acute myocardial infarct; PCT, serum procalcitonin; TcB, transcutaneous bilirubin; TSB, total serum bilirubin.

When there is only one comparator test and one outcome of interest, the research question is simpler. However, when there are several comparator tests and several outcomes, the question may be single, but the objectives will be several. For example, when a group wanted to evaluate the diagnostic and prognostic value of procalcitonin in patients with sepsis [42], the details (see Table 1, row 4) demonstrate multiple possible combinations of index test and comparator test, and the review reported on five different combinations. The outcomes are also categorised into primary and secondary, and the five comparator groups are assessed against the multiple outcomes [42].

Build a team

Systematic reviews are not solitary endeavours and a team should aim to gather the following range of skills.

  1. Content expert. These team members know the area of medicine of interest. For a review of a diagnostic laboratory test, a mix of clinicians and laboratory physicians or scientists will provide the wide range of expertise needed.

  2. Process expert. These team members know the process and pitfalls of systematic reviews.

  3. Librarian. A professional librarian can be very helpful in designing and performing the initial search for candidate studies.

  4. Statistical experts. A statistician will be valuable in analysing the data and making inferences. If pooling the data using meta-analysis is planned, then statistical advice is essential to guide the appropriate data analysis with consideration of reporting and other bias.

  5. Support. Large reviews will require support staff to help with the acquisition of papers, extraction of data and record keeping. Smaller studies can often be managed by one or two interested investigators.

It is important to know that the team has the expertise and the ability to avoid bringing pre-conceived biases to the project. Additional expert reviewers to support the team may also be helpful and should be consulted at key stages of the process to ensure that the systematic review is conducted to the standards that will allow publication. These people are more likely to be content experts, but the addition of a process or method expert is also helpful.

Define criteria for selection of studies

The research question is now set, and the selection criteria, along with the inclusion and exclusion criteria and the timing and setting components, need to be established. In the examples (Table 1), the population defined is often broad and there could be modulating factors. Hence, it is important to clearly define the inclusion and exclusion criteria to get papers with data that can be summarised and combined for potential meta-analysis. With respect to population, it may be important to define age, sex, health status, and the presence or absence of a specific medical condition. The methodology of the index and reference tests should be defined as clearly as possible. A systematic review of the accuracy of transcutaneous bilirubin in neonates exposed to phototherapy is an example of well formulated inclusion and exclusion criteria for population, intervention and comparator test. The inclusion criteria were primary studies or case series with 20 or more human newborns (≥34 weeks); transcutaneous bilirubin measurement on all subjects; serum or whole blood total bilirubin on all subjects; and a comparison between those two methods [43]. In a systematic review looking at the use of point of care C-reactive protein, several clinical outcomes were well defined. These included the number of participants given an antibiotic prescription at the index consultation; the number of participants given an antibiotic prescription within 28 days; clinical recovery within seven days; mortality within 28 days; the number of participants needing hospital admission within 28 days; and clinical recovery within 28 days. The timing is specified, and the setting is defined as primary care [44].

Create the data collection tool

Planning the data collection helps in establishing the selection criteria as well as defining the search strategy. The data collection tool can be built in several different applications, such as spreadsheets, database applications or specific software such as Covidence (Veritas Health Innovation Ltd, Melbourne, Australia) or DistillerSR (DistillerSR Inc, Ottawa, Canada). Collecting all the details for the PICO components, the inclusion and exclusion criteria, the study design and methods, and the study quality is important. This process should be clearly defined so that it can be used by a wide range of potential data extractors. The tool should be tested on a few of the studies that the team has used to formulate the question.
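As an illustration of what such a tool might capture, the minimal sketch below defines a data extraction record with hypothetical field names; in practice a spreadsheet or a package such as Covidence or DistillerSR serves the same purpose.

```python
# A minimal sketch of a data extraction record for a DTA review.
# Field names are hypothetical; adapt them to the protocol's PICO components.
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class ExtractionRecord:
    study_id: str     # first author and year, e.g. "Smith 2020"
    country: str      # country of origin
    design: str       # e.g. "prospective cohort"
    population: str   # P: population and key inclusion criteria
    index_test: str   # I: index test and method/platform
    comparator: str   # C: reference standard or comparator test
    outcome: str      # O: outcome definition
    tp: int           # 2 x 2 counts where reported
    fp: int
    fn: int
    tn: int

def write_records(path: str, records: list[ExtractionRecord]) -> None:
    """Dump the extracted records to CSV for later checking and analysis."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fl.name for fl in fields(ExtractionRecord)]
        )
        writer.writeheader()
        writer.writerows(asdict(r) for r in records)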

Design the search strategy

The components of the PICO question define the key words for an effective search strategy.

There are several approaches to locating candidate papers [45], including electronic searching of publication databases [46], searching the reference lists of relevant papers, and communicating with well-known authors in the field of interest. The so-called ‘grey literature’ includes, but is not limited to, non-peer reviewed publications (conference proceedings, letters to the editor), product monographs, presentations, and personal communications. In the field of laboratory medicine and diagnostic testing, this ‘grey literature’ may well form a significant source of appropriate information, and a decision to use or exclude it must be made early on [47].

Searching the databases of published papers to locate the candidate papers of interest usually requires the assistance of a librarian. There is no single search term that can be used to denote diagnostic studies [45], and several trial searches may be necessary to refine the search terms. Appropriate selection of the key words and use of the Boolean operators (AND and OR) will ensure that the search is wide enough to capture all relevant studies. Terms that are too narrow may exclude appropriate studies, whereas terms that are too general will yield large numbers of irrelevant papers. The most widely used databases, PubMed, MEDLINE, EMBASE, and Google Scholar, have significant overlap, but there are papers that may be missed by using only one database [35].
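As an illustration, a PubMed-style Boolean search for the point-of-care CRP question in Table 1 might combine subject headings with free-text synonyms as below; the exact terms are hypothetical and should be developed and refined with a librarian.

```
("C-Reactive Protein"[MeSH Terms] OR "CRP"[Title/Abstract])
AND ("Point-of-Care Testing"[MeSH Terms] OR "point of care"[Title/Abstract])
AND ("Respiratory Tract Infections"[MeSH Terms] OR "respiratory infection*"[Title/Abstract])
```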

Getting the systematic review started

Document and register the study protocol

A well written protocol is helpful for the investigators as well as for registration with a database that records the systematic reviews that are being conducted. The PRISMA-P guidelines (https://prisma-statement.org/Extensions/Protocols) [48] provide a list of items that should be included in the review protocol [49]. To ensure clarity, the Cochrane training guide recommends using experts to review the protocol to make sure that it is understandable and complete. Registering the protocol with a platform such as PROSPERO (https://www.crd.york.ac.uk/prospero/) [50] is important [39].

Perform the literature search and build the database

Using the criteria defined for the PICO question, each of the selected databases must be searched. A search strategy must be defined for each database, as these are not identical. The search results should be saved in a file format that is compatible with the selected software. Import the saved files into a software application such as Covidence, DistillerSR or Rayyan (Rayyan Systems, Inc, Cambridge MA, USA) or into a reference manager software such as EndNote™ (Clarivate PLC, London, UK) or Zotero (Corporation for Digital Scholarship, Vienna, VA, USA). Each package will require a specific file format, and this should be included in the study protocol. The software applications provide instructions about exporting the files in the correct format from the searchable databases for import.

Identify and remove duplicates

The use of multiple databases may result in replication of the citations imported into the software application for managing the references; a defined and systematic approach to the removal of duplicates will save time [51].
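A minimal sketch of such a strategy, assuming the citations have already been parsed into dictionaries with hypothetical doi and title keys, is to match first on DOI and fall back to a normalised title:

```python
# Illustrative duplicate removal across database exports. Assumes citations
# are dicts with (hypothetical) "doi" and "title" keys.
import re

def _norm_title(title: str) -> str:
    """Lowercase a title and strip punctuation/whitespace for fuzzy matching."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(citations: list[dict]) -> list[dict]:
    """Keep the first occurrence of each citation, matching on DOI, else title."""
    seen: set[str] = set()
    unique = []
    for c in citations:
        key = (c.get("doi") or "").lower() or _norm_title(c.get("title", ""))
        if key and key in seen:
            continue  # duplicate pulled in from a second database
        seen.add(key)
        unique.append(c)
    return unique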

Reviewing the evidence

Title and abstract review

The vast majority of the comprehensively collected studies can be excluded by applying the inclusion and exclusion criteria in a title and abstract screen. A screening questionnaire that itemises the inclusion and exclusion questions is a good tool for documenting this phase. This tool should be tested on a few articles prior to full implementation. Further screening of the full text of the remaining articles will identify those papers that meet the criteria for the review. The screening phase should be conducted by two independent reviewers to increase reliability. The title and abstract should be screened simultaneously; if only the title is available, the article should move forward to the next phase of screening unless it clearly meets the exclusion criteria. Software packages that facilitate this have been mentioned in the introduction, and artificial intelligence (AI) assisted learning applications such as ASReview (https://asreview.nl/) [52, 53] are available to assist with sorting the references into those more likely to meet the selection criteria. All screened articles that are selected by both primary reviewers proceed to the full text review. Discrepancies can be adjudicated by a third reviewer who should be an expert in the field. Alternatively, the two reviewers can meet and agree on the final decision.
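The reliability of dual screening can be quantified with an agreement statistic. The sketch below computes Cohen's kappa on hypothetical include/exclude decisions from two screeners; many screening tools report this automatically.

```python
# Illustrative Cohen's kappa for two screeners' include (True) / exclude
# (False) decisions; the decision lists below are hypothetical.
def cohens_kappa(rater1: list[bool], rater2: list[bool]) -> float:
    """Agreement beyond chance between two raters' dichotomous calls."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    p1, p2 = sum(rater1) / n, sum(rater2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)   # chance agreement
    return (observed - expected) / (1 - expected)

r1 = [True, True, False, False, True, False, True, False, False, True]
r2 = [True, False, False, False, True, False, True, False, True, True]
print(f"kappa = {cohens_kappa(r1, r2):.2f}")  # discrepancies go to adjudication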

Collect full texts using predefined collection criteria

All the relevant manuscripts can be collected in the reference manager or via other software applications. A librarian and library access to electronic archives of published manuscripts facilitate this. Some articles require the assistance of library resources to locate the full text through interlibrary loan. In some instances, authors may need to be contacted to obtain a copy of the full text. The screening questionnaire designed for title and abstract review can be enhanced with the additional details expected to be found in the full text for the inclusion and exclusion criteria. The enhanced questionnaire can be documented in the software package to facilitate the selection of articles that meet the criteria. This phase of screening is also done by two independent reviewers, with discrepancies managed as described previously. Studies that meet the criteria move forward to the data collection phase, while those that are excluded by both reviewers are removed. The reasons for exclusion are documented for the report.

Identify grey literature with the expert reviewers

The expert reviewers (members of the systematic review team) may be able to identify ongoing research and grey literature that is not easily identified in the standard databases. They would also be able to review the collected set of manuscripts to ensure that no important studies are missed and that the best report is included in the final data set. Additional references should be placed into a data set and run through the process from Title and Abstract review.

Confirm the completeness of the literature collection

Once the updated list is available, it is helpful to look for publications that are either cited in, or that cite, the included studies. Although these may have already been through the screening process, this is a good check for the completeness of the data set. Using a citation database can facilitate this action, as many of these now include citation data, e.g. Scopus and Web of Science. Previous reviews and systematic reviews should also be checked for references to make sure that these have been put through the screening process. Any new manuscript identified needs to be fully screened from title and abstract review.

Identify the included literature selection

The final list of manuscripts should now be defined. Each stage of the screening process should be documented in a flow chart that describes the different inclusion and exclusion stages, the number of manuscripts reviewed at each stage and the reasons for exclusion. This is an important part of any published systematic review. Figure 2 is an example of a flow chart.

Figure 2: An example of a flow chart. Used with permission from Smedemark SA, Aabenhus R, Llor C, Fournaise A, Olsen O, Jørgensen KJ. Biomarkers as point-of-care tests to guide prescription of antibiotics in people with acute respiratory infections in primary care. Cochrane Db Syst Rev. 2022;2022(10):CD010130. Copyright © 2019 The Cochrane Collaboration. Published by John Wiley & Sons, Ltd.

Processing the evidence

Data extraction

Extraction of the data is best done using the pre-determined series of questions in the data collection tool. Easily retrievable information such as country of origin, year of publication, and details of study design and recruitment can be extracted by content experts as well as non-experts (students, support personnel), whereas detailed content-specific information is best extracted by either content or process experts on the team. Each item should be independently extracted by two people and then matched. Discrepancies should be resolved by consensus or by a third person. Data should be collected in a consistent way, which may include the need to convert units of measurement and maintain consistent definitions, abbreviations, etc. The use of experts to guide the collection of data is particularly important if there are known analytical method differences. The content should be collected in a way that facilitates the analysis.
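Matching the two independent extractions can be as simple as a field-by-field comparison. The sketch below, using hypothetical records and field names, lists the fields that need consensus or third-party adjudication.

```python
# Illustrative reconciliation of double data extraction: list the fields on
# which two extractors disagree. Records and field names are hypothetical.
def discrepancies(rec_a: dict, rec_b: dict) -> list[str]:
    """Return the fields where two independent extractions differ."""
    return [k for k in rec_a if rec_a.get(k) != rec_b.get(k)]

a = {"study_id": "Smith 2020", "tp": 90, "fp": 20, "fn": 10, "tn": 180}
b = {"study_id": "Smith 2020", "tp": 90, "fp": 22, "fn": 10, "tn": 180}
print(discrepancies(a, b))  # ['fp'] -> resolve by consensus or a third reviewer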

Risk of bias evaluation for individual studies

The oft quoted phrase of “garbage in, garbage out” applies equally to systematic reviews. Grouping and pooling data from several manuscripts can only be done if the quality of the underlying papers is considered in the appropriate context. The methodological quality of each paper selected for inclusion is critically appraised for potential sources of bias and variation that may have influenced its estimates of test accuracy. The applicability to the review question is concurrently evaluated. There are guidelines and tools, such as the Quality Assessment tool for Diagnostic Accuracy Studies (QUADAS/QUADAS-2) [54, 55], that are useful in designing the assessment of quality.

Ideally, studies evaluating diagnostic tests should report the characteristics of the study in enough detail that readers can evaluate the quality of the data and conclusions. The STARD criteria provide a checklist of 25 items, which greatly improves the quality of studies evaluating diagnostic tests [56]. STARD was updated in 2015 with five additional criteria, bringing the total to 30, along with some revisions to other criteria [57]. However, poor reporting of original diagnostic test accuracy research remains a challenge [20].

The QUADAS-2 is the current version of the QUADAS tool, developed by Whiting et al. [55]. This tool consists of a checklist of questions or items structured into four key domains, each rated in terms of risk of bias and applicability to the research question:

  1. Patient selection

  2. Index test(s)

  3. Reference standard

  4. Flow and timing

Each question is answered as yes, no, or unclear. Depending on the diagnostic test under evaluation, some of the QUADAS questions will be more relevant than others in assessing study quality. It is not necessary to use all the questions, only those applicable to the study. In a systematic review one should, however, use the same questions on all included papers. An overall or cumulative score is not recommended [58]. Table 2 is a previously published example of a QUADAS-2 table. Since 2011, QUADAS-2 has been adopted widely and applied in reviews of diagnostic accuracy. Some concerns have been raised regarding QUADAS-2. Schueler et al. indicated a limitation associated with calculating inter-rater agreement only on the domain questions [59]. Cook et al. felt that the tool was not able to discriminate between poorly and strongly designed studies [60], and that the QUADAS-2 offered no obvious advantage over the original 14-item QUADAS [54]. They further criticised the purposively qualitative nature of the QUADAS-2, which does not recommend scoring a study using a numeric value, a fundamental quality of assessment scales [60].

Table 2:

Example of a published QUADAS-2 table.

Study Patient selection Index test Reference standard Flow and timing
Risk of bias Concerns about applicability Risk of bias Concerns about applicability Risk of bias Concerns about applicability Risk of bias
Catalona 2003 High Unclear Unclear Unclear Unclear Unclear Unclear
Catalona 2011 Unclear Low Unclear Low Unclear Low Low
Filella 2014 Unclear Low Low Low Unclear Unclear Low
Guazzoni 2011 Low Low Low Low Low Low Low
Ito 2013 Low Low Unclear Low Unclear Unclear Low
Jansen 2010 Unclear Low Unclear Low Unclear Unclear Low
Khan 2003 Unclear Low High Unclear Unclear Unclear Low
Lazzeri 2012 Unclear Low Low Low Low Low Low
Lazzeri 2013 Low Low Low Low Unclear Low Low
Le 2010 Low Low Unclear Low Low Low Low
Maerini 2014 Low Low Unclear Low Unclear Low Low
Mikolajczyk 2004 High Unclear High Unclear Low Unclear Unclear
Miyakubo 2011 Unclear Low Unclear Low Unclear Low Unclear
Ng 2014 High Unclear Low Low Low Low Low
Sokoll 2008 High Unclear Unclear Low Unclear Low Low
Sokoll 2010 Low Low Unclear Unclear Unclear Unclear Low
Stephan 2009 Low Low High Unclear Unclear Low Low
  1. Used with permission from V. Pecoraro, L. Roli, M. Plebani, T. Trenti, Clinical utility of the (-2)proPSA and evaluation of the evidence: a systematic review. Clin Chem Lab Med 2016;54:1123–32. https://doi.org/10.1515/cclm-2015-0876. Supplementary Table 2: Methodological quality assessment of included studies.

There are also adaptations of the tool for other types of study such as QUADAS-C for comparative diagnostic accuracy studies [61] and QUAPAS for prognostic accuracy studies [62].

Regardless of the tool used, the risk of bias and applicability to the research question should be clearly reported for each included study, summarised, and discussed in the systematic review report. The tool used should be clearly reported. Authors need to define in the protocol what quality items they will include and how these items will be used specific to their review. Assessments should be undertaken by at least two authors, and there should be an explicit procedure to resolve disagreements. Table 3 describes the different types of bias that should be considered.

Table 3:

Types of bias.

Types of bias Example
Population selection bias Recruiting online, excluding those without internet access
Statistical bias Poor study design
Verification/performance bias No gold standard/differences in care
Citation bias Positive results more likely to be cited
Publication bias Positive results more likely to be published
Language bias Studies more likely to be published in English/reviewers more likely to use studies published in their own language
Database bias Some databases more likely to index certain languages or journals
Truncation bias Study published in shorter form with fewer details
Time-lag bias Delayed publication of results leading to outdated evidence
  1. Adapted from Rothstein HR, Sutton AJ and Borenstein M (2005). Chapter 1. Publication bias in meta-analysis (pp 1–7). In: Publication bias in meta-analysis: prevention, assessment and adjustments. doi:10.1002/0470870168.ch1.

Data analysis

Studies of diagnostic tests involve comparing the test result in groups of patients with and without the outcome of interest. The outcome of interest is determined by an independent reference or “gold standard” test. In this context, the test results are classified into dichotomous groups based on expected performance for diagnosis. Systematic review allows for the development of more widely applicable 2 × 2 tables and the calculation of false positives (FP), true positives (TP), false negatives (FN), true negatives (TN), positive predictive values (PPV), negative predictive values (NPV), likelihood ratios (LR), diagnostic odds ratios (OR) and receiver operating characteristic (ROC) curves.

Sensitivity refers to the proportion of individuals with the outcome who return a positive test (TP/(TP + FN)). Specificity refers to the proportion of individuals without the outcome of interest who return a negative test (TN/(TN + FP)).
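All of these quantities follow directly from the 2 × 2 counts. The sketch below, using hypothetical counts, computes the accuracy metrics listed above:

```python
# Illustrative 2 x 2 accuracy metrics; the counts passed in are hypothetical.
def dta_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    sens = tp / (tp + fn)               # sensitivity
    spec = tn / (tn + fp)               # specificity
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "lr_pos": sens / (1 - spec),    # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,    # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),   # diagnostic odds ratio
    }

print(dta_metrics(tp=90, fp=20, fn=10, tn=180))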

Likelihood ratios refer to the ability of positive and negative tests to rule in or rule out the outcome of interest. They describe how many more times the test result is likely to be found in patients with the outcome than in patients without the outcome of interest. Ratios greater than 10 or less than 0.1 provide convincing evidence, and those greater than 5 or less than 0.2 provide strong evidence. Ratios between 0.2 and 5 provide only equivocal evidence. A convenient nomogram by Fagan [63], which converts pre-test probability and likelihood ratio into post-test probability, has been reprinted in numerous publications and textbooks.
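The arithmetic behind the nomogram is simple: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. A minimal sketch with hypothetical numbers:

```python
# Illustrative post-test probability calculation (the logic of the Fagan
# nomogram); the pre-test probability and likelihood ratio are hypothetical.
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * likelihood_ratio          # apply the LR
    return post_odds / (1 + post_odds)               # odds -> probability

# A positive test with LR+ = 10 in a patient with a 20% pre-test probability:
print(f"{post_test_probability(0.20, 10):.2f}")  # 0.71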

In the context of using laboratory tests for screening, prognosis, and monitoring of disease, the studied outcomes are more complicated, especially those evaluating predictive health outcomes. In such studies, other calculations and metrics are used, such as overall survival (OS), disease free survival (DFS), hazard ratio (HR), odds ratio (OR) and relative risk reduction (RRR). There may be health economic data that could be captured, for example, cost-effectiveness and admission and/or re-admission rates.

Decide about the appropriateness of meta analysis

One of the important decisions is whether to combine the data and perform a meta-analysis. Meta-analysis is a statistical technique for combining the results of independent studies to produce a summary statistic [64]. The most important factor in deciding whether to combine studies is the heterogeneity between the studies. If the studies are very similar with respect to patient characteristics, study design, reference test, and outcomes, it makes sense to combine them. Hatala et al. provided an easy to read guide on how to approach combining studies [65]. Although a larger number of studies reflects wider interest in the test of concern, as few as two studies are enough to perform a meta-analysis. There are key criteria that should be met for a meta-analysis [66]. All the individual study outcomes need to be comparable and described in a format that allows meaningful pooling. The interventions and comparators need to be similar enough, and the data need to be reported adequately for pooling. The details of meta-analysis techniques are beyond the scope of this manuscript. However, in a DTA meta-analysis, for example, there are several approaches, including the following [67]:

  1. Univariate logistic regression without and then with random effects

  2. Diagnostic odds ratio (OR) to give a combined measure of sensitivity and specificity.

  3. Bivariate meta-analysis to ascertain within-study variability using the binomial likelihood model

  4. Summary receiver operating characteristics (sROC) curves derived using regression models

In addition, there are modified forms of meta-analysis, such as the random-effects model or meta-regression [66, 68]. There are statistical packages and tools to perform the meta-analyses [69], [70], [71].
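To make the mechanics concrete, the sketch below pools log diagnostic odds ratios from three hypothetical studies using a DerSimonian-Laird random-effects model; a real review would rely on a validated statistical package and, for DTA data, typically a bivariate model.

```python
# Illustrative DerSimonian-Laird random-effects pooling of log diagnostic
# odds ratios. The per-study (tp, fp, fn, tn) counts are hypothetical.
import math

studies = [(45, 10, 5, 90), (30, 8, 12, 70), (60, 15, 10, 115)]

y, v = [], []  # per-study log DOR and its variance
for tp, fp, fn, tn in studies:
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))  # continuity correction
    y.append(math.log((tp * tn) / (fp * fn)))
    v.append(1 / tp + 1 / fp + 1 / fn + 1 / tn)

w = [1 / vi for vi in v]                                  # fixed-effect weights
ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))    # Cochran's Q
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)                   # between-study variance
w_re = [1 / (vi + tau2) for vi in v]                      # random-effects weights
pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
se = math.sqrt(1 / sum(w_re))
print(f"pooled DOR = {math.exp(pooled):.1f} "
      f"(95% CI {math.exp(pooled - 1.96 * se):.1f} to {math.exp(pooled + 1.96 * se):.1f})")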

Thus, applying the right statistical tools and packages to the data of a systematic review can be a complicated process. This is where the team statistician will be required to guide the appropriate use of methods.

Describing the evidence

Report the findings

The report should describe the overall literature search, the populations included, the interventions or investigations used, the comparators and the outcomes. The types of studies should be reported with their key characteristics and tabulated into sections that help the reader find relevant summary material. This should include any pertinent summary statistics such as OR, hazard ratios (HR), relative risk (RR), diagnostic accuracy, etc. Where 2 × 2 data are available, the relevant statistics should be reported. Figure 3 demonstrates a forest plot, another common format for reporting summary findings.

Figure 3: A description of a forest plot. The explanations in text boxes on the diagram explain the features of a forest plot.

Consider the impact of reporting bias

Reporting bias is prevalent because more studies with positive findings are published than those with negative findings. This selection bias derives from both investigators and journals for a range of different reasons. The Cochrane handbook and the GRADE handbook describe this in more detail. To mitigate this, systematic review investigators need to look for grey literature and unpublished studies. The use of a funnel plot (Figure 4) is one way of assessing the potential for publication bias, with either a visual assessment or the use of the Egger test [72].
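The Egger test is, in essence, a regression of each study's standardised effect on its precision; an intercept far from zero suggests funnel-plot asymmetry from small-study effects. A minimal sketch with hypothetical effect sizes:

```python
# Illustrative Egger regression: standardised effect (effect/SE) on precision
# (1/SE). The log odds ratios and standard errors below are hypothetical.
import math

log_or = [1.10, 0.85, 1.40, 0.60, 1.80, 0.95]   # per-study effect estimates
se     = [0.20, 0.25, 0.40, 0.15, 0.55, 0.30]   # their standard errors

x = [1 / s for s in se]                          # precision
y = [e / s for e, s in zip(log_or, se)]          # standardised effect

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in resid) / (n - 2)        # residual variance
se_int = math.sqrt(s2 * (1 / n + xbar ** 2 / sxx))
print(f"Egger intercept = {intercept:.2f} (t = {intercept / se_int:.2f})")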

Figure 4: Hypothetical funnel plots with descriptions. Modified from: Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments. Edited by H.R. Rothstein, A.J. Sutton and M. Borenstein 2005 John Wiley & Sons, Ltd ISBN: 0-470-87014-1. SE, standard error; OR, odds ratio.

A funnel plot assumes that studies with high precision will be plotted near the average and studies with low precision will spread evenly on both sides of the average, creating an inverted funnel shape. A visually asymmetric funnel may reveal that studies with lower precision have a greater effect (better diagnostic performance) than higher precision studies, denoting what is known as the “small study effect”, with reporting bias often the main reason behind it. In Figure 4, open circles indicate smaller studies of inadequate quality with weaker evidence for effects: (a) symmetrical plot in the absence of bias; (b) asymmetrical plot in the presence of publication bias (smaller studies showing no beneficial effects are missing); (c) asymmetrical plot in the presence of bias due to low methodological quality of smaller studies, whose results are biased towards larger beneficial effects.

The purpose of checking for reporting bias is to discuss the potential impact of the different factors that may lead to the bias and how it may impact the overall evaluation of the quality of the evidence reported in the systematic review.

Investigate heterogeneity in the results

Heterogeneity is the variability between results from different studies. Too much heterogeneity indicates that meta-analysis may be contra-indicated [66]. An assessment of the inconsistency of results is an important step. The reasons behind the heterogeneity of results can be explored with techniques such as sub-group analysis and/or stratification using different possible effect modifiers. There are statistical models that can be used to explore heterogeneity, and these are described in the Cochrane handbook. The heterogeneity statistic I² can be obtained and reported as an indicator of heterogeneity. The GRADE handbook provides a rule-of-thumb interpretation [73]. All the statistical approaches have limitations, and the key task here is to review the heterogeneity, try to understand the underlying factors and discuss them in the report.
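As an illustration, I² can be derived from Cochran's Q. The sketch below uses hypothetical per-study effects and variances:

```python
# Illustrative I^2 from Cochran's Q; the effects and variances are hypothetical.
def i_squared(effects: list[float], variances: list[float]) -> float:
    """I^2 = max(0, (Q - df) / Q), expressed as a percentage."""
    w = [1 / v for v in variances]   # inverse-variance weights
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

print(f"I^2 = {i_squared([1.1, 0.9, 1.6, 0.4], [0.04, 0.06, 0.10, 0.05]):.0f}%")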

Grade the evidence

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) working group (https://www.gradeworkinggroup.org/) [4] has developed a handbook (https://gdt.gradepro.org/app/handbook/handbook.html#h.f7lc8w9c3nh8) [73] that provides the basis of grading recommendations made in clinical practice guidelines and systematic reviews. One of the purposes of a systematic review is to synthesise the knowledge from primary studies, and this synthesised knowledge is fundamental to the recommendations that will guide clinical practice. Commenting on the factors that feed into the GRADE system (study design, risk of bias, indirectness, inconsistency, imprecision, and publication bias) should all be done as part of the systematic review. The GRADE handbook does point out that the methodology used for primary studies in the diagnostic testing realm is less robust, because there are fewer randomised controlled trials and more observational studies, but in principle the criteria are applicable. Table 4 and Figure 5 describe the GRADE evaluation and the up- and downgrading factors. Consistency of the results and the confidence limits around the estimate of effect are important factors.

Table 4:

Factors that impact the quality of evidence for diagnostic test accuracy (DTA) studies.

Factors affecting the quality of evidence Application for DTA studies
Study design Unlike intervention studies, diagnostic accuracy cross-sectional or cohort studies directly comparing subjects' test results with the available “gold” standard diagnostic strategy are considered high quality evidence (unless affected by other factors).
Risk of bias (limitations in study design and execution) Evaluated differently than for intervention studies by examining:
  1. Representativeness of the target population.

  2. Independent comparison with the gold standard test strategy

  3. Performance of the tested and reference tests for all subjects.

  4. Provision of diagnostic uncertainty.

  5. Ability of the reference standard to correctly classify the target condition.

Indirectness This refers to lowering quality of evidence due to:
  1. Any difference between the studied population and the population for whom the test is recommended.

  2. Difference in the test itself and its performance in the study and actual practice.

  3. If tests are not directly compared in the same study, but each to the “gold” standard in different studies.

  4. Absence of direct evidence about impact on patient outcomes.

Important inconsistency in study results Regarding sensitivity, specificity or likelihood ratios; especially if inconsistency is unexplained.
Imprecise evidence Referring to wide confidence intervals for estimates of test accuracy.
High probability of publication bias Referring to small studies effect or asymmetry in a funnel plot.
Upgrading for dose effect, large effects, and residual plausible bias There is disagreement about whether and how dose effects play a role in assessing the quality of evidence in DTA studies, and methods have not been properly developed.
  1. Adapted from the GRADE Handbook, Table 7.2. Handbook for grading the quality of evidence and the strength of recommendations using the GRADE approach (updated October 2013). Holger Schünemann, Jan Brożek, Gordon Guyatt, and Andrew Oxman [73].

Figure 5: Overview of the systematic review process with the grading of recommendations assessment, development and evaluation (GRADE) approach. This figure is adapted from Figure 2 in the GRADE Handbook [73]. Initially, a systematic review group is formed. This is followed by formulating questions, selecting and prioritising outcomes, systematically summarising the evidence base, and producing a GRADE evidence profile or summary of findings table presenting the pooled estimates and certainty of evidence for each outcome. The systematic review team can rate the quality of evidence and describe the different factors that down- or upgrade the quality of evidence rating. When using GRADE for studies on diagnostic test accuracy or prognostic factors, observational studies start as high certainty evidence; when using certain tools (e.g. the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool) to assess the risk of bias in non-randomised (observational) studies, certainty of evidence starts as high and is subsequently rated down accordingly. PICOTS, population, intervention, comparator, outcomes, time, and setting.

Finalise the report and submit a manuscript for publication

Updating the contents of the systematic review prior to submission for publication is important. New publications may have appeared in the literature while the team has been working on the data extraction and analysis. Thus, an updated search using the initial criteria is important to locate and add any new publications.

Systematic reviews in laboratory medicine can be found in several places in the published literature and in organization reports. Most laboratory medicine journals will consider systematic reviews for publication, as will many other journals. There is a good description of the requirements for publication in the Annals of Internal Medicine [6].

To successfully publish a systematic review, it is worth considering the guidance offered by the STARD 2015 initiative [57] when assessing the original papers, along with a quality evaluation along the lines of the QUADAS-2 tool [54, 55]. A commentary on reporting measures of accuracy is also helpful in preparing manuscripts [74]. The PRISMA checklist [49] or, more specifically, PRISMA-DTA [75] is an important guide to the structure of the report that should be checked prior to submission to a journal.

There are other evaluative tools available, and these are useful guides to consider in performing a systematic review and in writing up the report. These include the Centre for Evidence Based Medicine (CEBM) in Oxford and the Cochrane Collaboration [30], which provide a structure for conducting systematic reviews and for the reports published in the Cochrane database.

Conclusions

In conclusion, systematic reviews of laboratory tests are required for evidence based laboratory medicine. There are established techniques to follow to undertake this type of research and publication is possible. This paper has described the process and will assist investigators in planning these types of studies.


Corresponding author: Dr. Andrew C. Don-Wauchope, Department of Pathology and Molecular Medicine, Faculty of Health Sciences, McMaster University, 1200 Main Street West, Hamilton, ON L8N 3Z5, Canada, E-mail:

Acknowledgments

The authors thank Dr. Owen Wiese for Figure 3 and Rachel Don-Wauchope for Figures 1, 4, and 5.

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Competing interests: Authors state no conflict of interest.

  5. Research funding: None declared.

References

1. Oosterhuis, WP, Bruns, DE, Watine, J, Sandberg, S, Horvath, AR. Evidence-based guidelines in laboratory medicine: principles and methods. Clin Chem 2004;50:806–18. https://doi.org/10.1373/clinchem.2003.025528.Search in Google Scholar PubMed

2. Guyatt, G, Cairns, J, Churchill, D, Cook, D, Haynes, B, Hirsh, J, et al.. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA 1992;268:2420–5. https://doi.org/10.1001/jama.1992.03490170092032.Search in Google Scholar PubMed

3. McQueen, MJ. Overview of evidence-based medicine: challenges for evidence-based laboratory medicine. Clin Chem 2001;47:1536–46. https://doi.org/10.1093/clinchem/47.8.1536.Search in Google Scholar

4. GRADE Working group. https://www.gradeworkinggroup.org/ [Accessed 24 Jul 2023].Search in Google Scholar

5. Whiting, P, Toerien, M, de Salis, I, Sterne, JAC, Dieppe, P, Egger, M, et al.. A review identifies and classifies reasons for ordering diagnostic tests. J Clin Epidemiol 2007;60:981–9. https://doi.org/10.1016/j.jclinepi.2007.01.012.Search in Google Scholar PubMed

6. Leeflang, MMG, Deeks, JJ, Gatsonis, C, Bossuyt, PMM, Group, CDTAW. Systematic reviews of diagnostic test accuracy. Ann Intern Med 2008;149:889–97. https://doi.org/10.7326/0003-4819-149-12-200812160-00008.Search in Google Scholar PubMed PubMed Central

7. Hill, SA, Booth, RA, Santaguida, PL, Don-Wauchope, A, Brown, JA, Oremus, M, et al.. Use of BNP and NT-proBNP for the diagnosis of heart failure in the emergency department: a systematic review of the evidence. Heart Fail Rev 2014;19:421–38. https://doi.org/10.1007/s10741-014-9447-6.Search in Google Scholar PubMed

8. Booth, RA, Hill, SA, Don-Wauchope, A, Santaguida, PL, Oremus, M, McKelvie, R, et al.. Performance of BNP and NT-proBNP for diagnosis of heart failure in primary care patients: a systematic review. Heart Fail Rev 2014;19:439–51. https://doi.org/10.1007/s10741-014-9445-8.Search in Google Scholar PubMed

9. Sacks, DB, Arnold, M, Bakris, GL, Bruns, DE, Horvath, AR, Kirkman, MS, et al.. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Clin Chem 2011;57:e1–47. https://doi.org/10.2337/dc11-9998.Search in Google Scholar PubMed PubMed Central

10. Formica, V, Sera, F, Cremolini, C, Riondino, S, Morelli, C, Arkenau, H-T, et al.. KRAS and BRAF mutations in stage II/III colon cancer: a systematic review and meta-analysis. JNCI J Natl Cancer Inst 2021;114:djab190.10.1093/jnci/djab190Search in Google Scholar PubMed PubMed Central

11. Koliopoulos, G, Nyaga, VN, Santesso, N, Bryant, A, Martin-Hirsch, PPL, Mustafa, RA, et al.. Cytology vs. HPV testing for cervical cancer screening in the general population. Cochrane Database Syst Rev 2017;2018:CD008587. https://doi.org/10.1002/14651858.cd008587.pub2.Search in Google Scholar PubMed PubMed Central

12. Eichler, K, Puhan, MA, Steurer, J, Bachmann, LM. Prediction of first coronary events with the Framingham score: a systematic review. Am Heart J 2007;153:722–31.e8. https://doi.org/10.1016/j.ahj.2007.02.027.Search in Google Scholar PubMed

13. de Bruel, AV, Cleemput, I, Aertgeerts, B, Ramaekers, D, Buntinx, F. The evaluation of diagnostic tests: evidence on technical and diagnostic accuracy, impact on patient outcome and cost-effectiveness is needed. J Clin Epidemiol 2007;60:1116–22. https://doi.org/10.1016/j.jclinepi.2007.03.015.Search in Google Scholar PubMed

14. Devillé, WL, Buntinx, F, Bouter, LM, Montori, VM, de Vet, HCW, van der Windt, DAWM, et al.. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002;2:9. https://doi.org/10.1186/1471-2288-2-9.Search in Google Scholar PubMed PubMed Central

15. Horvath, AR, Pewsner, D. Systematic reviews in laboratory medicine: principles, processes and practical considerations. Clin Chim Acta 2004;342:23–39. https://doi.org/10.1016/j.cccn.2003.12.015.Search in Google Scholar PubMed

16. Reitsma, JB, Moons, KGM, Bossuyt, PMM, Linnet, K. Systematic reviews of studies quantifying the accuracy of diagnostic tests and markers. Clin Chem 2012;58:1534–45. https://doi.org/10.1373/clinchem.2012.182568.Search in Google Scholar PubMed

17. Tatsioni, A, Zarin, DA, Aronson, N, Samson, DJ, Flamm, CR, Schmid, C, et al.. Challenges in systematic reviews of diagnostic technologies. Ann Intern Med 2005;142:1048–55. https://doi.org/10.7326/0003-4819-142-12_part_2-200506211-00004.Search in Google Scholar PubMed

18. Deeks, JJ, Bossuyt, PM, Gatsonis C, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy: The Cochrane Collaboration; 2013. Available from: https://methods.cochrane.org/sdt/handbook-dta-reviews.Search in Google Scholar

19. Oosterhuis, WP, Niessen, RW, Bossuyt, PM. The science of systematic reviewing studies of diagnostic tests. Clin Chem Lab Med 2000;38:577–88. https://doi.org/10.1515/cclm.2000.084.Search in Google Scholar

20. Jang, M-A, Kim, B, Lee, YK. Reporting quality of diagnostic accuracy studies in laboratory medicine: adherence to standards for reporting of diagnostic accuracy studies (STARD) 2015. Ann Lab Med 2020;40:245–52. https://doi.org/10.3343/alm.2020.40.3.245.Search in Google Scholar PubMed PubMed Central

21. Korevaar, DA, van Enst, WA, Spijker, R, Bossuyt, PMM, Hooft, L. Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Évid Base Med 2014;19:47. https://doi.org/10.1136/eb-2013-101637.Search in Google Scholar PubMed

22. van Dinter, R, Tekinerdogan, B, Catal, C. Automation of systematic literature reviews: a systematic literature review. Inf Software Technol 2021;136:106589. https://doi.org/10.1016/j.infsof.2021.106589.Search in Google Scholar

23. The Buyer’s guide to systematic review software. https://www.evidencepartners.com/resources/guides-white-papers/buyers-guide-to-systematic-review-software [Accessed 23 Jul 2023].Search in Google Scholar

24. Software for systematic reviews. https://library-guides.ucl.ac.uk/systematic-reviews/software [Accessed 23 Jul 2023].

25. Marshall, C, Sutton, A, O’Keefe, H, Johnson, E. The systematic review toolbox; 2022. Available from: http://www.systematicreviewtools.com/.

26. Muka, T, Glisic, M, Milic, J, Verhoog, S, Bohlius, J, Bramer, W, et al. A 24-step guide on how to design, conduct, and successfully publish a systematic review and meta-analysis in medical research. Eur J Epidemiol 2020;35:49–60. https://doi.org/10.1007/s10654-019-00576-5.

27. Cochrane handbook for systematic reviews of diagnostic test accuracy: Cochrane; 2022. Available from: https://training.cochrane.org/handbook-diagnostic-test-accuracy.

28. Brown, D. A review of the PubMed PICO tool: using evidence-based practice in health education. Health Promot Pract 2020;21:496–8. https://doi.org/10.1177/1524839919893361.

29. Samson, D, Schoelles, KM. Chapter 2: medical tests guidance (2) developing the topic and structuring systematic reviews of medical tests: utility of PICOTS, analytic frameworks, decision trees, and other frameworks. J Gen Intern Med 2012;27(1 Suppl):S11–9. https://doi.org/10.1007/s11606-012-2007-7.

30. Stone, PW. Popping the (PICO) question in research and evidence-based practice. Appl Nurs Res 2002;15:197–8. https://doi.org/10.1053/apnr.2002.34181.

31. Cooke, A, Smith, D, Booth, A. Beyond PICO. Qual Health Res 2012;22:1435–43. https://doi.org/10.1177/1049732312452938.

32. Booth, A. Clear and present questions: formulating questions for evidence based practice. Libr Hi Tech 2006;24:355–68. https://doi.org/10.1108/07378830610692127.

33. Wildridge, V, Bell, L. How CLIP became ECLIPSE: a mnemonic to assist in searching for health policy/management information. Health Inf Libr J 2002;19:113–5. https://doi.org/10.1046/j.1471-1842.2002.00378.x.

34. Price, CP. Evidence-based laboratory medicine: is it working in practice? Clin Biochem Rev 2012;33:13–9.

35. Bayliss, SE, Davenport, C. Locating systematic reviews of test accuracy studies: how five specialist review databases measure up. Int J Technol Assess Health Care 2008;24:403–11. https://doi.org/10.1017/s0266462308080537.

36. Cochrane Database of Systematic Reviews [Internet]. Wiley; 2023. https://www.cochranelibrary.com/cdsr/reviews [Accessed 22 Mar 2023].

37. ACCESSSS smart search. https://www.accessss.org/ [Accessed 24 Jul 2023].

38. Trip medical database. https://www.tripdatabase.com/ [Accessed 24 Jul 2023].

39. Page, MJ, Shamseer, L, Tricco, AC. Registration of systematic reviews in PROSPERO: 30,000 records and counting. Syst Rev 2018;7:32. https://doi.org/10.1186/s13643-018-0699-4.

40. Thomas, J, Kneale, D, McKenzie, JE, Brennan, SE, Bhaumik, S. Chapter 2: Determining the scope of the review and the questions it will address. In: Higgins, JPT, Thomas, J, Chandler, J, Cumpston, M, Li, T, Page, MJ, et al., editors. Cochrane handbook for systematic reviews of interventions version 6.3 (updated February 2022). Cochrane; 2022. Available from: www.training.cochrane.org/handbook. Print version: 2nd ed. Chichester: John Wiley & Sons; 2019.

41. Tew, K, Irwig, L, Matthews, A, Crowe, P, Macaskill, P. Meta-analysis of sentinel node imprint cytology in breast cancer. Br J Surg 2005;92:1068–80. https://doi.org/10.1002/bjs.5139.

42. Andriolo, BNG, Andriolo, RB, Salomão, R, Atallah, ÁN. Effectiveness and safety of procalcitonin evaluation for reducing mortality in adults with sepsis, severe sepsis or septic shock. Cochrane Database Syst Rev 2017;2019:CD010959. https://doi.org/10.1002/14651858.cd010959.pub2.

43. Nagar, G, Vandermeer, B, Campbell, S, Kumar, M. Effect of phototherapy on the reliability of transcutaneous bilirubin devices in term and near-term infants: a systematic review and meta-analysis. Neonatology 2016;109:203–12. https://doi.org/10.1159/000442195.

44. Smedemark, SA, Aabenhus, R, Llor, C, Fournaise, A, Olsen, O, Jørgensen, KJ. Biomarkers as point-of-care tests to guide prescription of antibiotics in people with acute respiratory infections in primary care. Cochrane Database Syst Rev 2022;2022:CD010130. https://doi.org/10.1002/14651858.cd010130.pub3.

45. de Vet, HCW, Eisinga, A, Riphagen, II, Aertgeerts, B, Pewsner, D. Chapter 7: Searching for studies. In: Cochrane handbook for systematic reviews of diagnostic test accuracy version 0.4 [Internet]. Oxford: The Cochrane Collaboration; 2008.

46. Wilczynski, NL, Haynes, RB; Hedges Team. EMBASE search strategies for identifying methodologically sound diagnostic studies for use by clinicians and researchers. BMC Med 2005;3:7. https://doi.org/10.1186/1741-7015-3-7.

47. Paez, A. Gray literature: an important resource in systematic reviews. J Evid Based Med 2017;10:233–40. https://doi.org/10.1111/jebm.12266.

48. PRISMA for systematic review protocols (PRISMA-P). https://prisma-statement.org/Extensions/Protocols [Accessed 24 Jul 2023].

49. Page, MJ, McKenzie, JE, Bossuyt, PM, Boutron, I, Hoffmann, TC, Mulrow, CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. https://doi.org/10.1136/bmj.n71.

50. PROSPERO international prospective register of systematic reviews. University of York: Centre for Reviews and Dissemination. https://www.crd.york.ac.uk/prospero/ [Accessed 24 Jul 2023].

51. Bramer, WM, Milic, J, Mast, F. Reviewing retrieved references for inclusion in systematic reviews using EndNote. J Med Libr Assoc 2017;105:84–7. https://doi.org/10.5195/jmla.2017.111.

52. van de Schoot, R, de Bruin, J, Schram, R, Zahedi, P, de Boer, J, Weijdema, F, et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 2021;3:125–33. https://doi.org/10.1038/s42256-020-00287-7.

53. van de Schoot, R, de Bruin, J, Schram, R, Zahedi, P, de Boer, J, Weijdema, F, et al. ASReview: active learning for systematic reviews. v0.19.3. Geneva: Zenodo; 2022.

54. Whiting, P, Rutjes, AWS, Reitsma, JB, Bossuyt, PMM, Kleijnen, J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25. https://doi.org/10.1186/1471-2288-3-25.

55. Whiting, PF, Rutjes, AWS, Westwood, ME, Mallett, S, Deeks, JJ, Reitsma, JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155:529–36. https://doi.org/10.7326/0003-4819-155-8-201110180-00009.

56. Bossuyt, PM, Reitsma, JB, Bruns, DE, Gatsonis, CA, Glasziou, PP, Irwig, LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem 2003;49:7–18. https://doi.org/10.1373/49.1.7.

57. Bossuyt, PM, Reitsma, JB, Bruns, DE, Gatsonis, CA, Glasziou, PP, Irwig, L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Clin Chem 2015;61:1446–52. https://doi.org/10.1373/clinchem.2015.246280.

58. Whiting, PF, Weswood, ME, Rutjes, AWS, Reitsma, JB, Bossuyt, PNM, Kleijnen, J. Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Med Res Methodol 2006;6:9. https://doi.org/10.1186/1471-2288-6-9.

59. Schueler, S, Schuetz, GM, Dewey, M. The revised QUADAS-2 tool. Ann Intern Med 2012;156:323. https://doi.org/10.7326/0003-4819-156-4-201202210-00019.

60. Cook, C, Cleland, J, Hegedus, E, Wright, A, Hancock, M. The creation of the diagnostic accuracy quality scale (DAQS). J Man Manip Ther 2014;22:90–6. https://doi.org/10.1179/2042618613y.0000000032.

61. Yang, B, Mallett, S, Takwoingi, Y, Davenport, CF, Hyde, CJ, Whiting, PF, et al. QUADAS-C: a tool for assessing risk of bias in comparative diagnostic accuracy studies. Ann Intern Med 2021;174:1592–9. https://doi.org/10.7326/m21-2234.

62. Lee, J, Mulder, F, Leeflang, M, Wolff, R, Whiting, P, Bossuyt, PM. QUAPAS: an adaptation of the QUADAS-2 tool to assess prognostic accuracy studies. Ann Intern Med 2022;175:1010–8. https://doi.org/10.7326/m22-0276.

63. Fagan, TJ. Nomogram for Bayes’s theorem. N Engl J Med 1975;293:257. https://doi.org/10.1056/NEJM197507312930513.

64. Irwig, L, Macaskill, P, Glasziou, P, Fahey, M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995;48:119–30; discussion 131–2. https://doi.org/10.1016/0895-4356(94)00099-c.

65. Hatala, R, Keitz, S, Wyer, P, Guyatt, G; Evidence-Based Medicine Teaching Tips Working Group. Tips for learners of evidence-based medicine: 4. Assessing heterogeneity of primary studies in systematic reviews and whether to combine their results. Can Med Assoc J 2005;172:661–5. https://doi.org/10.1503/cmaj.1031920.

66. Ryan, R. Cochrane Consumers and Communication Review Group: meta-analysis; 2016. Available from: https://cccrg.cochrane.org/.

67. Schlattmann, P. Tutorial: statistical methods for the meta-analysis of diagnostic test accuracy studies. Clin Chem Lab Med 2023;61:777–94. https://doi.org/10.1515/cclm-2022-1256.

68. Deeks, JJ, Higgins, JPT, Altman, DG. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins, JPT, Thomas, J, Chandler, J, Cumpston, M, Li, T, Page, MJ, et al., editors. Cochrane handbook for systematic reviews of interventions version 6.3 (updated February 2022). Cochrane; 2022. Available from: www.training.cochrane.org/handbook. Print version: 2nd ed. Chichester, UK: John Wiley & Sons; 2019.

69. Wang, J, Leeflang, M. Recommended software/packages for meta-analysis of diagnostic accuracy. J Lab Precis Med 2019;4:22. https://doi.org/10.21037/jlpm.2019.06.01.

70. Banno, M, Tsujimoto, Y, Luo, Y, Miyakoshi, C, Kataoka, Y. CAST-HSROC: a web application for calculating the summary points of diagnostic test accuracy from the hierarchical summary receiver operating characteristic model. Cureus 2021;13:e13257. https://doi.org/10.7759/cureus.13257.

71. Freeman, SC, Kerby, CR, Patel, A, Cooper, NJ, Quinn, T, Sutton, AJ. Development of an interactive web-based tool to conduct and interrogate meta-analysis of diagnostic test accuracy studies: MetaDTA. BMC Med Res Methodol 2019;19:81. https://doi.org/10.1186/s12874-019-0724-x.

72. Sedgwick, P. Meta-analyses: how to read a funnel plot. BMJ 2013;346:f1342. https://doi.org/10.1136/bmj.f1342.

73. Schünemann, H, Brożek, J, Guyatt, G, Oxman, A, editors. GRADE handbook; 2013. https://gdt.gradepro.org/app/handbook/handbook.html#h.svwngs6pm0f2 [Accessed 24 Jul 2023].

74. Honest, H, Khan, KS. Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Serv Res 2002;2:4. https://doi.org/10.1186/1472-6963-2-4.

75. Salameh, J-P, Bossuyt, PM, McGrath, TA, Thombs, BD, Hyde, CJ, Macaskill, P, et al. Preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist. BMJ 2020;370:m2632. https://doi.org/10.1136/bmj.m2632.

76. Haraka, F, Kakolwa, M, Schumacher, SG, Nathavitharana, RR, Denkinger, CM, Gagneux, S, et al. Impact of the diagnostic test Xpert MTB/RIF on patient outcomes for tuberculosis. Cochrane Database Syst Rev 2021;2021:CD012972. https://doi.org/10.1002/14651858.cd012972.pub2.

77. Colli, A, Nadarevic, T, Miletic, D, Giljaca, V, Fraquelli, M, Štimac, D, et al. Abdominal ultrasound and alpha-foetoprotein for the diagnosis of hepatocellular carcinoma in adults with chronic liver disease. Cochrane Database Syst Rev 2021;2021:CD013346. https://doi.org/10.1002/14651858.cd013346.pub2.

78. Sethi, A, Bajaj, A, Malhotra, G, Arora, RR, Khosla, S. Diagnostic accuracy of sensitive or high-sensitive troponin on presentation for myocardial infarction: a meta-analysis and systematic review. Vasc Health Risk Manag 2014;10:435–50. https://doi.org/10.2147/vhrm.s63416.

79. Matthaiou, DK, Ntani, G, Kontogiorgi, M, Poulakou, G, Armaganidis, A, Dimopoulos, G. An ESICM systematic review and meta-analysis of procalcitonin-guided antibiotic therapy algorithms in adult critically ill patients. Intensive Care Med 2012;38:940–9. https://doi.org/10.1007/s00134-012-2563-7.

Received: 2023-04-01
Accepted: 2023-07-19
Published Online: 2023-08-04
Published in Print: 2024-01-26

© 2023 Walter de Gruyter GmbH, Berlin/Boston
