Original Article
The Delphi List: A Criteria List for Quality Assessment of Randomized Clinical Trials for Conducting Systematic Reviews Developed by Delphi Consensus
Introduction
In recent years, the number of available randomized clinical trials (RCTs) has grown exponentially. It is therefore almost impossible for clinicians to keep up with the growing volume of scientific information from original research [1]. An important aim of reviewing the literature in health care is to summarize the evidence on which clinicians need to base their care and thus to provide the empirical basis for clinical decision making. The overall conclusions of a review often appear to depend on the quality of both the individual RCTs and the review process [2], [3]. A clear description of the strategies for identifying, selecting, and integrating the information distinguishes a systematic review from the traditional narrative review [4], [5]. Today, many systematic reviews rely substantially on the assessment of the methodological quality of the individual trials [6], [7], [8].
“Quality” as a concept is not easy to define. Quality of RCTs has recently been defined as “the likelihood of the trial design to generate unbiased results” [9]. This definition covers only the dimension of internal validity. Although most articles proposing a criteria list to assess the methodological quality of RCTs do not explicitly define the concept of quality [10], most lists measure at least three dimensions that may encompass the concept of quality in its broadest sense: internal validity, external validity, and statistical analysis [11], [12], [13], [14], [15]. Some authors distinguish an ethical component in the concept of quality as well [16], [17].
The method used to develop a quality criteria list is similar to that used for other measurement instruments, for example, “quality of life” scales [18]. Here, consensus methods are often used to select and reduce the number of items. Consensus studies are typically designed to combine the knowledge and experience of experts with the limited amount of available evidence. Among the existing consensus methods, we chose the Delphi technique [19], [20] because of the number of participants we wanted to involve, the written procedure, the anonymity of the comments, and the time available (approximately 2 years) to conduct the study.
The aim of this study is to achieve consensus among experts, implicitly based on both empirical evidence and personal opinion, on how the quality of RCTs can be measured best, resulting in a quality criteria list. We have considered two approaches to reach this goal: try to achieve consensus on the definition of quality of RCTs and infer the necessary items for a criteria list, or, conversely, try to achieve consensus on items that, according to the participants, measure quality of a trial and infer from those a definition, or a description of the concept, of quality. We considered the latter approach to have a higher chance of success.
To be able to measure the quality of the design and conduct of a trial, one has to rely on the quality of the report. Our point of departure is the ideal situation, that is, that the report presents an honest, accurate, and comprehensive reflection of the conduct of the study. We regard the quality criteria list resulting from this study as a starting point for a future minimum reference standard to be used in systematic reviews. As such, it is not intended to replace existing criteria lists but to facilitate comparison across reviews. This article presents the Delphi procedure and the resulting criteria list for quality assessment of RCTs on which the experts reached consensus.
Staff Team
A staff team was formed to initiate this research and consisted of all authors except L.M.B. All staff team members are epidemiologists, one of whom is also a clinician and one of whom has a statistical background. The others are medical doctors or health scientists. The staff team was responsible for the procedures for selecting the items and the participants, for constructing the questionnaires, for analyzing the responses, and for formulating the feedback.
Selection of the Items
For
Participants
We were able to locate 15 of 17 identified authors (or co-authors) of original criteria lists. One of them refused to participate, and three did not respond. We located 13 of 19 epidemiologists, of whom two refused to respond and two did not respond. Of the 15 statisticians we located, one refused to respond and one did not respond. Potential participants declined mostly because they were too busy; only one declined because he did not like the Delphi method for this purpose. We started with 33
Discussion
After three Delphi rounds, the participants achieved consensus on a generic core set of items for quality assessment in RCTs. Because of the chosen Delphi consensus procedure, we will call this list the Delphi List. In our effort to develop a criteria list, we chose not to define the word “quality” beforehand because a well-accepted definition does not exist. We assumed that the participants (all experts in the field of quality assessment) would have their own clear picture of what quality is.
Conclusion
The participants in this Delphi process achieved consensus on a generic criteria list for quality assessment in RCTs: the Delphi List. The adoption of this core set by the participants and other researchers may be the first step toward a minimum reference standard of quality measures for all RCTs. It is not our intention to replace existing criteria lists, but we suggest it should be used alongside these lists. The validity of this criteria list will have to be measured and evaluated over time.
Acknowledgements
The authors thank the following persons for their participation: D.G. Altman, E. Andrew, J. Berlin, L.M. Bouter, S.A. Brown, M.K. Cho, M. Clarke, K. Dickersin, M. Evans (and A.V. Pollock), C. Friedenreich, P.C. Gøtzsche, S. Greenland, J. van Houwelingen, T.E. Imperiale, J. Lau, C. Mulrow, M. Nurmohamed, I. Olkin, P. Onghena, G. ter Riet, H. Sacks, K.F. Schultz, K. Smith, P. Tugwell, and S. Yusuf. Their participation in this project does not necessarily mean that they fully agree with the final
References (45)
- et al. Validation of an index of the quality of review articles. J Clin Epidemiol (1991)
- et al. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Control Clin Trials (1996)
- et al. Assessing the quality of randomized controlled trials: An annotated bibliography of scales and checklists. Control Clin Trials (1995)
- et al. A method for assessing the quality of a randomized control trial. Control Clin Trials (1981)
- Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal anti-inflammatory drugs in rheumatoid arthritis. Control Clin Trials (1989)
- et al. Low-molecular-weight heparin versus standard heparin in general and orthopaedic surgery: A meta-analysis. Lancet (1992)
- et al. Antidepressant-induced analgesia in chronic non-malignant pain: A meta-analysis of 39 placebo-controlled studies. Pain (1992)
- et al. Acupuncture and chronic pain: A criteria-based meta-analysis. J Clin Epidemiol (1990)
- et al. Incorporating variations in the quality of individual randomized trials into meta-analysis. J Clin Epidemiol (1992)
- Meta-analysis in medicine: Where we are and where we want to go. J Clin Epidemiol (1989)