Medical errors in the United States are the third leading cause of death, accounting for an estimated 250,000 deaths per year [1]. Reported major error rates in radiology range from 2% to 6% [2–5], and the rates of detected radiologic errors have remained similar over time [6–10]. An analysis of diagnostic errors among 2.9 million imaging examinations from an international teleradiology practice found higher shift volumes to be associated with diagnostic errors [11]. This finding raises concern for potential future increases in error rates given radiologists' increasing workloads [12].
Neuroradiology has been a topic of particular interest in the diagnostic error literature. A 2019 analysis of malpractice claims in diagnostic and interventional neuroradiology found a median plaintiff award of USD 2,877,847 and a median settlement of USD 1,950,000 [13]. Qualitative studies have described common blind spots and error patterns in imaging of the neurovasculature, head and neck, and skull base [14–16], and additional studies have analyzed diagnostic errors in neuroradiology using small datasets [3, 17]. Similar to the previously noted finding for radiology examinations in general [11, 12], a study of neuroradiology examinations found higher shift volumes to be associated with diagnostic errors, although shift volume was the only variable analyzed [18]. Large-scale studies of neuroradiology examinations incorporating multivariable analyses of potential risk factors for diagnostic error are lacking. Identification of specific risk factors could guide corrective interventions at both the radiologist (individual) and system (group) levels. The goal of this study was to evaluate associations of examination interpretation time, shift volume, care setting, day of the week, and trainee participation with diagnostic errors by neuroradiologists at a large academic medical center.
Methods
Study Setting
Institutional review board approval was obtained for this retrospective HIPAA-compliant case-control study, with a waiver of the requirement for written informed consent.
The study was performed at the University of California, Davis, a large tertiary-care academic medical center. The neuroradiology quality assurance (QA) database was initiated on January 1, 2014, and has been previously described [19]. Each day that a neuroradiologist is on clinical service, the neuroradiologist is presented with three CT or MRI examinations (including examinations of the brain, head and neck, or spine, as well as MR angiography [MRA] and CT angiography [CTA] examinations), randomly selected by software, to review and assign a score (1, 2a, 2b, 3a, 3b, or 4) using the American College of Radiology RADPEER system; a score of 4 could no longer be assigned after it was eliminated from the RADPEER system in May 2016. Examinations assigned a RADPEER score of 2a, 2b, 3a, 3b, or 4 are flagged and further reviewed, either by two additional attending neuroradiologists or by the entire neuroradiology division during a quarterly QA conference, to reach a consensus RADPEER score; examinations assigned a RADPEER score of 1 do not undergo further review. The reviewed examinations are entered into the QA database, along with the associated RADPEER score (a score of 1 if assigned by the initially designated radiologist, or the consensus RADPEER score otherwise).
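The daily sampling and flagging steps amount to a simple random draw followed by score-based routing. The sketch below is a minimal Python illustration of that workflow, not the division's actual software (which is not described in detail here); the function and field names are hypothetical.

```python
import random

def select_daily_qa_examinations(eligible_exams, n=3, seed=None):
    """Randomly select up to n prior CT or MRI examinations for the
    on-service neuroradiologist to rescore with RADPEER."""
    rng = random.Random(seed)
    return rng.sample(eligible_exams, k=min(n, len(eligible_exams)))

def needs_consensus_review(radpeer_score: str) -> bool:
    """Scores of 2a, 2b, 3a, or 3b (and, before May 2016, 4) are flagged
    for review by two additional attendings or by the division at the
    quarterly QA conference; a score of 1 is entered into the QA database
    without further review."""
    return radpeer_score != "1"
```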
All members of the neuroradiology division are full-time employees who have completed neuroradiology fellowships. Division members have similar yearly productivity and read similar neuroimaging case mixes. All division members rotate with similar frequency among five clinical services: weekday routine-hours emergency/inpatient (9-hour shift), weekday routine-hours spine (9-hour shift), weekday routine-hours outpatient (9-hour shift), weekday evening emergency/inpatient (6-hour shift), and weekend emergency/inpatient (8- to 12-hour shift). Given this workflow, examinations interpreted on weekends are primarily emergency/inpatient examinations. The division uses two shared worklists (outpatient and emergency/inpatient); examinations are selected from the worklists in chronologic order, although examinations assigned special priority (e.g., stat priority) are selected ahead of chronologic order. There are no additional subspecialized rotations, and specific examination subsets (e.g., head and neck examinations) are not directed to individual neuroradiologists for interpretation.
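Operationally, this selection rule reduces to a two-part sort key: special-priority examinations first, then strict chronologic order. The following is a minimal sketch with hypothetical field names, not the worklist software itself.

```python
def next_examination(worklist):
    """Pick the next study from a shared worklist: examinations flagged
    with special (e.g., stat) priority jump the queue; otherwise the
    oldest examination is read first."""
    return min(worklist, key=lambda exam: (not exam["stat"], exam["arrived_at"]))

# The stat study is selected even though it arrived later.
worklist = [
    {"id": "outpatient-1", "stat": False, "arrived_at": 1},
    {"id": "emergency-1", "stat": True, "arrived_at": 5},
]
assert next_examination(worklist)["id"] == "emergency-1"
```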
Patients and Examinations
Separate searches of the neuroradiology QA database were conducted for the period from January 2014 through March 2020 for examinations with a RADPEER score of 1 (i.e., no diagnostic error) and for examinations with a RADPEER score of 2a, 2b, 3a, 3b, or 4 (i.e., presence of diagnostic error), hereafter described as the control and case groups, respectively [20]. For each examination in the case group, matched examinations in the control group were identified by the name of the interpreting neuroradiologist and the examination's CPT code. In each group, examinations for which a match could not be found in the other group were excluded. Then, for each remaining examination in the case group, two matched examinations in the control group were randomly selected for inclusion; only one matched control examination was selected if two could not be identified (e.g., because of faculty turnover over the course of the study period and/or low volumes for certain examinations, such as spine MRA). After random selection of matched examinations, additional examinations were excluded if information for the examination could not be extracted from the database because of technical errors or if the examination's interpretation time (as defined later in Methods) was an outlier value (exceeding the 95th percentile of the distribution of interpretation times for the given group); examinations with outlier interpretation times were excluded because such times were unlikely to reflect time dedicated to interpretation of the given study. After these exclusions, additional examinations were excluded to maintain 1:2 matching between the case and control groups (unless only 1:1 matching was possible). The remaining examinations formed the final study sample.
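Matching was performed with the SAS SURVEYSELECT procedure (see Statistical Analysis); purely to illustrate the logic, the following is a minimal pandas sketch assuming hypothetical DataFrames cases and controls with columns exam_id, radiologist, cpt, and interp_time_min.

```python
import pandas as pd

def match_controls(cases: pd.DataFrame, controls: pd.DataFrame,
                   ratio: int = 2, seed: int = 0):
    """For each case, randomly draw up to `ratio` controls interpreted by
    the same neuroradiologist and sharing the same CPT code, sampling
    without replacement; 1:1 matching is accepted when only one control
    exists, and cases with no eligible control are dropped."""
    pairs = []
    for _, case in cases.iterrows():
        pool = controls[(controls["radiologist"] == case["radiologist"])
                        & (controls["cpt"] == case["cpt"])]
        if pool.empty:
            continue  # no match in the control group -> case excluded
        take = pool.sample(n=min(ratio, len(pool)), random_state=seed)
        controls = controls.drop(take.index)  # sample without replacement
        pairs.append((case["exam_id"], take["exam_id"].tolist()))
    return pairs

def drop_outlier_interpretation_times(group: pd.DataFrame) -> pd.DataFrame:
    """Exclude examinations whose interpretation time exceeds the group's
    95th percentile, as such values are unlikely to reflect true reading
    time."""
    cutoff = group["interp_time_min"].quantile(0.95)
    return group[group["interp_time_min"] <= cutoff]
```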
All patients in the case group were included in earlier studies that used the neuroradiology QA database to explore a range of questions relating to diagnostic error in neuroradiology [14–16, 18, 19]. Unique elements of the current study include the comparison with a control group, the multivariable analysis of possible risk factors for diagnostic error, and the control for the interpreting neuroradiologist and examination type.
Data Collection
For each included examination in the case and control groups, the following variables were extracted from the radiology departmental database: interpretation time, shift volume, care setting (emergency/inpatient vs outpatient), day of interpretation (weekday vs weekend), and trainee participation. Interpretation time was defined as the time in minutes between when the interpreting neuroradiologist finalized the report of the immediately preceding study and when the neuroradiologist finalized the report of the given study. For the first study interpreted during a shift, interpretation time was computed with respect to the last report finalized during the preceding shift, resulting in all such examinations having an outlier interpretation time and thus being excluded. Shift volume was defined as the total number of CT and MRI examinations that the interpreting neuroradiologist interpreted during the shift in which the given examination was interpreted. For examinations comprising multiple unique accession numbers (e.g., head MRI-MRA), each accession number was considered a single examination for purposes of determining shift volume, and each such accession number was assigned the same interpretation time (i.e., the time since finalization of the prior report). Trainee participation was defined as the involvement of a radiology resident or fellow in the generation of the official report.
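Both continuous predictors can be derived directly from report-finalization timestamps. The following pandas sketch illustrates the operational definitions above; the column names (radiologist, shift_id, accession, finalized_at) are hypothetical, and simultaneous finalization is used as a simplifying proxy for accessions reported together.

```python
import pandas as pd

def add_interpretation_metrics(reports: pd.DataFrame) -> pd.DataFrame:
    """Compute interpretation time and shift volume from one row per
    finalized accession, with columns: radiologist, shift_id, accession,
    and finalized_at (datetime64)."""
    df = reports.sort_values(["radiologist", "finalized_at"]).copy()

    # Treat accessions finalized at the same moment (e.g., a head MRI-MRA)
    # as one reading event so that every accession of a multipart study
    # receives the same interpretation time.
    events = df.drop_duplicates(subset=["radiologist", "finalized_at"]).copy()
    prev = events.groupby("radiologist")["finalized_at"].shift(1)
    events["interp_time_min"] = (
        (events["finalized_at"] - prev).dt.total_seconds() / 60
    )
    df = df.merge(
        events[["radiologist", "finalized_at", "interp_time_min"]],
        on=["radiologist", "finalized_at"], how="left",
    )
    # For the first report of a shift, this difference spans the gap since
    # the prior shift's last report, producing the outlier values that the
    # 95th-percentile rule removes.

    # Shift volume: every unique accession interpreted during the shift
    # counts as one examination.
    df["shift_volume"] = (
        df.groupby(["radiologist", "shift_id"])["accession"].transform("nunique")
    )
    return df
```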
Statistical Analysis
Descriptive statistics (mean and SD for continuous variables; count and percentage for categoric variables) were obtained for patient age and sex as well as for examination characteristics including interpretation time, shift volume, care setting, day of interpretation, and trainee participation. Formal significance testing was not performed for differences in the descriptive statistics between the two groups.
Conditional and marginal mixed-effects logistic regression models were used to identify predictors of diagnostic error [21]. Examination characteristics were treated as fixed effects. The interpreting radiologist and examination type were treated as random effects to account for potential correlations among multiple examinations from the same radiologist and examination type. The conditional models generated ORs describing the effect of an examination characteristic on an individual radiologist's risk of diagnostic error, whereas the marginal models generated ORs describing the mean effect of the characteristic at the group (i.e., neuroradiology division as a whole) level. In the models, the continuous variables (interpretation time and shift volume) were standardized by their SDs; thus, the model estimates represent the effect of a 1-SD change in the variable on the odds that the examination was associated with diagnostic error.
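These models were fit with the SAS GLIMMIX procedure (see below). As a rough, non-authoritative illustration of the conditional-versus-marginal distinction, the following Python sketch fits analogous models with statsmodels, using GEE with an exchangeable working correlation as a marginal (population-averaged) analogue and a Bayesian mixed GLM with random intercepts as a conditional (subject-specific) analogue; all column names are hypothetical, and this is not the authors' code.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# df: one row per examination, with a binary outcome `error` and the
# predictors below (hypothetical names). Standardize the continuous
# predictors so each OR reflects a 1-SD change.
for col in ["interp_time_min", "shift_volume"]:
    df[col + "_sd"] = df[col] / df[col].std()

fixed = ("error ~ interp_time_min_sd + shift_volume_sd"
         " + emergency_inpatient + weekend + trainee")

# Marginal analogue: GEE clustered on the interpreting radiologist yields
# population-averaged (division-level) odds ratios.
marginal = smf.gee(fixed, groups="radiologist", data=df,
                   family=sm.families.Binomial(),
                   cov_struct=sm.cov_struct.Exchangeable()).fit()
print(np.exp(marginal.params))  # population-averaged ORs

# Conditional analogue: random intercepts for radiologist and examination
# type yield subject-specific (individual-radiologist) effect estimates.
conditional = BinomialBayesMixedGLM.from_formula(
    fixed,
    {"radiologist": "0 + C(radiologist)", "exam_type": "0 + C(exam_type)"},
    df,
).fit_vb()
print(conditional.summary())
```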
Univariable mixed-effects logistic regression models were first fit for each examination characteristic. Characteristics with a p value of .1 or less in either the conditional or the marginal model were further evaluated in conditional and marginal multivariable mixed-effects models that yielded adjusted ORs. The multivariable models also included interaction terms between day of interpretation and the other characteristics in the models. Additional subgroup analyses were performed with stratification by weekday versus weekend interpretation: interpretation time and shift volume were summarized, and univariable and multivariable conditional and marginal mixed-effects logistic regression models were fit separately for weekday and weekend examinations using the remaining characteristics that were otherwise included in the earlier multivariable models.
p values less than .05 were considered statistically significant. All statistical analyses were performed using SAS Studio version 5.2 (SAS Institute). Matching was performed using the SURVEYSELECT procedure, and modeling was performed using the GLIMMIX procedure.
Discussion
This study showed statistically significant associations of diagnostic error for neuroimaging examinations with longer interpretation times, higher shift volumes, and weekend interpretation. In conditional and marginal mixed-effects logistic regression models, these three variables were risk factors for diagnostic error at the individual neuroradiologist level and the neuroradiology division level, respectively. In subanalyses, longer interpretation times and higher shift volumes were significantly associated with a higher likelihood of diagnostic error on weekdays, but not on weekends. Diagnostic error was not significantly associated with emergency/inpatient setting or with trainee participation in the interpretation.
Interpretation times were longer among neuroimaging examinations with diagnostic errors than among those without diagnostic error. Data regarding the association between interpretation times and diagnostic error are scarce. A prior study found that, for abdominal imaging examinations, faster interpretation speed was associated with a higher error rate [22]. However, that study entailed a prospective intervention whereby radiologists were assigned examinations to interpret at twice their mean baseline reporting speed; thus, the study did not reflect variations in interpretation times occurring during routine practice. Potential reasons for the longer interpretation times of examinations with diagnostic error in the current study include greater complexity of such examinations and interruptions occurring during the longer interpretation periods.
The present finding of an association between higher shift volumes and diagnostic error is consistent with the results of the previously noted study of 2.9 million examinations from an international teleradiology practice [11]. Additional studies have found an association between error rates and longer radiology workdays [11, 23–25], with a peak in errors after the 10th hour of work [11, 25]. In another study of errors in neuroradiology, errors were more likely to be perceptual than interpretive in nature for shifts with higher volumes and for examinations interpreted later during shifts [17]. Radiologist fatigue may account for these associations of error with shift volume and duration.
In the current study, weekend interpretation was a significant risk factor for diagnostic error; to our knowledge, this finding has not been previously reported. In subgroup analysis, longer interpretation times and higher shift volumes were significantly associated with a higher likelihood of diagnostic error on weekdays, but not on weekends. This disparity may in part relate to the larger sample size (and thus greater statistical power) for weekday than for weekend examinations. However, the higher ORs for diagnostic errors for interpretation time and shift volume on weekdays than on weekends, as well as the difference in frequency of diagnostic error between weekdays and weekends, suggest a true influence of weekend interpretation on the relationship of diagnostic error with interpretation time and shift volume. In addition, interpretation times were shorter and shift volumes were higher on weekends than on weekdays in both the study and control groups; it is possible that diagnostic error rates are sensitive to changes in interpretation time and shift volume only within specific ranges for those variables, and that the variables were outside of such ranges on weekends.
Prior literature has highlighted a range of strategies for reducing diagnostic error, including interpretation by subspecialty radiologists [7, 26], decreases in shift volumes [11], limits on shift lengths to less than 10 hours [11, 25], reductions in noninterpretive tasks during clinical shifts [27], radiologist participation in multispecialty tumor boards [19], and (as previously discussed in contrast with the present findings) reduced interpretation speed [22]. In a study from 2014, on-call radiologists received a mean of 72 telephone calls (mean total handling time, 108 minutes) during a typical 12-hour overnight shift; after allowing an additional 90 minutes for other interruptions, breaks, consultations, and conferences, this call volume left radiologists less than 1 second (0.86 second) to view each image during the shift [28]. In 2012, the Royal College of Radiologists [29] issued a national guideline recommending that radiologists interpret up to two complex CT or MRI examinations, and up to six CT or MRI examinations overall, per hour. In 2022, the Japanese College of Radiology [30] issued a national guideline recommending that radiologists interpret up to four examinations per hour. The American College of Radiology has not issued such guidelines for radiologists in the United States. Further prospective studies are warranted to assess the impact of specific corrective workflow interventions on diagnostic error.
This study had limitations. First, it used a single-center retrospective design. Second, given the large volume of examinations without diagnostic error in the neuroradiology QA database, only a fraction of such examinations were included in the analyses, and these were randomly selected through a process of matching with the examinations with diagnostic error. Third, the determination of the presence of diagnostic error is a subjective process, and examinations without diagnostic error based on the assessment of the initially assigned radiologist were not reviewed by additional radiologists. Fourth, all radiologists were fellowship-trained neuroradiologists at an academic medical center; findings may have differed for general radiologists or for neuroradiologists in community practice. Fifth, the association of the interpreting neuroradiologist's years of experience with diagnostic error was not evaluated, as prior work using the neuroradiology QA database did not find this factor to be associated with errors [19]. Sixth, when evaluating the effect of trainee participation on diagnostic error, stratification was not performed on the basis of level of training (i.e., junior or senior resident vs neuroradiology fellow). Nonetheless, the observed lack of an association between trainee participation and diagnostic error is consistent with the results of prior studies in neuroradiology [31, 32]. Seventh, shift length was not evaluated as a risk factor for diagnostic error because shifts were generally shorter than 10 hours, whereas prior literature suggests a significant increase in errors beyond a 10-hour threshold [11, 25]. Thus, potential interactions of the identified risk factors with long shifts are unknown. Eighth, the assessment of shift volume reflected solely the total number of examinations interpreted over the course of the shift; it did not account for variation in the types of examinations (e.g., CT vs MRI; brain vs spine) interpreted during the shift, the number of examinations that the neuroradiologist had interpreted before the given examination, or the time of day (e.g., morning, afternoon, or evening) at which the examination was interpreted. Ninth, interpretation time was determined as the time since completion of the immediately preceding report; this approach assumed that the entire intervening period was dedicated to interpretation of the given examination. Tenth, the regression analyses did not account for potential clustering effects among multiple examinations in individual patients. Eleventh, the analysis did not account for examination complexity, whether related to patient factors, the type of examination, or the examination's findings. Finally, subanalyses were not performed with respect to specific RADPEER scores (2a, 2b, 3a, or 3b).
In conclusion, diagnostic errors by attending neuroradiologists at a single large academic medical center were significantly associated with longer interpretation times, higher shift volumes, and weekend interpretation. The associations with diagnostic error of longer interpretation times and higher shift volumes were observed for weekday, but not for weekend, interpretations. These findings should be considered when designing workflow-related and other interventions seeking to reduce errors in neuroimaging interpretation.