While SE is an important method to diagnose coronary artery disease, it is based on the subjective assessment of changes in LV WMA. Thus, there is potential for variability in interpretation among readers. As a prelude to the multicenter ISCHEMIA Trial, we examined the degree of agreement between enrolling site and core lab SE interpretations of cases that were representative of varying degrees of myocardial ischemia. We found that while agreement in aggregate is good, some variables are associated with lower inter-reader agreement. These include the degree of myocardial ischemia and the image quality. Such discordances highlight the importance of using a core lab in multicenter trials.
Specifically, there was agreement between local site and core lab in the interpretation of the degree of myocardial ischemia in 81 % of the cases. Among the cases with disagreement, the majority (87 %) were determined by the core lab to have no or mild myocardial ischemia (defined as 0, 1 or 2 positive segments) but were interpreted as more extensive ischemia by the local site. Thus, while one might have hypothesized that a major source of discrepant interpretations would be missed small regions of stress-induced WMA, in fact the major source of discrepancy was a tendency for local interpretations to over-estimate the extent of stress-induced WMA. One can speculate that this may reflect a cognitive bias: the site interpreter might have been influenced by the clinical information about the patient, exercise performance, symptoms during the test or the stress ECG, and so read the stress echocardiographic wall motion as reflecting more extensive ischemia than was present. The core lab interpretation was immune to such bias as the core lab did not have access to clinical or stress test data. An alternative explanation for our finding of more disagreement when myocardial ischemia is absent or mild during SE is that a larger extent or more severe degree of WMA is easier to appreciate, and thus agreement would be better in these cases. Previous studies of inter-institutional observer agreement of SE have also found better agreement when the coronary artery disease is more extensive [4, 10, 11].
It is not surprising that 60 % of the disagreements occurred in cases where images were graded as fair or poor due to reduced visualization of left ventricular endocardium. Hoffman and colleagues previously identified image quality as an important factor influencing inter-observer variability in the interpretation of dobutamine stress echocardiograms. Interestingly, even though image quality has improved in the 17 years since that report, the findings remain similar. Our study also extends those findings by examining exercise SE. The kappa statistic was better for pharmacologic SE than for exercise SE. This lower agreement with exercise stress echocardiography may reflect the fact that imaging can be more challenging with exercise, especially with the heart beating rapidly immediately after peak exercise, than when the patient remains supine for the entire pharmacologic stress test. Since the sample size for exercise stress echo was smaller than for pharmacologic stress, this difference should be interpreted with caution.
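For readers less familiar with the kappa statistic referred to above, the following is a minimal sketch of unweighted Cohen's kappa for two readers, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. It is illustrative only: the function name, the category labels and the example reads are hypothetical and are not data from this study, and the sketch assumes an unweighted kappa, which may differ from the variant used in the analysis.

```python
from collections import Counter


def cohens_kappa(site_reads, core_reads):
    """Unweighted Cohen's kappa for two readers rating the same cases."""
    if len(site_reads) != len(core_reads) or not site_reads:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(site_reads)

    # Observed agreement: fraction of cases where both readers agree.
    p_o = sum(s == c for s, c in zip(site_reads, core_reads)) / n

    # Chance agreement: for each category, the product of the two
    # readers' marginal proportions, summed over categories.
    site_counts = Counter(site_reads)
    core_counts = Counter(core_reads)
    p_e = sum(site_counts[k] * core_counts[k] for k in site_counts) / n**2

    if p_e == 1.0:
        # Both readers used one identical category for every case;
        # kappa is undefined in this situation.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ischemia grades for six cases (not study data).
site = ["none", "mild", "moderate", "severe", "moderate", "mild"]
core = ["none", "none", "moderate", "severe", "mild", "mild"]
print(f"kappa = {cohens_kappa(site, core):.2f}")
```

Because kappa corrects for chance agreement, it can be noticeably lower than the raw percent agreement when most cases fall into one category.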
Prior studies that have examined this issue typically have utilized single centers or involved multiple experienced centers [3, 4]. Our study extends the observations to a more “real world” experience utilizing active clinical programs from around the world with various levels of experience.
While variability in discrimination of different degrees of ischemia may not be as important for the diagnosis of coronary artery disease (that is, establishing the presence or absence of any coronary disease), it does have implications for assessment of prognosis or for trial enrollment. For example, the ischemia entry criterion for the ISCHEMIA Trial is the presence of moderate or greater myocardial ischemia on stress testing. Our observations suggest that without core lab oversight, patients with less than moderate ischemia might be enrolled, and this could materially affect the results. While the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) Trial did not show a benefit of revascularization to reduce major adverse cardiovascular events compared to optimal medical therapy in patients with stable coronary artery disease, a substudy demonstrated that the greatest reduction in ischemia as measured by myocardial perfusion imaging occurred in those with moderate to severe ischemia who underwent percutaneous revascularization [13, 14]. Analysis of outcomes based on the degree of ischemia was limited by small sample size. The ISCHEMIA Trial aims to more clearly define the role of an invasive strategy of routine angiography and complete revascularization in stable coronary artery disease patients with moderate or greater myocardial ischemia. Other clinical studies have shown that the extent and severity of myocardial ischemia are well correlated with prognosis, again highlighting the importance of accurate determination of not just the presence but also the severity of myocardial ischemia. Our observations highlight the importance of a core lab not biased by knowledge of the details of the patient and stress test in order to ensure proper composition of the trial population.
The value of core lab interpretations has also been demonstrated in the interpretation of electrocardiograms in patients with acute coronary syndromes [16, 17]. These studies taken with ours show that the advantages of core laboratory interpretations include standardized assessments (especially when confounding variables are present), lack of bias from other clinical data, and low intra- and inter-observer variability.
Our study differs from prior studies examining inter-observer variability in SE interpretation at expert centers in that technologic advances have occurred and were incorporated into our cases. While prior studies included a high proportion of videotape assessments, all of our assessments were on digital images with harmonic imaging that were formatted for side-by-side rest and stress comparisons. More recent updates to the original studies by Hoffman confirm that technologic enhancements such as digital image processing and harmonic imaging lead to better agreement in SE interpretation. Our data suggest that at least part of this improvement is due to the resulting improvements in image quality.
While there may be different equipment, stress protocols, scanning techniques, image acquisition protocols and levels of expertise at different sites around the world, our data did not demonstrate a difference in concordance as a function of geographic location.
Interpretation of stress echocardiograms is challenging, and prior studies have shown the value of educational initiatives. An added benefit of core lab interpretations is that the information can be passed back to the enrolling site to improve subsequent quality and interpretations.
This study has several limitations. First, due to the restrictions of the pilot phase of this trial, we did not have access to patient demographic data and thus cannot assess whether such characteristics would influence concordance. On the other hand, this enabled the core lab to remain free of bias. Also, we did not have access to information about the site readers. Their years of echocardiography experience might explain differences in concordance. In fact, prior studies suggest that specific training and experience in SE is critical for accurate interpretations [20, 21]. In this actively enrolling trial, we do not yet have access to the coronary anatomy by cardiac CT or coronary angiography and thus are unable to assess concordance as a function of the location and severity of coronary artery disease. It is expected that use of contrast agents to enhance LV endocardial border delineation would improve image quality and reduce inter-observer interpretation variability. However, contrast was used in only 14 cases in this data set, so we were unable to assess its benefit. There were differences in the numbers of pharmacologic and exercise stress cases that composed our population. While the sample sizes of each population were sufficient for analysis, it is possible that this difference might contribute to the differences noted by modality. Lastly, SE in this trial did not incorporate new technologies such as strain imaging, which would provide quantitative parameters that might influence concordance.