Impact of a training program incorporating cardiac magnetic resonance imaging on the accuracy and reproducibility of two-dimensional echocardiographic measurements of left ventricular volumes and ejection fraction

Background Left ventricular (LV) ejection fraction (LVEF) assessed by two-dimensional echocardiography (2DE) is the most widely used parameter for clinical decision-making, but reproducibility and accuracy problems remain. We evaluated the usefulness of a novel training program based on cardiac magnetic resonance (CMR) imaging to obtain more reliable values of 2DE-derived LVEF and LV volumes. Methods Fifty-four sonographers from five hospitals independently measured LV volumes and LVEF using the same 2DE images from 15 patients who underwent CMR and 2DE. After receiving a lecture from an expert on how to properly trace the LV endocardium, each sonographer voluntary performed the measurements using the same datasets, and was invited to perform the same analysis for additional patients. The effect of the training intervention was evaluated using the coefficient of variation (CV) and coverage probability (CP). Results Before the intervention, the LV volumes were significantly underestimated and the LVEF was significantly overestimated compared to the CMR results; however, these differences were reduced after the intervention. In particular, the CP (0.52 vs. 0.76, p < 0.001) for the LVEF showed significant improvement. However, the degree of improvement differed among institutions, and the CV actually became worse in two hospitals after the intervention. Level of experience and self-practice was associated with the reproducibility after the intervention. Conclusions A training program using CMR as a reference improved the accuracy of 2DE-determined LV measurements. Since the degree of improvements differed among hospitals, individualization of training programs and periodical objective evaluation may be required to reduce inter-institutional variability.


Background
Echocardiography plays a pivotal role in the diagnosis and management of cardiovascular diseases and other pathologies [1,2]. In particular, assessing the left ventricular (LV) ejection fraction (LVEF) is one of the most common reasons to perform echocardiography. Because the LVEF is an important parameter to guide appropriate medication and device therapy [3,4], the accuracy and reproducibility of LVEF measurement is of paramount importance for clinical decision-making [5,6]. However, the reproducibility of LV volume measurements remains a serious concern due to the fact that manual tracing of the LV endocardial border using two-dimensional echocardiography (2DE) produces non-negligible measurement variability [7]. Thus, the American Society of Echocardiography recommends the annual assessment of observer variability of LVEF [8]. However, the lack of a reference standard makes this approach ineffective for reducing the inter-institutional variability and improving accuracy. The potential solutions to resolve this problem are to (1) use fully automated LV quantification software or (2) establish a training program to enhance the accuracy and reproducibility of the measurements. Although the former approach is more robust and straightforward, it requires specific equipment that cannot be used in routine echocardiographic laboratories.
Cardiac magnetic resonance (CMR) imaging is a reference standard to measure both LV volumes and LVEF [9]. If both CMR and 2DE are conducted on the same day, the LV volumes and LVEF assessed by CMR can serve as references for the corresponding 2DE measurements. Thus, we hypothesized that a training program aimed at obtaining 2DE LV volumes and LVEF values similar to CMR measurements would help to improve inter-observer and inter-institutional variability and consequently the overall accuracy of the assessment.
Accordingly, we aimed to (1) clarify the inter-observer and inter-institutional variability of 2DE-determined LV volumes and LVEF in a large number of examiners and to assess their accuracy against CMR measurements; (2) evaluate whether intervention via a training program including expert guidance would improve reproducibility and accuracy; and (3) determine sonographer's characteristics on reproducibility after the intervention.

Methods
This study was approved by the Institutional Review Board of the University of Occupational and Environmental Health, School of Medicine. All examiners provided informed consent for participation in this study.

Patients
We selected a consecutive series of 15 patients who underwent clinically indicated CMR examinations and also agreed to undergo 2DE examinations on the same day among the CMR database of our laboratory. We did not exclude any patients who had poor 2DE image quality. We then aimed to select an additional 15 patients with propensity score matching of LV end-diastolic volume (LVEDV) and LVEF on the CMR and 2DE image quality from the same CMR database. However, one of the patients among the first 15 cases did not have an appropriate propensity score match in the database. Thus, we did not use this patient for the comparative analysis before and after the intervention.

Acquisition and analysis of 2DE images
All 2DE images were acquired by one expert sonographer using a commercially available ultrasound machine and equipment (iE33, Philips Medical Systems, Andover, MA, USA). Echocardiographic image acquisition was systematically performed according to the American Society of Echocardiography guidelines [8] with breath-holding in all patients, from which we used apical LV-focused 4-and 2chamber views for the analysis. LV volumes (LVEDV, LV end-systolic volume: LVESV) and LVEF were measured by tracing the LV endocardial border at end-diastole and at end-systole on the apical 4-and 2-chamber views and were calculated using the modified biplane Simpson method (ImageArena, TomTec Imaging Systems, Unterschleissheim, Germany). Image quality was evaluated according to the visualization of the LV endocardial border in LV 18-segment model (good: 0-2 segments were not visible, fair: 3-5 segments were not visible, and poor: > 5 segments were poorly visible).
Acquisition and analysis of CMR images CMR imaging was performed using a 3-T scanner (Discovery 750 W; GE Healthcare, Milwaukee, WI) with a phasedarray cardiovascular coil. Retrospective electrocardiographygated localizing spin-echo sequences were used to identify the long axis of the heart. Steady-state free precision (SSFP) dynamic gradient-echo cine loops were acquired by retrospective electrocardiographic gating and parallel imaging techniques during 10-to 15-s breath-holds with the following general parameters: 8-mm slice thickness of the imaging planes, 40 × 40-cm field of view, 200 × 160-scan matrix, 50°f lip angle, 3.8/1.7-ms repetition/echo times, and 20 reconstructed cardiac phases. Eight to 16 short-axis slices from the base of the heart to the apex, and three standard longaxis views were recorded in each patient.
CMR LV volumes and LVEF were measured by the same operator (the expert that visited each hospital for training intervention) via feature tracking analysis (2D CPA MR; TomTec Imaging Systems, Unterschleissheim, Germany). In three apical long-axis SSFP images, the LV endocardial border was manually traced at the enddiastolic frame. Then, the feature tracking software propagates the endocardial contour and follows its tissue features throughout a cardiac cycle to generate LV volume curves from which LVEDV, LVESV, and LVEF were determined automatically.

Examiners
We invited cardiac sonographers working in five different hospitals to participate in the study. We defined an expert sonographer as one with > 10 years of experience in echocardiography and a novice sonographer one having ≤10 years' experience [10]. Moreover, we defined an active sonographer as the sonographer who performs > 1000 echocardiography examinations/year.

Protocol 1
Anonymized 2DE apical 4-and 2-chamber DICOM images for 15 patients were sent to each participating hospital. There were only two DICOM images for each patient, with three consecutive cardiac cycles for each image. Each sonographer selected one cardiac cycle from the three beats and determined both the end-diastolic and end-systolic frames. Subsequently, the sonographer traced the LV endocardial border, which was used for calculation of LVEDV, LVESV, and LVEF. After finishing the measurements, all measurement data were sent to the core laboratory. Differences in LV volumes and LVEF between 2DE and CMR examinations were determined for each participant. We also analyzed the interreader and inter-institutional variability.

Protocol 2
The LVEDV, LVESV, and LVEF values for all 15 patients determined by CMR were given as feedback to each sonographer. An expert (MT) then visited each hospital and gave a 4-h hands-on lecture on how to best trace the LV endocardium border to obtain similar values to the CMR measurements. The lecture was also video recorded and uploaded to a webpage where each participant was able to access. We allocated 3 months for the sonographers to individually or jointly practice LV border tracing using the same datasets of protocol 1. This practice period was completely arbitrary and individually based. When the 3-month practice period was finished, we sent the propensity score matched 2DE apical 4-and 2-chamber DICOM images from the additional 14 patients to the same sonographers who were invited to repeat the analysis. The same statistical analysis was performed in the core laboratory on the data as conducted for protocol 1 to assess the impact of the intervention. The study protocol is schematically summarized in Fig. 1.

Statistical analysis
Continuous data are expressed as the mean ± standard deviation (SD). Categorical data are presented as a number or percentage. The t-test or Wilcoxon rank sum test was used to evaluate the differences in continuous variables between two groups. Friedman's analysis with post-hoc comparison was used to determine differences in the 2DE assessment between each hospital and from the CMR results. The coefficient of variation (CV) was used to evaluate reproducibility [11], defined as the SD divided by the mean value. Coverage probability (CP) was used to evaluate the accuracy of echocardiographic measurements compared to CMR results. CP represents the probability of the absolute differences falling within the acceptable difference. Although it is generally used for assessing reproducibility [12,13], we considered CP to be a useful parameter to guide review and retraining efforts. In accordance with previous studies, the acceptable difference was defined as 30 mL for LVEDV and LVESV and 10% for LVEF [12,13]. A two-sided p value < 0.05 was defined as statistically significant. All statistical analyses were performed using commercial software (JMP, version 14; SAS Institute, Inc., Cary, NC, USA) and R version 3.1.0. (The R Foundation for Statistical Computing, Vienna, Austria) with the party package.

Results
A total of 54 sonographers agreed to participate the study. Table 1 shows characteristics of sonographers in each hospital. All five hospitals play an important role for the management of acute cardiovascular care. A core laboratory was located in Site A. Except for Site B, other four hospitals have a cardiovascular center and department of cardiovascular surgery. The median sonographers' experience with echocardiography was 11 years. Table 2 depicts the clinical characteristics of the study population included in protocols 1 and 2. Among 29 echocardiography examinations, image quality analysis revealed good in 5 cases (17%), fair in 11 cases (38%), and poor in 13 cases (45%), respectively. Among the 54 sonographers participating in protocol 1, three sonographers did not participate in protocol 2 due to pregnancy (n = 1) and moving to a different hospital (n = 2). Thus, we used the results from 51 sonographers.

Protocol 1
LV volume and LVEF measurements of each hospital and the CMR examination are summarized in Table 3. LV volumes measured by 2DE were significantly underestimated and LVEF was significantly overestimated compared to the corresponding values of CMR measurements in each hospital and for the whole cohort. However, the degree of this discordance differed substantially among hospitals (Supplemental Table 1). The percentage of underestimation of LVESV was higher than that of LVEDV, resulting in the significant overestimation of LVEF in all five hospitals. Although the CV of LVEDV, LVESV, and LVEF in each hospital ranged from 11 to 16%, from 13 to 21%, and from 8 to 13%, respectively, the corresponding CVs in the whole cohort were 18, 25, and 13%, respectively, suggesting the presence of large inter-institutional variability. The CP of LVEDV ranged from 0.21 to 0.56 in each hospital, indicating that only one-quarter to one-half of the measurements fall into the acceptable range of LVEDV differences (≤30 mL). Similarly, the CP for LVEF ranged from 0.40 to 0.77. These results reflected the different habits of tracing the endocardial border at end-diastole and end-systole among individual hospitals, pointing to the need for the standardization.  Table 3. Although most hospitals still showed underestimation of LVEDV and LVESV, the degree of underestimation became smaller, and the difference was no longer statistically significant in some hospitals. All but one hospital reported LVEF values similar to those measured with CMR. Although the CP values of LVEDV were still low (0.49-0.59), those of LVEF ranged from 0.71 to 0.83; in particular, site A provided a CP value > 0.80. Table 4 indicates the comparison of several reproducibility parameters before and after the intervention. The CP significantly improved after the intervention in most hospitals; however, the CV actually became worse in two hospitals. In particular, at sites C and E, despite significant improvements in the CP of LV volumes and LVEF, the CV of LVEF became significantly worse from 9.9 to 15.5% and from 8.7 to 14.6%, respectively. Figure 3 shows the measurement biases of LVEDV, LVESV, and LVEF between the two methods (2DE -CMR) for each case sorted according to the values of CMR determined LV volumes, with the LV volumes gradually increasing from the left-sided case to the rightsided case. The green plots show measurement differences falling within the acceptable range and the red plots show differences outside the acceptable range: the number of green plots divided by the number of all plots indicates the CP value. The plots of protocol 1 indicate that larger LV volumes were associated with greater underestimation of 2DE determined both LVEDV and LVESV. Bland-Altman analysis also showed that the degree of underestimation of LV volumes especially in large left ventricles became smaller after the intervention (Fig. 4). Since absolute differences between the two methods of measuring LVEDV and LVESV appeared to be similar, there was consistent overestimation of LVEF Fig. 1 Process of quality improvement. Quality improvement process to assess inter-reader and inter-institutional variability and achieve improvement of reproducibility and accuracy before the intervention. However, the underestimation of LV volumes became smaller after the intervention, especially for LVESV. In addition, the 2DE measurements were actually overestimated for both LVEDV and LVESV in cases with a small left ventricle. These changes resulted in the LVEF from 2DE becoming similar to CMR measurements. CP analysis revealed that the effect of the intervention was different among the hospitals. Although the tracing habit of the sonographers working at sites C, D, and E obviously changed following the intervention, the effect was much smaller at sites A and B. Figure 5 provides a representative example of the difference of the effect of the intervention between sites A and E. The effect of intervention was more obvious at site E, especially for the measurements of LVESV and LVEF.

Sonographer characteristics and reproducibility
Years of experience for echocardiography was not different among five hospitals (p = 0.801). The prevalence of the expert sonographer was not different (p = 0.409) ( Table 1). Although total number of echocardiography examinations performed in each institution was the highest in Site C, mean number of echocardiography examinations per one sonographer was the lowest in Site C (n = 414) and followed by Site E (n = 650).
All but four sonographers attended the hands-on lecture or watched the web video lecture. However, less than half of sonographers (47%) actually performed the self-practice for LV endocardial border tracing before the protocol 2. Table 5 indicates the effect of intervention according to the characteristics of sonographers. Although CP values before and after the intervention were not different between the expert group and the novice group, the expert group had significantly lower CV values of LVEDV and LVESV than those of the novice group after the intervention. Active sonographers had a larger improvement CP values of EDV between before and after the intervention compared with inactive sonographers. Sonographers who performed self-practice had a significantly higher CP values of LVEDV and LVESV compared with sonographers who did not perform self-practice before the intervention. Sonographers who did not perform self-practice showed significantly higher values of CV for LVEDV and LVESV after the intervention than those who performed self-practice.

Discussion
The main strengths of our study include the relatively large number of participants, new insights concerning CP usage, and provision of a novel training program incorporating CMR as a reference.
Inter-observer variability is a major limitation of echocardiography, especially for assessing LV volumes and LVEF [7,12,14,15]. Accurate determination of the LV endocardial border is critical for the reliable calculation of these parameters; however, there is no clear consensus regarding the precise tracing border, which is one of the most important causes of inter-observer variability. One potential solution is to use corresponding values from other imaging modalities as a reference. Owing to its high spatial resolution and tissue contrast, CMR SSFP images provide clearer visualization of the LV endocardial border, and the LV volumes and LVEF measured by CMR are associated with reduced inter-observer variability [16]. Several previous studies indicated that 2DE systematically underestimates LV volumes and overestimates LVEF compared to CMR due to foreshortening of 2D cut-planes and individual habits of tracing the more inner part of the LV myocardium [17]. The latter cause is more important, because even a 1-mm difference in tracing the LV endocardial border could produce a significant change in the obtained LV volume [18]. Since CMR provides reference values of LVEDV and LVESV for each patient [19], this offers an opportunity to examine precisely where to trace so as to obtain similar values for LV volumes on 2DE images. In the present study, we evaluated the usefulness of this training program for obtaining more reliable measurements of LV volumes and LVEF using 2DE. As expected, we found significant inter-observer and inter-institutional variability in LV volume measurements before the intervention. Overall, our training program had a favorable impact with a diverse range of echocardiographic image quality. However, the degree of this effect was different among institutions, and thus the approach was not sufficient to overcome inter-institutional variability in LVEF measurements. When the LV endocardial border is traced using the biplane Simpson method, the eyeball EF (visual EF) must also be taken into consideration [20]. LVEF determination by visual EF might differ for each sonographer as well as in each institution [21]. Indeed, although the mean values of LVEF differed among each institution, the CV of each institution was distributed in a relatively narrow range (8.3-11.0%), except for one hospital (14.1%), before the intervention. These results are consistent with those of a recent study conducted by Khouri et al. [13], indicating that there are site-specific criteria for visual EF determination. Thus, without improvement of this inter-institutional visual EF variability, interinstitutional LVEF variability using Simpson's method may not be reduced. Although an expert lectured on how to effectively trace the LV endocardial border and discussed the visual EF in selected cases, there was significant deterioration of the CV for LVEF after the intervention in some institutions. One possible reason for this effect is that the training time was too short for every sonographer to change their established habits concerning tracing of the LV endocardium with confidence in every case. This may also indicate that even if the intervention led to improvement of measurement accuracy in comparison to CMR, it might also have caused confusion for some sonographers, resulting in an overall loss of consistency in some institutions (sites C and E). A long-standing training program, increasing sonographers' daily practice and repeated educational hands-on lectures may be required to overcome this problem.
We also analyzed the relationship between sonographers' characteristics and reproducibility. Expert sonographers (> 10 years' experience) showed better CVs of LVEDV and LVESV after the intervention than novice sonographers. CVs of LVEDV and LVESV in sonographers who perform more than 1000 examinations per year markedly improved after the training. Importantly, sonographers who performed the self-practice showed significantly lower CV values before and after the intervention compared with the sonographers who did not perform the self-training. Our results suggest that not only clinical experience but also voluntary self-practice are important to improve tracing habitus with keeping the range of CV values among the groups. Medvedofsky et al. [22] has already reported that experience is important for reproducibility of LVEF measurements in which study one expert echocardiographer and three novices who were not intervened participated. Our results were consistent with this study and novelty of our study is that experience of sonographer is also associated with the reproducibility.
This study further demonstrates the potential utility of CP as a quality control parameter. CP has been used as an indicator of inter-observer variability of echocardiographic measurements [11][12][13]. We used the same cut-off values of CP as these previous studies (≤30 mL for LVEDV and LVESV and ≤ 10% for LVEF). Daubert et al. [12] first reported the usefulness of CP for quality control of echocardiography. They reported that the CP for LVEDV was significantly improved after training; however, the CP for LVEF was excellent without training. These results suggest that intrainstitutional variability of visual EF is very low under the supervision of an expert in each hospital, because the EF calculation is usually affected by visual EF. Thus, determination of visual EF is crucial to calculate the accurate EF. However, the authors did not report the measurement accuracy because of a lack of a reference value for LVEF. Thavendiranathan et al. [15] reported that using CMR in a training program is effective for improving inter-observer variability and accuracy of visual EF assessment. We also used CMR in our training program; however, we calculated the CP to assess the differences between 2DE and CMR measurements rather than to assess inter-observer variability so as to determine the accuracy of LV volume and LVEF calculation in each participant and at each institution. Although the Intersocietal Accreditation Commission accreditation process recommends routine assessment of echocardiographic variability and the use of a quality improvement program [23], there is no commonly accepted approach. We believe that our approach may be an effective way to achieve quality improvement both individually and inter-institutionally.

Study limitations
This study has several limitations that should be acknowledged. First, we had no data concerning the longitudinal effect of the training program. We do not know how long the effect of training remains, and thus, it is difficult to determine when retraining is required to maintain the participants' measurement ability at a constant level. The longitudinal effect may be influenced by the experience of each sonographer; thus, further study is required to address this point. Second, our intervention only focused on manual tracing of the LV endocardial border, which is only one aspect of quality control. Patient selection and image quality could also affect the outcome, although we used propensity score matching for the selection of the patients. Third, we used one ultrasound vendor's machine to acquire all of the 2DE datasets. Therefore, our results cannot be generalized to other ultrasound vendors [24]. Fourth, when adopting this method into clinical practice, the reproducibility of CMR measurements is also an important consideration, since not all images are of good quality, even with CMR.
Fifth, it has not been settled whether measurements of echocardiography should be similar to those of CMR. Several previous studies using echocardiographic indices showed significant prognostic value even if LV volumes and LVEF were significantly different from those of CMR measurements [25][26][27][28]. However, we believe that there should be a reference modality, such as CMR to improve the nationwide reproducibility of echocardiographic measurements [29]. The reproducibility of CMR measurements are usually better than that of 2DE [30]. Although on an echocardiogram it is easier to trace the blood-tissue boundary than to trace the interface of a compacted layer, we believe that the endocardial border visualized on a CMR is more obvious than the bloodtissue boundary on an echocardiogram, thereby making it ideal for providing objective answers to measurements. Sixth, although both CMR and echocardiography used long-axis views for measuring LV volumes, and LVEF, the number of imaging plane used for analysis was different, which made some discrepancies. Seventh, since the training was arbitrary, the lack of standardization of the practice period after the training intervention and the lack of objective evaluation were other limitations of this study. Finally, sample size was not enough to draw definite conclusions. Therefore, further studies should be required to determine the optimal number of cases and period of training.

Conclusion
A training program for 2DE determined LV volumes and LVEF incorporating CMR as a reference improved the accuracy of measuring LV volumes and LVEF in the majority of participants. However, the degree of the learning effect differed among the hospitals, and reproducibility even became worse in two of five hospitals after the intervention. To improve accuracy and reproducibility, individualization of the training program and a periodical objective evaluation in each subject for each hospital with a standardization of the practice period is required.
Additional file 1: Table S1. Difference of echocardiographic measurements between hospitals