Design, setting and participants: Observational study of 209 patient encounters involving 28 IMGs and 35 examiners at three metropolitan teaching hospitals in New South Wales, Victoria and Queensland, September–December 2006.
Results: The G coefficient for eight encounters was 0.88, and the decision study indicated that 10 encounters would be needed to reach a reliability of 0.90. Almost half of the IMG respondents (7/16) and most examiner respondents (14/18) were satisfied with the mini-CEX as a learning tool. Most of the IMGs and examiners enjoyed the immediate feedback, which is a strong component of the tool.
Assessing the performance of junior doctors in the workplace is important but challenging. The optimum assessment is by direct observation of doctors’ interactions with patients and comprises multiple assessments by multiple examiners on a variety of patient problems. Clinical supervisors are best suited to observe and certify trainees, but often do not observe them directly.1 Performance assessment is not done well in most instances, as it requires multiple sampling over time.2 In-training assessments done at the end of a term introduce a “halo effect”.3
Most of these problems can be overcome by the mini clinical evaluation exercise (mini-CEX), developed by the American Board of Internal Medicine.4 The mini-CEX involves direct observation of a trainee in a focused clinical encounter, followed by immediate feedback. The assessment is recorded on a rating form that has been shown to have high internal consistency and reliability among internal medicine trainees, giving scores comparable with a high-stake clinical examination.5,6 The mini-CEX has higher fidelity than other formats.7
International medical graduates (IMGs) comprise about 25% of the medical workforce in developed countries.8 Their certification for registration is a major task of the medical boards and registration authorities in Australia and other countries.9 The Australian Medical Council (AMC) has conducted clinical examinations to assess IMGs since 1978.10 Successful candidates undertake 12 months of supervised practice before obtaining full registration. Despite having passed the current AMC clinical examination, IMGs’ competence and performance in the workplace have been criticised.11
The study was conducted in three large metropolitan teaching hospitals in Australia, one each in New South Wales, Queensland and Victoria, as part of a larger international collaborative study with the Medical Council of Canada. The ethics committee in each centre approved the study.
All IMGs at the participating hospitals who had passed the AMC clinical examination in the previous 12 months and 50 potential examiners were asked to volunteer for the study. All IMGs gave written, informed consent.
In each centre, potential examiners were requested to attend a training session. A coordinator assisted the study team at each site. Patients were inpatients or outpatients of the participating hospitals.
The following skills were rated: medical interviewing, physical examination, professionalism/humanistic qualities, counselling, clinical judgement, organisation/efficiency and overall clinical competence. Ratings were on a nine-point scale, where 1–3 signified unsatisfactory performance; 4–6, satisfactory performance; and 7–9, superior performance at a mid-postgraduate year 1 level. Examiners were also asked to grade the encounters as “met expectations”, “borderline” or “did not meet expectations”.
The examination comprised four assessments in emergency medicine (two examination, one history taking and one counselling), three in medicine (one history taking, one management and one counselling), and three in surgery (two examination and one management). These three specialties were selected because these terms are compulsory for internship in most Australian states and territories.
The IMGs and examiners were asked to schedule a mutually convenient time, with at least 30 minutes of protected time for assessment and immediate feedback. At the end of the study period, the IMGs and examiners were asked to evaluate the process.
Reliability was assessed using generalisability theory analysis. Generalisability analyses (G studies) allow estimation of the variance components associated with the different examination conditions (eg, types of tasks, number of tasks, and number of markers). The ratio of the variance component for the object of measurement (in this case, differences between IMGs) to the sum of that component and the error variance yields an estimate of reliability: the generalisability (G) coefficient, with values ranging from 0 to 1. The effect of changes in the examination conditions (eg, an increase in the number of tasks to be performed by the IMG, or a change in the number of markers for each task) can be modelled to inform decisions on optimising the measurement (decision [D] studies).
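As an illustration only, the G coefficient for a simple design in which encounters are nested within candidates can be written as below. The symbols are assumptions introduced for this sketch and are not taken from the study: \sigma^2_p is the variance between IMGs, \sigma^2_{e:p} is the variance of encounters within IMGs (confounded with residual error), and n_e is the number of encounters per IMG.

```latex
% Illustrative notation (assumed, not from the study):
%   \sigma^2_p     variance between IMGs (object of measurement)
%   \sigma^2_{e:p} variance of encounters nested within IMGs, including residual error
%   n_e            number of encounters per IMG
\[
  E\rho^2 \;=\; \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{e:p} / n_e},
  \qquad
  \mathrm{SEM} \;=\; \sqrt{\sigma^2_{e:p} / n_e}
\]
```

Under this form, increasing n_e shrinks the error term, so the coefficient moves towards 1 and the standard error of measurement falls; this is the relationship a D study exploits.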
We used a G study followed by a D study to evaluate the reliability of the measurement, and sought to determine the number of observed clinical encounters needed to attain a G coefficient of 0.90. This value was set by taking into account the high stakes of the assessment for the IMGs. The average rating for the different assessments within each encounter (eg, history taking, examination) was the outcome measure for these analyses. As each IMG interacted with different patients and a different number of patients, the “patient-encounters” factor was treated as nested within IMGs in the G-study design.
All 28 IMGs who had passed the AMC examinations within the previous 12 months and 35 examiners volunteered to participate in the study. Twenty-two examiners were trained in assessing the mini-CEX; the remaining 13 examiners participated without training. The examiners included specialists and specialist trainees in internal medicine, surgery and emergency medicine.
The 28 IMGs were assessed by the 35 examiners on 209 clinical encounters: 122 assessments were done in wards, 70 in emergency departments, eight in intensive care units, six in outpatient clinics and two in offices; location was not recorded for one encounter. The mean number of mini-CEXs completed by IMGs and examiners was 7.2 (range, 2–13) and 6.0 (range, 1–20), respectively. Assessments were scored as “met expectations” for 150 encounters, “borderline” for 40, and 19 (9% across 12 IMGs) “did not meet expectations”. The average mini-CEX duration was 20 minutes (range, 6–45 minutes). The average time for feedback was 12 minutes (range, 3–20 minutes). Complexity of encounters was rated by examiners as low for 19, moderate for 150 and high for 31 encounters; data were missing for nine encounters.
Because of differences in the number of encounters per participant, we included a maximum of eight encounters in the generalisability study. The results of the variance components estimation are shown in Box 1. The G coefficient for eight encounters was 0.88. As a measure of precision, the standard error of measurement for the measurement design with eight encounters was estimated at 0.35 (that is, in 19 of 20 cases, the “true” score of an IMG will fall within ± 0.69 of the observed score). The results of the D study indicated that 10 encounters were necessary to achieve a reliability of 0.90 (Box 2).
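The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the analysis used in the study: it assumes a single-facet design with encounters nested within IMGs, in which the error variance is divided by the number of encounters, and it works only from the reported G coefficient (0.88 for eight encounters) and standard error of measurement (0.35). The function names are hypothetical.

```python
def error_to_signal_ratio(g_observed: float, n_observed: int) -> float:
    """Single-encounter error variance divided by between-IMG variance.

    Derived from G = var_p / (var_p + var_e / n), an assumed one-facet
    nested design; not the study's own estimation procedure.
    """
    return n_observed * (1.0 - g_observed) / g_observed


def projected_g(g_observed: float, n_observed: int, n_target: int) -> float:
    """Project the G coefficient to a different number of encounters."""
    ratio = error_to_signal_ratio(g_observed, n_observed)
    return 1.0 / (1.0 + ratio / n_target)


def min_encounters(g_observed: float, n_observed: int,
                   g_target: float, n_max: int = 100) -> int:
    """Smallest number of encounters whose projected G reaches g_target."""
    for n in range(1, n_max + 1):
        if projected_g(g_observed, n_observed, n) >= g_target:
            return n
    raise ValueError("target reliability not reached within n_max encounters")


def ci_half_width(sem: float, z: float = 1.96) -> float:
    """Half-width of an approximate 95% interval around an observed score."""
    return z * sem


if __name__ == "__main__":
    # Reported values: G = 0.88 for eight encounters, SEM = 0.35.
    for n in (6, 8, 10, 12):
        print(n, round(projected_g(0.88, 8, n), 3))
    print("encounters needed for 0.90:", min_encounters(0.88, 8, 0.90))
    print("95% half-width:", round(ci_half_width(0.35), 2))
```

Under these assumptions, the projection for 10 encounters is approximately 0.90 and the interval half-width is approximately 0.69, in line with the values reported above. A full D study would work from the estimated variance components rather than back-calculating from the coefficient.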
In the evaluation survey, 16/28 IMGs (57%) and 18/35 examiners (51%) responded. Most respondents (10 IMGs; 15 examiners) never or only occasionally experienced difficulty arranging the mini-CEX encounters. When problems did occur, they were often due to rostering issues and patients being away from the wards.
Twelve of the 16 IMG respondents received feedback after each mini-CEX encounter, and nine of these were satisfied with this feedback. There was no response from examiners for the remaining four. Seven of 15 examiner respondents indicated they would like further training in giving feedback. The comments indicated that the feedback is a strong feature of the mini-CEX.
Almost half of the IMG respondents (7/16) and most examiner respondents (14/18) were satisfied or very satisfied with the mini-CEX as a tool for learning. Ten of the examiner respondents were satisfied or very satisfied with the mini-CEX as an assessment tool and eight were neutral. Half of the IMG respondents (8/16) and most examiner respondents (15/18) were positive about the mini-CEX continuing as an assessment tool.
Under the conditions and settings used, the mini-CEX reliably assessed the clinical performance of IMGs with eight to 10 encounters. This is consistent with the results of other studies.7 As the mini-CEX is conducted within the workplace with real patients, it has high fidelity and it is acceptable to both IMGs and examiners. A fail rate of 9% (19/209 encounters) across 12 IMGs is concerning, given these IMGs had passed the AMC clinical examination.
Our findings suggest that the mini-CEX is a reliable and acceptable tool for assessing clinicians in the workplace, and a valuable method of identifying candidates who may have problems in a clinical situation. Most IMG respondents who received feedback were satisfied with it. Examiners reported that feedback was the most important part of the mini-CEX, but they would have preferred more training in giving it. Overall, examiners felt this tool assesses clinical performance better than the conventional methods.
There are limitations to this study. Only patients from medical, surgical and emergency departments were included, and the results cannot automatically be extended to other specialties. The IMGs were a self-selected group, introducing a possible bias, and only a small group of experienced clinicians took part, limiting the generalisability of the study.
The mini-CEX is a feasible, reliable and high-fidelity instrument for workplace-based assessment of IMGs with the strong advantage of providing ongoing observation and feedback. It has the potential to be used for summative assessment of IMGs and other medical trainees. The AMC, in collaboration with some licensing bodies, has already introduced the mini-CEX as a workplace assessment tool for some IMGs.
Box 1. Variance components estimates for the generalisability study