This paper aims to examine the difference between assessors' scores and self-assessment scores for oral and maxillofacial surgery trainees and MSc postgraduates following the surgical removal of lower third molar teeth.
Subjects and Methods
A total of 17 trainees and MSc postgraduates were assessed when surgically removing lower third molar teeth under general anesthesia. The teeth were selected on the basis that their removal would necessitate raising of a flap and removal of bone. Assessors were members of staff of the department. One assessor was scrubbed, assisting and, where necessary, training the operator; the second observed the procedure closely. Where necessary, the assessor/trainer instructed and/or took over the procedure in the normal way.
Operators were shown the assessment forms prior to the surgery. They were told that the assessment would not count in any way towards their continuous assessment.
Methods of assessment were:
1. An objective assessment of whether 20 components of the procedure were correctly or incorrectly performed. In cases where the trainer corrected the operative technique or took over, the relevant parts of the procedure were judged incorrectly performed.
2. An operative global rating scale (1-5). The scale is anchored by descriptors and measures different aspects of performance, i.e. respect for tissue, time and motion, instrument handling, knowledge of instruments, flow of operation, use of assistants, knowledge of procedure, overall performance.
Both types of assessment were marked by the two assessors during the procedure or immediately afterwards. The operator was asked to assess his or her own performance using the same assessment form immediately postoperatively. The results were correlated using standard statistical techniques.
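As a rough illustration of how the two instruments could yield totals, the following Python sketch scores a single operation. The function names and the way a trainer intervention is recorded are assumptions made for illustration, and summing the eight aspect ratings is inferred from the stated maximum of 40 rather than described in the paper.

```python
# Illustrative sketch only: names and scoring details are assumptions,
# not the authors' actual marking procedure.

GLOBAL_ASPECTS = (
    "respect for tissue",
    "time and motion",
    "instrument handling",
    "knowledge of instruments",
    "flow of operation",
    "use of assistants",
    "knowledge of procedure",
    "overall performance",
)
N_CHECKLIST_ITEMS = 20


def checklist_score(performed_correctly, trainer_intervened):
    """Count components done correctly without trainer correction or takeover."""
    if (len(performed_correctly) != N_CHECKLIST_ITEMS
            or len(trainer_intervened) != N_CHECKLIST_ITEMS):
        raise ValueError("expected 20 checklist components")
    # A component corrected or taken over by the trainer counts as incorrect.
    return sum(
        1
        for ok, intervened in zip(performed_correctly, trainer_intervened)
        if ok and not intervened
    )


def global_rating_score(ratings):
    """Sum the 1-5 ratings over the eight aspects (assumed range 8 to 40)."""
    missing = set(GLOBAL_ASPECTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for aspect in GLOBAL_ASPECTS:
        if not 1 <= ratings[aspect] <= 5:
            raise ValueError(f"rating for {aspect!r} must be between 1 and 5")
    return sum(ratings[aspect] for aspect in GLOBAL_ASPECTS)
```

Under these assumptions the same two functions would be applied to each assessor's form and to the operator's self-assessment form, so that the totals are directly comparable.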
A total of 22 lower third molar teeth were removed by 17 different operators. There were 8 different assessors using both the objective checklist and global rating scales. In 18 cases, operators assessed their performance using both scales.
There was no evidence of a difference between the marks of the two assessors: a two-way analysis of variance gave P = 0.70 and P = 0.68 for the objective checklist and global rating scales, respectively. The level of agreement between assessors was 86.36% (kappa = 0.79, P < 0.001) on the objective checklist scale and 90.91% (kappa = 0.83, P < 0.001) on the global rating scale.
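For readers unfamiliar with these agreement statistics, the sketch below shows one common way to compute percentage agreement and Cohen's kappa for two raters in Python. It assumes each operation receives a single categorical grade from each assessor; the paper does not state how scores were categorised, and the grades used here are hypothetical. The reported percentages are arithmetically consistent with agreement on 19 and 20 of the 22 operations (19/22 = 86.36%, 20/22 = 90.91%), although the paper does not say so explicitly.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Percentage of cases on which the two raters gave the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b) / 100.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    p_expected = sum(
        counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades from two assessors for four operations (not study data).
assessor_1 = ["pass", "pass", "fail", "fail"]
assessor_2 = ["pass", "fail", "fail", "fail"]
print(percent_agreement(assessor_1, assessor_2))  # 75.0
print(cohens_kappa(assessor_1, assessor_2))       # 0.5
```

Kappa is lower than raw agreement because part of the observed agreement would be expected by chance alone, which is why both figures are usually reported together.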
Two-way analysis of variance showed evidence of a difference between assessors' and self-assessment marks for both types of score (objective checklist score, P < 0.001; global rating score, P < 0.001).
Although there was evidence of good agreement between assessors, there was poor agreement between assessors and operators on both the objective checklist and global rating scales. Operators almost invariably scored themselves higher than the assessors did. Some of these differences were substantial, and some operators who were scored very low by assessors scored themselves extremely high. On the objective checklist scale, operators' scores were up to 10.5 marks higher (maximum 20) than those of the assessors; on the global rating scale they were up to 12.5 marks higher (maximum 40).
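The size of such a discrepancy can be expressed as the operator's mark minus the assessors' mark. The short Python sketch below assumes the two assessors' marks are averaged before comparison, which would explain half-mark differences but is not stated in the paper; the function name and example marks are hypothetical.

```python
from statistics import mean


def over_rating(self_score, assessor_scores):
    """Self-assessment mark minus the mean assessor mark (positive = over-rating)."""
    if not assessor_scores:
        raise ValueError("at least one assessor score is required")
    return self_score - mean(assessor_scores)


# Hypothetical checklist marks (maximum 20), not taken from the study.
print(over_rating(15, [9, 10]))  # 5.5
```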
These results suggest that some operators have poor judgment and over-rate their surgical ability even when assessed for a specific procedure and given rigid criteria against which to mark.
Little work appears to have been done on self-assessment of specific clinical procedures, especially where the self-assessment is marked after the procedure concerned has been performed. There have, however, been reports45 of relatively poor agreement between external measures of medical students' clinical performance and students' self-assessment of their performance. Additionally, lower performing medical students tended to rate their clinical performances higher than did their peers at initial self-assessment.
In the present study, objective checklist scores, although based on very rigid criteria, tended to be over-scored more than global rating scores, where operators were perhaps reluctant to give themselves marks at the extremes of the scale. Certainly, over-scoring of checklist criteria suggests that operators either did not know what was expected of them or, in some cases, exhibited a considerable degree of self-deception. Alternatively, they may have scored potential or ideal performance, or even tried to compensate for poor performance as a defense mechanism.
This study found evidence of a surprising and worrying over-rating of their own surgical skills by many trainees and postgraduates in oral and maxillofacial surgery. There can be little doubt that there is a need to evaluate further the accuracy of self-assessment of operative skills. In conjunction with this, we must train surgeons to evaluate their performance critically; self-assessment can form an excellent basis for constructive feedback between trainer and trainee.
It may be found that some individuals will never develop the judgment to assess their own performance accurately. It would be invaluable to have a way to identify these individuals so that they could be redirected at an early stage in their careers.