Video-based assessment of practical operative skills for undergraduate dental students

Introduction: The aim of this study was to evaluate, in an experimental design, to what extent assessments of prepared cavities in two different settings, based on video sequences incorporating the digital analysis tools of the prepCheck software, deviate from one another and how reliable they are. Materials and Methods: For this prospective, single-centred, experimental study, 60 examination cavities for a ceramic inlay preparation were assessed by four trainers in two different settings (A: video film versus B: video film plus an analogue model assessment) using a standard checklist. The examined parameters comprised: 1. preparation/outer edges, 2. surface & smoothness/inner edges, 3. width & depth, 4. slide-in direction, 5. outer contact positioning and 6. overall grade, on a Likert scale from 1 = 'excellent', 2 = 'very good', 3 = 'good', 4 = 'satisfactory' to 5 = 'unsatisfactory'. An evaluation questionnaire with 33 items additionally addressed the concept of applying a digital-analytic software. The statistical analysis, using SAS 9.2 (SAS Institute Inc., Cary, USA; PROC MIXED) and R (version 2.15, package lme4), addressed reliability, inter-rater correlation and significant factors at p = 0.05. Results: The assessments of the individual criteria and the overall grade were, on average, lower (i.e. better) in the control group (A) than in the study group (B), yet, with the exception of the 'outer contact positioning', without statistical significance. The reliability lay at an average of α = 0.83 (A) and α = 0.79 (B). The reliability of the criteria 'preparation edge', 'surface', 'width & depth' and 'overall grade' was acceptable in both assessment modes, with α > 0.7. The inter-rater correlation was higher in assessment mode A (0.43 < r < 0.74) than in mode B (0.35 < r < 0.60).
Conclusion: The current examination shows an average reliability in assessment mode A that exceeds the requirements for practical examinations (α ≥ 0.6) and also fulfils the general requirements for 'high-stake' examinations of α ≥ 0.8.

Research Article

Wälter A1, Möltner A2, Böckers A3, Rüttermann S4, Gerhardt-Szép S4*

1 Department of Operative Dentistry, Centre for Dentistry and Oral Medicine (Carolinum), Goethe University Frankfurt, Frankfurt, Germany
2 Competence Centre for Examinations in Medicine/Baden-Württemberg, Medical Faculty, University of Heidelberg, Heidelberg, Germany
3 Institute of Anatomy and Cell Biology, Medical Faculty, University of Ulm, Albert-Einstein-Allee 11, 89081 Ulm, Germany
4 Department of Operative Dentistry, Centre for Dentistry and Oral Medicine (Carolinum), Goethe University Frankfurt, Frankfurt, Germany

Received: 29 September, 2018; Accepted: 16 October, 2018; Published: 18 October, 2018

*Corresponding author: Susanne Gerhardt-Szep, Department of Operative Dentistry, Centre for Dentistry and Oral Medicine (Carolinum), Goethe University Frankfurt, Frankfurt, Germany, Tel.: +49-69-6301-7505

Trends Comput Sci Inf Technol 3(1): 005-014. DOI: http://dx.doi.org/10.17352/tcsit.000007
https://www.peertechz.com


Introduction
The practical development of skills, i.e. the process of gaining expertise in the procedures and techniques required for operative dentistry, forms a fundamental part of any study of dentistry. Accordingly, the aim of the sixth semester of dental education, within the scope of the phantom course of operative dentistry, is to optimally prepare students for the treatment of patients. Especially with regard to cavity preparation, one of the basic competencies required of any dentist in their later career, students are unsure of themselves at first. This uncertainty concerns above all parameters such as cavity depth and width, surface smoothness and the form of the cavity edge [1]. Deviating assessments provided by different trainers may lead to frustration and confusion among students [2]. Assistance can be provided by means of modern media, such as computer-generated digital analysis tools [3-5], whose implementation in the dental curriculum has been supported in many different ways [5-8]. To this end, the work produced by students is monitored and assessed by trainers at various times, both formatively and summatively. Ideally, all of these assessments should incorporate the characteristics of reliability, validity, responsibility, flexibility, comprehension, implementation ability and relevance [9,10]. The assessment of practical skills in dental and medical schools requires considerable time and effort on the part of the supervising faculty [1,11,12]. The live assessment of these skills poses a significant resource problem for dental schools and complicates the execution and scheduling of their daily activities [11]. The dental literature, on the other hand, acknowledges the need for the objective assessment of skills in operative training [1,10].
Structured grading systems, such as Objective Structured Practical Examinations (OSPE) or Objective Structured Clinical Examinations (OSCE), were specifically designed to reduce subjectivity [1,10,13,14]. In order to fulfil the general requirements of 'high-stake' examinations, a specific number of examiners, as well as checklists, should be implemented in the assessment of cavities in an OSPE design [1].
A major disadvantage of live assessment is the substantial demand on time and resources involved in having several staff members observe and assess students' performance [1]. As an alternative, supervisors could use videos, which reduce some of the logistical overhead [11]. Video-based assessment allows raters to be blind to certain aspects of the performance, such as the identity of the trainees, that might otherwise engender bias in rating [11,12,15-23]. It further facilitates a more detailed review of a learner's performance and provides additional time for the rater to focus fully on the performance of a trainee [12]. In addition, videos can be reviewed several times, by several different raters. Finally, trainees can review the video recordings themselves and are thus given the opportunity to enhance their learning through debriefing methods [12].
Which type of video assessment of cavity preparations in dental medicine is most suitable for 'high-stake' examinations has, up to now, not been clarified in the literature.
It is also not clear whether the video assessment of dental cavities (in a simulation model) alone, as opposed to an assessment with an additional manual-analogue model component, implies a difference in grading. The additional consideration of models also leads to a higher demand on time and personnel, as each model must be individually inspected and assessed by the examiner. This setting, in which each model is individually assessed by the examiners and a final grade is then allocated in a consensus procedure, describes the current standard grading situation.
Optimally, we assume that three to four examiners are required here [1].
To address this gap, this study aims to compare two different settings for the video-based evaluation of practical operative skills, including an analysis tool. In an experimental design, it was to be evaluated to what extent assessments of prepared cavities based on video sequences containing digital analysis tools deviate from one another, and how reliable each is. In the control group (Part A), the examiners assessed examination work that they observed in a video illustrating various parameters of a digital analysis tool. In Part B (study group), the examiners additionally received the opportunity to inspect the real examination model themselves and to modify their previously provided video-based assessment. Two main research questions were to be answered: 1. Do the various modes of assessment used in the two examined settings (control and study group) affect their reliability?
2. What influence do the different modes of assessment in the examined settings (control and study group) have on the overall assessments of the study participants (trainers)?
In addition, we were interested in the study participants' evaluation of the application concept of the digital analysis software and of the study procedure.

Materials and Methods
This is a prospective, single-centred, experimental study conducted at the Goethe University in Frankfurt.

Participants and Assessed Parameters
The criteria for selecting suitable study participants (examiners) were determined in advance. The inclusion criteria included belonging to the department of operative dentistry and having little or no experience with the prepCheck software (having worked with it no more than ten times). In preparation for the study, the trainers were first prepared for the assessment scenario through two train-the-teacher events, and their evaluation skills were calibrated.
The exact time frame is represented in Table 1.

Procedure
By means of the Wilcoxon matched-pairs test with Bonferroni correction, a case number of n = 60 was determined from the results of a preceding train-the-teacher event at α = 0.0125 and a probability of P(X+X' > 0) = 0.25, in order to guarantee a power of 80% for four trainers.
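As a rough plausibility check of this case-number calculation, the power can be sketched with a simple sign-test analogue. This is an assumption made purely for illustration: the study itself used the Wilcoxon matched-pairs test, which is not reproduced here, and the probability 0.25 is reinterpreted as the chance that a single paired difference is positive. Under those assumptions, the exact binomial power for n = 60 at a two-sided level of α = 0.0125 comes out above the targeted 80%.

```python
# Sign-test power sketch (illustrative assumption, not the study's Wilcoxon
# computation): n pairs, two-sided level alpha, and an alternative under
# which a paired difference is positive with probability p_alt.
from math import comb

def binom_pmf(n, i, p):
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

def binom_cdf(n, i, p):
    return sum(binom_pmf(n, j, p) for j in range(i + 1))

n, alpha, p_alt = 60, 0.0125, 0.25

# Largest symmetric cut-off k whose two-sided rejection probability under
# the null (p = 0.5) stays at or below alpha.
k = 0
while 2 * binom_cdf(n, k + 1, 0.5) <= alpha:
    k += 1

# Power: probability of landing in the rejection region (<= k or >= n - k
# positive differences) under the assumed alternative p_alt.
power = binom_cdf(n, k, p_alt) + (1 - binom_cdf(n, n - k - 1, p_alt))
```

Under these simplifying assumptions the computed power clearly exceeds 0.8, which is consistent with the 80% target stated above.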
The cavities were randomly allocated to the two parts of the experiment (Parts A and B). The randomisation took place by entering coded models into an online randomiser (https://www.random.org).
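The allocation step can be sketched as follows. This is a minimal illustration using Python's standard library, whereas the study itself entered the coded models into https://www.random.org; the model codes below are hypothetical.

```python
import random

# Illustration only: randomly order 60 coded examination models, as the
# study's online randomiser did. A fixed seed makes the sketch reproducible.
random.seed(2018)
codes = [f"model_{i:02d}" for i in range(1, 61)]   # hypothetical codes 01..60
order = random.sample(codes, k=len(codes))         # random permutation
part_a, part_b = order[:30], order[30:]            # allocation to Parts A and B
```

Sampling a full-length permutation guarantees that every coded model appears exactly once across the two parts.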

Video-based Assessment
The composition of the video of the digitalised teeth was […] (Figure 2). At every seat, basic dental utensils (a mirror and a probe) were provided, together with a lead pencil and cotton wool buds. The examiners used the model with the corresponding reference number and the filled-in checklist with the corresponding individual assessments from Setting A. They examined the already available individual grades and modified these where necessary. For the assessment, the teeth could be taken out of the models and the preparation edges marked with a lead pencil where necessary. This was meant to assist in more easily recognising undesired bevels created during the preparation. Before the end of the assessment time, and prior to being passed on to the next examiner, the cavities had to be cleaned with a moist cotton wool bud.

A maximum of 120 seconds was allotted for the assessment of each model. In the background, a count-down timer ran above the beamer and could be viewed by all participants (Figure 2).

In the first step (2nd part), an enlarged view of the cavity to be assessed is shown for approx. 2 seconds, with prepCheck set to 'zoom to cavity' (Translation: weiter = further, Hinterschnitt = undercut, Präparationsrand = preparation edge, Oberflächenbeschaffenheit = surface consistency, Distanzmessung = measurement of distance, freie Winkelmessung = free angle measurement, Winkel zur Kronenachse = angle to the crown axis, Schnittebene = cut plane, Kronenachse bestimmen = determine crown axis, Analyse = analysis, Warnung = warning).

Figure 3: In the second step, the outer edges of the preparation are shown from all sides for approx. 17 seconds, with prepCheck set to 'preparation edge'; the view is automatically tilted in all directions (buccal, oral, mesial and distal), so that the assessor always has a direct view for the assessment. This setting is entered into the assessment questionnaire as the first parameter (Translation as in the first step).

Figure 4: In the third step, the surface, smoothness and inner edges of the preparation are shown from all sides for approx. 31 seconds in the prepCheck setting 'surface consistency', with the view automatically tilted in all directions (buccal, oral, mesial and distal), so that the assessor always has a direct view for making the assessment. The programme provides support through indications of the concaveness or convexness of the preparation; recognisable edges are coloured orange. At the cursor, represented here by an arrow, direct feedback on the surface consistency is given. This setting is entered into the assessment questionnaire as the second parameter (Translation: as in the first step, additionally konkav = concave, wenig gekrümmt = slightly curved, konvex = convex, Kante = edge, Meßwert = measured value).

Statistical Analysis
The case number calculation took place in co-operation with the Institute of Biostatistics and Mathematical Modelling in Frankfurt am Main. The results were analysed by means of the statistical programmes SAS 9.2 (SAS Institute Inc., Cary, USA; PROC MIXED) and R (version 2.15, package lme4). Basic data was retrieved, and an analysis of the similarity of the mean values between the observers was carried out (ANOVA for dependent observations, as the same models were used).
Finally, the inter-correlations of the assessments of the four raters were calculated. For the comparison of the assessments in Parts A and B, the four observer ratings were determined for each part, and the two parts were compared using a t-test for paired samples. In order to determine the overall reliability of both scenarios, the six single-assessment parameters were complemented by a further 'mean' variable. In addition, a test was carried out to determine the difference between the two alpha values of Parts A and B, followed by the reliability test for the 'mean' of the grades of both scenarios.
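The two reliability measures at the centre of this analysis, Cronbach's alpha across the raters and the pairwise inter-rater (Pearson) correlation, can be sketched as follows. This is a minimal illustration with hypothetical data, not the study's actual SAS/R pipeline.

```python
# Each row holds one cavity's grades, one column per rater (hypothetical data).

def cronbach_alpha(rows):
    """Cronbach's alpha treating the raters as items."""
    k = len(rows[0])                        # number of raters
    def var(xs):                            # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    rater_vars = [var([r[j] for r in rows]) for j in range(k)]
    total_var = var([sum(r) for r in rows])  # variance of the row sums
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)

def pearson_r(xs, ys):
    """Inter-rater correlation between two raters' grade lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

With rows in which all raters agree perfectly, cronbach_alpha returns 1; growing disagreement between the rater columns pushes the value down.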
The statistical assessment was carried out in co-operation with the Competence Centre for Examinations in Medicine, Baden-Württemberg of the Medical Faculty, Heidelberg.

Results

Collected preparation parameters and inter-rater correlations
The descriptive statistical assessment of the individual assessments, providing the mean value, standard deviation, median, minimum and maximum, as well as the calculation of the reliability, was carried out both across all criteria ('mean') and separately for the criteria 'preparation edge/outer edges', 'surface & smoothness/inner edges', 'slide-in direction', 'outer contact positioning', 'width & depth' and 'overall grade' (Table 4). In summary, the assessments of the individual criteria and the overall grade were, in the control group, on average lower (i.e. better) than in the study group (prepCheck video plus subsequent model), however, with one exception, without statistical significance. For the parameter 'outer contact positioning', alpha rose significantly from 0.56 (Part A) to 0.74 (Part B). The results of the inter-rater correlations are outlined in Table 4.

Figure 5: A screening (grid) of 1 mm is projected onto the preparation, so that a metrical analysis is also available. The programme provides support through indicating measurements of the preparation and already at this stage points a green arrow at the preparation axis. This setting is entered into the assessment questionnaire as the third parameter (Translation: weiter = further, Hinterschnitt = undercut, Präparationsrand = preparation edge, Oberflächenbeschaffenheit = surface consistency, Distanzmessung = measurement of distance, freie Winkelmessung = free angle measurement, Winkel zur Kronenachse = angle to the crown axis, Schnittebene = cut plane, Kronenachse bestimmen = determine crown axis, Analyse = analysis, konkav = concave, wenig gekrümmt = slightly curved, konvex = convex, Kante = edge, Meßwert = measured value).

Figure 6: In the fifth step, the slide-in direction of the preparation is portrayed more precisely in the prepCheck setting 'undercut' for approx. 6 seconds. A wheel is projected onto the preparation; by rotating it, the cavity can be tilted in further directions so that the indications can also be examined there. The programme furthermore provides support through indicating measurements of the preparation and a green arrow pointing at the preparation axis. This setting is entered into the assessment questionnaire as the fourth parameter (Translation as in Figure 5).

Figure 7: In the final step, the outer contact positioning of the preparation is shown for approx. 5 seconds in the prepCheck setting 'undercut'. Here, too, the cavity can be tilted in all directions. The programme provides further support through indicating measurements of the preparation and a green arrow pointing at the preparation axis. The film ends after approx. 120 seconds. This setting is entered into the assessment questionnaire as the final parameter (Translation as in Figure 5).

Assessment questionnaire
All distributed assessment and evaluation questionnaires were returned after being filled in; the exclusion rate was 0%. Details of the included study population are given in Table 5. The results of the evaluation can be viewed in Tables 6 and 7. An excerpt from the freely composed commentaries is given in Table 8.

Discussion
This study establishes evidence supporting the reliability of video-based assessments of operative competency in performing cavity preparations in dentistry. To the best of our knowledge, this is the first study to prospectively compare two different settings of video-based assessment of cavity preparation performance using predefined checklists.

Table 8: Excerpt from the freely composed commentaries of the examiners at the end of the evaluation questionnaire.
"I felt the experience gained … through the calibration helpful."
"The colour variation in the representation of the prepared and non-prepared part appears useful and helpful."
"I found the assessments partially very tiring."
"… The scenario is great for calibration purposes! Would be great for the train-the-teacher events!"
"… the assessment could have been more effective by using another sequence of the assessment criteria."
"… Well prepared (setting), comfortable atmosphere."
"I found the representation of the preparation edges approximate to using the PC particularly helpful!"
"I was able to easily recognise frequent sloping, i.e. errors in the secondary preparation!"
The reliability in this study lay at an average of α = 0.79 (Part B: study group) and α = 0.83 (Part A: control group). In the literature, one can find reliability values, in the form of Cronbach's α, of around 0.5 for examinations using CAD systems [24-26]. The reliability of OSPE without CAD systems, on the other hand, is reported in the range of α = 0.68 to α = 0.87 [1,10,27]. The current experimental study is thereby more closely aligned with these latter results. On the basis of the reliability values determined, the setting of Part A could be applied to 'high-stake' examinations. Part B lies only slightly below the value of α = 0.8 and requires an additional assessment step involving the models. In studies on video-based examinations, some reliability data is provided in the form of ICC (intraclass correlation coefficient) values. LAEEQ, CHEN and SCAFFIDI report an ICC of 0.62 [12,14,15], KATEEB ICC values of 0.47 ≤ r ≤ 0.78 [28]. In publications on CAD systems, inter-rater correlations of 0.17 ≤ r ≤ 0.56 are mentioned by ESSER [29].
This experimental study is therefore most closely aligned with ESSER [29], KATEEB [28], and LAEEQ and CHEN [14,15]. The statement by SAMPAIO-FERNANDES that there is considerable deviation between individual examiners [31] applies to this study as well, regardless of the attempt to counteract this problem through the train-the-teacher events. The effects of the training were less than optimal, however, so that more information and practice would have been required, above all concerning the parameters 'slide-in direction' and 'outer contact positioning'.
The fact that the outer contact positioning, which in the control group was assessed exclusively on the basis of the prepCheck videos, showed low ICC values is not surprising. In inlay preparations, the outer contact areas are, owing to the given extension surfaces, subject to more difficult scanning conditions; these areas would certainly be easier to demonstrate in full-crown preparations. When additional models were provided for the assessment, the ICC values doubled, as scanning no longer played a role and the outer contact positioning could be assessed, and the assessment corrected, more easily. Here, the software would have to be improved by the manufacturer. In addition, a significant increase of Cronbach's alpha occurred in Setting B when the 'outer contact positioning' was evaluated. This, too, is not surprising, as these areas could be assessed more carefully on the model. Appropriately, the study participants rated the possibility of being able to assess the approximate outer […]. It is generally regarded as fundamentally important, however, to primarily perform the assessment for examinations by use of the analysis tool (3.00 ± 1.41). It is not surprising that it was generally agreed that "dental assistants cannot be replaced by prepCheck when assessing cavities" (1.00 ± 0.00), since the sole use of digital analysis tools, in the currently valid version, may depict critical grading parameters, such as the outer contact positioning, insufficiently. The overall assessment of the prepCheck analysis tool ended up rather modest at 2.87 ± 0.89 (on a Likert scale of 1 = excellent to 6 = unsatisfactory) and points to the above-mentioned problem areas, which could certainly be optimised on the part of the software.
In order to reduce the limitations of the study, various points were considered. First of all, the order of the displayed videos and models was randomised by means of an online randomiser. As the variable of the experimental parts, i.e. the examination teeth, was independent of the participants, a 'selection effect' did not take place. Secondly, the study took place with the same four study participants in both parts, at the same time of day (13:20) and over the same duration (approx. two hours and 27 minutes), using the same procedure in the same rooms. Thirdly, the lighting in both settings was the same, as were the duration of the videos (2 min 0-10 s) and the sequence of the settings portrayed in the individual films. Furthermore, the participants were selected from the trainers of the department of operative dentistry who were already actively taking part in the practical preparation exercises (phantom course of operative dentistry) in the sixth semester of study and had proven assessment experience. In order to reduce the problem of a lack of realistic representation, the study was performed in such a way that it reflected the examination circumstances as closely as possible. To this end, the duration of the live assessment of a cavity preparation was determined in preliminary studies, and the assessment questionnaires were aligned with the checklists familiar from the examinations [1,27]. In order to eliminate the problem of generalisation arising through differing teaching experience, the assessment of the cavities was calibrated in the preceding train-the-teacher events.
Despite this, the following limitations should be taken into consideration: it is conceivable that when assessing a model (Part B), the evaluation was generally stricter, as the preliminary grades from the first part were already known. It is also possible that, over the course of the whole experimental part, a practice effect took place that affected each individual assessor to a different degree. This could explain why, despite the preceding train-the-teacher events, the inter-rater reliability differed. The influence of the gender, age and teaching experience of the subject group was not a main part of this examination, although it could well be addressed in future studies.