
One of the most important concepts in simulation-based assessment is achieving reliability, and specifically interrater reliability. While I have discussed previously on this blog that every simulation is assessment, in this article I am speaking of the type of simulation assessment that requires one or more raters to record data associated with the performance, usually by completing an assessment tool.
Interrater reliability, simply put, means that if we have multiple raters watching a simulation and using a scoring rubric or tool, they will produce similar scores. Achieving interrater reliability is important for several reasons, including that we usually rely on more than one rater to evaluate simulations over time. Other times we are engaged in research or other high-stakes assessment and want to be certain that we are reaching correct conclusions.
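To make the idea concrete, here is a minimal sketch in Python (using made-up scores, not data from any real assessment) of quantifying agreement between two raters with Cohen's kappa, one commonly used interrater reliability statistic:

```python
# A minimal sketch of checking agreement between two raters using
# Cohen's kappa. The rater scores below are made-up illustrative data.

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)

    # Observed agreement: proportion of items scored identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement, from each rater's marginal rates.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail scores on ten checklist items.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # kappa = 0.52
```

A kappa near 1 indicates strong agreement beyond chance; values much lower suggest that the raters, the tool, or both need the kinds of attention described in the tips below.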
Improving assessment capabilities for simulation requires a significant amount of effort. The amount of time and effort that goes into the assessment process should be directly proportional to the stakes of the assessment.
In this article I offer five tips to consider for improving interrater reliability when conducting simulation-based assessment.
1 – Train Your Raters
The most basic and most often overlooked aspect of achieving interrater reliability is the training of the raters. The raters need to be trained on the process, the assessment tools, and each item of the assessment on which they are rendering an opinion. It is tempting to think of subject matter experts as knowledgeable enough to fill out simple assessments; however, you will find with detailed testing that the scoring of an item is often truly in the eye of the beholder. Even simple items like “asked medical history” may be difficult to score reliably if they are not defined prior to the assessment activity. Other factors that require rater calibration/training can also affect the assessment, such as the limitations of the simulation, how something is being simulated, and overall familiarity with the technology that may be used to collect the data.
2 – Modify Your Assessment Tool
Modifications to the assessment tool can enhance interrater reliability. Sometimes the change is as extreme as removing an assessment item because you discover that you are unable to achieve reliability despite iterative attempts at improvement. Other, less drastic changes can come in the form of clarifying the text directives associated with the item. Sometimes removing qualitative wording such as “appropriately” or “correctly” can help to improve reliability, as can adding descriptors of expected behavior or behaviorally anchored statements to items; a brief example follows. However, these modifications and qualifying statements should also be addressed in the training of the raters as described above.
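As an illustration (the item wording below is hypothetical, not drawn from any validated tool), here is how a vague item might be restructured with behavioral anchors:

```python
# Hypothetical example of tightening an assessment item. The vague wording
# leaves "appropriately" in the eye of the beholder; the anchored version
# tells every rater exactly what behavior earns each score.

vague_item = "Performed patient handoff appropriately"

anchored_item = {
    "item": "Performed structured patient handoff",
    "anchors": {
        0: "Did not state patient name or working diagnosis",
        1: "Stated name and diagnosis but omitted pending tasks",
        2: "Stated name, diagnosis, and pending tasks, and invited questions",
    },
}
```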

3 – Make Things Assessable (Scenario Design)
An often-overlooked factor that can help to improve interrater reliability is making modifications to the simulation scenario to allow things to be more “assessable”. We make a sizable number of decisions when creating simulation-based scenarios for educational purposes. There are other decisions and functions that can be designed into the scenario to allow assessments to be more accurate and reliable. For example, if we want to know whether someone correctly interpreted wheezing in the lung sounds of the simulator, we can introduce design elements into the scenario that help us gather this information accurately and thus increase interrater reliability. We could embed a person in the scenario to play the role of another healthcare provider who simply asks the participant what they heard. Alternatively, we could have the participant fill out a questionnaire at the end of the scenario, or even complete an assessment form regarding the simulation encounter. Lastly, we could embed the assessment tool into the debriefing process and simply ask participants during the debriefing what they heard when they auscultated the lungs. There is no single correct way to do this; I am articulating different solutions to the same problem, and the right one depends on the context of your scenario design.
4 – Assessment Tool Technology
Gathering assessment data electronically can help significantly. Compared to a paper-and-pencil collection scheme, technology-enhanced or “smart” scoring systems can assist in several ways. For example, if there are many items on a paper scoring tool, the page can become unwieldy to monitor. Electronic systems can continuously update and filter out data that does not need to be displayed at a given point in time as the simulation assessment unfolds. Simply having previously evaluated items disappear off the screen can reduce the clutter associated with scoring tools. A brief sketch of the idea follows.
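As a simple illustration (a hypothetical sketch, not the interface of any particular vendor's system), a “smart” checklist might display only the items that are still unscored and observable in the current phase of the scenario:

```python
# Hypothetical sketch of "smart" checklist filtering: only items that are
# still unscored and relevant to the current scenario phase stay on screen.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ChecklistItem:
    text: str
    phase: str                   # scenario phase where the behavior is observable
    score: Optional[int] = None  # None until the rater scores the item

def visible_items(items, current_phase):
    """Hide scored items and items from other phases to reduce clutter."""
    return [i for i in items if i.score is None and i.phase == current_phase]

checklist = [
    ChecklistItem("Asked medical history", phase="history"),
    ChecklistItem("Auscultated lungs", phase="exam"),
    ChecklistItem("Identified wheezing", phase="exam"),
]

checklist[1].score = 1  # the rater scores "Auscultated lungs"...
for item in visible_items(checklist, current_phase="exam"):
    print(item.text)    # ...leaving only "Identified wheezing" displayed
```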
5 – Consider Video Scoring
For high-stakes assessment and research purposes it is often wise to consider video scoring. By high stakes I mean pass/fail criteria associated with advancement in a program, heavy weighting of a grade, or licensure or practice decisions. The ability to add multiple camera angles, as well as the functionality to rewind and play back things that occurred during the simulation, is valuable in improving the scoring accuracy of the collected data, which subsequently improves interrater reliability. Video scoring requires considerable time and effort, and thus should be reserved for the times when it is necessary.
I hope that you found these tips useful. Assessment during simulations can be an important part of improving the quality and safety of patient care!
If you found this post useful, please consider subscribing to this blog!
Thanks and until next time! Happy Simulating.