
Achieving high interrater reliability (IRR) is a cornerstone of any effective medium- or high-stakes assessment in healthcare simulation. Without consistent and dependable scoring across multiple raters, the validity of an assessment can be called into question. Interrater reliability ensures that evaluations are fair, objective, and truly reflective of the participant's performance rather than of the biases or variability of individual raters.
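Before changing anything, it helps to quantify where you stand. Here is a minimal sketch of one common agreement statistic, Cohen's kappa, computed for two hypothetical raters scoring the same performances on a pass/fail checklist item (the scores are made up purely for illustration):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of performances scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two raters scoring the same 10 performances on a pass/fail item.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")  # prints 0.47
```

A kappa of 1.0 indicates perfect agreement and 0 indicates chance-level agreement; common rules of thumb treat values above roughly 0.6 as substantial, so the 0.47 here would signal room for improvement.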
For simulation-based assessments, however, maintaining IRR can be particularly challenging due to the complex, dynamic, and multifaceted nature of healthcare scenarios. This is where the RST approach—focusing on changes to the Rater, the Simulation, and the Tool—can offer a systematic and impactful framework for improvement. In this post I’ll walk you through this approach, providing insights and practical strategies for applying RST to your simulation programs.
The R in RST: Changing the Rater
One of the most straightforward avenues to improve IRR is addressing variability related to the rater. This is critical because raters bring their own perspectives, experiences, and biases to the evaluation process, all of which can affect their scoring.
Strategies for Enhancing the Rater’s Consistency:
- Rater Calibration Sessions
Conducting rater calibration sessions is one of the most effective ways to ensure raters share an understanding of the evaluation criteria. These sessions involve reviewing sample performances as a group and discussing scoring rationales to align perceptions. This shared experience helps raters interpret assessment tools in the same way, leading to more consistent scoring.
- Rater Selection and Expertise
Consider who is performing the assessment. Are they subject matter experts? Are they trained educators? Selecting raters with relevant expertise and familiarity with the assessment content can reduce variability. Conversely, inexperienced or overly heterogeneous rater pools may introduce inconsistencies.
- Addressing Rater Bias
Even with calibration, unconscious biases can creep into assessments. Training raters to recognize and mitigate biases, such as favoring individuals who perform similarly to the rater's own practice style, can improve consistency.
- Changing Raters
If specific raters consistently score differently from their peers, it may be necessary to replace them or limit their participation in high-stakes assessments. Using multiple raters per simulation and averaging scores can also dilute individual biases (a sketch of both ideas follows this list).
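To make those last two points concrete, here is a minimal sketch, using entirely hypothetical scores, of how you might average a panel's scores into a consensus and flag raters who drift far from it; the 0.75 threshold is an arbitrary cutoff chosen for illustration:

```python
import statistics

# Hypothetical scores: scores[rater][i] is that rater's 1-5 score
# for performance i, across a five-performance assessment.
scores = {
    "rater_A": [4, 3, 5, 4, 2],
    "rater_B": [4, 3, 4, 4, 2],
    "rater_C": [2, 1, 3, 2, 1],  # consistently harsher than the panel
}

n_perf = len(next(iter(scores.values())))
# Consensus score per performance: mean across the whole panel.
consensus = [statistics.mean(s[i] for s in scores.values()) for i in range(n_perf)]

# Flag raters whose average deviation from consensus exceeds a threshold.
THRESHOLD = 0.75  # arbitrary cutoff for this illustration
for rater, s in scores.items():
    mean_dev = statistics.mean(s[i] - consensus[i] for i in range(n_perf))
    flag = "  <-- review or recalibrate" if abs(mean_dev) > THRESHOLD else ""
    print(f"{rater}: mean deviation {mean_dev:+.2f}{flag}")
```

In this fabricated data, rater_C sits more than a full point below the consensus on average and would be a candidate for recalibration or replacement.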
The S in RST: Changing the Simulation
The second dimension of the RST approach involves modifying the simulation itself to make it more assessable. By carefully designing simulations to make critical behaviors, thought processes, and decisions more observable, you enhance the ability of raters to evaluate participants consistently.
Strategies for Simulation Adjustments:
- Prompting Observable Actions
Simulations can be structured to encourage participants to verbalize their thought processes or articulate their decisions. For instance, during a scenario involving a critical diagnosis, asking participants to "think aloud" as they interpret clinical findings gives raters clear evidence of decision-making skills, making scoring more straightforward.
- Embedding Structured Checkpoints
Building structured checkpoints into the simulation, such as specific moments when participants are asked to summarize their findings or outline their next steps, creates clear opportunities for assessment and reduces ambiguity for raters.
- Standardizing Simulation Flow
Variability in how simulations unfold can lead to scoring challenges. Using standardized patient scripts, consistent cues, and fixed timing for critical events ensures that all participants encounter the same conditions, making assessments more comparable. If high-technology simulators are being used, consider preprogrammed scenarios so that physiological changes are consistent across every run of the same scenario (see the sketch after this list).
- Revisiting Scenario Complexity
While realism is a hallmark of effective simulation, excessive complexity can overwhelm raters and obscure key performance indicators. Simplifying scenarios to focus on specific competencies can improve the clarity and reliability of evaluations.
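As an illustration of a preprogrammed, standardized scenario flow, here is a minimal sketch of what a scripted event timeline might look like; the timings, cues, and vital signs are invented for illustration and are not clinical guidance:

```python
from dataclasses import dataclass

@dataclass
class ScenarioEvent:
    """One fixed point in the scenario timeline."""
    time_s: int   # seconds from scenario start
    cue: str      # what the participant sees or hears at this moment
    vitals: dict  # simulator physiology to apply at this point

# A preprogrammed timeline so every participant encounters the same
# cues and physiological changes at the same moments.
CARDIAC_ARREST_SCRIPT = [
    ScenarioEvent(0,   "Patient reports chest pain",
                  {"hr": 110, "bp": "90/60", "spo2": 94}),
    ScenarioEvent(90,  "Patient becomes unresponsive",
                  {"hr": 0, "bp": "0/0", "spo2": None}),
    ScenarioEvent(120, "Monitor shows ventricular fibrillation",
                  {"rhythm": "VF"}),
    ScenarioEvent(300, "Checkpoint: participant summarizes next steps",
                  {}),
]
```

Whether the script lives in simulator software or on paper, the design point is the same: critical events fire at fixed times rather than at an operator's discretion, so every participant is assessed against identical conditions.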
The T in RST: Changing the Tool
The assessment tool is often an overlooked factor in achieving IRR, yet it plays a pivotal role in how raters interpret and apply scoring criteria. A well-designed tool minimizes ambiguity and makes scoring intuitive, even for less experienced raters.
Strategies for Tool Optimization:
- Behavioral Anchors for Rating Scales
Adding specific behavioral examples or descriptors to rating scale items helps raters apply the scales consistently. For instance, instead of a vague "Good" rating, an anchored descriptor like "Effectively communicates diagnosis and treatment plan to patient" provides clarity (a sketch of an anchored, domain-grouped tool follows this list).
- Item Grouping and Ordering
Organizing items logically, for example grouping communication skills, clinical decision-making, and procedural skills separately, makes it easier for raters to focus on one domain at a time. A cluttered or disorganized tool can lead to confusion and inconsistent scoring.
- Simplifying Language
Ensure that the language in the tool is straightforward and free of jargon. If raters struggle to interpret an item, their scoring may vary widely.
- Usability Enhancements
Small changes, like increasing the font size, using bullet points, or adopting an intuitive layout, can significantly reduce rater fatigue and scoring errors. A user-friendly tool keeps raters focused on the participant's performance rather than on the mechanics of the tool itself.
- Pretesting the Tool
Conduct pilot assessments with the tool to identify problematic items or inconsistencies. This feedback loop allows you to refine the tool before deploying it in high-stakes simulations.
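As referenced above, here is a minimal sketch of how behavioral anchors and domain grouping might be represented in a structured form; the items, domains, and wording are illustrative, not a validated instrument:

```python
# A behaviorally anchored, domain-grouped assessment tool.
# Each item maps every score level to an observable behavior,
# so raters match what they saw to a descriptor instead of
# guessing what "Good" means.
ASSESSMENT_TOOL = {
    "Communication": [
        {
            "prompt": "Explains diagnosis and treatment plan to the patient",
            "anchors": {
                1: "Does not discuss diagnosis or plan with the patient",
                2: "Mentions diagnosis but uses jargon; plan left unclear",
                3: "Effectively communicates diagnosis and treatment plan",
            },
        },
    ],
    "Clinical Judgment": [
        {
            "prompt": "Recognizes the deteriorating patient",
            "anchors": {
                1: "Does not respond to abnormal vital signs",
                2: "Notes abnormal vitals but delays escalation",
                3: "Identifies deterioration and escalates promptly",
            },
        },
    ],
}
```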
Putting It All Together: The RST Approach in Action
To illustrate how the RST approach works holistically, imagine a healthcare simulation designed to assess a participant’s ability to manage a cardiac arrest scenario:
- Rater: You organize a calibration session where all raters review a sample video of a cardiac arrest scenario and agree on scoring criteria. You also ensure raters have experience in emergency medicine and provide bias-awareness training.
- Simulation: The scenario is adjusted to include a structured moment where the participant is required to verbalize their reasoning for choosing a particular medication. Additionally, standardized cues are used to ensure all participants face identical conditions.
- Tool: The assessment tool is revised to include behavioral anchors, such as “Identifies and administers epinephrine within 3 minutes” for procedural accuracy. The tool’s layout is simplified, grouping items under headings like “Clinical Judgment” and “Communication.”
With these changes, the IRR for this simulation-based assessment improves, as raters now have a shared understanding, participants’ actions are more easily observable, and the tool provides clearer guidance.
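One way to confirm that changes like these actually moved the needle is to recompute an agreement statistic on scores gathered before and after the intervention. Here is a minimal sketch, reusing the cohen_kappa function from the earlier example with fabricated pass/fail scores:

```python
# Hypothetical pass/fail scores on the epinephrine item, before and
# after the RST changes (calibration, verbalization checkpoint,
# anchored tool). All values are invented for illustration.
before_r1 = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
before_r2 = ["fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass"]
after_r1  = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
after_r2  = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "pass"]

print(f"kappa before: {cohen_kappa(before_r1, before_r2):.2f}")  # -0.07
print(f"kappa after:  {cohen_kappa(after_r1, after_r2):.2f}")    #  0.71
```

In this fabricated example, kappa rises from roughly -0.07 (worse than chance) to about 0.71, which is the kind of shift you would hope to see after calibration, standardization, and tool refinement.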
Conclusion: Adopting the RST Approach for Better Assessments
Improving interrater reliability in healthcare simulation assessments is no small task, but the RST approach offers a structured framework for tackling the challenge. By focusing on the Rater, the Simulation, and the Tool, you can systematically address the factors that contribute to variability and ensure more consistent, fair, and accurate evaluations. For more background, see my previous blog post on interrater reliability.
Whether you are designing a new assessment or refining an existing one, considering how changes in these three areas might influence IRR is a worthwhile investment. With reliable assessments, we not only enhance the quality of simulation-based education but also uphold the integrity of our evaluations—ultimately contributing to better-prepared healthcare professionals.
Are you ready to elevate your simulation assessments? The RST approach is here to guide your journey.
Please like and comment if you would like to see more topics like this in my blog!
Until next time, Happy Simulating!