Introduction

Inter-rater reliability is a measure of the level of agreement between two or more coders on a project. It is used to ensure that the study analysis is conducted with rigor and that the data are collected and coded consistently across all researchers. High inter-rater reliability is significant because agreement between coders is crucial to the validity of the research project. The three main ways to measure inter-rater reliability in qualitative coding are Cohen’s Kappa, the Intraclass Correlation Coefficient (ICC), and percentage agreement.

Methods to Measure Inter-rater Reliability (IRR)

There are multiple ways to measure inter-rater reliability in qualitative coding.

  1. Cohen’s Kappa – Cohen’s Kappa is a statistical measure of agreement between two coders. It is best used with categorical data and ranges from -1 to 1, with 1 indicating perfect agreement and -1 indicating complete disagreement; 0 suggests agreement no better than chance. Because it corrects for chance, it provides a more nuanced understanding of the reliability and congruence of the raters.[1]
  2. Intraclass Correlation Coefficient – The ICC measures the reliability of measurements made by multiple raters. It is most useful when the data are continuous rather than categorical. ICC values range between 0 and 1, with values closer to 1 indicating higher reliability. It differs from Cohen’s Kappa in two ways: the ICC is ideal for continuous data while Cohen’s Kappa is better for categorical data, and the ICC can be used with more than two raters, unlike Cohen’s Kappa.
  3. Percentage Agreement – Percentage agreement is the simplest way to measure inter-rater reliability: the proportion of coding decisions on which the coders agree. Because this method does not account for agreements that might have occurred by chance, it is a less robust and rigorous method for measuring inter-rater reliability.[2]
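The three measures above can be sketched in plain Python. This is a minimal, illustrative sketch, not from the cited papers: the Cohen’s Kappa function assumes exactly two coders with categorical codes, and the ICC shown is the simplest one-way random-effects form, ICC(1,1); the function names and example codes are hypothetical.

```python
def percent_agreement(r1, r2):
    """Proportion of items on which two coders assigned the same code."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement for two coders on categorical codes."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)  # observed agreement
    categories = set(r1) | set(r2)
    # Expected agreement if each coder labeled at random using their own
    # marginal code frequencies.
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def icc_oneway(ratings):
    """ICC(1,1): one-way random-effects model for continuous ratings.

    `ratings` is a list of rows, one per subject, each holding the
    scores given by k raters.
    """
    n, k = len(ratings), len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    ss_between = k * sum((sum(row) / k - grand_mean) ** 2 for row in ratings)
    ss_within = sum((x - sum(row) / k) ** 2 for row in ratings for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical example: two coders assign one of three themes to six excerpts.
coder_a = ["theme1", "theme2", "theme1", "theme3", "theme2", "theme1"]
coder_b = ["theme1", "theme2", "theme1", "theme2", "theme2", "theme1"]
print(round(percent_agreement(coder_a, coder_b), 3))  # 0.833
print(round(cohens_kappa(coder_a, coder_b), 3))       # 0.714
```

Note how kappa (0.714) lands below raw agreement (0.833) once chance is discounted. In practice, established packages such as scikit-learn (`cohen_kappa_score`) offer vetted implementations of these statistics.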

Conclusion

High inter-rater reliability scores are significant because agreement between coders is crucial to the validity of the research project. Cohen’s Kappa measures chance-corrected agreement between two coders of categorical data. The ICC measures the reliability of ratings made by multiple raters and is most useful when the data are continuous rather than categorical. Percentage agreement is the simplest measure: the proportion of coding decisions on which the coders agree, uncorrected for chance.

Take Away

This article explains how inter-rater reliability is measured in qualitative coding, which is critical for ensuring validity in qualitative research.

[1] Nili, A., Tate, M., Barros, A., & Johnstone, D. (2020). An approach for selecting and using a method of inter-coder reliability in information management research. International Journal of Information Management, 54, 102154.

[2] O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 19, 1609406919899220.
