Introduction

Electronic Health Records (EHRs) are “electronic versions of the paper charts in…[a]…doctor’s or healthcare provider’s office…[and]…may include…medical history, notes, and other [health] information.”[1] In addition to having grown immensely in popularity in the last 10+ years,[2] EHRs are believed to improve the patient experience by reducing medical error and maximizing the accuracy and clarity of medical records.[3] Because EHRs contain patients’ sensitive medical information, this data is protected by the Health Insurance Portability and Accountability Act of 1996 (HIPAA).[4] The advent of EHRs has increased the mobility of healthcare, but with this convenience comes a new layer of potential security risks.[5] As such, medical professionals are required to secure sensitive health information according to the requirements delineated by HIPAA;[6] examples of protected health information (PHI) include personal details such as name, birthdate, Social Security number, contact information, weight, prescriptions, and treatment plans.[7] To maximize the security of PHI, the HIPAA Security Rule mandates that it be protected by administrative safeguards (e.g., ensuring the effectiveness of implemented security measures), physical safeguards (e.g., using secure technology), organizational standards, and policies and procedures.[8]

In addition to making patient information more easily recorded and accessed, EHRs are also an invaluable source of data for quantitative medical research; the All of Us Data and Research Center, a subsidiary of the National Institutes of Health, standardizes all EHR data for analysis by means of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).[9] Given that EHRs present a double-edged sword for research due to both their sensitive nature and value for medical research, we present 5 tips on using EHRs for research in an ethical and effective way.

5 EHR Research Tips

Tip #1: Ensure Prior Comprehension of the Data Collection Process and Data Source

Prior to conducting data analysis, data scientists would benefit from ensuring they have a solid grasp of the data collection process.[10] This prior understanding makes it easier for analysts to interpret the data in a way that fits the context in which the data were collected. Relatedly, analysts should familiarize themselves with the planned source of electronic health records. Data scientists, and the work that they do, benefit from getting to know the EHR platform they anticipate using, along with any EHR-based sources of secondary data. For example, analysts will likely find it advantageous to know in advance what sorts of data are available, how these data were collected, and who collected them. We also advise data scientists to keep a data variable dictionary that records important information such as each variable’s type, its possible values, general summary statistics, and missingness.
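As a rough illustration of the data variable dictionary described above, the sketch below summarizes each variable's type, missingness, and either basic statistics or observed values. The records, field names, and the `build_data_dictionary` helper are all hypothetical and exist only for this example; a real EHR extract will look different.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records; real EHR extracts will have different fields.
records = [
    {"patient_id": "p1", "age": 34, "sex": "F", "weight_kg": 61.5},
    {"patient_id": "p2", "age": 51, "sex": "M", "weight_kg": None},
    {"patient_id": "p3", "age": None, "sex": "F", "weight_kg": 72.0},
]


def build_data_dictionary(rows):
    """Summarize each variable: type, missingness, and values or basic statistics."""
    variables = sorted({key for row in rows for key in row})
    dictionary = {}
    for name in variables:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        n_missing = len(values) - len(present)
        entry = {
            "type": type(present[0]).__name__ if present else "unknown",
            "n_missing": n_missing,
            "pct_missing": round(100 * n_missing / len(values), 1),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric variable: record range and mean.
            entry.update(min=min(present), max=max(present), mean=round(mean(present), 2))
        else:
            # Non-numeric variable: record how often each value occurs.
            entry["values"] = dict(Counter(present))
        dictionary[name] = entry
    return dictionary


data_dictionary = build_data_dictionary(records)
for name, entry in data_dictionary.items():
    print(name, entry)
```

Keeping such a summary alongside the dataset gives analysts a quick reference for which variables are sparse or take unexpected values before any modeling begins.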

Tip #2: Create a Research Plan to Maximize Data Quality 

We also recommend that analysts be proactive in maximizing the quality of the EHR data they plan to use.[11] One definition of data quality is “how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose, and it is critical to all data governance initiatives within an organization.”[12] In this context, “quality” is not necessarily whether data are good or bad but rather how well-suited a given set of data is for informing decisions.[13] Optimal data quality may be achieved by implementing strategies to appropriately handle missing data and other issues that may arise. Besides missingness, data scientists may also need to remove extreme outliers, duplicate values, impossible values, and other values that jeopardize data quality. Ideally, analysts lay out a plan or standardized procedure in advance to follow whenever they identify a data value that needs to be addressed to preserve the overall quality and integrity of the dataset.
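One way to make such a standardized procedure concrete is to write the rules down as code before looking at the data. The sketch below is a minimal example under stated assumptions: the field names, the `record_id` key, the plausibility limits in `PLAUSIBLE_RANGES`, and the `apply_quality_rules` helper are all hypothetical, and real limits should come from clinical guidance rather than from this example.

```python
# Hypothetical plausibility limits; real limits should come from clinical guidance.
PLAUSIBLE_RANGES = {"age": (0, 120), "weight_kg": (0.2, 650)}


def apply_quality_rules(rows, key="record_id", ranges=PLAUSIBLE_RANGES):
    """Drop records with a repeated key and set implausible values to None, logging each action."""
    seen = set()
    cleaned, log = [], []
    for row in rows:
        if row[key] in seen:
            log.append((row[key], "duplicate record dropped"))
            continue
        seen.add(row[key])
        row = dict(row)  # copy so the input is left untouched
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                log.append((row[key], f"{field}={value} outside [{low}, {high}]; set to missing"))
                row[field] = None
        cleaned.append(row)
    return cleaned, log


raw = [
    {"record_id": 1, "age": 34, "weight_kg": 61.5},
    {"record_id": 1, "age": 34, "weight_kg": 61.5},   # repeated record
    {"record_id": 2, "age": 430, "weight_kg": 80.0},  # impossible age
    {"record_id": 3, "age": 58, "weight_kg": None},   # already missing
]
cleaned, log = apply_quality_rules(raw)
for entry in log:
    print(entry)
```

Returning a log alongside the cleaned data means every change can later be reported, rather than being silently applied.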

Tip #3: Anticipate and Plan for Challenges

Data scientists will also ideally anticipate and plan for general challenges they may encounter while analyzing EHRs.[14] For example, analysts may detect inconsistencies within the dataset as they set out to analyze the data or find that the observations within the data lack units of measurement. Data analysts would also benefit from having a plan that addresses how to tackle issues with data consistency, completeness, and the like. In other words, making a plan to maximize data quality and making a plan to address potential challenges go hand in hand.
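The missing-units challenge mentioned above can be planned for by deciding ahead of time which units are accepted and what happens to everything else. The following is only a sketch: the observation layout, the unit labels in `TO_KG`, and the `standardize_weights` helper are assumptions made for illustration, not features of any particular EHR platform.

```python
# Conversion factors to kilograms; the unit labels are assumptions for this sketch.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def standardize_weights(observations):
    """Convert weights to kilograms and set aside rows with a missing or unknown unit."""
    converted, needs_review = [], []
    for obs in observations:
        unit = (obs.get("unit") or "").strip().lower()
        if unit in TO_KG:
            converted.append({
                "patient_id": obs["patient_id"],
                "weight_kg": round(obs["value"] * TO_KG[unit], 2),
            })
        else:
            # Do not guess the unit; route the row to manual review instead.
            needs_review.append(obs)
    return converted, needs_review


observations = [
    {"patient_id": "p1", "value": 70, "unit": "kg"},
    {"patient_id": "p2", "value": 154, "unit": "lb"},
    {"patient_id": "p3", "value": 68, "unit": None},   # unit missing
    {"patient_id": "p4", "value": 200, "unit": "LB "},  # inconsistent formatting
]
converted, needs_review = standardize_weights(observations)
print(converted)
print(needs_review)
```

The design choice here is to flag ambiguous rows rather than infer a unit, since a wrong guess would quietly introduce the kind of inconsistency the plan is meant to prevent.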

Tip #4: Make a Plan to Maximize Data Protection 

We recommend data scientists be intentional in maximizing data protection, especially when handling sensitive data like EHRs. With the rapid evolution of digital technologies and data collection, analysts must prioritize data protection to ensure individuals retain control over their information. For example, the General Data Protection Regulation (GDPR), adopted by the European Union in 2016, sets out rules for how organizations process personal data. To quote the GDPR: “The processing of personal data should be designed to serve mankind.”[15]

Tip #5: Be Transparent When Making Data Reports

For our final tip, we recommend that analysts prioritize transparency when making data reports following their analysis. Specifically, data scientists need to be clear and straightforward when discussing the quality of their data (e.g., missingness within the data), changes made to the data (e.g., variables removed ahead of the analysis), and any other limitations that characterize the data’s quality.[16] Transparency in every step of the process (e.g., data prep, data analysis, reporting findings, and presentation of findings) is beneficial for many reasons. Indeed, “to ensure the patients receive care as they need and to draw valid and reliable research findings, quality data are needed.”[17]

Conclusion

Leveraging Electronic Health Records (EHRs) as data sources in quantitative medical research presents a wealth of opportunities while also necessitating careful considerations to maintain ethical standards and data integrity. EHRs offer a rich repository of patient information, making them invaluable for research aimed at improving healthcare outcomes. However, the sensitive nature of this data requires a meticulous approach to ensure both accuracy and compliance with regulatory standards.

The first step in effectively using EHRs for research is ensuring a comprehensive understanding of the data collection process. Researchers must familiarize themselves with the specific EHR platform and the nature of the data it contains. Creating a robust research plan is equally critical to maximize data quality. Researchers must define clear strategies for handling missing data, identifying and addressing outliers, and ensuring overall data consistency. Anticipating and planning for potential challenges is another essential step. Having a predefined plan to tackle these issues ensures that researchers can address them promptly without compromising the research timeline.

Researchers must adhere to stringent data protection regulations, such as HIPAA and GDPR, to safeguard patient information. This includes implementing robust security measures and maintaining up-to-date knowledge of evolving digital technologies and data protection standards. Researchers should also be candid about the quality of their data, any modifications made during the analysis, and the limitations of their findings. Transparency fosters trust and credibility, ensuring that the research can be replicated and validated by other scientists.

By following these recommendations, researchers can navigate the complexities of using EHRs in medical research, ultimately contributing to advancements in healthcare and improved patient outcomes. Ethical and effective utilization of EHR data not only enhances the quality of research but also ensures that patient privacy and data security are upheld, fostering trust in medical research practices.

Take Away

As highlighted in the article, “To ensure the patients receive care as they need and to draw valid and reliable research findings, quality data are needed.” Balancing the immense potential of EHRs with the responsibility to safeguard patient privacy and integrity is crucial for advancing healthcare through reliable research.

[1] Office for Civil Rights. Privacy, Security, and Electronic Health Records. Department of Health and Human Services. https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/consumers/privacy-security-electronic-records.pdf

[2] Basil, N. N., Ambe, S., Chukwuyem, E. & Ekokobe, F. Health Records Database and Inherent Security Concerns: A Review of the Literature. Cureus. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9647912/

[3] Centers for Medicare and Medicaid Services. Electronic Health Records. Department of Health and Human Services. https://www.cms.gov/priorities/key-initiatives/e-health/records

[4] U.S. Department of Health and Human Services. Summary of the HIPAA Security Rule. https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html

[5] U.S. Department of Health and Human Services. Summary of the HIPAA Security Rule. https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/index.html

[6] St. Louis College of Health Careers. What is HIPAA and How Does it Impact Electronic Health Records?

[7] St. Louis College of Health Careers. What is HIPAA and How Does it Impact Electronic Health Records?

[8] The Office of the National Coordinator for Health Information Technology. Chapter 4: Understanding Electronic Health Records, the HIPAA Security Rule, and Cybersecurity. Guide to Privacy and Security. Department of Health and Human Services. https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide-chapter-4.pdf

[9] All of Us Research HUB. Data Sources. National Institutes of Health. https://www.researchallofus.org/data-tools/data-sources/

[10] Wang, W., Ferrari, D., Haddon-Hill, G. & Curcin, V. Electronic Health Records as Sources of Research Data. Protocol. https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_11

[11] Wang, W., Ferrari, D., Haddon-Hill, G. & Curcin, V. Electronic Health Records as Sources of Research Data. Protocol. https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_11

[12] IBM. What is data quality? https://www.ibm.com/topics/data-quality

[13] Bauman, J. Data quality management: What you need to know. SAS. https://www.sas.com/en_us/insights/articles/data-management/data-quality-management-what-you-need-to-know.html

[14] Wang, W., Ferrari, D., Haddon-Hill, G. & Curcin, V. Electronic Health Records as Sources of Research Data. Protocol. https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_11

[15] GDPR Hub. Article 1 GDPR. https://gdprhub.eu/Article_1_GDPR#:~:text=The%20processing%20of%20personal%20data%20should%20be%20designed%20to%20serve,with%20the%20principle%20of%20proportionality.

[16] Wang, W., Ferrari, D., Haddon-Hill, G. & Curcin, V. Electronic Health Records as Sources of Research Data. Protocol. https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_11

[17] Wang, W., Ferrari, D., Haddon-Hill, G. & Curcin, V. Electronic Health Records as Sources of Research Data. Protocol. https://link.springer.com/protocol/10.1007/978-1-0716-3195-9_11
