Ethical Considerations in Social Media Qualitative Data Collection
Introduction
In the years since its inception, social media has established itself as a mainstay in modern society. Of the approximately 6,000 American adults that Pew Research surveyed in 2023, 83% said they had ever used YouTube, with 68% having used Facebook, 47% having used Instagram, and 33% having used TikTok[1]. On a global scale, an estimated 5.17 billion people use social media in some capacity (as of July 2024)[2]; this amounts to roughly 67% of the world population (as estimated in 2019)[3]. The ubiquity of social media usage around the world has not been lost on academic audiences: many scholars have incorporated data harvested from social media into their research programs[4]. One proposed reason for the increased use of social media data is that collecting them is often easier and faster than older data collection methods (e.g., census data, traditional survey data)[5]. The rapidly evolving landscape of social media platforms enables researchers to study phenomena at a pace more closely aligned with the actual timing of given events[5]. Social media data are not, thankfully, a resource afforded only to researchers with the privilege of institutional affiliation[5]: many social media data have been made available to the public for analysis.
5 Effective Strategies for Social Media Data Collection
The public nature of many social media data makes them an especially valuable asset for the independent (individual) researcher. This is especially the case for qualitative scholars, given the personal and richly detailed nature of both qualitative research and many a social media post. For example, qualitative methodology proved a natural fit for capturing the stories that sexual violence survivors shared online[6]. Given the accessibility and usefulness of qualitative social media data, we have curated five effective strategies for independent researchers conducting qualitative research.
- Engage Ethically with Social Media Data
Regardless of whether the data in question are considered “free real estate”, ethics must always be at the forefront of the research process. An ethical research process for qualitative social media collection includes the implementation of the basic ethical standards presented in the Belmont Report (i.e., beneficence, respect, and justice for participants)[7], cross-cultural respect, informed consent, the use of best judgment, and the like[8]. Ethical social media research is particularly crucial given the debate among ethics experts regarding whether social media content is indeed public domain[9]. One researcher acknowledged this tension between the wealth of detail offered by social media data and the ethical obligations that scholars have to protect social media users: “The frankness of [some] posts…can provide rich data, but its ethical use presents challenges.”[10] Despite the associated complications, scholars conducting social media research remain responsible for doing everything in their power to ensure ethical engagement with data obtained from social media. This obligation remains regardless of whether a researcher is subject to an in-house Institutional Review Board (IRB) or is self-employed as an independent researcher.
- Read the (Digital) Room
Before collecting any social media data, researchers must take the time to “read the room”, or get a sense of the concerns and priorities of the social media users whose data the researcher is interested in aggregating. “Reading the room” is an enactment of the Belmont Report’s mandate for respecting the rights and privacy of (potential) study participants. This is particularly crucial given that many people are justifiably concerned with how their data from technological activity will be used[11], with social media platforms regarded with particular distrust[12]. Ford and colleagues provided one example of how researchers might “read the room” when planning to collect social media data: avoiding data collection in private social media spaces such as closed Facebook groups or Instagram posts on a private account[13]. Indeed, social media users who post content to a private group or account can reasonably expect that their privacy will be maintained in these environments; the need for consent is especially apparent in circumstances of this nature. Even outside of private social media settings, social media users do not generally post thinking that what they share may be used for research (and therefore are not providing informed consent at the time of posting)[14].
Social media researchers must “read the room” to get a feel for how to best respect the privacy and wellbeing of potential study participants; this should take priority over collecting the desired data. To ensure that potential participants’ wellbeing is at the forefront of a proposed study, researchers should keep in mind the following:
“Data [are] an extension of a person or groups of people, and therefore should be treated as you would treat people—respectfully…data are the words, thoughts, feelings, expressions, interactions, and contributions from individual people, it is not abstract.”[15]
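One way to put this respect into practice during data preparation is sketched below in Python. It is only an illustration under stated assumptions: the record fields (`author`, `visibility`, `text`) and the salt value are hypothetical and do not correspond to any platform's actual export format. The sketch drops anything posted in a private space and replaces account handles with a salted hash so that analysts work with participant labels rather than usernames.

```python
import hashlib

# Hypothetical post records; the field names are assumptions for
# illustration, not any platform's real export schema.
posts = [
    {"author": "user_a", "visibility": "public", "text": "Sharing my story."},
    {"author": "user_b", "visibility": "private_group", "text": "Members only."},
    {"author": "user_a", "visibility": "public", "text": "A follow-up post."},
]


def pseudonymize(handle: str, salt: str) -> str:
    """Map a handle to a stable label using a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + handle).encode("utf-8")).hexdigest()
    return "participant_" + digest[:8]


def prepare_for_analysis(records, salt):
    """Keep only public posts and strip direct identifiers."""
    prepared = []
    for record in records:
        if record["visibility"] != "public":
            continue  # skip content posted in private spaces
        prepared.append({
            "participant": pseudonymize(record["author"], salt),
            "text": record["text"],
        })
    return prepared


# The salt is a placeholder; a real project would keep it secret.
cleaned = prepare_for_analysis(posts, salt="project-specific-secret")
```

Replacing handles is only a partial safeguard: a verbatim quote may still be traceable through a search engine, so researchers may also need to paraphrase excerpts or seek consent before publishing them.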
- Maximize (Digital) Media Literacy
Part of conducting ethical and respectful research is being able to expertly navigate social media: the effective independent social media researcher must have exceptional digital literacy skills. Given the ubiquity of the internet and the omnipresent nature of social media for many people, readers (particularly researchers and others with advanced education) may be tempted to skip this strategy. However, Artificial Intelligence (AI) and bots are increasingly convincing internet users that their content is human-generated[16] [17] [18]. Social media bots are perhaps the most insidious variety. They are defined as social media accounts “that operate ‘on their own, without human involvement, to post and do other activities on social media sites.’”[19] Many people do not reliably identify bot-created content[14] [19].
The need for maximal digital media literacy, regardless of education level[14], has never been greater, especially for researchers wanting to conduct high-quality social media research. Interestingly, one method of determining whether social media content is authentically human is to use machine learning to detect the posting patterns of social media bots[20]. A resource from the U.S. Department of Homeland Security provides examples of the kinds of content that digital media literacy addresses (misinformation, malinformation, and disinformation). This resource also outlines key steps to achieving digital media literacy (e.g., considering the source, inspecting the URL, checking the author) and offers an effective framework for bolstering the digital media literacy of researchers preparing to collect social media data[21]. Given that individuals with stronger digital media literacy are more likely to be able to identify inauthentic social media content[22], researchers should prioritize developing or strengthening this skill.
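To give a sense of what a "posting pattern" can look like in practice, the Python sketch below computes one simple feature: how regular the gaps between an account's posts are. It is not the machine learning method described in the cited source, and it is not a bot detector on its own. The timestamps and the `THRESHOLD` cutoff are hypothetical, chosen only for illustration; a feature like this might be one of many inputs to a trained classifier or a prompt for manual review.

```python
from datetime import datetime
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Return the coefficient of variation of gaps between consecutive posts.

    Values near 0 mean the posts arrive at nearly identical intervals.
    Returns None when there are too few posts to judge.
    """
    ordered = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


THRESHOLD = 0.1  # assumed cutoff, for illustration only


def flag_for_review(timestamps, threshold=THRESHOLD):
    """Flag accounts whose posting gaps are suspiciously uniform."""
    score = interval_regularity(timestamps)
    return score is not None and score < threshold


# Hypothetical accounts: one posts exactly every 10 minutes, one irregularly.
scheduled = [datetime(2024, 1, 1, 12, m) for m in (0, 10, 20, 30, 40)]
irregular = [datetime(2024, 1, 1, h, m)
             for h, m in ((8, 5), (8, 7), (13, 40), (21, 15))]

scheduled_flag = flag_for_review(scheduled)
irregular_flag = flag_for_review(irregular)
```

A flag from a heuristic like this is a reason to look more closely at an account, not evidence that it is automated: people who use scheduling tools can also post at uniform intervals.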
- Work Smarter, Not Harder
Another strategy that we recommend is efficiency, also known as “work smarter, not harder.” This strategy is not unique to independent researchers or scholars studying social media. However, improving efficiency is especially imperative for the independent researcher, given the general lack of resources associated with not being affiliated with a research institution. One way to foster efficiency in the research process is to use preexisting data rather than starting from scratch; this will likely be extremely helpful for independent researchers (and scholars lacking research funding). For example, researchers can search the secondary data housed at the Social Media Archive through the Inter-university Consortium for Political and Social Research and find datasets that contain text for qualitative analysis; this resource is linked below in our list of additional resources.
While it may sound counterintuitive to the nature of an independent researcher, another way to “work smarter, not harder” is to collaborate with other scholars. In the words of one independent researcher, “To do your best work, you need other people”[23]; the successful independent researcher is, in fact, interdependent with supportive research colleagues. Many independent scholars have come to this realization and formed coalitions to support one another. For example, the National Coalition of Independent Scholars (NCIS) was established in 1989 and connects independent researchers around the world. Other networks, such as the Australia-based Research Society, are specifically designed for researchers within a certain country.
- Make Use of Reputable Research Tools
Another means of boosting efficiency is utilizing reputable research tools. Social media data collection tools include those designed to facilitate qualitative data mining. One popular tool for collecting text data is TAGS (Twitter Archiving Google Sheet), which is free of charge. Scholars should note, however, “that the search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions.”[24] TAGS is a valuable resource for the independent researcher, but scholars wishing to use it should take these limitations into account.
One free data analysis tool is RQDA (R package for Qualitative Data Analysis), an R package designed to be installed in RStudio[25]. Supported on Windows, Mac, and Linux, RQDA facilitates the analysis of text data (i.e., data formatted as plain text). Specifically, RQDA can carry out qualitative analysis tasks such as identifying codes, grouping codes into categories, and providing a summary of identified codes.
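For readers unfamiliar with these coding tasks, the short Python sketch below illustrates the underlying idea: attaching codes to excerpts, grouping codes into categories, and summarizing how often each appears. It does not use RQDA or reproduce its interface; the excerpts, code names, and category names are all hypothetical.

```python
from collections import Counter

# Hypothetical excerpts with researcher-assigned codes.
coded_excerpts = [
    {"text": "Working from home gave me my evenings back.",
     "codes": ["flexibility"]},
    {"text": "I miss chatting with coworkers at lunch.",
     "codes": ["isolation"]},
    {"text": "No commute is great, but the days feel lonely.",
     "codes": ["flexibility", "isolation"]},
    {"text": "My manager checks in far too often now.",
     "codes": ["surveillance"]},
]

# Hypothetical grouping of codes into broader categories.
categories = {
    "benefits": ["flexibility"],
    "drawbacks": ["isolation", "surveillance"],
}


def summarize_codes(excerpts):
    """Count how many times each code was applied across excerpts."""
    return Counter(code for excerpt in excerpts for code in excerpt["codes"])


def summarize_categories(code_counts, category_map):
    """Roll code counts up to the category level."""
    return {category: sum(code_counts[code] for code in codes)
            for category, codes in category_map.items()}


code_counts = summarize_codes(coded_excerpts)
category_counts = summarize_categories(code_counts, categories)
```

Counts like these only summarize coding decisions that the researcher has already made; the interpretive work of deciding what each excerpt means remains a human task.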
Conclusion
Qualitative data collection through social media offers independent researchers both unique opportunities and distinct challenges. The five strategies outlined—engaging ethically with social media data, reading the (digital) room, maximizing (digital) media literacy, working smarter, not harder, and making use of reputable research tools—serve as essential strategies for conducting effective and responsible qualitative research in this space.
Ethical engagement is paramount when handling social media data, particularly in qualitative research, where personal and detailed narratives are often collected. Researchers must ensure that informed consent, privacy, and respect for the individuals behind the data are always prioritized. This requires “reading the room” to understand the context in which users share their stories and ensuring that their data are used respectfully, especially in private or sensitive online spaces. Given the rise of bots and misinformation, digital media literacy becomes critical for qualitative researchers aiming to capture authentic human experiences. Efficiency is also crucial, particularly for independent researchers who may not have access to institutional resources. The strategy to “work smarter, not harder” suggests leveraging preexisting datasets and collaborating with peers to streamline data collection efforts, ensuring that time and resources are used effectively. Finally, using reputable research tools like TAGS and RQDA enables qualitative researchers to gather and analyze text data from social media platforms efficiently while keeping the limitations of such tools in mind.
By applying these strategies, researchers can more effectively navigate the complexities of collecting qualitative data from social media, ensuring ethical practices while making the most of the available tools and resources. This approach allows for rich, insightful qualitative analysis that respects the privacy and dignity of social media users.
Takeaway
Collecting qualitative data from social media offers unique opportunities for independent researchers but also presents ethical and technical challenges. By focusing on ethical engagement, digital literacy, and efficient use of research tools, researchers can navigate these challenges effectively while gathering meaningful data. These strategies ensure that researchers can responsibly and resourcefully analyze social media content.
[1] Gottfried, J. Americans’ Social Media Use. Pew Research. https://www.pewresearch.org/internet/2024/01/31/americans-social-media-use/
[2] Statista. Number of internet and social media users worldwide as of July 2024 (in billions). https://www.statista.com/statistics/617136/digital-population-worldwide/
[3] Ortiz-Ospina, E. The rise of social media. Our World in Data. https://ourworldindata.org/rise-of-social-media
[4] Clavert, F. History in the Era of Massive Data: Online Social Media as Primary Sources for Historians. Geschichte und Gesellschaft (Vandenhoeck & Ruprecht). https://www.researchgate.net/publication/352295455_History_in_the_Era_of_Massive_Data_Online_Social_Media_as_Primary_Sources_for_Historians
[5] Open Access Government. Social media data for social and behavioural research. https://www.openaccessgovernment.org/social-media-data/113292/
[6] Mendes, K., Keller, J. & Ringrose, J. Digitized narratives of sexual violence: Making sexual violence felt and known through digital disclosures. New Media & Society. https://doi.org/10.1177/1461444818820069
[7] National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont report: Ethical principles and guidelines for the protection of human subjects of research. U.S. Department of Health and Human Services. https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/read-the-belmont-report/index.html
[8] Association of Internet Researchers. Internet Research: Ethical Guidelines 3.0. https://aoir.org/reports/ethics3.pdf
[9] Hibbin, R. A., Samuel, G. & Derrick, G. E. From “a Fair Game” to “a Form of Covert Research”: Research Ethics Committee Members’ Differing Notions of Consent and Potential Risk to Participants Within Social Media Research. Journal of Empirical Research on Human Research Ethics. https://doi.org/10.1177/1556264617751510
[10] Harrington, C. Making ethical judgement calls about qualitative social media research on sensitive issues. International Journal of Social Research Methodology. https://doi.org/10.1080/13645579.2024.2393796
[11] McClain, C., Faverio, M., Anderson, M. & Park, E. How Americans View Data Privacy. Pew Research Center. https://www.pewresearch.org/internet/2023/10/18/how-americans-view-data-privacy/
[12] Faverio, M. Key findings about Americans and data privacy. Pew Research Center. https://www.pewresearch.org/short-reads/2023/10/18/key-findings-about-americans-and-data-privacy/
[13] Ford, E., Shepherd, S., Jones, K. & Hassan, L. Toward an Ethical Framework for the Text Mining of Social Media for Health Research: A Systematic Review. Frontiers in Digital Health. https://doi.org/10.3389/fdgth.2020.592237
[14] University of Pennsylvania. Use of Social Media as a Research Activity. https://irb.upenn.edu/homepage/social-behavioral-homepage/guidance/types-of-social-behavioral-research/use-of-social-media-as-a-research-activity/
[15] The Dave and Lucile Packard Foundation. New Resources: The Data Ethics Guidebook and Toolkit. https://www.packard.org/insights/publication/new-resources-the-data-ethics-guidebook-and-toolkit/#:~:text=Data%20is%20an%20extension%20of,distant%2Fremoved%20from%20people.%E2%80%9D
[16] Tyson, A. & Kennedy, B. Many Americans think generative AI programs should credit the sources they rely on. Pew Research Center. https://www.pewresearch.org/short-reads/2024/03/26/many-americans-think-generative-ai-programs-should-credit-the-sources-they-rely-on/
[17] Kennedy, B., Tyson, A. & Saks, E. Public Awareness of Artificial Intelligence in Everyday Activities. Pew Research Center. https://www.pewresearch.org/science/2023/02/15/public-awareness-of-artificial-intelligence-in-everyday-activities/
[18] Sidoti, O. & Vogels, E. A. What Americans Know About AI, Cybersecurity and Big Tech. Pew Research Center. https://www.pewresearch.org/internet/2023/08/17/what-americans-know-about-ai-cybersecurity-and-big-tech/
[19] Stocking, G. & Sumida, N. Most Americans have heard about social media bots; many think they are malicious and hard to identify. Pew Research Center. https://www.pewresearch.org/journalism/2018/10/15/most-americans-have-heard-about-social-media-bots-many-think-they-are-malicious-and-hard-to-identify/
[20] Gramlich, J. Q&A: How Pew Research Center identified bots on Twitter. Pew Research Center. https://www.pewresearch.org/short-reads/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/
[21] Center for Prevention Programs and Partnerships. Media Literacy & Critical Thinking Online. U.S. Department of Homeland Security. https://www.dhs.gov/sites/default/files/publications/digital_media_literacy_1.pdf
[22] Sirlin, N., Epstein, Z., Arechar, A. A. & Rand, D. G. Digital literacy is associated with more discerning accuracy judgments but not sharing intentions. Harvard Kennedy School. https://misinforeview.hks.harvard.edu/article/digital-literacy-is-associated-with-more-discerning-accuracy-judgments-but-not-sharing-intentions/
[23] Hoyton, J. The Dangerous Myth of the “Independent Researcher”. PhD. Academy. https://phd.academy/blog/the-myth-of-the-independent-researcher/
[24] Gonzalez-Bailon, S., Wang, N., Rivero, A., Borge-Holtoefer, J. & Moreno, Y. Assessing the Bias in Samples of Large Online Networks. Forthcoming in Social Networks. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2185134
[25] Huang, R. G. Welcome to RQDA Project. R-Forge. https://rqda.r-forge.r-project.org/