
Research Data Management: Data Ethics

The Research Data Management Portal provides guidance, best practices, and resources on the steps of the research data lifecycle and how they relate to the requirements of established data management practices.

Introduction to Data Ethics

Research Data Ethics: Addressing Rapidly Evolving Standards

What are Research Data Ethics?

 

Research Data Ethics are the norms of behavior that promote appropriate judgments and accountability when acquiring, managing, or using data, with the goals of protecting civil liberties, minimizing risks to individuals and society, and maximizing the public good.

 

Source: Resources.data.gov Data Ethics Framework

 

Data Ethics Principles

Research data ethics are fundamental to U.S. federal research. Data ethics encompasses more than the ethical treatment of participants in your study. Good data ethics governs how researchers conduct their research, work with colleagues in the field, interact with communities such as Tribal Nations, and promote transparency in their research processes.

 

Data ethics can take many forms. Depending on your research project, you may not encounter every aspect of data ethics. However, being aware of all of them is important to ensure accountability as a researcher. Good research practice is to adapt your approach to evolving norms and to seek guidance from ethics organizations, funders, and your research community when navigating ethical dilemmas.

Federal Data Strategy Data Ethics Framework



The Federal Data Strategy's Data Ethics Framework is an essential document for any researcher who works with or receives government funding. This framework is meant to address the gap in U.S. policy on data ethics. As the United States is one of the largest data producers in the world, it is essential that researchers create and manage data from the population responsibly and ethically. The Data Ethics Framework's purpose is to guide federal leaders, employees, researchers, and data users on how to make ethical decisions when handling, creating, managing, and acquiring data. Researchers and data creators at universities, state DOTs, and local governments who receive federal funding are also bound to conduct their research with good data ethics. This framework is relevant throughout the data lifecycle, from creation to publication. It is crucial that all who work with data and the federal government understand these principles, the benefits of good data ethics, the actions they need to take to practice good data ethics, and any gaps in their current processes.

 

Benefits of Good Data Ethics

Data Ethics Framework's Data Ethics Tenets

1. Uphold applicable statutes, regulations, professional practices, and ethical standards. Existing laws reflect and reinforce ethics. Therefore, data leaders and data users should adhere to all applicable legal authorities. Legal authorities often address historic situations and issues and may not keep pace with the evolving world of data and technology. Organizational leaders are encouraged to maintain up-to-date, comprehensive ethical standards regarding data use and staff are responsible for learning and applying agency guidance appropriately.
2. Respect the public, individuals, and communities. Data activities have the overarching goal of benefiting the public good. Responsible use of data begins with careful consideration of its potential impacts. Data initiatives should include considerations for unique community and local contexts and have an identified and clear benefit to society.
3. Respect privacy and confidentiality. Privacy and confidentiality should always be protected in a manner that respects the dignity, rights, and freedom of data subjects. In this context, privacy is the state of being free from unwarranted intrusion into the private life of individuals, and confidentiality is the state of one’s information being free from inappropriate access and misuse. An essential objective of privacy and confidentiality protection is to minimize potential negative consequences through measures such as comprehensive risk assessments, disclosure avoidance, and upholding data governance standards. Data activities involving individual privacy should align with the Fair Information Practice Principles (FIPPs).
4. Act with honesty, integrity, and humility. All federal leaders and data users are expected to exhibit honesty and integrity in their work with data, regardless of job title, role, or data responsibilities. Federal leaders and data users should not perform or condone unethical data behaviors. When sharing data and findings, personnel should report information accurately and present any data limitations, known biases, and methods of analysis that apply. It should also be recognized that no dataset can fully represent all facets of a person, community, or issue. Federal leaders and data users are expected to have humility when presenting data, be open to feedback, and when possible, invite discussion with the public. In addition, federal data users should accurately represent their abilities when working with data.
5. Hold oneself and others accountable. Accountability requires that anyone acquiring, managing, or using data be aware of stakeholders and be responsible to them, as appropriate. Remaining accountable includes the responsible handling of classified and controlled information, upholding data use agreements made with data providers, minimizing data collection, informing individuals and organizations of the potential uses of their data, and allowing for public access, amendment, and contestability to data and findings, where appropriate.
6. Promote transparency. Individuals, organizations, and communities benefit when the ethical decision-making process is as transparent as possible to stakeholders. Transparency depends on clear communication of all aspects of data activities and appropriate engagement with data stakeholders. Promoting transparency requires engaging stakeholders through easily accessible feedback channels and providing timely updates on the progress and outcomes of data use.
7. Stay informed of developments in the fields of data management and data science. Advanced technologies provide great benefit to the public sector, but should be deployed with a commitment to accountability and risk mitigation. While traditional data use and analysis can introduce bias, emerging systems, technologies, and techniques require additional awareness and oversight because they can increase opportunities for bias. It is critical to remain informed of developments in the fields of data management and data science, especially as advanced methods impact future data collection, management, and use. In addition, new data innovations (e.g., systems, solutions, computational methods) emerge every day, increasing the importance for federal leaders and employees working with data to keep abreast of market innovations and learn how to ethically use new methods.

Source: Resources.data.gov Data Ethics Framework

Data Ethics for Human Data

Protecting Personally Identifiable Information

Protecting Participants' Personally Identifiable Information

Protecting personally identifiable information (PII) is essential to practicing good research data ethics. When research is funded by U.S. DOT, all research data must be anonymized before publication. Researchers, the government, libraries, and university Institutional Review Boards (IRBs) must work together to ensure that research participants are protected and understand the risks to their data when participating in research.

 

PII encompasses more than basic information such as name and demographics. It includes smaller, seemingly innocuous details that, when combined, can identify individuals. Establishing which data points need to be anonymized during the planning and data acquisition steps saves time later and helps ensure the safety of your participants.

PII can include but is not limited to: 

  • Social Security Number
  • IP address
  • Address
  • Name
  • Age
  • Employment information
  • Race and Ethnicity
  • Disability
  • Income/Household income
  • Family size
  • Biometric data

Some of these data points are identifying on their own. Others become identifying only when combined with other data points, and either kind can expose your research participants. Ensuring that your research does not endanger your participants should be the utmost priority when working with humans and data.
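
One way to see how seemingly innocuous details combine is to count how many participants share each combination of attributes. The sketch below is a minimal illustration in Python, not a prescribed U.S. DOT or IRB method. The records, the field names (`age`, `zip_code`, `family_size`), and the helper `smallest_group_size` are all hypothetical. A combination shared by only one participant points to that person even though no name is stored.

```python
from collections import Counter

# Hypothetical survey records; no names or Social Security Numbers are stored.
records = [
    {"age": 34, "zip_code": "20590", "family_size": 4},
    {"age": 34, "zip_code": "20590", "family_size": 4},
    {"age": 71, "zip_code": "20590", "family_size": 1},
]

def smallest_group_size(rows, fields):
    """Return the size of the smallest group of rows that share the same values for `fields`."""
    groups = Counter(tuple(row[f] for f in fields) for row in rows)
    return min(groups.values())

# Looking at ZIP code alone, every record is shared by the whole group.
zip_only = smallest_group_size(records, ["zip_code"])
# Adding age and family size leaves at least one record in a group by itself.
combined = smallest_group_size(records, ["age", "zip_code", "family_size"])
print(zip_only, combined)
```

A smallest group size of 1 for a set of fields suggests that those fields, taken together, should be treated as PII for that dataset.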

Steps you can take to ensure that your data protects your participants include:

  1. Outline and understand which data points need protecting
  2. Develop clear participant consent forms that outline the risks and what data will be collected
  3. Bracket data points such as age to limit risk (for example, report age ranges rather than exact ages)
  4. Avoid collecting PII unless it is essential to the research focus
  5. Anonymize all data before publication
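
Steps 3 and 5 above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not a complete anonymization procedure: the set `DIRECT_IDENTIFIERS`, the ten-year bracket width, and the functions `bracket_age` and `prepare_for_publication` are hypothetical, and the right choices depend on your dataset and your IRB's guidance.

```python
# Hypothetical list of direct identifiers to drop before publication.
DIRECT_IDENTIFIERS = {"name", "ssn", "ip_address", "address"}

def bracket_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def prepare_for_publication(row):
    """Return a copy of `row` without direct identifiers and with age bracketed."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age"] = bracket_age(cleaned["age"])
    return cleaned

# Made-up participant record for illustration only.
participant = {"name": "Example Person", "ssn": "000-00-0000", "age": 34, "family_size": 4}
published = prepare_for_publication(participant)
print(published)
```

Dropping direct identifiers and bracketing values reduces risk but does not by itself guarantee anonymity, because the remaining fields can still identify someone in combination.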

Learn More about Protecting Participant Data with the National Institutes of Health's Module "Protecting Participant Privacy When Sharing Scientific Data"

  • Principles and Best Practices for Protecting Participant Privacy
  • Designating Scientific Data for Controlled Access
  • Considerations for Obtaining Informed Consent

Data Ethics and Artificial Intelligence

Ethics and AI

Ensuring Ethics with the Use of AI

As advanced technologies, such as artificial intelligence (AI), become integrated into and relied on in the research process, it is important to uphold data ethics principles. Balancing the use of AI with data ethics is a difficult tightrope for researchers to walk. However, there are important steps that a researcher should take when working with AI.

  • When using AI in any part of the research process, full disclosure of the tasks that the AI performed is required. Disclose AI use upfront to your funder, your university, and any project stakeholders during the project planning phase, and make it clear again at publication. This disclosure should appear in the report itself, in any datasets or products, and prominently on any publication's landing page.
  • Protecting sensitive information from AI algorithms is extremely important. Sensitive information could be personally identifiable information (PII), information about Tribal communities, or any pieces of information that could impact a group or individual if made public. While AI models have sophisticated safeguards in place to prevent users from obtaining sensitive data, it is not yet known whether PII or other sensitive information can be retrieved from these AI models. Being cognizant of the risks that AI could pose to the people who participate in your research is essential to performing research with good data ethics.
  • While AI is an excellent tool for researchers, AI can have inherent biases that may affect the experiment and its results. Protect your results from AI biases by thoroughly examining any outputs and ensuring your data is fair.
  • Ensure that your participants have full knowledge of, and consent to, any AI involvement in the research process. Data breaches could significantly harm a participant. A breach would not only result in a loss of the participant's privacy but could also negatively impact other aspects of their life, such as their relationships, their employment, and their mental wellbeing. As with any experiment involving participants, it is crucial that they understand the information you are collecting, how it will be used, and the risks that come with AI involvement. Maintaining this transparency with your participants is essential to good data ethics.

Data Ethics and Tribal Communities

CARE Principles for Indigenous Data Governance

Working with Tribal Communities and their Data

The CARE Principles were written by Indigenous Peoples to outline and protect their data, its use, and Indigenous interests. When working with Tribal communities, it is essential to incorporate the CARE Principles into every step of your research process. The CARE Principles are as follows:

C: Collective Benefit

C1: For inclusive development and innovation

Governments and institutions must actively support the use and reuse of data by Indigenous nations and communities by facilitating the establishment of the foundations for Indigenous innovation, value generation, and the promotion of local self-determined development processes.

C2: For improved governance and citizen engagement

Data enrich the planning, implementation, and evaluation processes that support the service and policy needs of Indigenous communities. Data also enable better engagement between citizens, institutions, and governments to improve decision-making. Ethical use of open data has the capacity to improve transparency and decision-making by providing Indigenous nations and communities with a better understanding of their peoples, territories, and resources. It similarly can provide greater insight into third-party policies and programs affecting Indigenous Peoples.

C3: For equitable outcomes

Indigenous data are grounded in community values, which extend to society at large. Any value created from Indigenous data should benefit Indigenous communities in an equitable manner and contribute to Indigenous aspirations for wellbeing.

A: Authority to Control

A1: Recognizing rights and interests

Indigenous Peoples have rights and interests in both Indigenous Knowledge and Indigenous data. Indigenous Peoples have collective and individual rights to free, prior, and informed consent in the collection and use of such data, including the development of data policies and protocols for collection.

A2: Data for governance

Indigenous Peoples have the right to data that are relevant to their world views and empower self-determination and effective self-governance. Indigenous data must be made available and accessible to Indigenous nations and communities in order to support Indigenous governance.

A3: Governance of data

Indigenous Peoples have the right to develop cultural governance protocols for Indigenous data and be active leaders in the stewardship of, and access to, Indigenous data especially in the context of Indigenous Knowledge.

R: Responsibility

R1: For positive relationships

Indigenous data use is unviable unless linked to relationships built on respect, reciprocity, trust, and mutual understanding, as defined by the Indigenous Peoples to whom those data relate. Those working with Indigenous data are responsible for ensuring that the creation, interpretation, and use of those data uphold, or are respectful of, the dignity of Indigenous nations and communities.

R2: For expanding capability and capacity

Use of Indigenous data invokes a reciprocal responsibility to enhance data literacy within Indigenous communities and to support the development of an Indigenous data workforce and digital infrastructure to enable the creation, collection, management, security, governance, and application of data.

R3: For Indigenous languages and worldviews

Resources must be provided to generate data grounded in the languages, worldviews, and lived experiences (including values and principles) of Indigenous Peoples.

E: Ethics

E1: For minimizing harm and maximizing benefit

Ethical data are data that do not stigmatize or portray Indigenous Peoples, cultures, or knowledges in terms of deficit. Ethical data are collected and used in ways that align with Indigenous ethical frameworks and with rights affirmed in UNDRIP. Assessing ethical benefits and harms should be done from the perspective of the Indigenous Peoples, nations, or communities to whom the data relate.

E2: For justness

Ethical processes address imbalances in power, resources, and how these affect the expression of Indigenous rights and human rights. Ethical processes must include representation from relevant Indigenous communities.

E3: For future use

Data governance should take into account the potential future use and future harm based on ethical frameworks grounded in the values and principles of the relevant Indigenous community. Metadata should acknowledge the provenance and purpose and any limitations or obligations in secondary use inclusive of issues of consent.

FAIR and CARE Together

While the FAIR Principles are foundational to practicing good research data management and open science, it is not enough to be just FAIR. Using both the FAIR Principles and the CARE Principles when conducting research with Tribal Nations is fundamental to good research data ethics. While open data is a foundational tenet of U.S. DOT research, purely open data does not take into consideration Tribal history, knowledge, perspectives, and the power differences that may exist between the community and the researcher. Acknowledging both sets of principles will keep data open while also respecting Indigenous communities and knowledge. To learn more about the CARE Principles and their interaction with the FAIR Principles, please consult CARE Principles for Indigenous Data Governance.

Be FAIR and CARE     
