Research Data Management: Data Management

The Research Data Management Portal provides guidance, best practices, and resources on the steps of the research data lifecycle and how those steps relate to the requirements of established data management practices.

Where to Start:

Terms to Know

As we talk about Data Management and Data Curation, we should start with some basic definitions for clarity. Each term below describes a distinct set of actions and skills; the terms are related, but they are not synonyms and should not be used interchangeably.

Definitions

Open Science

The principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity.

 

Source: open.science.gov

 

Data Management

The deliberate planning, creation, storage, access, and preservation of data produced from a given investigation.

 

Source: University Library, Texas A&M University. “Data Management Defined - Research Data Management - Guides at Texas A&M University.” Research Data Management, October 1, 2013. https://tamu.libguides.com/research-data-management/overview

 

Data Management Plan

A Data Management Plan (DMP) is a formal document that describes the data generated during a research project; it details all aspects of data management that will take place during the research lifecycle as data are collected, organized, documented, stored, preserved, and shared. Additionally, a DMP is considered a living document that will be updated as needed throughout the lifecycle of a project.

 

Adapted from University Library, Texas A&M University. “Data Management Planning - Research Data Management - Guides at Texas A&M University.” Research Data Management, October 1, 2013. https://tamu.libguides.com/research-data-management/dmps

 

Data Curation

The actions that enable data discovery and retrieval, maintain data quality, add value, and provide for re-use over time.

 

Adapted from: Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. “Specialization in Data Curation,” 2013. https://ischool.illinois.edu/research/areas/data-curation

 

Data Science

Drawing useful conclusions from large and diverse data sets through exploration, prediction, and inference.

 

Adapted from: Coding for Data - 2020 Edition “What is data science?” https://matthew-brett.github.io/cfd2020/intro/what-is-data-science.html

 

Data Governance

The executive or managerial "exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets" (DAMA). Data governance, as described, is a managerial process to ensure data is managed; it is not the day-to-day managing of data by data creators and data stewards. In other words, the management of data is not the same as ensuring data is managed.

 

Adapted from the Data Management Association (DAMA) https://www.dama.org/cpages/dmbok-2-image-download, accessed 2023-02-17, and John Ladley, Data Governance, 2012, page 11.

 

Data Curators

Those people taking technical and non-technical actions over the data lifecycle that: enable and improve data discovery and retrieval; maintain data quality; perform file format transformations as software changes; take preservation actions such as ingesting datasets into repositories; appraise and select data for preservation; and guide data disposition when data comes to the end of its useful life.

 

Adapted from: Digital Curation Centre Curation Lifecycle Model https://www.dcc.ac.uk/guidance/curation-lifecycle-model

 

 

Data Stewards

The people making connections and opening channels of communication between researchers, policy makers, software developers, and infrastructure providers (inside or outside the institution) so that the necessary elements that enable researchers to successfully implement RDM [research data management] can be put in place. The role of data stewards as promoters of FAIR data principles is essential to help researchers and institutions transition to modern RDM practices.

 

Source: Hasani-Mavriqi, I., Reichmann, S., Gruber, A., Jean-Quartier, C., Schranzhofer, H., & Rey Mazón, M. (2022). Data Stewardship in the Making (1.0). https://doi.org/10.3217/p9fvw-rke48

 

 

Metadata

  1. Information;
  2. About some other form of communication;
  3. In a structured format;
  4. Designed to serve a particular purpose; and,
  5. Which may serve in some circumstances as a surrogate for the original communication.

 

Professor Jerome McDonough, University of Illinois at Urbana-Champaign, August 25, 2011, in course 590MD, Metadata.

 

Data describing the context, content and structure of records and their management through time.

 

ISO/TS 23081-2:2007 Information and documentation - Records management processes - Metadata for records - Part 2: Conceptual and implementation issues

What Metadata is Not:

The NTL Data Services team prefers the two precise definitions of metadata above over the more easily remembered alternatives. The throwaway definition that "metadata is data about data" has been around since the term "metadata" was coined in the 1960s, but it should be avoided: it lacks precision and is not exactly correct. Our understanding of metadata has evolved over the past 50 years.

Remember, data are measurements, facts that can be analyzed, or statistics. Depending on the context, you may actually have data about data. These data may include the file size, the number of times the data has been cited in other work, or the number of times the data has been downloaded from a repository. These facts about data are interesting, and in some contexts they may simply be measurements (data) or may help to describe the data (information). This means metadata is "role-dependent." Context matters.
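To make this concrete, here is a minimal, hypothetical sketch (all names and values are illustrative, not drawn from any real NTL dataset) showing how the same facts can act as descriptive metadata in one role and as analyzable data in another:

```python
# Hypothetical example: the same values can function as metadata or as data,
# depending on the role they play in a given context.

descriptive_metadata = {
    # Fields that describe the context, content, and structure of a dataset
    "title": "Example Traffic Volume Counts, 2020-2023",
    "creator": "Example Research Team",
    "file_format": "text/csv",
    "file_size_bytes": 48_213_992,
}

usage_statistics = {
    # Facts about the dataset that can be analyzed as data in their own right
    "download_count": 412,
    "citation_count": 7,
}

# Treated as measurements to be analyzed, these counts are data; treated as
# descriptions that help a user evaluate or discover the dataset, they are metadata.
most_used_signal = max(usage_statistics, key=usage_statistics.get)
print(most_used_signal)  # -> download_count
```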

FAIR Principles

Guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate with, and reuse data with minimal or no human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in the volume, complexity, and creation speed of data.

 

  • Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
  • Accessible: Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorization.
  • Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
  • Reusable: The ultimate goal of FAIR is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

 

Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
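As a rough illustration of machine-actionability, the sketch below builds a small metadata record as a Python dictionary and serializes it to JSON. The field names loosely follow a schema.org-style convention, and the title, identifier, URL, and license values are placeholders rather than references to any real dataset or repository:

```python
import json

# A minimal, hypothetical machine-readable metadata record. Field names loosely
# follow a schema.org-style convention; real repositories prescribe their own schemas.
record = {
    "@type": "Dataset",
    "name": "Example Bridge Inspection Dataset",           # Findable: descriptive title
    "identifier": "https://doi.org/10.0000/placeholder",   # Findable: persistent identifier (placeholder)
    "url": "https://repository.example.org/datasets/123",  # Accessible: where the data can be retrieved
    "encodingFormat": "text/csv",                          # Interoperable: open, standard file format
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",  # Reusable: clear usage terms
    "description": "Illustrative inspection records used to show FAIR-oriented metadata fields.",
}

# Serializing to JSON yields a record that both humans and machines can parse,
# which is the machine-actionability the FAIR principles emphasize.
print(json.dumps(record, indent=2))
```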

 

Data Citation

Display of a recommended bibliographic citation for a dataset to enable appropriate attribution by third-party users, in order to formally incorporate data reuse into the scholarly ecosystem.

 

Johnston, Lisa R; Carlson, Jake; Hudson-Vitale, Cynthia; Imker, Heidi; Kozlowski, Wendy; Olendorf, Robert; Stewart, Claire. (2016). Definitions of Data Curation Activities used by the Data Curation Network. Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/188638.
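As a rough sketch of what such a displayed citation might look like, the snippet below assembles a citation string from placeholder metadata values; real repositories follow their own citation styles, often based on the DataCite format:

```python
# Hypothetical metadata for a dataset; every value here is a placeholder.
meta = {
    "creators": "Example Research Team",
    "year": 2024,
    "title": "Example Bridge Inspection Dataset",
    "publisher": "Example Data Repository",
    "doi": "https://doi.org/10.0000/placeholder",
}

# Assemble a simple recommended citation for display alongside the dataset.
citation = "{creators} ({year}). {title}. {publisher}. {doi}".format(**meta)
print(citation)
```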

 

Persistent Identifier

A URL (or Uniform Resource Locator) that is monitored by an authority to ensure a stable web location for consistent citation and long-term discoverability, providing redirection when necessary; a Digital Object Identifier (DOI) is one example.

 

Johnston, Lisa R; Carlson, Jake; Hudson-Vitale, Cynthia; Imker, Heidi; Kozlowski, Wendy; Olendorf, Robert; Stewart, Claire. (2016). Definitions of Data Curation Activities used by the Data Curation Network. Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/188638.
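As a rough sketch of how that redirection works in practice, the snippet below (which assumes network access and uses the doi.org resolver's public handle API) looks up the location currently registered for the DOI of the FAIR Guiding Principles article cited above:

```python
import json
from urllib.request import urlopen

# The DOI stays constant; the URL registered behind it can be updated by the
# publisher or repository whenever the landing page moves.
doi = "10.1038/sdata.2016.18"  # DOI of the FAIR Guiding Principles article cited above

with urlopen(f"https://doi.org/api/handles/{doi}") as response:
    record = json.load(response)

# Print the URL(s) the persistent identifier currently redirects to.
for value in record.get("values", []):
    if value.get("type") == "URL":
        print(value["data"]["value"])
```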

How do these terms relate?

Linked Processes

At NTL, we embrace the interconnectedness of data management, data curation, and data science skills to improve public access to USDOT data.

 

Data management is a necessary element of data curation. Enabling good data curation often means encouraging researchers to think beyond a specific investigation and to adopt data management practices that meet future needs. Or, to think of this connection graphically: Data Management ∈ Data Curation
Good data curation, in turn, enables broader, longitudinal data science. By preserving and adding value to data, data curation makes the work of data science more efficient and effective and opens new output possibilities. Or, to think of this connection graphically: Data Curation ⇒ Data Science
If we keep future use and reuse of data in mind as we collect, create, and document new data, we should employ the best data management practices we can. Robust data management allows for long-term data curation and improves data sharing. Shared data allows data scientists to draw new knowledge out of previously created data, but only if the data are well documented, preserved, and accessible. We can sum up these connections graphically as: Data Management ∈ Data Curation ⇒ Data Science, or DM ∈ DC ⇒ DS

NTL Data Services are designed to take advantage of these linked processes and to provide the longest possible utility of data created and collected by USDOT.

Why Managing Data is Important:

Using Data Management Best Practices is Required by Legislation and Executive Orders

Government Legislation and Policies that Support Data Management

  • Privacy Act: establishes a Code of Fair Information Practice that governs the collection, maintenance, use, and dissemination of personally identifiable information about individuals that is maintained in systems of records by Federal agencies.
  • Government Performance and Results Act (GPRA): one of a series of laws designed to improve government project management. GPRA requires agencies to engage in project management tasks such as setting goals, measuring results, and reporting their progress. To comply with GPRA, agencies produce strategic plans and performance plans and conduct gap analyses of projects.
  • Freedom of Information Act (FOIA): a Federal law that allows for the full or partial disclosure of previously unreleased information and documents controlled by the United States government. FOIA defines agency records subject to disclosure, outlines mandatory disclosure procedures, and grants exemptions to the statute.
  • Paperwork Reduction Act: provides the basis for managing information as a resource. It mandates that agencies take steps to improve their data quality and data sharing capabilities.
  • Federal Data Strategy: designed to fully leverage the value of federal data for mission, service, and the public good by guiding the Federal Government in practicing ethical governance, conscious design, and a learning culture.
  • USDOT's Data Release Policy: DOT's framework for managing and standardizing the quality, objectivity, utility, and integrity of data disseminated to the public.
  • USDOT's Public Access Policy: developed to ensure public access to Publications and Digital Data Sets arising from DOT-managed research and development (R&D) programs. DOT already provides access to intramural and extramural research in progress and technical reports, as well as many final publications, through partnerships with organizations such as the Transportation Research Board (TRB).
New Policies Coming Soon

What are the Benefits for Researchers?

Why Data Management is Important for Researchers:

Data Management is the combination of many small practices that make your data easier to find, easier to understand, less likely to be lost, and more likely to be usable both during a project and ten years later.

  1. Further Your Research Impact: Using good data management techniques makes your research more valuable and impactful to your research community and the public. Accessible, FAIR, open data that has robust documentation such as READMEs and a data dictionary (see the sketch after this list), complete metadata for findability, and a well-structured, understandable layout is easier for your colleagues to use. Reuse and citation of your data extend your research's reach in your field. This form of data sharing promotes transparency, collaboration with other researchers, and credibility.
  2. Avoid Duplication of Research: Making your research easily findable not only increases its impact but can also save research funds by limiting duplication of research efforts. Publishing all findings and data, even negative or failed results, helps the field learn and prevents the same experiments from being repeated unnecessarily.
  3. Ensure Accountability and Reliability: Responsible data management demonstrates your ability as a researcher to create accurate and robust documentation and your complete understanding of the data you created. It can be used to prove your reliability as a researcher to funders, your institution, and your field. 
  4. Save Researchers Time: Efforts made to make your data easily accessible not only save your data depositor and reviewer time, but also save time for other researchers who want to use your data. Eliminating barriers to your data's usability will increase its use and citation in other works. With less effort needed to access and understand your data, your data users can focus more on the interpretation and analysis of your work.
  5. Easier CV and Resume Use: Having clear and accessible data that is open, findable, and tied to your name, or better yet, to your persistent identifier (ORCID), makes it easier to find your contributions to the field and boosts your credentials. In addition to journal articles, datasets and other contributions to the field can be used to further your career. Making this data easy to access and understand will only help your career.
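
For item 1 above, a data dictionary does not need to be elaborate. The sketch below writes a minimal one to a CSV file; the column names, types, units, and descriptions are purely illustrative and not drawn from any real USDOT dataset:

```python
import csv

# A minimal, hypothetical data dictionary: one row per column in the dataset,
# recording each field's name, type, units, and meaning so that future users
# (including you) can interpret the data without guesswork.
fields = [
    {"column": "station_id", "type": "string",  "units": "",             "description": "Unique identifier for the traffic count station"},
    {"column": "count_date", "type": "date",    "units": "ISO 8601",     "description": "Date the traffic count was taken"},
    {"column": "aadt",       "type": "integer", "units": "vehicles/day", "description": "Annual average daily traffic at the station"},
]

with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["column", "type", "units", "description"])
    writer.writeheader()
    writer.writerows(fields)
```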

Quiz Time