As we discuss Data Management and Data Curation, we should start with some basic definitions for clarity. Each phrase below describes a distinct set of actions and skills; these terms are not synonymous and should not be used interchangeably.
Source: open.science.gov
Source: University Library, Texas A&M University. “Data Management Defined - Research Data Management - Guides at Texas A&M University.” Research Data Management, October 1, 2013. https://tamu.libguides.com/research-data-management/overview
A Data Management Plan (DMP) is a formal document that describes the data generated during a research project; it details all aspects of data management that will take place during the research lifecycle when data are collected, organized, documented, stored, preserved, and shared. Additionally, a DMP is considered a living document that will be updated as needed throughout the lifecycle of a project.
Adapted from University Library, Texas A&M University. “Data Management Planning - Research Data Management - Guides at Texas A&M University.” Research Data Management, October 1, 2013. https://tamu.libguides.com/research-data-management/dmps
Adapted from: Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign. “Specialization in Data Curation,” 2013. https://ischool.illinois.edu/research/areas/data-curation
Adapted from: Coding for Data - 2020 Edition “What is data science?” https://matthew-brett.github.io/cfd2020/intro/what-is-data-science.html
The executive or managerial "exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets" (DAMA). Data governance, as described here, is a managerial process that ensures data are managed; it is not the day-to-day management of data by data creators and data stewards. In other words, the management of data is not the same as ensuring data are managed.
Adapted from the Data Management Association (DAMA) https://www.dama.org/cpages/dmbok-2-image-download, accessed 2023-02-17, and John Ladley, Data Governance, 2012, page 11.
Adapted from: Digital Curation Centre, Curation Lifecycle Model https://www.dcc.ac.uk/guidance/curation-lifecycle-model
The people making connections and opening channels of communication between researchers, policy makers, software developers, and infrastructure providers (inside or outside the institution) so that the necessary elements that enable researchers to successfully implement RDM [research data management] can be put in place. The role of data stewards as promoters of FAIR data principles is essential to help researchers and institutions transition to modern RDM practices.
Source: Hasani-Mavriqi, I., Reichmann, S., Gruber, A., Jean-Quartier, C., Schranzhofer, H., & Rey Mazón, M. (2022). Data Stewardship in the Making (1.0). https://doi.org/10.3217/p9fvw-rke48
Professor Jerome McDonough, University of Illinois at Urbana-Champaign, August 25, 2011, in course 590MD, Metadata.
ISO/TS 23081-2:2007 Information and documentation - Records management processes - Metadata for records - Part 2: Conceptual and implementation issues
What Metadata is Not:
The NTL Data Services team prefers the two precise definitions of metadata above over those that are more easily remembered. The throwaway definition that "metadata is data about data" has been around since the term "metadata" was coined in the 1960s, but it should be avoided because it lacks precision and is not entirely correct. Our understanding of metadata has evolved over the past 50 years.
Remember, data are measurements, facts that can be analyzed, or statistics. Depending on the context, you may actually have data about data. These data may include the file size, the number of times the data have been cited in other work, or the number of times the data have been downloaded from a repository. These facts about data are interesting, and in some contexts they may simply be measurements (data) or may help to describe the data (information). This means metadata is "role-dependent": context matters.
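The role-dependence described above can be sketched in a few lines of code. This is a hypothetical illustration, not an NTL system; the file name and field names are invented for the example. The same values serve as metadata when they describe one dataset, and as data when they are collected and analyzed in their own right.

```python
# Hypothetical illustration: the same values can act as metadata or as data,
# depending on the role they play in context.

# As metadata: facts that describe one dataset file.
dataset_record = {
    "filename": "traffic_counts_2022.csv",  # invented example name
    "file_size_bytes": 1_048_576,
    "times_cited": 14,
    "times_downloaded": 312,
}

# As data: the same kind of facts, gathered across many datasets,
# become measurements we can analyze in their own right.
repository_stats = [
    {"times_downloaded": 312},
    {"times_downloaded": 45},
    {"times_downloaded": 1290},
]
mean_downloads = sum(r["times_downloaded"] for r in repository_stats) / len(repository_stats)
print(f"Mean downloads per dataset: {mean_downloads:.1f}")  # → 549.0
```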
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Johnston, Lisa R; Carlson, Jake; Hudson-Vitale, Cynthia; Imker, Heidi; Kozlowski, Wendy; Olendorf, Robert; Stewart, Claire. (2016). Definitions of Data Curation Activities used by the Data Curation Network. Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/188638.
Data management is a necessary element of data curation. To enable good data curation, we often have to encourage researchers to think beyond a specific investigation and adopt data management practices that meet future needs. Or, to think of this connection graphically: [Figure: data management enables data curation]
Good data curation, in turn, enables broader, longitudinal data science. By preserving and adding value to data, data curation makes the work of data science more efficient and effective, and opens new output possibilities. Or, to think of this connection graphically: [Figure: data curation enables data science]
If we keep future use and reuse of data in mind as we collect, create, and document new data, we should employ the best data management practices we can. Robust data management allows for long-term data curation and improves data sharing. Shared data allow data scientists to draw new knowledge out of previously created data, but only if the data are well documented, preserved, and accessible. We can sum up these connections graphically as: [Figure: data management enables data curation, which enables data science]
NTL Data Services are designed to take advantage of these linked processes, to provide the longest possible utility for data created and collected by USDOT.
Data Management is the compilation of many small practices that make your data easier to find, easier to understand, less likely to be lost, and more likely to be usable both during a project and ten years later.
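One of those small practices can be shown concretely: writing a machine-readable metadata "sidecar" file alongside a dataset, so the data remain findable and understandable years later. This is a minimal sketch under assumptions: the field names and values are illustrative placeholders, not a required NTL or USDOT schema.

```python
import json
from pathlib import Path

# Hypothetical sketch: a simple metadata sidecar file written next to a
# dataset. Field names are illustrative, not a formal metadata standard.
metadata = {
    "title": "Example traffic counts",          # placeholder values
    "creator": "Example Research Team",
    "date_created": "2022-06-01",
    "description": "Hourly vehicle counts at one intersection.",
    "file_format": "CSV",
    "license": "CC0-1.0",
}

# Write the record as human- and machine-readable JSON.
sidecar = Path("traffic_counts_2022.metadata.json")
sidecar.write_text(json.dumps(metadata, indent=2))
print(f"Wrote {sidecar}")
```

Even a small record like this, kept with the data, answers the questions a future user (or your future self) will ask first: what is this file, who made it, when, and under what terms can it be reused.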