Research Data Management: Data Lifecycle

The Research Data Management Portal provides guidance, best practices, and resources on the steps within the research data lifecycle and on how those steps relate to the requirements of established data management practices.

Research Data Life Cycle Model

Overview

Data management runs in parallel with the data life cycle. There are a number of definitions of the data life cycle. To paraphrase many of them, the data life cycle is "all the phases of data's existence, from planning to collection, through preservation, to reuse and potential destruction." There are also a number of data life cycle models, and in most cases organizations choose or create the one that works best with their type of data and data management process. The data life cycle shown below is the USGS Data Lifecycle, the version NTL views as most beneficial because the very first action is planning, a crucial step that, if not completed, can result in great difficulties down the road.


You can find out more about the USGS Lifecycle on their site at https://doi.org/10.3133/ofr20131265

Plan

At NTL we often refer to the USGS Data Lifecycle Model. It is one of the simpler versions of the data life cycle, but NTL views it as the most beneficial because the very first action is planning. Planning before data collection is arguably the most important step.

  • During the Planning step, a number of things need to be determined, such as:
    • What data is going to be collected
    • How data will be collected
    • What types of data will be collected
    • What file types of data can be expected
    • How will the data be organized
    • Who will be responsible for data
    • When will backups occur
    • Where will backups reside
    • What size of data is expected
    • Will there be sensitive data collected and if so how will it be handled
    • Whether and how much data will be shared, in the end

Only after all of these questions are answered can data collection begin. All of these questions and more should be included in your project's Data Management Plan. You can find more information on creating robust Data Management Plans for your research needs through the Bureau of Transportation Statistics Creating Data Management Plans (DMPs) page or the next section of this guide: Data Management and DMPs. On ROSA P, DMPs have their own collection, US DOT Public Access Data Management Plans.
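As an illustration only, the planning questions above could be tracked in a small checklist so that unanswered items are easy to spot before collection starts. The class name, field names, and sample answers below are hypothetical and are not part of any NTL or USGS tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PlanningChecklist:
    """Hypothetical record with one field per planning question."""
    data_to_collect: Optional[str] = None
    collection_method: Optional[str] = None
    data_types: Optional[str] = None
    file_types: Optional[str] = None
    organization_scheme: Optional[str] = None
    responsible_party: Optional[str] = None
    backup_schedule: Optional[str] = None
    backup_location: Optional[str] = None
    expected_size: Optional[str] = None
    sensitive_data_handling: Optional[str] = None
    sharing_plan: Optional[str] = None


def unanswered(checklist: PlanningChecklist) -> list:
    """Return the names of questions that are still empty or blank."""
    return [
        f.name
        for f in fields(checklist)
        if not (getattr(checklist, f.name) or "").strip()
    ]


# Invented sample answers; only three questions are filled in so far.
plan = PlanningChecklist(
    data_to_collect="Traffic counts at ten intersections",
    collection_method="Roadside sensors",
    responsible_party="Project data manager",
)
missing = unanswered(plan)
ready_to_collect = not missing
print("Still to decide:", ", ".join(missing))
```

A checklist like this does not replace a Data Management Plan; it is one possible way to confirm that every planning question has an answer before the plan is written up.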

Acquire

There are four methods of acquiring data:

  • collecting new data
  • converting/transforming legacy data
  • sharing/exchanging data
  • purchasing data

These methods include automated collection, the manual recording of empirical observations, and obtaining existing data from other sources.


Collecting New Data

When collecting new data, it is important to keep in mind the decisions and controls set out in the planning step. This will ensure data integrity and control throughout the collection process. Before collection begins, all analysis, definitions, standards, and procedures must be outlined and thoroughly defined. If changes are made during the project, it is important to update the planning documentation and the project DMP. All project management documents are living documents that change and become more robust as the project is executed. Keeping them updated with a firm change log is necessary for organization and will make data collection, analysis, and publication easier. Additionally, all data must be frequently reviewed during the collection process to ensure high data quality. Firm standards and procedures make it easier to create methodology sections later in the report. 


Using Previously Collected Data

Using previously collected data, whether it is legacy data, open data from another project, or data purchased from an organization, is another method of data acquisition. Using existing data may seem easier than creating original data, but it comes with its own set of questions and potential issues. For legacy data, it is important to ensure that the data is of high quality, that its methodology and acquisition are clear, that it is still considered valid for use by the scientific community, and that it is in a form that is usable and accessible with modern software. For shared open data, it is crucial to understand the rights and terms of use of the data, especially how to cite the data. Those who read your research must understand that the data was not created by the project and originates from another source. For purchased data, a firm purchase agreement must establish what you can and cannot use the data for.

When working with existing data, it’s crucial to review, comprehend, and adhere to the DOT Information Dissemination Quality Guidelines. Although not all DOT-funded data is obligated to strictly adhere to these guidelines, it’s prudent to bear them in mind during your research process. 

Regardless of whether you create the data or obtain it from an outside source, please adhere to conventional standards for common data types.

| Data Type | Value Example | Standard |
|-----------|---------------|----------|
| Date/Time | 2024-01-01 00:00:00 | ISO 8601 |
| Language (English) | en | ISO 639-1 |
| Geographical Location | Phoenix, Maricopa County, Arizona, United States, http://sws.geonames.org/5308655/ | Geonames URI |
| Decimal Latitude | 42.33 | ISO 6709:2022 |
| Decimal Longitude | -98.1449 | ISO 6709:2022 |
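A minimal sketch of how values might be written and checked against the conventions in the table above, using only the Python standard library. The function names and the record layout are assumptions made for illustration. The language check only tests the shape of a code (two lowercase letters), not whether the code is actually registered in ISO 639-1.

```python
from datetime import datetime


def iso8601(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DD hh:mm:ss, matching the table's example."""
    return moment.isoformat(sep=" ", timespec="seconds")


def looks_like_iso639_1(code: str) -> bool:
    """Shape check only: two lowercase ASCII letters."""
    return len(code) == 2 and code.isascii() and code.isalpha() and code.islower()


def valid_decimal_coordinates(latitude: float, longitude: float) -> bool:
    """Decimal degrees must fall within -90..90 and -180..180."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Hypothetical record that reuses the example values from the table.
record = {
    "datetime": iso8601(datetime(2024, 1, 1, 0, 0, 0)),
    "language": "en",
    "location_uri": "http://sws.geonames.org/5308655/",
    "latitude": 42.33,
    "longitude": -98.1449,
}

checks = {
    "language": looks_like_iso639_1(record["language"]),
    "coordinates": valid_decimal_coordinates(
        record["latitude"], record["longitude"]
    ),
}
print(record["datetime"], checks)
```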

Process

Data Processing covers any set of structured activities resulting in the alteration or integration of data. Data processing can result in data ready for analysis or generate output such as graphs and summary reports. Documenting the steps for how data are processed is essential for reproducibility and improves transparency.
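One possible way to document processing steps is to record each step as it runs. The sketch below assumes the data is a list of rows and that every step is a plain function. The step names, the sample rows, and the log layout are invented for illustration.

```python
import json


def drop_missing_counts(rows):
    """Remove rows that have no recorded count."""
    return [row for row in rows if row["count"] is not None]


def convert_counts_to_int(rows):
    """Convert the count field from text to an integer."""
    return [{**row, "count": int(row["count"])} for row in rows]


def run_pipeline(rows, steps):
    """Apply each step in order and log its name and the row counts."""
    log = []
    for step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append(
            {"step": step.__name__, "rows_in": rows_in, "rows_out": len(rows)}
        )
    return rows, log


# Invented sample data.
raw = [
    {"site": "A", "count": "12"},
    {"site": "B", "count": None},
    {"site": "C", "count": "7"},
]
processed, processing_log = run_pipeline(
    raw, [drop_missing_counts, convert_counts_to_int]
)
print(json.dumps(processing_log, indent=2))
```

Saving a log like this alongside the processed data gives later readers the order of the steps and the effect each one had on the number of records.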

Protecting Personally Identifiable Information

During the data acquisition phase, it is essential to anonymize personally identifiable information (PII) before publishing any research findings. PII encompasses more than just names and social security numbers; it includes smaller, seemingly innocuous details that, when combined, can identify individuals. For example, while race, ethnicity, sex, and age may not immediately identify a person, when the data pool is small and precise enough, there is potential for reidentification. Establishing what data points need to be anonymized during the planning and acquiring data steps saves time down the road and ensures the safety of your participants. 
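The reidentification risk described above can be screened for by counting how many participants share each combination of indirect identifiers, which is the idea behind k-anonymity. This is a rough sketch and not a complete anonymization method. The field names, the sample records, and the threshold k=3 are assumptions chosen for illustration.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return identifier combinations shared by fewer than k records."""
    combos = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: count for combo, count in combos.items() if count < k}


# Invented sample records.
participants = [
    {"age_band": "30-39", "sex": "F", "race": "White"},
    {"age_band": "30-39", "sex": "F", "race": "White"},
    {"age_band": "30-39", "sex": "F", "race": "White"},
    {"age_band": "70-79", "sex": "M", "race": "Asian"},
]

risky = small_groups(participants, ["age_band", "sex", "race"], k=3)
for combo, count in risky.items():
    print(f"{combo} appears only {count} time(s)")
```

Combinations flagged this way are candidates for generalizing (for example, wider age bands) or suppressing before the data is shared.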

Protecting Participant Privacy When Sharing Scientific Data

Responsible scientific data sharing practices promote both effective data stewardship and protection of human research participant privacy. Learn how to protect the privacy of human research participants when sharing data, using resources provided by the National Institutes of Health.

Analyze

The analyze stage of the Science Data Lifecycle represents activities associated with the exploration and assessment of data, where hypotheses are tested, discoveries are made, and conclusions are drawn. Some example analysis activities are:

  • Statistical Analysis
  • Visualization
  • Spatial Analysis
  • Image Analysis
  • Modeling
  • Interpretation

Open Source Scientific Software (Open Science)

The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

Open Source Hardware Association (OSHWA)

The Open Source Hardware Association (OSHWA) aims to foster technological knowledge and encourage research that is accessible, collaborative and respects user freedom. OSHWA’s primary activities include hosting the annual Open Hardware Summit and maintaining the Open Source Hardware certification, which allows the community to quickly identify and represent hardware that complies with the community definition of open source hardware.

Gathering for Open Science Hardware (GOSH)

Hardware is a vital part of the experimental process, and advances in instrumentation have been central to scientific revolutions by expanding observations beyond standard human senses. Although scientists are frequently natural tinkerers, the current supply chain for science hardware limits access and impedes creativity and customization. Open Science Hardware addresses part of this problem through sharing open designs, which often take advantage of modern digital fabrication techniques. Expanding the reach of this approach within academic research, citizen science, and education has the potential to increase access to experimental tools and ease their customization and reuse while lowering costs. A growing number of people around the world are developing and using Open Science Hardware, but a coherent, self-organizing community has yet to emerge that could raise its profile and drive the social change within institutions that will increase uptake.

Preserve

Preservation involves actions and procedures designed to ensure long-term viability and accessibility of data. Common preservation topics include:

  • Data Archiving
  • Data Disposition
  • Repositories for Data

It is important to create a robust data package, which includes the federally required DCAT-US metadata, detailed READMEs, Data Dictionaries, Codebooks, and other supporting documentation. 
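As one illustration of the supporting documentation mentioned above, a data dictionary can be started from the header row of a dataset and then completed by hand. The column layout of the dictionary (variable, example_value, description, units) and the sample data are assumptions, not a required format.

```python
import csv
import io


def data_dictionary_skeleton(csv_text):
    """Build one dictionary entry per column, with fields left to fill in."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Use the first non-empty value in the column as an example.
        example = next((row[name] for row in rows if row[name]), "")
        entries.append(
            {
                "variable": name,
                "example_value": example,
                "description": "",  # to be written by the researcher
                "units": "",        # to be written by the researcher
            }
        )
    return entries


# Invented sample dataset.
sample = (
    "site_id,observed_at,vehicle_count\n"
    "A1,2024-01-01 00:00:00,12\n"
    "A2,2024-01-01 00:15:00,7\n"
)
dictionary = data_dictionary_skeleton(sample)
for entry in dictionary:
    print(entry)
```

The generated entries are only a starting point; the descriptions and units still have to come from the people who know the data.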


File Formats

For data preservation, it is always best to use non-proprietary, open formats. The preferred formats are as follows:

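| Text | Dataset | Images | Multimedia | Maps | Metadata | Collections |
|------|---------|--------|------------|------|----------|-------------|
| TXT | CSV | TIFF | WARC | TIFF | JSON | ZIP |
| PDF | XLSX | PNG | WMV | PNG | XML | |
| XML | TAV | JPG | SWF | Shapefiles | | |
| WARC | | | WMA | ArcGIS Project Files | | |
| RTF | | | MPEG | | | |
| MD | | | PPTX | | | |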
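A small sketch of how a project folder listing might be screened against the formats named above. The set of extensions is copied from the list and lowercased. "Shapefiles" and "ArcGIS Project Files" are left out because they do not map to a single file extension, which is a simplification made for this example. The file names are invented.

```python
from pathlib import PurePath

# Extensions taken from the preferred formats listed above.
PREFERRED_EXTENSIONS = {
    "txt", "pdf", "xml", "warc", "rtf", "md",
    "csv", "xlsx", "tav",
    "tiff", "png", "jpg",
    "wmv", "swf", "wma", "mpeg", "pptx",
    "json", "zip",
}


def not_preferred(filenames):
    """Return the files whose extension is not in the preferred list."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower().lstrip(".") not in PREFERRED_EXTENSIONS
    ]


# Invented file names.
files = ["counts.csv", "README.md", "survey.sav", "site_photo.TIFF"]
flagged = not_preferred(files)
print("Consider converting before deposit:", flagged)
```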

Repositories

At this stage, it is also important to finalize which repository you will be using to store your research and ensure preservation. When choosing a repository, it is important to consider the following elements:

  1. Does it support searchability?
  2. Does it distinguish between authors and depositors? 
  3. Does it use persistent identifiers such as DOIs, Handles, RORs, and ORCIDs?
  4. Does it have multiple forms of metadata storage and export, including the required DCAT-US metadata file for all DOT funded research?

These considerations ensure your research will be preserved, traceable, and findable by your peers, your supervisors, your broader field, and the public. The Department of Transportation has a list of all DOT-approved repositories, which fulfill the above requirements. Learn more about publication in the next section, "Publish".

Publish

The ability to prepare, release, and share, or disseminate, quality data to the public and to other agencies is an important part of the lifecycle process.

Data sharing benefits the researcher, research sponsors, data repositories, the scientific community, and the public. It encourages more connection and collaboration between scientists, and better science leads to better decision making. 

Additional things to consider when sharing data are the use of Persistent Identifiers and Data Citations. Further, assembling your data into a robust Data Package is a crucial step in submitting your research for publication. Please consult all of these guides for more information.


Desirable Characteristics of a Research Repository:

  1. Provides free and easy access
  2. Has clear use guidance
  3. Has risk management and safeguards in place
  4. Has a retention policy
  5. Has long-term organizational sustainability with contingency plans
  6. Uses unique persistent identifiers
  7. Uses metadata with schemas that are widely used and appropriate to the data
  8. Provides curation and quality assurance
  9. Ensures broad and measured reuse and attribution
  10. Allows datasets and metadata to be accessed and exported in a widely used, non-proprietary format
  11. Records provenance and version control
  12. Authenticates data submitters
  13. Has long-term technical sustainability
  14. Has documented security and integrity measures

For more information, please consult Desirable Characteristics of Data Repositories for Federally Funded Research.