Data management runs parallel to the data life cycle. There are a number of definitions of the data life cycle. To paraphrase many of them, the data life cycle is “all the phases of data’s existence, from planning to collection, through preservation, to reuse and potential destruction.” There are also a number of data life cycle models, and in most cases organizations choose or create one that works best with their type of data and data management process. The data life cycle shown below is the USGS Data Lifecycle, the version NTL views as most beneficial because the very first action is planning, a crucial step that, if not completed, can result in great difficulties down the road.
You can find out more about the USGS Lifecycle on their site at https://doi.org/10.3133/ofr20131265
At NTL we often refer to the USGS Data Life-cycle Model. It is one of the simpler versions of the data life-cycle, but is viewed as most beneficial at NTL, because the very first action is planning. Planning, before data collection, is arguably the most important step.
Only after all of these questions are answered can data collection begin. All of these questions and more should be included in your project's Data Management Plan. You can find more information on creating robust Data Management Plans for your research needs through the Bureau of Transportation Statistics Creating Data Management Plans (DMPs) page or the next section of this guide: Data Management and DMPs. On ROSA P, DMPs have their own collection: US DOT Public Access Data Management Plans.
There are four methods of acquiring data:
This includes automated collection, the manual recording of empirical observations, and obtaining existing data from other sources.
When collecting new data, it is important to keep in mind the decisions and controls set out in the planning step. This will ensure data integrity and control throughout the collection process. Before collection begins, all analysis, definitions, standards, and procedures must be outlined and thoroughly defined. If changes are made during the project, it is important to update the planning documentation and the project DMP. All project management documents are living documents that change and become more robust as the project is executed. Keeping them updated with a firm change log is necessary for organization and will make data collection, analysis, and publication easier. Additionally, all data must be frequently reviewed during the collection process to ensure high data quality. Firm standards and procedures make it easier to create methodology sections later in the report.
Using previously collected data, whether it is legacy data, open data from another project, or data purchased from an organization, is another method of data acquisition. Using existing data may seem easier than creating original data, but it comes with its own set of questions and potential issues. For legacy data, it is important to ensure that the data is of high quality, that its methodology and acquisition are clear, that the scientific community still considers it valid for use, and that it is in a form that is usable and accessible with modern software. For shared open data, it is crucial to understand the rights and terms of use of the data, especially how to cite the data. Those who read your research must understand that the data was not created by the project and originates from another source. For paid data, a firm purchase agreement must be in place to establish what you can and cannot use the data for.
When working with existing data, it’s crucial to review, comprehend, and adhere to the DOT Information Dissemination Quality Guidelines. Although not all DOT-funded data is obligated to strictly adhere to these guidelines, it’s prudent to bear them in mind during your research process.
Regardless of whether you create the data or obtain it from an outside source, please adhere to conventional standards for common data types.
| | Date/Time | Language (English) | Geographical Location | Decimal Latitude | Decimal Longitude |
|---|---|---|---|---|---|
| Value Example | 2024-01-01 00:00:00 | en | Phoenix, Maricopa County, Arizona, United States, http://sws.geonames.org/5308655/ | 42.33 | -98.1449 |
| Standard | ISO 8601 | ISO 639-1 | Geonames URI | ISO 6709:2022 | ISO 6709:2022 |
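As a minimal sketch of how these conventions might be applied when recording data, the Python below formats a timestamp in the date and time form shown in the table and checks that decimal coordinates fall within valid ranges. The function names and the record's field names are hypothetical, chosen only for illustration; they are not part of any NTL or DOT tool.

```python
from datetime import datetime


def format_timestamp(moment: datetime) -> str:
    """Return the date and time as 'YYYY-MM-DD HH:MM:SS', the form used in the table."""
    return moment.isoformat(sep=" ", timespec="seconds")


def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Check that decimal-degree values lie within the valid latitude and longitude ranges."""
    return -90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0


# Hypothetical record using the example values from the table.
record = {
    "collected_at": format_timestamp(datetime(2024, 1, 1, 0, 0, 0)),
    "language": "en",  # ISO 639-1 code for English
    "latitude": 42.33,
    "longitude": -98.1449,
}

print(record["collected_at"])
print(is_valid_coordinate(record["latitude"], record["longitude"]))
```

Applying checks like these at the point of collection is one way to keep values consistent before they reach analysis or publication.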
Data Processing covers any set of structured activities resulting in the alteration or integration of data. Data processing can result in data ready for analysis or generate output such as graphs and summary reports. Documenting the steps for how data are processed is essential for reproducibility and improves transparency.
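One possible way to document processing steps is to record each one in a machine-readable log as it is performed. The sketch below is an illustration only: the `record_step` helper, the log fields, and the sample speed values are all made up for this example and do not represent a required format.

```python
import json
from datetime import datetime, timezone

processing_log = []


def record_step(description: str, rows_in: int, rows_out: int) -> None:
    """Append one processing step, with row counts and a UTC timestamp, to the log."""
    processing_log.append({
        "step": len(processing_log) + 1,
        "description": description,
        "rows_in": rows_in,
        "rows_out": rows_out,
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })


# Made-up sample data for illustration.
raw = [{"speed_mph": "55"}, {"speed_mph": ""}, {"speed_mph": "61"}]

# Step 1: drop rows with a missing value.
cleaned = [row for row in raw if row["speed_mph"] != ""]
record_step("Dropped rows with a missing speed_mph value", len(raw), len(cleaned))

# Step 2: convert miles per hour to kilometers per hour.
converted = [{"speed_kph": round(float(row["speed_mph"]) * 1.609344, 1)} for row in cleaned]
record_step("Converted speed from mph to km/h", len(cleaned), len(converted))

# The log can be saved alongside the data as supporting documentation.
log_text = json.dumps(processing_log, indent=2)
print(log_text)
```

A log like this can later be summarized in a README or methodology section so that others can follow how the published data was derived.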
During the data acquisition phase, it is essential to anonymize personally identifiable information (PII) before publishing any research findings. PII encompasses more than just names and social security numbers; it includes smaller, seemingly innocuous details that, when combined, can identify individuals. For example, while race, ethnicity, sex, and age may not immediately identify a person, when the data pool is small and precise enough, there is potential for reidentification. Establishing what data points need to be anonymized during the planning and acquiring data steps saves time down the road and ensures the safety of your participants.
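The reidentification risk described above can be roughly screened by counting how many records share each combination of indirect identifiers and flagging combinations held by only a few people. This is a simple group-size check (a technique often discussed under the name k-anonymity), not a complete anonymization method. The column names, the sample participants, and the threshold of 3 are assumptions made for illustration; an appropriate threshold depends on your project and applicable policy.

```python
from collections import Counter

# Assumed indirect identifiers and minimum group size, for illustration only.
QUASI_IDENTIFIERS = ("race", "sex", "age_group")
MIN_GROUP_SIZE = 3


def small_groups(records, keys=QUASI_IDENTIFIERS, min_size=MIN_GROUP_SIZE):
    """Return identifier combinations shared by fewer than min_size records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: count for combo, count in counts.items() if count < min_size}


# Made-up participant records.
participants = [
    {"race": "A", "sex": "F", "age_group": "30-39"},
    {"race": "A", "sex": "F", "age_group": "30-39"},
    {"race": "A", "sex": "F", "age_group": "30-39"},
    {"race": "B", "sex": "M", "age_group": "60-69"},
]

risky = small_groups(participants)
print(risky)
```

Combinations that are flagged may need to be generalized (for example, wider age bands), suppressed, or otherwise handled before the data is shared.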
Protecting Participant Privacy When Sharing Scientific Data
Responsible scientific data sharing practices promote both effective data stewardship and protection of human research participant privacy. Learn about how to protect the privacy of human research participants when sharing data with resources provided by the National Institutes of Health.
The analyze stage of the Science Data Lifecycle represents activities associated with the exploration and assessment of data, where hypotheses are tested, discoveries are made, and conclusions are drawn. Some example analysis activities are:
The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.
The Open Source Hardware Association (OSHWA) aims to foster technological knowledge and encourage research that is accessible, collaborative and respects user freedom. OSHWA’s primary activities include hosting the annual Open Hardware Summit and maintaining the Open Source Hardware certification, which allows the community to quickly identify and represent hardware that complies with the community definition of open source hardware.
Preservation involves actions and procedures designed to ensure long-term viability and accessibility of data. The common topics covered in response to preservation are:
It is important to create a robust data package, which includes the federally required DCAT-US metadata, detailed READMEs, Data Dictionaries, Codebooks, and other supporting documentation.
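As one illustration of preparing supporting documentation, the sketch below drafts a data dictionary skeleton from a CSV file's header row. The function name, the dictionary fields, and the sample CSV are hypothetical; the descriptions and units are left blank because only the researcher can supply them.

```python
import csv
import io


def data_dictionary_skeleton(csv_text: str) -> list[dict]:
    """Build one draft data dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Keep only non-empty values so missing cells can be counted.
        values = [row[name] for row in rows if row[name]]
        entries.append({
            "variable": name,
            "example_value": values[0] if values else "",
            "missing_count": len(rows) - len(values),
            "description": "",  # to be written by the researcher
            "units": "",        # to be written by the researcher
        })
    return entries


# Made-up sample data for illustration.
sample = (
    "station_id,observed_at,speed_kph\n"
    "S1,2024-01-01 00:00:00,88.5\n"
    "S2,2024-01-01 00:05:00,\n"
)

dictionary = data_dictionary_skeleton(sample)
for entry in dictionary:
    print(entry)
```

Starting from a generated skeleton helps ensure every variable in the dataset has an entry, while the meaning of each variable still has to be documented by hand.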
For data preservation it is always best to preserve in non-proprietary, open formats. These preferred formats are as follows:
| Text | Dataset | Images | Multimedia | Maps | Metadata | Collections |
|---|---|---|---|---|---|---|
| TXT | CSV | TIFF | WARC | TIFF | JSON | ZIP |
| TAV | | PNG | WMV | PNG | XML | |
| XML | | JPG | SWF | Shapefiles | | |
| WARC | | | WMA | ArcGIS Project Files | | |
| RTF | | | MPEG | | | |
| MD | | | | | | |
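Before depositing a data package, it can help to scan its file list for anything outside the preferred formats listed above. The sketch below is a rough check under stated assumptions: the `non_preferred` helper and the sample file names are hypothetical, and the extension set covers only table entries that map to a single unambiguous file extension, so the other entries are left out.

```python
from pathlib import Path

# Extensions taken from the preferred formats table. Entries without a single
# unambiguous file extension are omitted from this illustrative set.
PREFERRED_EXTENSIONS = {
    "txt", "csv", "tiff", "warc", "json", "zip", "png", "wmv",
    "xml", "jpg", "swf", "wma", "rtf", "mpeg", "md",
}


def non_preferred(file_names):
    """Return the file names whose extension is not in the preferred set."""
    return [
        name for name in file_names
        if Path(name).suffix.lower().lstrip(".") not in PREFERRED_EXTENSIONS
    ]


# Made-up file list for illustration.
flagged = non_preferred(["README.md", "survey.csv", "survey.xlsx", "site_photo.TIFF"])
print(flagged)
```

Flagged files are candidates for conversion to an open format, or for an explanation in the README if conversion is not possible.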
At this stage, it is also important to finalize which repository you will be using to store your research and ensure preservation. When choosing a repository, it is important to consider the following elements:
These considerations ensure your research will be preserved, traceable, and findable by your peers, your supervisors, your broader field, and the public. The Department of Transportation has a list of all DOT approved repositories, which fulfill the above requirements. Learn more about publication in the next section, "Publish".
The ability to prepare, release, and share, or disseminate, quality data to the public and to other agencies is an important part of the lifecycle process.
Data sharing benefits the researcher, research sponsors, data repositories, the scientific community, and the public. It encourages more connection and collaboration between scientists, and better science leads to better decision making.
Additional things to consider when sharing data are the use of Persistent Identifiers and Data Citations. Further, assembling your data into a robust Data Package is a crucial step in submitting your research for publication. Please consult all of these guides for more information.
Desirable Characteristics of a Research Repository:
For more information, please consult Desirable Characteristics of Data Repositories for Federally Funded Research and our Publishing Data LibGuide to learn how to submit a new repository for consideration.