Submission Information Package (SIP): Items that have been submitted by the depositor.
Archival Information Package (AIP): A package that contains data that will be stored within a digital archive.
Dissemination Information Package (DIP): A package created from the Archival Information Package (AIP) to distribute digital content to users.
Best data file format for long-term preservation of tabular data: CSV (.csv)
Best Practice Tip: For data to be machine-readable and interoperable, a file should have only two types of rows: a single header row, followed by rows of data.
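One way to check this layout before depositing a file is sketched below in Python. It is a minimal illustration, not an official validator: the function name check_single_header and the sample rows are made up for this example, and the checks are limited to blank or duplicate header names, blank rows, and rows whose cell count differs from the header.

```python
import csv
import io


def check_single_header(text):
    """Return a list of problems found in CSV text; an empty list means the layout looks clean."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header row has duplicate column names")
    # Every row after the first should be data with one cell per header column.
    for number, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number} is blank")
        elif len(row) != len(header):
            problems.append(f"row {number} has {len(row)} cells, expected {len(header)}")
    return problems


# Hypothetical sample data: the second file puts a title line above the header.
good = "operator,state,vessels\nExample Ferry Co,WA,3\nSample Lines,NY,5\n"
bad = "Ferry census 2016\noperator,state,vessels\nExample Ferry Co,WA,3\n"

print(check_single_header(good))
print(check_single_header(bad))
```

The title line in the second sample is the kind of context that belongs in a README rather than in the data file.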
The purpose of the README document is to provide the contextual information that could not be included in the data file without breaking its machine-readable structure.
Best Practice Tip: So that you and future users can tell which of the dozens of “README.txt” files on your desktop go with which datasets, use clarifying, human-readable file names, such as:
bts_osp_national_census_ferry_operators_2016_README_2017_10_26.txt
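A file name like the one above can be generated consistently with a small helper. The Python sketch below assumes the pattern is a lowercase dataset name with underscores, then _README_, then the date the README was written as YYYY_MM_DD. That pattern is inferred from the single example, and the function name readme_filename is hypothetical.

```python
import re
from datetime import date


def readme_filename(dataset_name, written_on):
    """Build a README file name from a dataset name and the date the README was written."""
    # Lowercase the name and collapse each run of other characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", dataset_name.lower()).strip("_")
    return f"{slug}_README_{written_on:%Y_%m_%d}.txt"


print(readme_filename("BTS OSP National Census Ferry Operators 2016", date(2017, 10, 26)))
```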
Data dictionaries store and communicate metadata about data in a database, a system, or data used by applications. Data dictionary contents can vary but typically include some or all of the following:
Data Dictionaries are useful for a number of reasons.
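A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The Python sketch below is one possible starting point, not a prescribed format: the entry fields (column, type, example, description), the type labels, and the sample rows are all assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'number', or 'text' from the non-empty values in a column."""
    filled = [v for v in values if v.strip()]
    if not filled:
        return "empty"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in filled:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_dictionary(text):
    """Return one draft entry per column; descriptions are left blank for a person to write."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Short rows yield None for missing cells, so treat those as empty strings.
        values = [(row[name] or "") for row in rows]
        example = next((v for v in values if v.strip()), "")
        entries.append({"column": name, "type": infer_type(values),
                        "example": example, "description": ""})
    return entries


# Hypothetical sample data.
sample = "operator,state,vessels,avg_trip_miles\nExample Ferry Co,WA,3,4.5\nSample Lines,NY,5,12\n"

for entry in draft_dictionary(sample):
    print(entry)
```

The inferred types are only guesses from the values present, so units, allowed values, and definitions still need to be written by someone who knows the data.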
The DCAT-US Metadata Scheme (.json) is a machine-readable format that is required by both DOT's open data policy and data.gov.
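As a rough illustration of what such a record looks like, the Python sketch below builds a minimal dataset entry and reports any required field that is missing. The list of required fields reflects the DCAT-US v1.1 schema as understood here, and federal agencies have further required fields, so confirm against the published schema before relying on it. Every value in the record, including the identifier and contact details, is a placeholder.

```python
import json

# Fields assumed to be required for every dataset in DCAT-US v1.1;
# verify against the published schema before use.
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified",
    "publisher", "contactPoint", "identifier", "accessLevel",
)


def missing_fields(record):
    """Return the required field names that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values for illustration only.
record = {
    "title": "Example Ferry Operator Census",
    "description": "Placeholder description of an example tabular dataset.",
    "keyword": ["ferries", "census"],
    "modified": "2017-10-26",
    "publisher": {"name": "Example Agency"},
    "contactPoint": {"fn": "Example Contact", "hasEmail": "mailto:contact@example.gov"},
    "identifier": "example-ferry-census-2016",
    "accessLevel": "public",
}

print(json.dumps(record, indent=2))
print(missing_fields(record))
```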
A researcher should also include any other metadata schemes that are relevant to the data or field of study.
Metadata is an inextricable part of managing records or digital information in any format. The use of metadata supports methods to identify, authenticate, describe, locate, and manage resources in a precise and consistent way. This precision and consistency in turn allows information providers to meet research, business, accountability, and archival requirements. Additionally, ensuring complete metadata, and updating as needed, is critical to maintaining data quality.
There are many different kinds of metadata. In the world of digital objects, metadata is usually divided into 3 to 5 categories:
The Transportation Research Thesaurus (TRT) is a tool to provide a common and consistent language between producers and users of transportation information. The TRT covers all modes and aspects of transportation.
The TRT is a controlled vocabulary that NTL uses when cataloging and creating metadata for both data and reports.
To learn more about the TRT and explore the thesaurus, visit https://trt.trb.org/.
A Data Management Plan (DMP) is a knowledge management document for the data lifecycle. The DMP should be a living document that is created before the project starts and updated throughout, so that it accurately reflects and tracks the research as it progresses through the data lifecycle. For more information, see the DMP-specific page within this LibGuide.
Include code books, scripts used during analysis, auxiliary tables, and any other supporting files that were created or used to collect, process, clean, or analyze the data.
Best Practice Tip: Be as complete as possible. The goals of a robust data package are to document your processes fully, so that a naïve user can replicate your results, and to convey the full context of the data, so that they can decide intelligently whether your data meets their reuse need.