Planning for public access throughout your research will help ensure that the data archiving process goes smoothly at the end. Using your data management plan as your guide, choose file formats and naming conventions that will make it easy to organize your data and share it with others. As you begin work, take the time to document and describe your data parameters, and use consistent formatting throughout your files. Finally, keep your data safe with a backup system. As you work:
Using platform-independent and nonproprietary formats whenever practical will maximize the future usefulness of your data. Store tabular data in plain-text (ASCII) formats such as .txt or .csv (comma-separated values).
Some preferred file formats for different content types include:
Sources: Stanford University Libraries, National Transportation Library, USGS. Image credit: USGS.
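As a minimal sketch of storing tabular data as plain-text CSV, Python's standard csv module can write and read such files. The file name, field names, and values below are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical records; field names and values are illustrative only.
records = [
    {"site_id": "SITE01", "date": "2021-06-01", "temp_c": 18.4},
    {"site_id": "SITE02", "date": "2021-06-01", "temp_c": 17.9},
]

def write_csv(path, rows):
    """Write a list of dicts to a plain-text, comma-separated file with a header row."""
    # newline="" lets the csv module control line endings itself;
    # encoding="ascii" raises an error if any value is not plain ASCII.
    with open(path, "w", newline="", encoding="ascii") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def read_csv(path):
    """Read the file back as a list of dicts; every value comes back as a string."""
    with open(path, newline="", encoding="ascii") as f:
        return list(csv.DictReader(f))

out = Path(tempfile.gettempdir()) / "example_temps.csv"
write_csv(out, records)
for row in read_csv(out):
    print(row)
```

Because CSV stores every value as text, numeric columns such as temp_c must be converted again when the file is read back.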
Develop naming conventions and a folder hierarchy early. File names should:
Sample file names:
► Learn more: Data Best Practices and Case Studies, Stanford University Libraries, last updated November 2021.
Sources: Stanford University Libraries; University of California, Davis; USGS. Image credit: USGS.
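A script can apply a naming convention the same way every time. This minimal sketch assumes a few commonly recommended rules for data files: no spaces or special characters, ISO 8601 dates (YYYYMMDD) so names sort chronologically, and zero-padded version numbers. The function name make_file_name and the PROJECT_SITE_DATE_vNN pattern are hypothetical; adapt them to the conventions in your data management plan.

```python
import re
from datetime import date

def make_file_name(project, site, when, version, ext):
    """Build a name like 'PROJECT_SITE_YYYYMMDD_vNN.ext' (hypothetical pattern)."""
    def clean(part):
        # Replace runs of anything other than letters, digits, or hyphens with a
        # single hyphen, then trim hyphens from the ends.
        part = re.sub(r"[^A-Za-z0-9-]+", "-", part.strip())
        return re.sub(r"-{2,}", "-", part).strip("-")
    stamp = when.strftime("%Y%m%d")  # ISO 8601 basic date format
    return f"{clean(project)}_{clean(site)}_{stamp}_v{version:02d}.{ext.lstrip('.')}"

print(make_file_name("Lake Survey", "Site #3", date(2021, 11, 5), 2, "csv"))
# Lake-Survey_Site-3_20211105_v02.csv
```

Underscores separate the fields and hyphens join words within a field, so the parts of a name can be split apart again by a script.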
Create data documentation (such as a parameter table) as you begin work rather than waiting until your project is complete.
Sample Parameter Table (USGS)
Source: USGS. Image credit: USGS.
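A parameter table can also be kept in machine-readable form alongside the data. This minimal sketch uses hypothetical column names (parameter, description, units, missing_value) rather than the exact layout of the USGS sample. It writes the table as CSV and flags any data column that has not been documented yet.

```python
import csv
import io

# Hypothetical parameter table; columns are illustrative, not the USGS layout.
PARAMETERS = [
    {"parameter": "site_id", "description": "Monitoring site identifier", "units": "none", "missing_value": ""},
    {"parameter": "date", "description": "Sample date (YYYY-MM-DD)", "units": "none", "missing_value": ""},
    {"parameter": "temp_c", "description": "Water temperature", "units": "degrees Celsius", "missing_value": "-9999"},
]

def parameter_table_csv(params):
    """Return the parameter table as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["parameter", "description", "units", "missing_value"])
    writer.writeheader()
    writer.writerows(params)
    return buf.getvalue()

def undocumented_columns(data_header, params):
    """Return data columns that have no entry in the parameter table."""
    documented = {p["parameter"] for p in params}
    return [col for col in data_header if col not in documented]

print(parameter_table_csv(PARAMETERS))
print(undocumented_columns(["site_id", "date", "temp_c", "ph"], PARAMETERS))  # ['ph']
```

Running a check like this each time a column is added keeps the documentation in step with the data.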
Keep data organization consistent throughout your files.
An application like OpenRefine (formerly Google Refine) can help you locate and clean up inconsistent data.
► Learn more: Manage Spreadsheets, Stanford University Libraries, last updated November 2021.
Sources: Stanford University Libraries; University of California, Davis; USGS. Image credit: USGS.
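OpenRefine finds inconsistent spellings by clustering values. One of the methods it documents is key collision with a "fingerprint" key. The sketch below is a simplified, stand-alone Python approximation of that idea, not OpenRefine's own code. It lowercases each value, strips accents and punctuation, and sorts the words, so variants such as "Clear Creek" and "Creek, Clear" end up with the same key. The site names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified fingerprint key: lowercase, strip accents and punctuation,
    then sort and de-duplicate whitespace-separated tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accent marks
    text = re.sub(r"[^\w\s]", "", text)                     # drop punctuation
    return " ".join(sorted(set(text.split())))

def find_inconsistent(values):
    """Group distinct spellings that share a fingerprint key."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}

sites = ["Clear Creek", "clear creek ", "Creek, Clear", "Mill Pond", "Mill Pond"]
print(find_inconsistent(sites))
# {'clear creek': ['Clear Creek', 'Creek, Clear', 'clear creek ']}
```

Identical repeats such as "Mill Pond" are not reported, because only groups with more than one distinct spelling are returned.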
To ensure data integrity, perform frequent checks on your data to identify any errors.
Source: USGS. Image credit: USGS.
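Many routine checks can be automated. This minimal sketch tests each record for a missing identifier, an invalid date, and an implausible value. The column names, the YYYY-MM-DD date format, and the -5 to 40 °C range for water temperature are all assumptions chosen for illustration. Real checks should come from your parameter table and your knowledge of the data.

```python
from datetime import datetime

def check_row(row, line_no):
    """Return a list of problems found in one record (hypothetical rules)."""
    problems = []
    if not row.get("site_id"):
        problems.append(f"line {line_no}: missing site_id")
    try:
        datetime.strptime(row.get("date", ""), "%Y-%m-%d")
    except ValueError:
        problems.append(f"line {line_no}: bad date {row.get('date')!r}")
    try:
        temp = float(row.get("temp_c", ""))
        if not -5.0 <= temp <= 40.0:  # assumed plausible range, degrees Celsius
            problems.append(f"line {line_no}: temp_c {temp} out of range")
    except ValueError:
        problems.append(f"line {line_no}: temp_c is not a number")
    return problems

rows = [
    {"site_id": "SITE01", "date": "2021-06-01", "temp_c": "18.4"},
    {"site_id": "", "date": "2021-13-01", "temp_c": "85"},
]
for i, row in enumerate(rows, start=2):  # start=2 because line 1 is the header
    for problem in check_row(row, i):
        print(problem)
```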
To preserve your data and its integrity, save a read-only copy of your raw data files with no transformations, interpolation or analyses. Use a scripting language such as R, SAS or MATLAB to process the data, and write the processed data to a separate file in a separate directory. These scripts:
Source: USGS.
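The guidance names R, SAS or MATLAB; the sketch below uses Python instead to show the same pattern. It copies the raw file into its own directory, removes write permission from the copy, and has the processing function read that copy and write results to a separate directory. The directory names, the temp_c column, and the Fahrenheit conversion are hypothetical. On Windows, os.chmod can only toggle the read-only flag, and file permissions do not stop a user with administrative rights.

```python
import csv
import os
import shutil
import stat
import tempfile
from pathlib import Path

def archive_raw(src, raw_dir):
    """Copy a raw data file into raw_dir and make the copy read-only."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    dest = raw_dir / Path(src).name
    shutil.copy2(src, dest)
    # Read permission for owner, group, and others; no write bits.
    os.chmod(dest, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest

def process(raw_path, out_dir):
    """Add a derived column and write the result to out_dir.
    The raw file is only ever opened for reading."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}_processed.csv"
    with open(raw_path, newline="") as f_in, open(out_path, "w", newline="") as f_out:
        reader = csv.DictReader(f_in)
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames + ["temp_f"])
        writer.writeheader()
        for row in reader:
            row["temp_f"] = f"{float(row['temp_c']) * 9 / 5 + 32:.1f}"
            writer.writerow(row)
    return out_path

work = Path(tempfile.mkdtemp())
src = work / "site01_20210601.csv"
src.write_text("site_id,date,temp_c\nSITE01,2021-06-01,18.4\n")
raw = archive_raw(src, work / "raw")
result = process(raw, work / "processed")
print(result.read_text())
```

Because every derived value is produced by the script, the processed file can be regenerated from the untouched raw copy at any time.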
As you work, make backup copies of your data often.
To ensure that you can recover from data loss, periodically test your ability to restore your data from those backups.
Check with the Information Technology (IT) department in your organization for advice on the best backup systems for your needs.
Sources: Iowa State University Library, Stanford University Libraries, USGS.
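One way to make a restore test routine is to record a SHA-256 checksum for each file when it is backed up, then periodically restore the backup into a scratch directory and confirm the checksums still match. The function names and directory layout in this sketch are hypothetical. A copy on the same disk is not a real backup; in practice the backup location would be a separate device or service, such as one your IT department recommends.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Checksum a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def back_up(src_dir, backup_dir):
    """Copy every file under src_dir to backup_dir; return {relative path: checksum}."""
    src_dir, backup_dir = Path(src_dir), Path(backup_dir)
    manifest = {}
    for path in src_dir.rglob("*"):
        if path.is_file():
            rel = path.relative_to(src_dir)
            dest = backup_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            manifest[str(rel)] = sha256(path)
    return manifest

def verify_restore(backup_dir, manifest):
    """Restore the backup into a scratch directory and compare checksums.
    Returns the files that are missing or differ (empty list means success)."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup_dir, restored)
        failures = []
        for rel, expected in manifest.items():
            target = restored / rel
            if not target.is_file() or sha256(target) != expected:
                failures.append(rel)
        return failures

work = Path(tempfile.mkdtemp())
data = work / "data"
data.mkdir()
(data / "site01.csv").write_text("site_id,temp_c\nSITE01,18.4\n")
manifest = back_up(data, work / "backup")
print(verify_restore(work / "backup", manifest))  # [] means every file restored intact
```

Keeping the manifest with the backup also lets you detect files that were silently corrupted in storage, not just ones that fail to copy.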