
Transportation Library Quick Guide: Digitization: Digitization Process

Establishing standards and processes will help ensure that digitization projects fulfill their objectives. The Iowa DOT Library (now closed) developed a set of digitization standards and a digitization workflow to guide its efforts to produce digitized items that could be easily accessed, read and preserved. Former Iowa DOT Librarian Leighton Christiansen—now librarian/data curator for NTL—and colleagues detailed these frameworks in a 2016 conference paper (see Digitization Process below).

Digitization Process

To produce digitized items that are uniformly accessible, document standards should address:

  • Scanning resolution (based on available equipment and desired readability).
  • Layout and magnification settings.
  • Creation of searchable text (through Adobe Acrobat’s optical character recognition (OCR) technology).
  • File naming conventions (filenames should be human-readable).
  • Metadata fields (title, author, subject and keywords).
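As a minimal sketch of how the last two standards might be applied together, the Python below stores the four metadata fields listed above and derives a human-readable filename from them. The `DocumentMetadata` class, the `human_readable_filename` function, and the `surname_year_title-words.pdf` pattern are all hypothetical choices made for illustration; the guide does not prescribe a specific naming convention.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    """The four metadata fields named in the standards list."""
    title: str
    author: str
    subject: str
    keywords: list[str] = field(default_factory=list)


def human_readable_filename(meta: DocumentMetadata, year: int) -> str:
    """Build a filename from the author's surname, a year, and title words.

    The pattern is an assumed convention, not one taken from the guide.
    Accepts authors written as "Surname, Given" or "Given Surname".
    """
    # Text before the first comma, then its last word, is treated as the surname.
    surname = meta.author.split(",")[0].strip().split()[-1]
    # Keep the first five alphanumeric words of the title to limit length.
    words = re.findall(r"[A-Za-z0-9]+", meta.title)[:5]
    slug = "-".join(word.lower() for word in words)
    return f"{surname.lower()}_{year}_{slug}.pdf"


# Hypothetical report used only to show the call.
example = DocumentMetadata(
    title="Bridge Deck Overlay Study: Final Report",
    author="Smith, Jane",
    subject="Bridges",
    keywords=["overlays", "bridge decks"],
)
print(human_readable_filename(example, 1987))
```

A real convention would also need a rule for documents with no personal author or with duplicate surname, year and title combinations; those cases are left out of this sketch.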

Establishing a standard workflow will help libraries process digitization projects efficiently and accurately estimate the resources needed for future projects. Sample steps may include:

  1. Select materials for scanning.
  2. Assess materials for scanning exceptions, such as damaged pages, oversized or foldout pages, or transparencies. Pages with these exceptions should be scanned by hand rather than in bulk.
  3. Unbind the items. If possible, cut reports out of bindings for quicker scanning; destructive digitization can be justified if the item is held elsewhere and if the scanning improves general access to the information.
  4. Prepare for bulk scanning. Schedule in-house or contract bulk scanning for efficient, consistent results. (Sample scanner settings: Black and white; 400 dots per inch (DPI) scan resolution; two-sided scanning.)
  5. Bulk scan items. Watch for pages that will need to be rescanned, such as very dark or very light scans, folded pages and pictures (rescan using grayscale setting for higher quality).
  6. Convert scanned images to PDF files. Archive the master scan images for later use.
  7. Process each PDF through Acrobat’s OCR function. This step will enable full-text searching.
  8. Name the file, add metadata and set document properties. Refer to the NTL Digitization Resource Guide for resources on metadata standards and file naming conventions.
  9. Submit the final document to an open access digital repository. If your agency does not have a preferred repository, refer to the NTL list of data repositories for suggestions.
  10. Catalog the final document. Add the repository link to the existing OCLC record for the print item or derive a new record for the electronic item based on the existing physical item record.
  11. Send document information to Transport Research International Documentation (TRID) and other transportation libraries (coordinate in advance for large document quantities).
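The routing decisions in steps 2, 4 and 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any library's actual tooling: the `Page` class, the `condition` labels and the `plan_scan` function are hypothetical, while the exception types and the sample bulk settings (black and white, 400 DPI, two-sided) come from the steps above.

```python
from dataclasses import dataclass

# Exception types named in step 2; these pages are scanned by hand.
MANUAL_EXCEPTIONS = {"damaged", "oversized", "foldout", "transparency"}

# Sample bulk scanner settings from step 4.
BULK_SETTINGS = {"mode": "black_and_white", "dpi": 400, "two_sided": True}


@dataclass
class Page:
    number: int
    condition: str = "normal"  # hypothetical label assigned during assessment
    has_picture: bool = False


def plan_scan(page: Page) -> dict:
    """Return a scan plan for one page."""
    if page.condition in MANUAL_EXCEPTIONS:
        return {"method": "manual"}
    # Step 5 suggests rescanning pictures with the grayscale setting.
    return {"method": "bulk", **BULK_SETTINGS, "rescan_grayscale": page.has_picture}


# Hypothetical three-page report.
pages = [Page(1), Page(2, condition="foldout"), Page(3, has_picture=True)]
for page in pages:
    print(page.number, plan_scan(page))
```

Counting the pages routed to `manual` during assessment also gives the number of exception pages, which is one of the inputs to a cost estimate.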

Accurately projecting the staff time and costs required for a digitization project can be challenging. Using a digitization cost calculator can help libraries create estimates that account for a range of considerations, including:

  • Staff experience with all aspects of digitizing, including scanning, post-processing for text recognition (OCR), writing metadata, cataloging and uploading to a repository.
  • Scanning equipment speed and capabilities.
  • Characteristics of materials to be digitized, including physical condition and number of exception pages requiring manual scanning.

The digitization cost calculator mentioned above includes data on the speeds of different types of equipment.
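To show how these considerations combine, here is a rough Python sketch of a staff-time estimate. It is not the cost calculator referred to above, and every parameter name and input value is a hypothetical placeholder; real figures would come from a calculator or from a library's own timing records.

```python
def estimate_hours(
    total_pages: int,
    exception_pages: int,
    documents: int,
    bulk_pages_per_minute: float,
    manual_minutes_per_page: float,
    post_minutes_per_document: float,
    experience_factor: float = 1.0,
) -> float:
    """Estimate staff hours for a digitization project.

    experience_factor is an assumed multiplier: values above 1.0 model
    less experienced staff taking longer on every task.
    """
    if exception_pages > total_pages:
        raise ValueError("exception_pages cannot exceed total_pages")
    bulk_pages = total_pages - exception_pages
    # Bulk scanning time depends on equipment speed.
    bulk_minutes = bulk_pages / bulk_pages_per_minute
    # Exception pages are scanned by hand, one at a time.
    manual_minutes = exception_pages * manual_minutes_per_page
    # OCR, metadata, cataloging and upload are counted per document.
    post_minutes = documents * post_minutes_per_document
    total_minutes = (bulk_minutes + manual_minutes + post_minutes) * experience_factor
    return total_minutes / 60


# Placeholder inputs chosen only to exercise the function.
hours = estimate_hours(
    total_pages=1000,
    exception_pages=40,
    documents=10,
    bulk_pages_per_minute=20,
    manual_minutes_per_page=2,
    post_minutes_per_document=30,
)
print(f"Estimated staff time: {hours:.1f} hours")
```

Keeping bulk scanning, manual scanning and post-processing as separate terms makes it easy to see which part of a project dominates the estimate.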

Example  

The Iowa DOT Library created detailed estimates of the time required to digitize printed research reports and documented them in a 2016 conference paper.

Outsourcing

In-house digitizing may not be feasible for some libraries, especially for large quantities of material. Outsourcing the task to an experienced firm may be a more cost-effective option.

The nonprofit Northeast Document Conservation Center (NEDCC), founded in 1973 as “the first independent conservation laboratory in the United States to specialize exclusively in the conservation and preservation of paper-based collections,” maintains a website that offers a wealth of information about digital preservation, including outsourcing digitization. Its guidance compares in-house and outsourced digitization and describes the advantages and disadvantages of each.

NEDCC notes that organizations can outsource specific elements of the digitization process, including:

  • Original materials preparation.
  • Digitization (conversion).
  • Bibliographic records and metadata creation and/or update.
  • Additional file processing, including OCR processing of documents.
  • Printing and possibly binding analog duplicates of materials.
  • Storage and archiving.

For those opting to outsource, NEDCC provides a detailed discussion of vendor relations that addresses locating potential vendors, preparing a request for information and request for proposal (including samples), evaluating responses from vendors, developing a contract, and working and communicating with vendors.

Example

The University of Nevada, Las Vegas Library has developed a webinar examining the pros and cons of in-house digitizing and outsourcing. Additional resources on costs, staffing and equipment are available in the NTL Digitization Resource Guide.
