Storage, Preservation, & Discoverability

Storage and Preservation

The UW-Madison Libraries’ Digital Preservation Framework states a commitment to preserving the Libraries’ digital assets, linking digital asset management and preservation to the strategic plans of both the Libraries and the campus. Accordingly, digital master files, data, and metadata for the project will be maintained in a redundant, scalable storage infrastructure.

All working and access files will be stored and managed on the centralized campus Isilon NAS (or equivalent successor hardware), located in campus data centers operated by the Division of Information Technology. Backups are made with Commvault Backup and Recovery, which provides disk-to-disk backup to a second campus location on different storage hardware (IBM S3). Daily backups are retained for two weeks, weekly backups for three months, and quarterly backups for one year. The Libraries staff responsible for our storage and preservation infrastructure have, on average, more than 20 years of experience in library technology.
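The retention schedule above can be read as a simple rule: a backup is kept while its age is within the window for its kind. The Python sketch below is illustrative only and is not Commvault configuration. It assumes that "three months" means 90 days and "one year" means 365 days, and the names `RETENTION` and `is_retained` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical encoding of the retention schedule described above.
# Assumption: "three months" ~ 90 days and "one year" ~ 365 days.
RETENTION = {
    "daily": timedelta(days=14),
    "weekly": timedelta(days=90),
    "quarterly": timedelta(days=365),
}


def is_retained(kind: str, taken: date, today: date) -> bool:
    """Return True if a backup of this kind taken on `taken` is still kept on `today`."""
    if kind not in RETENTION:
        raise ValueError(f"unknown backup kind: {kind!r}")
    age = today - taken
    # A backup is kept from the day it is taken until its retention window expires.
    return timedelta(0) <= age <= RETENTION[kind]
```

For example, on 30 June 2024 a daily backup from 20 June is still retained, but one from 1 June is not. A weekly backup from 1 June remains available.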

All raw master files, derivative masters, and metadata will be ingested into the Libraries’ Fedora Commons repository, which will have no direct public access. Primary storage for the Fedora repository will be the centralized campus Isilon infrastructure described in the previous paragraph, and all files will also be replicated to a local IBM S3 cloud storage system spread across campus data centers. This local IBM S3 cloud calculates checksums for all content and continually runs verification processes, automatically repairing damaged files from its redundant copies.

The Libraries will use the Amazon Web Services (AWS) Glacier service as the remote disaster-recovery storage platform. AWS stores redundant copies of assets in geographically separate locations, with built-in corruption checking and repair. Content stored in AWS will be retrieved only in the event of catastrophic loss of the source data.

Curatorial actions will be encoded and preserved as PREMIS Events in an administrative datastream for each object. Before ingest, technical metadata appropriate to each master file’s MIME type is generated and then ingested as a datastream along with the object. Fedora Commons also calculates and stores checksums for all ingested datastreams, and these checksums will be monitored throughout each object’s lifecycle. Staff from both the Library Technology Group and the Shared Development Group (which includes both library and campus IT staff) will be responsible for preservation monitoring and remedial actions. All campus IT staff involved in the digital collections and preservation platform hold positions funded by the Libraries and are therefore dedicated entirely to library applications.
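Checksum monitoring of this kind amounts to recomputing a file's digest, comparing it with the value recorded at ingest, and recording the outcome as a PREMIS Event. The Python sketch below illustrates that idea under stated assumptions. It uses SHA-256, although the digest algorithm the repository actually uses may differ. The function names `sha256_of` and `fixity_check_event` are hypothetical. It also builds only a minimal PREMIS 3 `event` element, not the Libraries’ actual administrative datastream.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large master files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a PREMIS-namespaced child element, optionally with text."""
    el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        el.text = text
    return el


def fixity_check_event(path: Path, expected: str) -> ET.Element:
    """Recompute the checksum and record the result as a minimal PREMIS 3 <event>."""
    outcome = "pass" if sha256_of(path) == expected.lower() else "fail"
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _sub(event, "eventIdentifier")
    _sub(ident, "eventIdentifierType", "UUID")
    _sub(ident, "eventIdentifierValue", str(uuid.uuid4()))
    _sub(event, "eventType", "fixity check")
    _sub(event, "eventDateTime", datetime.now(timezone.utc).isoformat())
    info = _sub(event, "eventOutcomeInformation")
    _sub(info, "eventOutcome", outcome)
    return event
```

To serialize the resulting event, for example before appending it to an object's administrative datastream, a caller could use `ET.tostring(event, encoding="unicode")`. A real implementation would also link the event to the object and agent with `linkingObjectIdentifier` and `linkingAgentIdentifier`, which this sketch omits.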

Discoverability

The University of Wisconsin Digital Collections (UWDC) has always been dedicated to providing the greatest possible access to, and use of, its content. In addition to the 600,000+ volumes UW-Madison has made available through the Google Books project (subsequently preserved in HathiTrust), the UWDC hosts over 18,000 digitized volumes on its own infrastructure. Because the UWDC has maintained its open access platform since 2000, its content is highly discoverable through standard web search tools. A Google search for several randomly selected titles (“Die grosse Not: Sammelbroschüre”, “A comparative view of the human and animal frame”, “Detail des nouveaux jardins a la mode”) shows that UWDC content routinely appears at or near the top of the results. Beyond its own platform, the UWDC also provides access to content through the International Image Interoperability Framework (IIIF), and to its metadata through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
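For readers unfamiliar with OAI-PMH, a harvester works by issuing HTTP GET requests such as `verb=ListRecords&metadataPrefix=oai_dc`. It follows each `resumptionToken` until it receives an empty token, which marks the end of the list. The Python sketch below builds such a request and parses one response page. `BASE_URL` is a placeholder, not the UWDC endpoint. The inline `SAMPLE` response is invented for illustration, so the code runs without contacting a live server.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Placeholder endpoint (assumption); the actual UWDC OAI-PMH base URL is not given here.
BASE_URL = "https://example.org/oai"


def list_records_url(resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; per OAI-PMH, resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_page(xml_text: str) -> Tuple[List[Tuple[str, List[str]]], Optional[str]]:
    """Return (identifier, dc:title list) pairs and the resumption token, or None at the end."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier", default="")
        titles = [t.text or "" for t in rec.iter(f"{DC}title")]
        records.append((ident, titles))
    token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


# Invented sample response page, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A full harvester would loop: fetch `list_records_url()`, call `parse_page` on the response, and request `list_records_url(token)` until `parse_page` returns `None` for the token.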