Wrangling the Past: Using Python to Prepare Legacy Digital Collections and ETDs for Preservation

This presentation will explore how Python scripts, both large and small, can be effective tools for normalizing the large amounts of data that make up disparate digital collections and for preparing them for digital preservation. After setting up a digital preservation program, our institution faced a backlog of 33 digital collections, totaling 3.7 TB of data generated over the course of 15 years, as well as 17 years’ worth of electronic theses and dissertations (ETDs). We set out to appraise, organize, and package each collection for ingest into our digital preservation repositories with the aid of Python scripts. The presenter will detail how digital preservation principles dictated scripting decisions and will discuss considerations for implementing similar strategies at other institutions, including building workflows and troubleshooting scripts. The presentation will end with lessons learned from the completion of this large-scale project.


1:30 PM
15 minutes