Automating Metadata Hygiene to Improve Economic Research Discoverability
Fed in Print is an application that indexes papers, publications, and speeches from the twelve Federal Reserve Banks and the Board of Governors. By supplying metadata to larger discovery services, including Research Papers in Economics (RePEc), Fed in Print drives the discoverability of Federal Reserve System research. Because the application indexes roughly fifty thousand resources to date, link rot is a stubborn problem: content migrations cause links to break. This talk will report on a long-term project still in progress and share tools and considerations that may help others design similar projects. The recently deployed Fed in Print API was used to test more than fifty thousand file URLs indexed in the application for 404 ("resource not found") errors and other discoverability problems. The broken URLs have been divided among collaborators throughout the Federal Reserve System. The talk will cover not only the tools involved but also the project's complications and benefits. Tools include Python, Postman, Excel, and Teams. Complications include coding challenges and limits on individual capacity. Beyond greater discoverability, benefits include detecting bugs in the API and, ideally, starting proactive conversations between librarians and the content professionals who publish material indexed in the application, ahead of future migrations.
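The URL-testing step can be illustrated in Python, one of the tools named above. What follows is a minimal, hypothetical sketch, not the project's actual script. It assumes the file URLs have already been exported from the Fed in Print API into a plain list (the API's endpoints and response format are not shown here), and the names `fetch_status`, `find_broken`, and `to_csv` are illustrative. It sends a HEAD request to each URL, retries with GET when a server rejects HEAD, and records any URL that returns an HTTP error status (4xx or 5xx, including 404) or no response at all.

```python
import csv
import http.client
import io
import urllib.error
import urllib.request
from typing import Callable, Iterable

# Hypothetical User-Agent string; identify your own checker to site owners.
USER_AGENT = "link-check-sketch/0.1"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url, or 0 if no HTTP response was received.

    urllib follows redirects by default, so a moved link that lands on a
    missing page is reported with that page's status (e.g. 404).
    """
    for method in ("HEAD", "GET"):
        try:
            req = urllib.request.Request(url, method=method, headers={"User-Agent": USER_AGENT})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            if method == "HEAD" and err.code in (405, 501):
                continue  # server refuses HEAD; retry the same URL with GET
            return err.code
        except (OSError, ValueError, http.client.HTTPException):
            # DNS failure, refused connection, timeout, or malformed URL
            return 0
    return 0


def find_broken(
    urls: Iterable[str], fetch: Callable[[str], int] = fetch_status
) -> list[tuple[str, int]]:
    """Return (url, status) pairs for URLs that errored or did not respond."""
    broken = []
    for url in urls:
        status = fetch(url)
        if status == 0 or status >= 400:
            broken.append((url, status))
    return broken


def to_csv(rows: Iterable[tuple[str, int]]) -> str:
    """Render (url, status) rows as CSV text that opens directly in Excel."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["url", "status"])
    writer.writerows(rows)
    return buf.getvalue()
```

For example, with URLs saved one per line in a hypothetical `urls.txt`, `to_csv(find_broken(line.strip() for line in open("urls.txt") if line.strip()))` produces a spreadsheet-ready list that could then be split among collaborators. The `fetch` parameter is injectable so the filtering logic can be tested without network access. For a run of fifty thousand URLs, adding a short delay between requests to the same host would be a courteous refinement.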