
Automating FAST subject heading assignment in the repository - some initial challenges

In this experiment with Python and FAST subject headings, an undergraduate student at Columbia University developed a program that takes PDF files as input, extracts their OCR text, identifies frequent terms, and queries those terms against FAST subject headings. Dissertations from the 1950s served as the test set. The program exports the resulting matches as a CSV file that can be used in cataloging workflows for Academic Commons, Columbia University's institutional repository.
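The post does not publish the program's code, so what follows is only a minimal Python sketch of the middle of such a pipeline. It assumes the OCR text has already been extracted from the PDF. It counts frequent terms with `collections.Counter` and writes term/heading pairs to CSV with the standard `csv` module. The stopword list and the names `frequent_terms`, `matches_to_csv`, and `lookup` are hypothetical. `lookup` is a caller-supplied function that stands in for whatever FAST query the original program made, because the post does not say which FAST service or API was used.

```python
import csv
import io
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {
    "the", "of", "and", "a", "an", "in", "to", "is", "for", "on", "by",
    "with", "as", "that", "this", "was", "were", "be", "are", "at", "from",
}


def frequent_terms(text: str, top_n: int = 10, min_length: int = 4) -> list[tuple[str, int]]:
    """Return the top_n most frequent non-stopword terms in OCR text."""
    # Lowercase and keep alphabetic runs only, which also drops digits and OCR punctuation noise.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length and w not in STOPWORDS)
    return counts.most_common(top_n)


def matches_to_csv(
    terms: Iterable[tuple[str, int]],
    lookup: Callable[[str], list[str]],
) -> str:
    """Query each term with a caller-supplied FAST lookup and render the matches as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["term", "frequency", "fast_heading"])
    for term, freq in terms:
        # One CSV row per candidate heading, so a cataloger can accept or reject each one.
        for heading in lookup(term):
            writer.writerow([term, freq, heading])
    return buffer.getvalue()


if __name__ == "__main__":
    # Stub lookup standing in for a real FAST query (hypothetical data, not actual results).
    stub = {"soil": ["Soils"], "erosion": ["Soil erosion"]}
    sample = "Soil erosion and soil conservation. Erosion of soil is severe."
    terms = frequent_terms(sample, top_n=2)
    print(matches_to_csv(terms, lambda t: stub.get(t, [])))
```

Keeping the FAST query behind a plain function makes the counting and export steps testable offline and lets the lookup be swapped for a real service client without touching the rest of the code. Emitting one row per candidate heading also leaves room for the human review step the post calls for.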

Although the suggested FAST headings were sometimes quirky, the project was very useful for mapping out how computer-mediated subject heading assignment might work in the near future. It also highlighted the continued need for human review and intervention in cataloging.

