Multi-Modal Machine Learning to Enhance the Accessibility of Natural History Collections
University libraries and museums hold vast collections of curated image data that document the natural and social world. While these data are, in theory, accessible to professionals and the general public, in practice, searching archival collections requires technical knowledge and relies on precise scientific terminology, which limits their accessibility and broad appeal. Our project aims to enhance the accessibility of natural history collections data through artificial intelligence. We evaluate different “multi-modal” machine learning frameworks, which combine insights from natural language processing and computer vision, and compare their performance to that of human subjects. We then develop a system that unifies collections images with both novice- and expert-level natural language descriptions. We apply our system to museum collections at the Florida Museum of Natural History to assess its effectiveness for querying existing databases. Preliminary findings suggest that larger models do not consistently outperform their smaller counterparts on certain image-to-text tasks. The workflow is transferable to broader text- and image-centric museum and library collections, laying the groundwork for custom natural-language search tools tailored to unique collections.
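The core retrieval idea behind a multi-modal search system of this kind can be sketched simply: a text encoder and an image encoder map queries and collection images into a shared embedding space, and search reduces to ranking images by cosine similarity to the query embedding. The sketch below is illustrative only and is not the project's actual pipeline; the embeddings are synthetic random vectors standing in for the outputs of real encoders (e.g., a CLIP-style model), and all names (`rank_images`, `specimen_*`) are hypothetical.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def rank_images(text_embedding, image_embeddings, labels, top_k=3):
    """Return the top_k image labels most similar to the text query."""
    sims = cosine_similarity(text_embedding, image_embeddings)
    order = np.argsort(sims)[::-1][:top_k]
    return [(labels[i], float(sims[i])) for i in order]

# Synthetic embeddings standing in for encoder outputs (illustration only).
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(5, 8))
labels = [f"specimen_{i}" for i in range(5)]
# A query embedding deliberately placed near specimen_2's image embedding.
text_embedding = image_embeddings[2] + 0.1 * rng.normal(size=8)

print(rank_images(text_embedding, image_embeddings, labels))
```

In a deployed system, the same ranking step would run over precomputed embeddings of the entire collection, so adding natural-language search does not require re-indexing images per query.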