Leveraging AI Tools for Automating Metadata Extraction

As Generative AI and Large Language Models (LLMs) become more accessible, we can now apply these tools to a variety of materials, including images, text, and audio.

The UCLA Library is experimenting with applying AI/ML tools to digital materials from which extracting relevant metadata would typically require significant human intervention. Examples include text extraction via OCR, interview transcription, and metadata record generation from local digital library objects.

In this discussion we outline our experience applying some of the more common AI tools to a collection of digital library material with complex layouts, with the aim of building a foundation for a partially or fully automated metadata pipeline.

Topics and tools discussed:

Optical Character Recognition (OCR) tools
Named Entity Recognition and related trained models
Applying LLM tools such as ChatGPT and Bard to extract metadata

Speaker(s)

Kristian Allen

Zoe Tucker

Leigh Phan

May 13th

3:35 PM

15 minutes