How Multimodal LLMs Are Transforming Cultural Heritage Data with CIDOC-CRM

Overview

The CIDOC Conceptual Reference Model (CRM) is a cornerstone for structuring cultural heritage data, enabling interoperability across cultural institutions (libraries, archives, museums, galleries, etc.). With the rise of Multimodal Large Language Models (MLLMs), new opportunities are emerging to automate and enrich how we map, query, and interpret this data.

This post explores how LLMs could bridge diverse data types (text, images, audio, video, GIS, 3D) with CIDOC-CRM’s semantic framework, and the challenges that come with doing so.

Roles

LLMs have the potential to reshape CIDOC-CRM workflows in the following ways:

Automated Data Mapping
Parsing unstructured records (e.g., handwritten ledgers, GIS data, excavation notes) into CIDOC-CRM classes like E22 Human-Made Object or E53 Place, reducing manual effort.
Semantic Enrichment
Inferring implicit relationships (e.g., linking artifacts to E5 Event or E21 Person) and populating properties like P4 has time-span.
Natural Language Interfaces
Translating queries like “Show me 18th-century French paintings” into SPARQL, using classes like E36 Visual Item.
Education & Troubleshooting
Guiding users through CIDOC-CRM’s complexity (e.g., explaining E12 Production vs. E11 Modification).
Cross-Dataset Interoperability
Mediating between CIDOC-CRM and other standards (e.g., BIBFRAME, Dublin Core, schema.org, and other Linked Open Data vocabularies).

And more beyond the above…
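To make the natural language interface role more concrete, here is a minimal sketch of the last step: turning slots that an LLM might extract from “Show me 18th-century French paintings” into SPARQL. It rests on several assumptions that are not from this post: paintings are recorded as E22 Human-Made Object with an E12 Production event, labels are plain literals, dates use the P82a/P82b properties of the CIDOC-CRM RDFS encoding, and the local names follow the 7.x naming. The function and slot names are hypothetical.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(type_label: str, place_label: str, year_from: int, year_to: int) -> str:
    """Build a query for objects of a given type produced in a place and period."""
    type_lit = escape_literal(type_label)
    place_lit = escape_literal(place_label)
    return f"""PREFIX crm: <{CRM}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?object WHERE {{
  ?object a crm:E22_Human-Made_Object ;
          crm:P2_has_type ?type ;
          crm:P108i_was_produced_by ?production .
  ?type rdfs:label "{type_lit}" .
  ?production crm:P7_took_place_at ?place ;
              crm:P4_has_time-span ?span .
  ?place rdfs:label "{place_lit}" .
  ?span crm:P82a_begin_of_the_begin ?begin ;
        crm:P82b_end_of_the_end ?end .
  FILTER (?begin >= "{year_from:04d}-01-01T00:00:00"^^xsd:dateTime &&
          ?end <= "{year_to:04d}-12-31T23:59:59"^^xsd:dateTime)
}}"""


# Hypothetical slots an LLM could return for the example question.
slots = {"type_label": "painting", "place_label": "France",
         "year_from": 1701, "year_to": 1800}
query = build_query(**slots)
print(query)
```

Having the LLM fill a few typed slots, rather than write the whole query, keeps the generated SPARQL inside a template that a curator can review once.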

Multimodal LLM Applications for CIDOC-CRM

1. Image Analysis: From Pixels to Provenance

LLMs combined with computer vision can:

Identify objects (e.g., “Roman amphora”) → E22 Human-Made Object.
Extract metadata (materials, styles) → P45 consists of (bronze) or P3 has note (conservation status).
Link symbols to context (e.g., a crest → E21 Person or E74 Group).

Example:
A pottery shard photo → LLM infers it belongs to an E22 instance from E4 Period (Roman era) and links it to E53 Place (Pompeii).
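The pottery shard example could be sketched as follows. The JSON string stands in for whatever an image-capable model returns, and its field names, the `ex:` identifiers, and the helper names are all hypothetical. One modeling assumption is made explicit: the period is attached to an E12 Production event via P10 falls within, and the place is recorded with P53 has former or current location.

```python
import json

# Hypothetical model output for a pottery shard photo (not a real API response).
llm_output = '{"label": "pottery shard", "period": "Roman era", "place": "Pompeii"}'


def slug(text):
    """Turn a label into a lowercase identifier fragment."""
    return text.strip().lower().replace(" ", "_")


def shard_to_triples(object_id, detection):
    """Map one detection to CIDOC-CRM style triples (plain tuples for brevity)."""
    obj = "ex:" + object_id
    production = obj + "/production"
    period = "ex:period/" + slug(detection["period"])
    place = "ex:place/" + slug(detection["place"])
    return [
        (obj, "rdf:type", "crm:E22_Human-Made_Object"),
        (obj, "rdfs:label", detection["label"]),
        (obj, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P10_falls_within", period),
        (period, "rdf:type", "crm:E4_Period"),
        (obj, "crm:P53_has_former_or_current_location", place),
        (place, "rdf:type", "crm:E53_Place"),
    ]


triples = shard_to_triples("shard_001", json.loads(llm_output))
for triple in triples:
    print(triple)
```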

2. Audio/Video: Capturing Oral Histories

LLMs process recordings to:

Transcribe interviews → E39 Actor (speakers) and E7 Activity (traditions).
Extract spatiotemporal context → P7 took place at (Kyoto) + P4 has time-span.
Link audio to artifacts (e.g., a folk song → E22 musical instrument).

Example:
An oral history about weaving → LLM creates E29 Design or Procedure tied to E22 (textile) and E39 Actor (artisan).

3. Text & Archives: Semantic Parsing

LLMs structure unstructured text by:

Extracting entities (e.g., “Donated by X in 1920” → E8 Acquisition).
Handling multilingual records → language-neutral CIDOC-CRM class and property codes.

Example:
A ledger entry → LLM maps “acquired from Artist Y” to E8 Acquisition with P23 transferred title from (Artist Y).
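As a rough illustration of the entity extraction step, the sketch below handles the “Donated by X in 1920” phrasing. A regular expression is used here purely as a stand-in for the LLM, so it only covers that one pattern. The previous owner is modeled with P23 transferred title from, the E8-specific property for the party giving up title. The `ex:` identifiers and function name are hypothetical.

```python
import re

# Stand-in for LLM extraction: one pattern for one phrasing.
DONATION = re.compile(r"Donated by (?P<donor>.+?) in (?P<year>\d{4})")


def parse_ledger_entry(entry_id, text):
    """Return CIDOC-CRM style triples for a donation, or [] if nothing matches."""
    match = DONATION.search(text)
    if match is None:
        return []  # Unrecognized phrasing: leave the entry for human review.
    donor_name = match.group("donor").strip().lower().replace(" ", "_")
    acquisition = "ex:" + entry_id + "/acquisition"
    donor = "ex:actor/" + donor_name
    span = acquisition + "/time-span"
    return [
        (acquisition, "rdf:type", "crm:E8_Acquisition"),
        (acquisition, "crm:P23_transferred_title_from", donor),
        (donor, "rdf:type", "crm:E39_Actor"),
        (acquisition, "crm:P4_has_time-span", span),
        (span, "rdf:type", "crm:E52_Time-Span"),
        (span, "rdfs:label", match.group("year")),
    ]


triples = parse_ledger_entry("ledger_17", "Donated by X in 1920")
for triple in triples:
    print(triple)
```

Returning an empty list for unmatched text, instead of guessing, keeps ambiguous entries out of the graph until a person has looked at them.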

4. 3D Models: Reconstructing Heritage

LLMs analyze LiDAR scans or 3D models to:

Describe architectural styles → E25 Human-Made Feature + P2 has type.
Reconstruct historical layers (e.g., Roman ruins under a church → E19 Physical Object).
Drive game, art, design, and Minecraft tooling through AI-powered MCP servers (e.g., Blender MCP; more are listed in Awesome MCP Servers and Clients).

Example:
A temple scan → LLM identifies E25 columns and links motifs to E55 Type (“Doric order”).

5. Cross-Modal Synthesis

LLMs synthesize data across formats:

Linking manuscript images (E36 Visual Item) to transcribed text (E33 Linguistic Object).
Mapping 3D artifacts to excavation sites (E53 Place) via GPS coordinates.

Example:
A diary sketch + text → LLM infers E6 Destruction events for a lost artifact.
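Mapping a 3D artifact to an excavation site via GPS coordinates can be sketched as a nearest-site lookup. This is one possible approach, not a prescribed one: it uses the haversine great-circle distance, a 1 km cutoff chosen arbitrarily, and approximate, illustrative site coordinates. The `ex:` identifiers and function names are hypothetical.

```python
import math

# Approximate, illustrative (latitude, longitude) pairs for two known sites.
SITES = {
    "ex:place/pompeii": (40.7497, 14.4869),
    "ex:place/herculaneum": (40.8060, 14.3482),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_site(point, max_km=1.0):
    """Return the closest known site, or None if it is farther than max_km."""
    site, distance = min(
        ((name, haversine_km(point, coords)) for name, coords in SITES.items()),
        key=lambda pair: pair[1],
    )
    return site if distance <= max_km else None


def locate(artifact, point):
    """Emit a P53 triple when the scan's GPS fix falls near a known E53 Place."""
    site = nearest_site(point)
    if site is None:
        return []
    return [(artifact, "crm:P53_has_former_or_current_location", site)]


print(locate("ex:model_42", (40.7500, 14.4865)))
```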

Challenges, Risks & Ethical Considerations

While promising, LLM integration requires caution:

Accuracy & Ambiguity: Misinterpretations by LLMs (e.g., conflating creation and acquisition dates) require human validation.
Bias & Ethics: LLMs may perpetuate biases in cultural narratives, for example by reinforcing colonial perspectives in metadata. Transparency in provenance and cultural sensitivity checks are critical.
Ontological Complexity: CIDOC-CRM’s depth (80+ classes, 150+ properties) demands fine-tuning LLMs on domain-specific data to avoid oversimplification.
Scalability: Processing terabyte-scale 3D scans demands substantial compute and storage.

Solutions:

Hybrid human-AI validation pipelines.
Ethical frameworks for cultural sensitivity.
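A hybrid human-AI validation pipeline might start with something as simple as the triage step sketched below. It assumes each LLM-proposed statement carries a confidence score, which is itself an assumption about the model or tooling. The domain table is a small hand-written subset for illustration, and the 0.9 threshold and all names are hypothetical.

```python
# Hand-written subset of allowed subject classes per property. A real check
# would load the full CIDOC-CRM definition, including the class hierarchy.
PROPERTY_DOMAINS = {
    "crm:P4_has_time-span": {"crm:E5_Event", "crm:E8_Acquisition", "crm:E12_Production"},
    "crm:P45_consists_of": {"crm:E22_Human-Made_Object"},
}


def triage(candidates, threshold=0.9):
    """Split candidates into auto-accepted and needs-human-review lists."""
    accepted, review = [], []
    for candidate in candidates:
        allowed = PROPERTY_DOMAINS.get(candidate["property"])
        schema_ok = allowed is not None and candidate["subject_class"] in allowed
        if schema_ok and candidate["confidence"] >= threshold:
            accepted.append(candidate)
        else:
            review.append(candidate)
    return accepted, review


candidates = [
    # Valid domain and high confidence: accepted automatically.
    {"subject_class": "crm:E12_Production", "property": "crm:P4_has_time-span", "confidence": 0.95},
    # A date attached directly to an object: fails the domain check.
    {"subject_class": "crm:E22_Human-Made_Object", "property": "crm:P4_has_time-span", "confidence": 0.97},
    # Valid domain but low confidence: sent to a curator.
    {"subject_class": "crm:E8_Acquisition", "property": "crm:P4_has_time-span", "confidence": 0.60},
]

accepted, review = triage(candidates)
print(len(accepted), "accepted;", len(review), "for review")
```

The schema check catches structurally invalid statements regardless of how confident the model is, while the threshold decides how much of the remainder a curator sees.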

Future Directions

CIDOC-CRM-Guided RAG: Generating fact-level relationships grounded in the ontology for better accuracy.
Tools like Arches + LLMs: Semi-automated CIDOC-CRM mapping.
Generative Storytelling: Virtual exhibitions using E5 Event sequences.
Benchmarking Tools: Developing evaluation frameworks to assess LLM-generated CIDOC-CRM data quality.
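One possible starting point for such a benchmark is exact-match precision, recall, and F1 over triples, comparing LLM output against a curator-made gold set. The sketch below assumes triples are comparable as plain tuples, which ignores partial credit and equivalent modelings. The sample data is invented for illustration.

```python
def score(predicted, gold):
    """Exact-match precision, recall, and F1 between two collections of triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: the model gets two of three gold triples right.
gold = [
    ("ex:vase_1", "rdf:type", "crm:E22_Human-Made_Object"),
    ("ex:vase_1", "crm:P45_consists_of", "ex:material/bronze"),
    ("ex:vase_1", "crm:P53_has_former_or_current_location", "ex:place/pompeii"),
]
predicted = [
    ("ex:vase_1", "rdf:type", "crm:E22_Human-Made_Object"),
    ("ex:vase_1", "crm:P45_consists_of", "ex:material/bronze"),
    ("ex:vase_1", "crm:P53_has_former_or_current_location", "ex:place/herculaneum"),
]

precision, recall, f1 = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```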

Conclusion

Multimodal LLMs could unlock significant efficiencies in cultural heritage data management, from automating CIDOC-CRM mapping to enabling immersive narratives. However, their success hinges on collaboration between technologists, curators, and communities, so that these tools preserve not just data but also cultural meaning and equity.

Let’s build a future where AI amplifies heritage, never erases it.