How Multimodal LLMs Are Transforming Cultural Heritage Data with CIDOC-CRM
Overview
The CIDOC Conceptual Reference Model (CRM) is a cornerstone for structuring cultural heritage data, enabling interoperability across cultural institutions (libraries, archives, museums, galleries, and others). With the rise of Multimodal Large Language Models (MLLMs), new opportunities are emerging to automate and enrich how we map, query, and interpret this data.
This post explores how LLMs could bridge diverse data types (text, images, audio, video, GIS, 3D) with CIDOC-CRM’s semantic framework, and the challenges that come with doing so.
Roles
LLMs have the potential to reshape CIDOC-CRM workflows in the following ways:
- Automated Data Mapping: LLMs parse unstructured records (e.g., handwritten ledgers, GIS information, excavation notes) into CIDOC-CRM classes like E22 Human-Made Object or E53 Place, reducing manual effort.
- Semantic Enrichment: Inferring implicit relationships (e.g., linking artifacts to E5 Event or E21 Person) and populating properties like P4 has time-span.
- Natural Language Interfaces: Translating queries like “Show me 18th-century French paintings” into SPARQL, using classes like E36 Visual Item.
- Education & Troubleshooting: Guiding users through CIDOC-CRM’s complexity (e.g., explaining E12 Production vs. E11 Modification).
- Cross-Dataset Interoperability: Mediating between CIDOC-CRM and other standards (e.g., BIBFRAME, Dublin Core, schema.org, Linked Open Data).
This list is not exhaustive; there are further possibilities beyond these.
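The automated-mapping role can be illustrated with a minimal sketch. It assumes the LLM returns its proposed entities as JSON; that output format, the `validate_mapping` helper, and the sample records are all hypothetical, and the class table is only the small subset of CIDOC-CRM named in this post. The point is that model output is checked against known class codes before anything is stored.

```python
import json

# Small subset of CIDOC-CRM classes named in this post; a real pipeline
# would load the complete class list from the CRM definition instead.
KNOWN_CLASSES = {
    "E5": "Event",
    "E21": "Person",
    "E22": "Human-Made Object",
    "E53": "Place",
}


def validate_mapping(llm_json):
    """Split LLM-proposed entities into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for entity in json.loads(llm_json):
        if entity.get("class") in KNOWN_CLASSES:
            accepted.append(entity)
        else:
            rejected.append(entity)
    return accepted, rejected


# Hypothetical LLM output for an excavation note (the schema is an assumption).
sample = json.dumps([
    {"label": "bronze fibula", "class": "E22"},
    {"label": "Pompeii", "class": "E53"},
    {"label": "trench 4", "class": "E999"},  # a class code that does not exist
])

accepted, rejected = validate_mapping(sample)
for entity in rejected:
    print("Needs human review:", entity["label"], entity["class"])
```

Rejected entities are not discarded here; they are the natural input to a human review step.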
Multimodal LLM Applications for CIDOC-CRM
1. Image Analysis: From Pixels to Provenance
LLMs combined with computer vision can:
- Identify objects (e.g., “Roman amphora”) → E22 Human-Made Object.
- Extract metadata (materials, styles) → P45 consists of (bronze) or P3 has note (conservation status).
- Link symbols to context (e.g., a crest → E21 Person or E74 Group).
Example:
A pottery shard photo → LLM infers that it is an E22 instance from an E4 Period (Roman era) and links it to an E53 Place (Pompeii).
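As a rough sketch, the pottery shard example could be written out as subject-predicate-object triples. The `ex:` identifiers and the `Triple` and `shard_triples` names are hypothetical. The modelling choice is also an assumption not stated above: the period is attached through an E12 Production event (P108 has produced, P10 falls within) and the place through P53 has former or current location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def shard_triples(shard_id, period_id, place_id):
    """Build CIDOC-CRM style triples for one classified shard photo."""
    production = f"{shard_id}/production"
    return [
        Triple(shard_id, "rdf:type", "crm:E22_Human-Made_Object"),
        Triple(production, "rdf:type", "crm:E12_Production"),
        Triple(production, "crm:P108_has_produced", shard_id),
        Triple(production, "crm:P10_falls_within", period_id),
        Triple(period_id, "rdf:type", "crm:E4_Period"),
        Triple(shard_id, "crm:P53_has_former_or_current_location", place_id),
        Triple(place_id, "rdf:type", "crm:E53_Place"),
    ]


# Hypothetical identifiers standing in for what an LLM might propose.
triples = shard_triples("ex:shard_001", "ex:roman_era", "ex:pompeii")
for t in triples:
    print(t.subject, t.predicate, t.obj)
```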
2. Audio/Video: Capturing Oral Histories
LLMs process recordings to:
- Transcribe interviews → E39 Actor (speakers) and E7 Activity (traditions).
- Extract spatiotemporal context → P7 took place at (Kyoto) + P4 has time-span.
- Link audio to artifacts (e.g., a folk song → E22 musical instrument).
Example:
An oral history about weaving → LLM creates an E29 Design or Procedure tied to an E22 (textile) and an E39 Actor (artisan).
3. Text & Archives: Semantic Parsing
LLMs structure unstructured text by:
- Extracting entities (e.g., “Donated by X in 1920” → E8 Acquisition).
- Handling multilingual records → mapping them to the same language-independent CIDOC-CRM class and property identifiers.
Example:
A ledger entry → LLM maps “acquired from Artist Y” to an E8 Acquisition with P14 carried out by (Artist Y as the donor); the more specific subproperty P23 transferred title from also applies here.
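To show the target structure for a ledger entry, here is a minimal sketch in which a regular expression stands in for the LLM. The pattern, the `parse_acquisition` helper, and the output keys are assumptions for illustration only; a real model would handle far more varied phrasing. The sketch uses P23 transferred title from, a subproperty of P14 carried out by, for the former owner.

```python
import re
from typing import Optional

# A regular expression stands in for the LLM here; it only recognises two
# fixed phrasings followed by a four-digit year.
ENTRY_PATTERN = re.compile(
    r"(?:donated by|acquired from)\s+(?P<actor>.+?)\s+in\s+(?P<year>\d{4})",
    re.IGNORECASE,
)


def parse_acquisition(entry: str) -> Optional[dict]:
    """Return an E8 Acquisition record, or None if the entry does not match."""
    match = ENTRY_PATTERN.search(entry)
    if match is None:
        return None
    return {
        "class": "E8 Acquisition",
        "P23 transferred title from": match.group("actor"),
        "P4 has time-span": match.group("year"),
    }


record = parse_acquisition("Donated by Jane Doe in 1920")
unmatched = parse_acquisition("Moved to storage room B")
print(record)
print(unmatched)
```

Entries that return `None` are the ones that still need a model or a person to read them.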
4. 3D Models: Reconstructing Heritage
LLMs analyze LiDAR scans or 3D models to:
- Describe architectural styles → E25 Human-Made Feature + P2 has type.
- Reconstruct historical layers (e.g., Roman ruins under a church → E19 Physical Object).
- Drive 3D tooling through AI-powered MCP servers for games, art, design, or Minecraft (e.g., Blender MCP; more innovations are listed in Awesome MCP Servers and Clients).
Example:
A temple scan → LLM identifies E25 columns and links motifs to an E55 Type (“Doric order”).
5. Cross-Modal Knowledge Graphs
LLMs synthesize data across formats:
- Linking manuscript images (E36 Visual Item) to transcribed text (E33 Linguistic Object).
- Mapping 3D artifacts to excavation sites (E53 Place) via GPS coordinates.
Example:
A diary sketch + text → LLM infers E6 Destruction events for a lost artifact.
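The GPS-based mapping of artifacts to excavation sites mentioned in this section can be sketched with a great-circle distance lookup. The site list, the approximate coordinates, and the `nearest_site` helper are illustrative assumptions; a production system would more likely query a gazetteer or a spatial index.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_site(artifact_coords, sites):
    """Return the name of the site closest to the artifact's find spot."""
    return min(sites, key=lambda name: haversine_km(*artifact_coords, *sites[name]))


# Approximate coordinates, for illustration only.
SITES = {
    "Pompeii": (40.751, 14.489),
    "Herculaneum": (40.806, 14.348),
}

site = nearest_site((40.750, 14.487), SITES)
print(site)
```

The returned name would then become the E53 Place that the artifact record points to.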
Challenges, Risks, and Ethical Considerations
While promising, LLM integration requires caution:
- Accuracy & Ambiguity: Misinterpretations by LLMs (e.g., conflating creation and acquisition dates) require human validation.
- Bias & Ethics: LLMs may perpetuate biases in cultural narratives (e.g., reinforcing colonial perspectives in metadata). Transparency in provenance and cultural sensitivity checks are critical.
- Ontological Complexity: CIDOC-CRM’s depth (80+ classes, 150+ properties) demands fine-tuning LLMs on domain-specific data to avoid oversimplification.
- Scalability: Processing terabyte-scale 3D scans remains demanding in both compute and storage.
Solutions:
- Hybrid human-AI validation pipelines.
- Ethical frameworks for cultural sensitivity.
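A hybrid human-AI validation pipeline can be as simple as a routing rule. The sketch below assumes each LLM-proposed statement carries a `confidence` score and an optional `sensitive` flag; both fields, the 0.9 threshold, and the sample records are hypothetical. Model-reported confidence is often poorly calibrated, so the threshold is a starting point, not a guarantee.

```python
REVIEW_THRESHOLD = 0.9  # arbitrary cut-off chosen for illustration


def route_assertions(assertions, threshold=REVIEW_THRESHOLD):
    """Return (auto_accepted, needs_review) lists of proposed statements."""
    auto_accepted, needs_review = [], []
    for item in assertions:
        low_confidence = item["confidence"] < threshold
        # Culturally sensitive material always goes to a human reviewer.
        if low_confidence or item.get("sensitive", False):
            needs_review.append(item)
        else:
            auto_accepted.append(item)
    return auto_accepted, needs_review


# Hypothetical LLM proposals; the field names are assumptions.
proposed = [
    {"triple": ("ex:vase_12", "P45 consists of", "bronze"), "confidence": 0.97},
    {"triple": ("ex:vase_12/production", "P4 has time-span", "1750-1800"),
     "confidence": 0.55},
    {"triple": ("ex:mask_3", "P2 has type", "ceremonial"),
     "confidence": 0.95, "sensitive": True},
]

auto_accepted, needs_review = route_assertions(proposed)
print(len(auto_accepted), "accepted,", len(needs_review), "sent to curators")
```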
Future Directions
- CIDOC-CRM-Guided RAG: Retrieval-augmented generation that produces fact-level relationships for better accuracy.
- Tools like Arches + LLMs: Semi-automated CIDOC-CRM mapping.
- Generative Storytelling: Virtual exhibitions using E5 Event sequences.
- Benchmarking Tools: Developing evaluation frameworks to assess LLM-generated CIDOC-CRM data quality.
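One possible starting point for such benchmarking is to compare LLM-generated triples with a curator-approved gold set. This is a minimal sketch under that assumption; exact-match precision, recall, and F1 are a choice made here, not an established CIDOC-CRM benchmark, and the sample triples are invented.

```python
def score_triples(predicted, gold):
    """Exact-match precision, recall, and F1 between two sets of triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example data: a curator-approved gold set and an LLM's output.
gold = {
    ("ex:shard_001", "rdf:type", "E22"),
    ("ex:shard_001", "P53", "ex:pompeii"),
    ("ex:pompeii", "rdf:type", "E53"),
    ("ex:roman_era", "rdf:type", "E4"),
}
predicted = {
    ("ex:shard_001", "rdf:type", "E22"),
    ("ex:pompeii", "rdf:type", "E53"),
    ("ex:shard_001", "P53", "ex:rome"),  # wrong place
}

scores = score_triples(predicted, gold)
print(scores)
```

Exact matching is strict: a triple that uses a valid superproperty instead of the expected subproperty counts as wrong, so a fuller framework would likely need hierarchy-aware scoring.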
Conclusion
Multimodal LLMs unlock unprecedented efficiencies in cultural heritage data management, from automating CIDOC-CRM mapping to enabling immersive narratives. However, their success hinges on collaboration between technologists, curators, and communities—ensuring these tools preserve not just data, but cultural meaning and equity.
Let’s build a future where AI amplifies heritage, never erases it.