Methodology

DREAMATLAS applies a multi-stage computational pipeline that combines established NLP techniques with exploratory, data-driven methods designed to limit reliance on predefined categorical frameworks when working with culturally diverse material.

Corpus construction and preprocessing

Dream reports are drawn from primary and secondary sources spanning multiple cultural traditions, historical periods, and collection formats. Texts are normalised through language-specific NLP pipelines and tagged by culture, period, genre, and source type to enable controlled subsetting for comparative analysis. Provenance and overlap across sources are tracked rather than collapsed, so that downstream analyses can be replicated on different corpus slices without erasing the conditions under which each report was originally collected.
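The corpus-handling principles above (tagging for controlled subsetting, and tracking overlap instead of collapsing it) can be illustrated with a minimal sketch. Everything here is hypothetical: the `DreamReport` fields, the helper names, and the toy records are invented for illustration. `normalise` is a deliberately crude stand-in for the language-specific NLP pipelines the project actually uses.

```python
from __future__ import annotations

import hashlib
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DreamReport:
    """One report plus the tags named above; field names are hypothetical."""
    report_id: str
    text: str
    culture: str
    period: str
    genre: str
    source_type: str  # e.g. "primary" or "secondary"
    source: str       # the collection or edition the report came from


def normalise(text: str) -> str:
    """Crude language-agnostic stand-in for a real NLP pipeline:
    Unicode NFC, case folding, whitespace collapsing."""
    return " ".join(unicodedata.normalize("NFC", text).casefold().split())


def overlap_index(reports: list[DreamReport]) -> dict[str, list[str]]:
    """Group report IDs by a hash of their normalised text.

    Duplicates are recorded, not deleted, so every source that carries
    a given report stays visible to downstream analyses."""
    groups: dict[str, list[str]] = defaultdict(list)
    for r in reports:
        key = hashlib.sha256(normalise(r.text).encode("utf-8")).hexdigest()
        groups[key].append(r.report_id)
    return dict(groups)


def subset(reports: list[DreamReport], **tags: str) -> list[DreamReport]:
    """Controlled subsetting: keep reports whose tags match every given value."""
    return [r for r in reports if all(getattr(r, k) == v for k, v in tags.items())]


# Invented toy records, for illustration only.
corpus = [
    DreamReport("a1", "I left the house  to buy bread.", "culture_a", "period_1",
                "diary", "primary", "archive_1"),
    DreamReport("b7", "I left the house to buy bread.", "culture_a", "period_1",
                "diary", "secondary", "anthology_2"),
    DreamReport("c3", "A river flooded the village.", "culture_b", "period_2",
                "interview", "primary", "survey_3"),
]
```

Here `overlap_index(corpus)` groups `a1` and `b7` under one key while keeping both records and their distinct sources. Note that an exact hash only detects verbatim reprints. Overlap involving translations or paraphrased retellings would need a fuzzier comparison.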

Situational decomposition

Dream narratives are decomposed into fine-grained situational units — small, recurring scene-fragments that together compose a dream. The challenge addressed at this stage is well known: atomising narrative too aggressively strips away the context that gives a fragment its meaning. Leaving the house to go shopping and leaving the house to escape a fire are surface-identical but semantically distinct events.

To mitigate this, situational units are extracted through a structured semantic schema, informed by the 5W1H tradition (who, what, when, where, why, how), that captures each action together with the narrative context in which it occurs. The schema is intentionally lightweight so that it remains portable across languages and cultural traditions, yet rich enough to retain the situational frame that pure tokenisation discards. Extraction is LLM-assisted and continuously calibrated against expert annotation. The schema, the prompting strategy, and the calibration protocol are part of the project's research contribution and are documented in forthcoming publications rather than on this page.
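To show why context fields matter, the sketch below encodes the "leaving the house" example as a generic 5W1H record. This is explicitly not the project's schema, which is unpublished. The `SituationalUnit` class, its fields, and both key methods are hypothetical, and the choice of `what`, `where`, and `why` for the situation key is an illustrative assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SituationalUnit:
    """Hypothetical generic 5W1H record, NOT the project's unpublished schema."""
    who: str
    what: str                  # the core action
    where: Optional[str] = None
    when: Optional[str] = None
    why: Optional[str] = None  # narrative motivation: the frame that gives the action its meaning
    how: Optional[str] = None

    def action_key(self) -> str:
        """What action-level atomisation keeps: the bare action."""
        return self.what

    def situation_key(self) -> tuple[Optional[str], ...]:
        """The action together with the context fields that disambiguate it."""
        return (self.what, self.where, self.why)


# The example from the text: identical actions, different situations.
shopping = SituationalUnit(who="dreamer", what="leave", where="house", why="go shopping")
escape = SituationalUnit(who="dreamer", what="leave", where="house", why="escape a fire")
```

Grouping by `action_key()` merges the two events. Grouping by `situation_key()` keeps them apart.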

Semantic analysis and pattern detection

Situational units are projected into a shared semantic space and analysed using unsupervised and semi-supervised methods — embedding-based similarity analysis, topic modelling, and clustering — in a staged, iterative sequence. Each stage informs the next, allowing the analysis to refine its representation of situational structures progressively without locking in categorical assumptions at the outset. Co-occurrence and adjacency among situations are tracked to surface higher-order narrative patterns that no single unit makes visible on its own.
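Co-occurrence and adjacency tracking can be sketched with plain counters. The sketch assumes that each dream has already been reduced to an ordered list of situation labels, for example cluster assignments from the embedding and clustering stages, which are not shown. The function names and toy data are hypothetical, and this generic counting is not presented as the project's own procedure.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cooccurrence(dreams: list[list[str]]) -> Counter:
    """For each unordered pair of situation labels, count the dreams containing both."""
    counts: Counter = Counter()
    for labels in dreams:
        # set() so a situation repeated within one dream is counted once per dream
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


def adjacency(dreams: list[list[str]]) -> Counter:
    """Count ordered transitions between consecutive situations within each dream."""
    counts: Counter = Counter()
    for labels in dreams:
        for pair in zip(labels, labels[1:]):
            counts[pair] += 1
    return counts


# Invented toy data: each dream is an ordered list of situation labels.
dreams = [
    ["leave_house", "pursued", "hide"],
    ["pursued", "hide", "wake"],
    ["leave_house", "flight"],
]
co = cooccurrence(dreams)
adj = adjacency(dreams)
```

Co-occurrence ignores order and adjacency preserves it. In the toy data, `("hide", "pursued")` co-occurs in two dreams, and `adj` shows that "pursued" precedes "hide" both times, which points to a sequential pattern and not just a shared presence.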

Spatiotemporal mapping

Identified structures are mapped against corpus metadata to examine their distribution across cultures, geographies, and historical periods, which directly addresses the project's questions about stability and variability. The mapping supports both broad views (how a situational pattern travels across centuries or traditions) and narrow ones (how it manifests within a single tradition or a short historical window).
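A minimal sketch of the broad and narrow views follows. It assumes that each situational unit carries a pattern label and the metadata tags of its source report. The function name, dictionary keys, and toy records are hypothetical. The function reports proportions rather than raw counts, on the assumption that corpus slices differ greatly in size.

```python
from __future__ import annotations

from collections import Counter


def pattern_distribution(
    units: list[dict[str, str]],
    pattern: str,
    by: tuple[str, ...] = ("culture", "period"),
) -> dict[tuple[str, ...], float]:
    """Share of units in each metadata group that belong to `pattern`.

    Proportions rather than raw counts, so that groups of very different
    size (e.g. a well-documented period vs a sparse one) stay comparable."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for unit in units:
        group = tuple(unit[key] for key in by)
        totals[group] += 1
        if unit["pattern"] == pattern:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented toy records, for illustration only.
units = [
    {"pattern": "pursuit", "culture": "culture_a", "period": "early"},
    {"pattern": "flight", "culture": "culture_a", "period": "early"},
    {"pattern": "pursuit", "culture": "culture_a", "period": "late"},
    {"pattern": "pursuit", "culture": "culture_b", "period": "early"},
    {"pattern": "flight", "culture": "culture_b", "period": "early"},
    {"pattern": "flight", "culture": "culture_b", "period": "early"},
]

# Broad view: every culture and period at once.
broad = pattern_distribution(units, "pursuit")
# Narrow view: one tradition, broken down by period only.
narrow = pattern_distribution(
    [u for u in units if u["culture"] == "culture_a"], "pursuit", by=("period",)
)
```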

Linking and narrative traceability

Every situational unit retains a link back to the dream report from which it was extracted. This commitment to traceability serves two purposes. It allows interpretive review to remain grounded in the original narrative rather than in abstracted summaries, and it gives readers of the eventual public visualisation a way to move fluidly between aggregate patterns and the situated reports that produced them. The interaction design follows principles familiar from narrative-text visualisation, adapted to the particular structure of dream reports.
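One way to keep every unit traceable is to store a pointer back into its source report. The sketch below assumes the pointer is a report ID plus character offsets. The `TracedUnit` class, its fields, the two helper functions, and the toy reports are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TracedUnit:
    """A situational unit with a pointer back to its source report (hypothetical fields)."""
    unit_id: str
    label: str      # pattern or cluster label assigned downstream
    report_id: str  # the report it was extracted from
    start: int      # character offsets of the supporting passage in that report
    end: int


def source_passage(unit: TracedUnit, reports: dict[str, str]) -> str:
    """Return the exact passage of the original report that supports this unit."""
    return reports[unit.report_id][unit.start:unit.end]


def reports_for_pattern(units: list[TracedUnit], label: str) -> list[str]:
    """Drill down from an aggregate pattern to the reports that produced it."""
    return sorted({u.report_id for u in units if u.label == label})


# Invented toy data, for illustration only.
reports = {
    "r1": "I left the house because it was on fire.",
    "r2": "We left the house to go to the market.",
}
units = [
    TracedUnit("u1", "escape", "r1", 0, 16),
    TracedUnit("u2", "errand", "r2", 0, 17),
]
```

Offsets are used here instead of copied text so that the passage is always read from the report itself. An interface can then move from a pattern to `reports_for_pattern` and from there to the full narrative.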

Human-in-the-loop oversight and validation

AI outputs are subject to continuous interpretive review throughout the analytical process. This is more than a validation step: the project treats the knowledge-production processes of AI (what it detects, what it misses, and on what basis) as an object of inquiry in its own right. Oversight operates at two levels. The first measures inter-rater agreement between automated extractions and expert annotations by specialists in dream studies and cultural anthropology. The second checks the stability of situational co-occurrence across resampled subsets, to ensure that identified patterns are robust to variation in corpus composition.
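The two oversight levels can be sketched with standard statistics, but both choices are assumptions. The page does not name its agreement measure, so the sketch uses Cohen's kappa over per-unit categorical labels. It also does not specify how subsets are resampled, so the sketch uses a bootstrap of whole dreams. Function names and parameters are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter


def cohens_kappa(auto: list[str], expert: list[str]) -> float:
    """Chance-corrected agreement between two labellings of the same units."""
    if len(auto) != len(expert) or not auto:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(auto)
    observed = sum(a == e for a, e in zip(auto, expert)) / n
    auto_counts, expert_counts = Counter(auto), Counter(expert)
    # Agreement expected if both raters labelled independently at their own base rates
    expected = sum(auto_counts[k] * expert_counts[k] for k in auto_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa undefined, treat as agreement
    return (observed - expected) / (1 - expected)


def cooccurrence_stability(
    dreams: list[list[str]],
    pair: tuple[str, str],
    n_resamples: int = 1000,
    seed: int = 0,
) -> tuple[float, float]:
    """Bootstrap interval for the share of dreams containing both situations in `pair`.

    A narrow interval suggests the pattern does not hinge on a few particular reports."""
    rng = random.Random(seed)
    a, b = pair
    rates = []
    for _ in range(n_resamples):
        sample = rng.choices(dreams, k=len(dreams))  # resample dreams with replacement
        rates.append(sum(a in d and b in d for d in sample) / len(sample))
    rates.sort()
    # Approximate 95% percentile interval
    return rates[int(0.025 * (n_resamples - 1))], rates[int(0.975 * (n_resamples - 1))]
```

As an example, `cohens_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"])` gives 0.5: observed agreement is 0.75 and chance agreement is 0.5. Kappa is used here because raw percentage agreement overstates reliability when one label dominates.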

Computational findings are iteratively validated with domain specialists across the relevant cultural and historical traditions. The epistemological limits of AI-driven analysis — including sensitivity to corpus composition and the gap between statistical pattern and cultural meaning — are treated as productive methodological constraints rather than incidental caveats.

Tools and infrastructure

The pipeline builds on open-source NLP frameworks and existing digitised corpora, with transparent documentation of corpus and analytical decisions in keeping with open science principles. Implementation details, including model versions, parameter settings, and the situational schema, are released alongside peer-reviewed outputs rather than in advance, to preserve the integrity of the underlying research contribution prior to publication.

Fig. 1: Analytical pipeline. Stages: corpus construction and preprocessing; situational decomposition; semantic analysis and pattern detection; spatiotemporal mapping; with human-in-the-loop oversight continuous across all stages.