A Protocol for Analyzing Citational Practice in Viking Age Archaeology

Citational Politics and Practices in Viking Age Archaeology and Beyond

April 13, 2026

How do archaeologists use citations,
and can we study this at scale
without losing the nuance?

The problem with counting citations

Citations are not neutral. They are rhetorical acts.

A citation can:

  • Provide evidence for a claim
  • Establish a methodological lineage
  • Set up a contrast or foil
  • Signal membership in a scholarly community
  • Exclude or render invisible

Standard bibliometric tools count that something is cited.
They cannot tell us why or how.

What BibVik asks

  1. How do various perspectives on the origins of the Viking Age manifest in the citation record?
  2. Are these perspectives rooted in personal or professional identities and relationships?
  3. How are citations used to situate one’s research in relation to prior work?

The seed paper

Lund & Sindbæk (2022), “Crossing the Maelstrom: New Departures in Viking Archaeology”

  • Review of shifting paradigms in Viking Age research
  • 574 bibliographic references
  • Authored by scholars representing different, yet converging, research traditions
  • Supplemented by seven anonymous peer reviewers who recommended additional sources

Generational model:

Generation   What it contains
P            Seed paper
F1           Papers cited by P
F2           Papers cited by F1
F3+          … and so on
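
The generational model can be sketched as a breadth-first walk outward from the seed paper. This is a minimal illustration, not the toolkit's own code: the function name and toy graph are hypothetical, and it assumes that a work reachable at several depths is labelled with its shallowest generation, which the slides do not specify.

```python
from collections import deque

def assign_generations(seed, cites):
    """Label each work by its shortest citation distance from the seed paper."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        work = queue.popleft()
        for cited in cites.get(work, []):
            if cited not in depth:  # assumption: keep the shallowest generation
                depth[cited] = depth[work] + 1
                queue.append(cited)
    return {w: "P" if d == 0 else f"F{d}" for w, d in depth.items()}

# Toy graph (illustrative citekeys): citing work -> works it cites
cites = {
    "lund2022": ["sindbaek2007", "aannestad2018"],
    "aannestad2018": ["sindbaek2007", "skre2007"],
}
generations = assign_generations("lund2022", cites)
```

Here "sindbaek2007" stays in F1 even though an F1 paper also cites it, while "skre2007" is only reachable through F1 and lands in F2.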

Why not use open scholarly metadata?

  • These data were never intended to be used for bibliometric analysis
  • Analysis tends to follow the variables readily at hand, rather than being driven by research questions

  • Scopus, Web of Science, OpenAlex, CrossRef — all tested, all insufficient
  • Poor coverage of monographs, edited volumes, grey literature, non-English sources — Viking Age archaeology is all of these

  • We need to generate data tailored to our questions, drawn directly from the sources, i.e. the publications themselves

Our analytical protocol

  • Creating a citation graph — structuring who cites whom, across generations
  • Extracting citation contexts — capturing the actual words around each reference
  • Interpreting citation functions — classifying the rhetorical work each citation performs
  • Detecting relationships between sources — characterizing how co-cited references relate
  • Identifying gaps and assessing coverage — measuring what we can’t see

Creating a citation graph

Which works are cited, and by whom?

  • Extract structured references directly from PDFs
  • Cycle through several methods: regex, GROBID, and LLM
  • Explicitly account for humanities citation conventions that standard tools miss
    • Citations in footnotes, citations embedded in prose, etc.
  • Remove duplicates and redundancies
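
One way to picture the cycle of methods is a fallback chain that tries each extractor in turn and records which one succeeded. The sketch below is an assumption-laden illustration: the order (regex, then GROBID, then LLM), the function names, and the single reference pattern are all hypothetical, and the GROBID and LLM steps are placeholders rather than real service calls.

```python
import re

# Illustrative pattern for "Family, G. G. Year. Title." lines only;
# real bibliographies vary far more than this.
REF_PATTERN = re.compile(
    r"^(?P<family>[^,]+),\s*(?P<given>(?:\w\.\s?)+)"
    r"(?P<year>\d{4})\.\s*(?P<title>.+?)\.?$"
)

def extract_with_regex(line):
    m = REF_PATTERN.match(line.strip())
    if not m:
        return None
    return {"family": m["family"], "given": m["given"].strip(),
            "year": m["year"], "title": m["title"]}

def extract_with_grobid(line):
    # Placeholder (hypothetical): a real version would query a GROBID service.
    return None

def extract_with_llm(line):
    # Placeholder (hypothetical): a real version would prompt a local model.
    return None

METHODS = [("regex", extract_with_regex),
           ("grobid", extract_with_grobid),
           ("llm", extract_with_llm)]

def extract_reference(line, methods=METHODS):
    """Return the first successful parse, tagged with the method that produced it."""
    for name, method in methods:
        record = method(line)
        if record is not None:
            return {**record, "method": name}
    return {"raw": line, "method": None}  # nothing worked; keep the raw text

parsed = extract_reference("Sindbæk, S. M. 2007. The Small World of the Vikings.")
unparsed = extract_reference("see also the discussion in chapter 3")
```

Tagging each record with its method is what later makes per-method success rates measurable.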

Creating a citation graph

  • Extract bibliographic metadata into a flat data file, using BibLaTeX elements
  • Each work appears once, assigned a unique citekey
  • Names, titles and other values are normalized, even when different papers use slightly different spelling or formatting
  • Each entry records every paper that cites it
{
  "citekey": "sindbaek2007",
  "entry_type": "article",
  "title": "The Small World of the Vikings",
  "author": [
    {"family": "Sindbæk", "given": "Søren M."}
  ],
  "journaltitle": "Norwegian Archaeological Review",
  "volume": "40", "pages": "59--74",
  "date": "2007",
  "generation": "F1",
  "cited_by": ["lund2022", "aannestad2018"]
}
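
A rough sketch of how normalization and deduplication could produce a record like the one above. The citekey scheme (family name plus year), the character mapping, and the input record shape are assumptions made for illustration. A real implementation would also need to tell apart different works by the same author in the same year, which this sketch ignores.

```python
import unicodedata

# Letters that Unicode decomposition does not reduce to ASCII (assumed mapping).
SPECIAL = str.maketrans({"æ": "ae", "ø": "o", "ð": "d", "þ": "th"})

def make_citekey(family, year):
    """Build a lowercase ASCII key such as 'sindbaek2007'."""
    lowered = family.lower().translate(SPECIAL)
    decomposed = unicodedata.normalize("NFKD", lowered)
    ascii_only = "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum())
    return f"{ascii_only}{year}"

def merge_records(records):
    """Collapse spelling variants into one entry and accumulate cited_by."""
    merged = {}
    for rec in records:
        key = make_citekey(rec["family"], rec["year"])
        entry = merged.setdefault(
            key, {"citekey": key, "title": rec["title"], "cited_by": []}
        )
        if rec["cited_by"] not in entry["cited_by"]:
            entry["cited_by"].append(rec["cited_by"])
    return merged

# Two spellings of the same work, as extracted from two citing papers
records = [
    {"family": "Sindbæk", "year": "2007",
     "title": "The Small World of the Vikings", "cited_by": "lund2022"},
    {"family": "Sindbaek", "year": "2007",
     "title": "The small world of the Vikings", "cited_by": "aannestad2018"},
]
merged = merge_records(records)
```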

Extracting citation contexts

Where and how is each reference invoked?

With the graph in place, we can locate where each reference is invoked — and capture what the citing author actually says.

{
  "context_id": "ctx_lund2022_sindbaek2007_001",
  "verbatim_text": "Recent studies have drawn on network theory to
    reframe Viking-period exchange as polycentric rather than
    hub-driven (Sindbæk 2007; Barrett 2010). This shift has
    implications for how we understand the relationship between
    centres like Birka and Hedeby and their wider hinterlands.",
  "citing_citekey": "lund2022",
  "cited_citekey": "sindbaek2007",
  "co_occurring_citekeys": ["barrett2010"]
}

Extracts the verbatim text — not a summary.
And the co-occurring references — what company each citation keeps.
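
A simplified sketch of context extraction, assuming author-year citations in running text. The sentence splitter is deliberately naive (abbreviations such as "et al." would break it), the one-sentence window on either side is an assumed setting, and the `labels` lookup from in-text label to citekey is hypothetical.

```python
import re

def extract_contexts(text, citing, labels, window=1):
    """Capture verbatim sentences around each in-text citation label."""
    # Naive split: sentence-ending punctuation, whitespace, then a capital letter.
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-ZÆØÅ])", text.strip())
    contexts, counter = [], {}
    for i, sentence in enumerate(sentences):
        for label, cited in labels.items():
            if label not in sentence:
                continue
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            verbatim = " ".join(sentences[lo:hi])  # copied text, not a summary
            counter[cited] = counter.get(cited, 0) + 1
            contexts.append({
                "context_id": f"ctx_{citing}_{cited}_{counter[cited]:03d}",
                "verbatim_text": verbatim,
                "citing_citekey": citing,
                "cited_citekey": cited,
                "co_occurring_citekeys": [
                    k for other, k in labels.items()
                    if k != cited and other in verbatim
                ],
            })
    return contexts

text = (
    "Trade has long been studied. Recent studies have drawn on network "
    "theory (Sindbæk 2007; Barrett 2010). This shift has implications."
)
labels = {"Sindbæk 2007": "sindbaek2007", "Barrett 2010": "barrett2010"}
contexts = extract_contexts(text, "lund2022", labels)
```

Each cited work gets its own context record, so one passage citing two works yields two records that list each other as co-occurring.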

Interpreting citation functions

Why is each reference cited, and how is it characterized?

{
  "context_id": "ctx_lund2022_sindbaek2007_001",
  "citation_function": "theoretical_framing",
  "citation_function_explanation":
    "The citing authors invoke Sindbæk's network model as the
     conceptual framework that reoriented the field away from
     centre-periphery models of Viking-period exchange. The
     citation positions this work as a paradigm shift rather
     than an incremental contribution.",
  "confidence": "high",
  "analysis_mode": "content_enriched",
  "characterization_assessment": "selective",
  "characterization_explanation":
    "Sindbæk's paper also addresses small-world network
     properties and their implications for information flow,
     which the citing authors do not engage with. The citation
     foregrounds the exchange dimension to serve the chapter's
     focus on material culture circulation."
}

Interpreting citation functions: the prompt

You are an expert in academic citation analysis. Your task is to analyze how a cited work is being used in its citing context.

Context: The following is a passage from an academic paper. The citation of interest is marked with [target citation]. Other cited works may appear in the same passage.

{context_text}

The target citation refers to: “{cited_title}” by {cited_authors} ({cited_year}).

Task: Analyze the function of this citation. Consider these common citation functions as a starting point, but do not limit yourself to them:

[controlled vocab and definitions]

Respond in JSON format with exactly these fields:

[JSON template]
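
Because the prompt asks for JSON with exactly a given set of fields, each model response can be checked before it enters the dataset. The sketch below is hypothetical: the actual template is not reproduced on this slide, so the required fields are passed in as a parameter, and the example uses three field names borrowed from the earlier sample record purely for illustration.

```python
import json

def parse_response(raw, required_fields, allowed_confidence=("high", "medium", "low")):
    """Return (data, None) for a well-formed response, else (None, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return None, f"invalid JSON: {err.msg}"
    if not isinstance(data, dict):
        return None, "response is not a JSON object"
    missing = sorted(set(required_fields) - set(data))
    extra = sorted(set(data) - set(required_fields))
    if missing or extra:
        return None, f"missing={missing} extra={extra}"
    if "confidence" in data and data["confidence"] not in allowed_confidence:
        return None, f"unexpected confidence: {data['confidence']!r}"
    return data, None

# Illustrative field list only; the real template may differ.
fields = ["citation_function", "citation_function_explanation", "confidence"]
good, good_err = parse_response(
    '{"citation_function": "theoretical_framing", '
    '"citation_function_explanation": "Invoked as a conceptual frame.", '
    '"confidence": "high"}',
    fields,
)
bad, bad_err = parse_response("Sure! Here is the analysis you asked for.", fields)
```

Rejected responses can be logged and retried rather than silently coerced, which keeps the annotation auditable.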

Detecting relationships between sources

How do authors situate prior work in conversation?

{
  "cluster_id": "cluster_007",
  "members": ["barrett2010", "sindbaek2007", "ashby2015", "skre2007"],
  "relationship_type": "theoretical_conversation",
  "relationship_name":
    "Competing models of Viking-period network formation",
  "rationale":
    "Barrett and Sindbæk are cited together to propose
     network-based models emphasizing polycentric exchange.
     Skre represents an older centralized-hub model that
     the citing author positions as a foil. Ashby provides
     the empirical data from comb production used to
     adjudicate between these frameworks.",
  "directionality":
    "Barrett and Sindbæk build on each other; both contrast
     with Skre; Ashby provides evidence for evaluation.",
  "strength": "strong"
}

Two references co-occur when they appear in the same citation context.
Repeated co-occurrence (≥2 contexts) suggests the citing author sees a relationship.
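
The co-occurrence rule can be sketched as a pair count over distinct passages, keeping pairs seen in at least two. This is an illustration under assumptions: the record shape follows the context example shown earlier, and because each cited work has its own context record, the sketch counts each passage once by treating identical (citing paper, verbatim text) pairs as the same passage.

```python
from collections import Counter
from itertools import combinations

def repeated_pairs(contexts, min_contexts=2):
    """Count reference pairs per distinct passage; keep those at or above the threshold."""
    counts = Counter()
    seen_passages = set()
    for ctx in contexts:
        passage = (ctx["citing_citekey"], ctx["verbatim_text"])
        if passage in seen_passages:  # same passage, recorded for another target
            continue
        seen_passages.add(passage)
        keys = sorted({ctx["cited_citekey"], *ctx["co_occurring_citekeys"]})
        counts.update(combinations(keys, 2))
    return {pair: n for pair, n in counts.items() if n >= min_contexts}

contexts = [
    {"citing_citekey": "lund2022", "verbatim_text": "Passage A",
     "cited_citekey": "sindbaek2007", "co_occurring_citekeys": ["barrett2010"]},
    {"citing_citekey": "lund2022", "verbatim_text": "Passage A",
     "cited_citekey": "barrett2010", "co_occurring_citekeys": ["sindbaek2007"]},
    {"citing_citekey": "lund2022", "verbatim_text": "Passage B",
     "cited_citekey": "sindbaek2007",
     "co_occurring_citekeys": ["barrett2010", "skre2007"]},
]
pairs = repeated_pairs(contexts)
```

Only the Barrett and Sindbæk pair survives here: it appears in two distinct passages, while the pairs involving Skre appear in one.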

Detecting relationships between sources: the prompt

You are an expert in academic citation analysis and bibliometrics. Your task is to analyze a group of references that frequently appear together in academic citations, and characterize the nature of their relationship.

Co-cited references: The following references frequently appear together in citation contexts: {reference_list}

Citation contexts where they co-occur: {contexts_text}

Citation function data: {function_data}

Task: Analyze the relationship between these co-cited references.

Consider:

  1. Are they doing similar work (parallel findings)?
  2. Is one building on another (methodological or theoretical lineage)?
  3. Are they being used as contrasts or foils for each other?
  4. Do they address complementary aspects of a problem?
  5. Are they part of a theoretical conversation or debate?
  6. Is there another type of relationship?

Respond in JSON format with exactly these fields: [JSON template]

Coverage and corpus assembly

How complete is our corpus?

This module:

  • Identifies success rates for each detection method applied to all records
  • Attempts recovery of missing references through online queries (Crossref, OpenAlex)
  • For F2, downloads open access content via the Unpaywall API and estimates how much manual effort is required to complete the corpus
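
The first step, success rates per detection method, reduces to simple bookkeeping once every record notes which methods worked on it. The record shape and function names below are hypothetical, and the online recovery and Unpaywall steps are left out of this sketch.

```python
def method_success_rates(records, methods):
    """Share of all records on which each method succeeded."""
    total = len(records)
    if total == 0:
        return {m: 0.0 for m in methods}
    return {
        m: sum(1 for r in records if r["results"].get(m)) / total
        for m in methods
    }

def unrecovered(records):
    """Citekeys that no method managed to detect (candidates for manual work)."""
    return [r["citekey"] for r in records if not any(r["results"].values())]

# Toy data: one dict of per-method outcomes per record
records = [
    {"citekey": "a", "results": {"regex": True, "grobid": True, "llm": True}},
    {"citekey": "b", "results": {"regex": False, "grobid": True, "llm": True}},
    {"citekey": "c", "results": {"regex": False, "grobid": False, "llm": False}},
    {"citekey": "d", "results": {"regex": False, "grobid": False, "llm": True}},
]
rates = method_success_rates(records, ["regex", "grobid", "llm"])
gaps = unrecovered(records)
```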

Why use an LLM at all?

The BibVik protocol considered several options:

Approach                         Limitation
Manual coding                    Thousands of references across hundreds of documents
Scite Zotero plugin              Biased toward “neutral mention”, limited corpus
Jurgens (2018) open-source NLP   Requires NLP expertise to implement
Local LLM (this toolkit)         Focused on specific tasks, auditable, all data stays local

The LLM is not replacing human judgment.
It provides a systematic annotation that surfaces patterns for human review.

Limitations of LLMs

  • No domain-specific training in citation analysis
  • No calibrated confidence (high/medium/low is a heuristic, not a probability score)
  • Cannot detect implicit or distributed citation functions
  • Each context is classified independently — no cross-context reasoning
  • LLMs are slow, especially when using non-specialized hardware
  • At temperature 0.3, results are mostly, but not fully, reproducible across runs

Next steps

  1. Full run on the complete F1 corpus (many hours on a decent laptop)
  2. Human review of a stratified sample of LLM classifications
  3. Integration with the qualitative annotation protocol
  4. Exploratory statistical and network analyses of the LLM classifications
  5. Extend to F2 generation as coverage permits
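
For step 2, a stratified sample could be drawn by grouping classifications by citation function and sampling a fixed number from each group. This is a sketch under assumptions: stratifying on `citation_function`, the per-stratum quota, and the fixed seed are illustrative choices, not settings from the protocol.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, per_stratum, seed=0):
    """Draw up to per_stratum rows from each group defined by key."""
    rng = random.Random(seed)  # fixed seed so the review sample can be redrawn
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Toy classifications: five of one function, one of another
rows = (
    [{"context_id": f"ctx_{i}", "citation_function": "evidence"} for i in range(5)]
    + [{"context_id": "ctx_9", "citation_function": "foil"}]
)
review_set = stratified_sample(rows, "citation_function", per_stratum=2)
```

Sampling per stratum keeps rare citation functions in front of the human reviewers instead of letting the most common label dominate.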

Thank you!

All code, prompts, and controlled vocabularies: github.com/zackbatist/BibVik

These slides are available at: zackbatist.info/BibVik/analytical-protocol-slides