BibVik – A Protocol for Analyzing Citational Practice in Viking Age Archaeology

The problem with counting citations

Citations are not neutral. They are rhetorical acts.

A citation can:

Provide evidence for a claim
Establish a methodological lineage
Set up a contrast or foil
Signal membership in a scholarly community
Exclude or render invisible

Standard bibliometric tools count that something is cited.
They cannot tell us why or how.

What BibVik asks

How do various perspectives on the origins of the Viking Age manifest in the citation record?
Are these perspectives rooted in personal or professional identities and relationships?
How are citations used to situate one’s research in relation to prior work?

The seed paper

Lund & Sindbæk (2022), “Crossing the Maelstrom: New Departures in Viking Archaeology”

Review of shifting paradigms in Viking Age research
574 bibliographic references
Authored by scholars representing different, yet converging, research traditions
Supplemented by seven anonymous peer reviewers who recommended additional sources

Generational model:

Generation	What it contains
P	Seed paper
F1	Papers cited by P
F2	Papers cited by F1
F3+	… and so on
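
The generational labels can be derived from the citation graph with a breadth-first walk from the seed paper. The following is a minimal sketch, not the project's code: the function name, the toy graph, and the rule that a work reached by several routes keeps its shallowest generation are all assumptions.

```python
from collections import deque

def assign_generations(seed, cites):
    """Label each work with the generation at which it is first reached from the seed."""
    depth = {seed: 0}                      # 0 stands for P, 1 for F1, 2 for F2, ...
    queue = deque([seed])
    while queue:
        work = queue.popleft()
        for cited in cites.get(work, []):
            if cited not in depth:         # assumption: keep the shallowest generation
                depth[cited] = depth[work] + 1
                queue.append(cited)
    return {w: "P" if d == 0 else f"F{d}" for w, d in depth.items()}

# Hypothetical toy graph: each key cites the works in its list.
cites = {
    "lund2022": ["sindbaek2007", "aannestad2018"],
    "aannestad2018": ["sindbaek2007", "skre2007"],
}
labels = assign_generations("lund2022", cites)
```

In this toy graph a work cited by both P and an F1 paper is labelled F1, because the walk reaches it from the seed first.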

Why not use open scholarly metadata?

These data were never intended to be used for bibliometric analysis
Analysis tends to follow the variables readily at hand, rather than being driven by research questions

Scopus, Web of Science, OpenAlex, CrossRef — all tested, all insufficient
Poor coverage of monographs, edited volumes, grey literature, non-English sources — Viking Age archaeology is all of these

We need to generate data tailored to our questions, drawn directly from the source: the publications themselves

But why not just use open scholarly databases such as Crossref, like everyone else?

The short answer is that these databases were never designed to be analysed.

They’re publisher registries and DOI assignment systems.

Studies end up analysing the handful of variables they provide because they’re available, not because they answer the research question.

Sampling and quality assurance are surprisingly uncommon in bibliometric work.

For our corpus, the coverage problem alone is especially troubling — monographs, edited volumes, and grey literature in Scandinavian languages are exactly what these databases handle worst.

But even with perfect coverage, the variables on offer wouldn’t enable us to answer our questions.

So we decided to generate our own dataset, and build the analytical structure around what we actually need.

Our analytical protocol

Creating a citation graph — structuring who cites whom, across generations
Extracting citation contexts — capturing the actual words around each reference
Interpreting citation functions — classifying the rhetorical work each citation performs
Detecting relationships between sources — characterizing how co-cited references relate
Identifying gaps and assessing coverage — measuring what we can’t see

Creating a citation graph

Which works are cited, and by whom?

Extract structured references directly from PDFs
Cycle through several methods: regex, GROBID, and a local LLM
Explicitly account for humanities citation conventions that standard tools miss
- Citations in footnotes, citations embedded in prose, etc.
Remove duplicates and redundancies

Rather than relying on any single extraction method, we apply five different detection methods to every paper and merge the results.

GROBID is a machine learning system trained on scientific journal articles that gives us easy access to structured bibliographic data.

However, it is trained on a limited dataset that excludes much of what constitutes our corpus.

So we supplement this by also scanning the body text independently with regular expression patterns for standard citation forms, and with a local LLM for discursive references that do not fit a standard pattern.

This is especially useful for extracting references from footnotes or from contexts where they are embedded in prose.
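
As an illustration of the regular expression pass, the sketch below picks out parenthetical author-year citations. The two patterns and the function name are assumptions chosen for this example, not the patterns BibVik uses, and they cover only one standard citation form.

```python
import re

# Assumed form: parenthetical groups such as "(Sindbæk 2007; Barrett 2010)".
PAREN = re.compile(r"\(([^()]*\d{4}[a-z]?[^()]*)\)")
# One item inside the parentheses: an author string followed by a year.
ITEM = re.compile(r"(?P<author>[^\W\d_][\w.'\- &]*?),?\s+(?P<year>\d{4}[a-z]?)")

def find_author_year(text):
    """Return (author, year) pairs for every parenthetical author-year citation."""
    hits = []
    for group in PAREN.finditer(text):
        for part in group.group(1).split(";"):
            match = ITEM.search(part.strip())
            if match:
                hits.append((match.group("author").strip(), match.group("year")))
    return hits

sample = ("Exchange has been reframed as polycentric (Sindbæk 2007; Barrett 2010), "
          "a point taken up again later (see below).")
found = find_author_year(sample)
```

A pattern like this would also treat lead-in words such as "see" or "cf." as part of the author string, which is one reason a single pattern is not enough for discursive references.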

All five methods run on every paper.

The results are merged and deduplicated — the same work cited across multiple papers with slightly different metadata resolves to one entry.

In practice, the additional methods catch 10 to 40 percent more citations than the reference list alone, and for some papers where GROBID fails entirely, they’re the only source of data.
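
The merge step can be sketched as a union keyed on normalized metadata. This is a simplified illustration under stated assumptions: the comparison key (year plus lowercased title words), the `found_by` field, and the sample records are hypothetical, and real matching would need to tolerate larger differences than punctuation and case.

```python
import re

def match_key(ref):
    """Comparison key: year plus the title reduced to lowercase words."""
    title = re.sub(r"[\W_]+", " ", ref["title"].casefold()).strip()
    return (str(ref["year"]), title)

def merge_detections(results_by_method):
    """Union the references from each method, remembering which methods found each."""
    merged = {}
    for method, refs in results_by_method.items():
        for ref in refs:
            entry = merged.setdefault(match_key(ref), {**ref, "found_by": []})
            if method not in entry["found_by"]:
                entry["found_by"].append(method)
    return list(merged.values())

# Hypothetical output of two detection methods for one paper.
results = {
    "grobid": [{"title": "The Small World of the Vikings", "year": 2007}],
    "regex": [
        {"title": "The small world of the Vikings.", "year": "2007"},
        {"title": "A hypothetical footnote-only reference", "year": 1998},
    ],
}
merged = merge_detections(results)
```

Keeping the list of methods per entry also makes it possible to see which references only the supplementary methods caught.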

Creating a citation graph

Extract bibliographic metadata into a flat data file, using BibLaTeX elements
Each work appears once, assigned a unique citekey
Names, titles and other values are normalized, even when different papers use slightly different spelling or formatting
Each entry records every paper that cites it

{
  "citekey": "sindbaek2007",
  "entry_type": "article",
  "title": "The Small World of the Vikings",
  "author": [
    {"family": "Sindbæk", "given": "Søren M."}
  ],
  "journaltitle": "Norwegian Archaeological Review",
  "volume": "40", "pages": "59--74",
  "date": "2007",
  "generation": "F1",
  "cited_by": ["lund2022", "aannestad2018"]
}
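
A citekey like the one above can be built from the first author's family name and the year. The sketch below is one possible scheme, not necessarily the one BibVik uses: the transliteration table and both function names are assumptions, and it does not handle two works by the same author in the same year.

```python
import re
import unicodedata

# Assumed transliterations for letters that Unicode decomposition leaves untouched.
SPECIAL = str.maketrans({"æ": "ae", "ø": "o", "å": "a", "ð": "d", "þ": "th"})

def make_citekey(family, year):
    """Lowercase ASCII family name followed by the year, e.g. 'sindbaek2007'."""
    name = family.lower().translate(SPECIAL)
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))  # drop accents
    return re.sub(r"[^a-z0-9]", "", name) + str(year)

def add_citation(index, cited_key, citing_key):
    """Record that citing_key cites cited_key, keeping one entry per cited work."""
    entry = index.setdefault(cited_key, {"citekey": cited_key, "cited_by": []})
    if citing_key not in entry["cited_by"]:
        entry["cited_by"].append(citing_key)

index = {}
key = make_citekey("Sindbæk", 2007)
add_citation(index, key, "lund2022")
add_citation(index, key, "aannestad2018")
add_citation(index, key, "lund2022")   # repeated citation, ignored
```

The explicit table matters because letters such as æ and ø have no Unicode decomposition, so accent stripping alone would simply delete them.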

Extracting citation contexts

Where and how is each reference invoked?

With the graph in place, we can locate where each reference is invoked — and capture what the citing author actually says.

{
  "context_id": "ctx_lund2022_sindbaek2007_001",
  "verbatim_text": "Recent studies have drawn on network theory to
    reframe Viking-period exchange as polycentric rather than
    hub-driven (Sindbæk 2007; Barrett 2010). This shift has
    implications for how we understand the relationship between
    centres like Birka and Hedeby and their wider hinterlands.",
  "citing_citekey": "lund2022",
  "cited_citekey": "sindbaek2007",
  "co_occurring_citekeys": ["barrett2010"]
}

Extracts the verbatim text — not a summary.
And the co-occurring references — what company each citation keeps.
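
A rough way to capture a context is to take the sentence containing the citation plus its neighbours, then list the other citations in that window. The sketch below assumes author-year citations and a one-sentence window on either side; the sentence splitter, the patterns, and the function name are illustrative assumptions.

```python
import re

# Naive sentence boundary: end punctuation, whitespace, then an ASCII capital.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Any "Name 2007"-style citation inside a passage.
CITED = re.compile(r"([^\W\d_]+) (\d{4})")

def extract_contexts(text, family, year, window=1):
    """Return the verbatim passage around each mention of the target citation."""
    sentences = SENTENCE_END.split(text)
    target = f"{family} {year}"
    contexts = []
    for i, sentence in enumerate(sentences):
        if target in sentence:
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            passage = " ".join(sentences[lo:hi])
            others = [f"{a} {y}" for a, y in CITED.findall(passage)
                      if f"{a} {y}" != target]
            contexts.append({"verbatim_text": passage, "co_occurring": others})
    return contexts

text = ("Recent studies have drawn on network theory to reframe Viking-period "
        "exchange as polycentric rather than hub-driven (Sindbæk 2007; Barrett 2010). "
        "This shift has implications for how we understand the relationship between "
        "centres like Birka and Hedeby and their wider hinterlands.")
contexts = extract_contexts(text, "Sindbæk", 2007)
```

The passage is returned unchanged, so whatever is classified later is the citing author's own wording.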

Interpreting citation functions

Why is each reference cited, and how is it characterized?

{
  "context_id": "ctx_lund2022_sindbaek2007_001",
  "citation_function": "theoretical_framing",
  "citation_function_explanation":
    "The citing authors invoke Sindbæk's network model as the
     conceptual framework that reoriented the field away from
     centre-periphery models of Viking-period exchange. The
     citation positions this work as a paradigm shift rather
     than an incremental contribution.",
  "confidence": "high",
  "analysis_mode": "content_enriched",
  "characterization_assessment": "selective",
  "characterization_explanation":
    "Sindbæk's paper also addresses small-world network
     properties and their implications for information flow,
     which the citing authors do not engage with. The citation
     foregrounds the exchange dimension to serve the chapter's
     focus on material culture circulation."
}

This is the core analytical contribution.

A local language model reads each citation context and classifies the rhetorical work the citation performs.

The explanation matters more than the label — it’s grounded in the specific language of the passage.

For instance, here Sindbæk 2007 is classified as theoretical framing: invoked as a paradigm shift, not just a finding.

And we’re still experimenting with this, but when we have the full text of the cited work, the analysis can go deeper by comparing the citing passage against what Sindbæk actually wrote.

In this case, the model found that the citation of Sindbæk 2007 was somewhat selective because his paper also addresses information flow and small-world properties, but the citing authors foreground only the exchange dimension.

That gap between how a work is cited and what it actually says is exactly the kind of citational practice this project aims to make visible.

Interpreting citation functions: the prompt

You are an expert in academic citation analysis. Your task is to analyze how a cited work is being used in its citing context.

Context: The following is a passage from an academic paper. The citation of interest is marked with [target citation]. Other cited works may appear in the same passage.

{context_text}

The target citation refers to: “{cited_title}” by {cited_authors} ({cited_year}).

Task: Analyze the function of this citation. Consider these common citation functions as a starting point, but do not limit yourself to them:

[controlled vocab and definitions]

Respond in JSON format with exactly these fields:

[JSON template]
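
Filling the placeholders and reading the reply can be sketched as follows. The field names checked here are taken from the example record shown earlier and are an assumption about the elided JSON template; the function names are hypothetical, and the call to the model itself is left out.

```python
import json

# The target line of the prompt as shown above; the rest of the template is omitted.
TARGET_LINE = "The target citation refers to: “{cited_title}” by {cited_authors} ({cited_year})."

# Assumed required fields, taken from the example record rather than the template.
REQUIRED = ("citation_function", "citation_function_explanation", "confidence")

def describe_target(work):
    """Fill the target line from a bibliographic record."""
    authors = ", ".join(f"{a['given']} {a['family']}" for a in work["author"])
    return TARGET_LINE.format(cited_title=work["title"],
                              cited_authors=authors,
                              cited_year=work["date"])

def parse_reply(reply):
    """Return the JSON object in a model reply, or None if it is missing or incomplete."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        record = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or any(k not in record for k in REQUIRED):
        return None
    return record

work = {"title": "The Small World of the Vikings", "date": "2007",
        "author": [{"family": "Sindbæk", "given": "Søren M."}]}
line = describe_target(work)
```

Returning None rather than raising lets a long run log the failed context and continue, so malformed replies can be retried or reviewed by hand.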

Detecting relationships between sources

How do authors situate prior work in conversation?

{
  "cluster_id": "cluster_007",
  "members": ["barrett2010", "sindbaek2007", "ashby2015", "skre2007"],
  "relationship_type": "theoretical_conversation",
  "relationship_name":
    "Competing models of Viking-period network formation",
  "rationale":
    "Barrett and Sindbæk are cited together to propose
     network-based models emphasizing polycentric exchange.
     Skre represents an older centralized-hub model that
     the citing author positions as a foil. Ashby provides
     the empirical data from comb production used to
     adjudicate between these frameworks.",
  "directionality":
    "Barrett and Sindbæk build on each other; both contrast
     with Skre; Ashby provides evidence for evaluation.",
  "strength": "strong"
}

Two references co-occur when they appear in the same citation context.
Repeated co-occurrence (≥2 contexts) suggests the citing author sees a relationship.
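
The co-occurrence rule can be sketched as a pair count over context records. This is an illustration under assumptions: it reuses the field names from the context example, and it treats records with the same citing paper and the same verbatim text as one passage so that a passage is not counted once per cited work.

```python
from collections import Counter
from itertools import combinations

def co_occurrence_pairs(contexts, min_contexts=2):
    """Count how many distinct passages cite each pair of works together."""
    passages = {}
    for ctx in contexts:
        passage = (ctx["citing_citekey"], ctx["verbatim_text"])
        passages.setdefault(passage, set()).update(
            [ctx["cited_citekey"], *ctx["co_occurring_citekeys"]])
    counts = Counter()
    for keys in passages.values():
        counts.update(combinations(sorted(keys), 2))
    return {pair: n for pair, n in counts.items() if n >= min_contexts}

# Hypothetical context records: passage A yields one record per cited work.
contexts = [
    {"citing_citekey": "lund2022", "verbatim_text": "passage A",
     "cited_citekey": "sindbaek2007", "co_occurring_citekeys": ["barrett2010"]},
    {"citing_citekey": "lund2022", "verbatim_text": "passage A",
     "cited_citekey": "barrett2010", "co_occurring_citekeys": ["sindbaek2007"]},
    {"citing_citekey": "aannestad2018", "verbatim_text": "passage B",
     "cited_citekey": "sindbaek2007",
     "co_occurring_citekeys": ["barrett2010", "skre2007"]},
]
pairs = co_occurrence_pairs(contexts)
```

Only the pair that appears in both passages passes the threshold; pairs seen once are dropped.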

Detecting relationships between sources: the prompt

You are an expert in academic citation analysis and bibliometrics. Your task is to analyze a group of references that frequently appear together in academic citations, and characterize the nature of their relationship.

Co-cited references: The following references frequently appear together in citation contexts: {reference_list}

Citation contexts where they co-occur: {contexts_text}

Citation function data: {function_data}

Task: Analyze the relationship between these co-cited references.

Consider:

Are they doing similar work (parallel findings)?

Is one building on another (methodological or theoretical lineage)?

Are they being used as contrasts or foils for each other?

Do they address complementary aspects of a problem?

Are they part of a theoretical conversation or debate?

Is there another type of relationship?

Respond in JSON format with exactly these fields: [JSON template]

Coverage and corpus assembly

How complete is our corpus?

This module:

Identifies success rates for each detection method applied to all records
Attempts recovery of missing references through online queries (Crossref, OpenAlex)
For F2, downloads open access content via the Unpaywall API and estimates how much manual effort is required to complete the corpus
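
The per-method success rate can be computed as the share of records each method found. The sketch below assumes every record carries a hypothetical `found_by` list naming the methods that detected it; neither that field nor the function name comes from the BibVik data model.

```python
from collections import Counter

def method_success_rates(records, methods):
    """Share of all records that each detection method found."""
    total = len(records)
    hits = Counter(m for r in records for m in set(r["found_by"]))
    return {m: (hits[m] / total if total else 0.0) for m in methods}

# Hypothetical records for one paper.
records = [
    {"citekey": "a", "found_by": ["grobid", "regex"]},
    {"citekey": "b", "found_by": ["regex"]},
    {"citekey": "c", "found_by": ["grobid", "regex", "llm"]},
    {"citekey": "d", "found_by": []},
]
rates = method_success_rates(records, ["grobid", "regex", "llm"])
```

A record that no method found, like the last one here, still counts in the denominator, so the rates reflect coverage of the whole record set.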

Why use an LLM at all?

The BibVik protocol considered several options:

Approach	Limitation
Manual coding	Does not scale: thousands of references across hundreds of documents
Scite Zotero plugin	Biased toward “neutral mention”, limited corpus
Jurgens (2018) open-source NLP	Requires NLP expertise to implement
Local LLM (this toolkit)	Focused on specific tasks, auditable, all data stays local

The LLM is not replacing human judgment.
It provides a systematic annotation that surfaces patterns for human review.

Limitations of LLMs

No domain-specific training in citation analysis
No calibrated confidence (high/medium/low is a heuristic, not a probability score)
Cannot detect implicit or distributed citation functions
Each context is classified independently — no cross-context reasoning
LLMs are slow, especially when using non-specialized hardware
At temperature 0.3, results are mostly but not fully reproducible

Next steps

Full run on the complete F1 corpus (many hours on a decent laptop)
Human review of a stratified sample of LLM classifications
Integration with the qualitative annotation protocol
Exploratory statistical and network analyses of the LLM classifications
Extend to F2 generation as coverage permits
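
Drawing the stratified review sample could look like the sketch below, which stratifies by citation function so that rare labels are not crowded out by common ones. The stratum size, the fixed seed, the function name, and the `contrast_or_foil` label are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, per_stratum=2, key="citation_function", seed=0):
    """Draw up to per_stratum records from each label, reproducibly."""
    rng = random.Random(seed)              # fixed seed so the sample can be redrawn
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for label in sorted(strata):
        group = strata[label]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical classifications: one common label and one rare label.
records = (
    [{"context_id": f"ctx_{i}", "citation_function": "theoretical_framing"}
     for i in range(5)]
    + [{"context_id": "ctx_5", "citation_function": "contrast_or_foil"}]
)
sample = stratified_sample(records)
```

Stratifying on the model's confidence label instead would only require passing a different `key`.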

Thank you!