On legitimate uses of AI and open information commons, and commitment to community values
Zachary Batist, PhD
McGill University
Re-Defining Open Social Scholarship in an Age of Generative "Intelligence"
Implementing New Knowledge Environments / Canadian Society for Digital Humanities
zackbatist@archaeo.social | zackbatist | 0000-0003-0435-508X | zackbatist.info
Researchers share their work openly.
AI consumes it at scale.
Is this a new problem, or an old one made visible?
Researchers are obligated to share their work openly — for public benefit.
That work is being scraped to train AI models — for someone else’s benefit.
We are told this requires new rules, new laws, new technical processes.
But this framing takes the status quo for granted.
Translating data from original contexts to secondary applications has always been lossy.
This predates AI.
Researchers have always participated in information commons:
| Technical camp | Humanistic camp |
|---|---|
| Infrastructures & mandates | Reforming science as inclusive |
| Information as disembodied | Knowledge as socially situated |
| Problems are legalistic in nature | Problems are rooted in culture |
The technical camp dominates policy. The humanistic camp understands the actual problem.
“Universal access” sounds egalitarian.
But even the most committed open advocates expect citation, attribution, appropriate reuse.
Commons have never been, and don’t need to be, egalitarian.
They are governed — who contributes, who accesses, in what ways.
The boundary is not a wall. It’s a sniff test —
judged by behavior, care, and commitment, not credentials alone.
| Aaron Swartz | Meta |
|---|---|
| Shared academic papers widely | Scraped LibGen at scale |
| → Lionized | → Condemned |
Same act. Radically different valence.
One serves the community — provides access to underserved members, contributes to the commons.
The other extracts from it — a predatory actor in it for themselves alone.
The difference is relational, not technical or legal.
The National Emergency Library (2020) — granting access to books during Covid-19.
It applied a technical resolution to an underlying social problem.
The manner of the intervention mattered as much as the outcome.
Computer scientists / data scientists / physicists publish high-profile papers “solving” archaeological problems with large integrated datasets.
| | Outside the community | Inside the community |
|---|---|---|
| Treatment of data | Neutral, abstract, recombinant | Storied, situated, context-laden |
| Epistemic stance | Data taken as source of universal truth | Contributing one point of view |
| Community response | Intuitively rejected | Treated with consideration |
The difference is not credentials. It’s care —
understanding a dataset’s circumstances of creation comes from community membership.
We put our data out there. We made it FAIR.
We imagined certain use-cases. Certain users.
AI scraped it for different ends.
This is structurally similar to UNESCO World Heritage claims:
“for all humanity” — often at the expense of those who inhabit and maintain the site.
The problem isn’t AI per se.
It’s AI deployed without community membership — without the gradual participation that builds the tacit knowledge to use data responsibly.
Is AI being done by humanists,
or by engineers playing humanist
for an afternoon?
The humanities are not defined by the kinds of data they use.
They are defined by the care we take and the approaches we adopt:
the training, the community embedding, the commitments.
You can’t substitute a flashy method for a good question,
a rigorous orientation, or an open mind.
Tool-fetishizing has always been a dead end.
This is actually a case for humanists to engage with AI —
critically, carefully, on their own terms —
because humanists are precisely the ones equipped to do so.
Re-defining open social scholarship is not about
redrawing the line from scratch.
It’s about making the existing lines more legible —
more porous where appropriate,
defended where they matter.
The line isn’t gone.
It’s ours to tend.
These slides and the abstract are available at:
zackbatist.info/inke-2026