Open archaeology, open source?

Collaborative practices in an emerging community of archaeological software engineers

Published

February 7, 2024

Abstract

Surveying the first quarter-century of computer applications in archaeology, Scollar (1999) lamented that the field relied almost exclusively on “hand-me-down” tools repurposed from other disciplines. Twenty-five years later, this is no longer the case: computational archaeologists often find themselves practicing the dual roles of data analyst and research software engineer (Baxter et al. 2012; Schmidt and Marwick 2020), developing and applying new tools that are tailored specifically to archaeological problems and archaeological methods. Though this trend can be traced to the very earliest days of the field (Cowgill 1967), its most recent manifestation is distinguished by its apparent embrace of practices from free and open source software. Most prominently, since around 2015, there has been a rapid uptake of workflow tools designed for open source development communities, such as the version control system git and associated online source code management platforms (e.g. GitHub, GitLab). These tools facilitate collaboration among developers and users of open source software using patterns that can diverge quite radically from conventional scholarly norms (Tennant et al. 2020).

In this paper, we investigate modes of collaboration in this emerging community of practice using ‘open-archaeo’, a curated list of archaeological software, and data on the activity of associated GitHub repositories and users. We conduct an exploratory quantitative analysis to characterize the nature and intensity of these collaborations and map the collaborative networks that emerge from them. We document uneven adoption of open source collaborative practices beyond the basic use of git as a version control system and GitHub to host source code. Most projects do make use of collaborative features and, through shared contributions, we can trace a collaborative network that includes the majority of archaeologists active on GitHub. However, a majority of repositories have 1–3 contributors, with only a few projects distinguished by an active and diverse developer base. Direct collaboration on code or other repository content (as opposed to the more passive, social media-style interaction that GitHub supports) remains very limited. In other words, there is little evidence that archaeologists’ adoption of open source tools (git and GitHub) has been accompanied by the decentralized, participatory forms of collaboration that characterise other open source communities. On the contrary, our results indicate that research software engineering in archaeology remains largely embedded in conventional professional norms and organizational structures of academia.

1 Introduction

In their seminal vision for open archaeology, Beck and Neylon (2012, 480–81) identified the movement as comprising a series of principles and practices “predicated on promoting open redistribution and access to the data, processes and syntheses generated within the archaeological domain” with the aim of “maximizing transparency, reuse and engagement while maintaining professional probity”. Open archaeology therefore promotes more thoughtful scholarly communication practices, most notably in operations relating to publishing, data-sharing, education, and review processes. Archaeologists are also actively engaged in open source software development as a means of sharing their research processes and creating tools and resources for general use.

The relationship between open science and open source is complicated by rhetorical claims that have a questionable connection to how academics actually do open source. Does academic open source actually make research processes more transparent and improve research outcomes? Is it actually boosting efficiency by establishing a common store of knowledge and productive code? Is it actually helping to foster new globe-spanning connections and lead to novel research trajectories that would not otherwise come to pass? Basically, is there more to it than just uploading code and data files to the internet?

Despite hopeful aspirations espoused by open archaeology advocates (see Kansa, Whitcher Kansa, and Arbuckle 2014; Kintigh et al. 2015), these outcomes are not a given. Following Nguyễn and Rampin (2022), Pownall et al. (2023) and Leonelli (2023), we believe that these outcomes only actually arise in contexts where participants adhere to and are motivated by warrants, professional norms, and governance strategies that encourage these results. However, practical circumstances and systemic value regimes that frame what it means to work as an archaeologist presently inhibit the potential for radical transformation, even among open science’s most ardent supporters.

There is no question that archaeologists are prolific software developers (Batist and Roe 2023). But beyond simply making their code available on the web, do archaeologists also implement social strategies to advance open source ideals? Does archaeological open source actually help achieve greater transparency, sustainability, and community participation? And if not, what does it actually achieve?

This article presents a survey of archaeological software development with two goals in mind:

  1. we identify what kinds of software archaeologists are making; and
  2. we evaluate how archaeologists create these tools, with particular emphasis on practices of collaboration.

We use quantitative analysis to consider how archaeological software development may be benefiting from, or missing out on, the affordances that open source development models provide, specifically the value added through working as part of a broader community of invested stakeholders, processes of iterative improvement, and increased code transparency.

2 Open science and open source

Academic open source has a complicated relationship with open source as practiced by professional software developers, which has its own distinct history and is framed by different objectives, challenges, and value regimes. Despite this, the open science movement, within which open archaeology emerged, draws direct inspiration from open source. For instance, the Open Knowledge Foundation (2015) publishes a widely accepted definition of “open” in the context of scholarly communication that explicitly refers to the definition of “open source” published by the Open Source Initiative (2007), an authoritative open source advocacy group. The open science movement further mimics open source by operationalizing scholarly communication through technical infrastructures and protocols that closely resemble systems and processes designed to develop open source software (e.g., the use of plain text, line-resolution version control, emphasis on formal licensing, the general hacker aesthetic). However, academic work, including the development of academic software, differs significantly from the work involved in massive open source projects that literally run the internet, such as the Linux kernel, OpenSSL and the Firefox web browser. While they may use similar tools and technical protocols to manage coding operations, the open science and open source movements are governed by different social and professional warrants and interests. In other words, publishing code openly on the web has different meanings, impacts and implications for archaeologists and professional software developers (Ratto 2007; Kelty 2008, chap. 9).

2.1 Open source

Open source is a software development model that prioritizes transparent work processes. Initially driven by the idea that computer users should be free to understand and manipulate the software that they install on their computers (e.g., “free software”, as initially conceived by the Free Software Foundation), open source has become a means of collaborative software development (Kelty 2008: chapter 3, especially starting at page 99). Putting one’s code on the web without restriction on how it may be used or manipulated encourages creativity to flourish as people contribute to help improve the code base. Software thus emerges from the coordinated labour of worldwide volunteers, who shape the product according to the collective vision (see Kelty 2008 on the emacs saga for a quintessential example). An open code base may also be used to support alternative projects whose missions diverge from the original plan, and an entire project may be “forked”, or taken in a new direction, if contributors are dissatisfied with how core developers run things.

Open source has traditionally been referred to as being based on meritocratic principles (Raymond 1999, 39). A good test of whether a contribution should be included in a published software release is whether it is functional (Kelty 2008, 220). Moreover, with more eyes looking over a code base, it is easier to identify flaws with a contribution and flag potential bugs or security issues (Raymond 1999, 27–30). This is all done in the spirit of producing functional code, and in ideal circumstances faulty contributions will be corrected before inclusion. Personal ego is minimized in favour of co-creating stable and functional outcomes (Raymond 1999, 39–41).

However, this is not the same as saying that open source is completely anarchic or based on the “wisdom of crowds”. In fact, successful open source projects incorporate complex organizational structures, governance strategies, and forms of social mediation to help delegate and vet contributions made by distributed participants (O’Neil 2009). They rely on, rather than eschew, institutional support structures, in order to motivate work, keep volunteer maintainers involved, and generally ensure that the project can be sustained over the long term. Open source is more than just putting your code online; to be successful, it requires participation in a social experience (Ratto 2003; Kelty 2008).

In other words, as with many so-called “soft skills” that are crucial for academic professional development, additional competencies relating to the maintenance, management, and distribution of software, such as the ability to receive and implement feedback, set and stick with long-term goals, coordinate labour, document work practices, and collaborate with others, are grossly under-appreciated factors that contribute to an open source project’s success.

We therefore consider open source to be a means of collaboration more than a means of transmitting information. It involves developing software as part of a group, developing consensus, and working with common purpose. Crucially, it also involves having a welcoming attitude, a sense of humility, and an understanding that one’s work may be appropriated and used in unanticipated ways.

2.2 Open science

The open science movement comprises a series of practices and principles intended to make research more accessible, transparent, and efficient. Although the concept of “open” is somewhat nebulous, in terms of its abstract definition and with regard to what real-world applications count as being open, one commonly-cited definition describes content that “can be freely used, modified, and shared by anyone for any purpose” (Open Knowledge Foundation 2015). This definition does not state what open is for, how to be open, or any sort of social or discursive framing behind the open movement. However, most open science advocates (including archaeologists, as elicited by Beck and Neylon (2012) and Marwick et al. (2017)) claim that they are motivated by a desire to facilitate novel research opportunities, make participation in scientific research more equitable, reclaim science as a public good, and enhance how findings are validated and legitimized.

The idea that scientists should generally contribute to a public domain of knowledge without profit motive has led to open science being heralded as a revolutionary, community-oriented, and anti-capitalist means of production. However, while open science does have the potential to effect radical change, this is not a given. The social and institutional contexts in which we do science are firmly embedded within capitalist and neoliberal power structures that reward individualistic competition and do little to actually encourage equitable and accessible research practices, and as such make it difficult to fully embrace open science ideals (Mirowski 2018). Moreover, the open science movement, which is dominated by STEM disciplines, prioritizes a grossly simplified and asocial notion of what science is and entails. Namely, it considers science as the accumulation and assembly of a species-level understanding of the world, which is not held by any one individual but is stored in seemingly value-neutral and disembodied media, facts and observations. This is manifested by digital telecommunications systems that host files, document processes, facilitate co-working opportunities, and perform automated processes. However, these systems have become so emblematic of open science that the use of tools and resources designed to support open science is often mistaken for actually doing open science.

Open science is typically compared with the open source movement in that they both involve a distributed, digitally-mediated and worldwide labour force, who somehow derive rough consensus directed towards assets held in the public domain (Tennant et al. 2020). But they differ in terms of the contexts in which they operate, the stakeholders involved, and the kinds of outcomes they produce. Whereas open source emerged from concern for consumer rights and then developed as a means of maintaining resilient and collectively motivated projects, open science comes out of a desire to make research practices more transparent and accessible. Open source is performed by professional and hobbyist software developers alike, and participants contribute in a wide variety of ways (including programming, writing documentation, translating software and documentation, bug reporting, and financial support), but, in open science, scientists are usually the only participants actively involved in creating and maintaining contributions. Moreover, whereas open source projects often attract participants with varied stakes in the software and use cases in mind, open science projects are typically bounded by small communities of specialists with very particular needs (Kling, McKim, and King 2003). Additionally, open science is bounded by the professional contexts in which science operates, and as such, produces outputs that can be easily credited to specific sets of individuals for reasons of resume-building, tenure and promotion (Mirowski 2018; Dorta-González, González-Betancor, and Dorta-González 2021). Open science projects whose contributions are supported by research funding also face sustainability concerns, as participants lose motivation to contribute once funding runs out (Carver et al. 2022; Adema and Moore 2021).
Once a project is completed, papers have been published, and credit has been allocated, it is common for scientists to mark their projects as finished and move on to new endeavours (Kelty 2008: 271-275; Howison and Herbsleb 2013). Open source projects, on the other hand, are motivated by a more practical need for the software to function properly in perpetuity, and contributors may remain actively or sporadically involved to satisfy users’ needs, or to direct users to derivative and functional forks of abandoned software (Kelty 2008: 278-281; Coleman 2012: 116-122; Hippel and Krogh 2003).

The adoption of open source development models among archaeologists is generally informed by the broader open science movement, which is motivated by a genuine desire to facilitate novel research opportunities, to make participation in scientific research more equitable, to reclaim science as a public good, and to enhance the means of validating findings. However, the predominant concern with identifying the best tools to use, adopting optimal data processing pipelines, and tying into global, web-based infrastructures, protocols and standards (cf. Kansa, Whitcher Kansa, and Arbuckle 2014; Kintigh et al. 2015; Roosevelt et al. 2015) distracts from fundamental tensions and contradictions regarding the actual value of working in the open. For instance, Faniel et al. (2013, 299–301), Atici et al. (2013, 676–77), Huggett (2018, 2022), Sobotkova (2018), Opitz et al. (2021), Hacıgüzeller, Taylor, and Perry (2021), and Batist (2023) demonstrate that to make the reuse of archaeological data feasible and useful in a practical sense, it is necessary to re-introduce social friction that these infrastructures are designed to eliminate. In other words, the pressures and circumstances of being an archaeologist and doing archaeological research assert themselves when attempting to make practical use of these infrastructures, and therefore must be accounted for in their design and implementation. In this paper we aim to identify similar sources of dissonance with regard to the promise, potential, and actual implementation of open source software development models among archaeologists.

2.3 Git and GitHub

Open source is an inherently internet-based development model and is supported by technical infrastructures which facilitate global distribution of labour and code. Here we provide a brief overview of key technologies that archaeologists have come to rely on as they develop open source software. See Table 1 for a glossary of the git-, GitHub- and software engineering-related terminology which we use here and throughout this paper.

Table 1: Glossary of git and GitHub terminology

| Term | Definition |
|---|---|
| Comment | On GitHub, text post attached to an issue, including the first one that describes the issue |
| Commit | Set of changes (addition, alteration, or deletion) to files in a repository that has been recorded by git as one entry in its log |
| Commit access | Ability to make changes to a repository directly, without making a pull request |
| Contributor | User that has made at least one commit to a specified repository |
| Follow | Add activity by another user to a user’s timeline |
| Forge | Web-based platform for hosting, distributing and facilitating collaboration on version controlled computer code, e.g. GitHub, GitLab, Codeberg |
| Fork | Copy of a repository owned by another user; forking is a prerequisite to making a pull request |
| git | Open source version control software |
| GitHub | Commercial platform that freely hosts git repositories and provides extended collaboration and social networking features, such as pull requests, issues and stars |
| GitLab | Open source alternative to GitHub |
| Codeberg | Open source alternative to GitHub |
| Issue | Feature of GitHub that records and tracks a bug report, feature request or other suggestion in a repository |
| Maintainer | Individual that has overall control of a repository, generally assumed to be its primary contributor. Repositories can have multiple users with commit access in addition to the maintainer. |
| Merge | Accept a pull request and incorporate its changes into a repository. |
| Organization | Entity representing a group of users, which can also own repositories |
| Pull request | Mechanism by which users that don’t have commit access to a repository can contribute to it. The repository’s maintainer or another user with commit access must decide whether to merge (accept) the changes, or decline them. |
| Repository | Individual project that uses git for version control. Can include a mix of different types of files. |
| Star | GitHub’s version of a ‘like’, applied by users to a repository. |
| Timeline | Chronological feed of GitHub activity from repositories a user has starred and other users they follow. Also includes repositories that a user is not following if they are ‘trending’ or determined relevant by GitHub’s algorithm. |
| User | On GitHub, an individual with an account that can own repositories |
| Version control | System for tracking changes (additions, alterations, or deletions) in a set of files, typically but not exclusively computer code |

Chief among these is git, a protocol designed to facilitate open and distributed contributions to a common code base. It operates by providing mechanisms for synchronizing communal, web-based public repositories with local iterations stored on contributors’ private workstations. Contributors who volunteer or are assigned to develop, inspect, or revise a specific aspect of a code base download a copy of the public repository into their own work environment, create a fork in which they apply their modifications, and then request that their fork be merged into the central code base. After a public repository’s maintainers decide to merge the proposed changes into the communal code base, other developers may use git to download these changes while keeping their own independent forks intact.

Git is also designed to facilitate code review and version control. All modifications are tracked as “diffs”, which highlight additions or deletions to the code base, including changes within individual files. Typically, a contributor will group a series of changes into a more comprehensive “commit” based on a specific task or part of a workflow. Commits are always accompanied by a message, in which the contributor (ideally) describes the reason and context for the changes included in the commit. Moreover, git assigns each commit a unique identifier and identifies the contributor by name and email address to ensure some degree of public accountability.
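The two ideas in this paragraph, line-level diffs and content-derived identifiers, can be illustrated with a minimal sketch. It is not git's own implementation: the diff uses Python's standard `difflib` module rather than git's diff machinery, the file contents and author details are hypothetical, and the commit body is heavily simplified (real commit objects also record a tree, parent commits, a committer and timestamps). The hashing scheme shown, SHA-1 over a short type-and-length header followed by the content, is git's long-standing default; repositories can also be configured to use SHA-256.

```python
import difflib
import hashlib


def line_diff(old, new):
    """Return unified-diff lines marking additions (+) and deletions (-)."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm=""))


def git_object_id(kind, body):
    """Hash an object as git does by default: SHA-1 of '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical before/after versions of a small analysis script.
old = "sites <- read.csv('sites.csv')\nplot(sites)\n"
new = "sites <- read.csv('sites.csv')\nsummary(sites)\nplot(sites)\n"
diff = line_diff(old, new)

# Simplified, hypothetical commit bodies that differ only in their message.
commit_a = b"author Jane Doe <jane@example.org>\n\nAdd summary step\n"
commit_b = b"author Jane Doe <jane@example.org>\n\nAdd a summary step\n"
id_a = git_object_id("commit", commit_a)
id_b = git_object_id("commit", commit_b)
```

Printing `diff` shows the single added line prefixed with `+`. Because the identifier is derived from the content, `id_a` and `id_b` differ even though only the commit message changed, which is why author details and messages cannot be silently altered after the fact.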

Software forges (collaborative web platforms like GitHub, GitLab and Codeberg) are designed to facilitate open source software development by hosting public git repositories. However, they also support common software developer and project management practices, such as issue and bug tracking, code-commenting, task management, identity and permissions management, web publishing, vulnerability detection, creation and maintenance of metadata, and financial sponsorship. These platforms also implement standard social media functions, like the ability to follow projects and individual users to receive updates on their activities, “star” certain repositories as a combined bookmarking and ‘like’ feature, and maintain a public-facing profile that includes personally-identifying information (e.g. profile picture, username, real name, employer or affiliation), references to all public activity on the platform, and links to the user’s other social media profiles. Code-sharing platforms thus serve as comprehensive developer portfolios and community networking resources. While these additional features are meant to complement and enhance the experience of contributing to open source projects, they are not actually part of the git protocol.

3 Data and methodology

We present an exploratory quantitative analysis of open-archaeo (open-archaeo.info, Batist and Roe 2023), a directory of 493 pieces of open source archaeological software and other digital resources maintained primarily by one of us (ZB) since 2018.

We compiled the dataset by browsing collaborative software development platforms, relying heavily on their social networking features. More specifically, we update open-archaeo by manually crawling through archaeologists’ profiles on these platforms, as well as on other personal, professional, and institutional websites that describe and host additional archaeological software. We supplement this quasi-systematic collection strategy with word-of-mouth contributions made by interested individuals who identified relevant work that we initially overlooked.

Open-archaeo is a relatively comprehensive list. While our initial intention was to only list open source software, its scope has expanded to include all software created by and for archaeologists. Apart from regular updates by its primary maintainer (ZB), it has been expanded by a wider network of contributors and has benefited from the wider range of domain specialisms this has brought. However, open-archaeo generally lacks software written before archaeologists started using collaborative software development platforms such as GitHub, and software that is not shared on the web at all. The dataset is also limited by the experiences of its primary maintainers.1

Table 2: Software forges used by open archaeology projects

| Host | n | % |
|---|---|---|
| GitHub | 410 | 83.0% |
| Codeberg | 16 | 3.2% |
| GitLab | 6 | 1.2% |
| Bitbucket | 1 | 0.2% |
| Launchpad | 1 | 0.2% |
| None | 60 | 12.1% |

Where applicable, we obtained more detailed information about each repository’s contents and contribution histories from the GitHub API (application programming interface). Our analysis incorporates data on 407 repositories, comprising 145548 commits, 1920 issues/pull requests, and 22303 from 561 distinct users, as well as repository metadata on programming languages used, licensing, stars and forks, and so on.
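To make the unit of analysis concrete, the following minimal sketch shows how per-repository contributor sets and headline counts can be derived from commit records, using the glossary's definition of a contributor as a user with at least one commit to a repository. The records, repository names and function names are hypothetical placeholders, and the sketch is not the code used for the analysis, which was carried out in R.

```python
from collections import defaultdict

# Hypothetical commit records as (repository, author login); not real data.
commits = [
    ("repo-a", "alice"),
    ("repo-a", "alice"),
    ("repo-a", "bob"),
    ("repo-b", "carol"),
]


def contributors_by_repo(records):
    """Map each repository to the set of users with at least one commit."""
    result = defaultdict(set)
    for repo, author in records:
        result[repo].add(author)
    return dict(result)


def summarise(records):
    """Count repositories, commits and distinct users across all records."""
    by_repo = contributors_by_repo(records)
    return {
        "repositories": len(by_repo),
        "commits": len(records),
        "users": len(set().union(*by_repo.values())),
    }


summary = summarise(commits)
```

Keeping the contributor sets separate from the summary counts means the same intermediate structure can later feed a distribution of contributors per repository or a collaboration network.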

We opted to only collect repository data from GitHub because it is the most popular forge platform used by open-archaeo projects (Table 2). This means that projects that do not use version control (12% of the total), or host it elsewhere (5% of the total), are excluded from these parts of the analysis, though we were still able to perform an analysis of their contents and authorship from the data compiled in open-archaeo itself. We also cannot include collaboration through offline or private channels, or forms of collaboration we do not know about. We did not directly observe or interview archaeological software developers, though our conclusions do draw heavily from our experience as members of that community ourselves. Our earliest data is from 2005 and our study can say little about collaborative software development in archaeology before this point, though we know there was a significant amount of it (Ducke 2013; Whallon 1972).

These caveats notwithstanding, the open-archaeo directory and the supplemental data from the GitHub API provide a rich resource to explore the nature of collaborative software engineering in archaeology. Here we employ exploratory data analysis (sensu Tukey 1977) to identify and describe overall patterns visible in this rich dataset. In Section 4, our focus is on examining the general state of open source archaeological software and resource development. In Section 5, we refine our analysis to examine development processes, with specific focus on collaborative experiences. Finally, in Section 6, we apply network analysis methods to investigate the formation of broader collaborative communities. Our analyses combine to support our objectives of understanding what kinds of software and resources archaeologists are making, and how they create these tools in response to specific needs and use-cases, as afforded by offline social and professional connections, and within the context of an emerging community of practice.
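As a rough illustration of what tracing a collaborative network from shared contributions can involve, the sketch below links two users whenever both have contributed to the same repository and then groups users into connected components. This is one common construction and an assumption on our part for illustration; it is not necessarily the procedure applied in Section 6, and the repository and user names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(contributors_by_repo):
    """Weight each pair of users by the number of repositories they share."""
    edges = Counter()
    for users in contributors_by_repo.values():
        for pair in combinations(sorted(users), 2):
            edges[pair] += 1
    return edges


def connected_components(users, edges):
    """Group users who are linked directly or indirectly by shared repositories."""
    neighbours = {user: set() for user in users}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(users):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        components.append(group)
    return components


# Hypothetical contributor sets for three repositories.
repos = {
    "repo-a": {"alice", "bob"},
    "repo-b": {"bob", "carol"},
    "repo-c": {"dave"},
}
users = set().union(*repos.values())
edges = collaboration_edges(repos)
components = connected_components(users, edges)
```

In this toy example alice and carol never share a repository, yet both fall in the same component through bob, while dave, a sole contributor, remains isolated. The share of users in the largest component is one simple way to express how much of a community a network includes.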

The quantitative analyses and figures presented here were generated with R version 4.3.1 (2023-06-16) (R Core Team 2023). The full data and code are available in the compendium that accompanies this paper (Roe and Batist 2024).

4 Open archaeology

As of writing, open-archaeo catalogues 493 resources created by and for archaeologists. It includes both software and documents, but not research compendiums.2 We annotated each record with categories and tags that describe what aspect of archaeological research each tool or resource was meant to address. We also categorized all records based on how each tool or resource is meant to be accessed or used, and assigned tags based on how developers identified their projects’ purpose and scope. See Batist and Roe (2023) for a more comprehensive overview of the tags and categories applied to open-archaeo.

Table 3: Categories of open archaeology projects

| Category | Scope | n | % |
|---|---|---|---|
| Packages and libraries | Sets of functions assembled with clear purpose, and made accessible using standards established by an underlying platform. | 223 | 45% |
| Standalone software | Software that may be operated without needing to first access an underlying platform. | 71 | 14% |
| Scripts | Sets of pragmatically assembled mutable functions, often lacking complete documentation or adherence to protocols that would otherwise facilitate secondary use outside their original contexts of creation. | 65 | 13% |
| Lists and datasets | A series of consistently organized observations assembled with purpose. | 76 | 15% |
| Guides | An educational resource or documented protocol meant to instruct readers how to apply relevant tools or techniques. | 29 | 6% |
| Products | Stable outcomes of creative work. | 15 | 3% |
| Specifications, protocols and schemas | A formal data structure or framework intended to be used as a model. | 14 | 3% |

In our breakdown of open-archaeo by category (see Table 3), we demonstrate the pervasiveness of various development models, and the requisite technical capabilities that developers assume users hold. Most resources (59%) included in open-archaeo are designed to be used atop an existing “platform” – for example a package that extends a programming language or a plugin for an application. Essentially such projects create additional functions within the base platform that are useful for archaeological purposes. Others create standalone software (14%) that can be run independently of such platforms, for example desktop or web apps. A significant number of projects also consist of datasets (10%) and non-packaged code snippets (13%) that have been made available for general use.

Table 4: Platforms and programming languages used by open archaeology projects

| Platform | n | % of platform-based projects |
|---|---|---|
| R | 200 | 68.5% |
| Python | 43 | 14.7% |
| QGIS | 15 | 5.1% |
| Mobile app | 7 | 2.4% |
| MATLAB | 6 | 2.1% |
| ArcGIS | 3 | 1.0% |
| LibreOffice Calc | 3 | 1.0% |
| Microsoft Excel | 3 | 1.0% |
| Blender | 2 | 0.7% |
| Open Data Kit | 2 | 0.7% |
| Other | 8 | 2.7% |

Of all projects in open-archaeo, 41% are extensions to the statistical programming language R, making it the most widely-used platform by a large margin (Table 4; note that the percentages in the table are shares of the 292 platform-based projects rather than of all 493 projects). Python, another programming language, is also relatively popular (9%), as are plugins for the open source geographic information system QGIS (3%). Beyond that, there is a rather fragmented landscape of plugins for other desktop software (e.g. AutoCAD, ArcGIS), a number of lesser used programming languages, and a genre consisting of custom forms and spreadsheet templates. Many of these are targeted by only one or two developers; the larger platforms tend to be more diverse.
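The two sets of percentages, those quoted in the text and those listed in Table 4, use different denominators. The short sketch below recomputes both from the counts in Table 4 and the catalogue total of 493 given in the text; the variable and function names are ours and purely illustrative.

```python
# Counts copied from Table 4; 493 is the catalogue total stated in the text.
platform_counts = {
    "R": 200, "Python": 43, "QGIS": 15, "Mobile app": 7, "MATLAB": 6,
    "ArcGIS": 3, "LibreOffice Calc": 3, "Microsoft Excel": 3,
    "Blender": 2, "Open Data Kit": 2, "Other": 8,
}
ALL_PROJECTS = 493

# Number of projects built on top of some platform (the table's denominator).
platform_total = sum(platform_counts.values())


def share(count, total, digits=1):
    """Percentage of `total` represented by `count`, rounded to `digits` places."""
    return round(100 * count / total, digits)


r_share_of_platform_projects = share(platform_counts["R"], platform_total)
r_share_of_all_projects = share(platform_counts["R"], ALL_PROJECTS, 0)
platform_share_of_all_projects = share(platform_total, ALL_PROJECTS, 0)
```

With these counts, R accounts for 68.5% of the 292 platform-based projects and 41% of all 493 projects, and platform-based projects as a whole make up 59% of the catalogue, which matches the figures given in the text and the table.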

At first glance, the relative popularity of R versus Python is perhaps surprising; Python is regularly ranked as the most popular programming language in the world, with R trailing far behind. However, it accords with the popularity of R as a tool for data analysis in archaeology (Schmidt and Marwick 2020) and other scientific disciplines (Lai et al. 2019).

Our analysis of thematic tags highlights aspects of archaeological work that software developers are inclined to contribute to (Table 5). The most common themes are work that naturally benefits from advanced information processing afforded by computers, such as statistical analysis, sample calibration, geographical analysis, data management, and chronological modelling. Educational resources and practical guides are also well represented due to the web’s usefulness as a medium for sharing and communication.

When we compare categories with thematic tags, we see the general domains that each kind of resource is designed to serve. We see that packages are fairly common across the board. Tags that are notable for having a higher proportion of standalone software include archaeogenetics, data management, 3D modelling, photogrammetry, drivers and IO, and simulations or agent based modelling. These tools may require greater access to system resources, or may require user interfaces that are more complex than what R or Python IDEs (integrated development environments) tend to provide.

Table 5: Themes of open archaeology projects

Theme Total Packages and Libraries Standalone Software Scripts Lists and Datasets Other Documents
Datasets 48 5 3 0 39 1
Statistical analysis 48 29 1 16 0 2
Radiocarbon dating, calibration and sequencing 40 23 3 3 9 2
Educational resources and practical guides 39 1 1 2 9 26
Chronological modelling 38 31 5 2 0 0
Data management 35 12 19 0 1 3
Spatial analysis 32 25 1 5 0 1
Shape recognition 26 19 1 6 0 0
3D modelling 24 10 6 2 4 2
Schemas and ontologies 22 4 3 0 1 14
API interfaces and web scrapers 20 13 2 3 0 2
Lists 20 0 0 0 20 0
Site mapping 20 10 6 2 1 1
Zooarchaeology 18 9 2 7 0 0
Ceramic analysis 16 4 1 6 2 3
Diagrams and visualizations 16 6 1 3 1 5
Artefact morphology 15 9 2 4 0 0
Biological anthropology 15 6 0 9 0 0
Public archaeology 15 1 4 0 5 5
Bits and bobs 14 5 1 8 0 0
Luminescence dating 13 8 2 3 0 0
Palaeobotany 13 7 2 1 3 0
Drivers and IO 11 5 5 0 0 1
Simulation 10 5 3 0 1 1
Stable isotope analysis 9 2 1 0 2 4
Harris matrix 8 6 1 0 0 1
X-Ray Fluorescence 8 6 0 2 0 0
Cultural evolution 7 4 1 1 1 0
Data collection 7 4 1 1 1 0
Literary analysis and epigraphy 7 2 3 0 2 0
Seriation 7 4 0 3 0 0
Aerial and satellite imagery 6 4 0 1 1 0
Machine learning 6 4 0 0 1 1
Archaeogenetics 5 3 1 0 1 0
Dendrochronology 5 4 0 0 0 1
Palaeoclimate modelling 5 5 0 0 0 0
Photogrammetry 5 0 3 1 1 0
Platforms and publications 5 0 4 0 1 0
Writing 5 1 1 3 0 0
Bibliography 4 0 1 0 3 0
Ethics and professional development 4 0 0 0 2 2
Games 4 0 1 0 0 3
Iconography 4 4 0 0 0 0
Viewshed analysis 4 4 0 0 0 0
Archaeoastronomy 3 3 0 0 0 0
Augmented reality 3 1 2 0 0 0
Geoarchaeology 3 2 0 0 1 0
Geophysical survey 3 3 0 0 0 0
Instrumental Neutron activation analysis 3 2 0 1 0 0
Drones 2 2 0 0 0 0
LiDAR 2 1 0 0 1 0
Museums 2 0 2 0 0 0
Photography 2 1 0 1 0 0
Public policy and civic action 2 0 0 0 0 2
Templates 2 1 0 1 0 0
Lithic analysis 1 0 0 1 0 0
Network analysis 1 0 0 0 0 1

To enact their mandate of ensuring that anyone can access and modify software and other creative works, the open source and open science movements encourage developers and scientists to adopt open licenses. Licenses are legally-binding statements that stipulate how a creative work can be accessed and used. Proprietary licenses usually require explicit permission to be granted before the work can be accessed or modified, usually in exchange for financial compensation. Open licenses, on the other hand, are more permissive, and allow anyone to use creative works without such restrictions. While it is certainly possible to write your own license, it is very common to simply use one of several standardized open licenses (see choosealicense.com). Some licenses, like the GNU GPL, MIT and Apache licenses, are explicitly suited for distributing software, and specify certain use cases that are afforded by digital media. Other licenses, like the Creative Commons variants, are more suited to other kinds of creative works like books, articles, movies, music, photographs, and websites. The Creative Commons licenses also include clauses that cater to academic or creative sensibilities, such as requirements to attribute credit to the original authors, to restrict commercial use, and to propagate similar restrictions in derivative works.

Table 6: Licenses used by open archaeology projects on GitHub

License n %
None detected 245 49.7%
GPL 123 24.9%
MIT 77 15.6%
CC0 12 2.4%
CC-BY 8 1.6%
Apache 7 1.4%
AGPL 5 1.0%
Unlicense 4 0.8%
CC-BY-NC-SA 3 0.6%
CC-BY-SA 3 0.6%
CECILL 2 0.4%
BSD-3-Clause 1 0.2%
GFDL 1 0.2%
MPL 1 0.2%
ODbL 1 0.2%

Roughly half of open-archaeo repositories are accompanied by an explicit license (Table 6). Two common free software licenses account for the majority of these: the GNU General Public License (GPL, 52%) and the MIT License (31%). These differ primarily in the restrictions they place on reuse: the MIT License aims to be maximally permissive, while the GPL is a ‘copyleft’ license that specifies that all derivative works must be distributed under similar terms (in other words, it prohibits the use of open source software within non-open software; Dusollier 2007). Interestingly, archaeologists’ preference for the more restrictive of these two licenses is the reverse of the general trend seen in open source projects on GitHub (Balter 2015). Creative Commons licenses are a distant third place (10% of licensed repositories), in contrast to their widespread use for other forms of scholarly output (Kim 2007). Many repositories do not specify a license; given a documented misconception among academics that GitHub can serve as a sustainable and long-term code and data hosting platform (Milliken, Nguyen, and Steeves 2021; Escamilla et al. 2022, 2023), it is possible that many maintainers whose work is included in open-archaeo similarly assumed that making their work available, without explicitly stating permissible use, is enough to allow unrestricted access to the repository’s contents. However, we cannot verify this potential explanation given the methods we currently employ, and more discursive qualitative research is needed to explore the rationales behind such decisions.
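As a rough check on the figures quoted above, the shares can be recomputed from the counts in Table 6. The sketch below assumes that the percentages in the text are taken over licensed repositories only, that AGPL is counted together with GPL, and that CC0 is counted together with the other Creative Commons licenses. These groupings are assumptions made for illustration, not a documented part of the analysis.

```python
# Counts copied from Table 6 (licenses detected on GitHub).
license_counts = {
    "None detected": 245, "GPL": 123, "MIT": 77, "CC0": 12, "CC-BY": 8,
    "Apache": 7, "AGPL": 5, "Unlicense": 4, "CC-BY-NC-SA": 3, "CC-BY-SA": 3,
    "CECILL": 2, "BSD-3-Clause": 1, "GFDL": 1, "MPL": 1, "ODbL": 1,
}

total = sum(license_counts.values())
licensed = total - license_counts["None detected"]


def share_of_licensed(names):
    """Percentage of licensed repositories using any of the named licenses."""
    return 100 * sum(license_counts[n] for n in names) / licensed


# Assumed groupings: AGPL with GPL, CC0 with the CC-BY variants.
gpl_family = share_of_licensed(["GPL", "AGPL"])
mit = share_of_licensed(["MIT"])
creative_commons = share_of_licensed(["CC0", "CC-BY", "CC-BY-NC-SA", "CC-BY-SA"])
```

Under these assumptions the three shares round to the 52%, 31% and 10% reported in the text.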

Archaeological software development activity has increased significantly over the years. Figure 1 shows the cumulative growth of code contributions committed and pushed to GitHub repositories, and the number of GitHub repositories that host archaeological software and resources.

Figure 1: Growth of open archaeology projects on GitHub

As we can see, archaeologists have been using git since even before GitHub was launched in 2008. But use of git really began to take off around 2014–2015, when we see an uptick in the rate of growth. Around this time we also see that GitHub starts being used to host documents and scripts. This may represent a recognition of GitHub’s ability to track things other than code, and a willingness to experiment with version control systems as a medium for disseminating work in an open and somewhat nerdy way.

However, since around 2022, the number of new GitHub repositories has plateaued, while the cumulative number of commits has continued to rise. This may relate to a general emphasis on maintaining existing code and working on established projects, rather than spinning up new ones.

5 Collaborative practices

As well as hosting source code, GitHub and other software forges include systems for facilitating collaboration on code and other projects. The basic collaborative workflow is inherited from git, which allows multiple users to commit code to the repository (see Table 1 for definitions of this and other git terminology used in this section). A user with commit access to a repository can change any of its contents at will, so this is usually reserved for the project maintainer and known, trusted collaborators. GitHub extends this model with its pull request feature, by which any user can fork a repository to which they don’t have commit access, make changes, then offer to contribute those changes back to the original repository. The maintainer can choose to merge (accept) or decline the pull request, facilitating contributions from a wider network of collaborators without the need for permission to be sought in advance.

Figure 2: Lifespan of open archaeology repositories. Each point indicates a commit; excludes repositories with only one commit.

We measured the lifespan of a repository as the time between the first and latest commit, and its activity as the rate of commits. We therefore refer here to the development lifespan of a project, which is not necessarily related to its use-life. By these metrics, the lifespan and activity of repositories in open-archaeo vary greatly (Figure 2). The average project lasts 920 days with 0.76 commits per day. Many projects are active for only a short period of time: about 17% less than 30 days, 26% less than 90 days, and 38% less than a year. However, the vast majority (all but 3) do have more than one commit, suggesting that use of GitHub as a pure host for already-finished projects is not common; some degree of iteration, if not collaboration, is almost always present. The longest-lived projects have been active for between 10 and 17 years. The most active projects see up to 13 commits per day, but the majority of repositories (84%) receive fewer than one commit per day.
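The two metrics defined above can be sketched as follows. This is a minimal illustration, assuming that the commit rate is the number of commits divided by the lifespan in days; the function name `lifespan_and_rate` and the timestamps are hypothetical and are not drawn from the dataset.

```python
from datetime import datetime


def lifespan_and_rate(commit_times):
    """Return (lifespan in days, commits per day) for one repository.

    Lifespan is the time between the first and latest commit; the rate is
    the number of commits divided by that lifespan (an assumed definition).
    """
    first, latest = min(commit_times), max(commit_times)
    lifespan_days = (latest - first).total_seconds() / 86400
    if lifespan_days == 0:
        # A single commit (or several at the same instant) has no defined rate.
        return 0.0, None
    return lifespan_days, len(commit_times) / lifespan_days


# Hypothetical commit timestamps for a single repository.
commits = [
    datetime(2020, 1, 1),
    datetime(2020, 1, 11),
    datetime(2020, 1, 21),
    datetime(2020, 1, 31),
]
days, rate = lifespan_and_rate(commits)
```

Returning `None` for the rate of a single-commit repository mirrors the decision to exclude such repositories from Figure 2.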

Figure 3: Lifespan and commit rate of open archaeology repositories

The interaction between project longevity, activity, and number of contributors is multifaceted (Figure 3). Highly active projects (one commit per day or more) tend to be either very long-lived or very short-lived; few fall in the centre of the distribution. Short-lived projects tend to be characterised by a ‘spree’ of activity (a high commit rate), while long-lived projects have a broader range of activity profiles. The most “successful” projects according to open source norms (i.e. long-lived and active) are with few exceptions those projects with the largest contributor base in our dataset. However, the modal project in the centre of the distribution is more modest, lasting around three years, maintained by an individual or a small group, with around three commits per month.

Figure 4: Box plot showing use of GitHub collaboration features in open archaeology repositories

GitHub also facilitates collaboration on broader project management tasks, primarily through its issues feature.3 Unless a repository’s maintainer specifically configures it otherwise, any user can create an issue attached to another user’s repository, or comment on an existing issue. Issues are typically used to log and track bug reports, feature requests, and other comments and suggestions from the project’s user base. GitHub’s pull request feature is also implemented via this system – a pull request is a special type of issue. According to the data we collected from the GitHub API, these features are not widely used by open-archaeo projects (Figure 4). Only 46% of repositories have been forked at least once and only 38% of repositories make use of issues/pull requests. Those repositories that do use issues do not use them very extensively; 33% have only one issue and 85% have ten or fewer.

Another way GitHub users can engage with repositories and other users is with social media-like features such as starring a repository, commenting on an existing issue, or following a user. These actions populate a timeline through which users can see recent activity and discover new projects related to those they have interacted with in the past.4 While not as direct a contribution as pull requests or issues, these features can facilitate the formation and maintenance of collaborative networks, in the same way that other social media platforms serve other professional networks. These features are used more widely than forks, issues and pull requests (Figure 4): 83% of repositories have at least one star and, in those repositories that use issues, 33% have received at least one additional comment.

Figure 5: Distribution of contributions in multi-contributor open archaeology repositories

Perhaps unsurprisingly, given the low uptake of GitHub’s collaborative features, 62% of open-archaeo repositories only contain commits from a single user. Even in the minority of projects that have more than one contributor, work (as measured by number of recorded commits) is distributed highly unevenly (Figure 5). The lead maintainer almost always does the lion’s share of the work: they are responsible for more than half of commits in 88% of projects and more than three-quarters in 61%. This may be due to the steep learning curve commonly associated with git. While git can be a great way to track changes and manage distributed contributions to a common code base, it can also be unwieldy in situations when multiple users (especially those with less experience using git for collaborative purposes) are expected to contribute within short spans of time. This meshes with our prior observation that projects which tend to exhibit higher commit rates have fewer contributors. Additionally, our analysis neglects to account for contributions that are not tracked via git or GitHub. Those who do not code may provide creative guidance or feedback during in-person meetings, via email, or using alternative online messaging or social media platforms. A more focused qualitative assessment of these non-coding and supportive work practices would shed more light on the totality of effort that goes into producing and maintaining open source projects.
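The concentration of work described above can be illustrated with a short sketch. It assumes that the lead maintainer is simply the author with the most commits in a repository, which may not match how maintainers were identified in the analysis; the repository names, author names and commit logs are hypothetical.

```python
from collections import Counter


def lead_share(commit_authors):
    """Fraction of a repository's commits made by its most prolific author."""
    counts = Counter(commit_authors)
    _, top = counts.most_common(1)[0]
    return top / len(commit_authors)


# Hypothetical commit logs: one author name per commit.
repos = {
    "repo-a": ["ana"] * 8 + ["ben"] * 2,
    "repo-b": ["cat"] * 3 + ["dev"] * 3 + ["eli"] * 2,
    "repo-c": ["fay"] * 9 + ["gus"] * 1,
}
shares = {name: lead_share(authors) for name, authors in repos.items()}

# Proportion of repositories whose lead author exceeds each threshold.
over_half = sum(s > 0.5 for s in shares.values()) / len(shares)
over_three_quarters = sum(s > 0.75 for s in shares.values()) / len(shares)
```

Applied to only the multi-contributor repositories, the two final proportions correspond to the kind of summary reported for Figure 5.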

The prototypical open source project comprises a core group of developers (often a single maintainer) that regularly commit new code, a wider network of collaborators that contribute through forks and pull requests, plus an active user base that creates and comments on issues and that has indicated its support for the project by starring its repository. It is unclear whether archaeological software developers actually aim to operate following this model, or whether it is even suitable for supporting what open science aims to achieve. However, it is clear that only a small number of open-archaeo projects operate according to this model. The majority of projects are in fact short-lived, with few contributors and a small number of commits. Use of GitHub’s collaboration features is also generally low (Figure 4), although the data also shows a divergence between the uptake of features that facilitate direct code contributions (forks, issues, pull requests), which have markedly zero-skewed distributions, versus more indirect, social media-like features (comments, stars), which are moderately well-used. We hypothesise that this shows a preference for passive/reactive rather than active/proactive engagement with others’ work – a point we will return to in the conclusion.

6 An emerging community of practice?

By contributing to shared repositories—whether with code (commits), issues, or comments—archaeologists using GitHub form a collaborative network which we can map using data from the GitHub API. Here we consider two facets of this network: repositories connected by common contributors (the repository–repository graph), and users connected by contributions to common repositories (the user–user graph). In both cases, number of contributions constitutes a natural measure of the strength or weight of the connection, which can be further broken down by type of contribution (commit, issue/pull request, or comment).
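The two graphs described above are projections of a single table of user and repository contributions. The following sketch shows one way such a projection could be computed, assuming that edge weights count the shared contributors (for repositories) or shared repositories (for users). Contribution totals could be used as weights instead. The `project` function and all records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical contribution records: (user, repository, number of contributions).
contributions = [
    ("ana", "repo-a", 10),
    ("ben", "repo-a", 2),
    ("ben", "repo-b", 5),
    ("cat", "repo-b", 1),
    ("cat", "repo-c", 7),
]


def project(records, by):
    """Project user-repository records onto one node type.

    by="repo" links repositories that share a contributor;
    by="user" links users who contributed to the same repository.
    Edge weight counts the shared users or repositories (an assumption).
    """
    groups = defaultdict(set)
    for user, repo, _count in records:
        if by == "repo":
            groups[user].add(repo)
        else:
            groups[repo].add(user)
    edges = defaultdict(int)
    for members in groups.values():
        # Every pair of nodes in the same group gains one unit of weight.
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1
    return dict(edges)


repo_graph = project(contributions, by="repo")
user_graph = project(contributions, by="user")
```

In this toy example "ben" links repo-a to repo-b and "cat" links repo-b to repo-c, while "ana" contributes to a single repository and so adds no edge to the repository graph.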

Figure 6: Graph of open archaeology repositories and users connected by contributions. Darker edges indicate a greater number of contributions. Node colour indicates membership of the largest clusters according to the edge–betweenness method (Girvan and Newman 2002). Excludes isolate nodes.
Figure 7: Graph of open archaeology repositories connected by common contributors. Darker edges indicate a greater number of common contributors. Node colour indicates membership of the largest clusters according to the edge–betweenness method (Girvan and Newman 2002). Excludes isolate nodes.
Figure 8: Graph of open archaeology users connected by contributions to common repositories. Darker edges indicate a greater number of common repositories. Node colour indicates membership of the largest clusters according to the edge–betweenness method (Girvan and Newman 2002). Excludes isolate nodes.

Our data shows that there is a significant network of archaeologists collaborating on GitHub. 67% of repositories and 88% of users in our dataset are connected to at least one other repository or user. Of these, 94% of repositories and 80% of users belong to a single connected subgraph (Figure 6 and Figure 7).
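The two proportions reported above (nodes connected to at least one other node, and connected nodes that fall within a single subgraph) can be derived from the connected components of a graph. The sketch below uses a plain breadth-first search on a hypothetical six-node graph; it illustrates the calculation only and is not the code used for the analysis.

```python
from collections import deque


def components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            for nxt in neighbours[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        found.append(component)
    return found


# Hypothetical repository-repository graph with one isolated node (r6).
nodes = ["r1", "r2", "r3", "r4", "r5", "r6"]
edges = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]

parts = components(nodes, edges)
connected = [c for c in parts if len(c) > 1]
n_connected = sum(len(c) for c in connected)
share_connected = n_connected / len(nodes)  # nodes with at least one link
share_in_largest = max(len(c) for c in connected) / n_connected
```

Here five of the six nodes are connected to at least one other node, and three of those five belong to the largest component.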

We delimited 63 distinct clusters that outline the topography of the repository–repository network. While many of these clusters are interconnected, some discrete components containing between 2 and 20 repositories appear distinct from a primary core. The core cluster is characterized by repositories whose contributors commit to projects other than their own, and it includes a smorgasbord of projects whose contributors share varied interests.

Clustering also reveals distinct collaborative networks within the user–user graph. We again see a complementary primary core connected to several more peripheral clusters, which are internally cohesive and exhibit few connections with other peripheral clusters. The central core, which bridges all the peripheral clusters, is not uniform: it comprises several relatively discrete clusters representing collaborative sub-communities. While these clusters are internally cohesive, they exhibit enough connections to other members of the central core that they cannot be considered separate or peripheral clusters.

In both the repository–repository and user–user networks, the peripheral clusters correspond with either the connections surrounding specific projects or the series of repositories created by single individuals and sometimes also their close colleagues. On the other hand, the central cores exhibit greater internal variety that may correspond with social connections and the formation of a complex software development community. This is evident through the fact that many of the connections represented in the cores emerge from more conventional professional networks (e.g. ISAAKiel, a working group centered around the University of Kiel, or CAA-SSLA, a special interest group of the international scholarly society ‘Computer Applications and Quantitative Methods in Archaeology’ (CAA) focused on scientific programming). Peripheral clusters that are connected to the central core by only a few relationships represent the sole (or perhaps initial) integration of lone developers into a broader community.

Table 7: Repositories ranked by centrality to the repository–repository network. Centrality is measured by node betweenness weighted by number of contributions.

Rank Repository Category Tags Commits
1 benmarwick/ctv-archaeology Lists and datasets Lists 688
2 zackbatist/open-archaeo Lists and datasets Lists 360
3 ahb108/rcarbon Packages and libraries Radiocarbon dating, calibration and sequencing 881
4 ropensci/c14bazAAR Packages and libraries API interfaces and web scrapers; Radiocarbon dating, calibration and sequencing 1057
5 ropensci/neotoma Packages and libraries API interfaces and web scrapers; Palaeoclimate modelling 809
6 lakillo/archaeology-machine-learning Lists and datasets Lists; Machine learning 62
7 ekansa/open-context-py Standalone software Platforms and publications 4575
8 demjanp/Res14C Packages and libraries Radiocarbon dating, calibration and sequencing 8
9 paleolimbot/tidypaleo Packages and libraries Data management; Palaeoclimate modelling 168
10 dainst/idai-field Standalone software Data management 21407

Figure 9: Repository centrality by age (left) and length (right). Centrality is measured by node betweenness weighted by number of contributions.

The repositories most central to the network as a whole (Table 7) include three lists and directories, including open-archaeo itself. Three relate to making large data repositories accessible for analysis, and one is a very well-supported field recording application. Community input is therefore centred on infrastructural projects, including those which index and publicize available tools and resources. Moreover, three relate to radiocarbon data modelling and two relate to palaeoenvironment reconstruction, which reflects the fact that these have long been prominent foci of statistical software development in archaeology.

Repository centrality is predicted by the total number of commits it has received but, somewhat surprisingly, younger repositories rather than older ones tend to be more central (Figure 9). Tentatively, we interpret this as an indication that the network has become more connected over time, but we leave a fuller analysis of temporal trends in collaborative activity to future work.

Figure 10: Mean repository centrality by category (top) and platform (bottom). Centrality is measured by node betweenness weighted by number of contributions.

When comparing across categories and platforms, the highest mean centrality is seen in repositories that contain lists and datasets, standalone software, or packages and libraries, and in repositories based on Python, R or QGIS (Figure 10). Interestingly, these trends depart from the observed popularity of different categories and platforms in the open-archaeo dataset as a whole (see Section 4): standalone software is more central than packages/libraries, even though there are more of the latter by a significant margin. This may be due to the fact that many packages are developed to support specific practices or use-cases (often inspired by personal need), or are designed to run well-established statistical functions that need not change over time. These are therefore relatively stable and require little additional input after release. On the other hand, as discussed in Section 4, standalone software tends to integrate multiple system components and may evolve over time to add new features or support new workflows. Moreover, standalone software is generally rooted in longer-term and community-held objectives, and its development may therefore be backed by institutions with funding and resources to support developers.

Although Blender is a minority platform, Blender packages are more central than those of all other platforms on average, but this is a statistical anomaly caused by uneven sampling. R is naturally the platform with the next highest average centrality since it serves as a lingua franca that draws developers from across the discipline. Many of the QGIS plugins add various specialized features to the extensible GIS platform, and are therefore developed by interdisciplinary teams, which explains its high rank. Python projects, which tend to be infrastructural or are of interest to members of other fields, are also highly ranked in terms of average centrality.5 Our findings support the notion that development patterns differ significantly across languages and platforms, and further analysis to qualify these observations is warranted.

Figure 11: User centrality by total number of contributions. Centrality is measured by node betweenness weighted by number of contributions.

Centrality to the user–user graph is weakly predicted by a user’s overall rate of activity, as measured by their total number of contributions (Figure 11). We did not collect demographic data on users that appear in our dataset, but based on our own knowledge of the community we can observe that those highly central to the network tend to be employed in (junior) academic positions, or in a few cases in cultural heritage authorities, rather than specifically as research software engineers. Such positions tend not to actively reward or encourage software development, at least not on a par with more traditional academic outputs (Baxter et al. 2012), and are increasingly precarious (Cornelius-Bell and Bell 2021). This poses a serious risk to the sustainability and growth of open source software in archaeology: if the people who occupy central positions in the network cease to be active, then it is likely that the overall network would fragment. Assessing and mitigating this risk should be a high priority for future research in this area.

7 Conclusion

Our goal in this study was to investigate the under-explored research practices involved in research software engineering in archaeology. We sought to identify not only what kinds of software archaeologists are making, but how archaeologists create these tools as part of a broader community of practice. Our emphasis on the collaborative experiences involved in open source software development emerged from our experience maintaining open-archaeo, through which we observed that making one’s code openly available on the web does not necessarily garner the benefits often touted by open science advocates, namely that source code can be audited, forked, and appropriated for alternative use cases, which are effectively social and collaborative experiences.

To investigate these concerns, we operationalised open-source collaborative experiences as the use of certain features of git and GitHub visible to us in data from the GitHub API. With this data, we documented that open source software development in archaeology has seen a rapid and sustained rise beginning around 2014 (Figure 1). This is marked by a variety of applications and use cases, including the use of git and GitHub to track and host content other than code. Moreover, archaeologists are very involved in broader scripting ecosystems, as is evident through the predominant creation of R packages and Python libraries designed to process the rich variety of archaeological information. At the same time, archaeologists also create standalone software for more intensive tasks that require greater access to system resources or that warrant more complex user interfaces than what R and Python IDEs are capable of providing. These tools tend to be focused on various means of identifying distribution patterns (spatial, temporal, statistical), calibrating data obtained from various instrumental methods (XRF, luminescence dating), supporting specialized finds analysis (zooarchaeology, palaeobotany, archaeogenetics), and supporting the collection and processing of archaeological materials. These foci signify gaps in the archaeological toolbox that archaeologists recognized, and have attempted to fill, on their own terms.

There is an emerging community of practice around open source research software in archaeology. All but a handful of the GitHub repositories we analysed have more than one commit, showing that archaeologists use it for ongoing work rather than merely to upload finished products. They relatively frequently make use of the ‘star’ and ‘comment’ features to engage with others’ repositories (Figure 4) and, via these and other shared contributions, we can trace a collaborative network that includes the majority of archaeologists active on GitHub (see Section 6).

On the other hand, we found that the forms and intensity of collaboration remain limited. Most work is performed individually (Figure 5) and is short-lived (Figure 2; Figure 3). The vast majority of repositories have 1–3 contributors, with only a few distinguished by an active and diverse developer base. Our analysis also shows an uneven use of git and GitHub’s extended features, beyond their basic usage as a version control system and repository host. While GitHub’s more passive collaborative features (stars, comments) are commonly used, those that involve direct engagement with repository content (issues, forks, pull requests) are not (Figure 4), perhaps because people do not want to ‘step on toes’ or be seen to be intruding on others’ projects. This may relate to the fact that most developers on this list are academics who hold different values from the designers of open source development environments regarding how collaboration should occur: for example, how projects and ideas are ‘owned’ by individuals or communities, and how work should be iteratively improved upon.

Our network analysis (Section 6) similarly draws attention to the real-world collaborative ties that underpin archaeological open source software development. We identify a core cluster representing a series of collaborative ties among members of an archaeological software engineering community of practice. This core exhibits complexity that corresponds with social patterns, such as the presence of various clusters representing interconnected interest or affinity groups. Indeed, we have found that ‘real-world’ social connections and institutional support structures are strong predictors of centrality, since these clusters are representative of established professional partnerships. This suggests that archaeological open source is firmly embedded within existing power structures that permeate academic life, both online and offline. Similarly, we found that the individuals who play critical roles in supporting the archaeological open source community are precariously employed workers. Far from open source being inherently distributed, resilient, and open-ended, this indicates that research software engineering is actually quite centralized, fragile, and based heavily on existing professional connections and endeavours.

These findings call into question the notion that archaeologists benefit from the positive outcomes that are commonly argued to be the natural results of open source development models – namely, greater degrees of extensibility and participatory action. While opening the source code may be a necessary precondition for these positive outcomes, we argue that this only amounts to establishing the potential for people to put these values into practice. Moreover, we argue that the objectives and circumstances that frame archaeological practice significantly influence how far archaeologists (and academics in general) are willing to push for these values, and limit the ability of archaeologists to do open source in ways that resemble more mainstream open source projects. For instance, successful open source projects like the Linux kernel, OpenSSL, or the Firefox web browser are driven by collective and popular interest in ensuring that code remains functional, and the code base is therefore constantly in flux and bears an accumulating list of contributing members. This differs from the organizational principles that govern much archaeological work, namely where a director or directors (of a field project, research group, etc.) sets the goals and orientation of the group and commissions and manages other actors accordingly. Moreover, archaeological projects ultimately seek to produce stable textual outcomes that bear clear delineation of authorship and require no upkeep whatsoever. Sustaining an open source project is simply not compatible with the factors that currently drive the momentum behind archaeological work.

As such, we advocate for more focused attention on specific disciplinary norms and institutional support structures that inform how knowledge is created and validated, and how varied contributions to the scholarly enterprise are mediated, credited, and valued. These factors, which are often ignored in open science discourse, vary from discipline to discipline and are rarely accounted for in the infrastructures themselves and in the policies that dictate or guide their implementation. However, these factors should be considered as equally important as, if not more important than, the technical apparatus in contributing to the success of open science.

References

Adema, Janneke, and Samuel Moore. 2021. “Scaling Small; Or How to Envision New Relationalities for Knowledge Production.” Westminster Papers in Communication and Culture 16 (1). https://doi.org/10.16997/wpcc.918.
Atici, Levent, Sarah Whitcher Kansa, Justin Lev-Tov, and Eric C. Kansa. 2013. “Other People’s Data: A Demonstration of the Imperative of Publishing Primary Data.” Journal of Archaeological Method and Theory 20 (4): 663. https://doi.org/10.1007/s10816-012-9132-9.
Balter, Ben. 2015. “Open Source License Usage on GitHub.com.” The GitHub Blog. March 10, 2015. https://github.blog/2015-03-09-open-source-license-usage-on-github-com/.
Batist, Zachary. 2023. “Archaeological Data Work as Continuous and Collaborative Practice.” https://doi.org/10.5281/zenodo.8373390.
Batist, Zachary, and Joe Roe. 2023. “Open-Archaeo: A Resource for Documenting Archaeological Software Development Practices.” Journal of Open Archaeology Data 11 (0): 9. https://doi.org/10.5334/joad.111.
Baxter, Rob, N. Chue Hong, Dirk Gorissen, James Hetherington, and Ilian Todorov. 2012. “The Research Software Engineer.” In Digital Research Conference, Oxford, 1–3. Oxford. https://www.research.ed.ac.uk/en/publications/the-research-software-engineer.
Beck, Anthony, and Cameron Neylon. 2012. “A Vision for Open Archaeology.” World Archaeology 44 (4): 479–97. https://doi.org/10.1080/00438243.2012.737581.
Carver, Jeffrey C., Nic Weber, Karthik Ram, Sandra Gesing, and Daniel S. Katz. 2022. “A Survey of the State of the Practice for Research Software in the United States.” PeerJ Computer Science 8 (May): e963. https://doi.org/10.7717/peerj-cs.963.
Coleman, E. Gabriella. 2012. Coding Freedom: The Ethics and Aesthetics of Hacking. Princeton University Press. https://doi.org/10.1515/9781400845293.
Cornelius-Bell, Aidan, and Piper Bell. 2021. “The Academic Precariat Post-COVID-19.” Fast Capitalism 18 (1). https://doi.org/10.32855/fcapital.202101.001.
Cowgill, George L. 1967. “Computer Applications in Archaeology.” In Proceedings of the Fall Joint Computer Conference, 331–37. AFIPS ’67 (Fall). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1465611.1465654.
Dorta-González, Pablo, Sara M. González-Betancor, and María Isabel Dorta-González. 2021. “To What Extent Is Researchers’ Data-Sharing Motivated by Formal Mechanisms of Recognition and Credit?” Scientometrics 126 (3): 2209–25. https://doi.org/10.1007/s11192-021-03869-3.
Ducke, Benjamin. 2013. “Reproducible Data Analysis and the Open Source Paradigm in Archaeology.” In Computational Approaches to Archaeological Spaces, edited by Andrew Bevan and Mark Lake, 315–26. Walnut Creek, CA: Left Coast Press.
Dusollier, Severine. 2007. “Open Source and Copyleft: Authorship Reconsidered?” In Intellectual Property, edited by William T. Gallagher, 563–78. London, UK: Routledge. https://www.taylorfrancis.com/chapters/edit/10.4324/9781315252148-24/open-source-copyleft-authorship-reconsidered-severine-dusollier.
Escamilla, Emily, Martin Klein, Talya Cooper, Vicky Rampin, Michele C. Weigle, and Michael L. Nelson. 2022. “The Rise of GitHub in Scholarly Publications.” In Linking Theory and Practice of Digital Libraries, edited by Gianmaria Silvello, Oscar Corcho, Paolo Manghi, Giorgio Maria Di Nunzio, Koraljka Golub, Nicola Ferro, and Antonella Poggi, 187–200. Lecture Notes in Computer Science. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-16802-4_15.
Escamilla, Emily, Lamia Salsabil, Martin Klein, Jian Wu, Michele C. Weigle, and Michael L. Nelson. 2023. “It’s Not Just GitHub: Identifying Data and Software Sources Included in Publications.” In Linking Theory and Practice of Digital Libraries, edited by Omar Alonso, Helena Cousijn, Gianmaria Silvello, Mónica Marrero, Carla Teixeira Lopes, and Stefano Marchesin, 195–206. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-43849-3_17.
Faniel, Ixchel, Eric C. Kansa, Sarah Whitcher Kansa, Julianna Barrera-Gomez, and Elizabeth Yakel. 2013. “The Challenges of Digging Data: A Study of Context in Archaeological Data Reuse.” In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 295–304. New York: ACM. https://doi.org/10.1145/2467696.2467712.
Girvan, Michelle, and Mark E. J. Newman. 2002. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences 99 (12): 7821–26. https://doi.org/10.1073/pnas.122653799.
Hacıgüzeller, Piraye, James Stuart Taylor, and Sara Perry. 2021. “On the Emerging Supremacy of Structured Digital Data in Archaeology: A Preliminary Assessment of Information, Knowledge and Wisdom Left Behind.” Open Archaeology 7 (1): 1709–30. https://doi.org/10.1515/opar-2020-0220.
Hippel, Eric von, and Georg von Krogh. 2003. “Open Source Software and the Private-Collective Innovation Model: Issues for Organization Science.” Organization Science 14 (2): 209–23. https://doi.org/10.1287/orsc.14.2.209.14992.
Howison, James, and James D. Herbsleb. 2013. “Incentives and Integration in Scientific Software Production.” In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 459–70. CSCW ’13. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/2441776.2441828.
Huggett, Jeremy. 2018. “Reuse Remix Recycle: Repurposing Archaeological Digital Data.” Advances in Archaeological Practice 6 (2): 93–104. https://doi.org/10.1017/aap.2018.1.
———. 2022. “Data Legacies, Epistemic Anxieties, and Digital Imaginaries in Archaeology.” Digital 2 (2): 267–95. https://doi.org/10.3390/digital2020016.
Kansa, Eric C., Sarah Whitcher Kansa, and Benjamin Arbuckle. 2014. “Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology.” International Journal of Digital Curation 9 (1): 57–70. https://doi.org/10.2218/ijdc.v9i1.301.
Kelty, Christopher M. 2008. Two Bits: The Cultural Significance of Free Software. Duke University Press.
Kim, Minjeong. 2007. “The Creative Commons and Copyright Protection in the Digital Era: Uses of Creative Commons Licenses.” Journal of Computer-Mediated Communication 13 (1): 187–209. https://doi.org/10.1111/j.1083-6101.2007.00392.x.
Kintigh, Keith W., Jeffrey H. Altschul, Ann P. Kinzig, W. Fredrick Limp, William K. Michener, Jeremy A. Sabloff, Edward J. Hackett, Timothy A. Kohler, Bertram Ludäscher, and Clifford A. Lynch. 2015. “Cultural Dynamics, Deep Time, and Data: Planning Cyberinfrastructure Investments for Archaeology.” Advances in Archaeological Practice 3 (1): 1–15. https://doi.org/10.7183/2326-3768.3.1.1.
Kling, Rob, Geoffrey McKim, and Adam King. 2003. “A Bit More to It: Scholarly Communication Forums as Socio-Technical Interaction Networks.” Journal of the American Society for Information Science and Technology 54 (1): 47–67. https://doi.org/10.1002/asi.10154.
Lai, Jiangshan, Christopher J. Lortie, Robert A. Muenchen, Jian Yang, and Keping Ma. 2019. “Evaluating the Popularity of R in Ecology.” Ecosphere 10 (1): e02567. https://doi.org/10.1002/ecs2.2567.
Leonelli, Sabina. 2023. Philosophy of Open Science. 1st ed. Elements in the Philosophy of Science. Cambridge University Press. https://doi.org/10.1017/9781009416368.
Marwick, Ben, Jade d’Alpoim Guedes, C. Michael Barton, Lynsey A. Bates, Michael Baxter, Andrew Bevan, Elizabeth A. Bollwerk, R. Kyle Bocinsky, Tom Brughmans, and Alison K. Carter. 2017. “Open Science in Archaeology.” SAA Archaeological Record 17 (4): 8–14. https://eprints.gla.ac.uk/148887/.
Milliken, Genevieve, Sarah Nguyen, and Vicky Steeves. 2021. “A Behavioral Approach to Understanding the Git Experience.” In Proceedings of the 54th Hawaii International Conference on System Sciences, 10. Kauai, HI. https://hdl.handle.net/10125/71493.
Mirowski, Philip. 2018. “The Future(s) of Open Science.” Social Studies of Science 48 (2): 171–203. https://doi.org/10.1177/0306312718772086.
Nguyễn, Sarah, and Vicky Rampin. 2022. “Who Writes Scholarly Code?” International Journal of Digital Curation 17 (1): 18. https://doi.org/10.2218/ijdc.v17i1.839.
O’Neil, Mathieu. 2009. Cyberchiefs: Autonomy and Authority in Online Tribes. London, UK: Pluto Press.
Open Knowledge Foundation. 2015. “Open Definition 2.1.” 2015. https://opendefinition.org/od/2.1/en/.
Open Source Initiative. 2007. “The Open Source Definition.” 2007. https://opensource.org/osd/.
Opitz, Rachel, Colleen Strawhacker, Philip Buckland, Jackson Cothren, Tom Dawson, Andrew Dugmore, George Hambrecht, et al. 2021. “A Lockpick’s Guide to dataARC: Designing Infrastructures and Building Communities to Enable Transdisciplinary Research.” Internet Archaeology 56 (October). https://doi.org/10.11141/ia.56.15.
Pownall, Madeleine, Flávio Azevedo, Laura M. König, Hannah R. Slack, Thomas Rhys Evans, Zoe Flack, Sandra Grinschgl, et al. 2023. “Teaching Open and Reproducible Scholarship: A Critical Review of the Evidence Base for Current Pedagogical Methods and Their Outcomes.” Royal Society Open Science 10 (5): 221255. https://doi.org/10.1098/rsos.221255.
R Core Team. 2023. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ratto, Matt. 2003. “Re–Working by the Linux Kernel Developers.” Department of Communication, University of California, San Diego. https://flosshub.org/sites/flosshub.org/files/ratto.pdf.
———. 2007. “A Practice-Based Model of Access for Science: Linux Kernel Development and Shared Digital Resources.” Science & Technology Studies 20 (1): 73–105. https://doi.org/10.23987/sts.55220.
Raymond, Eric. 1999. “The Cathedral and the Bazaar.” Knowledge, Technology & Policy 12 (3): 23–49. https://doi.org/10.1007/s12130-999-1026-0.
Roe, Joe, and Zack Batist. 2024. “Zackbatist/Openarchaeo-Collaboration: V1.0.” Zenodo. https://doi.org/10.5281/zenodo.10631068.
Roosevelt, Christopher H., Peter Cobb, Emanuel Moss, Brandon R. Olson, and Sinan Ünlüsoy. 2015. “Excavation Is Destruction Digitization: Advances in Archaeological Practice.” Journal of Field Archaeology 40 (3): 325–46. https://doi.org/10.1179/2042458215Y.0000000004.
Schmidt, Sophie C., and Ben Marwick. 2020. “Tool-Driven Revolutions in Archaeological Science.” Journal of Computer Applications in Archaeology 3 (1): 18–32. https://doi.org/10.5334/jcaa.29.
Scollar, Irwin. 1999. “25 Years of Computer Applications in Archaeology.” In Archaeology in the Age of the Internet, edited by L. Dingwall, S. Exon, V. Gaffney, S. Laflin, and M. van Leusen, 5–10. Oxford: Archaeopress. https://proceedings.caaconference.org/paper/02_scollar_caa_1997/.
Sobotkova, Adela. 2018. “Sociotechnical Obstacles to Archaeological Data Reuse.” Advances in Archaeological Practice 6 (2): 117–24. https://doi.org/10.1017/aap.2017.37.
Tennant, Jonathan, Ritwik Agarwal, Ksenija Baždarić, David Brassard, Tom Crick, Daniel J. Dunleavy, Thomas Rhys Evans, et al. 2020. “A Tale of Two ‘Opens’: Intersections Between Free and Open Source Software and Open Scholarship,” March. https://doi.org/10.31235/osf.io/2kxq8.
Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley Publishing Company. http://theta.edu.pl/wp-content/uploads/2012/10/exploratorydataanalysis_tukey.pdf.
Whallon, Robert. 1972. “The Computer in Archaeology: A Critical Survey.” Computers and the Humanities 7 (1): 29–45. https://doi.org/10.1007/BF02403759.

Footnotes

  1. We welcome anyone, especially domain specialists who are familiar with the kinds of tools commonly used in their specific fields, to help fill in these gaps. Instructions for contributing to open-archaeo can be found at https://github.com/zackbatist/open-archaeo.

  2. See https://github.com/benmarwick/ctv-archaeology#publications-that-include-r-code for a similar list of archaeology publications that include R code.

  3. Apart from issues, GitHub has a very wide range of project management and social media-like features, including wikis, discussion forums and ‘kanban’ boards. We have not analysed the use of these features here.

  4. This feature of GitHub’s timeline was one of the primary ways we compiled open-archaeo.

  5. While Blender and QGIS plugins are written using the Python language, our intent while categorizing platforms was to get a sense of the developer ecosystems in which archaeological software engineers participate, rather than to simply gauge the popularity of different languages (Batist and Roe 2023, 2).