Context
I am Zack Batist, a postdoctoral researcher at McGill University, in the School of Global and Public Health’s Department of Epidemiology, Biostatistics and Occupational Health. I’m working with David Buckeridge, who leads the Covid-19 Immunity Task Force (CITF) Databank, to investigate data sharing in epidemiological research, with an emphasis on the practical and situated experiences it involves.
The CITF is a “data harmonization” initiative, which entails coordinating a systematic effort to align the information contained in datasets collected by distributed teams of epidemiologists. These efforts to integrate the records collected during various discrete studies are motivated by a desire to establish larger integrated datasets that have greater statistical power and facilitate comparison across cohorts. However, epidemiologists must reckon with many minor variations in data collection procedures, as well as ethico-legal concerns relating to the sharing of individual health records pertaining to human research subjects across numerous institutional and regional jurisdictions.
As a scholar of scientific practice with a primary interest in data-sharing and the formation of information commons, I see data harmonization as a fascinating mechanism through which scientists derive technical, administrative, social and epistemic frameworks to enhance the value of their collective endeavours in response to disciplinary needs, warrants, desires and expectations. This study therefore articulates the motivations for doing data harmonization, identifies how value is ascertained, and describes the strategies employed to achieve the desired goals, including perceived and actual challenges, setbacks, opportunities, realizations, and lessons learned.
This relates to my previous work that (a) explores tensions that arise when attempting to establish information commons in archaeology, specifically relating to an inability to cope with both a superficial perception of data’s stability and an intuitive understanding of their situated nature; and that (b) investigates how the open science movement attempts (and fails) to reshape practices relating to data sharing, integration and reuse. I continue to frame data-sharing, whether it occurs in relatively “closed” circumstances between close colleagues or is mediated by open data platforms among strangers, as comprising a series of collaborative commitments that govern who may contribute to and obtain value from the information commons, and in what ways.